NAME

nasp - NASP (Nucleic Acid Structures Predictor)


SYNOPSIS

nasp [OPTIONS] FILE


DESCRIPTION

NASP takes as input a nucleotide sequence alignment and determines:

(1) the overall probability that the sequences within the alignment possess a degree of secondary structure that cannot be accounted for by chance;

(2) the list of most probable evolutionarily conserved secondary structures within the sequences.


EXAMPLES

Compute only consensus structures:

nasp alignment.fas

Compute consensus structures and p-values:

nasp -P -it 100 -th 0.05 alignment.fas

Using multiple processors (e.g. 4) in an MPI environment (e.g. OpenMPI):

mpirun -np 4 nasp -P alignment.fas
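
When nasp is driven from a script, the three invocations above can be assembled programmatically. The following Python sketch is only an illustration: the helper name build_nasp_command and its parameters are hypothetical and not part of nasp. It uses only the flags shown in the examples above (-P, -it, -th) and the mpirun launcher.

```python
def build_nasp_command(alignment, pvalue=False, iterations=None,
                       threshold=None, processes=None):
    """Return an argument list for nasp (hypothetical helper, not part of nasp)."""
    cmd = []
    if processes is not None:
        # Parallel run through an MPI launcher, as in the mpirun example.
        cmd += ["mpirun", "-np", str(processes)]
    cmd.append("nasp")
    if pvalue:
        cmd.append("-P")
        # -it and -th are only meaningful together with -P.
        if iterations is not None:
            cmd += ["-it", str(iterations)]
        if threshold is not None:
            cmd += ["-th", str(threshold)]
    # The alignment file is the final positional argument.
    cmd.append(alignment)
    return cmd
```

The returned list can be passed to a process launcher such as Python's subprocess.run, which avoids shell quoting issues with file names.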


OPTIONS

Usage: nasp [OPTIONS] <file>

  -h, --help             Print complete help and exit.
  --version              Print version info and exit.
  -mpi, --mpi            Flag to force the use of parallel computation. This assumes that MPI libraries 
                         are installed.
                         While NASP tries to find out whether it is running in an MPI environment or in 
                         serial mode, it might not identify some MPI implementations. This flag enables 
                         the user to enforce the use of MPI in such cases. It is recommended to use it 
                         whenever running NASP in an MPI environment.
  -n, --seqtype          Sequence type: RNA | DNA (case insensitive).
                         The nucleic acid sequence type. Tells the program whether sequences should be
                         considered as RNA or DNA. The valid nucleotides are A, C, G, T and U (case is
                         irrelevant); any other symbol is ignored.
                         This option is used in the call to hybrid-ss or hybrid-ss-min from UNAFold.
                         Default is: <RNA>
  -t, --temperature      Folding temperature (in degrees Celsius) in range [0,100].
                         This option is used in the call to hybrid-ss or hybrid-ss-min from UNAFold.
                         Default is: <37>
  -C, --circular         Flag that specifies that the sequence is circular, as opposed to linear 
                         (default).
                         This option is used in the call to hybrid-ss or hybrid-ss-min from UNAFold.
  -N, --Na               Sodium ion concentration (in mol/L) in range [0,20] (e.g. the concentration of 
                         sodium in human blood plasma is ~ 0.3 mol/L).
                         This option is used in the call to hybrid-ss or hybrid-ss-min from UNAFold.
                         Default is: <1>
  -M, --Mg               Magnesium ion concentration (in mol/L) in range [0,20] (e.g. the concentration 
                         of magnesium in human cells is ~ 0.004 mol/L).
                         This option is used in the call to hybrid-ss or hybrid-ss-min from UNAFold.
                         Default is: <0>
  -P0, --P0              Base-pairing probability threshold in range (0,1].
                         It is the probability value above which a base pair is accepted.
                         This value is only used for sequences shorter than 4000 nucleotides.
                         Default is: <0.0001>
  -P, --pvalue           Flag that enables the computation of p-values by randomly shuffling the 
                         alignment.
                         The p-values indicate the probability that no additional unaccounted-for
                         structures remain within the analysed sequences. Note that this significantly
                         increases nasp's running time.
                         The shuffling strategy may be defined by the option --shuffle (-sh).
  -th, --threshold       P-value threshold in range [0,1]. It determines the probability level at which
                         the iterative enrichment for evolutionarily conserved secondary structures
                         ends.
                         This option is only used when the flag --pvalue (-P) is also specified.
                         Default is: <0.05>
  -it, --iterations      Number of permutations to perform to compute the p-values.
                         This option is only used when the flag --pvalue (-P) is also specified.
                         Default is: <100>
  -sh, --shuffle         Indicates the alignment column shuffling strategy to be carried out. Possible 
                         strategies are: 1=mono-nucleotide shuffling | 2=di-nucleotide shuffling.
                         This option is only used when the flag --pvalue (-P) is also specified.
                         Default is: <1>
  -split, --split-length Minimum length of the sub-sequences used when computing the p-values.
                         Special values are: -1=do not split | 0=automatic.
                         If --split-length >= 0, then the minimum free energy (MFE) computation for
                         shuffled sequences is performed indirectly by splitting them into
                         sub-sequences. The splits are chosen such that existing structures are not
                         disrupted. The MFEs of the sub-sequences are computed separately and summed
                         to give the MFE of the complete sequence.
                         Default is: <0>
  -s, --max-structure    Maximum number of structures to consider.
                         Default is: <1000>
  -min, --min-length     The minimum length allowed for a sub-structure (e.g. if set to an integer 
                         greater than 1, it means avoiding isolated base-pairs).
                         Default is: <2>
  -lo, --min-loop        Minimum length of a loop for a structure to be considered.
                         Default is: <3>
  -d, --dvalue           Adjustment depth 'd' to apply to the consensus matrix.
                         All nucleotides falling within 'd' bases upstream or downstream of a given
                         nucleotide are considered to be its potential homologues.
                         Default is: <0>
  -mat, --matrix-only    Flag to only compute the consensus matrix.
  -v, --verbose          Verbosity level. Use value 99 for maximum output (debugging).
                         Default is: <1>
  --restore              Flag to restore the previous workspace. Useful for recovering from errors and
                         for debugging (EXPERIMENTAL).
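
The permutation test controlled by --pvalue, --iterations and --shuffle can be pictured with the following Python sketch. It is an illustration only and does not reproduce nasp's implementation: the helper names shuffle_columns and empirical_p_value, the score callable (standing in for a folding energy, where lower means more structure), and the add-one p-value estimator are all assumptions made for this example. Only the shuffling of whole alignment columns is shown.

```python
import random


def shuffle_columns(alignment, rng):
    """Return a copy of the alignment with its columns randomly permuted.

    `alignment` is a list of equal-length strings, one per sequence.
    The same permutation is applied to every row, so columns stay intact.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    order = list(range(length))
    rng.shuffle(order)
    return ["".join(row[i] for i in order) for row in alignment]


def empirical_p_value(alignment, score, iterations=100, seed=0):
    """Estimate how often a shuffled alignment scores as low as the original.

    `score` is a hypothetical callable (assumption): lower values mean
    more stable structure, as with a folding energy.
    """
    rng = random.Random(seed)
    observed = score(alignment)
    hits = sum(
        1
        for _ in range(iterations)
        if score(shuffle_columns(alignment, rng)) <= observed
    )
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (iterations + 1)
```

With this estimator the smallest attainable value is 1 / (iterations + 1), so the number of permutations limits the resolution of the p-value, and each permutation requires a fresh structure computation, which is why the running time grows with it.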


EXIT STATUS

nasp returns a zero exit status if it succeeds. A non-zero status is returned in case of failure.
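
A calling script can rely on this convention to detect failures. The Python sketch below is a minimal illustration; the helper name run_and_check is hypothetical, and the commented usage assumes that nasp is on the PATH and that alignment.fas exists.

```python
import subprocess
import sys


def run_and_check(command):
    """Run a command (list of arguments) and return True if it exited with status zero."""
    completed = subprocess.run(command)
    return completed.returncode == 0


# Example usage (assumes nasp is installed and alignment.fas exists):
# if not run_and_check(["nasp", "alignment.fas"]):
#     sys.exit("nasp failed")
```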


AUTHORS

Jean-Yves Semegni (initial version), Renaud Gaujoux (refactored and optimized version)


REFERENCES

1. J. Y. Semegni, M. Wamalwa, R. Gaujoux, G. W. Harkins, A. Gray and D. P. Martin (2011). NASP: a parallel program for identifying evolutionarily conserved nucleic acid secondary structures from sequence alignments. Bioinformatics, 27(17):2443-2445. doi:10.1093/bioinformatics/btr417 http://bioinformatics.oxfordjournals.org/content/27/17/2443

NASP makes intensive use of UNAFold; please also cite the following reference when using NASP:

2. N. R. Markham and M. Zuker (2008). UNAFold: software for nucleic acid folding and hybridization. Methods Mol Biol., 453:3-31.


VERSION INFORMATION

This manpage describes nasp 1.5


SEE ALSO

mpirun(1), hybrid-ss(1), hybrid-ss-min(1)


LICENSE

Copyright (C) 2011 Yves Semegni, Renaud Gaujoux

NASP is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

NASP is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with NASP. If not, see <http://www.gnu.org/licenses/>.