SDT_Linux32 (Sequence Demarcation Tool - Linux 32 bit)
======================================================

SDT_Linux32 is a free Python version of SDT, which runs on 32-bit Linux
operating systems. Given a FASTA file containing DNA sequences, the program
aligns every possible pair of sequences using MUSCLE (Edgar, 2004), ClustalW2
(Larkin et al., 2007) or MAFFT (Katoh et al., 2009), calculates the sequence
identity score for each pair, and uses a rooted neighbour-joining phylogenetic
tree to cluster closely related sequences based on their identity scores.

It outputs a ".sdt" file that can be opened with SDTv.1.0 for Windows to
visualise the plot and identity matrix images and to save the outputs in
various graphical and numerical formats. It also writes the pairwise identity
scores (arranged in matrix and column formats) to a flat text file.

The identity scores are calculated as 1-(M/N), where M is the number of
mismatching nucleotides and N is the total number of positions along the
alignment where neither sequence has a gap character.

DOWNLOAD AND INSTALLATION
=========================

1. Software requirements:
   The program requires the following to be installed:
   - Python 2.7

2. Download SDT_Linux32 from http://web.cbio.uct.ac.za/SDT

3. Extract the SDT_Linux32.tar.gz file into the location where you want to
   place the program. The folder contains:
   - The bin directory, which contains the executable files
     "muscle3.8.31_i86linux32", "clustalw2" and "neighbor".
   - The Bio directory, which contains the Biopython library.
   - The output directory, in which the output files of each run are stored.
   - SDT_Linux32.py, which is the main program script.
   - test.fas, a sample FASTA file that can be used to test the program.

4. Change the mode of "muscle3.8.31_i86linux32", "clustalw2" and "neighbor"
   in the bin directory to "executable".

5. Running commands:

       python SDT_Linux32.py test.fas muscle

   This will result in the use of MUSCLE as the alignment program.
   Replace "muscle" with "clustal" or "mafft" to change the alignment program
   that is used. Before using MAFFT, install it on your computer and, in the
   main script, change MAFFT_PATH="/xxx/mafft" to the location of its
   executable file.

   When the pairwise alignments and identity score calculations are
   completed, the scores will be written to (1) two text files that are named
   after the input FASTA file and saved in the output folder and (2) a ".sdt"
   file that can be opened with SDTv.1.0 for Windows to visualise the pairwise
   identity distribution plot and matrix data.

----------------------------------------------------------------------

References
----------

1. Edgar RC. (2004). MUSCLE: multiple sequence alignment with high accuracy
   and high throughput. Nucleic Acids Research 32, 1792-1797.
2. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H,
   Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ,
   Higgins DG. (2007). Clustal W and Clustal X version 2.0.
   Bioinformatics 23, 2947-2948.
3. Katoh K, Asimenos G, Toh H. (2009). Multiple Alignment of DNA Sequences
   with MAFFT. Methods in Molecular Biology 537, 39-64.
4. Felsenstein J. (1995). PHYLIP (Phylogeny Inference Package) Version 3.57c,
   available at http://www.med.nyu.edu/rcr/rcr/phylip/main.html#refs

Authors
-------

Brejnev Muhire [1]
Darren Martin [1]
Arvind Varsani [2]

[1] Institute of Infectious Diseases and Molecular Medicine (IIDMM),
    Computational Biology Group, University of Cape Town, South Africa
[2] School of Biological Sciences, University of Canterbury,
    Private Bag 4800, Christchurch, 8140, New Zealand

BM is funded by the University of Cape Town.

website: http://web.cbio.uct.ac.za/SDT
email: mhrbre001@myuct.ac.za
email: mubrejnev@gmail.com
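
Example: identity score calculation
-----------------------------------

The identity score defined in the overview, 1-(M/N), can be illustrated with
a minimal sketch. This is not taken from SDT_Linux32.py: the function name
pairwise_identity is hypothetical, and treating "-" as the only gap character
and comparing bases case-insensitively are assumptions made for illustration.
The input is assumed to be two already aligned sequences of equal length.

```python
def pairwise_identity(seq_a, seq_b, gap="-"):
    """Return 1 - (M/N) for two aligned sequences of equal length.

    M is the number of mismatching nucleotides and N is the number of
    alignment positions where neither sequence has a gap character.
    (Hypothetical helper; "-" as the gap symbol is an assumption.)
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0    # N: positions with no gap in either sequence
    mismatches = 0  # M: compared positions where the bases differ

    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # gapped positions are excluded from N
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no ungapped positions to compare")

    return 1.0 - float(mismatches) / compared


if __name__ == "__main__":
    # Six alignment columns: one is gapped (skipped), one is a mismatch,
    # so N = 5 and M = 1.
    print("%.3f" % pairwise_identity("ACGT-A", "ACCTTA"))
```

For the pair shown, five positions are compared and one differs, so the score
is 1-(1/5) = 0.8. Positions where either sequence has a gap do not count
towards N, which is why the two sequences can differ in ungapped length
without lowering the score.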