Uploaded by basfplant, Tue, 05 Mar 2013 04:01:17 -0500
Installation
------------

1) Download the Mafft software from http://mafft.cbrc.jp/alignment/software/
   (see license terms: http://mafft.cbrc.jp/alignment/software/license.txt)

2) Install Mafft as a standalone application on the computer that runs Galaxy. The Mafft website describes the installation procedure both as root (http://mafft.cbrc.jp/alignment/software/source.html) and as a non-root user (http://mafft.cbrc.jp/alignment/software/installation_without_root.html).

3) Add the wrapper file "mafft.xml" to Galaxy (reference: http://wiki.g2.bx.psu.edu/Admin/Tools/Add%20Tool%20Tutorial).

- Create a directory "mafft" under {GALAXY_ROOT_DIR}/tools and copy the wrapper file "mafft.xml" into it.
- Make Galaxy aware of the new tool. Galaxy reads the list of installed tools (and what to display in the left pane) from the file {GALAXY_ROOT_DIR}/tool_conf.xml.
  Use a text editor to add a line for the mafft.xml wrapper, e.g. to a Multiple Alignments section:

    <label text="My Tools" id="My tools" />
    <section name="Multiple Alignments" id="multiple_alignments" >
        <tool file="mafft/mafft.xml" />
    </section>

- Restart Galaxy, open it in the web browser and test the new tool.
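
The tool_conf.xml change in step 3 can also be scripted, for instance when Galaxy is set up automatically. The Python sketch below is an illustration only, not part of the wrapper: the function add_mafft_tool is hypothetical, it assumes the root element of tool_conf.xml is <toolbox> with <section> elements directly below it, and it needs Python 3.8+ to keep XML comments. It does not add the optional <label> line, and it saves a .bak copy of the file before rewriting it.

```python
"""Register the Mafft wrapper in a Galaxy tool_conf.xml (hypothetical helper)."""
import shutil
import sys
import xml.etree.ElementTree as ET


def add_mafft_tool(conf_path, section_id="multiple_alignments",
                   section_name="Multiple Alignments",
                   tool_file="mafft/mafft.xml"):
    """Add <tool file="..."/> to the section; return True if the file changed."""
    # Keep XML comments while parsing (insert_comments needs Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(conf_path, parser=parser)
    root = tree.getroot()  # assumed to be <toolbox>

    # Reuse the section if it already exists, otherwise append a new one.
    section = root.find(f"section[@id='{section_id}']")
    if section is None:
        section = ET.SubElement(root, "section",
                                {"name": section_name, "id": section_id})

    # Leave the file untouched if the wrapper is already registered there.
    if section.find(f"tool[@file='{tool_file}']") is not None:
        return False

    ET.SubElement(section, "tool", {"file": tool_file})
    shutil.copyfile(conf_path, conf_path + ".bak")  # backup of the original file
    tree.write(conf_path, encoding="utf-8", xml_declaration=True)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python add_mafft_tool.py {GALAXY_ROOT_DIR}/tool_conf.xml
    changed = add_mafft_tool(sys.argv[1])
    print("tool_conf.xml updated" if changed else "mafft.xml already registered")
```

Running it a second time is harmless: the tool entry is only added when it is not already present in the section.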


MAFFT functionality
-------------------

MAFFT is a multiple sequence alignment program for protein and nucleotide sequences that uses the fast Fourier transform (FFT).

If no advanced options are selected, the following default parameters are used:
- for proteins: mafft-FFT-NS-2 method (Fast, progressive method), BLOSUM62 substitution matrix, gap opening penalty 1.53 and offset value 0.00
- for nucleic acids: mafft-FFT-NS-2 method (Fast, progressive method), 200PAM/kappa=2 substitution matrix, gap opening penalty 1.53 and offset value 0.00
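
The defaults above can be written out as an explicit Mafft command line, which helps when reproducing a Galaxy run outside Galaxy. This is a minimal Python sketch: the option names (--retree, --maxiterate, --amino, --nuc, --bl, --op, --ep) are assumed from the Mafft manual rather than taken from mafft.xml, so check them with mafft --help on your installation. The function default_mafft_args is hypothetical.

```python
def default_mafft_args(seq_type):
    """Return Mafft options reproducing the defaults listed above (assumed flags)."""
    # FFT-NS-2: progressive method with two guide-tree passes, no iterative refinement.
    args = ["--retree", "2", "--maxiterate", "0"]
    if seq_type == "protein":
        # BLOSUM62 substitution matrix.
        args += ["--amino", "--bl", "62"]
    elif seq_type == "nucleotide":
        # 200PAM/kappa=2 is assumed to be Mafft's built-in nucleotide default,
        # so only the sequence type is passed here.
        args += ["--nuc"]
    else:
        raise ValueError(f"unknown sequence type: {seq_type!r}")
    # Gap opening penalty 1.53 and offset value 0.00 for both input types.
    args += ["--op", "1.53", "--ep", "0.0"]
    return args


print(" ".join(["mafft", *default_mafft_args("protein"), "input.fasta"]))
# prints: mafft --retree 2 --maxiterate 0 --amino --bl 62 --op 1.53 --ep 0.0 input.fasta
```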

MAFFT offers a range of multiple alignment methods, classified into three types: (a) the progressive method, (b) the iterative refinement method with the WSP (weighted sum-of-pairs) score, and (c) the iterative refinement method using both the WSP and consistency scores. In general, there is a tradeoff between speed and accuracy: in order of speed a > b > c, whereas in order of accuracy a < b < c.
- Auto (FFT-NS-1, FFT-NS-2, FFT-NS-i or L-INS-i, depending on data size) (a, b or c)
- FFT-NS-1 (Very fast, recommended for > 2,000 sequences; progressive method) (a)
- FFT-NS-2 (Fast, progressive method) (DEFAULT if no advanced options) (a)
- medium (Iterative refinement method, two cycles only) (b)
- FFT-NS-i (Slow, iterative refinement method) (b)
- E-INS-i (Very slow, recommended for < 200 sequences with multiple conserved domains and long gaps) (c)
- L-INS-i (Very slow, recommended for < 200 sequences with one conserved domain and long gaps) (c)
- G-INS-i (Very slow, recommended for < 200 sequences with global homology) (c)
- NW-NS-PartTree-1 (recommended for ~10,000 to ~50,000 sequences; progressive method with the PartTree algorithm) (a)
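
The strategy list can be summarised as a lookup table from each strategy name to its method class (a, b or c) and to the Mafft options that select it. A Python sketch under stated assumptions: the option lists are taken from our reading of the Mafft manual (see the Documentation section), not from mafft.xml, and mafft_command and suggest_strategy are hypothetical helpers. suggest_strategy only encodes the sequence counts recommended in this README.

```python
# Strategy name -> (method class, Mafft options). Classes as defined above:
# a = progressive, b = iterative refinement (WSP), c = iterative refinement (WSP + consistency).
# The option lists are assumed from the Mafft manual, not taken from mafft.xml.
STRATEGIES = {
    "Auto": (None, ["--auto"]),  # Mafft chooses a, b or c depending on data size
    "FFT-NS-1": ("a", ["--retree", "1", "--maxiterate", "0"]),
    "FFT-NS-2": ("a", ["--retree", "2", "--maxiterate", "0"]),
    "medium": ("b", ["--retree", "2", "--maxiterate", "2"]),
    "FFT-NS-i": ("b", ["--retree", "2", "--maxiterate", "1000"]),
    "E-INS-i": ("c", ["--genafpair", "--ep", "0", "--maxiterate", "1000"]),
    "L-INS-i": ("c", ["--localpair", "--maxiterate", "1000"]),
    "G-INS-i": ("c", ["--globalpair", "--maxiterate", "1000"]),
    "NW-NS-PartTree-1": ("a", ["--retree", "1", "--maxiterate", "0", "--nofft", "--parttree"]),
}


def mafft_command(strategy, fasta_path):
    """Build the argument list for one strategy (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    _, options = STRATEGIES[strategy]
    return ["mafft", *options, fasta_path]


def suggest_strategy(n_seqs):
    """Rough pick based only on the sequence counts recommended in this README."""
    if n_seqs >= 10_000:
        return "NW-NS-PartTree-1"
    if n_seqs > 2_000:
        return "FFT-NS-1"
    if n_seqs < 200:
        return "L-INS-i"  # G-INS-i or E-INS-i may fit better, depending on the homology
    return "FFT-NS-2"


print(mafft_command(suggest_strategy(150), "input.fasta"))
# prints: ['mafft', '--localpair', '--maxiterate', '1000', 'input.fasta']
```

The thresholds in suggest_strategy are a simplification for illustration; Mafft's own Auto mode applies its own internal rules.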

Two additional alignment methods are available for nucleotide sequences only:
- Q-INS-i (Extremely slow; takes RNA secondary structure into account; recommended for global alignment of highly divergent ncRNAs with < 200 sequences and < 1,000 nucleotides)
- X-INS-i (Applicable to up to ~50 sequences of up to ~1,000 nucleotides; multiple structural alignment built by combining pairwise structural alignments produced by an external program)
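
Because Q-INS-i and X-INS-i are only practical for small inputs, it can help to check a FASTA file against the limits quoted above before submitting it. A minimal Python sketch: the limits are copied from this README and treated as approximate, and read_fasta_lengths and check_rna_input are hypothetical helpers, not part of the wrapper.

```python
# Approximate size limits for the nucleotide-only strategies, copied from this README.
LIMITS = {
    "Q-INS-i": {"max_seqs": 200, "max_len": 1000},
    "X-INS-i": {"max_seqs": 50, "max_len": 1000},
}


def read_fasta_lengths(lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)         # header line: a new record starts
        elif lengths:
            lengths[-1] += len(line)  # sequence data may span several lines
    return lengths


def check_rna_input(strategy, lines):
    """Return warnings when the input exceeds the approximate limits of the strategy."""
    limits = LIMITS[strategy]
    lengths = read_fasta_lengths(lines)
    warnings = []
    if len(lengths) > limits["max_seqs"]:
        warnings.append(f"{len(lengths)} sequences (recommended limit: ~{limits['max_seqs']})")
    if lengths and max(lengths) > limits["max_len"]:
        warnings.append(f"longest sequence is {max(lengths)} nt (recommended limit: ~{limits['max_len']})")
    return warnings


example = [">s1", "ACGU" * 200, "ACGU" * 100, ">s2", "ACGUACGU"]
print(check_rna_input("X-INS-i", example))  # one warning: s1 is 1,200 nt long
```

An open file handle can be passed directly as lines, e.g. check_rna_input("Q-INS-i", open("input.fasta")).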

The advanced options depend on the type of sequences in the input file. When "protein input" is selected from the first drop-down list, BLOSUM or JTT substitution matrices can be chosen. The selection "nucleic acid input" only offers substitution matrices of the type PAM / kappa = x, but adds two strategies that are not available for proteins: X-INS-i and Q-INS-i.
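
The dependency between input type and advanced options can be expressed as a small validation function, e.g. to check parameters before launching a batch of jobs. This is a sketch only: check_options is hypothetical, matrix names are matched with simple prefix and substring tests, and it mirrors the behaviour described above rather than the actual logic in mafft.xml.

```python
# Advanced options offered per input type, as described above
# (hypothetical check, not the logic of mafft.xml itself).
PROTEIN_MATRIX_PREFIXES = ("BLOSUM", "JTT")
NUCLEOTIDE_ONLY_STRATEGIES = {"Q-INS-i", "X-INS-i"}


def check_options(input_type, matrix, strategy):
    """Raise ValueError if the matrix or strategy is not offered for input_type."""
    name = matrix.upper().replace(" ", "")
    if input_type == "protein input":
        if not name.startswith(PROTEIN_MATRIX_PREFIXES):
            raise ValueError(f"{matrix!r}: protein input needs a BLOSUM or JTT matrix")
        if strategy in NUCLEOTIDE_ONLY_STRATEGIES:
            raise ValueError(f"{strategy} is only available for nucleic acid input")
    elif input_type == "nucleic acid input":
        # Matrices of the type "PAM / kappa = x", e.g. "200PAM / kappa = 2".
        if "PAM/KAPPA=" not in name:
            raise ValueError(f"{matrix!r}: nucleic acid input needs a PAM / kappa = x matrix")
    else:
        raise ValueError(f"unknown input type: {input_type!r}")


check_options("nucleic acid input", "200PAM / kappa = 2", "Q-INS-i")  # accepted
```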


Documentation
-------------

Mafft website: http://mafft.cbrc.jp/alignment/software/
Mafft manual: http://mafft.cbrc.jp/alignment/software/manual/manual.html

More information about the algorithms can be found at http://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html#GLE.


Author and affiliation
----------------------

Katrien Bernaerts and Domantas Motiejunas
Corresponding author: gb-ctk-open-source-support@basf.com
21/06/2012

CropDesign N.V., a BASF Plant Science Company - Technologiepark 3, 9052 Zwijnaarde - Belgium


Terms of use
------------

Galaxy wrapper for Mafft – multiple alignment tool - Copyright (C) 2012 CropDesign N.V. - this software may be used, copied and redistributed, with or without modification freely, without advance permission, provided that the above Copyright statement is reproduced with each copy.
THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE (INCLUDING NEGLIGENCE OR OTHERWISE).


Citation
--------

- Katoh, Toh 2010 (Bioinformatics 26:1899-1900). Parallelization of the MAFFT multiple sequence alignment program. (describes the multithread version; Linux only)
- Katoh, Asimenos, Toh 2009 (Methods in Molecular Biology 537:39-64). Multiple Alignment of DNA Sequences with MAFFT. In Bioinformatics for DNA Sequence Analysis, edited by D. Posada. (outlines DNA alignment methods and several tips including group-to-group alignment and rough clustering of a large number of sequences)
- Katoh, Toh 2008 (BMC Bioinformatics 9:212). Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework. (describes RNA structural alignment methods)
- Katoh, Toh 2008 (Briefings in Bioinformatics 9:286-298). Recent developments in the MAFFT multiple sequence alignment program. (outlines version 6; Fast Breaking Paper in Thomson Reuters' ScienceWatch)
- Katoh, Toh 2007 (Bioinformatics 23:372-374; Errata). PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. (describes the PartTree algorithm)
- Katoh, Kuma, Toh, Miyata 2005 (Nucleic Acids Res. 33:511-518). MAFFT version 5: improvement in accuracy of multiple sequence alignment. (describes [ancestral versions of] the G-INS-i, L-INS-i and E-INS-i strategies)
- Katoh, Misawa, Kuma, Miyata 2002 (Nucleic Acids Res. 30:3059-3066). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. (describes the FFT-NS-1, FFT-NS-2 and FFT-NS-i strategies)