annotate Mafft/readme.txt @ 0:e4d26cd8be10 draft default tip

Uploaded
author basfplant
date Tue, 05 Mar 2013 04:01:17 -0500
Installation
------------

1) Download the Mafft software from http://mafft.cbrc.jp/alignment/software/
(see license terms: http://mafft.cbrc.jp/alignment/software/license.txt)

2) Install Mafft as a standalone application on the computer that runs Galaxy. The Mafft website describes the installation procedure as root (http://mafft.cbrc.jp/alignment/software/source.html) and as non-root user (http://mafft.cbrc.jp/alignment/software/installation_without_root.html).

3) Add the wrapper file "mafft.xml" to Galaxy (reference: http://wiki.g2.bx.psu.edu/Admin/Tools/Add%20Tool%20Tutorial).

- Make a directory "mafft" under {GALAXY_ROOT_DIR}/tools and copy the wrapper file "mafft.xml" to this directory.
- Make Galaxy aware of the new tool: Galaxy reads the list of installed tools (and what to display in the left pane) from the file {GALAXY_ROOT_DIR}/tool_conf.xml.
  Use a text editor to add a line for the mafft.xml wrapper to, e.g., the Multiple Alignments section:

<label text="My Tools" id="My tools" />
<section name="Multiple Alignments" id="multiple_alignments" >
    <tool file="mafft/mafft.xml" />
</section>

- Restart Galaxy, open it in the web browser and test the tool.
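
The tool_conf.xml edit described above can also be scripted. The following is only a sketch, assuming that tool_conf.xml has a <toolbox> root element and that the section id "multiple_alignments" from the snippet above is used; the function name register_tool is hypothetical and not part of Galaxy or of this wrapper.

```python
import xml.etree.ElementTree as ET


def register_tool(tool_conf_xml, tool_file="mafft/mafft.xml",
                  section_id="multiple_alignments",
                  section_name="Multiple Alignments"):
    """Return tool_conf XML text with a <tool> entry for the wrapper added.

    Assumption: the section id and name match the README snippet.
    """
    root = ET.fromstring(tool_conf_xml)
    # Look for an existing section with the requested id.
    section = next((s for s in root.iter("section")
                    if s.get("id") == section_id), None)
    if section is None:
        # No such section yet: create it directly under the root element.
        section = ET.SubElement(root, "section",
                                {"name": section_name, "id": section_id})
    # Add the tool entry only once, so the function can be re-run safely.
    if not any(t.get("file") == tool_file for t in section.findall("tool")):
        ET.SubElement(section, "tool", {"file": tool_file})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ('<toolbox><section name="Multiple Alignments" '
              'id="multiple_alignments" /></toolbox>')
    print(register_tool(sample))
```

Note that the default ElementTree parser discards XML comments, so it is safer to run such a script on a copy of tool_conf.xml and compare the result before replacing the original file.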


MAFFT functionality
-------------------

MAFFT is a multiple sequence alignment program for proteins and nucleotides that uses the fast Fourier transform.

If no advanced options are selected, the following default parameters will be used:
- for proteins: mafft-FFT-NS-2 method (Fast, progressive method), BLOSUM62 substitution matrix, gap opening penalty 1.53 and offset value 0.00
- for nucleic acids: mafft-FFT-NS-2 method (Fast, progressive method), 200PAM/kappa=2 substitution matrix, gap opening penalty 1.53 and offset value 0.00

MAFFT offers a range of multiple alignment methods, classified into three types: (a) the progressive method, (b) the iterative refinement method with the WSP score, and (c) the iterative refinement method using both the WSP and consistency scores. In general, there is a tradeoff between speed and accuracy. The order of speed is a > b > c, whereas the order of accuracy is a < b < c.
- Auto (FFT-NS-1, FFT-NS-2, FFT-NS-i or L-INS-i; depends on data size) (a, b or c)
- FFT-NS-1 (Very fast, recommended for > 2,000 sequences; progressive method) (a)
- FFT-NS-2 (Fast, progressive method) (DEFAULT if no advanced options) (a)
- medium (Iterative refinement method, two cycles only) (b)
- FFT-NS-i (Slow, iterative refinement method) (b)
- E-INS-i (Very slow, recommended for < 200 sequences with multiple conserved domains and long gaps) (c)
- L-INS-i (Very slow, recommended for < 200 sequences with one conserved domain and long gaps) (c)
- G-INS-i (Very slow, recommended for < 200 sequences with global homology) (c)
- NW-NS-PartTree-1 (recommended for ~10,000 to ~50,000 sequences; progressive method with the PartTree algorithm) (a)

Two additional alignment methods are available for nucleotides only:
- Q-INS-i (Extremely slow; secondary structure of RNA is considered; recommended for a global alignment of highly divergent ncRNAs with < 200 sequences, < 1,000 nucleotides)
- X-INS-i (Applicable to up to ~50 sequences of ~1,000 nucleotides. Multiple structural alignment by combining pairwise structural alignments given by an external program.)
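
For orientation, the methods listed above correspond to MAFFT command-line invocations documented in the MAFFT manual. The sketch below maps each method name to those flags and appends the default gap opening penalty and offset value given in this README. It is an illustration only: the names STRATEGY_FLAGS and build_command are hypothetical, and the command that mafft.xml actually assembles may differ.

```python
import shlex

# Method name -> base command, following the invocations in the MAFFT manual.
# Assumption: the wrapper's own command line may be built differently.
STRATEGY_FLAGS = {
    "Auto": ["mafft", "--auto"],
    "FFT-NS-1": ["mafft", "--retree", "1", "--maxiterate", "0"],
    "FFT-NS-2": ["mafft", "--retree", "2", "--maxiterate", "0"],
    "medium": ["mafft", "--retree", "2", "--maxiterate", "2"],
    "FFT-NS-i": ["mafft", "--retree", "2", "--maxiterate", "1000"],
    "E-INS-i": ["mafft", "--genafpair", "--maxiterate", "1000"],
    "L-INS-i": ["mafft", "--localpair", "--maxiterate", "1000"],
    "G-INS-i": ["mafft", "--globalpair", "--maxiterate", "1000"],
    "NW-NS-PartTree-1": ["mafft", "--retree", "1", "--maxiterate", "0",
                         "--nofft", "--parttree"],
    # Nucleotides only: separate executables shipped with MAFFT.
    "Q-INS-i": ["mafft-qinsi"],
    "X-INS-i": ["mafft-xinsi"],
}


def build_command(strategy, input_path, gap_open=1.53, offset=0.0):
    """Return the argument list for one alignment run.

    gap_open and offset default to the values stated in this README.
    """
    if strategy not in STRATEGY_FLAGS:
        raise ValueError("unknown strategy: %s" % strategy)
    return STRATEGY_FLAGS[strategy] + [
        "--op", str(gap_open), "--ep", str(offset), input_path]


if __name__ == "__main__":
    # MAFFT writes the alignment to standard output.
    print(shlex.join(build_command("FFT-NS-2", "input.fasta")))
```

Keeping the mapping in a single dictionary makes the speed/accuracy classes easy to compare: the progressive methods (a) run with --maxiterate 0, while the iterative refinement methods (b, c) set a positive iteration limit.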

The advanced options change depending on the nature of the sequences in the input file. When "protein input" is selected from the first drop-down list, BLOSUM or JTT substitution matrices can be chosen. The selection "nucleic acid input" only offers substitution matrices of the type PAM / kappa = x. For nucleic acids, two extra strategies are available compared to proteins, namely X-INS-i and Q-INS-i.
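
The dependency between input type and advanced options can be summarised as a small consistency check. This is a sketch of the rules stated in the paragraph above, not code taken from the wrapper; the names MATRIX_FAMILIES, NUCLEIC_ONLY_STRATEGIES and check_options are hypothetical.

```python
# Substitution matrix families offered per input type, as described above.
MATRIX_FAMILIES = {
    "protein": ("BLOSUM", "JTT"),
    "nucleic acid": ("PAM",),
}

# Strategies that are offered for nucleic acid input only.
NUCLEIC_ONLY_STRATEGIES = {"Q-INS-i", "X-INS-i"}


def check_options(input_type, strategy, matrix_family):
    """Return a list of problems; an empty list means the combination
    is consistent with the rules described in this README."""
    if input_type not in MATRIX_FAMILIES:
        raise ValueError("unknown input type: %s" % input_type)
    problems = []
    if matrix_family not in MATRIX_FAMILIES[input_type]:
        problems.append("%s matrices are not offered for %s input"
                        % (matrix_family, input_type))
    if strategy in NUCLEIC_ONLY_STRATEGIES and input_type != "nucleic acid":
        problems.append("%s is only available for nucleic acid input"
                        % strategy)
    return problems


if __name__ == "__main__":
    # Reports two problems: PAM matrix and Q-INS-i with protein input.
    for problem in check_options("protein", "Q-INS-i", "PAM"):
        print(problem)
```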


Documentation
-------------

Mafft website http://mafft.cbrc.jp/alignment/software/
Manpages of Mafft at http://mafft.cbrc.jp/alignment/software/manual/manual.html

More information about the algorithms can be found at http://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html#GLE.


Author and affiliation
----------------------

Katrien Bernaerts and Domantas Motiejunas
corresponding author: gb-ctk-open-source-support@basf.com
21/06/2012

CropDesign N.V., a BASF Plant Science Company - Technologiepark 3, 9052 Zwijnaarde - Belgium


Terms of use
------------
Galaxy wrapper for Mafft – multiple alignment tool - Copyright (C) 2012 CropDesign N.V. - this software may be used, copied and redistributed, with or without modification freely, without advance permission, provided that the above Copyright statement is reproduced with each copy.
THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE (INCLUDING NEGLIGENCE OR OTHERWISE).


Citation
--------

- Katoh, Toh 2010 (Bioinformatics 26:1899-1900). Parallelization of the MAFFT multiple sequence alignment program. (describes the multithread version; Linux only)
- Katoh, Asimenos, Toh 2009 (Methods in Molecular Biology 537:39-64). Multiple Alignment of DNA Sequences with MAFFT. In Bioinformatics for DNA Sequence Analysis edited by D. Posada (outlines DNA alignment methods and several tips including group-to-group alignment and rough clustering of a large number of sequences)
- Katoh, Toh 2008 (BMC Bioinformatics 9:212). Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework. (describes RNA structural alignment methods)
- Katoh, Toh 2008 (Briefings in Bioinformatics 9:286-298). Recent developments in the MAFFT multiple sequence alignment program. (outlines version 6; Fast Breaking Paper in Thomson Reuters' ScienceWatch)
- Katoh, Toh 2007 (Bioinformatics 23:372-374) Errata. PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. (describes the PartTree algorithm)
- Katoh, Kuma, Toh, Miyata 2005 (Nucleic Acids Res. 33:511-518). MAFFT version 5: improvement in accuracy of multiple sequence alignment. (describes [ancestral versions of] the G-INS-i, L-INS-i and E-INS-i strategies)
- Katoh, Misawa, Kuma, Miyata 2002 (Nucleic Acids Res. 30:3059-3066). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. (describes the FFT-NS-1, FFT-NS-2 and FFT-NS-i strategies)