<tool id="MAF_To_Fasta1" name="MAF to FASTA" version="1.0.1">
  <description>Converts a MAF formatted file to FASTA format</description>
  <command interpreter="python">
    #if $fasta_target_type.fasta_type == "multiple" #maf_to_fasta_multiple_sets.py $input1 $out_file1 $fasta_target_type.species $fasta_target_type.complete_blocks
    #else #maf_to_fasta_concat.py $fasta_target_type.species $input1 $out_file1
    #end if#
  </command>
  <inputs>
    <param format="maf" name="input1" type="data" label="MAF file to convert"/>
    <conditional name="fasta_target_type">
      <param name="fasta_type" type="select" label="Type of FASTA Output">
        <option value="multiple" selected="true">Multiple Blocks</option>
        <option value="concatenated">One Sequence per Species</option>
      </param>
      <when value="multiple">
        <param name="species" type="select" label="Select species" display="checkboxes" multiple="true" help="checked taxa will be included in the output">
          <options>
            <filter type="data_meta" ref="input1" key="species" />
          </options>
        </param>
        <param name="complete_blocks" type="select" label="Choose to">
          <option value="partial_allowed">include blocks with missing species</option>
          <option value="partial_disallowed">exclude blocks with missing species</option>
        </param>
      </when>
      <when value="concatenated">
        <param name="species" type="select" label="Species to extract" display="checkboxes" multiple="true">
          <options>
            <filter type="data_meta" ref="input1" key="species" />
          </options>
        </param>
      </when>
    </conditional>
  </inputs>
  <outputs>
    <data format="fasta" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="3.maf" ftype="maf"/>
      <param name="fasta_type" value="concatenated"/>
      <param name="species" value="canFam1"/>
      <output name="out_file1" file="cf_maf2fasta_concat.dat" ftype="fasta"/>
    </test>
    <test>
      <param name="input1" value="4.maf" ftype="maf"/>
      <param name="fasta_type" value="multiple"/>
      <param name="species" value="hg17,panTro1,rheMac2,rn3,mm7,canFam2,bosTau2,dasNov1"/>
      <param name="complete_blocks" value="partial_allowed"/>
      <output name="out_file1" file="cf_maf2fasta_new.dat" ftype="fasta"/>
    </test>
  </tests>
  <help>

**Types of MAF to FASTA conversion**

* **Multiple Blocks** converts each MAF block into a separate FASTA block. For example, 6 MAF blocks are converted into 6 FASTA blocks.
* **One Sequence per Species** converts all MAF blocks into a single aggregated FASTA block containing one sequence per species. For example, 6 MAF blocks are converted and concatenated into a single FASTA block.

-------

**What it does**

This tool converts MAF blocks to FASTA format. Depending on the output type, it either concatenates them into a single FASTA block (one sequence per species) or outputs multiple FASTA blocks separated by empty lines.

The interface for this tool contains two pages (steps):

* **Step 1 of 2**. Choose a multiple alignment (MAF) dataset from your history to be converted to FASTA format.
* **Step 2 of 2**. Choose the type of output and the species from the alignment to be included in the output.

Multiple Block output has the following options:

* **Select species** - the tool reads the alignment provided during Step 1 and generates a list of the species contained within that alignment. Using the checkboxes, you can specify which taxa to include in the output (all species are selected by default).
* **Choose to include/exclude blocks with missing species** - if this option is set to **exclude blocks with missing species** and an alignment block lacks one or more of the species selected in the **Select species** menu, that block **will not** be included in the output (see **Example 2b** below). For example, if you want to extract human, mouse, and rat from a series of alignments and one of the blocks does not contain a mouse sequence, that block will not be converted to FASTA and will not be returned.

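The include/exclude rule can be illustrated with a minimal Python sketch. It is not the tool's own code. It assumes that each alignment block has already been parsed into a dictionary mapping a species name to its aligned row, and ``select_blocks`` is a hypothetical name. Only the option values ``partial_allowed`` and ``partial_disallowed`` come from this tool's form.

```python
# Hypothetical sketch, not the tool's implementation.
# Each block is assumed to be a dict: species name -> aligned row.


def select_blocks(blocks, species, complete_blocks):
    """Keep only the selected species in each block.

    With complete_blocks == "partial_disallowed", a block that lacks any
    selected species is dropped entirely.
    """
    wanted = set(species)
    selected = []
    for block in blocks:
        kept = {name: row for name, row in block.items() if name in wanted}
        if complete_blocks == "partial_disallowed" and set(kept) != wanted:
            continue  # at least one selected species is missing from this block
        if kept:  # assumption: blocks with none of the selected species are skipped
            selected.append(kept)
    return selected


blocks = [
    {"hg18": "GACAGG", "panTro2": "GACAGG", "mm8": "AGAAGG"},
    {"hg18": "ATGTGC", "panTro2": "ATGTGC"},
]
print(len(select_blocks(blocks, ["hg18", "mm8"], "partial_allowed")))     # 2
print(len(select_blocks(blocks, ["hg18", "mm8"], "partial_disallowed")))  # 1
```

In this toy input, the second block has no mm8 row. It is therefore kept, with only hg18, when partial blocks are allowed, and dropped when they are not.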

-----

**Example 1**:

With **One Sequence per Species** (concatenated) output and all species selected, the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20 56528685 75 + 62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10 89144112 69 - 94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2 173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24 46551822 67 + 50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

will be converted to (**note** that mm8 (mouse) and canFam2 (dog) are absent from the second block, so after concatenation their sequences are padded with gaps across the 37 columns of that block)::

  >canFam2
  CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C-------------------------------------
  >hg18
  GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  >mm8
  AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC--------------------------------------------
  >panTro2
  GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  >rheMac2
  GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

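The gap padding in Example 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of ``maf_to_fasta_concat.py``. Blocks are assumed to be parsed already into dictionaries mapping species to aligned rows, and the function names are hypothetical. Records are sorted by species name only because that matches the order shown in the example output.

```python
# Hypothetical sketch of the "One Sequence per Species" behavior.
# Each block is assumed to be a dict: species name -> aligned row (equal lengths).


def concatenate_blocks(blocks, species):
    """Return one gapped sequence per species, padding absent species with '-'."""
    parts = {name: [] for name in species}
    for block in blocks:
        if not block:
            continue  # nothing to add for an empty block
        width = len(next(iter(block.values())))  # every row in a block has this width
        for name in species:
            parts[name].append(block.get(name, "-" * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


def to_fasta(sequences):
    """Format sequences as FASTA records, ordered by species name."""
    lines = []
    for name in sorted(sequences):
        lines.append(">" + name)
        lines.append(sequences[name])
    return "\n".join(lines)


blocks = [{"hg18": "GACAGG", "mm8": "AGAA--"}, {"hg18": "ATG"}]
print(to_fasta(concatenate_blocks(blocks, ["mm8", "hg18"])))
```

For these two toy blocks, hg18 becomes GACAGGATG. mm8 becomes AGAA-----: its own two gaps, followed by three padding gaps for the block in which it is absent.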

------

**Example 2a**: Multiple Blocks approach, **include all species** and **include blocks with missing species**:

The following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20 56528685 75 + 62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10 89144112 69 - 94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2 173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24 46551822 67 + 50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

will be converted to::

  >hg18.chr20(+):56827368-56827443|hg18_0
  GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  >panTro2.chr20(+):56528685-56528760|panTro2_0
  GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  >rheMac2.chr10(-):89144112-89144181|rheMac2_0
  GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  >mm8.chr2(+):173910832-173910893|mm8_0
  AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  >canFam2.chr24(+):46551822-46551889|canFam2_0
  CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  >hg18.chr20(+):56827443-56827480|hg18_1
  ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  >panTro2.chr20(+):56528760-56528797|panTro2_1
  ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  >rheMac2.chr10(-):89144181-89144218|rheMac2_1
  ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

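Each header in Example 2a is built from fields of the MAF "s" line. It contains the source name, the strand, the start, and the start plus size as the end coordinate, followed by the species name and a numeric suffix. A minimal sketch of that construction follows. It assumes the suffix is the zero-based block index, as it is in this example, and ``fasta_header`` is a hypothetical name. As with rheMac2 above, coordinates on the "-" strand are reported as they appear in the MAF file, that is, relative to the reverse-complemented source sequence.

```python
def fasta_header(s_line, block_index):
    """Build an Example 2a style FASTA header from a MAF "s" line.

    Hypothetical helper; assumes the suffix is the zero-based block index.
    """
    # "s" line fields: s src start size strand srcSize text
    _, src, start, size, strand = s_line.split()[:5]
    species = src.split(".", 1)[0]  # "hg18.chr20" -> "hg18"
    end = int(start) + int(size)    # end = start + number of non-gap bases
    return f">{src}({strand}):{start}-{end}|{species}_{block_index}"


print(fasta_header("s hg18.chr20 56827368 75 + 62435964 GACAGG", 0))
# >hg18.chr20(+):56827368-56827443|hg18_0
```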

-----

**Example 2b**: Multiple Blocks approach, **include hg18 and mm8** and **exclude blocks with missing species**:

The following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20 56528685 75 + 62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10 89144112 69 - 94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2 173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24 46551822 67 + 50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

will be converted to the following. **Note** that the second MAF block, which does not contain mm8, is not included in the output. Also note that alignment columns consisting only of gaps in both hg18 and mm8 have been removed::

  >hg18.chr20(+):56827368-56827443|hg18_0
  GACAGGGTGCATCTGGGAGGGCCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC
  >mm8.chr2(+):173910832-173910893|mm8_0
  AGAAGGATCCACCT---------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC------

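In Example 2b, restricting the first block to hg18 and mm8 leaves several columns that are gaps in both rows. Those columns are absent from the output. A short Python sketch reproduces that effect on the two rows of the first block. It is an illustration only and not necessarily how the tool implements this step; ``drop_all_gap_columns`` is a hypothetical name.

```python
def drop_all_gap_columns(rows):
    """Remove alignment columns in which every row has a gap ('-')."""
    if not rows:
        return []
    width = len(rows[0])  # aligned rows share one width
    keep = [i for i in range(width) if any(row[i] != "-" for row in rows)]
    return ["".join(row[i] for i in keep) for row in rows]


# hg18 and mm8 rows of the first block in Example 2b
hg18 = "GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-"
mm8 = "AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------"
for row in drop_all_gap_columns([hg18, mm8]):
    print(row)
```

Running it prints the two sequences shown in the Example 2b output. Columns 22 to 24 and the final column are dropped because both rows have gaps there.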

------

.. class:: infomark

**About formats**

**MAF format** (Multiple Alignment Format) stores multiple alignments at the DNA level between entire genomes.

- The .maf format is line-oriented. Each multiple alignment ends with a blank line.
- Each sequence in an alignment is on a single line.
- Lines starting with # are considered to be comments.
- Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple alignment.
- Some MAF files may contain two optional line types:

  - An "i" line containing information about what is in the aligned species' DNA before and after the immediately preceding "s" line;
  - An "e" line containing information about the size of the gap between the alignments that span the current block.

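The line-oriented layout described above makes MAF straightforward to split into blocks. The following minimal Python sketch is not the parser the tool uses, and ``parse_maf`` is a hypothetical name. It keeps only the "s" lines of each block, keyed by the species part of the source name, and ignores comments and the optional "i" and "e" lines. For simplicity, it assumes that each species appears at most once per block.

```python
def parse_maf(text):
    """Split MAF text into blocks, each a dict: species name -> aligned row."""
    blocks = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            current = None  # a blank line ends the current alignment block
        elif fields[0] == "a":
            current = {}  # an "a" line starts a new block
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            # s src start size strand srcSize text
            species = fields[1].split(".", 1)[0]
            current[species] = fields[6]
        # "#" comment lines and "i"/"e" lines fall through and are ignored
    return blocks


maf = """##maf version=1
a score=10289.000000
s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s rheMac2.chr10 89144181 37 - 94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG
"""
for block in parse_maf(maf):
    print(sorted(block))  # ['hg18', 'rheMac2']
```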

------

**Citation**

If you use this tool, please cite `Blankenberg D, Taylor J, Nekrutenko A; The Galaxy Team. Making whole genome multiple alignments usable for biologists. Bioinformatics. 2011 Sep 1;27(17):2426-2428. <http://www.ncbi.nlm.nih.gov/pubmed/21775304>`_

  </help>
</tool>