annotate tools/maf/maf_to_bed.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500
<tool id="MAF_To_BED1" name="Maf to BED" force_history_refresh="True">
  <description>Converts a MAF formatted file to the BED format</description>
  <command interpreter="python">maf_to_bed.py $input1 $out_file1 $species $complete_blocks $__new_file_path__</command>
  <inputs>
    <param format="maf" name="input1" type="data" label="MAF file to convert"/>
    <param name="species" type="select" label="Select species" display="checkboxes" multiple="true" help="a separate history item will be created for each checked species">
      <options>
        <filter type="data_meta" ref="input1" key="species" />
      </options>
    </param>
    <param name="complete_blocks" type="select" label="Exclude blocks which have a requested species missing">
      <option value="partial_allowed">include blocks with missing species</option>
      <option value="partial_disallowed">exclude blocks with missing species</option>
    </param>
  </inputs>
  <outputs>
    <data format="bed" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name="input1" value="4.maf"/>
      <param name="species" value="hg17"/>
      <param name="complete_blocks" value="partial_disallowed"/>
      <output name="out_file1" file="cf_maf_to_bed.dat"/>
    </test>
  </tests>
  <help>

**What it does**

This tool converts every MAF block into an interval line (in BED format; scroll down for a description of the MAF and BED formats) describing the position of that alignment block within the corresponding genome.

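A minimal Python sketch of the per-block conversion, for illustration only; it is not the code in maf_to_bed.py. It assumes each "s" line splits on whitespace into the source name, start, size, strand, source size and aligned text, and that the source name has the form species.chrom. The helper name s_line_to_bed and the fixed score of 0 are assumptions. The start coordinate is copied as written, which matches the forward-strand rows in the examples; "-" strand rows would need a coordinate conversion that this sketch does not attempt.

```python
def s_line_to_bed(s_line, index):
    """Convert one MAF "s" line into a 6-column BED line (hypothetical helper)."""
    # Fields: "s", src, start, size, strand, srcSize, aligned text
    _, src, start, size, strand, _src_size, _text = s_line.split()
    species, chrom = src.split(".", 1)  # assumes "species.chrom" naming
    start, size = int(start), int(size)
    # size counts aligned bases, so start + size is the half-open BED end
    name = "%s_%d" % (species, index)
    return "\t".join([chrom, str(start), str(start + size), name, "0", strand])


row = "s hg18.chr20 56827368 75 + 62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-"
print(s_line_to_bed(row, 0))  # chr20 56827368 56827443 hg18_0 0 + (tab-separated)
```

The index argument stands in for whatever counter the real tool uses to number field 4.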
The interface for this tool contains two pages (steps):

* **Step 1 of 2**. Choose a multiple alignment (MAF) dataset from your history to convert to BED format.
* **Step 2 of 2**. Choose the species from the alignment to include in the output and specify how to handle alignment blocks that lack one or more of them:

  * **Choose species** - the tool reads the alignment provided in Step 1 and generates a list of the species it contains. Using the checkboxes, you can specify which taxa to include in the output (only the reference genome, shown in **bold**, is selected by default). If you select more than one species, a separate history item is created for each one.
  * **Choose to include/exclude blocks with missing species** - if an alignment block lacks any of the species you selected in the **Choose species** menu and this option is set to **exclude blocks with missing species**, then the coordinates of that block **will not** be included in the output (see **Example 2** below).

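The two Step 2 settings can be sketched in Python as follows. This is a hypothetical illustration, not the tool's implementation: each block is represented as a dictionary mapping a species name to (chrom, start, size, strand), the option strings are the partial_allowed and partial_disallowed values of the complete_blocks parameter, and names are assumed to be numbered by the count of lines already written for each species. One list of BED lines is produced per selected species, mirroring the one-history-item-per-species behavior.

```python
def blocks_to_bed(blocks, species_list, complete_blocks):
    """Return {species: [BED lines]}, one output per selected species (sketch)."""
    out = {sp: [] for sp in species_list}
    for block in blocks:
        missing = [sp for sp in species_list if sp not in block]
        if complete_blocks == "partial_disallowed" and missing:
            continue  # drop blocks that lack any requested species
        for sp in species_list:
            if sp in missing:
                continue  # partial_allowed: only the absent species is skipped
            chrom, start, size, strand = block[sp]
            name = "%s_%d" % (sp, len(out[sp]))  # assumed numbering scheme
            out[sp].append("\t".join([chrom, str(start), str(start + size), name, "0", strand]))
    return out


# The two example blocks, reduced to their hg18 and mm8 rows
example_blocks = [
    {"hg18": ("chr20", 56827368, 75, "+"), "mm8": ("chr2", 173910832, 61, "+")},
    {"hg18": ("chr20", 56827443, 37, "+")},
]
for sp, lines in blocks_to_bed(example_blocks, ["hg18", "mm8"], "partial_disallowed").items():
    print(sp, lines)
```

With partial_disallowed, each species keeps only the first block, because the second block has no mm8 row; with partial_allowed, hg18 would also get a line for the second block.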
-----

**Example 1**: **Include only the reference genome** (hg18 in this case) and **include blocks with missing species**:

For the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20     56827443 37 +  62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20  56528760 37 +  62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10  89144181 37 -  94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

the tool will create **a single** history item containing the following (**note** that a name, numbered sequentially as hg18_0, hg18_1, etc., is added to the output as field 4)::

  chr20 56827368 56827443 hg18_0 0 +
  chr20 56827443 56827480 hg18_1 0 +

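Field 3 in the output above is the MAF start plus the MAF size field. Because size counts only the non-gap characters of a row, it can be smaller than the width of the alignment. A quick Python check against the hg18 row of the first block (the helper name ungapped_length is illustrative):

```python
def ungapped_length(text):
    """Number of aligned bases in a MAF text field, i.e. its non-dash characters."""
    return len(text) - text.count("-")


hg18_text = "GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-"
start = 56827368
size = ungapped_length(hg18_text)
print(len(hg18_text), size, start + size)  # 80 75 56827443
```

The row is 80 columns wide, but only 75 of them are bases, which is why the first interval ends at 56827443 rather than 56827448.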
-----

**Example 2**: **Include hg18 and mm8** and **exclude blocks with missing species**:

For the following alignment::

  ##maf version=1
  a score=68686.000000
  s hg18.chr20     56827368 75 +  62435964 GACAGGGTGCATCTGGGAGGG---CCTGCCGGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s panTro2.chr20  56528685 75 +  62293572 GACAGGGTGCATCTGAGAGGG---CCTGCCAGGCCTTTA-TTCAACACTAGATACGCCCCATCTCCAATTCTAATGGAC-
  s rheMac2.chr10  89144112 69 -  94855758 GACAGGGTGCATCTGAGAGGG---CCTGCTGGGCCTTTG-TTCAAAACTAGATATGCCCCAACTCCAATTCTA-------
  s mm8.chr2      173910832 61 + 181976762 AGAAGGATCCACCT------------TGCTGGGCCTCTGCTCCAGCAAGACCCACCTCCCAACTCAAATGCCC-------
  s canFam2.chr24  46551822 67 +  50763139 CG------GCGTCTGTAAGGGGCCACCGCCCGGCCTGTG-CTCAAAGCTACAAATGACTCAACTCCCAACCGA------C

  a score=10289.000000
  s hg18.chr20     56827443 37 +  62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s panTro2.chr20  56528760 37 +  62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
  s rheMac2.chr10  89144181 37 -  94855758 ATGTGCGGAAAATGTGATACAGAAACCTGCAGAGCAG

the tool will create **two** history items (one for hg18 and one for mm8) containing the following (**note** that each history item contains only one line, describing the first alignment block; the second MAF block is excluded from the output because it does not contain mm8):

History item **1** (for hg18)::

  chr20 56827368 56827443 hg18_0 0 +

History item **2** (for mm8)::

  chr2 173910832 173910893 mm8_0 0 +

-------

.. class:: infomark

**About formats**

**MAF format** (Multiple Alignment Format) stores multiple alignments at the DNA level between entire genomes.

- The .maf format is line-oriented. Each multiple alignment ends with a blank line.
- Each sequence in an alignment is on a single line.
- Lines starting with # are considered to be comments.
- Each multiple alignment is in a separate paragraph that begins with an "a" line and contains an "s" line for each sequence in the multiple alignment.
- Some MAF files may contain two optional line types:

  - An "i" line containing information about what is in the aligned species' DNA before and after the immediately preceding "s" line;
  - An "e" line containing information about the size of the gap between the alignments that span the current block.

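The line rules above are enough for a minimal Python reader. This is a hypothetical sketch (read_maf_blocks is an illustrative name) that keeps only the "s" rows of each block and ignores "i", "e" and other optional lines; for real data, a maintained parser such as the maf module in bx-python is the safer choice.

```python
def read_maf_blocks(lines):
    """Group MAF lines into blocks of parsed "s" rows (minimal sketch)."""
    blocks, current = [], None
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            continue  # comment or header line such as "##maf version=1"
        if not line:
            if current is not None:
                blocks.append(current)  # a blank line ends the block
                current = None
            continue
        fields = line.split()
        if fields[0] == "a":
            if current is not None:
                blocks.append(current)
            current = []  # an "a" line starts a new alignment block
        elif fields[0] == "s" and current is not None:
            src, start, size, strand, src_size, text = fields[1:7]
            current.append((src, int(start), int(size), strand, int(src_size), text))
        # "i", "e" and other optional line types are ignored here
    if current is not None:
        blocks.append(current)  # the file may end without a blank line
    return blocks


example = """##maf version=1
a score=10289.000000
s hg18.chr20 56827443 37 + 62435964 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
s panTro2.chr20 56528760 37 + 62293572 ATGTGCAGAAAATGTGATACAGAAACCTGCAGAGCAG
"""
blocks = read_maf_blocks(example.splitlines())
print(len(blocks), [row[0] for row in blocks[0]])  # 1 ['hg18.chr20', 'panTro2.chr20']
```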
**BED format** (Browser Extensible Data) was designed at UCSC for displaying data tracks in the Genome Browser. It has three required fields and a number of optional ones.

The first three BED fields (required) are::

    1. chrom - The name of the chromosome (e.g. chr1, chrY_random).
    2. chromStart - The starting position in the chromosome. (The first base in a chromosome is numbered 0.)
    3. chromEnd - The ending position in the chromosome, plus 1 (i.e., a half-open interval).

Additional (optional) fields are::

    4. name - The name of the BED line.
    5. score - A score between 0 and 1000.
    6. strand - Defines the strand - either '+' or '-'.

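Because chromEnd is one past the last base, an interval's length is simply chromEnd minus chromStart, and consecutive blocks can share a boundary without overlapping (in Example 1 the first interval ends at 56827443 and the second starts there). The Python helpers below are illustrative names, not part of this tool; the 1-based conversion reflects the fully closed positions the UCSC browser displays.

```python
def bed_length(chrom_start, chrom_end):
    """Length of a half-open [chromStart, chromEnd) interval."""
    return chrom_end - chrom_start


def to_one_based_closed(chrom_start, chrom_end):
    """Convert 0-based half-open BED coordinates to 1-based, fully closed ones."""
    return chrom_start + 1, chrom_end


print(bed_length(56827368, 56827443))           # 75
print(to_one_based_closed(56827368, 56827443))  # (56827369, 56827443)
# Adjacent half-open intervals share a boundary but no bases
print(bed_length(56827368, 56827443) + bed_length(56827443, 56827480))  # 112
```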
------

**Citation**

If you use this tool, please cite `Blankenberg D, Taylor J, Nekrutenko A; The Galaxy Team. Making whole genome multiple alignments usable for biologists. Bioinformatics. 2011 Sep 1;27(17):2426-2428. &lt;http://www.ncbi.nlm.nih.gov/pubmed/21775304&gt;`_


  </help>
  <code file="maf_to_bed_code.py"/>
</tool>