annotate getalleleseq/getalleleseq.xml @ 1:c16e440a53a0 draft

Changed xml to handle multiple output from current directory instead
author boris
date Tue, 18 Mar 2014 09:51:31 -0400
<tool id="getalleleseq" name="FASTA from allele counts" version="0.0.1" force_history_refresh="True">
  <description>Generate major and minor allele sequences from alleles table</description>
  <command interpreter="python">getalleleseq.py
    $alleles
    -l $seq_length
    -j $major_seq
    -p $major_seq.id
  </command>
  <inputs>
    <param format="tabular" name="alleles" type="data" label="Table containing the major and minor allele base per position" help="Must be tabular and follow the Variant Annotator tool output format"/>
    <param name="seq_length" type="integer" value="16569" label="Background sequence length" help="e.g. 16569 for mitochondrial variants"/>
  </inputs>
  <outputs>
    <data format="fasta" name="major_seq"/>
  </outputs>
  <tests>
    <test>
      <param name="alleles" value="test-table-getalleleseq.tab"/>
      <param name="seq_length" value="16569"/>
      <output name="major_seq" file="test-major-allele-out-getalleleseq.fa"/>
    </test>
  </tests>

  <help>
The major allele sequence of a sample is the sequence consisting of the most frequent nucleotide at each position.
Replacing the major allele with the second most frequent allele at diploid positions generates the minor allele sequence.

-----

.. class:: infomark

**What it does**

This tool takes the table generated by the Variant Annotator tool and derives a major and a minor allele sequence per sample.
Since all sequences share the same length, all the major allele sequences are written to a single file (with a header per sample)
to create a multiple sequence alignment in FASTA format that can be used for downstream phylogenetic analyses.
In contrast, the minor allele sequences are reported as one FASTA file per sample to ease their downstream manipulation.
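
The definitions of major and minor allele given above can be illustrated with a minimal Python sketch. It is not the code of getalleleseq.py: the function name ``major_minor`` is hypothetical, ties are broken alphabetically as an assumption, and the frequency threshold applied by the Variant Annotator tool is ignored here.

```python
def major_minor(counts):
    """Return (major, minor) bases from a dict mapping base to read count.

    The minor base is None when no second base was observed.
    """
    # Rank bases by descending count; alphabetical tie-break is an assumption.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    major = ranked[0][0]
    # The second-ranked base only counts as a minor allele if it was seen at all.
    minor = ranked[1][0] if ranked[1][1] > 0 else None
    return major, minor


# Counts taken from the first row of the example dataset (A=3, C=0, G=2, T=214).
example = {"A": 3, "C": 0, "G": 2, "T": 214}
print(major_minor(example))
```

For these counts the sketch picks T as the major allele and A as the minor allele, which agrees with columns 10 and 11 of that row.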

-----

.. class:: warningmark

**Note**

Please follow the format described below for the input file.

-----

.. class:: infomark

**Formats**

**Variant Annotator tool output format**

Columns::

    1. sample id
    2. chromosome
    3. position
    4. counts for A's
    5. counts for C's
    6. counts for G's
    7. counts for T's
    8. Coverage
    9. Number of alleles passing frequency threshold
    10. Major allele
    11. Minor allele
    12. Minor allele frequency in position
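
As an illustration of the twelve-column layout listed above, the following Python sketch parses one row into named fields. The field names and the ``parse_row`` helper are assumptions made for this example; they are not defined by the tool or by the Variant Annotator.

```python
from collections import namedtuple

# Field names are illustrative only; they follow the column order listed above.
AlleleRow = namedtuple(
    "AlleleRow",
    "sample chrom pos a c g t coverage n_alleles major minor minor_freq",
)


def parse_row(line):
    """Parse one row of the alleles table into an AlleleRow."""
    # split() with no argument accepts tabs as well as runs of spaces.
    fields = line.split()
    if len(fields) != 12:
        raise ValueError("expected 12 columns, got %d" % len(fields))
    return AlleleRow(
        fields[0], fields[1], int(fields[2]),
        int(fields[3]), int(fields[4]), int(fields[5]), int(fields[6]),
        int(fields[7]), int(fields[8]),
        fields[9], fields[10], float(fields[11]),
    )


row = parse_row("S9 chrM 3 3 0 2 214 219 0 T A 0.013698630137")
print(row.sample, row.pos, row.major, row.minor)
```

In this row the four base counts add up to the coverage column, which can serve as a quick sanity check on an input file.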


**FASTA multiple alignment**

See http://www.bioperl.org/wiki/FASTA_multiple_alignment_format

-----

**Example**

- For the following dataset::

    S9 chrM 3 3 0 2 214 219 0 T A 0.013698630137
    S9 chrM 4 3 249 3 0 255 0 C N 0.0
    S9 chrM 5 245 1 1 0 247 1 A N 0.0
    S11 chrM 6 0 292 0 0 292 1 C . 0.0
    S7 chrM 6 0 254 0 0 254 1 C . 0.0
    S9 chrM 6 2 306 2 0 310 0 C N 0.0
    S11 chrM 7 281 0 3 0 284 0 A G 0.0105633802817
    S7 chrM 7 249 0 2 0 251 1 A G 0.00796812749004
    etc. for all covered positions per sample...

- Running this tool with background sequence length 16569 will produce 4 files::

    1. Multiple alignment FASTA file containing the major allele sequences of samples S7, S9 and S11
    2. Minor allele sequence of sample S7
    3. Minor allele sequence of sample S9
    4. Minor allele sequence of sample S11
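
How one combined major allele file and one minor allele sequence per sample could be assembled is sketched below in Python. This is a rough illustration under several assumptions, not the behaviour of getalleleseq.py: positions are taken to be 1-based, uncovered positions are filled with ``N``, and a position is treated as diploid when column 9 equals 2. The helper names and the toy rows are hypothetical.

```python
BASES = {"A", "C", "G", "T"}


def build_sequences(rows, seq_length, fill="N"):
    """Return (major, minor) dicts mapping sample id to a sequence string.

    rows: iterable of (sample, pos, major_base, minor_base, n_alleles),
    with 1-based positions (an assumption).
    """
    major, minor = {}, {}
    for sample, pos, major_base, minor_base, n_alleles in rows:
        if sample not in major:
            # Uncovered positions keep the fill character (an assumption).
            major[sample] = [fill] * seq_length
            minor[sample] = [fill] * seq_length
        major[sample][pos - 1] = major_base
        # Substitute the minor base only where two alleles were called.
        use_minor = n_alleles == 2 and minor_base in BASES
        minor[sample][pos - 1] = minor_base if use_minor else major_base
    major = {s: "".join(chars) for s, chars in major.items()}
    minor = {s: "".join(chars) for s, chars in minor.items()}
    return major, minor


def to_fasta(seqs):
    """Format a dict of sample id to sequence as FASTA text, one record per sample."""
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in seqs.items())


# Toy rows invented for this sketch; they are not taken from a real dataset.
toy_rows = [("S1", 1, "A", "G", 2), ("S1", 2, "C", "N", 1), ("S2", 3, "T", ".", 1)]
major, minor = build_sequences(toy_rows, seq_length=4)
print(to_fasta(major), end="")      # all samples in one alignment
for sample, seq in minor.items():   # one record per sample
    print(sample, seq)
```

Because every sequence is padded to the same background length, the major allele records line up column by column, which is what allows them to be read as a multiple alignment.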

-----

**Citation**

If you use this tool, please cite Dickins B, Rebolledo-Jaramillo B, et al. (2014). *Accepted in Biotechniques*
(boris-at-bx.psu.edu)

  </help>
</tool>