annotate getalleleseq/getalleleseq.xml @ 1:c16e440a53a0 draft
Changed xml to handle multiple output from current directory instead
author | boris |
---|---|
date | Tue, 18 Mar 2014 09:51:31 -0400 |
<tool id="getalleleseq" name="FASTA from allele counts" version="0.0.1" force_history_refresh="True">
  <description>Generate major and minor allele sequences from alleles table</description>
  <command interpreter="python">getalleleseq.py
    $alleles
    -l $seq_length
    -j $major_seq
    -p $major_seq.id
  </command>
  <inputs>
    <param format="tabular" name="alleles" type="data" label="Table containing the major and minor allele base per position" help="must be tabular and follow the Variant Annotator tool output format"/>
    <param name="seq_length" type="integer" value="16569" label="Background sequence length" help="e.g. 16569 for mitochondrial variants"/>
  </inputs>
  <outputs>
    <data format="fasta" name="major_seq"/>
  </outputs>
  <tests>
    <test>
      <param name="alleles" value="test-table-getalleleseq.tab"/>
      <param name="seq_length" value="16569"/>
      <output name="major_seq" file="test-major-allele-out-getalleleseq.fa"/>
    </test>
  </tests>

  <help>


The major allele sequence of a sample is simply the sequence consisting of the most frequent nucleotide per position.
Replacing the major allele with the second most frequent allele at diploid positions generates the minor allele sequence.

-----

.. class:: infomark

**What it does**

It takes the table generated by the Variant Annotator tool to derive a major and minor allele sequence per sample.
Since all sequences share the same length, all the major allele sequences are written to a single file (with proper headers per sample)
to create a multiple sequence alignment in FASTA format that can be used for downstream phylogenetic analyses.
In contrast, the minor allele sequences are reported as one FASTA file per sample to ease their downstream manipulation.

-----

.. class:: warningmark

**Note**

Please follow the format described below for the input file:

-----

.. class:: infomark

**Formats**

**Variant Annotator tool output format**

Columns::

    1. sample id
    2. chromosome
    3. position
    4. counts for A's
    5. counts for C's
    6. counts for G's
    7. counts for T's
    8. Coverage
    9. Number of alleles passing frequency threshold
    10. Major allele
    11. Minor allele
    12. Minor allele frequency in position


**FASTA multiple alignment**

See http://www.bioperl.org/wiki/FASTA_multiple_alignment_format

-----

**Example**

- For the following dataset::

    S9   chrM  3  3    0    2  214  219  0  T  A  0.013698630137
    S9   chrM  4  3    249  3  0    255  0  C  N  0.0
    S9   chrM  5  245  1    1  0    247  1  A  N  0.0
    S11  chrM  6  0    292  0  0    292  1  C  .  0.0
    S7   chrM  6  0    254  0  0    254  1  C  .  0.0
    S9   chrM  6  2    306  2  0    310  0  C  N  0.0
    S11  chrM  7  281  0    3  0    284  0  A  G  0.0105633802817
    S7   chrM  7  249  0    2  0    251  1  A  G  0.00796812749004
    etc. for all covered positions per sample...

- Running this tool with background sequence length 16569 will produce 4 files::

    1. Multiple alignment FASTA file containing the major allele sequences of samples S7, S9 and S11
    2. minor allele sequence of sample S7
    3. minor allele sequence of sample S9
    4. minor allele sequence of sample S11

-----

**Citation**

If you use this tool, please cite Dickins B, Rebolledo-Jaramillo B, et al (2014). *Accepted in BioTechniques*
(boris-at-bx.psu.edu)

  </help>
</tool>
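
The help text in the tool definition describes how major and minor allele sequences are derived from the Variant Annotator table. The Python sketch below illustrates that idea only; it is not the code of getalleleseq.py, which is not shown on this page. The function name `allele_sequences`, the `fill` character `N` for positions with no row in the table, and the rule of substituting the minor allele only when column 11 holds A, C, G or T are all assumptions made for the example.

```python
from collections import defaultdict

BASES = {"A", "C", "G", "T"}  # assumption: only these count as a real minor allele


def allele_sequences(rows, seq_length, fill="N"):
    """Return {sample: (major_seq, minor_seq)} from whitespace-separated rows.

    Hypothetical helper: column positions follow the 12-column layout
    listed in the help text (1 = sample id, 3 = position,
    10 = major allele, 11 = minor allele).
    """
    major = defaultdict(lambda: [fill] * seq_length)
    minor_calls = defaultdict(dict)

    for row in rows:
        fields = row.split()
        if len(fields) < 12:
            continue  # skip lines that do not have all 12 columns
        sample, pos = fields[0], int(fields[2])
        major_base, minor_base = fields[9], fields[10]
        if not 1 <= pos <= seq_length:
            continue  # position falls outside the background sequence
        major[sample][pos - 1] = major_base
        if minor_base in BASES:
            minor_calls[sample][pos - 1] = minor_base

    result = {}
    for sample, seq in major.items():
        minor_seq = list(seq)  # start from the major allele sequence
        for idx, base in minor_calls[sample].items():
            minor_seq[idx] = base  # swap in the second most frequent allele
        result[sample] = ("".join(seq), "".join(minor_seq))
    return result


# Three rows taken from the example dataset in the help text; a short
# background length of 8 is used here only to keep the output readable.
example_rows = [
    "S9 chrM 3 3 0 2 214 219 0 T A 0.013698630137",
    "S9 chrM 4 3 249 3 0 255 0 C N 0.0",
    "S7 chrM 7 249 0 2 0 251 1 A G 0.00796812749004",
]
seqs = allele_sequences(example_rows, seq_length=8)
for sample in sorted(seqs):
    major_seq, minor_seq = seqs[sample]
    print(">%s_major\n%s" % (sample, major_seq))
    print(">%s_minor\n%s" % (sample, minor_seq))
```

Because every sequence is padded to the same background length, the major allele sequences of all samples can be concatenated into one FASTA alignment, while each minor allele sequence can be written to its own file, as the help text describes.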