<tool id="getalleleseq" name="FASTA from allele counts" version="0.0.1" force_history_refresh="True">
  <description>Generate major and minor allele sequences from alleles table</description>
  <command interpreter="python">getalleleseq.py
    $alleles
    -l $seq_length
    -j $major_seq
    -d $__new_file_path__
    -p $major_seq.id
  </command>
  <inputs>
    <param format="tabular" name="alleles" type="data" label="Table containing major and minor alleles base per position" help="must be tabular and follow the Variant Annotator tool output format"/>
    <param name="seq_length" type="integer" value="16569" label="Background sequence length" help="e.g. 16569 for mitochondrial variants"/>
  </inputs>
  <outputs>
    <data format="fasta" name="major_seq"/>
  </outputs>
  <tests>
    <test>
      <param name="alleles" value="test-table-getalleleseq.tab"/>
      <param name="seq_length" value="16569"/>
      <output name="major_seq" file="test-major-allele-out-getalleleseq.fa"/>
    </test>
  </tests>

  <help>


The major allele sequence of a sample is simply the sequence consisting of the most frequent nucleotide per position.
Replacing the major allele with the second most frequent allele at diploid positions generates the minor allele sequence.
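
As an illustration of the definitions above, the following minimal Python sketch ranks the four base counts of one position to pick the most frequent and the second most frequent nucleotide. The function name major_and_minor is hypothetical, and the sketch ignores any frequency threshold; it is not the code of getalleleseq.py.

```python
def major_and_minor(counts):
    """Return the most and second most frequent bases for one position.

    counts maps each base to its read count, e.g. {"A": 3, "C": 0, ...}.
    """
    # Sort by descending count; ties are broken alphabetically,
    # which is an arbitrary choice made for this sketch.
    ranked = sorted(counts, key=lambda base: (-counts[base], base))
    return ranked[0], ranked[1]


major, minor = major_and_minor({"A": 3, "C": 0, "G": 2, "T": 214})
print(major, minor)
```

With the counts of the first row of the example dataset further down (3 A, 0 C, 2 G, 214 T), this prints T A, which matches the major and minor allele columns of that row.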

-----

.. class:: infomark

**What it does**

It takes the table generated by the Variant Annotator tool and derives a major and a minor allele sequence per sample.
Since all sequences share the same length, all the major allele sequences are written to a single file (with a proper header per sample)
to create a multiple sequence alignment in FASTA format that can be used for downstream phylogenetic analyses.
In contrast, the minor allele sequences are reported as separate FASTA files, one per sample, to ease their downstream manipulation.

-----

.. class:: warningmark

**Note**

Please follow the format described below for the input file:

-----

.. class:: infomark

**Formats**

**Variant Annotator tool output format**

Columns::

    1. sample id
    2. chromosome
    3. position
    4. counts for A's
    5. counts for C's
    6. counts for G's
    7. counts for T's
    8. Coverage
    9. Number of alleles passing frequency threshold
    10. Major allele
    11. Minor allele
    12. Minor allele frequency in position
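
The twelve columns above map naturally onto a small record type. The sketch below parses tab-separated rows and places each major allele at its position in a background sequence. The names AlleleRow, parse_row and major_sequences are hypothetical, and both the 1-based positions and the N padding of uncovered positions are assumptions of this sketch, not documented behaviour of the tool.

```python
from collections import namedtuple

# Field names are illustrative; they follow the twelve columns listed above.
AlleleRow = namedtuple(
    "AlleleRow",
    "sample chrom pos a c g t coverage n_alleles major minor minor_freq",
)


def parse_row(line):
    """Split one tab-separated table line into a typed AlleleRow."""
    f = line.rstrip("\n").split("\t")
    return AlleleRow(
        f[0], f[1], int(f[2]),
        int(f[3]), int(f[4]), int(f[5]), int(f[6]),
        int(f[7]), int(f[8]), f[9], f[10], float(f[11]),
    )


def major_sequences(lines, seq_length, fill="N"):
    """Build one major allele sequence per sample.

    Assumes 1-based positions and pads uncovered positions with fill;
    both are assumptions of this sketch.
    """
    seqs = {}
    for line in lines:
        row = parse_row(line)
        seq = seqs.setdefault(row.sample, [fill] * seq_length)
        seq[row.pos - 1] = row.major
    return {sample: "".join(seq) for sample, seq in seqs.items()}


rows = [
    "S9\tchrM\t3\t3\t0\t2\t214\t219\t0\tT\tA\t0.013698630137",
    "S9\tchrM\t4\t3\t249\t3\t0\t255\t0\tC\tN\t0.0",
    "S7\tchrM\t6\t0\t254\t0\t0\t254\t1\tC\t.\t0.0",
]
# A background length of 8 keeps the illustration short.
for sample, seq in sorted(major_sequences(rows, 8).items()):
    print(">" + sample)
    print(seq)
```

Because every sample is padded to the same background length, the printed records already line up as a multiple sequence alignment in FASTA format.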


**FASTA multiple alignment**

See http://www.bioperl.org/wiki/FASTA_multiple_alignment_format

-----

**Example**

- For the following dataset::

    S9   chrM  3  3    0    2  214  219  0  T  A  0.013698630137
    S9   chrM  4  3    249  3  0    255  0  C  N  0.0
    S9   chrM  5  245  1    1  0    247  1  A  N  0.0
    S11  chrM  6  0    292  0  0    292  1  C  .  0.0
    S7   chrM  6  0    254  0  0    254  1  C  .  0.0
    S9   chrM  6  2    306  2  0    310  0  C  N  0.0
    S11  chrM  7  281  0    3  0    284  0  A  G  0.0105633802817
    S7   chrM  7  249  0    2  0    251  1  A  G  0.00796812749004
    etc. for all covered positions per sample...

- Running this tool with background sequence length 16569 will produce 4 files::

    1. Multiple alignment FASTA file containing the major allele sequences of samples S7, S9 and S11
    2. Minor allele sequence of sample S7
    3. Minor allele sequence of sample S9
    4. Minor allele sequence of sample S11
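
The per-sample minor allele files listed above can be pictured as the major allele sequence with some positions substituted. The sketch below is one possible reading, not the tool's implementation: it assumes that a "diploid position" is one where exactly two alleles pass the frequency threshold (column 9) and that the minor allele column holds a plain base. The function name minor_sequence and the input values are made up for illustration.

```python
def minor_sequence(major_seq, calls):
    """Return major_seq with minor alleles substituted at diploid positions.

    calls is an iterable of (pos, n_alleles, minor) tuples with 1-based pos.
    Treating n_alleles == 2 as "diploid" is an assumption of this sketch.
    """
    bases = {"A", "C", "G", "T"}
    seq = list(major_seq)
    for pos, n_alleles, minor in calls:
        if n_alleles == 2 and minor in bases:
            seq[pos - 1] = minor
    return "".join(seq)


# Hypothetical calls on a shortened 8-base background.
calls = [(3, 2, "A"), (4, 1, "N"), (6, 1, ".")]
print(minor_sequence("NNTCACNN", calls))
```

Only position 3 meets both conditions here, so the result differs from the major allele sequence at that single base.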

-----

**Citation**

If you use this tool, please cite Dickins B, Rebolledo-Jaramillo B, et al. *In preparation.*
(boris-at-bx.psu.edu)

  </help>
</tool>