comparison getalleleseq.xml @ 8:698ede7baba9 draft

Uploaded
author boris
date Tue, 18 Mar 2014 12:25:24 -0400
parents
children
7:654b9e711967 8:698ede7baba9
1 <tool id="getalleleseq" name="FASTA from allele counts" version="0.0.1" force_history_refresh="True">
2 <description>Generate major and minor allele sequences from alleles table</description>
3 <command interpreter="python">getalleleseq.py
4 $alleles
5 -l $seq_length
6 -j $major_seq
7 -p $major_seq.id
8 </command>
9 <inputs>
10 <param format="tabular" name="alleles" type="data" label="Table containing the major and minor allele base per position" help="Must be tabular and follow the Variant Annotator tool output format"/>
11 <param name="seq_length" type="integer" value="16569" label="Background sequence length" help="e.g. 16569 for mitochondrial variants"/>
12 </inputs>
13 <outputs>
14 <data format="fasta" name="major_seq"/>
15 </outputs>
16 <tests>
17 <test>
18 <param name="alleles" value="test-table-getalleleseq.tab"/>
19 <param name="seq_length" value="16569"/>
20 <output name="major_seq" file="test-major-allele-out-getalleleseq.fa"/>
21 </test>
22 </tests>
23
24 <help>
25
26
27 The major allele sequence of a sample is the sequence formed by the most frequent nucleotide at each position.
28 Replacing the major allele with the second most frequent allele at diploid positions generates the minor allele sequence.
29
30 -----
31
32 .. class:: infomark
33
34 **What it does**
35
36 It takes the table generated by the Variant Annotator tool and derives a major and a minor allele sequence per sample.
37 Since all sequences share the same length, all the major allele sequences are written to a single file (with a proper header per sample)
38 to create a multiple sequence alignment in FASTA format that can be used for downstream phylogenetic analyses.
39 In contrast, the minor allele sequences are reported as separate FASTA files, one per sample, to ease their downstream manipulation.
40
41 -----
42
43 .. class:: warningmark
44
45 **Note**
46
47 Please follow the format described below for the input file:
48
49 -----
50
51 .. class:: infomark
52
53 **Formats**
54
55 **Variant Annotator tool output format**
56
57 Columns::
58
59 1. sample id
60 2. chromosome
61 3. position
62 4. counts for A's
63 5. counts for C's
64 6. counts for G's
65 7. counts for T's
66 8. Coverage
67 9. Number of alleles passing frequency threshold
68 10. Major allele
69 11. Minor allele
70 12. Minor allele frequency in position
71
72
73 **FASTA multiple alignment**
74
75 See http://www.bioperl.org/wiki/FASTA_multiple_alignment_format
76
77 -----
78
79 **Example**
80
81 - For the following dataset::
82
83 S9 chrM 3 3 0 2 214 219 0 T A 0.013698630137
84 S9 chrM 4 3 249 3 0 255 0 C N 0.0
85 S9 chrM 5 245 1 1 0 247 1 A N 0.0
86 S11 chrM 6 0 292 0 0 292 1 C . 0.0
87 S7 chrM 6 0 254 0 0 254 1 C . 0.0
88 S9 chrM 6 2 306 2 0 310 0 C N 0.0
89 S11 chrM 7 281 0 3 0 284 0 A G 0.0105633802817
90 S7 chrM 7 249 0 2 0 251 1 A G 0.00796812749004
91 etc. for all covered positions per sample...
92
93 - Running this tool with background sequence length 16569 will produce 4 files::
94
95 1. Multiple alignment FASTA file containing the major allele sequences of samples S7, S9 and S11
96 2. minor allele sequence of sample S7
97 3. minor allele sequence of sample S9
98 4. minor allele sequence of sample S11
99
100 -----
101
102 **Citation**
103
104 If you use this tool, please cite Dickins B, Rebolledo-Jaramillo B, et al (2014). *Accepted in Biotechniques*
105 (boris-at-bx.psu.edu)
106
107 </help>
108 </tool>
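
The help text above describes how the major and minor allele sequences are derived from the 12-column table. The following is a minimal sketch of that idea, not the actual contents of getalleleseq.py, which are not shown in this changeset. The function names `build_sequences` and `to_fasta` are hypothetical. It assumes that positions are 1-based, that uncovered positions are filled with `N`, and that a "diploid" position is one where column 9 (number of alleles passing the frequency threshold) equals 2.

```python
from collections import defaultdict


def build_sequences(rows, seq_length, fill="N"):
    """Return ({sample: major_seq}, {sample: minor_seq}) from tab-separated rows.

    Hypothetical helper: column layout follows the help text (1 = sample id,
    3 = position, 9 = number of alleles, 10 = major allele, 11 = minor allele).
    """
    major = defaultdict(lambda: [fill] * seq_length)
    minor = defaultdict(lambda: [fill] * seq_length)
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        sample = fields[0]
        index = int(fields[2]) - 1  # assumption: positions are 1-based
        if not 0 <= index < seq_length:
            continue  # ignore positions outside the background sequence
        n_alleles, major_base, minor_base = int(fields[8]), fields[9], fields[10]
        major[sample][index] = major_base
        # Assumption: only positions with exactly two alleles are "diploid";
        # everywhere else the minor sequence keeps the major allele.
        minor[sample][index] = minor_base if n_alleles == 2 else major_base
    return (
        {name: "".join(bases) for name, bases in major.items()},
        {name: "".join(bases) for name, bases in minor.items()},
    )


def to_fasta(sequences):
    """Format {name: sequence} as FASTA text, one record per sample."""
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in sorted(sequences.items()))
```

Under these assumptions, `to_fasta(major)` would correspond to the single multiple-alignment file, while calling `to_fasta({name: seq})` once per entry of `minor` would correspond to the per-sample minor allele files.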