annotate getalleleseq.xml @ 0:c542b3075f29 draft

Uploaded repo.tar.gz
author boris
date Mon, 03 Feb 2014 13:07:13 -0500
<tool id="getalleleseq" name="FASTA from allele counts" version="0.0.1" force_history_refresh="True">
    <description>Generate major and minor allele sequences from alleles table</description>
    <command interpreter="python">getalleleseq.py
        $alleles
        -l $seq_length
        -j $major_seq
        -d $__new_file_path__
        -p $major_seq.id
    </command>
    <inputs>
        <param format="tabular" name="alleles" type="data" label="Table containing major and minor allele bases per position" help="Must be tabular and follow the Variant Annotator tool output format"/>
        <param name="seq_length" type="integer" value="16569" label="Background sequence length" help="e.g. 16569 for mitochondrial variants"/>
    </inputs>
    <outputs>
        <data format="fasta" name="major_seq"/>
    </outputs>
    <tests>
        <test>
            <param name="alleles" value="test-table-getalleleseq.tab"/>
            <param name="seq_length" value="16569"/>
            <output name="major_seq" file="test-major-allele-out-getalleleseq.fa"/>
        </test>
    </tests>

    <help>

The major allele sequence of a sample is the sequence formed by the most frequent nucleotide at each position.
Replacing the major allele with the second most frequent allele at diploid positions (positions carrying two alleles) generates the minor allele sequence.
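
The rule above can be sketched in Python for a single sample. This is an illustration, not the code of getalleleseq.py. The following are all assumptions of this sketch: the helper names ``major_minor`` and ``build_sequences``, the dictionary input, filling uncovered positions with N, passing the diploid positions in as a set, and breaking count ties in A, C, G, T order::

    BASES = "ACGT"


    def major_minor(counts):
        """Return (major, minor) bases for one position.

        counts maps A, C, G, T to read counts. The minor base is the second
        most frequent one, or None if no second base was observed.
        Ties are broken in A, C, G, T order (an assumption of this sketch).
        """
        # sorted() is stable, so equal counts keep the A, C, G, T order.
        ranked = sorted(BASES, key=lambda b: -counts.get(b, 0))
        major = ranked[0]
        minor = ranked[1] if counts.get(ranked[1], 0) > 0 else None
        return major, minor


    def build_sequences(positions, seq_length, diploid):
        """Build the major and minor allele sequences of one sample.

        positions: {1-based position: {base: count}}
        diploid: set of positions treated as carrying two alleles (assumed input)
        Positions missing from the table are filled with 'N' (hypothetical choice).
        """
        major_seq = ["N"] * seq_length
        minor_seq = ["N"] * seq_length
        for pos, counts in positions.items():
            major, minor = major_minor(counts)
            major_seq[pos - 1] = major
            # The minor sequence copies the major base except at diploid positions.
            minor_seq[pos - 1] = minor if (pos in diploid and minor) else major
        return "".join(major_seq), "".join(minor_seq)

For example, take the hypothetical counts ``{1: {"A": 90, "G": 10}, 2: {"C": 100}, 4: {"T": 60, "C": 40}}`` with a length of 5 and diploid positions ``{1, 4}``. For this input, ``build_sequences`` returns ``("ACNTN", "GCNCN")``.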

-----

.. class:: infomark

**What it does**

This tool takes the table generated by the Variant Annotator tool and derives a major and a minor allele sequence for each sample.
Because all sequences share the same length, the major allele sequences of all samples are written to a single file (with one header per sample),
forming a multiple sequence alignment in FASTA format that can be used for downstream phylogenetic analyses.
In contrast, each sample's minor allele sequence is written to its own FASTA file to ease downstream manipulation.
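
A minimal Python sketch of the output layout described above follows. The function names, the 60-column line wrapping, the FASTA headers and the SAMPLE_minor.fasta file naming are hypothetical. How getalleleseq.py actually names its files (it receives a directory through -d and an output id through -p) is not documented here::

    import os
    import tempfile


    def fasta_record(name, seq, width=60):
        """Format one FASTA record, wrapping the sequence every `width` bases."""
        lines = [">" + name]
        lines += [seq[i:i + width] for i in range(0, len(seq), width)]
        return "\n".join(lines) + "\n"


    def write_outputs(sequences, major_path, minor_dir):
        """Write every major sequence to one multi-FASTA file and each minor
        sequence to its own FASTA file.

        sequences: {sample_id: (major_seq, minor_seq)}
        Returns the paths of the per-sample minor files.
        """
        # A multiple alignment requires all major sequences to share one length.
        if len({len(major) for major, _ in sequences.values()}) > 1:
            raise ValueError("major allele sequences differ in length")
        with open(major_path, "w") as handle:
            for sample in sorted(sequences):
                handle.write(fasta_record(sample, sequences[sample][0]))
        minor_paths = []
        for sample in sorted(sequences):
            # Hypothetical naming scheme: SAMPLE_minor.fasta inside minor_dir.
            path = os.path.join(minor_dir, sample + "_minor.fasta")
            with open(path, "w") as handle:
                handle.write(fasta_record(sample + "_minor", sequences[sample][1]))
            minor_paths.append(path)
        return minor_paths


    if __name__ == "__main__":
        demo = {"S7": ("ACGT", "ACAT"), "S9": ("ACGA", "ACGA")}  # toy sequences
        with tempfile.TemporaryDirectory() as tmp:
            print(write_outputs(demo, os.path.join(tmp, "major.fasta"), tmp))

The sketch checks that all major sequences share one length before writing anything. That check is what makes the combined file usable as an alignment.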

-----

.. class:: warningmark

**Note**

Please follow the input file format described below.

-----

.. class:: infomark

**Formats**

**Variant Annotator tool output format**

Columns::

    1. Sample ID
    2. Chromosome
    3. Position
    4. Counts for A's
    5. Counts for C's
    6. Counts for G's
    7. Counts for T's
    8. Coverage
    9. Number of alleles passing frequency threshold
    10. Major allele
    11. Minor allele
    12. Minor allele frequency at the position
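
One row of this table could be read in Python as sketched below. Galaxy's tabular format is tab-separated. The field names of the ``AlleleRow`` type and the ``parse_row`` helper are hypothetical and illustrative, not taken from the tool::

    from typing import NamedTuple


    class AlleleRow(NamedTuple):
        """One row of the Variant Annotator table (field names are illustrative)."""
        sample: str        # 1. sample ID
        chrom: str         # 2. chromosome
        pos: int           # 3. position
        a: int             # 4. counts for A's
        c: int             # 5. counts for C's
        g: int             # 6. counts for G's
        t: int             # 7. counts for T's
        coverage: int      # 8. coverage
        n_alleles: int     # 9. alleles passing the frequency threshold
        major: str         # 10. major allele
        minor: str         # 11. minor allele
        minor_freq: float  # 12. minor allele frequency


    def parse_row(line):
        """Parse one tab-separated line of the table into an AlleleRow."""
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 12:
            raise ValueError("expected 12 columns, got %d" % len(fields))
        ints = [int(x) for x in fields[2:9]]  # columns 3 to 9 hold integers
        return AlleleRow(fields[0], fields[1], *ints,
                         fields[9], fields[10], float(fields[11]))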

**FASTA multiple alignment**

See http://www.bioperl.org/wiki/FASTA_multiple_alignment_format
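
A FASTA multiple alignment is a FASTA file whose sequences all have the same length. The hypothetical helpers below read such a file and check that property. They are only a minimal sketch; for real work, a library reader such as Biopython's ``Bio.AlignIO.read(handle, "fasta")`` is the more robust choice::

    def read_fasta(text):
        """Parse FASTA text into a {header: sequence} dict (insertion-ordered).

        A repeated header replaces the earlier record with the same name.
        """
        records = {}
        name = None
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                name = line[1:].strip()
                records[name] = []
            elif name is None:
                raise ValueError("sequence data found before the first header")
            else:
                records[name].append(line)
        return {header: "".join(parts) for header, parts in records.items()}


    def is_alignment(records):
        """Return True if there is at least one sequence and all share a length."""
        return len({len(seq) for seq in records.values()}) == 1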

-----

**Example**

- For the following dataset::

    S9 chrM 3 3 0 2 214 219 0 T A 0.013698630137
    S9 chrM 4 3 249 3 0 255 0 C N 0.0
    S9 chrM 5 245 1 1 0 247 1 A N 0.0
    S11 chrM 6 0 292 0 0 292 1 C . 0.0
    S7 chrM 6 0 254 0 0 254 1 C . 0.0
    S9 chrM 6 2 306 2 0 310 0 C N 0.0
    S11 chrM 7 281 0 3 0 284 0 A G 0.0105633802817
    S7 chrM 7 249 0 2 0 251 1 A G 0.00796812749004
    etc. for all covered positions per sample...

- Running this tool with background sequence length 16569 will produce 4 files::

    1. Multiple alignment FASTA file containing the major allele sequences of samples S7, S9 and S11
    2. Minor allele sequence of sample S7
    3. Minor allele sequence of sample S9
    4. Minor allele sequence of sample S11
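
The example rows can be checked by hand. Their column 12 values agree with the minor allele count divided by the coverage (for S9 at position 3, 3 / 219 = 0.013698...), with 0.0 where no minor base is given. This relation is inferred from the rows above rather than documented. The Python sketch below recomputes that column and the expected number of output files. Its helper names are hypothetical, and it splits columns on whitespace as the rows are displayed::

    # The example rows above, split on whitespace as they are displayed.
    EXAMPLE_ROWS = [
        "S9 chrM 3 3 0 2 214 219 0 T A 0.013698630137",
        "S9 chrM 4 3 249 3 0 255 0 C N 0.0",
        "S9 chrM 5 245 1 1 0 247 1 A N 0.0",
        "S11 chrM 6 0 292 0 0 292 1 C . 0.0",
        "S7 chrM 6 0 254 0 0 254 1 C . 0.0",
        "S9 chrM 6 2 306 2 0 310 0 C N 0.0",
        "S11 chrM 7 281 0 3 0 284 0 A G 0.0105633802817",
        "S7 chrM 7 249 0 2 0 251 1 A G 0.00796812749004",
    ]

    # 0-based indices of the A, C, G and T count columns (columns 4 to 7).
    COUNT_INDEX = {"A": 3, "C": 4, "G": 5, "T": 6}


    def minor_frequency(fields):
        """Recompute column 12 as minor allele count / coverage.

        Minor allele codes that are not a base (N or .) yield 0.0.
        """
        minor = fields[10]
        if minor not in COUNT_INDEX:
            return 0.0
        return int(fields[COUNT_INDEX[minor]]) / int(fields[7])


    def expected_file_count(rows):
        """One major allele alignment file plus one minor allele file per sample."""
        return 1 + len({row.split()[0] for row in rows})


    if __name__ == "__main__":
        for row in EXAMPLE_ROWS:
            fields = row.split()
            print(fields[0], fields[2], round(minor_frequency(fields), 6), fields[11])
        print(expected_file_count(EXAMPLE_ROWS))

With three samples (S7, S9 and S11), ``expected_file_count`` gives 1 + 3 = 4, matching the four files listed above.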

-----

**Citation**

If you use this tool, please cite Dickins B, Rebolledo-Jaramillo B, et al. *In preparation.*
(boris-at-bx.psu.edu)

    </help>
</tool>