Mercurial > repos > boris > getalleleseq
diff getalleleseq.xml @ 0:c542b3075f29 draft
Uploaded repo.tar.gz
author | boris |
---|---|
date | Mon, 03 Feb 2014 13:07:13 -0500 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/getalleleseq.xml Mon Feb 03 13:07:13 2014 -0500 @@ -0,0 +1,109 @@ +<tool id="getalleleseq" name="FASTA from allele counts" version="0.0.1" force_history_refresh="True"> + <description>Generate major and minor allele sequences from alleles table</description> + <command interpreter="python">getalleleseq.py + $alleles + -l $seq_length + -j $major_seq + -d $__new_file_path__ + -p $major_seq.id +</command> + <inputs> + <param format="tabular" name="alleles" type="data" label="Table containing major and minor alleles base per position" help="must be tabular and follow the Variant Annotator tool output format"/> + <param name="seq_length" type="integer" value="16569" label="Background sequence length" help="e.g. 16569 for mitochondrial variants"/> + </inputs> + <outputs> + <data format="fasta" name="major_seq"/> + </outputs> + <tests> + <test> + <param name="alleles" value="test-table-getalleleseq.tab"/> + <param name="seq_length" value="16569"/> + <output name="major_seq" file="test-major-allele-out-getalleleseq.fa"/> + </test> + </tests> + + <help> + + +The major allele sequence of a sample is simply the sequence consisting of the most frequent nucleotide per position. +Replacing the major allele for the second most frequent allele at diploid positions generates the minor allele sequence. + +----- + +.. class:: infomark + +**What it does** + +It takes the table generated from the Variant Annotator tool to derive a major and minor allele sequence per sample. +Since all sequences share the same length all the major allele sequences are included into a single file (with proper headers per sample) +to create a multiple sequence alignment in FASTA format that can be used for downstream phylogenetic analyses. +In contrast, the minor allele sequences are informed as single FASTA files per sample to ease their downstream manipulation. + +----- + +.. class:: warningmark + +**Note** + +Please, follow the format described below for the input file: + +----- + +.. class:: infomark + +**Formats** + +**Variant Annotator tool output format** + +Columns:: + + 1. sample id + 2. chromosome + 3. position + 4 counts for A's + 5. counts for C's + 6. counts for G's + 7. counts for T's + 8. Coverage + 9. Number of alleles passing frequency threshold + 10. Major allele + 11. Minor allele + 12. Minor allele frequency in position + + +**FASTA multiple alignment** + +See http://www.bioperl.org/wiki/FASTA_multiple_alignment_format + +----- + +**Example** + +- For the following dataset:: + + S9 chrM 3 3 0 2 214 219 0 T A 0.013698630137 + S9 chrM 4 3 249 3 0 255 0 C N 0.0 + S9 chrM 5 245 1 1 0 247 1 A N 0.0 + S11 chrM 6 0 292 0 0 292 1 C . 0.0 + S7 chrM 6 0 254 0 0 254 1 C . 0.0 + S9 chrM 6 2 306 2 0 310 0 C N 0.0 + S11 chrM 7 281 0 3 0 284 0 A G 0.0105633802817 + S7 chrM 7 249 0 2 0 251 1 A G 0.00796812749004 + etc. for all covered positions per sample... + +- Running this tool with background sequence length 16569 will produce 4 files:: + + 1. Multiple alignment FASTA file containing the major allele sequences of samples S7, S9 and S11 + 2. minor allele sequence of sample S7 + 3. minor allele sequence of sample S9 + 4. minor allele sequence of sample S11 + +----- + +**Citation** + +If you use this tool, please cite Dickins B, Rebolledo-Jaramillo B, et al. *In preparation.* +(boris-at-bx.psu.edu) + + </help> +</tool> \ No newline at end of file