view getalleleseq.xml @ 0:c542b3075f29 draft
Uploaded repo.tar.gz
author | boris |
---|---|
date | Mon, 03 Feb 2014 13:07:13 -0500 |
<tool id="getalleleseq" name="FASTA from allele counts" version="0.0.1" force_history_refresh="True">
  <description>Generate major and minor allele sequences from alleles table</description>
  <command interpreter="python">getalleleseq.py $alleles -l $seq_length -j $major_seq -d $__new_file_path__ -p $major_seq.id </command>
  <inputs>
    <param format="tabular" name="alleles" type="data" label="Table containing the major and minor allele base per position" help="Must be tabular and follow the Variant Annotator tool output format"/>
    <param name="seq_length" type="integer" value="16569" label="Background sequence length" help="e.g. 16569 for mitochondrial variants"/>
  </inputs>
  <outputs>
    <data format="fasta" name="major_seq"/>
  </outputs>
  <tests>
    <test>
      <param name="alleles" value="test-table-getalleleseq.tab"/>
      <param name="seq_length" value="16569"/>
      <output name="major_seq" file="test-major-allele-out-getalleleseq.fa"/>
    </test>
  </tests>
  <help>
The major allele sequence of a sample is simply the sequence consisting of the most frequent nucleotide per position. Replacing the major allele with the second most frequent allele at diploid positions generates the minor allele sequence.

-----

.. class:: infomark

**What it does**

It takes the table generated by the Variant Annotator tool and derives a major and a minor allele sequence per sample. Because all sequences share the same length, the major allele sequences of all samples are written to a single file (with a header per sample), which forms a multiple sequence alignment in FASTA format that can be used for downstream phylogenetic analyses. In contrast, the minor allele sequences are reported as separate FASTA files, one per sample, to ease their downstream manipulation.

-----

.. class:: warningmark

**Note**

Please follow the format described below for the input file.

-----

.. class:: infomark

**Formats**

**Variant Annotator tool output format**

Columns::

    1. sample id
    2. chromosome
    3. position
    4. counts for A's
    5. counts for C's
    6. counts for G's
    7. counts for T's
    8. Coverage
    9. Number of alleles passing frequency threshold
    10. Major allele
    11. Minor allele
    12. Minor allele frequency in position

**FASTA multiple alignment**

See http://www.bioperl.org/wiki/FASTA_multiple_alignment_format

-----

**Example**

- For the following dataset::

    S9   chrM  3  3    0    2  214  219  0  T  A  0.013698630137
    S9   chrM  4  3    249  3  0    255  0  C  N  0.0
    S9   chrM  5  245  1    1  0    247  1  A  N  0.0
    S11  chrM  6  0    292  0  0    292  1  C  .  0.0
    S7   chrM  6  0    254  0  0    254  1  C  .  0.0
    S9   chrM  6  2    306  2  0    310  0  C  N  0.0
    S11  chrM  7  281  0    3  0    284  0  A  G  0.0105633802817
    S7   chrM  7  249  0    2  0    251  1  A  G  0.00796812749004
    etc. for all covered positions per sample...

- Running this tool with background sequence length 16569 will produce 4 files::

    1. Multiple alignment FASTA file containing the major allele sequences of samples S7, S9 and S11
    2. minor allele sequence of sample S7
    3. minor allele sequence of sample S9
    4. minor allele sequence of sample S11

-----

**Citation**

If you use this tool, please cite Dickins B, Rebolledo-Jaramillo B, et al. *In preparation.* (boris-at-bx.psu.edu)

  </help>
</tool>
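
The derivation described in the help text can be sketched in Python as follows. This is a minimal illustration and not the code of `getalleleseq.py`, which is not shown on this page. The function name `allele_sequences`, the 1-based position convention, the use of `N` to fill positions without coverage, and the rule that only an `A`, `C`, `G` or `T` in column 11 replaces the major allele are all assumptions made for the example. A background length of 8 is used instead of 16569 to keep the result readable.

```python
from collections import defaultdict

BASES = {"A", "C", "G", "T"}


def allele_sequences(rows, seq_length, fill="N"):
    """Return {sample: (major_seq, minor_seq)} from 12-column records.

    rows: iterable of lists of strings in the Variant Annotator column
    order listed in the help text (sample id first, position third,
    major allele tenth, minor allele eleventh).
    Assumptions: positions are 1-based, and positions that never appear
    in the table are filled with `fill`.
    """
    major = defaultdict(lambda: [fill] * seq_length)
    minor_calls = defaultdict(dict)
    for row in rows:
        sample, pos = row[0], int(row[2])
        major_base, minor_base = row[9], row[10]
        if not 1 <= pos <= seq_length:
            continue  # assumption: ignore positions outside the background
        major[sample][pos - 1] = major_base
        # Assumption: 'N' or '.' in the minor column means "no minor allele".
        if minor_base in BASES:
            minor_calls[sample][pos - 1] = minor_base

    result = {}
    for sample, seq in major.items():
        minor_seq = list(seq)  # start from the major sequence
        for idx, base in minor_calls[sample].items():
            minor_seq[idx] = base  # swap in the minor allele
        result[sample] = ("".join(seq), "".join(minor_seq))
    return result


# Rows taken from the example dataset in the help text.
example_lines = [
    "S9 chrM 3 3 0 2 214 219 0 T A 0.013698630137",
    "S9 chrM 4 3 249 3 0 255 0 C N 0.0",
    "S9 chrM 5 245 1 1 0 247 1 A N 0.0",
    "S11 chrM 6 0 292 0 0 292 1 C . 0.0",
    "S7 chrM 6 0 254 0 0 254 1 C . 0.0",
    "S9 chrM 6 2 306 2 0 310 0 C N 0.0",
    "S11 chrM 7 281 0 3 0 284 0 A G 0.0105633802817",
    "S7 chrM 7 249 0 2 0 251 1 A G 0.00796812749004",
]
example_rows = [line.split() for line in example_lines]
seqs = allele_sequences(example_rows, seq_length=8)

# Major sequences go together (one alignment); minor ones stay per sample.
for sample in sorted(seqs):
    major_seq, minor_seq = seqs[sample]
    print(">%s major\n%s" % (sample, major_seq))
    print(">%s minor\n%s" % (sample, minor_seq))
```

Keeping the minor allele calls in a separate per-sample mapping lets the minor sequence be built as a copy of the major one with only the affected positions replaced, which mirrors the wording of the help text.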