Mercurial > repos > boris > getalleleseq

<tool id="getalleleseq" name="FASTA from allele counts" version="0.0.1" force_history_refresh="True">
  <description>Generate major and minor allele sequences from alleles table</description>
  <command interpreter="python">getalleleseq.py
                                   $alleles
                                -l $seq_length
                                -j $major_seq
                                -p $major_seq.id
</command>
  <inputs>
    <param format="tabular" name="alleles" type="data" label="Table containing major and minor alleles base per position" help="must be tabular and follow the Variant Annotator tool output format"/>
    <param name="seq_length" type="integer" value="16569" label="Background sequence length" help="e.g. 16569 for mitochondrial variants"/>
  </inputs>
  <outputs>
    <data format="fasta" name="major_seq"/>
  </outputs>
  <tests>
    <test>
      <param name="alleles" value="test-table-getalleleseq.tab"/>
      <param name="seq_length" value="16569"/>
      <output name="major_seq" file="test-major-allele-out-getalleleseq.fa"/>
    </test>
  </tests>

  <help>


The major allele sequence of a sample is simply the sequence consisting of the most frequent nucleotide per position.
Replacing the major allele for the second most frequent allele at diploid positions generates the minor allele sequence.

-----

.. class:: infomark

**What it does**

It takes the table generated from the Variant Annotator tool to derive a major and minor allele sequence per sample.
Since all sequences share the same length all the major allele sequences are included into a single file (with proper headers per sample)
to create a multiple sequence alignment in FASTA format that can be used for downstream phylogenetic analyses.
In contrast, the minor allele sequences are informed as single FASTA files per sample to ease their downstream manipulation.

-----

.. class:: warningmark

**Note**

Please, follow the format described below for the input file:

-----

.. class:: infomark

**Formats**

**Variant Annotator tool output format**

Columns::

    1.  sample id
    2.  chromosome
    3.  position
    4   counts for A's
    5.  counts for C's
    6.  counts for G's
    7.  counts for T's
    8.  Coverage
    9.  Number of alleles passing frequency threshold
    10. Major allele
    11. Minor allele
    12. Minor allele frequency in position


**FASTA multiple alignment**

See http://www.bioperl.org/wiki/FASTA_multiple_alignment_format

-----

**Example**

- For the following dataset::

    S9	chrM	3	3	0	2	214	219	0	T	A	0.013698630137
    S9	chrM	4	3	249	3	0	255	0	C	N	0.0
    S9	chrM	5	245	1	1	0	247	1	A	N	0.0
    S11	chrM	6	0	292	0	0	292	1	C	.	0.0
    S7	chrM	6	0	254	0	0	254	1	C	.	0.0
    S9	chrM	6	2	306	2	0	310	0	C	N	0.0
    S11	chrM	7	281	0	3	0	284	0	A	G	0.0105633802817
    S7	chrM	7	249	0	2	0	251	1	A	G	0.00796812749004
    etc. for all covered positions per sample...

- Running this tool with background sequence length 16569 will produce 4 files::

    1. Multiple alignment FASTA file containing the major allele sequences of samples S7, S9 and S11
    2. minor allele sequence of sample S7
    3. minor allele sequence of sample S9
    4. minor allele sequence of sample S11

-----

**Citation**

If you use this tool, please cite Dickins B, Rebolledo-Jaramillo B, et al (2014). *Acccepted in Biotechniques*
(boris-at-bx.psu.edu)

  </help>
</tool>
author	boris
date	Tue, 18 Mar 2014 12:26:05 -0400
parents	698ede7baba9
children