view dividePgSnpAlleles.xml @ 3:edf12470a1a6 default tip

Bugfix from Belinda, in vcf2pgSnp.pl
author Cathy Riemer <cathy+hg@bx.psu.edu>
date Thu, 19 Mar 2015 12:06:34 -0400
parents 35c20b109be5

<tool id="dividePgSnp" name="Separate pgSnp Alleles" version="1.1.0" hidden="false">
  <description>: Split allele info into separate columns</description>
  <command interpreter="perl">
    #if $refcol.ref == "yes" #dividePgSnpAlleles.pl -ref=$refcol.ref_column $input1 > $out_file1
    #else #dividePgSnpAlleles.pl $input1 > $out_file1
    #end if
  </command>
  <inputs>
    <param format="interval" name="input1" type="data" label="pgSnp dataset" />
    <conditional name="refcol">
      <param name="ref" type="select" label="Dataset has a column with the reference allele">
        <option value="yes">yes</option>
        <option value="no" selected="true">no</option>
      </param>
      <when value="yes">
        <param name="ref_column" type="data_column" data_ref="input1" label="Column with reference allele" />
      </when>
      <when value="no"> <!-- do nothing -->
      </when>
    </conditional>
  </inputs>
  <outputs>
    <data format="interval" name="out_file1" />
  </outputs>
  <tests>
    <test>
      <param name='input1' value='dividePgSnp_input.pgSnp' ftype='interval' />
      <param name='ref' value='no' />
      <output name="out_file1" file="dividePgSnp_output.txt" />
    </test>
  </tests>

  <help>

**Dataset formats**

The input dataset is of Galaxy datatype interval_, with the additional columns
required for pgSnp_ format.
Any further columns beyond those defined for pgSnp will be appended to the output.
The output dataset is in interval_ format.  (`Dataset missing?`_)

.. _interval: ./static/formatHelp.html#interval
.. _pgSnp: ./static/formatHelp.html#pgSnp
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

This tool splits the alleles of a pgSnp dataset into separate columns,
along with the frequency and score that go with each allele.  Positions with
more than two alleles are skipped.  If only a single allele is given, "N"
is used for the second allele, with a frequency and score of zero.  If a
column with reference alleles is provided, the value in that column is
used in place of the "N".
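
As a rough illustration of the rule described above, the following Python
sketch processes one tab-separated pgSnp line.  It is written from this
description only, not from dividePgSnpAlleles.pl; the function name
``divide_pgsnp_line`` and its 1-based ``ref_column`` argument are hypothetical,
and the Perl script may differ in details such as error handling.

```python
def divide_pgsnp_line(line, ref_column=None):
    """Split one pgSnp line into per-allele columns.

    Returns the output line, or None when the position has more than
    two alleles and should be skipped.  ref_column is an assumed 1-based
    index of a column holding the reference allele.
    """
    fields = line.rstrip("\n").split("\t")
    # Standard pgSnp columns: chrom, start, end, alleles, count, freqs, scores
    chrom, start, end, name, _count, freqs, scores = fields[:7]
    extra = fields[7:]  # any further columns are appended unchanged

    alleles = name.split("/")
    freqs = freqs.split(",")
    scores = scores.split(",")

    if len(alleles) not in (1, 2):
        return None  # skip positions with more than two alleles

    if len(alleles) == 1:
        # Use the reference allele if a column was given, otherwise "N"
        filler = fields[ref_column - 1] if ref_column else "N"
        alleles.append(filler)
        freqs.append("0")
        scores.append("0")

    out = [chrom, start, end]
    for allele, freq, score in zip(alleles, freqs, scores):
        out += [allele, freq, score]
    return "\t".join(out + extra)
```

Calling the function on each line of the example input below, with
``ref_column`` left unset, should give lines matching the example output.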

-----

**Example**

- input pgSnp file::

   chr1    256     257     A/C     2       3,4     10,20
   chr1    56100   56101   A       1       5       30
   chr1    77052   77053   A/G     2       6,7     40,50
   chr1    110904  110905  A       1       8       60
   etc.

- output::

   chr1    256     257     A       3       10      C       4       20
   chr1    56100   56101   A       5       30      N       0       0
   chr1    77052   77053   A       6       40      G       7       50
   chr1    110904  110905  A       8       60      N       0       0
   etc.

</help>
</tool>