Mercurial > repos > miller-lab > genome_diversity

<tool id="gd_coverage_distributions" name="Coverage Distributions" version="1.0.0">
  <description>: Examine sequence coverage for SNPs</description>

  <command interpreter="python">
    #import json
    #import base64
    #import zlib
    #set $ind_names = $input.dataset.metadata.individual_names
    #set $ind_colms = $input.dataset.metadata.individual_columns
    #set $ind_dict = dict(zip($ind_names, $ind_colms))
    #set $ind_json = json.dumps($ind_dict, separators=(',',':'))
    #set $ind_comp = zlib.compress($ind_json, 9)
    #set $ind_arg = base64.b64encode($ind_comp)
    coverage_distributions.py '$input' '0' '$output' '$output.files_path' '$ind_arg'
    #if $individuals.choice == '0'
      'all_individuals'
    #else if $individuals.choice == '1'
      #set $arg = 'individuals:%s' % str($individuals.p1_input)
        '$arg'
    #else if $individuals.choice == '2'
      #for $population in $individuals.populations
        #set $arg = 'population:%s:%s' % (str($population.p_input), str($population.p_input.name))
        '$arg'
      #end for
    #end if
  </command>

  <inputs>
    <param name="input" type="data" format="gd_snp" label="SNP dataset" />

    <conditional name="individuals">
      <param name="choice" type="select" label="Compute for">
        <option value="0" selected="true">All individuals</option>
        <option value="1">Individuals in a population</option>
        <option value="2">Totals of populations</option>
      </param>
      <when value="0" />
      <when value="1">
        <param name="p1_input" type="data" format="gd_indivs" label="Population individuals" />
      </when>
      <when value="2">
        <repeat name="populations" title="Population" min="1">
          <param name="p_input" type="data" format="gd_indivs" label="individuals" />
        </repeat>
      </when>
    </conditional>

    <!--
    <param name="data_source" type="select" label="Data source">
      <option value="0" selected="true">Sequence coverage</option>
      <option value="1">Genotype quality</option>
    </param>
    -->
  </inputs>

  <outputs>
    <data name="output" format="html" />
  </outputs>

  <requirements>
    <requirement type="package" version="0.1">gd_c_tools</requirement>
  </requirements>

  <tests>
    <test>
      <param name="input" value="test_in/sample.gd_snp" ftype="gd_snp" />
      <param name="choice" value="0" />
      <output name="output" file="test_out/coverage_distributions/coverage.html" ftype="html" compare="diff" lines_diff="2">
        <extra_files type="file" name="coverage.pdf" value="test_out/coverage_distributions/coverage.pdf" compare="sim_size" delta = "1000"/>
        <extra_files type="file" name="coverage.txt" value="test_out/coverage_distributions/coverage.txt" />
      </output>
    </test>
  </tests>

  <help>

**Dataset formats**

The input dataset is in gd_snp_ format.
The output is a composite dataset, containing both a text table and a PDF plot.
(`Dataset missing?`_)

.. _gd_snp: ./static/formatHelp.html#gd_snp
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

This tool reports distributions of a SNP reliability indicator, in this case
sequence coverage, for individuals or populations.
The coverage can be computed for all individuals, a subset of individuals,
or totals for populations defined by the Specify Individuals tool.
The results are reported as a text table giving the cumulative distributions,
and as a plot.

-----

**Example**

- input::

    chr1  14929  A  G  999    21  30  1  127   7  11   1  28   7  29   0   5   2  5   1  17  10  14  1  81   17  74  1   42  15  22  1  125   29  84  1   88   6  10  1  11  30  23  1  79  19  1  2  71  24  0   2   99  41  10   2    2
    chr1  17451  C  T  6.88  119   1  2  255  12   0   2  63  35   0   2  59  14  0   2  72  19   1  2  57  101   1  2  255  38   8  1   20  125   0  2  255  13   0  2  62  42   0  2  51  44  0  2  64  26  0   2  108  59   0   2  194
    chr1  30922  G  T  999     0  23  0   66   0   0  -1   0   0   0  -1   0   0  0  -1   0   0   2  0   3    0  14  0   39  14  16  1  153    0  45  0  132   6   0  2  48  19   0  2  87   3  0  2  32   0  0  -1   0    0   0  -1    0
    etc.

- text output::

                0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
     John West  0  0  0  0  0  0  0  0  1  1  1  1  2  2  3  3  4  4  5  6
       NA12892  0  2  5 11 20 31 43 55 67 77 84 90 93 96 97 98 99 99 99 99
       NA12891  0  0  0  0  0  1  1  2  3  5  6  9 11 15 19 23 29 35 41 47
       NA12249  1  4 11 23 38 54 68 79 88 93 96 98 99 99 99 99 99 99 99 99
       NA12342  0  0  1  1  2  4  6  9 13 18 23 29 36 43 50 58 65 71 77 82
           KB1  0  0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  1  1  2  2
           ABT  0  0  0  0  0  0  1  1  1  2  3  4  5  6  8 10 12 14 18 21
       NA18507  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1
       NA19238  0  0  0  1  2  4  6 10 14 19 25 32 39 47 55 62 69 76 81 86
       NA19239  0  0  0  0  1  1  2  4  5  8 11 15 19 24 31 37 44 51 58 65
            YH  2  4  6  7  8  8  9 10 11 12 14 17 19 22 25 29 32 36 40 45
        KOREAN  0  0  1  1  3  4  5  7 10 12 15 19 22 27 31 37 42 48 54 60
           JPT  0  0  0  0  0  0  0  0  1  1  1  2  2  3  4  5  7  8 10 12
           etc.

graphical output:

.. image:: $PATH_TO_IMAGES/gd_coverage.png

  </help>
</tool>
author	Richard Burhans <burhans@bx.psu.edu>
date	Wed, 20 Nov 2013 16:32:01 -0500
parents	a631c2f6d913
children