view tools/evolution/add_scores.xml @ 0:9071e359b9a3

Uploaded
author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500
parents
children
line wrap: on
line source

<tool id="hgv_add_scores" name="phyloP" version="1.0.0">
  <description>interspecies conservation scores</description>

  <command>
    add_scores $input1 ${input1.metadata.dbkey} ${input1.metadata.chromCol} ${input1.metadata.startCol} ${GALAXY_DATA_INDEX_DIR}/add_scores.loc $out_file1
  </command>

  <inputs>
    <param format="interval" name="input1" type="data" label="Dataset">
      <validator type="unspecified_build"/>
      <validator type="dataset_metadata_in_file" filename="add_scores.loc" metadata_name="dbkey" metadata_column="0" message="Data is currently not available for the specified build."/>
    </param>
  </inputs>

  <outputs>
    <data format="input" name="out_file1" />
  </outputs>

  <requirements>
    <requirement type="package">add_scores</requirement>
  </requirements>

  <tests>
    <test>
      <param name="input1" value="add_scores_input1.interval" ftype="interval" dbkey="hg18" />
      <output name="output" file="add_scores_output1.interval" />
    </test>
    <test>
      <param name="input1" value="add_scores_input2.bed" ftype="interval" dbkey="hg18" />
      <output name="output" file="add_scores_output2.interval" />
    </test>
  </tests>

  <help>
.. class:: warningmark

This currently works only for build hg18.

-----

**Dataset formats**

The input can be any interval_ format dataset.  The output is also in interval format.
(`Dataset missing?`_)

.. _interval: ./static/formatHelp.html#interval
.. _Dataset missing?: ./static/formatHelp.html

-----

**What it does**

This tool adds a column that measures interspecies conservation at each SNP 
position, using conservation scores for primates pre-computed by the 
phyloP program.  PhyloP performs an exact P-value computation under a 
continuous Markov substitution model. 

The chromosome and start position
are used to look up the scores, so if a larger interval is in the input,
only the score for the first nucleotide is returned.

-----

**Example**

- input file, with SNPs::

    chr22  16440426  14440427  C/T
    chr22  15494851  14494852  A/G
    chr22  14494911  14494912  A/T
    chr22  14550435  14550436  A/G
    chr22  14611956  14611957  G/T
    chr22  14612076  14612077  A/G
    chr22  14668537  14668538  C
    chr22  14668703  14668704  A/T
    chr22  14668775  14668776  G
    chr22  14680074  14680075  A/T
    etc.

- output file, showing conservation scores for primates::

    chr22  16440426  14440427  C/T  0.509
    chr22  15494851  14494852  A/G  0.427
    chr22  14494911  14494912  A/T  NA
    chr22  14550435  14550436  A/G  NA
    chr22  14611956  14611957  G/T  -2.142
    chr22  14612076  14612077  A/G  0.369
    chr22  14668537  14668538  C    0.419
    chr22  14668703  14668704  A/T  -1.462
    chr22  14668775  14668776  G    0.470
    chr22  14680074  14680075  A/T  0.303
    etc.

  "NA" means that the phyloP score was not available.

-----

**Reference**

Siepel A, Pollard KS, Haussler D. (2006)
New methods for detecting lineage-specific selection.
In Proceedings of the 10th International Conference on Research in Computational
Molecular Biology (RECOMB 2006), pp. 190-205.

  </help>
</tool>