
<tool id="prep_data" name="PreProcess Expression Data" version="1.3">
  <description>Combines gene expression data</description>
  <command interpreter="perl">PreProcess_Expression_Data.pl -i $data -c $compslist -a $annot -o $output -p $thresh -l $log -v n</command>
  <inputs>
    <param format="txt" name="data" type="data" label="Expression data"/>
    <param format="txt" name="compslist" type="data" label="Comparison list"/>
    <param format="txt" name="annot" type="data" label="Annotation file"/>
    <param name="thresh" type="float" min="0" max="1" label="Percentile threshold" value="0.63" optional="true"/>
    <param name="log" type="select" label="Log transform data?">
        <option value="n" selected="true">No</option>
        <option value="y">Yes</option>
    </param>
  </inputs>
  <outputs>
    <data format="txt" name="output"/>
  </outputs>

  <tests>
    <test>
      <param name="data" value="EMBER/expression.txt"/>
      <param name="compslist" value="EMBER/comparisons_list.txt"/>
      <param name="annot" value="EMBER/annotation.txt"/>
      <param name="thresh" value="0.63"/>
      <param name="log" value="n"/>
      <output name="output" file="EMBER/expression_profiles.txt"/>
    </test>
  </tests>

  <help>

This tool discretizes the gene expression data and adds genomic annotations.

More options for the EMBER tools (especially for the main program, EMBER, including searching for multiple expression patterns) are provided by the command line version, which can be downloaded from http://dinner-group.uchicago.edu/downloads.html. That package also includes test data and sample outputs.

When using any of the EMBER tools, please cite: M Maienschein-Cline, J Zhou, KP White, R Sciammas, and AR Dinner. Discovering transcription factor regulatory targets using gene expression and binding data. *Bioinformatics*, 28:206-213 (2012).

-----

Description of inputs:

*Expression Data*:

   Microarray data, with data from N experiments (and at least 2 replicates per condition).

   *Format (N+1 columns)*: [ID] [expt 1 value] [expt 2 value] ... [expt N value]

   IMPORTANT: the first line should be a title line, first field "#ID", and subsequent fields giving the condition/replicate for each column, i.e.,

      #ID [condition]#[replicate]...

   where [condition] matches a name used in the Comparison List, and [replicate] is the replicate number of that column within its condition. [condition] and [replicate] are delimited by a "#", so do not use that character in a condition name.
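
As an illustration of the title line format described above, here is a minimal Python sketch that groups column indices by condition. It is not part of the EMBER code: the function name ``parse_header`` is hypothetical, and it assumes that fields are separated by whitespace, which the description above does not specify.

```python
from collections import defaultdict


def parse_header(line):
    """Map each condition name to the column indices of its replicates."""
    fields = line.split()  # assumption: whitespace-delimited fields
    if not fields or fields[0] != "#ID":
        raise ValueError('title line must start with "#ID"')
    columns = defaultdict(list)
    for index, field in enumerate(fields[1:], start=1):
        # Each field must be exactly [condition]#[replicate].
        parts = field.split("#")
        if len(parts) != 2 or not all(parts):
            raise ValueError("expected [condition]#[replicate], got " + repr(field))
        condition = parts[0]
        columns[condition].append(index)
    return dict(columns)


if __name__ == "__main__":
    # Two conditions with two replicates each; column 0 holds the probe ID.
    print(parse_header("#ID wt#1 wt#2 ko#1 ko#2"))
    # prints {'wt': [1, 2], 'ko': [3, 4]}
```

A condition name containing "#" yields more than two parts and is rejected, which is why that character should be avoided in condition names.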

*Comparison List*:

   List of behavior dimension definitions, one pair of conditions per line. Each [condition] should match a condition name used in the title line of the expression data.

   *Format (2 columns)*: [condition1] [condition2]

*Annotation File*:

   Gives the genomic coordinates of each probe set.

   *Format (6 columns)*: [probe id] [gene name] [chromosome] [start] [end] [strand]

*Percentile Threshold* (p):

   Used to eliminate genes that are consistently expressed at a very low level. All data are concatenated into one list, and the pth percentile of that list is taken as the threshold. A probe set is then removed if its value is less than the threshold in ALL conditions.

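   p = 1.0 means all probes are retained, p = 0.0 means none are. Intermediate values do not translate directly into a retained fraction, however: p = 0.63 does NOT necessarily mean that 63% of probe sets are retained, because a probe set is kept whenever it reaches the threshold in at least one condition.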
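
The filtering rule above can be sketched in Python as follows. This is an illustration only, not the code in PreProcess_Expression_Data.pl. The names ``cutoff_value`` and ``filter_probes`` are hypothetical, each probe set is assumed to already have one value per condition, and the percentile is assumed to be counted from the highest value downward, a convention chosen here only because it reproduces the stated end points (p = 1.0 keeps everything).

```python
def cutoff_value(values, p):
    """Return the threshold for fraction p, counting from the highest value.

    Assumption: ranking from the top, so p = 1.0 gives the minimum of the
    pooled data and p = 0.0 gives the maximum.
    """
    ranked = sorted(values, reverse=True)
    index = min(int(p * len(ranked)), len(ranked) - 1)
    return ranked[index]


def filter_probes(data, p):
    """Keep probe sets that reach the threshold in at least one condition.

    data maps a probe ID to its list of per-condition values.
    """
    pooled = [v for values in data.values() for v in values]
    cutoff = cutoff_value(pooled, p)
    # A probe set is removed only if it is below the cutoff in ALL conditions.
    return {
        probe: values
        for probe, values in data.items()
        if any(v >= cutoff for v in values)
    }


if __name__ == "__main__":
    data = {"p1": [1.0, 2.0], "p2": [8.0, 9.0], "p3": [3.0, 7.0]}
    print(sorted(filter_probes(data, 0.5)))  # prints ['p2', 'p3']
    print(sorted(filter_probes(data, 1.0)))  # prints ['p1', 'p2', 'p3']
```

In this toy example p = 0.5 retains two of three probe sets rather than half of them, since p3 is kept on the strength of a single high value.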

*Log Transform*: whether or not to take the log of the data before discretization.

  </help>

</tool>
author	mmaiensc
date	Wed, 29 Feb 2012 15:03:33 -0500