view WeightedAverage.xml @ 0:9b7b4009f2c0 draft

Imported from capsule None
author devteam
date Tue, 01 Apr 2014 10:48:52 -0400
parents
children 90611e86a998
line wrap: on
line source

<tool id="wtavg" name="Assign weighted-average" version="1.0.0">
  <description> of the values of features overlapping an interval </description>
  <command interpreter="python">WeightedAverage.py  $genomic_interval $genomic_feature $out_file1 -1 ${genomic_interval.metadata.chromCol},${genomic_interval.metadata.startCol},${genomic_interval.metadata.endCol},${genomic_interval.metadata.strandCol} -2 ${genomic_feature.metadata.chromCol},${genomic_feature.metadata.startCol},${genomic_feature.metadata.endCol},${genomic_feature.metadata.strandCol},${genomic_feature.metadata.nameCol}</command>

  <inputs>
    <param format="interval" name="genomic_interval" type="data" label="Genomic intervals (first dataset)" help="Dataset missing? See Note below."/>
    <param format="interval" name="genomic_feature" label="Genomic features (second dataset)" type="data" help="Make sure the value column is specified. See Note below." />
  </inputs>
  <outputs>
    <data format="input" name="out_file1" metadata_source="genomic_interval" />
  </outputs>
  <tests>
    <!-- Test data with valid values -->
    <test>
      <param name="genomic_interval" value="interval_interpolate.bed"/>
      <param name="genomic_feature" value="value_interpolate.bed"/>
      <output name="out_file1" file="interpolate_result.bed"/>
    </test>
    
  </tests>
  <help>


.. class:: infomark

**What it does**

For each interval in your first dataset, this tool calculates the weighted average value of the overlapping features in your second dataset.

- When a genomic interval partially or totally overlaps a single genomic feature, the value of that genomic feature is assigned to the genomic interval.
- When a genomic interval partially or totally overlaps with more than one genomic features, the average of the values of the overlapping genomic features weighted by the corresponding number of overlapping bases is assigned to the genomic interval.
- When a genomic interval does not overlap with any genomic feature, 'NA' will be assigned as it's value.

-----

.. class:: warningmark

**Note**

The input datasets should be in **bed** or **interval** format. Please use "edit attributes"/pencil icon to specify the column containing the values for the features in the second dataset as **name/identifier** column.

The output will contain all the columns in the first input plus a new column containing the assigned value for each interval.

-----

**Example**

- Suppose our first dataset contains the following **genomic intervals**::
    
    chr  start stop
    chr1 1000  2000
    chr1 3000  5000
    chr1 8000  9000
  
- and our second dataset contains the following **genomic features** each having an associated value (in fourth column) ::

    chr  start stop name
    chr1 900   1200 0.5
    chr1 2900  3100 0.2
    chr1 4800  5100 0.8
  
- For each **genomic interval** in our first dataset, this tool calculates the weighted average value of the overlapping **genomic features** in our second dataset ::

    chr1 1000  2000 0.5
    chr1 3000  5000 0.6
    chr1 8000  9000 NA
  


</help>
</tool>