view galaxy-conf/ReadLengthDistributionMatrix.xml @ 18:b2a69d34385a draft

Performance improvements to Wig file checksumming. Bug fix in Autocorrelation tool.
author timpalpant
date Fri, 15 Jun 2012 15:07:33 -0400
parents eb53be9a09f4
children b43c420a6135
line wrap: on
line source

<tool id="ReadLengthDistributionMatrix" name="Create read length distribution matrix" version="1.0.0">
  <description>across a genomic interval</description>
  <command interpreter="sh">galaxyToolRunner.sh ngs.ReadLengthDistributionMatrix -i $input --chr $chr --start $start --stop $stop --min $min --max $max --bin $bin -o $output</command>
  <inputs>
      <param format="sam,bam,bed,bedgraph" name="input" type="data" label="Mapped reads" />
      <param name="chr" type="text" label="Chromosome" />
      <param name="start" type="integer" value="1" label="Start base pair" />
      <param name="stop" type="integer" value="1000" label="Stop base pair" />
      <param name="min" type="integer" value="1" label="Minimum fragment length (bp)" />
      <param name="max" type="integer" value="200" label="Maximum fragment length (bp)" />
      <param name="bin" type="integer" value="1" label="Fragment length bin size (bp)" />
  </inputs>
  <outputs>
      <data format="tabular" name="output" />
  </outputs>
  
<help>
  
This tool will create a matrix (in matrix2png_ format) with the distribution of read lengths over each base pair. Reads are binned by genomic location and length to create a matrix where each column represents the distribution of read lengths over that base pair. The resulting matrix can be turned into heatmap using the Visualization -> Make heatmap with matrix2png tool.
  
.. _matrix2png: http://bioinformatics.ubc.ca/matrix2png/dataformat.html
  
.. class:: warningmark

This tool requires paired-end SAM, BAM, Bed, or BedGraph formatted data. Using single-end data will result in a constant read length.

-----

**Syntax**

- **Mapped reads** are the mapped paired-end reads used to make the histograms
- **Chromosome** a locus in the genome
- **Start base pair** a locus in the genome
- **Stop base pair** a locus in the genome
- **Minimum fragment length** is the lowest fragment length bin. Reads shorter than this will be ignored.
- **Maximum fragment length** is the highest fragment length bin. Reads longer than this will be ignored.
- **Fragment length bin size** is the bin size used when making the fragment length histograms

-----
  
**Example**

Make a matrix with the read length distribution across the region chrI:5001-6000, looking at reads 100-200bp in length in bins of 1bp:

- **Chromosome:** chrI
- **Start:** 5001
- **Stop:** 6000
- **Minimum fragment length:** 100
- **Maximum fragment length:** 200
- **Fragment length bin size:** 1

The resulting matrix will be 1000x101, with each column representing a base pair and each row representing a read length. The column headers give the base pair and the row headers give the read length.

-----

**Citation**

This tool was inspired by the analysis and figures in 

Floer M, Wang X, Prabhu V, Berrozpe G, Narayan S, Spagna D, Alvarez D, Kendall J, Krasnitz A, Stepansky A, Hicks J, Bryant GO and Ptashne M (2010) A RSC/nucleosome complex determines chromatin architecture and facilitates activator binding. Cell 141: 407–418

</help>
</tool>