<tool id="ReadLengthDistributionMatrix" name="Create read length distribution matrix" version="1.0.0">
  <description>across a genomic interval</description>
  <command interpreter="sh">galaxyToolRunner.sh ngs.ReadLengthDistributionMatrix -i $input --chr $chr --start $start --stop $stop --min $min --max $max --bin $bin -o $output</command>
  <inputs>
    <param format="sam,bam,bed,bedgraph" name="input" type="data" label="Mapped reads" />
    <param name="chr" type="text" label="Chromosome" />
    <param name="start" type="integer" value="1" label="Start base pair" />
    <param name="stop" type="integer" value="1000" label="Stop base pair" />
    <param name="min" type="integer" value="1" label="Minimum fragment length (bp)" />
    <param name="max" type="integer" value="200" label="Maximum fragment length (bp)" />
    <param name="bin" type="integer" value="1" label="Fragment length bin size (bp)" />
  </inputs>
  <outputs>
    <data format="tabular" name="output" />
  </outputs>

  <help>

This tool creates a matrix (in matrix2png_ format) with the distribution of read lengths over each base pair. Reads are binned by genomic location and length to create a matrix where each column represents the distribution of read lengths over that base pair. The resulting matrix can be turned into a heatmap using the Visualization -> Make heatmap with matrix2png tool.
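
The binning described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual implementation: it assumes each fragment is a 1-based, inclusive (start, stop) pair on a single chromosome and that a fragment is counted once over every base pair it covers. The function name ``length_matrix`` and the sample fragments are hypothetical.

```python
def length_matrix(fragments, start, stop, min_len, max_len, bin_size=1):
    """Count fragment lengths over each base pair of [start, stop] (inclusive).

    fragments: iterable of (frag_start, frag_stop) pairs on one chromosome.
    Returns a list of rows (one per length bin); each row holds one count
    per base pair of the interval.
    """
    n_cols = stop - start + 1
    n_rows = (max_len - min_len) // bin_size + 1
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for frag_start, frag_stop in fragments:
        length = frag_stop - frag_start + 1
        if length not in range(min_len, max_len + 1):
            continue  # outside the requested length window, ignored
        row = (length - min_len) // bin_size
        # Clip the fragment to the interval; the range below is empty
        # when the fragment does not overlap the interval at all.
        first = max(frag_start, start)
        last = min(frag_stop, stop)
        for pos in range(first, last + 1):
            matrix[row][pos - start] += 1
    return matrix


# Hypothetical fragments, for illustration only
fragments = [(5001, 5150), (5100, 5249), (5990, 6100), (5500, 5520)]
m = length_matrix(fragments, 5001, 6000, 100, 200)
```

Here the two 150 bp fragments land in the same row and overlap for part of the interval, the 111 bp fragment is clipped at the stop coordinate, and the 21 bp fragment is skipped because it is shorter than the minimum length.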

.. _matrix2png: http://bioinformatics.ubc.ca/matrix2png/dataformat.html

.. class:: warningmark

This tool requires paired-end SAM, BAM, Bed, or BedGraph formatted data. Using single-end data will result in a constant read length.

-----

**Syntax**

- **Mapped reads** are the mapped paired-end reads used to make the histograms
- **Chromosome** is the chromosome containing the genomic interval
- **Start base pair** is the first base pair of the interval
- **Stop base pair** is the last base pair of the interval
- **Minimum fragment length** is the lowest fragment length bin. Reads shorter than this will be ignored.
- **Maximum fragment length** is the highest fragment length bin. Reads longer than this will be ignored.
- **Fragment length bin size** is the bin size used when making the fragment length histograms

-----

**Example**

Make a matrix with the read length distribution across the region chrI:5001-6000, looking at reads 100-200 bp in length in bins of 1 bp:

- **Chromosome:** chrI
- **Start base pair:** 5001
- **Stop base pair:** 6000
- **Minimum fragment length:** 100
- **Maximum fragment length:** 200
- **Fragment length bin size:** 1

The resulting matrix will have 1000 columns and 101 rows, with each column representing a base pair and each row representing a read length. The column headers give the base pair and the row headers give the read length.
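
The dimensions follow from treating both the interval and the length range as inclusive. A small sketch of that arithmetic, assuming inclusive bounds and that bins are counted by integer division (the helper name ``matrix_shape`` is hypothetical and not part of the tool):

```python
def matrix_shape(start, stop, min_len, max_len, bin_size):
    """Return (rows, columns) for an inclusive interval and length range."""
    n_cols = stop - start + 1                      # one column per base pair
    n_rows = (max_len - min_len) // bin_size + 1   # one row per length bin
    return n_rows, n_cols


rows, cols = matrix_shape(5001, 6000, 100, 200, 1)
print(rows, cols)  # 101 1000
```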

-----

**Citation**

This tool was inspired by the analysis and figures in

Floer M, Wang X, Prabhu V, Berrozpe G, Narayan S, Spagna D, Alvarez D, Kendall J, Krasnitz A, Stepansky A, Hicks J, Bryant GO and Ptashne M (2010) A RSC/nucleosome complex determines chromatin architecture and facilitates activator binding. Cell 141: 407–418

  </help>
</tool>