Mercurial > repos > timpalpant > java_genomics_toolkit
changeset 12:81d5b81fb3c2 draft
Added help for all tools in the toolkit. Many bug fixes and a few new nucleosome tools.
--- a/galaxy-conf/Add.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/Add.xml Wed Apr 25 16:53:48 2012 -0400 @@ -16,5 +16,12 @@ </outputs> <help> + +This tool will add all values in the specified Wig files base pair by base pair. + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + </help> </tool>
--- a/galaxy-conf/Autocorrelation.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/Autocorrelation.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,5 +1,5 @@ -<tool id="Autocorrelation" name="Compute the autocorrelation" version="1.0.0"> - <description>on data in a Wiggle file</description> +<tool id="Autocovariance" name="Compute the autocovariance" version="2.0.0"> + <description>of data in a Wiggle file</description> <command interpreter="sh">galaxyToolRunner.sh ngs.Autocorrelation -i $input -l $windows -m $max -o $output</command> <inputs> <param format="bigwig,wig" name="input" type="data" label="Input data" /> @@ -11,8 +11,30 @@ </outputs> <help> -.. class:: warningmark + +This tool computes the unnormalized autocovariance_ of intervals of data in a Wig file. + +.. _autocovariance: http://en.wikipedia.org/wiki/Autocorrelation + +----- + +**Syntax** + +- **Input data** is the genomic data on which to compute the autocorrelation. +- **List of intervals:** The autocorrelation will be computed for each genomic interval specified in this list. +- **Maximum shift:** In computing the autocorrelation, the data will be phase-shifted up to this limit. -This tool requires Wiggle/BigWig input data. +----- + +.. class:: infomark + +**TIP:** For more information, see Wikipedia_ (right click to open this link in another window). + +.. _Wikipedia: http://en.wikipedia.org/wiki/Autocorrelation + +.. class:: infomark + +**TIP:** If your input data does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format. Similarly, the intervals must be in either Bed, BedGraph, or GFF format. + </help> </tool>
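As a rough illustration of the help text added above, the following is a minimal Python sketch of an unnormalized autocovariance for a single interval. It is not the toolkit's Java implementation: the function name `autocovariance` is hypothetical, and it assumes "unnormalized" means the sum of products of mean-centered values, without division by the variance or by the number of terms.

```python
def autocovariance(values, max_shift):
    """Unnormalized autocovariance of `values` for shifts 0..max_shift.

    Assumption: "unnormalized" means the raw sum of products of
    mean-centered values (no division by variance or by count).
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    # A shift cannot exceed the interval length minus one.
    limit = min(max_shift, n - 1)
    result = []
    for shift in range(limit + 1):
        total = sum(centered[i] * centered[i + shift] for i in range(n - shift))
        result.append(total)
    return result


if __name__ == "__main__":
    print(autocovariance([1, 2, 3, 4], 2))
```

The shift-0 term is the (unnormalized) variance of the interval, which is why dividing every term by it would give the autocorrelation instead.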
--- a/galaxy-conf/Average.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/Average.xml Wed Apr 25 16:53:48 2012 -0400 @@ -22,5 +22,12 @@ <help> + +This tool will average the values of the provided Wig files, base pair by base pair. + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + </help> </tool>
--- a/galaxy-conf/BaseAlignCounts.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/BaseAlignCounts.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,4 +1,4 @@ -<tool id="BaseAlignCounts" name="Map coverage" version="1.0.0"> +<tool id="BaseAlignCounts" name="Calculate coverage" version="1.0.0"> <description>of sequencing reads</description> <command interpreter="sh">galaxyToolRunner.sh ngs.BaseAlignCounts -i $input -a ${chromInfo} -x $X -o $output</command> <inputs> @@ -10,8 +10,29 @@ </outputs> <help> - .. class:: warningmark - - This tool requires sequencing reads in SAM, BAM, or Bed format. + +This tool produces a new Wig file with the number of reads/intervals overlapping each base pair. Reads can be artificially extended to match known fragment lengths. If you wish to count the number of reads starting at each base pair, set the read extension to 1. If you wish to count the number of intervals overlapping each base pair, set the extension to -1. + +----- + +.. class:: warningmark + +This tool requires sequencing reads in SAM, BAM, Bed, or BedGraph format. If you are artificially extending reads, ensure that the strand is set correctly in SAM, BAM, and Bed files. + +.. class:: infomark + +If you would like to convert valued interval data (e.g. BedGraph files from microarrays) to Wig format, use the Converters -> Interval to Wig converter. + +.. class:: infomark + +**TIP:** If you are going to be using reads in SAM format for multiple analyses, it is often more efficient to first convert it into BAM format using NGS: SAM Tools -> SAM-to-BAM. + +----- + +**Syntax** + +- **Sequencing reads** are mapped reads from a high-throughput sequencing experiment. +- **In silico extension:** Reads will be artificially extended from their 5' end to be this length. + </help> </tool>
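The read-extension behaviour described in the help text above can be sketched as follows. This is an illustrative Python sketch, not the toolkit's code: the `coverage` function, the `(start, stop, strand)` tuple layout, and the 1-based inclusive coordinates are all assumptions made for the example.

```python
def coverage(reads, chrom_length, extend=-1):
    """Count reads overlapping each base pair of one chromosome.

    reads: iterable of (start, stop, strand) with 1-based inclusive
           coordinates (an assumption for this sketch).
    extend: -1 keeps each interval as is; any positive value resizes the
            read from its 5' end to that length (1 counts read starts).
    """
    counts = [0] * (chrom_length + 1)  # index 0 is unused
    for start, stop, strand in reads:
        low, high = min(start, stop), max(start, stop)
        if extend != -1:
            if strand == "-":
                # The 5' end of a - strand read is its high coordinate.
                low = high - extend + 1
            else:
                high = low + extend - 1
        # Clip to the chromosome boundaries.
        low, high = max(low, 1), min(high, chrom_length)
        for pos in range(low, high + 1):
            counts[pos] += 1
    return counts[1:]


if __name__ == "__main__":
    reads = [(2, 4, "+"), (3, 6, "-")]
    print(coverage(reads, 8))            # intervals as given
    print(coverage(reads, 8, extend=1))  # read starts only
```

This also shows why the strand must be set correctly when extending: the same read extends in opposite directions depending on its strand.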
--- a/galaxy-conf/DNAPropertyCalculator.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/DNAPropertyCalculator.xml Wed Apr 25 16:53:48 2012 -0400 @@ -80,4 +80,18 @@ </actions> </data> </outputs> + + <help> + +This tool will create a new Wig file with genome-wide calculations of sequence-specific DNA properties determined from local n-nucleotide sequences. DNA properties are calculated using AJT_. + +.. _AJT: http://www.abeel.be/ajt + +----- + +**Example** + +To calculate GC-content, choose your genome assembly and select "GC" as the property. This will create a new Wig file in which G and C nucleotides are represented by 1, while A and T nucleotides are represented by -1. If you would like to compute GC-content in 10-bp windows, use the WigMath -> Moving average tool to compute a moving average with 10bp windows. + + </help> </tool>
--- a/galaxy-conf/Divide.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/Divide.xml Wed Apr 25 16:53:48 2012 -0400 @@ -36,5 +36,10 @@ </tests> <help> + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + </help> </tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/Downsample.xml Wed Apr 25 16:53:48 2012 -0400 @@ -0,0 +1,28 @@ +<tool id="WigDownsample" name="Downsample" version="1.0.0"> + <description>a (Big)Wig file</description> + <command interpreter="sh">galaxyToolRunner.sh wigmath.Downsample -i $input -m $metric -w $window -o $output</command> + <inputs> + <param format="bigwig,wig" name="input" type="data" label="Original data" /> + <param name="window" type="integer" value="100" label="Window size (bp)" /> + <param name="metric" type="select" label="Downsampling method"> + <option value="mean">Mean</option> + <option value="min">Min</option> + <option value="max">Max</option> + </param> + </inputs> + <outputs> + <data format="wig" name="output" metadata_source="input" /> + </outputs> + + <help> + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + +----- + +This tool can be used to reduce the resolution and file size of Wig files for easier upload to UCSC. Data is downsampled in non-overlapping moving windows starting from the beginning of each chromosome. Each window can be downsampled as the arithmetic mean, minimum, or maximum value of the original data. + + </help> +</tool>
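The windowed downsampling described in the new Downsample help text can be illustrated with a short Python sketch. The `downsample` function is hypothetical, and treating a final partial window like any other window is an assumption; the actual tool may handle chromosome ends differently.

```python
def downsample(values, window, metric="mean"):
    """Reduce `values` in non-overlapping windows from the chromosome start.

    Assumption: a trailing partial window is reduced like a full one.
    """
    reducers = {
        "mean": lambda w: sum(w) / len(w),
        "min": min,
        "max": max,
    }
    reduce_window = reducers[metric]
    return [
        reduce_window(values[i:i + window])
        for i in range(0, len(values), window)
    ]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7]
    print(downsample(data, 3, "mean"))
    print(downsample(data, 3, "max"))
```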
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/DynaPro.xml Wed Apr 25 16:53:48 2012 -0400 @@ -0,0 +1,48 @@ +<tool id="DynaPro" name="Compute equilibrium nucleosome positions" version="1.0.0"> + <description>using DynaPro</description> + <command interpreter="sh">galaxyToolRunner.sh nucleosomes.DynaPro -i $input -n $N + #if str( $mean ) != '' + -m $mean + #end if + + #if str( $variance ) != '' + -v $variance + #end if + -o $output + </command> + <inputs> + <param format="bigwig,wig" name="input" type="data" label="Energy landscape" /> + <param name="N" type="integer" value="147" label="Nucleosome size (bp)" /> + <param name="mean" type="float" optional="true" label="Shift energy landscape to have mean" /> + <param name="variance" type="float" optional="true" label="Rescale energy landscape to have variance" /> + </inputs> + <outputs> + <data format="wig" name="output" metadata_source="input" /> + </outputs> + <help> + +.. class:: warningmark + +At present, this tool is only suitable for small genomes (yeast) since entire chromosomes must be loaded into memory. + +----- + +Equilibrium nucleosome distribution is modeled as a one-dimensional fluid of hard rods adsorbing and moving within an external potential. This tool provides a simplified version of the DynaPro_ algorithm for a single factor interacting with hard-core repulsion. + +.. _DynaPro: http://nucleosome.rutgers.edu/nucleosome/ + +----- + +**Syntax** + +- **Energy landscape** is the external potential function for each genomic base pair, and must be in Wig format. +- **Nucleosome size** is the hard-core interaction size. + +----- + +**Citation** + +Morozov AV, Fortney K, Gaykalova DA, Studitsky VM, Widom J and Siggia ED (2009) Using DNA mechanics to predict in vitro nucleosome positions and formation energies. Nucleic Acids Res 37: 4707–4722. + + </help> +</tool>
--- a/galaxy-conf/FilterOutlierRegions.xml Mon Apr 09 11:50:23 2012 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,12 +0,0 @@ -<tool id="WigFilterOutliers" name="Filter outlier regions" version="1.0.0"> - <description>in a (Big)Wig file</description> - <command interpreter="sh">galaxyToolRunner.sh wigmath.FilterOutlierRegions -i $input -w $window -t $threshold -o $output</command> - <inputs> - <param format="bigwig,wig" name="input" type="data" label="Filter outlier regions in" /> - <param name="window" type="integer" value="150" label="Window size" /> - <param name="threshold" type="float" value="3" label="Threshold (fold times the mean)" /> - </inputs> - <outputs> - <data format="wig" name="output" metadata_source="input" /> - </outputs> -</tool>
--- a/galaxy-conf/FindAbsoluteMaxima.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/FindAbsoluteMaxima.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,5 +1,5 @@ <tool id="FindWigMaxima" name="Find absolute maxima" version="1.0.0"> - <description>in windows</description> + <description>in intervals</description> <command interpreter="sh"> galaxyToolRunner.sh ngs.FindAbsoluteMaxima -l $window -o $output #for $input in $inputs @@ -10,15 +10,43 @@ <repeat name="inputs" title="(Big)Wig file"> <param name="file" type="data" format="bigwig,wig" /> </repeat> - <param name="window" type="data" format="bed,bedgraph,gff" label="Windows to find maxima in" /> + <param name="window" type="data" format="bed,bedgraph,gff" label="Intervals to find maxima in" /> </inputs> <outputs> <data name="output" format="tabular" /> </outputs> <help> - .. class:: warningmark + +This tool can be used to find the location of the maximum value in genomic intervals, such as finding the peak summit inside a set of peak calls. - This tool requires input data in Wig format. Regions should be specified as Bed, BedGraph, or GFF format +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. Intervals must be provided in Bed, BedGraph, or GFF format. 
+ +----- + +**Example** + + +if **Intervals** are genes :: + + chr11 5203271 5204877 NM_000518 0 - + chr11 5210634 5212434 NM_000519 0 - + chr11 5226077 5227663 NM_000559 0 - + +and **Wig files** are :: + + Data1.wig + Data2.wig + +this tool will find the location of the maximum value in each interval for each of the provided Wig/BigWig files, and append them in columns in the order that they were added :: + + chr11 5203271 5204877 NM_000518 0 - 5203374 5204300 + chr11 5210634 5212434 NM_000519 0 - 5210638 5212450 + chr11 5226077 5227663 NM_000559 0 - 5226800 5226241 + +where column 7 is the location of the maximum value in that interval for Data1.wig, and column 8 is the location of the maximum value in that interval for Data2.wig. + </help> </tool>
--- a/galaxy-conf/FindBoundaryNucleosomes.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/FindBoundaryNucleosomes.xml Wed Apr 25 16:53:48 2012 -0400 @@ -2,16 +2,29 @@ <description>in windows</description> <command interpreter="sh">galaxyToolRunner.sh nucleosomes.FindBoundaryNucleosomes -i $input -l $loci -o $output</command> <inputs> - <param name="input" type="data" format="nukes" label="Nucleosome calls" /> - <param name="loci" type="data" format="bed" label="List of intervals" /> + <param name="input" type="data" format="tabular" label="Nucleosome calls" /> + <param name="loci" type="data" format="bed,bedgraph,gff" label="List of intervals" /> </inputs> <outputs> <data name="output" format="bed" metadata_source="loci" /> </outputs> <help> - .. class:: warningmark + +.. class:: infomark - Use the Call Nucleosomes tool to create a file of called nucleosomes, then use this tool to identify the first nucleosome's dyad position from the 5' or 3' end. +Use the Call Nucleosomes tool to create a file of called nucleosomes, then use this tool to identify the first nucleosome's dyad position (peak maximum) from the 5' and 3' ends of the gene. + +.. class:: infomark + +**TIP:** Nucleosome calls must be in tabular format of the kind produced by the Nucleosomes -> Call nucleosomes tool. Intervals must be in either Bed, BedGraph, or GFF format. + +----- + +**Syntax** + +- **Nucleosome calls** is a list of stereotypic nucleosome position calls. +- **List of intervals:** The 5' and 3' boundary nucleosomes will be found for each interval in this list. + </help> </tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/FindOutlierRegions.xml Wed Apr 25 16:53:48 2012 -0400 @@ -0,0 +1,32 @@ +<tool id="WigFindOutliers" name="Find outlier regions" version="1.0.0"> + <description>such as CNVs</description> + <command interpreter="sh">galaxyToolRunner.sh ngs.FindOutlierRegions -i $input -w $window -t $threshold -o $output</command> + <inputs> + <param format="bigwig,wig" name="input" type="data" label="Input data" /> + <param name="window" type="integer" value="150" label="Window size" /> + <param name="threshold" type="float" value="3" label="Threshold (fold times the mean)" /> + </inputs> + <outputs> + <data format="bed" name="output" metadata_source="input" /> + </outputs> + + <help> + +This tool identifies regions of the genome that may be repetitive elements or CNVs by scanning for windows that have an exceptionally high mean relative to the genome-wide mean. + +----- + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + +----- + +**Syntax** + +- **Input data** is Wig or BigWig formatted data from a high-throughput sequencing experiment. +- **Window size** is the size of the moving average to use. +- **Threshold** is the fold times the genome-wide mean that a window's mean must be in order to be considered an outlier region. + + </help> +</tool>
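The outlier scan described in the new help text can be sketched in Python as below. This is only an illustration under stated assumptions: the function `find_outlier_regions` is hypothetical, windows slide one base pair at a time, a window counts as an outlier when its mean is strictly greater than threshold times the genome-wide mean, and overlapping outlier windows are merged into 0-based half-open regions. The real tool may differ on each of these points.

```python
def find_outlier_regions(values, window, threshold):
    """Return merged (start, stop) regions whose windowed mean is high.

    Assumptions: sliding windows with step 1, strict comparison against
    threshold * genome-wide mean, 0-based half-open output coordinates.
    """
    n = len(values)
    if n < window:
        return []
    cutoff = threshold * (sum(values) / n)
    regions = []
    window_sum = sum(values[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window: add the entering value, drop the leaving one.
            window_sum += values[start + window - 1] - values[start - 1]
        if window_sum / window > cutoff:
            stop = start + window
            if regions and start <= regions[-1][1]:
                regions[-1][1] = stop  # extend the previous region
            else:
                regions.append([start, stop])
    return [tuple(region) for region in regions]


if __name__ == "__main__":
    signal = [1] * 10 + [20] * 4 + [1] * 10
    print(find_outlier_regions(signal, window=4, threshold=3))
```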
--- a/galaxy-conf/GaussianSmooth.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/GaussianSmooth.xml Wed Apr 25 16:53:48 2012 -0400 @@ -54,5 +54,14 @@ </tests> <help> + +This tool smooths genomic data with an area-preserving Gaussian_ filter. The Gaussian filter is computed out to +/- 3 standard deviations. + +.. _Gaussian: http://en.wikipedia.org/wiki/Gaussian_filter + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + </help> </tool>
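To make "area-preserving" and "+/- 3 standard deviations" concrete, here is a minimal Python sketch of such a filter. The names `gaussian_kernel` and `gaussian_smooth` are hypothetical, the standard deviation is assumed to be an integer number of base pairs, and values beyond the ends of the data are treated as zero, which is an assumption about edge handling rather than documented behaviour.

```python
import math


def gaussian_kernel(sd):
    """Gaussian weights from -3*sd to +3*sd, normalized to sum to 1."""
    half = 3 * sd
    weights = [math.exp(-(x * x) / (2.0 * sd * sd)) for x in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(values, sd):
    """Convolve `values` with the truncated, normalized Gaussian kernel.

    Assumption: positions outside the data contribute zero.
    """
    kernel = gaussian_kernel(sd)
    half = len(kernel) // 2
    smoothed = []
    for i in range(len(values)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += weight * values[j]
        smoothed.append(acc)
    return smoothed


if __name__ == "__main__":
    spike = [0] * 20 + [10] + [0] * 20
    print(round(sum(gaussian_smooth(spike, 2)), 6))
```

Because the kernel sums to 1, a signal that lies far enough from the edges keeps its total area after smoothing.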
--- a/galaxy-conf/GeneTrackToBedGraph.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/GeneTrackToBedGraph.xml Wed Apr 25 16:53:48 2012 -0400 @@ -10,12 +10,13 @@ <help> -.. class:: warningmark +This tool will sum the counts from the forward and reverse strands in a GeneTrack_ index to create a BedGraph file. -This tool will sum the counts from the forward and reverse strands in a GeneTrack index to create a BedGraph file. +.. _GeneTrack: http://atlas.bx.psu.edu/genetrack/docs/genetrack.html .. class:: warningmark This tool requires GeneTrack formatted data. If you have tabular data that was not correctly autodetected, change the metadata by clicking on the pencil icon for the dataset. + </help> </tool>
--- a/galaxy-conf/GeneTrackToWig.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/GeneTrackToWig.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,8 +1,9 @@ <tool id="GeneTrackToWig" name="GeneTrack to Wig" version="1.0.0"> <description>converter</description> - <command interpreter="sh">galaxyToolRunner.sh converters.GeneTrackToWig -i $input $zero -a ${chromInfo} -o $output</command> + <command interpreter="sh">galaxyToolRunner.sh converters.GeneTrackToWig -i $input -s $shift $zero -a ${chromInfo} -o $output</command> <inputs> <param name="input" type="data" format="genetrack" label="Input GeneTrack index" /> + <param name="shift" type="integer" value="0" optional="true" label="Shift +/- strand counts by this amount when merging" /> <param name="zero" type="boolean" checked="false" truevalue="-z" falsevalue="" label="Assume zero where there is no data (default is NaN)" /> </inputs> <outputs> @@ -10,9 +11,14 @@ </outputs> <help> + +This tool will convert GeneTrack_ format files into Wig files, optionally offsetting the + and - strand counts by a specified value before merging them. + +.. _GeneTrack: http://atlas.bx.psu.edu/genetrack/docs/genetrack.html .. class:: warningmark This tool requires GeneTrack formatted data. If you have tabular data that was not correctly autodetected, change the metadata by clicking on the pencil icon for the dataset. + </help> </tool>
--- a/galaxy-conf/GreedyCaller.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/GreedyCaller.xml Wed Apr 25 16:53:48 2012 -0400 @@ -11,8 +11,39 @@ </outputs> <help> + +Stereotypic nucleosome positions are identified from dyad density maps using an approach similar to the previously reported greedy algorithm in GeneTrack_ (Albert, et al. 2008). Nucleosome calls are identified at peak maxima (p) in the smoothed dyad density map, and then excluded in the surrounding window [p–N, p+N], where N is the assumed nucleosome size in base pairs. This process is continued until all possible sterically hindered nucleosome positions are identified. + +.. _GeneTrack: http://atlas.bx.psu.edu/genetrack/docs/genetrack.html + .. class:: warningmark -This tool requires dyad counts and smoothed dyad counts. +This tool requires dyad counts and smoothed dyad counts in Wig or BigWig format. Smoothed dyad counts can be generated from dyad counts using the WigMath -> Gaussian smooth tool. + +----- + +**Syntax** + +- **Dyad counts** is the relative number of nucleosomes positioned at each base pair. +- **Smoothed dyad counts** should correspond to a smoothed version of the **Dyad counts** +- **Assumed nucleosome size** is the window size used while identifying maxima to restrict overlapping calls. + +----- + +**Output** + +The output format has 10 columns defined as follows + +- 1. **Chromosome:** the chromosome of this nucleosome call +- 2. **Start:** the lower coordinate of the call window, equal to the dyad position - N/2 +- 3. **Stop:** the higher coordinate of the call window, equal to the dyad position + N/2 +- 4. **Length:** the window size (N) of the nucleosome call, equal to the value specified when the tool was run +- 5. **Length standard deviation:** the standard deviation of the nucleosome call length (equal to 0 because it is not currently calculated) +- 6. **Dyad:** the location of the peak maximum (p) in the smoothed dyad density data +- 7. 
**Dyad standard deviation:** the standard deviation of dyad density around the dyad mean in the dyad counts data +- 8. **Conditional position:** the probability that a nucleosome is at this exact dyad location as opposed to anywhere else in the nucleosome call window [p-N/2, p+N/2] +- 9. **Dyad mean:** the mean of the dyad counts in the window [p-N/2, p+N/2] +- 10. **Occupancy:** the sum of the dyad counts in the window [p-N/2, p+N/2] + </help> </tool>
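The greedy calling procedure described in the GreedyCaller help text can be sketched in Python as follows. This is a simplified illustration, not the toolkit's implementation: the function `greedy_call` is hypothetical, positions are 0-based list indices, only positions with a positive smoothed value are considered, and only four of the ten output columns (start, stop, dyad, occupancy) are produced.

```python
def greedy_call(dyads, smoothed, n):
    """Call nucleosomes greedily at maxima of the smoothed dyad density.

    dyads, smoothed: per-base-pair values for one chromosome.
    n: assumed nucleosome size in base pairs.
    Returns (start, stop, dyad, occupancy) tuples sorted by dyad position.
    """
    length = len(smoothed)
    available = [True] * length
    half = n // 2
    # Visit positions from the highest smoothed value to the lowest.
    order = sorted(range(length), key=lambda i: smoothed[i], reverse=True)
    calls = []
    for p in order:
        if not available[p] or smoothed[p] <= 0:
            continue
        low, high = max(0, p - half), min(length - 1, p + half)
        occupancy = sum(dyads[low:high + 1])
        calls.append((low, high, p, occupancy))
        # Exclude the surrounding window [p - N, p + N] from later calls.
        for j in range(max(0, p - n), min(length - 1, p + n) + 1):
            available[j] = False
    return sorted(calls, key=lambda call: call[2])


if __name__ == "__main__":
    smoothed = [0] * 16
    smoothed[5], smoothed[8], smoothed[12] = 9, 8, 7
    dyads = [0] * 16
    dyads[5], dyads[6], dyads[8], dyads[12] = 3, 1, 4, 2
    print(greedy_call(dyads, smoothed, 4))
```

In the usage example the peak at position 8 is never called because it falls inside the exclusion window of the stronger peak at position 5.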
--- a/galaxy-conf/IntervalAverager.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/IntervalAverager.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,8 +1,6 @@ <tool id="IntervalAverager" name="Average intervals" version="1.0.0"> <description>that have been aligned</description> - <command interpreter="sh"> - galaxyToolRunner.sh visualization.IntervalAverager -i $input -l $loci -o $output - </command> + <command interpreter="sh">galaxyToolRunner.sh visualization.IntervalAverager -i $input -l $loci -o $output</command> <inputs> <param format="wig,bigwig" name="input" type="data" label="Sequencing data" /> <param format="bed" name="loci" type="data" label="List of intervals (with alignment points)" /> @@ -12,5 +10,23 @@ </outputs> <help> + +This tool calculates the average signal for a set of aligned intervals. Intervals are lined up on their alignment point (column 5 in the Bed file), flipped if on the - strand, and averaged. The output is equivalent to aligning the data in a matrix and then taking the columnwise average of the matrix. + +Intervals with alignment points must be provided in the following extended Bed format :: + + chr low high id alignment strand + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + +----- + +**Syntax** + +- **Sequencing data** is the genomic data used to create the average +- **List of intervals** is a list of intervals in Bed format with alignment points + </help> </tool>
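The alignment-and-average step described in the IntervalAverager help text can be sketched in Python as below. The function `average_intervals` and the `(low, high, alignment, strand)` tuple layout are hypothetical, coordinates are treated as 0-based list indices, and missing (NaN) values are not handled; all of these are simplifying assumptions.

```python
def average_intervals(values, intervals):
    """Average signal across intervals lined up on their alignment points.

    values: per-base-pair signal for one chromosome (0-based indices).
    intervals: iterable of (low, high, alignment, strand), inclusive bounds.
    Returns {offset from alignment point: mean signal}, with - strand
    intervals flipped so that every interval reads 5'-to-3'.
    """
    sums = {}
    counts = {}
    for low, high, alignment, strand in intervals:
        for pos in range(low, high + 1):
            offset = pos - alignment
            if strand == "-":
                offset = -offset  # flip - strand intervals
            sums[offset] = sums.get(offset, 0.0) + values[pos]
            counts[offset] = counts.get(offset, 0) + 1
    return {offset: sums[offset] / counts[offset] for offset in sorted(sums)}


if __name__ == "__main__":
    signal = list(range(10))
    print(average_intervals(signal, [(2, 4, 3, "+"), (5, 7, 6, "-")]))
```

Keying the result by offset is the dictionary equivalent of the columnwise average of the aligned matrix mentioned in the help text.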
--- a/galaxy-conf/IntervalLengthDistribution.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/IntervalLengthDistribution.xml Wed Apr 25 16:53:48 2012 -0400 @@ -9,5 +9,12 @@ </outputs> <help> + +This tool calculates the distribution of interval lengths from a list of intervals or reads in SAM, BAM, Bed, BedGraph, or GFF format. + +.. class:: warningmark + +For paired-end sequencing reads, the length is the length of the fragment (5' end of read 1 to 5' end of read 2). + </help> </tool>
--- a/galaxy-conf/IntervalStats.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/IntervalStats.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,7 +1,7 @@ -<tool id="IntervalStats" name="Compute window statistics" version="1.0.0"> - <description>on data in a Wiggle file</description> +<tool id="IntervalStats" name="Compute mean/min/max of intervals" version="1.0.0"> + <description>of data in a Wiggle file</description> <command interpreter="sh"> - galaxyToolRunner.sh ngs.IntervalStats -l $windows -o $output + galaxyToolRunner.sh ngs.IntervalStats -l $windows -s $stat -o $output #for $input in $inputs ${input.file} #end for @@ -11,21 +11,34 @@ <param name="file" type="data" format="bigwig,wig" /> </repeat> <param format="bed,bedgraph,gff" name="windows" type="data" label="List of intervals" /> - <!-- TODO: Implement other statistics - <param name="stat" type="select" optional="true" label="For each window, compute the"> + <param name="stat" type="select" optional="true" label="For each interval, compute the"> <option value="mean">Mean</option> - <option value="median">Median</option> + <!-- TODO <option value="median">Median</option> --> <option value="max">Max</option> <option value="min">Min</option> - </param> --> + </param> </inputs> <outputs> <data format="tabular" name="output" /> </outputs> <help> -.. class:: warningmark + +This tool calculates the arithmetic mean, maximum, or minimum value for the Wig data in each interval. For each Wig file provided, an additional column is added to the output file in the order that they are added above. + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + +----- -This tool requires Wiggle/BigWig input data. +**Example** + +Calculate the mean change in nucleosome occupancy for each gene in the yeast genome: + +- 1. 
Create a "change in occupancy" dataset by subtracting the normalized occupancy Wig files from your two conditions using the WigMath -> Subtract tool. +- 2. Upload a list of intervals corresponding to the genes in the yeast genome, or pull the data from UCSC using Get Data -> UCSC Main. +- 3. Calculate the mean change in occupancy for each gene using this tool and the datasets from (1) and (2). + </help> </tool>
--- a/galaxy-conf/IntervalToWig.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/IntervalToWig.xml Wed Apr 25 16:53:48 2012 -0400 @@ -11,8 +11,11 @@ <help> +This tool converts data from an interval format, such as Bed, BedGraph or GFF, to Wig format. This can be used to convert data from microarrays to Wig format. The value of each interval is mapped into the Wig file. Intervals that overlap in the original file (multiple-valued base pairs) are averaged, and bases without data in the original interval file are set to NaN. + .. class:: warningmark This tool requires Bed, BedGraph, or GFF formatted data. If you have tabular data that was not correctly autodetected, change the metadata by clicking on the pencil icon for the dataset. + </help> </tool>
--- a/galaxy-conf/KMeans.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/KMeans.xml Wed Apr 25 16:53:48 2012 -0400 @@ -14,5 +14,28 @@ </tests> <help> + +.. class:: warningmark + +This tool requires tabular data in matrix2png format (with column AND row headers). For more information about the required format and usage instructions, see the matrix2png_ website. + +.. _matrix2png: http://bioinformatics.ubc.ca/matrix2png/dataformat.html + +.. class:: infomark + +You can use the "Align values in a matrix" tool to create a matrix, then use this tool to cluster the matrix with k-means. + +.. class:: infomark + +**TIP:** You can use the **min** and **max** columns to cluster a large matrix based on a subset of the columns. For example, you could cluster a 4000x4000 matrix on columns 200-300 by setting min = 200 and max = 300. This will greatly increase the efficiency of distance calculations during the k-means EM, and also allows you to cluster based on specific regions, such as promoters or coding sequences. + +----- + +This tool will cluster the rows in an aligned matrix with KMeans_. The implementation builds upon the KMeansPlusPlusClusterer available in commons-math3_. + +.. _KMeans: http://en.wikipedia.org/wiki/K-means_clustering + +.. _commons-math3: http://commons.apache.org/math/ + </help> </tool>
--- a/galaxy-conf/LogTransform.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/LogTransform.xml Wed Apr 25 16:53:48 2012 -0400 @@ -48,5 +48,10 @@ </tests> <help> + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + </help> </tool>
--- a/galaxy-conf/MapDyads.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/MapDyads.xml Wed Apr 25 16:53:48 2012 -0400 @@ -26,12 +26,16 @@ </outputs> <help> + +This tool produces a Wig file with the number of dyads at each base pair. For paired-end MNase data, dyads are approximated using the center of the fragment. For Bed/BedGraph formatted input, this means the center of the interval; for SAM/BAM formatted input, this means the middle between the 5' end of mate 1 and the 5' end of mate 2. For single-end data, the estimated mononucleosome fragment length (N) must be specified, which will be used to offset reads from the + and - strands by +/- N/2. + .. class:: warningmark - This tool requires sequencing reads in SAM, BAM, Bed format. +This tool requires sequencing reads in SAM, BAM, Bed, or BedGraph format. .. class:: warningmark -For paired-end MNase data, read centers are approximated using the center of the read. For single-end data, the estimated mononucleosome fragment length must be specified, which will be used to offset reads from the + and - strands. +Since BedGraph format does not contain strand information, all reads in BedGraph format are considered to be on the + strand. + </help> </tool>
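The dyad estimation rules in the MapDyads help text can be illustrated with a small Python sketch. The functions `dyad_position` and `map_dyads` are hypothetical, coordinates are assumed to be 1-based and inclusive, and integer division is used for the center and for N/2; the toolkit's own rounding may differ.

```python
def dyad_position(start, stop, strand="+", fragment_length=None):
    """Estimate the dyad position for one read or fragment.

    With no fragment_length (paired-end or interval data) the dyad is the
    center of the fragment. With a fragment_length N (single-end data) the
    5' end is offset by N/2 into the fragment, in the strand's direction.
    """
    low, high = min(start, stop), max(start, stop)
    if fragment_length is None:
        return (low + high) // 2
    half = fragment_length // 2
    if strand == "-":
        return high - half  # 5' end of a - strand read is its high end
    return low + half


def map_dyads(reads, chrom_length, fragment_length=None):
    """Count estimated dyads at each base pair (1-based coordinates)."""
    counts = [0] * (chrom_length + 1)  # index 0 is unused
    for start, stop, strand in reads:
        pos = dyad_position(start, stop, strand, fragment_length)
        if 1 <= pos <= chrom_length:
            counts[pos] += 1
    return counts[1:]


if __name__ == "__main__":
    # A 147 bp fragment spanning 100..246 and its two single-end reads
    # should all point at the same dyad.
    print(dyad_position(100, 246))
    print(dyad_position(100, 135, "+", 147))
    print(dyad_position(211, 246, "-", 147))
```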
--- a/galaxy-conf/MatrixAligner.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/MatrixAligner.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,21 +1,10 @@ <tool id="MatrixAligner" name="Align values in a matrix" version="1.0.0"> <description>for a heatmap</description> - <command interpreter="sh"> - galaxyToolRunner.sh visualization.MatrixAligner -i $input -l $loci -m $M -o $output - </command> + <command interpreter="sh">galaxyToolRunner.sh visualization.MatrixAligner -i $input -l $loci -m $M -o $output</command> <inputs> <param format="wig,bigwig" name="input" type="data" label="Sequencing data" /> <param format="bed" name="loci" type="data" label="List of intervals (with alignment points)" /> <param type="integer" name="M" value="4000" label="Maximum row length" /> - <!-- TODO: Bring back optional markers - <conditional name="ladder"> - <param name="draw" type="boolean" checked="false" falsevalue="false" truevalue="true" label="Include marker ladder across X-axis"/> - <when value="true"> - <param name="spacing" type="integer" value="200" label="Draw marker every N base pairs" /> - </when> - <when value="false"> - </when> - </conditional> --> </inputs> <outputs> <data format="tabular" name="output" /> @@ -60,8 +49,32 @@ </tests>--> <help> + +This tool aligns sequencing data into a rectangular matrix for creating a heatmap with matrix2png. Data from each interval is lined up on the specified alignment point (column 5 in the Bed file), and flipped if on the - strand so that all intervals are 5'-to-3' from left-to-right. + +Intervals with alignment points must be provided in the following extended Bed format :: + + chr low high id alignment strand + +The heatmap is created by taking each interval in the **List of Intervals**, retrieving the data for that interval from the Wig file, and adding it as a new row in the matrix. Intervals are processed in their original order. 
+ +----- + +**Syntax** + +- **Sequencing data** is the genomic data used to create the matrix +- **List of intervals** is a list of intervals in Bed format with alignment points +- **Maximum row length** is the maximum allowed width of the matrix. If aligned intervals extend outside of this width, they will be truncated. + +----- + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + .. class:: warningmark -Large heatmap matrices may require a long time to generate. To reduce the size of an MxN matrix with large M, rows (N) can be truncated using the maximum row length parameter. Rows are truncated from the alignment point (symmetrically) if possible, or as nearly symmetrically as possible. +Large heatmap matrices may require a long time to generate in Galaxy because it validates that the output is in correct tab-delimited format. To reduce the size of an MxN matrix with large M, rows (N) can be truncated using the maximum row length parameter. Rows are truncated from the alignment point (symmetrically) if possible, or as nearly symmetrically as possible. + </help> </tool>
--- a/galaxy-conf/MovingAverageSmooth.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/MovingAverageSmooth.xml Wed Apr 25 16:53:48 2012 -0400 @@ -54,5 +54,18 @@ </tests> <help> + +This tool smooths genomic data with a mean_ filter of the specified width. + +.. _mean: http://en.wikipedia.org/wiki/Moving_average + +.. class:: warningmark + +Note that for the moving average to be perfectly symmetric, the window should be an odd number of base pairs. + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + </help> </tool>
--- a/galaxy-conf/Multiply.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/Multiply.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,5 +1,5 @@ <tool id="WigMultiply" name="Multiply" version="1.0.0"> - <description>multiple (Big)Wig files</description> + <description>(Big)Wig files</description> <command interpreter="sh"> galaxyToolRunner.sh wigmath.Multiply -o $output #for $input in $inputs @@ -16,5 +16,12 @@ </outputs> <help> + +This tool multiplies Wig or BigWig files base pair by base pair. + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + </help> </tool>
--- a/galaxy-conf/NRLCalculator.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/NRLCalculator.xml Wed Apr 25 16:53:48 2012 -0400 @@ -10,8 +10,26 @@ </outputs> <help> + +This tool calculates the distance between adjacent nucleosome calls (dyads) emanating from the 5' end of an interval. For each interval, the distance is calculated from the +1 to +2 nucleosome, +2 to +3 nucleosome, etc. These distances are appended as additional columns for each interval. :: + + chr start stop id alignment strand +1-to-+2 +2-to-+3 ... + +Each interval will have a different number of columns based on the number of nucleosome calls that were in that interval. + +----- + .. class:: warningmark -This tool requires a set of nucleosome calls as input. +Because the distances are calculated from the 5' end of each gene, as you move into the gene body it becomes more likely that a nucleosome call will be skipped, resulting in an outlier distance (~320bp rather than ~165bp). In addition, nucleosome calls in fuzzy regions tend to be inaccurate, so the NRL distances will not be robust. + +.. class:: warningmark + +This tool requires a set of nucleosome calls as input in the format created by the Nucleosomes -> Call Nucleosomes tool. + +.. class:: warningmark + +Intervals must be provided in Bed, BedGraph, or GFF format. BedGraph intervals are always considered to be on the + strand. + </help> </tool>
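The per-interval distance calculation described in the NRLCalculator help can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the toolkit's code: the function name `adjacent_dyad_distances` is hypothetical, and walking the dyads from the highest coordinate on the - strand is an assumption about how the 5' end is chosen.

```python
def adjacent_dyad_distances(dyads, strand="+"):
    """Return distances between neighboring dyads, ordered from the 5' end.

    Assumption: on the - strand the 5' end is the highest coordinate,
    so the dyads are walked in descending order.
    """
    ordered = sorted(dyads, reverse=(strand == "-"))
    # +1-to-+2, +2-to-+3, ... as absolute base-pair distances
    return [abs(b - a) for a, b in zip(ordered, ordered[1:])]
```

For example, dyads at 100, 265 and 585 on the + strand give one typical spacing and one outlier of the kind the warning describes, where a nucleosome call between 265 and 585 was probably skipped.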
--- a/galaxy-conf/PercusDecomposition.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/PercusDecomposition.xml Wed Apr 25 16:53:48 2012 -0400 @@ -10,8 +10,16 @@ </outputs> <help> -.. class:: warningmark + +This tool derives an external potential energy function from experimental nucleosome positioning data by assuming that nucleosomes interact with DNA like a fluid of hard rods. This energy function can then be used to derive sequence-specific nucleosome formation preferences, while accounting for hard-core steric restriction by adjacent nucleosomes. This tool is a reimplementation of the algorithm described in (Locke et al. 2010). + +----- -See Locke G, Tolkunov D, Moqtaderi Z, Struhl K and Morozov AV (2010) High-throughput sequencing reveals a simple model of nucleosome energetics. Proceedings of the National Academy of Sciences 107: 20998–21003 and Percus JK (1976) Equilibrium state of a classical fluid of hard rods in an external field. J Stat Phys 15: 505–511 for derivation. +**Citations** + +Locke G, Tolkunov D, Moqtaderi Z, Struhl K and Morozov AV (2010) High-throughput sequencing reveals a simple model of nucleosome energetics. Proceedings of the National Academy of Sciences 107: 20998–21003 + +Percus JK (1976) Equilibrium state of a classical fluid of hard rods in an external field. J Stat Phys 15: 505–511 + </help> </tool>
--- a/galaxy-conf/Phasogram.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/Phasogram.xml Wed Apr 25 16:53:48 2012 -0400 @@ -10,8 +10,18 @@ </outputs> <help> - .. class:: warningmark - - This tool requires mapped dyads in BigWig format. + +This tool calculates the phase distribution of sequencing data. It can be used to identify genome-wide periodicities. Phase counts are aggregated for each base pair across the genome. The tool is a reimplementation of the algorithm described in (Valouev et al. 2011). + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + +----- + +**Citation** + +Valouev A, Johnson SM, Boyd SD, Smith CL, Fire AZ and Sidow A (2011) Determinants of nucleosome organization in primary human cells. Nature 474: 516–520 + </help> </tool>
--- a/galaxy-conf/PowerSpectrum.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/PowerSpectrum.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,17 +1,59 @@ <tool id="PowerSpectrum" name="Compute the power spectrum" version="1.0.0"> - <description>on data in a Wiggle file</description> - <command interpreter="sh">galaxyToolRunner.sh ngs.PowerSpectrum -i $input -l $windows -o $output</command> + <description>of data in a Wiggle file</description> + <command interpreter="sh">galaxyToolRunner.sh ngs.PowerSpectrum -i $input -l $windows -m $max -o $output</command> <inputs> <param format="bigwig,wig" name="input" type="data" label="Input data" /> <param format="bed,bedgraph,gff" name="windows" type="data" label="List of intervals" /> + <param name="max" type="integer" value="40" label="Number of frequencies to output" /> </inputs> <outputs> <data format="tabular" name="output" /> </outputs> <help> + +This tool computes the power spectrum of intervals of sequencing data. For each interval provided, the normalized power spectrum is calculated, representing the relative power in each frequency. Power spectra are normalized to have total power 1, with the DC component (0 frequency) removed. Power spectra are computed using the FFT_ implementation in JTransforms_. + +.. _FFT: http://en.wikipedia.org/wiki/Fast_Fourier_transform + +.. _JTransforms: http://sites.google.com/site/piotrwendykier/software/jtransforms + +----- + +**Syntax** + +- **Input data** is the genomic data on which to compute the power spectrum. +- **List of intervals:** The power spectrum will be computed for each genomic interval specified in this list. +- **Number of frequencies:** The power spectrum will be truncated at this frequency in the output + +----- + +**Output** + +The output has the following format :: + + chr start stop id alignment strand freq1 freq2 ... + +up to the maximum frequency specified. Frequencies are truncated to reduce the size of the output since signals are often band-limited. 
+ +----- + .. class:: warningmark -This tool requires Wiggle/BigWig input data. +**NOTE:** Even though frequencies may be truncated in the output, all frequencies in the power spectrum are computed and used for normalization. + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. Intervals must be provided in Bed, BedGraph, or GFF format. + +----- + +This tool is equivalent to the following Matlab commands, where x is a vector with the interval of sequencing data :: + + N = length(x); + f = fft(x); + p = abs(f(2:N/2)).^2; + p = p / sum(p); + </help> </tool>
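For readers without Matlab, the commands in the PowerSpectrum help can be mirrored in plain Python. This is a sketch only: it uses a naive O(N^2) discrete Fourier transform from the standard library instead of the JTransforms FFT that the tool uses, and the function name `normalized_power_spectrum` and its `max_freqs` argument are hypothetical stand-ins for the tool's options.

```python
import cmath


def normalized_power_spectrum(x, max_freqs=None):
    """Mirror of: f = fft(x); p = abs(f(2:N/2)).^2; p = p / sum(p).

    A naive DFT is used for clarity; it is far slower than an FFT.
    """
    n = len(x)
    power = []
    # 0-based frequencies 1 .. n/2 - 1 correspond to Matlab's f(2:N/2);
    # frequency 0 (the DC component) is skipped.
    for k in range(1, n // 2):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        return power  # no power outside DC; nothing to normalize
    normalized = [p / total for p in power]
    # Truncate only after normalizing, so all frequencies contribute
    # to the total power, as the NOTE in the help text states.
    return normalized if max_freqs is None else normalized[:max_freqs]
```

Truncation is applied last, so a truncated spectrum generally sums to less than 1.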
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/PredictFAIRESignal.xml Wed Apr 25 16:53:48 2012 -0400 @@ -0,0 +1,42 @@ +<tool id="PredictFAIRE" name="Predict FAIRE signal" version="1.0.0"> + <description>from nucleosome occupancy</description> + <command interpreter="sh">galaxyToolRunner.sh nucleosomes.PredictFAIRESignal -i $input -s $sonication -c $crosslinking -x $extend -o $output</command> + <inputs> + <param format="bigwig,wig" name="input" type="data" label="Nucleosome occupancy data" /> + <param format="tabular" name="sonication" type="data" label="Sonication fragment length distribution" /> + <param name="crosslinking" type="float" value="1.0" label="Crosslinking coefficient" /> + <param name="extend" type="integer" value="250" label="In silico read extension (bp)" /> + </inputs> + <outputs> + <data format="wig" name="output" metadata_source="input" /> + </outputs> + + <help> + +This tool attempts to predict FAIRE signal from nucleosome occupancy by calculating the probability that a random sonicated fragment is occupied anywhere by a nucleosome. + +----- + +**Syntax** + +- **Nucleosome occupancy data** should be fragment coverage data from an MNase-seq experiment +- **Sonication fragment length distribution:** The relative proportion of each size of fragment produced by sonication +- **Crosslinking coefficient** is the efficiency of crosslinking (what fraction of the time is a nucleosome crosslinked) +- **In silico read extension** is the length that single-end reads should be extended to match FAIRE-seq data + +----- + +Sonication fragment distribution must be provided in the following tabular format :: + + length proportion + +So for example :: + + 1 0.1 + 2 0.2 + 3 0.3 + 4 0.2 + 5 0.2 + + </help> +</tool>
--- a/galaxy-conf/RollingReadLength.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/RollingReadLength.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,5 +1,5 @@ -<tool id="RollingReadLength" name="Compute read length" version="1.0.0"> - <description>from paired-end sequencing reads</description> +<tool id="RollingReadLength" name="Compute mean fragment length" version="1.0.0"> + <description>over each locus</description> <command interpreter="sh">galaxyToolRunner.sh ngs.RollingReadLength -i $input -a ${chromInfo} -o $output</command> <inputs> <param format="sam,bam,bed,bedgraph" name="input" type="data" label="Mapped reads" /> @@ -9,8 +9,12 @@ </outputs> <help> + +This tool will compute the mean length of all fragments overlapping a given locus, and can be used to identify sites with exceptionally long or short reads. + .. class:: warningmark -This tool requires paired-end SAM, BAM, or Bed formatted data. Using single-end data will result in a constant read length. +This tool requires paired-end SAM, BAM, Bed, or BedGraph formatted data. Using single-end data will result in a constant read length. + </help> </tool>
--- a/galaxy-conf/RomanNumeralize.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/RomanNumeralize.xml Wed Apr 25 16:53:48 2012 -0400 @@ -15,8 +15,8 @@ </tests> <help> -.. class:: warningmark + +This tool scans any file with chromosomal coordinates of the form "chr5" and replaces them with "chrV". -This tool will work for any genomic data with chromosomal coordinates of the form "chr5" by replacing them with "chrV" </help> </tool>
--- a/galaxy-conf/Scale.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/Scale.xml Wed Apr 25 16:53:48 2012 -0400 @@ -48,5 +48,12 @@ </tests> <help> + +This tool will multiply all values in a Wig file by a scalar. For example, this can be used to normalize to read depth by multiplying by 1/(# reads). By default, the tool will scale to 1/(mean value), which is equivalent to dividing by coverage and multiplying by the size of the genome. The resulting output file should have mean 1. + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + </help> </tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/Shift.xml Wed Apr 25 16:53:48 2012 -0400 @@ -0,0 +1,20 @@ +<tool id="WigShift" name="Mean shift" version="1.0.0"> + <description>a (Big)Wig file</description> + <command interpreter="sh">galaxyToolRunner.sh wigmath.Shift -i $input -m $M -o $output</command> + <inputs> + <param format="bigwig,wig" name="input" type="data" label="Shift the data in" /> + <param name="M" type="float" value="0" label="To have mean" /> + </inputs> + <outputs> + <data format="wig" name="output" metadata_source="input" /> + </outputs> + <help> + +This tool will shift all values in a Wig file by a scalar so that the output has the desired mean. + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + + </help> +</tool>
--- a/galaxy-conf/StripMatrix.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/StripMatrix.xml Wed Apr 25 16:53:48 2012 -0400 @@ -15,8 +15,23 @@ </tests> <help> - .. class:: warningmark - This tool is intended to strip the column/row headers off of an aligned matrix (in matrix2png format) for easy import into Matlab if only data values are required. +This tool is intended to strip the column/row headers off of an aligned matrix (in matrix2png format) for easy import into Matlab or other software where only the data values are required. It removes the first row and first column from a tabular file. + +----- + +**Example** + +If the following tabular matrix is used as input :: + + ID col1 col2 col3 + row1 2 4 5 + row2 5 1 1 + +then the following tabular matrix will be produced as output :: + + 2 4 5 + 5 1 1 + </help> </tool>
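The StripMatrix operation (remove the first row and the first column of a tabular file) can be sketched in a few lines of Python. The function name `strip_matrix` is hypothetical, and the sketch assumes tab-delimited text with no quoting.

```python
def strip_matrix(text):
    """Drop the header row and the first (row label) column of tab-delimited text."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    # rows[1:] skips the column headers; fields[1:] skips the row label
    return "\n".join("\t".join(fields[1:]) for fields in rows[1:])
```

Applied to the example matrix in the help text, this returns the two data rows without the `ID`/`col*` header or the `row*` labels.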
--- a/galaxy-conf/Subtract.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/Subtract.xml Wed Apr 25 16:53:48 2012 -0400 @@ -36,5 +36,12 @@ </tests> <help> + +This tool will subtract the values in one Wig file from another, base pair by base pair. + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + </help> </tool>
--- a/galaxy-conf/ValueDistribution.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/ValueDistribution.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,6 +1,16 @@ <tool id="ValueDistribution" name="Compute the value distribution" version="1.0.0"> <description>of a (Big)Wig file</description> - <command interpreter="sh">galaxyToolRunner.sh wigmath.ValueDistribution -i $input --max $max --min $min -n $bins -o $output</command> + <command interpreter="sh">galaxyToolRunner.sh wigmath.ValueDistribution -i $input + #if str( $min ) != '' + --min $min + #end if + + #if str( $max ) != '' + --max $max + #end if + + -n $bins -o $output + </command> <inputs> <param format="bigwig,wig" name="input" type="data" label="(Big)Wig file" /> <param name="min" type="float" optional="true" label="Minimum bin value (optional)" /> @@ -12,5 +22,32 @@ </outputs> <help> + +This tool computes a histogram of the values in a Wig file, as well as the moments of the distribution. + +----- + +**Syntax** + +- **Input data** is the genomic data used to compute the histogram. +- **Minimum bin value** is the smallest bin. If unset, it is equal to the minimum value in the input data. +- **Maximum bin value** is the largest bin. If unset, it is equal to the maximum value in the input data. +- **Number of bins** is the number of bins to use. The bin size will be equal to (max - min) / (# bins). + +----- + +**Output** + +The output is in 2-column tabular format, where the first column represents the lower edge of a bin interval and the second column represents the number of values that fell in that bin. For example, if the **minimum bin value** is 0, the **maximum bin value** is 0.3, and the **number of bins** is 3, then the following output might be produced :: + + bin count + <0 3 + 0 1 + 0.1 10 + 0.2 4 + >0.3 12 + +where there were 3 values in (-inf, 0), 1 value in [0, 0.1), 10 values in [0.1, 0.2), 4 values in [0.2, 0.3), and 12 values in [0.3, inf). + </help> </tool>
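The binning rule in the ValueDistribution help (half-open bins of equal width, plus one bin below the minimum and one at or above the maximum) can be sketched in Python as follows. The function name `value_distribution` and its return shape are hypothetical; the sketch does not compute the moments that the tool also reports.

```python
def value_distribution(values, lo, hi, num_bins):
    """Count values in (-inf, lo), num_bins half-open bins on [lo, hi), and [hi, inf).

    Returns (below, counts, above), where counts[i] covers
    [lo + i * width, lo + (i + 1) * width).
    """
    width = (hi - lo) / num_bins  # bin size = (max - min) / (# bins)
    below = above = 0
    counts = [0] * num_bins
    for v in values:
        if v < lo:
            below += 1
        elif v >= hi:
            above += 1
        else:
            # min() guards against floating-point rounding at the top edge
            idx = min(int((v - lo) / width), num_bins - 1)
            counts[idx] += 1
    return below, counts, above
```

Values that fall exactly on an interior bin edge are sensitive to floating-point rounding, so the sketch should not be relied on for exact edge behavior.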
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/WigCorrelate.xml Wed Apr 25 16:53:48 2012 -0400 @@ -0,0 +1,48 @@ +<tool id="WigCorrelate" name="Correlate" version="1.0.0"> + <description>multiple (Big)Wig files</description> + <command interpreter="sh"> + galaxyToolRunner.sh wigmath.WigCorrelate -w $window -t $type -o $output + #for $input in $inputs + ${input.file} + #end for + </command> + <inputs> + <repeat name="inputs" title="(Big)Wig file"> + <param name="file" type="data" format="bigwig,wig" /> + </repeat> + <param name="window" type="integer" value="100" label="Window size (bp)" /> + <param name="type" type="select" label="Correlation metric"> + <option value="pearson">Pearson</option> + <option value="spearman">Spearman</option> + </param> + </inputs> + <outputs> + <data format="tabular" name="output" /> + </outputs> + +<help> + +This tool will compute a correlation matrix between the supplied Wig or BigWig files. Each row/column in the matrix is added in the order that files are added above, starting from the top left. Each Wig file is downsampled into non-overlapping windows with the specified size by computing the mean value in each window. These windows are then correlated using either Pearson_'s Product-Moment correlation coefficient or Spearman_'s rank correlation coefficient. If the window size is set to 1, the correlation is calculated between all base pairs in the genome. + +.. _Pearson: http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient + +.. _Spearman: http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient + +----- + +.. class:: warningmark + +**WARN:** In order to calculate the correlation coefficient, the data is loaded entirely into memory. For large genomes, this may require a lot of RAM unless correspondingly larger window sizes are used. + +----- + +**Citation** + +This tool was inspired by ACT_ from the Gerstein lab. + +.. 
_ACT: http://act.gersteinlab.org + +J Jee*, J Rozowsky*, KY Yip*, L Lochovsky, R Bjornson, G Zhong, Z Zhang, Y Fu, J Wang, Z Weng, M Gerstein. ACT: Aggregation and Correlation Toolbox for Analyses of Genome Tracks. (2011) Bioinformatics 27(8): 1152-4. + +</help> +</tool>
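The downsample-then-correlate procedure described in the WigCorrelate help can be sketched in Python as follows. Only the Pearson option is shown. The function names are hypothetical, each track is assumed to be a plain list of per-base-pair values of equal length, and averaging a short final window as-is is an assumption about how the tool treats the end of a chromosome.

```python
import math


def window_means(values, window):
    """Mean of each non-overlapping window of the given size.

    Assumption: a final window shorter than `window` is averaged as-is.
    """
    means = []
    for i in range(0, len(values), window):
        chunk = values[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # Undefined (division by zero) if either track is constant.
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(tracks, window):
    """Rows and columns follow the order in which the tracks are supplied."""
    down = [window_means(t, window) for t in tracks]
    return [[pearson(x, y) for y in down] for x in down]
```

Larger windows shrink each downsampled list proportionally, which is why the warning suggests increasing the window size for large genomes.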
--- a/galaxy-conf/WigSummary.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/WigSummary.xml Wed Apr 25 16:53:48 2012 -0400 @@ -1,5 +1,5 @@ -<tool id="WigStats" name="Compute basic statistics" version="1.0.0"> - <description>on a (Big)Wig file</description> +<tool id="WigStats" name="Output a summary" version="1.0.0"> + <description>of a (Big)Wig file</description> <command interpreter="sh">galaxyToolRunner.sh wigmath.WigSummary -i $input -o $output</command> <inputs> <param format="bigwig,wig" name="input" type="data" label="(Big)Wig file" /> @@ -9,5 +9,61 @@ </outputs> <help> + +This tool will output a summary of a Wig or BigWig file, including information about the chromosomes and types of contigs in the Wig file, as well as basic descriptive statistics. + +----- + +**Example:** + +The following is an example of the output of this tool :: + + ASCII Text Wiggle file: track type=wiggle_0 + Chromosomes: + 2micron start=1 stop=6318 + chrVI start=1 stop=270148 + chrI start=1 stop=230208 + chrIII start=1 stop=316617 + chrXII start=1 stop=1078175 + chrXV start=1 stop=1091289 + chrXVI start=1 stop=948062 + chrII start=1 stop=813178 + chrVIII start=1 stop=562643 + chrX start=1 stop=745742 + chrXIII start=1 stop=924429 + chrV start=1 stop=576869 + chrXIV start=1 stop=784333 + chrIV start=1 stop=1531919 + chrXI start=1 stop=666454 + chrIX start=1 stop=439885 + chrM start=1 stop=85779 + chrVII start=1 stop=1090947 + Contigs: + fixedStep chrom=2micron start=1 span=1 step=1 + fixedStep chrom=chrVI start=1 span=1 step=1 + fixedStep chrom=chrI start=1 span=1 step=1 + fixedStep chrom=chrIII start=1 span=1 step=1 + fixedStep chrom=chrXII start=1 span=1 step=1 + fixedStep chrom=chrXVI start=1 span=1 step=1 + fixedStep chrom=chrXV start=1 span=1 step=1 + fixedStep chrom=chrII start=1 span=1 step=1 + fixedStep chrom=chrVIII start=1 span=1 step=1 + fixedStep chrom=chrXIII start=1 span=1 step=1 + fixedStep chrom=chrX start=1 span=1 step=1 + fixedStep chrom=chrV start=1 span=1 
step=1 + fixedStep chrom=chrXIV start=1 span=1 step=1 + fixedStep chrom=chrIV start=1 span=1 step=1 + fixedStep chrom=chrXI start=1 span=1 step=1 + fixedStep chrom=chrIX start=1 span=1 step=1 + fixedStep chrom=chrM start=1 span=1 step=1 + fixedStep chrom=chrVII start=1 span=1 step=1 + Basic Statistics: + Mean: 1.000000164913575 + Standard Deviation: 1.8843731523620193 + Total: 1.2162997005843896E7 + Bases Covered: 12162995 + Min value: 0.0 + Max value: 277.98996 + </help> </tool>
--- a/galaxy-conf/ZScore.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/ZScore.xml Wed Apr 25 16:53:48 2012 -0400 @@ -39,4 +39,18 @@ <output name="output" file="zscorer.output3"/> </test>--> </tests> + + <help> + +This tool will compute normal scores (Z-scores) for each of the values in a Wig file. For each base pair, the Z-scored value is equal to the deviation from the mean divided by the standard deviation (i.e. the number of standard deviations a value is away from the mean). The output file should have mean 0 and standard deviation 1. + +.. class:: infomark + +This tool is equivalent to using the **Mean Shift** tool to shift a Wig file to mean 0, then using the **Scale** tool to scale by 1/(standard deviation). + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. + + </help> </tool>
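The equivalence stated in the ZScore help (shift to mean 0, then scale by 1/(standard deviation)) can be sketched in Python as follows. The function name `zscore` is hypothetical, and the use of the population standard deviation is an assumption; the help text does not say whether the tool uses the population or the sample formula.

```python
import statistics


def zscore(values):
    """Z-score a list of values as a mean shift followed by a scale."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)          # assumption: population SD
    shifted = [v - mean for v in values]    # "Mean Shift" to mean 0
    return [v / sd for v in shifted]        # "Scale" by 1/(standard deviation)
```

A constant input has standard deviation 0, so this sketch would raise `ZeroDivisionError` for it.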
--- a/galaxy-conf/matrix2png.xml Mon Apr 09 11:50:23 2012 -0400 +++ b/galaxy-conf/matrix2png.xml Wed Apr 25 16:53:48 2012 -0400 @@ -95,16 +95,20 @@ </outputs> <help> - .. class:: warningmark - This tool requires that matrix2png be available in Galaxy's PATH. - - .. class:: warningmark - - This tool requires tabular data with column AND row headers. For more information about the required format and usage instructions, see http://bioinformatics.ubc.ca/matrix2png/dataformat.html - - .. class:: warningmark - - It is recommended to specify the colorspace range. +.. class:: warningmark + +This tool requires that matrix2png be installed and available in Galaxy's PATH. + +.. class:: warningmark + +This tool requires tabular data with column AND row headers. For more information about the required format and usage instructions, see the matrix2png_ website. + +.. _matrix2png: http://bioinformatics.ubc.ca/matrix2png/dataformat.html + +.. class:: warningmark + +It is recommended to specify the colorspace range since outliers will often skew it otherwise. + </help> </tool>
