# HG changeset patch
# User timpalpant
# Date 1337438205 14400
# Node ID eb53be9a09f4d538cdb5a3c684998396d5c9024c
# Parent 81d5b81fb3c289f1cff9f441384aadcf83ed6344
Uploaded

diff -r 81d5b81fb3c2 -r eb53be9a09f4 dist/java-genomics-toolkit.jar
Binary file dist/java-genomics-toolkit.jar has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._BaseAlignCounts.xml
Binary file galaxy-conf/._BaseAlignCounts.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._DNAPropertyCalculator.xml
Binary file galaxy-conf/._DNAPropertyCalculator.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._GeneTrackToBedGraph.xml
Binary file galaxy-conf/._GeneTrackToBedGraph.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._GeneTrackToWig.xml
Binary file galaxy-conf/._GeneTrackToWig.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._IntervalAverager.xml
Binary file galaxy-conf/._IntervalAverager.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._LogTransform.xml
Binary file galaxy-conf/._LogTransform.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._Phasogram.xml
Binary file galaxy-conf/._Phasogram.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._ReadLengthDistributionMatrix.xml
Binary file galaxy-conf/._ReadLengthDistributionMatrix.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._StripMatrix.xml
Binary file galaxy-conf/._StripMatrix.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._WigCorrelate.xml
Binary file galaxy-conf/._WigCorrelate.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._ZScore.xml
Binary file galaxy-conf/._ZScore.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._Zinba.xml
Binary file galaxy-conf/._Zinba.xml has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/._galaxyToolRunner.sh
Binary file galaxy-conf/._galaxyToolRunner.sh has changed
diff -r 81d5b81fb3c2 -r eb53be9a09f4
galaxy-conf/BaseAlignCounts.xml --- a/galaxy-conf/BaseAlignCounts.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/BaseAlignCounts.xml Sat May 19 10:36:45 2012 -0400 @@ -3,12 +3,13 @@ galaxyToolRunner.sh ngs.BaseAlignCounts -i $input -a ${chromInfo} -x $X -o $output - + - + + This tool produces a new Wig file with the number of reads/intervals overlapping each base pair. Reads can be artificially extended to match known fragment lengths. If you wish to count the number of reads starting at each base pair, set the read extension to 1. If you wish to count the number of intervals overlapping each base pair, set the extension to -1. @@ -19,6 +20,10 @@ This tool requires sequencing reads in SAM, BAM, Bed, or BedGraph format. If you are artificially extending reads, ensure that the strand is set correctly in SAM, BAM, and Bed files. +.. class:: warningmark + +Paired-end reads are considered to be the entire fragment (the distance from the 5' end of mate 1 to the 5' end of mate 2) if the extension is set to -1. + .. class:: infomark If you would like to convert valued interval data (e.g. BedGraph files from microarrays) to Wig format, use the Converters -> Interval to Wig converter. diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/DNAPropertyCalculator.xml --- a/galaxy-conf/DNAPropertyCalculator.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/DNAPropertyCalculator.xml Sat May 19 10:36:45 2012 -0400 @@ -81,6 +81,21 @@ + + + + + + + + + + + + + + + This tool will create a new Wig file with genome-wide calculations of sequence-specific DNA properties determined from local n-nucleotide sequences. DNA properties are calculated using AJT_. 
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/Downsample.xml --- a/galaxy-conf/Downsample.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/Downsample.xml Sat May 19 10:36:45 2012 -0400 @@ -6,8 +6,10 @@ - - + + + + @@ -16,13 +18,23 @@ +This tool can be used to reduce the resolution and file size of Wig files for easier upload to UCSC. Data is downsampled in non-overlapping windows starting from the beginning of each chromosome. Each window can be downsampled as the mean, minimum, maximum, total, or coverage of the original data. + +----- + +**Downsampling Methods** + +- **Mean:** the arithmetic mean of the values in the original data window +- **Minimum:** the least value in the original data window +- **Maximum:** the greatest value in the original data window +- **Coverage:** the fraction of bases with values in the original window +- **Total:** the sum of all values in the original data window + +----- + .. class:: infomark **TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use "edit attributes" to set the correct format if it was not detected correctly. ------ - -This tool can be used to reduce the resolution and file size of Wig files for easier upload to UCSC. Data is downsampled in non-overlapping moving windows starting from the beginning of each chromosome. Each window can be downsampled as the arithmetic mean, minimum, or maximum value of the original data. - diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/FastqIlluminaToSanger.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/FastqIlluminaToSanger.xml Sat May 19 10:36:45 2012 -0400 @@ -0,0 +1,25 @@ + + from Illumina to Sanger + galaxyToolRunner.sh converters.FastqIlluminaToSanger -i $input -o $output + + + + + + + + + + + + + + +This tool will convert a FASTQ file with ASCII quality scores encoded in Illumina 1.3 format (Phred+64) to Sanger format (Phred+33). It is a simpler, faster version of the FASTQ Groomer. + +.. 
class:: warningmark + +This tool requires fastqillumina formatted data. If you have tabular data that was not correctly autodetected, change the metadata by clicking on the pencil icon for the dataset. + + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/FindNMers.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/FindNMers.xml Sat May 19 10:36:45 2012 -0400 @@ -0,0 +1,43 @@ + + in a DNA sequence + galaxyToolRunner.sh dna.FindNMers -i + #if $refGenomeSource.genomeSource == "history": + $refGenomeSource.ownFile + #else + ${refGenomeSource.index.fields.path} + #end if + -m $mismatches -n $nmer $rc -o $output + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +This tool will find all matches of a given NMer in a DNA sequence. + + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/FindOutlierRegions.xml --- a/galaxy-conf/FindOutlierRegions.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/FindOutlierRegions.xml Sat May 19 10:36:45 2012 -0400 @@ -1,10 +1,11 @@ such as CNVs - galaxyToolRunner.sh ngs.FindOutlierRegions -i $input -w $window -t $threshold -o $output + galaxyToolRunner.sh ngs.FindOutlierRegions -i $input -w $window -t $threshold $below -o $output + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/GeneTrackToBedGraph.xml --- a/galaxy-conf/GeneTrackToBedGraph.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/GeneTrackToBedGraph.xml Sat May 19 10:36:45 2012 -0400 @@ -2,12 +2,17 @@ converter galaxyToolRunner.sh converters.GeneTrackToBedGraph -i $input -o $output - + - + + + + + + This tool will sum the counts from the forward and reverse strands in a GeneTrack_ index to create a BedGraph file. 
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/GeneTrackToWig.xml --- a/galaxy-conf/GeneTrackToWig.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/GeneTrackToWig.xml Sat May 19 10:36:45 2012 -0400 @@ -2,14 +2,29 @@ converter galaxyToolRunner.sh converters.GeneTrackToWig -i $input -s $shift $zero -a ${chromInfo} -o $output - + - + + + + + + + + + + + + + + + + This tool will convert GeneTrack_ format files into Wig files, optionally offsetting the + and - strand counts by a specified value before merging them. diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/InterpolateDiscontinuousData.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/InterpolateDiscontinuousData.xml Sat May 19 10:36:45 2012 -0400 @@ -0,0 +1,40 @@ + + missing values in a (Big)Wig file + galaxyToolRunner.sh converters.InterpolateDiscontinousData -i $input -t $type -m $max -o $output + + + + + + + + + + + + + + + +This tool will attempt to interpolate missing values (NaN) in a Wig file that result when converting discontinuous microarray probe data to Wig format. Stretches of missing data that extend longer than the allowed maximum will be left as NaN. + +----- + +**Interpolation types** + +- **Nearest** uses the value of the nearest base pair that has data +- **Linear** uses a linear interpolant between the values of the nearest two probes +- **Cubic** uses a cubic interpolant between the values of the nearest two probes + +For more information, see Wikipedia_. + +.. _Wikipedia: http://en.wikipedia.org/wiki/Interpolation + +----- + +.. class:: infomark + +**TIP:** If your dataset does not appear in the pulldown menu, it means that it is not in Wig or BigWig format. Use the Converters -> IntervalToWig tool to convert Bed, BedGraph, or GFF-formatted microarray data to Wig format, then use this tool to interpolate the missing values between probes. 
+ + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/IntervalAverager.xml --- a/galaxy-conf/IntervalAverager.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/IntervalAverager.xml Sat May 19 10:36:45 2012 -0400 @@ -1,8 +1,16 @@ that have been aligned - galaxyToolRunner.sh visualization.IntervalAverager -i $input -l $loci -o $output + + galaxyToolRunner.sh visualization.IntervalAverager -l $loci -o $output $file1 + #for $input in $inputs + ${input.file} + #end for + - + + + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/IntervalLengthDistribution.xml --- a/galaxy-conf/IntervalLengthDistribution.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/IntervalLengthDistribution.xml Sat May 19 10:36:45 2012 -0400 @@ -1,8 +1,9 @@ of read lengths - galaxyToolRunner.sh ngs.IntervalLengthDistribution -i $input -o $output + galaxyToolRunner.sh ngs.IntervalLengthDistribution -i $input $freq -o $output + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/IntervalToWig.xml --- a/galaxy-conf/IntervalToWig.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/IntervalToWig.xml Sat May 19 10:36:45 2012 -0400 @@ -2,7 +2,7 @@ converter galaxyToolRunner.sh converters.IntervalToWig -i $input $zero -a ${chromInfo} -o $output - + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/LogTransform.xml --- a/galaxy-conf/LogTransform.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/LogTransform.xml Sat May 19 10:36:45 2012 -0400 @@ -9,42 +9,36 @@ - + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/Phasogram.xml --- a/galaxy-conf/Phasogram.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/Phasogram.xml Sat May 19 10:36:45 2012 -0400 @@ -11,7 +11,7 @@ -This tool calculates the phase distribution of sequencing data. It can be used to identify genome-wide periodicities. Phase counts are aggregated for each base pair across the genome. The tool is a reimplementation of the algorithm described in (Valouev et al. 2011). +This tool calculates the phase distribution of sequencing data. 
It can be used to identify genome-wide periodicities. Phase counts are aggregated for each base pair across the genome. This is equivalent to summing the autocovariance of a sliding window across the genome. The tool is a reimplementation of the algorithm described in (Valouev et al. 2011). .. class:: infomark diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/ReadLengthDistributionMatrix.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/ReadLengthDistributionMatrix.xml Sat May 19 10:36:45 2012 -0400 @@ -0,0 +1,63 @@ + + across a genomic interval + galaxyToolRunner.sh ngs.ReadLengthDistributionMatrix -i $input --chr $chr --start $start --stop $stop --min $min --max $max --bin $bin -o $output + + + + + + + + + + + + + + + +This tool will create a matrix (in matrix2png_ format) with the distribution of read lengths over each base pair. Reads are binned by genomic location and length to create a matrix where each column represents the distribution of read lengths over that base pair. The resulting matrix can be turned into a heatmap using the Visualization -> Make heatmap with matrix2png tool. + +.. _matrix2png: http://bioinformatics.ubc.ca/matrix2png/dataformat.html + +.. class:: warningmark + +This tool requires paired-end SAM, BAM, Bed, or BedGraph formatted data. Using single-end data will result in a constant read length. + +----- + +**Syntax** + +- **Mapped reads** are the mapped paired-end reads used to make the histograms +- **Chromosome** a locus in the genome +- **Start base pair** a locus in the genome +- **Stop base pair** a locus in the genome +- **Minimum fragment length** is the lowest fragment length bin. Reads shorter than this will be ignored. +- **Maximum fragment length** is the highest fragment length bin. Reads longer than this will be ignored. 
+- **Fragment length bin size** is the bin size used when making the fragment length histograms + +----- + +**Example** + +Make a matrix with the read length distribution across the region chrI:5001-6000, looking at reads 100-200bp in length in bins of 1bp: + +- **Chromosome:** chrI +- **Start:** 5001 +- **Stop:** 6000 +- **Minimum fragment length:** 100 +- **Maximum fragment length:** 200 +- **Fragment length bin size:** 1 + +The resulting matrix will be 1000x101, with each column representing a base pair and each row representing a read length. The column headers give the base pair and the row headers give the read length. + +----- + +**Citation** + +This tool was inspired by the analysis and figures in + +Floer M, Wang X, Prabhu V, Berrozpe G, Narayan S, Spagna D, Alvarez D, Kendall J, Krasnitz A, Stepansky A, Hicks J, Bryant GO and Ptashne M (2010) A RSC/nucleosome complex determines chromatin architecture and facilitates activator binding. Cell 141: 407–418 + + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/StripMatrix.xml --- a/galaxy-conf/StripMatrix.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/StripMatrix.xml Sat May 19 10:36:45 2012 -0400 @@ -9,8 +9,8 @@ - - + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/WaveletTransform.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/WaveletTransform.xml Sat May 19 10:36:45 2012 -0400 @@ -0,0 +1,21 @@ + + across a genomic interval + galaxyToolRunner.sh ngs.WaveletTransform -i $input -w $wavelet --chr $chr --start $start --stop $stop --min $min --max $max --step $N -o $output + + + + + + + + + + + + + + + + + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/WigCorrelate.xml --- a/galaxy-conf/WigCorrelate.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/WigCorrelate.xml Sat May 19 10:36:45 2012 -0400 @@ -1,7 +1,7 @@ multiple (Big)Wig files - galaxyToolRunner.sh wigmath.WigCorrelate -w $window -t $type -o $output + galaxyToolRunner.sh wigmath.WigCorrelate -w $window -s $step -t 
$type -o $output #for $input in $inputs ${input.file} #end for @@ -11,6 +11,7 @@ + @@ -22,7 +23,7 @@ -This tool will compute a correlation matrix between the supplied Wig or BigWig files. Each row/column in the matrix is added in the order that files are added above, starting from the top left. The Wig file is downsampled into non-overlapping windows with the specified size by computing the mean value in each window. These windows are then correlated using either Pearson_'s Product-Moment correlation coefficient or Spearman_'s rank correlation coefficient. If the window size is set to 1, the correlation is calculated between all base pairs in the genome. +This tool will compute a correlation matrix between the supplied Wig or BigWig files. Each row/column in the matrix is added in the order that files are added above, starting from the top left. The Wig file is downsampled into sliding windows with the specified bin size and shift by computing the mean value in each window. These windows are then correlated using either Pearson_'s Product-Moment correlation coefficient or Spearman_'s rank correlation coefficient. If the window size is set to 1, the correlation is calculated between all base pairs in the genome. .. _Pearson: http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient @@ -30,6 +31,15 @@ ----- +**Syntax** + +- **Inputs** are the genomic data to correlate +- **Window size** is the size of the window to bin data into +- **Sliding step size** is the shift step size of the sliding window used during binning +- **Correlation metric** is the type of correlation to calculate + +----- + .. class:: warningmark **WARN:** In order to calculate the correlation coefficient, the data is loaded entirely into memory. For large genomes, this may require a lot of RAM unless comparably larger window sizes are used. 
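Reviewer note: the sliding-window binning described in the WigCorrelate help text above (window size `-w`, step size `-s`, mean per window) can be illustrated with a small shell sketch. The function name `window_means` is hypothetical, the input is assumed to be one numeric value per line for a single chromosome, and trailing partial windows are simply dropped; none of this is taken from the toolkit's Java implementation, which may treat edges and missing values differently.

```shell
# Hypothetical helper, not part of the toolkit.
# Reads one numeric value per line on stdin and prints the mean of each window.
# $1 = window size, $2 = step size (both assumed to be positive integers).
window_means() {
  awk -v w="$1" -v s="$2" '
    { v[NR] = $1 }                      # buffer all values (mirrors the in-memory warning)
    END {
      # Slide the window by s positions; stop before a partial window.
      for (start = 1; start + w - 1 <= NR; start += s) {
        sum = 0
        for (i = start; i < start + w; i++) sum += v[i]
        print sum / w
      }
    }'
}
```

With a step equal to the window size the windows do not overlap, which corresponds to the behavior described on the removed line of the help text; a smaller step produces overlapping windows.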
diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/WigSummary.xml --- a/galaxy-conf/WigSummary.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/WigSummary.xml Sat May 19 10:36:45 2012 -0400 @@ -7,6 +7,20 @@ + + + + + + + + + + + + + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/ZScore.xml --- a/galaxy-conf/ZScore.xml Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/ZScore.xml Sat May 19 10:36:45 2012 -0400 @@ -8,36 +8,30 @@ - + + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/Zinba.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/galaxy-conf/Zinba.xml Sat May 19 10:36:45 2012 -0400 @@ -0,0 +1,16 @@ + + with ZINBA + runZINBA.sh -i $input -o $output + + + + + + + + + + + + + diff -r 81d5b81fb3c2 -r eb53be9a09f4 galaxy-conf/galaxyToolRunner.sh --- a/galaxy-conf/galaxyToolRunner.sh Wed Apr 25 16:53:48 2012 -0400 +++ b/galaxy-conf/galaxyToolRunner.sh Sat May 19 10:36:45 2012 -0400 @@ -6,9 +6,12 @@ exit; fi -if [ "$1" = "list" ] -then - find src/edu/unc/genomics/**/*.java -exec basename -s .java {} \; +# Verify that the user has Java 7 installed +# Otherwise there will be an obscure UnsupportedClassVersion error +version=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}') +if [[ "$version" < "1.7" ]]; then + echo "Need Java 7 or greater. You have Java $version installed." + exit fi DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" diff -r 81d5b81fb3c2 -r eb53be9a09f4 lib/java-genomics-io.jar Binary file lib/java-genomics-io.jar has changed diff -r 81d5b81fb3c2 -r eb53be9a09f4 lib/picard-1.67.jar Binary file lib/picard-1.67.jar has changed diff -r 81d5b81fb3c2 -r eb53be9a09f4 lib/sam-1.56.jar Binary file lib/sam-1.56.jar has changed diff -r 81d5b81fb3c2 -r eb53be9a09f4 lib/sam-1.67.jar Binary file lib/sam-1.67.jar has changed
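
Reviewer note on the Java check added to galaxyToolRunner.sh: the test `[[ "$version" < "1.7" ]]` compares strings rather than numbers, and the bare `exit` returns the status of the preceding `echo`, so a failed check still exits with status 0. A possible alternative, offered as a sketch and not as the committed code, is to reduce the version string to a major release number and compare integers. The function names `java_major_version` and `require_java` are hypothetical, and the sketch assumes the version string has the form `1.N...` (legacy scheme) or `N...` (newer scheme).

```shell
# Hypothetical helper: reduce a Java version string to its major release number.
# Assumes "1.7.0_80"-style (legacy) or "11.0.2"-style (newer) strings.
java_major_version() {
  local version="$1"
  local first="${version%%.*}"          # text before the first dot
  if [ "$first" = "1" ]; then
    local rest="${version#1.}"          # legacy scheme: the major number follows "1."
    first="${rest%%.*}"
  fi
  echo "${first%%[!0-9]*}"              # drop suffixes such as "-ea" or "_80"
}

# Hypothetical wrapper: return non-zero when the installed Java is older than $1.
require_java() {
  local minimum="$1" version major
  version=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')
  major=$(java_major_version "$version")
  if [ -z "$major" ] || [ "$major" -lt "$minimum" ]; then
    echo "Need Java $minimum or greater. You have Java ${version:-none} installed." >&2
    return 1
  fi
}
```

In the runner script this could be used as `require_java 7 || exit 1`, so that Galaxy sees a failing exit status when the requirement is not met.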