Mercurial > repos > iuc > gatk2
changeset 4:f244b8209eb8 draft
bug fix release
author | iuc |
---|---|
date | Mon, 25 Aug 2014 17:43:11 -0400 |
parents | 2553f84b8174 |
children | 84584664264c |
files | base_recalibrator.xml depth_of_coverage.xml gatk2_annotations.txt.sample gatk2_macros.xml gatk2_wrapper.py haplotype_caller.xml indel_realigner.xml print_reads.xml readme.rst realigner_target_creator.xml reduce_reads.xml tool_dependencies.xml unified_genotyper.xml variant_annotator.xml variant_apply_recalibration.xml variant_combine.xml variant_eval.xml variant_filtration.xml variant_recalibrator.xml variant_select.xml variant_validate.xml |
diffstat | 21 files changed, 169 insertions(+), 157 deletions(-) |
--- a/base_recalibrator.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/base_recalibrator.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_base_recalibrator" name="Base Recalibrator" version="0.0.7">
+<tool id="gatk2_base_recalibrator" name="Base Recalibrator" version="@VERSION@.0">
   <description>calculates covariates used to recalibrate base quality scores of reads</description>
   <expand macro="requirements" />
   <macros>
@@ -302,4 +302,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/depth_of_coverage.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/depth_of_coverage.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_depth_of_coverage" name="Depth of Coverage" version="0.0.7">
+<tool id="gatk2_depth_of_coverage" name="Depth of Coverage" version="@VERSION@.0">
   <description>on BAM files</description>
   <expand macro="requirements" />
   <macros>
@@ -692,7 +692,7 @@
 DepthOfCoverage processes a set of bam files to determine coverage at different levels of partitioning and aggregation. Coverage can be analyzed per locus, per interval, per gene, or in total; can be partitioned by sample, by read group, by technology, by center, or by library; and can be summarized by mean, median, quartiles, and/or percentage of bases covered to or beyond a threshold. Additionally, reads and bases can be filtered by mapping or base quality score.
-For more information on the GATK Depth of Coverage, see this `tool specific page <http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_annotator_DepthOfCoverage.html>`_.
+For more information on the GATK Depth of Coverage, see this `tool specific page <http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_coverage_DepthOfCoverage.html>`_.
 To learn about best practices for variant detection using GATK, see this `overview <http://www.broadinstitute.org/gatk/guide/topic?name=best-practices>`_.
@@ -738,4 +738,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/gatk2_annotations.txt.sample Wed Feb 19 04:39:38 2014 -0500
+++ b/gatk2_annotations.txt.sample Mon Aug 25 17:43:11 2014 -0400
@@ -1,30 +1,26 @@
 #unique_id name gatk_value tools_valid_for
-AlleleBalance AlleleBalance AlleleBalance UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-AlleleBalanceBySample AlleleBalanceBySample AlleleBalanceBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-BaseCounts BaseCounts BaseCounts UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-BaseQualityRankSumTest BaseQualityRankSumTest BaseQualityRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-ChromosomeCounts ChromosomeCounts ChromosomeCounts UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-DepthOfCoverage DepthOfCoverage DepthOfCoverage UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-DepthPerAlleleBySample DepthPerAlleleBySample DepthPerAlleleBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-FisherStrand FisherStrand FisherStrand UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-GCContent GCContent GCContent UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-HaplotypeScore HaplotypeScore HaplotypeScore UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-HardyWeinberg HardyWeinberg HardyWeinberg UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-HomopolymerRun HomopolymerRun HomopolymerRun UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-InbreedingCoeff InbreedingCoeff InbreedingCoeff UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-IndelType IndelType IndelType UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-LowMQ LowMQ LowMQ UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-MVLikelihoodRatio MVLikelihoodRatio MVLikelihoodRatio UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-MappingQualityRankSumTest MappingQualityRankSumTest MappingQualityRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-MappingQualityZero MappingQualityZero MappingQualityZero UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-MappingQualityZeroBySample MappingQualityZeroBySample MappingQualityZeroBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-MappingQualityZeroFraction MappingQualityZeroFraction MappingQualityZeroFraction UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-NBaseCount NBaseCount NBaseCount UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-QualByDepth QualByDepth QualByDepth UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-RMSMappingQuality RMSMappingQuality RMSMappingQuality UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-ReadDepthAndAllelicFractionBySample ReadDepthAndAllelicFractionBySample ReadDepthAndAllelicFractionBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-ReadPosRankSumTest ReadPosRankSumTest ReadPosRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-SampleList SampleList SampleList UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-SnpEff SnpEff SnpEff VariantAnnotator,VariantRecalibrator
-SpanningDeletions SpanningDeletions SpanningDeletions UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
-TechnologyComposition TechnologyComposition TechnologyComposition UnifiedGenotyper,VariantAnnotator,VariantRecalibrator
+AlleleBalance AlleleBalance AlleleBalance UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+AlleleBalanceBySample AlleleBalanceBySample AlleleBalanceBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+BaseCounts BaseCounts BaseCounts UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+BaseQualityRankSumTest BaseQualityRankSumTest BaseQualityRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+ChromosomeCounts ChromosomeCounts ChromosomeCounts UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+Coverage Coverage Coverage UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+DepthPerAlleleBySample DepthPerAlleleBySample DepthPerAlleleBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+FisherStrand FisherStrand FisherStrand UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+GCContent GCContent GCContent UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+HaplotypeScore HaplotypeScore HaplotypeScore UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+HardyWeinberg HardyWeinberg HardyWeinberg UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+HomopolymerRun HomopolymerRun HomopolymerRun UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+InbreedingCoeff InbreedingCoeff InbreedingCoeff UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+LowMQ LowMQ LowMQ UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+MVLikelihoodRatio MVLikelihoodRatio MVLikelihoodRatio VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+MappingQualityRankSumTest MappingQualityRankSumTest MappingQualityRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+MappingQualityZero MappingQualityZero MappingQualityZero UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+MappingQualityZeroBySample MappingQualityZeroBySample MappingQualityZeroBySample UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+NBaseCount NBaseCount NBaseCount UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+QualByDepth QualByDepth QualByDepth UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+RMSMappingQuality RMSMappingQuality RMSMappingQuality UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+ReadPosRankSumTest ReadPosRankSumTest ReadPosRankSumTest UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+SampleList SampleList SampleList UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+SnpEff SnpEff SnpEff VariantAnnotator,VariantRecalibrator,HaplotypeCaller
+SpanningDeletions SpanningDeletions SpanningDeletions UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller
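Note on the `tools_valid_for` column: the tool XML files select rows from this table by splitting the column on commas and keeping rows that name the tool (the `multiple_splitter` and `static_value` filters in haplotype_caller.xml). The sketch below imitates that selection in Python 3 to show why adding `HaplotypeCaller` to each row matters. It assumes the real file is tab-separated, and `annotations_for_tool` is a hypothetical helper written for illustration, not part of this repository or of Galaxy.

```python
def annotations_for_tool(table_text, tool_name):
    """Return the unique_id of every row whose tools_valid_for lists tool_name."""
    selected = []
    for line in table_text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#unique_id ...' header/comment line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")  # assumption: columns are tab-separated
        if len(fields) != 4:
            continue
        unique_id, _name, _gatk_value, tools_valid_for = fields
        # Same idea as multiple_splitter (split on ',') plus static_value.
        if tool_name in tools_valid_for.split(","):
            selected.append(unique_id)
    return selected


# Three rows copied from the new version of the sample file.
SAMPLE = "\n".join([
    "#unique_id\tname\tgatk_value\ttools_valid_for",
    "FisherStrand\tFisherStrand\tFisherStrand\t"
    "UnifiedGenotyper,VariantAnnotator,VariantRecalibrator,HaplotypeCaller",
    "MVLikelihoodRatio\tMVLikelihoodRatio\tMVLikelihoodRatio\t"
    "VariantAnnotator,VariantRecalibrator,HaplotypeCaller",
    "SnpEff\tSnpEff\tSnpEff\tVariantAnnotator,VariantRecalibrator,HaplotypeCaller",
])

print(annotations_for_tool(SAMPLE, "HaplotypeCaller"))
print(annotations_for_tool(SAMPLE, "UnifiedGenotyper"))
```

With the old rows, which never listed `HaplotypeCaller`, a filter on that value would have matched nothing.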
--- a/gatk2_macros.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/gatk2_macros.xml Mon Aug 25 17:43:11 2014 -0400
@@ -3,13 +3,16 @@
     <requirements>
       <requirement type="package">gatk2</requirement>
       <requirement type="package" version="0.1.19">samtools</requirement>
+      <requirement type="package" version="1.56.0">picard</requirement>
       <requirement type="set_environment">GATK2_PATH</requirement>
       <requirement type="set_environment">GATK2_SITE_OPTIONS</requirement>
+      <yield />
     </requirements>
   </xml>
   <token name="@THREADS@">
     --num_threads \${GALAXY_SLOTS:-4}
   </token>
+  <token name="@VERSION@">2.8</token>
   <token name="@JAR_PATH@">
     java -jar "\$GATK2_PATH/GenomeAnalysisTK.jar"
   </token>
@@ -54,7 +57,7 @@
         #end for
         -p '--interval_set_rule "${gatk_param_type.interval_set_rule}"'
-
+        -p '--interval_padding "${gatk_param_type.interval_padding}"'
         -p '--downsampling_type "${gatk_param_type.downsampling_type.downsampling_type_selector}"'
         #if str( $gatk_param_type.downsampling_type.downsampling_type_selector ) != "NONE":
             -p '--${gatk_param_type.downsampling_type.downsample_to_type.downsample_to_type_selector} "${gatk_param_type.downsampling_type.downsample_to_type.downsample_to_value}"'
@@ -217,7 +220,9 @@
           <option value="UNION" selected="True">UNION</option>
           <option value="INTERSECTION">INTERSECTION</option>
         </param>
-
+        <param name="interval_padding" type="integer" value="0" min="0" label="Amount of padding (in bp) to add to each interval"
+            help="This is typically used to add padding around exons when analyzing exomes. (--interval_padding / -ip)"/>
+
         <conditional name="downsampling_type">
           <param name="downsampling_type_selector" type="select" label="Type of reads downsampling to employ at a given locus" help="-dt,--downsampling_type &lt;downsampling_type&gt;">
             <option value="NONE" selected="True">NONE</option>
@@ -295,7 +300,7 @@
         <param name="fix_misencoded_quality_scores" type="boolean" truevalue="--fix_misencoded_quality_scores" falsevalue="" label="Fix mis-encoded base quality scores. Q0 == ASCII 33 according to the SAM specification, whereas Illumina encoding starts at Q64. The idea here is simple: we just iterate over all reads and subtract 31 from every quality score." checked="False" help="-fixMisencodedQuals / --fix_misencoded_quality_scores"/>
       </when>
-    </conditional>
+    </conditional>
   </xml>
   <xml name="analysis_type_conditional">
     <conditional name="analysis_param_type">
@@ -341,4 +346,11 @@
 If you use this tool in Galaxy, please cite Blankenberg D, et al. *In preparation.*
   </token>
+  <xml name="citations">
+    <citations>
+      <citation type="doi">10.1038/ng.806</citation>
+      <citation type="doi">10.1101/gr.107524.110</citation>
+      <citation type="doi">10.1002/0471250953.bi1110s43</citation>
+    </citations>
+  </xml>
 </macros>
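Note on the `@VERSION@` token: each tool now declares `version="@VERSION@.0"`, and the token is defined once in gatk2_macros.xml as `2.8`, which matches the `v2.8.0` entry added to readme.rst. The Python 3 sketch below shows the substitution as plain string replacement. It is a simplified illustration, not Galaxy's macro loader, and `expand_tokens` is a hypothetical name.

```python
def expand_tokens(text, tokens):
    """Replace each token name (for example "@VERSION@") in text with its value."""
    for name, value in tokens.items():
        text = text.replace(name, value)
    return text


# Token value taken from the <token name="@VERSION@"> line in the diff above.
tokens = {"@VERSION@": "2.8"}
tool_tag = '<tool id="gatk2_print_reads" name="Print Reads" version="@VERSION@.0">'
expanded = expand_tokens(tool_tag, tokens)
print(expanded)
```

Keeping the version in one token means a later GATK update only has to touch the macros file rather than all tool wrappers.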
--- a/gatk2_wrapper.py Wed Feb 19 04:39:38 2014 -0500
+++ b/gatk2_wrapper.py Mon Aug 25 17:43:11 2014 -0400
@@ -7,7 +7,6 @@
 import sys, optparse, os, tempfile, subprocess, shutil
 from binascii import unhexlify
-from string import Template
 GALAXY_EXT_TO_GATK_EXT = { 'gatk_interval':'intervals', 'bam_index':'bam.bai', 'gatk_dbsnp':'dbSNP', 'picard_interval_list':'interval_list' } #items not listed here will use the galaxy extension as-is
 GALAXY_EXT_TO_GATK_FILE_TYPE = GALAXY_EXT_TO_GATK_EXT #for now, these are the same, but could be different if needed
@@ -19,6 +18,7 @@
     if tmp_dir and os.path.exists( tmp_dir ):
         shutil.rmtree( tmp_dir )
+
 def gatk_filename_from_galaxy( galaxy_filename, galaxy_ext, target_dir = None, prefix = None ):
     suffix = GALAXY_EXT_TO_GATK_EXT.get( galaxy_ext, galaxy_ext )
     if prefix is None:
@@ -29,36 +29,39 @@
     os.symlink( galaxy_filename, gatk_filename )
     return gatk_filename
+
 def gatk_filetype_argument_substitution( argument, galaxy_ext ):
     return argument % dict( file_type = GALAXY_EXT_TO_GATK_FILE_TYPE.get( galaxy_ext, galaxy_ext ) )
+
 def open_file_from_option( filename, mode = 'rb' ):
     if filename:
         return open( filename, mode = mode )
     return None
+
 def html_report_from_directory( html_out, dir ):
     html_out.write( '<html>\n<head>\n<title>Galaxy - GATK Output</title>\n</head>\n<body>\n<p/>\n<ul>\n' )
     for fname in sorted( os.listdir( dir ) ):
         html_out.write( '<li><a href="%s">%s</a></li>\n' % ( fname, fname ) )
     html_out.write( '</ul>\n</body>\n</html>\n' )
-def index_bam_files( bam_filenames, tmp_dir ):
+
+def index_bam_files( bam_filenames ):
     for bam_filename in bam_filenames:
         bam_index_filename = "%s.bai" % bam_filename
         if not os.path.exists( bam_index_filename ):
             #need to index this bam file
             stderr_name = tempfile.NamedTemporaryFile( prefix = "bam_index_stderr" ).name
             command = 'samtools index %s %s' % ( bam_filename, bam_index_filename )
-            proc = subprocess.Popen( args=command, shell=True, stderr=open( stderr_name, 'wb' ) )
-            return_code = proc.wait()
-            if return_code:
+            try:
+                subprocess.check_call( args=command, shell=True, stderr=open( stderr_name, 'wb' ) )
+            except:
                 for line in open( stderr_name ):
                     print >> sys.stderr, line
-                os.unlink( stderr_name ) #clean up
-                cleanup_before_exit( tmp_dir )
                 raise Exception( "Error indexing BAM file" )
-            os.unlink( stderr_name ) #clean up
+            finally:
+                os.unlink( stderr_name )
 def __main__():
     #Parse Command Line
@@ -74,8 +77,7 @@
     parser.add_option( '-e', '--phone_home', dest='phone_home', action='store', type="string", default='STANDARD', help='What kind of GATK run report should we generate(NO_ET|STANDARD|STDOUT)' )
     parser.add_option( '-K', '--gatk_key', dest='gatk_key', action='store', type="string", default=None, help='What kind of GATK run report should we generate(NO_ET|STANDARD|STDOUT)' )
     (options, args) = parser.parse_args()
-
-    tmp_dir = tempfile.mkdtemp( prefix='tmp-gatk-' )
+
     if options.pass_through_options:
         cmd = ' '.join( options.pass_through_options )
     else:
@@ -87,42 +89,50 @@
     elif options.max_jvm_heap_fraction is not None:
         cmd = cmd.replace( 'java ', 'java -XX:DefaultMaxRAMFraction=%s -XX:+UseParallelGC ' % ( options.max_jvm_heap_fraction ), 1 )
     bam_filenames = []
-    if options.datasets:
-        for ( dataset_arg, filename, galaxy_ext, prefix ) in options.datasets:
-            gatk_filename = gatk_filename_from_galaxy( filename, galaxy_ext, target_dir = tmp_dir, prefix = prefix )
-            if dataset_arg:
-                cmd = '%s %s "%s"' % ( cmd, gatk_filetype_argument_substitution( dataset_arg, galaxy_ext ), gatk_filename )
-            if galaxy_ext == "bam":
-                bam_filenames.append( gatk_filename )
-    index_bam_files( bam_filenames, tmp_dir )
-    #set up stdout and stderr output options
-    stdout = open_file_from_option( options.stdout, mode = 'wb' )
-    stderr = open_file_from_option( options.stderr, mode = 'wb' )
-    #if no stderr file is specified, we'll use our own
-    if stderr is None:
-        stderr = tempfile.NamedTemporaryFile( prefix="gatk-stderr-", dir=tmp_dir )
-
-    proc = subprocess.Popen( args=cmd, stdout=stdout, stderr=stderr, shell=True, cwd=tmp_dir )
-    return_code = proc.wait()
-
-    if return_code:
-        stderr_target = sys.stderr
-    else:
-        stderr_target = sys.stdout
-    stderr.flush()
-    stderr.seek(0)
-    while True:
-        chunk = stderr.read( CHUNK_SIZE )
-        if chunk:
-            stderr_target.write( chunk )
+    tmp_dir = tempfile.mkdtemp( prefix='tmp-gatk-' )
+    try:
+        if options.datasets:
+            for ( dataset_arg, filename, galaxy_ext, prefix ) in options.datasets:
+                gatk_filename = gatk_filename_from_galaxy( filename, galaxy_ext, target_dir = tmp_dir, prefix = prefix )
+                if dataset_arg:
+                    cmd = '%s %s "%s"' % ( cmd, gatk_filetype_argument_substitution( dataset_arg, galaxy_ext ), gatk_filename )
+                if galaxy_ext == "bam":
+                    bam_filenames.append( gatk_filename )
+                if galaxy_ext == 'fasta':
+                    subprocess.check_call( 'samtools faidx "%s"' % gatk_filename, shell=True )
+                    subprocess.check_call( 'java -jar %s R=%s O=%s QUIET=true' % ( os.path.join(os.environ['JAVA_JAR_PATH'], 'CreateSequenceDictionary.jar'), gatk_filename, os.path.splitext(gatk_filename)[0] + '.dict' ), shell=True )
+        index_bam_files( bam_filenames )
+        #set up stdout and stderr output options
+        stdout = open_file_from_option( options.stdout, mode = 'wb' )
+        stderr = open_file_from_option( options.stderr, mode = 'wb' )
+        #if no stderr file is specified, we'll use our own
+        if stderr is None:
+            stderr = tempfile.NamedTemporaryFile( prefix="gatk-stderr-", dir=tmp_dir )
+
+        proc = subprocess.Popen( args=cmd, stdout=stdout, stderr=stderr, shell=True, cwd=tmp_dir )
+        return_code = proc.wait()
+
+        if return_code:
+            stderr_target = sys.stderr
         else:
-            break
-    stderr.close()
+            stderr_target = sys.stdout
+        stderr.flush()
+        stderr.seek(0)
+        while True:
+            chunk = stderr.read( CHUNK_SIZE )
+            if chunk:
+                stderr_target.write( chunk )
+            else:
+                break
+        stderr.close()
+    finally:
+        cleanup_before_exit( tmp_dir )
+
     #generate html reports
     if options.html_report_from_directory:
         for ( html_filename, html_dir ) in options.html_report_from_directory:
             html_report_from_directory( open( html_filename, 'wb' ), html_dir )
-
-    cleanup_before_exit( tmp_dir )
+
-if __name__=="__main__": __main__()
+if __name__ == "__main__":
+    __main__()
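Note on the wrapper change: the temporary directory is now created immediately before a `try` block and removed in `finally`, so it is cleaned up whether the GATK command succeeds, returns an error code, or raises. The sketch below isolates that pattern. It is written for Python 3 (the wrapper itself is Python 2), and `run_in_temp_dir` is a hypothetical helper used only for illustration.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_in_temp_dir(command):
    """Run command in a fresh temporary directory and always remove the directory."""
    tmp_dir = tempfile.mkdtemp(prefix="tmp-gatk-")
    try:
        # subprocess.call returns the exit status instead of raising on failure.
        return_code = subprocess.call(command, cwd=tmp_dir)
    finally:
        # Runs on success, on non-zero exit, and when call() itself raises.
        if os.path.exists(tmp_dir):
            shutil.rmtree(tmp_dir)
    return return_code, tmp_dir


# A child process that exits with status 3 stands in for a failing GATK run.
status, used_dir = run_in_temp_dir([sys.executable, "-c", "raise SystemExit(3)"])
print(status, os.path.exists(used_dir))
```

The same reasoning applies to `index_bam_files` in the diff, where the stderr file is now unlinked in a `finally` clause instead of on two separate code paths.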
--- a/haplotype_caller.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/haplotype_caller.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_haplotype_caller" name="Haplotype Caller" version="0.0.7">
+<tool id="gatk2_haplotype_caller" name="Haplotype Caller" version="@VERSION@.0">
   <description>Call SNPs and indels simultaneously via local de-novo assembly of haplotypes in an active region</description>
   <expand macro="requirements" />
   <macros>
@@ -158,7 +158,7 @@
           <!-- load the available annotations from an external configuration file, since additional ones can be added to local installs -->
           <options from_data_table="gatk2_annotations">
             <filter type="multiple_splitter" column="tools_valid_for" separator=","/>
-            <filter type="static_value" value="UnifiedGenotyper" column="tools_valid_for"/>
+            <filter type="static_value" value="HaplotypeCaller" column="tools_valid_for"/>
           </options>
         </param>
         <repeat name="additional_annotations" title="Additional annotation" help="-A,--annotation &lt;annotation&gt;">
@@ -191,7 +191,7 @@
           <!-- load the available annotations from an external configuration file, since additional ones can be added to local installs -->
           <options from_data_table="gatk2_annotations">
             <filter type="multiple_splitter" column="tools_valid_for" separator=","/>
-            <filter type="static_value" value="UnifiedGenotyper" column="tools_valid_for"/>
+            <filter type="static_value" value="HaplotypeCaller" column="tools_valid_for"/>
           </options>
         </param>
@@ -320,4 +320,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/indel_realigner.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/indel_realigner.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_indel_realigner" name="Indel Realigner" version="0.0.7">
+<tool id="gatk2_indel_realigner" name="Indel Realigner" version="@VERSION@.0">
   <description>- perform local realignment</description>
   <expand macro="requirements" />
   <macros>
@@ -206,4 +206,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/print_reads.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/print_reads.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_print_reads" name="Print Reads" version="0.0.7">
+<tool id="gatk2_print_reads" name="Print Reads" version="@VERSION@.0">
   <description>on BAM files</description>
   <expand macro="requirements" />
   <macros>
@@ -32,11 +32,6 @@
     #include source=$standard_gatk_options#
-    #if str( $reference_source.reference_source_selector ) == "history":
-        -d "-R" "${reference_source.ref_file}" "${reference_source.ref_file.ext}" "gatk_input"
-    #end if
-    ##end standard gatk options
-
     ##start analysis specific options
     #if $analysis_param_type.analysis_param_type_selector == "advanced":
         -p '
@@ -202,7 +197,7 @@
 This walker is designed to work as the second pass in a two-pass processing step, doing a by-read traversal. For each base in each read this walker calculates various user-specified covariates (such as read group, reported quality score, cycle, and dinuc) Using these values as a key in a large hashmap the walker calculates an empirical base quality score and overwrites the quality score currently in the read. This walker then outputs a new bam file with these updated (recalibrated) reads. Note: This walker expects as input the recalibration table file generated previously by CovariateCounterWalker. Note: This walker is designed to be used in conjunction with CovariateCounterWalker.
-For more information on base quality score recalibration using the GATK, see this `tool specific page <http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_PrintReads.html>`_.
+For more information on base quality score recalibration using the GATK, see this `tool specific page <http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_readutils_PrintReads.html>`_.
 To learn about best practices for variant detection using GATK, see this `overview <http://www.broadinstitute.org/gatk/guide/topic?name=best-practices>`_.
@@ -247,4 +242,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/readme.rst Wed Feb 19 04:39:38 2014 -0500
+++ b/readme.rst Mon Aug 25 17:43:11 2014 -0400
@@ -63,7 +63,8 @@
 History
 =======
-v0.1 - Initial public release
+* v0.1 - Initial public release
+* v2.8.0 - Bugfix release, increase version number to reflect the underlying GATK version
 Licence (MIT)
--- a/realigner_target_creator.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/realigner_target_creator.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_realigner_target_creator" name="Realigner Target Creator" version="0.0.7">
+<tool id="gatk2_realigner_target_creator" name="Realigner Target Creator" version="@VERSION@.0">
   <description>for use in local realignment</description>
   <expand macro="requirements" />
   <macros>
@@ -164,4 +164,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/reduce_reads.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/reduce_reads.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_reduce_reads" name="Reduce Reads" version="0.0.7">
+<tool id="gatk2_reduce_reads" name="Reduce Reads" version="@VERSION@.0">
   <description>in BAM files</description>
   <expand macro="requirements" />
   <macros>
@@ -154,7 +154,7 @@
 This walker will generated reduced versions of the BAM files that still follow the BAM spec and contain all the information necessary for the GSA variant calling pipeline. Some options allow you to tune in how much compression you want to achieve. The default values have been shown to reduce a typical whole exome BAM file 100x. The higher the coverage, the bigger the savings in file size and performance of the downstream tools.
-For more information on using read based compression in the GATK, see this `tool specific page <http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_compression_reducereads_ReduceReads.html>`_.
+.. For more information on using read based compression in the GATK, see this `tool specific page <http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_compression_reducereads_ReduceReads.html>`_.
 To learn about best practices for variant detection using GATK, see this `overview <http://www.broadinstitute.org/gatk/guide/topic?name=best-practices>`_.
@@ -223,4 +223,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/tool_dependencies.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/tool_dependencies.xml Mon Aug 25 17:43:11 2014 -0400
@@ -15,6 +15,12 @@
     </set_environment>
     <package name="samtools" version="0.1.19">
-        <repository changeset_revision="00e17a794a2e" name="package_samtools_0_1_19" owner="iuc" toolshed="http://toolshed.g2.bx.psu.edu" />
+        <repository changeset_revision="923adc89c666" name="package_samtools_0_1_19" owner="iuc" toolshed="https://toolshed.g2.bx.psu.edu" />
+    </package>
+    <package name="picard" version="1.56.0">
+        <repository changeset_revision="61e41d21cb6f" name="package_picard_1_56_0" owner="devteam" toolshed="https://toolshed.g2.bx.psu.edu" />
+    </package>
+    <package name="ggplot2" version="0.9.3">
+        <repository changeset_revision="87a27fc31fb5" name="package_r_ggplot2_0_9_3" owner="iuc" toolshed="https://toolshed.g2.bx.psu.edu" />
     </package>
 </tool_dependency>
--- a/unified_genotyper.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/unified_genotyper.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_unified_genotyper" name="Unified Genotyper" version="0.0.7">
+<tool id="gatk2_unified_genotyper" name="Unified Genotyper" version="@VERSION@.0">
   <description>SNP and indel caller</description>
   <expand macro="requirements" />
   <macros>
@@ -72,7 +72,6 @@
             --excludeAnnotation "${annotation}"
         #end for
     #end if
-    ${analysis_param_type.multiallelic}
     #if str( $analysis_param_type.sample_ploidy ) != '':
         --sample_ploidy "$analysis_param_type.sample_ploidy"
     #end if
@@ -199,7 +198,7 @@
             <filter type="static_value" value="UnifiedGenotyper" column="tools_valid_for"/>
           </options>
         </param>
-        <param name="multiallelic" type="boolean" truevalue="--multiallelic" falsevalue="" label="Allow the discovery of multiple alleles (SNPs only)" help="--multiallelic" />
+        <param name="sample_ploidy" type="integer" value="2" label="Ploidy (number of chromosomes) per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy)" help="-ploidy,--sample_ploidy" />
       </expand>
     </inputs>
     <outputs>
@@ -294,4 +293,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/variant_annotator.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/variant_annotator.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_variant_annotator" name="Variant Annotator" version="0.0.7">
+<tool id="gatk2_variant_annotator" name="Variant Annotator" version="@VERSION@.0">
   <description></description>
   <expand macro="requirements" />
   <macros>
@@ -244,4 +244,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/variant_apply_recalibration.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/variant_apply_recalibration.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_variant_apply_recalibration" name="Apply Variant Recalibration" version="0.0.7">
+<tool id="gatk2_variant_apply_recalibration" name="Apply Variant Recalibration" version="@VERSION@.0">
   <description></description>
   <expand macro="requirements" />
   <macros>
@@ -135,4 +135,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/variant_combine.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/variant_combine.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_variant_combine" name="Combine Variants" version="0.0.7">
+<tool id="gatk2_variant_combine" name="Combine Variants" version="@VERSION@.0">
   <description></description>
   <expand macro="requirements" />
   <macros>
@@ -167,4 +167,5 @@
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/variant_eval.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/variant_eval.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_variant_eval" name="Eval Variants" version="0.0.7">
+<tool id="gatk2_variant_eval" name="Eval Variants" version="@VERSION@.0">
   <description></description>
   <expand macro="requirements" />
   <macros>
@@ -80,10 +80,6 @@
         --minPhaseQuality "${analysis_param_type.min_phase_quality}"
-        #if str( $analysis_param_type.family ):
-            --family_structure "${analysis_param_type.family}"
-        #end if
-
         --mendelianViolationQualThreshold "${analysis_param_type.mendelian_violation_qual_threshold}"
         #if str( $analysis_param_type.ancestral_alignments ) != "None":
@@ -165,9 +161,8 @@
         </repeat>
         <param name="stratification_modules" type="select" multiple="True" display="checkboxes" label="Stratification modules to apply to the eval track(s)" help="-ST,--stratificationModule &lt;stratificationModule&gt;" >
-          <!-- do these need individual options also? gatk wiki has little info -->
+          <option value="AlleleCount" />
           <option value="AlleleFrequency" />
-          <option value="AlleleCount" />
           <option value="CompRod" />
           <option value="Contig" />
           <option value="CpG" />
@@ -175,9 +170,15 @@
           <option value="EvalRod" />
           <option value="Filter" />
           <option value="FunctionalClass" />
+          <option value="IndelSize" />
+          <option value="IntervalStratification" />
           <option value="JexlExpression" />
+          <option value="Novelty" />
+          <option value="OneBPIndel" />
           <option value="Sample" />
-          <option value="IntervalStratification" />
+          <option value="SnpEffPositionModifier" />
+          <option value="TandemRepeat" />
+          <option value="VariantType" />
         </param>
         <param name="do_not_use_all_standard_stratifications" checked="false" type="boolean" truevalue="--doNotUseAllStandardStratifications" falsevalue="" label="Do not use the standard stratification modules by default" help="-noST,--doNotUseAllStandardStratifications" />
@@ -186,29 +187,22 @@
         </repeat>
         <param name="eval_modules" type="select" multiple="True" display="checkboxes" label="Eval modules to apply to the eval track(s)" help="-EV,--evalModule &lt;evalModule&gt;" >
-          <!-- do these need individual options also? gatk wiki has little info -->
-          <option value="ACTransitionTable" />
-          <option value="AlleleFrequencyComparison" />
-          <option value="AminoAcidTransition" />
           <option value="CompOverlap" />
           <option value="CountVariants" />
-          <option value="GenotypeConcordance" />
-          <option value="GenotypePhasingEvaluator" />
-          <option value="IndelMetricsByAC" />
-          <option value="IndelStatistics" />
+          <option value="IndelLengthHistogram" />
+          <option value="IndelSummary" />
           <option value="MendelianViolationEvaluator" />
+          <option value="MultiallelicSummary" />
           <option value="PrintMissingComp" />
-          <option value="PrivatePermutations" />
-          <option value="SimpleMetricsByAC" />
           <option value="ThetaVariantEvaluator" />
           <option value="TiTvVariantEvaluator" />
-          <option value="VariantQualityScore" />
+          <option value="ValidationReport" />
+          <option value="VariantSummary" />
         </param>
         <param name="do_not_use_all_standard_modules" checked="false" type="boolean" truevalue="--doNotUseAllStandardModules" falsevalue="" label="Do not use the standard eval modules by default" help="-noEV,--doNotUseAllStandardModules" />
         <param name="num_samples" type="integer" label="Number of samples (used if no samples are available in the VCF file" value="0" help="-ns,--numSamples &lt;numSamples&gt;"/>
         <param name="min_phase_quality" type="float" label="Minimum phasing quality " value="10.0" help="-mpq,--minPhaseQuality &lt;minPhaseQuality&gt;"/>
-        <param name="family" type="text" value="" label="If provided, genotypes in will be examined for mendelian violations: this argument is a string formatted as dad+mom=child where these parameters determine which sample names are examined" help="--family_structure"/>
         <param name="mendelian_violation_qual_threshold" type="integer" label="Minimum genotype QUAL score for each trio member required to accept a site as a violation" value="50" help="-mvq,--mendelianViolationQualThreshold &lt;mendelianViolationQualThreshold&gt;"/>
         <param name="ancestral_alignments" type="data" format="fasta" optional="True" label="Fasta file with ancestral alleles" help="-aa,--ancestralAlignments &lt;ancestralAlignments&gt;" />
         <param name="known_cnvs" type="data" format="bed,gatk_interval,picard_interval_list" optional="True" label="File containing tribble-readable features describing a known list of copy number variants" help="-knownCNVs,--knownCNVs &lt;knownCNVs&gt;" />
@@ -280,10 +274,10 @@
  doNotUseAllStandardModules Do not use the standard modules by default (instead, only those that are specified with the -E option)
  numSamples Number of samples (used if no samples are available in the VCF file
  minPhaseQuality Minimum phasing quality
- family_structure If provided, genotypes in will be examined for mendelian violations: this argument is a string formatted as dad+mom=child where these parameters determine which sample names are examined
  mendelianViolationQualThreshold Minimum genotype QUAL score for each trio member required to accept a site as a violation
  ancestralAlignments Fasta file with ancestral alleles
 @CITATION_SECTION@
   </help>
+  <expand macro="citations" />
 </tool>
--- a/variant_filtration.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/variant_filtration.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_variant_filtration" name="Variant Filtration" version="0.0.7">
+<tool id="gatk2_variant_filtration" name="Variant Filtration" version="@VERSION@.0">
 <description>on VCF files</description>
 <expand macro="requirements" />
 <macros>
@@ -177,4 +177,5 @@
 @CITATION_SECTION@
 </help>
+ <expand macro="citations" />
 </tool>
--- a/variant_recalibrator.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/variant_recalibrator.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,6 +1,8 @@
-<tool id="gatk2_variant_recalibrator" name="Variant Recalibrator" version="0.0.7">
+<tool id="gatk2_variant_recalibrator" name="Variant Recalibrator" version="@VERSION@.0">
 <description></description>
- <expand macro="requirements" />
+ <expand macro="requirements">
+ <requirement type="package" version="0.9.3">ggplot</requirement>
+ </expand>
 <macros>
 <import>gatk2_macros.xml</import>
 </macros>
@@ -63,15 +65,12 @@
 --maxIterations "${analysis_param_type.max_iterations}"
 --numKMeans "${analysis_param_type.num_k_means}"
 --stdThreshold "${analysis_param_type.std_threshold}"
- --qualThreshold "${analysis_param_type.qual_threshold}"
 --shrinkage "${analysis_param_type.shrinkage}"
 --dirichlet "${analysis_param_type.dirichlet}"
 --priorCounts "${analysis_param_type.prior_counts}"
- #if str( $analysis_param_type.bad_variant_selector.bad_variant_selector_type ) == 'percent':
- --percentBadVariants "${analysis_param_type.bad_variant_selector.percent_bad_variants}"
- #else:
- --minNumBadVariants "${analysis_param_type.bad_variant_selector.min_num_bad_variants}"
- #end if
+
+ --minNumBadVariants "${analysis_param_type.min_num_bad_variants}"
+
 --target_titv "${analysis_param_type.target_titv}"
 #for $tranche in [ $tranche.strip() for $tranche in str( $analysis_param_type.ts_tranche ).split( ',' ) if $tranche.strip() ]
 --TStranche "${tranche}"
@@ -83,7 +82,6 @@
 #end if
 --ignore_filter "${ignore_filter_name}"
 #end for
- --ts_filter_level "${analysis_param_type.ts_filter_level}"
 '
 #end if
@@ -100,7 +98,7 @@
 <param name="input_variants" type="data" format="vcf" label="Variant file to recalibrate" />
 </repeat>
 <param name="ref_file" type="select" label="Using reference genome" help="-R,--reference_sequence &lt;reference_sequence&gt;">
- <options from_data_table="gatk_picard_indexes">
+ <options from_data_table="gatk2_picard_indexes">
 <!-- <filter type="data_meta" key="dbkey" ref="variants[0].input_variants" column="dbkey"/> -->
 </options>
 <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
@@ -114,7 +112,7 @@
 </when>
 </conditional>
- <repeat name="rod_bind" title="Binding for reference-ordered data" help="-resource,--resource &lt;resource&gt;">
+ <repeat name="rod_bind" title="Binding for reference-ordered data" help="-resource,--resource &lt;resource&gt;" min="2">
 <conditional name="rod_bind_type">
 <param name="rod_bind_type_selector" type="select" label="Binding Type">
 <option value="dbsnp" selected="True">dbSNP</option>
@@ -324,26 +322,16 @@
 <expand macro="gatk_param_type_conditional" />
 <expand macro="analysis_type_conditional">
- <param name="max_gaussians" type="integer" label="maximum number of Gaussians to try during variational Bayes Algorithm" value="10" help="-mG,--maxGaussians &lt;maxGaussians&gt;"/>
- <param name="max_iterations" type="integer" label="maximum number of maximum number of VBEM iterations to be performed in variational Bayes Algorithm" value="100" help="-mI,--maxIterations &lt;maxIterations&gt;"/>
- <param name="num_k_means" type="integer" label="number of k-means iterations to perform in order to initialize the means of the Gaussians in the Gaussian mixture model" value="30" help="-nKM,--numKMeans &lt;numKMeans&gt;"/>
- <param name="std_threshold" type="float" label="If a variant has annotations more than -std standard deviations away from mean then don't use it for building the Gaussian mixture model." value="8.0" help="-std,--stdThreshold &lt;stdThreshold&gt;"/>
- <param name="qual_threshold" type="float" label="If a known variant has raw QUAL value less than -qual then don't use it for building the Gaussian mixture model." value="80.0" help="-qual,--qualThreshold &lt;qualThreshold&gt;"/>
+ <param name="max_gaussians" type="integer" label="maximum number of Gaussians to try during variational Bayes Algorithm" value="8" help="-mG,--maxGaussians &lt;maxGaussians&gt;"/>
+ <param name="max_iterations" type="integer" label="maximum number of maximum number of VBEM iterations to be performed in variational Bayes Algorithm" value="150" help="-mI,--maxIterations &lt;maxIterations&gt;"/>
+ <param name="num_k_means" type="integer" label="number of k-means iterations to perform in order to initialize the means of the Gaussians in the Gaussian mixture model" value="100" help="-nKM,--numKMeans &lt;numKMeans&gt;"/>
+ <param name="std_threshold" type="float" label="If a variant has annotations more than -std standard deviations away from mean then don't use it for building the Gaussian mixture model." value="10.0" help="-std,--stdThreshold &lt;stdThreshold&gt;"/>
 <param name="shrinkage" type="float" label="shrinkage parameter in variational Bayes algorithm" value="1.0" help="-shrinkage,--shrinkage &lt;shrinkage&gt;"/>
 <param name="dirichlet" type="float" label="dirichlet parameter in variational Bayes algorithm" value="0.001" help="-dirichlet,--dirichlet &lt;dirichlet&gt;"/>
 <param name="prior_counts" type="float" label="number of prior counts to use in variational Bayes algorithm" value="20.0" help="-priorCounts,--priorCounts &lt;priorCounts&gt;"/>
- <conditional name="bad_variant_selector">
- <param name="bad_variant_selector_type" type="select" label="How to specify bad variants">
- <option value="percent" selected="True">Percent</option>
- <option value="min_num">Number</option>
- </param>
- <when value="percent">
- <param name="percent_bad_variants" type="float" label="percentage of the worst scoring variants to use when building the Gaussian mixture model of bad variants. 0.07 means bottom 7 percent." value="0.03" help="-percentBad,--percentBadVariants &lt;percentBadVariants&gt;"/>
- </when>
- <when value="min_num">
- <param name="min_num_bad_variants" type="integer" label="minimum amount of worst scoring variants to use when building the Gaussian mixture model of bad variants. Will override -percentBad arugment if necessary" value="2000" help="-minNumBad,--minNumBadVariants &lt;minNumBadVariants&gt;"/>
- </when>
- </conditional>
+ <!--<param name="trustAllPolymorphic" type="boolean" label="trustAllPolymorphic" truevalue="-/-trustAllPolymorphic=true" falsevalue="-/-trustAllPolymorphic=false"
+ help="Trust that all the input training sets' unfiltered records contain only polymorphic sites to drastically speed up the computation. -trustAllPolymorphic" />-->
+ <param name="min_num_bad_variants" type="integer" label="Minimum number of worst scoring variants to use when building the Gaussian mixture model of bad variants" value="1000" help="--minNumBadVariants &lt;minNumBadVariants&gt;"/>
 <param name="target_titv" type="float" label="expected novel Ti/Tv ratio to use when calculating FDR tranches and for display on optimization curve output figures. (approx 2.15 for whole genome experiments). ONLY USED FOR PLOTTING PURPOSES!" value="2.15" help="-titv,--target_titv &lt;target_titv&gt;"/>
 <param name="ts_tranche" type="text" label="levels of novel false discovery rate (FDR, implied by ti/tv) at which to slice the data. (in percent, that is 1.0 for 1 percent)" value="100.0, 99.9, 99.0, 90.0" help="-tranche,--TStranche &lt;TStranche&gt;"/>
 <repeat name="ignore_filters" title="Ignore Filter" help="-ignoreFilter,--ignore_filter &lt;ignore_filter&gt;">
@@ -360,7 +348,6 @@
 <when value="LowQual" />
 </conditional>
 </repeat>
- <param name="ts_filter_level" type="float" label="truth sensitivity level at which to start filtering, used here to indicate filtered variants in plots" value="99.0" help="-ts_filter_level,--ts_filter_level &lt;ts_filter_level&gt;"/>
 </expand>
 </inputs>
 <outputs>
@@ -410,12 +397,10 @@
 maxIterations The maximum number of VBEM iterations to be performed in variational Bayes algorithm. Procedure will normally end when convergence is detected.
 numKMeans The number of k-means iterations to perform in order to initialize the means of the Gaussians in the Gaussian mixture model.
 stdThreshold If a variant has annotations more than -std standard deviations away from mean then don't use it for building the Gaussian mixture model.
- qualThreshold If a known variant has raw QUAL value less than -qual then don't use it for building the Gaussian mixture model.
 shrinkage The shrinkage parameter in variational Bayes algorithm.
 dirichlet The dirichlet parameter in variational Bayes algorithm.
 priorCounts The number of prior counts to use in variational Bayes algorithm.
- percentBadVariants What percentage of the worst scoring variants to use when building the Gaussian mixture model of bad variants. 0.07 means bottom 7 percent.
- minNumBadVariants The minimum amount of worst scoring variants to use when building the Gaussian mixture model of bad variants. Will override -percentBad arugment if necessary.
+ minNumBadVariants The minimum amount of worst scoring variants to use when building the Gaussian mixture model of bad variants.
 recal_file The output recal file used by ApplyRecalibration
 target_titv The expected novel Ti/Tv ratio to use when calculating FDR tranches and for display on optimization curve output figures. (approx 2.15 for whole genome experiments). ONLY USED FOR PLOTTING PURPOSES!
 TStranche The levels of novel false discovery rate (FDR, implied by ti/tv) at which to slice the data. (in percent, that is 1.0 for 1 percent)
@@ -423,8 +408,8 @@
 path_to_Rscript The path to your implementation of Rscript. For Broad users this is maybe /broad/tools/apps/R-2.6.0/bin/Rscript
 rscript_file The output rscript file generated by the VQSR to aid in visualization of the input data and learned model
 path_to_resources Path to resources folder holding the Sting R scripts.
- ts_filter_level The truth sensitivity level at which to start filtering, used here to indicate filtered variants in plots
 @CITATION_SECTION@
 </help>
+ <expand macro="citations" />
 </tool>
--- a/variant_select.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/variant_select.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_variant_select" name="Select Variants" version="0.0.7">
+<tool id="gatk2_variant_select" name="Select Variants" version="@VERSION@.0">
 <description>from VCF files</description>
 <expand macro="requirements" />
 <macros>
@@ -285,4 +285,5 @@
 @CITATION_SECTION@
 </help>
+ <expand macro="citations" />
 </tool>
--- a/variant_validate.xml Wed Feb 19 04:39:38 2014 -0500
+++ b/variant_validate.xml Mon Aug 25 17:43:11 2014 -0400
@@ -1,4 +1,4 @@
-<tool id="gatk2_variant_validate" name="Validate Variants" version="0.0.7">
+<tool id="gatk2_variant_validate" name="Validate Variants" version="@VERSION@.0">
 <description></description>
 <expand macro="requirements" />
 <macros>
@@ -101,4 +101,5 @@
 @CITATION_SECTION@
 </help>
+ <expand macro="citations" />
 </tool>