pal_finder: pal_finder_wrapper.xml comparison

comparison pal_finder_wrapper.xml @ 8:4e625d3672ba draft

Pal_finder tool version 0.02.04.7: add detection/reporting of bad ranges; enable subset of reads to be used; check n-mers.

author	pjbriggs
date	Wed, 16 May 2018 07:39:16 -0400
parents	5e133b7b79a6
children	52dbe2089d14

comparison

equal deleted inserted replaced

-:5e133b7b79a6
+:4e625d3672ba
-<tool id="microsat_pal_finder" name="pal_finder" version="0.02.04.6">
+<tool id="microsat_pal_finder" name="pal_finder" version="0.02.04.7">
 <description>Find microsatellite repeat elements from sequencing reads and design PCR primers to amplify them</description>
 <macros>
 <import>pal_finder_macros.xml</import>
 </macros>
 <requirements>
 <requirement type="package" version="0.02.04">pal_finder</requirement>
 <requirement type="package" version="2.7">python</requirement>
 <requirement type="package" version="1.65">biopython</requirement>
 <requirement type="package" version="2.8.1">pandaseq</requirement>
 </requirements>
-<command><![CDATA[
+<command detect_errors="exit_code"><![CDATA[
 @CONDA_PAL_FINDER_SCRIPT_DIR@ &&
 @CONDA_PAL_FINDER_DATA_DIR@ &&
 bash $__tool_directory__/pal_finder_wrapper.sh
 #if str( $platform.platform_type ) == "illumina"
 #set $paired_input_type = $platform.paired_input_type_conditional.paired_input_type
 #end if
 #else
 --454 "$platform.input_fasta"
 #end if
 $output_microsat_summary $output_pal_summary
+#if $report_bad_primer_ranges
+--bad_primer_ranges "$output_bad_primer_read_ids"
+#end if
 #if $keep_config_file
 --output_config_file "$output_config_file"
 #end if
 --primer-prefix "$primer_prefix"
 --2merMinReps $min_2mer_repeats
 --filter_microsats "$output_filtered_microsats"
 #end for
 #end if
 #if str( $platform.assembly ) == '-assembly'
 $platform.assembly "$output_assembly"
+#end if
+#set $use_all_reads = $platform.subset_conditional.use_all_reads
+#if str( $use_all_reads ) != "yes"
+--subset "$platform.subset_conditional.subset"
 #end if
 #end if
 ]]></command>
 <inputs>
 <param name="primer_prefix" type="text" value="test" size="25" label="Primer prefix" help="This prefix will be added to the beginning of all primer names" />
 	    <param name="input_fastq_pair" format="fastqsanger"
 		   type="data_collection" collection_type="paired"
 		   label="Select FASTQ dataset collection with R1/R2 pair" />
 	  </when>
 	</conditional>
+	<conditional name="subset_conditional">
+	  <param name="use_all_reads" type="boolean" label="Use all reads for microsatellite detection?" checked="True" truevalue="yes" falsevalue="no" />
+	  <when value="no">
+	    <param name="subset" type="text" value="0.5" label="Number or fraction of reads to use" help="Either an integer number of reads or a decimal fraction (e.g. 0.5 to select 50% of reads)" />
+	  </when>
+	  <when value="yes" />
+	</conditional>
 	<param name="filters" type="select" display="checkboxes"
 	       multiple="True" label="Filters to apply to the pal_finder results"
 	       help="Apply none, one or more filters to refine results">
 <option value="-primers" selected="True">Only include loci with designed primers</option>
 <option value="-occurrences" selected="True">Exclude loci where the primer sequences occur more than once in the reads</option>
 </when>
 <when value="454">
 	<param name="input_fasta" type="data" format="fasta" label="454 fasta file with raw reads" />
 </when>
 </conditional>
-<param name="min_2mer_repeats" type="integer" value="6" label="Minimum number of 2-mer repeat units to detect" help="Set to zero to ignore repeats of this n-mer unit" />
+<param name="min_2mer_repeats" type="integer" value="6" label="Minimum number of 2-mer repeat units to detect" min="1" help="Must detect at least one repeat of this n-mer unit" />
 <param name="min_3mer_repeats" type="integer" value="0" label="Minimum number of 3-mer repeat units" help="Set to zero to ignore repeats of this n-mer unit" />
 <param name="min_4mer_repeats" type="integer" value="0" label="Minimum number of 4-mer repeat units" help="Set to zero to ignore repeats of this n-mer unit" />
 <param name="min_5mer_repeats" type="integer" value="0" label="Minimum number of 5-mer repeat units" help="Set to zero to ignore repeats of this n-mer unit" />
 <param name="min_6mer_repeats" type="integer" value="0" label="Minimum number of 6-mer repeat units" help="Set to zero to ignore repeats of this n-mer unit" />
 <conditional name="mispriming">
 	       help="Temperature should be in degrees Celsius" />
 	<param name="primer_pair_max_diff_tm" type="float" value="2.0"
 	       label="Maximum acceptable difference between melting temperatures of left and right primers (PRIMER_PAIR_MAX_DIFF_TM)"
 	       help="Temperature should be in degrees Celsius" />
 </when>
+<when value="default" />
 </conditional>
+<param name="report_bad_primer_ranges" type="boolean" truevalue="True" falsevalue="False" label="Output IDs for input reads which generate bad primer product size ranges" help="Can be used to screen reads in input Fastqs " />
 <param name="keep_config_file" type="boolean" truevalue="True" falsevalue="False"
 	   label="Output the config file to the history"
 	   help="Can be used to run pal_finder outside of Galaxy" />
 </inputs>
 <outputs>
 </data>
 <data name="output_microsat_summary" format="txt" label="${tool.name} on ${on_string} for ${primer_prefix}: summary of microsatellite types" />
 <data name="output_assembly" format="tabular" label="${tool.name} on ${on_string} for ${primer_prefix}: assembly">
 <filter>platform['assembly'] is True</filter>
 </data>
+<data name="output_bad_primer_read_ids" format="tabular" label="${tool.name} on ${on_string} for ${primer_prefix}: read IDs generating bad primer ranges">
+<filter>report_bad_primer_ranges is True</filter>
+</data>
 <data name="output_config_file" format="txt" label="${tool.name} on ${on_string} for ${primer_prefix}: config file">
 <filter>keep_config_file is True</filter>
 </data>
 </outputs>
 <tests>
 <param name="input_fastq_r2" value="illuminaPE_r2.fq" ftype="fastqsanger" />
 <expand macro="output_illumina_microsat_summary" />
 <output name="output_pal_summary" compare="re_match" file="illuminaPE_microsats.out.re_match" />
 <output name="output_filtered_microsats" compare="re_match" file="illuminaPE_filtered_microsats_rankmotifs.out.re_match" />
 </test>
+<!-- Test with Illumina input using subset of reads -->
+<test>
+<param name="platform_type" value="illumina" />
+<param name="filters" value="" />
+<param name="assembly" value="false" />
+<param name="use_all_reads" value="no" />
+<param name="subset" value="0.5" />
+<param name="input_fastq_r1" value="illuminaPE_r1.fq" ftype="fastqsanger" />
+<param name="input_fastq_r2" value="illuminaPE_r2.fq" ftype="fastqsanger" />
+<expand macro="output_illumina_microsat_subset_summary" />
+<output name="output_pal_summary" compare="re_match" file="illuminaPE_microsats_subset.out.re_match" />
+</test>
+<!-- Test with Illumina input filter that doesn't find any
+	 microsatellites -->
+<test expect_failure="true">
+<param name="platform_type" value="illumina" />
+<param name="filters" value="" />
+<param name="assembly" value="false" />
+<param name="min_2mer_repeats" value="8" />
+<param name="input_fastq_r1" value="illuminaPE_r1_no_microsats.fq" ftype="fastqsanger" />
+<param name="input_fastq_r2" value="illuminaPE_r2_no_microsats.fq" ftype="fastqsanger" />
+<assert_stderr>
+	<has_text text="pal_finder failed to locate any microsatellites" />
+</assert_stderr>
+</test>
+<!-- Test with Illumina input generating bad ranges -->
+<test>
+<param name="platform_type" value="illumina" />
+<param name="filters" value="" />
+<param name="assembly" value="false" />
+<param name="min_2mer_repeats" value="8" />
+<param name="input_fastq_r1" value="illuminaPE_r1_bad_ranges.fq" ftype="fastqsanger" />
+<param name="input_fastq_r2" value="illuminaPE_r2_bad_ranges.fq" ftype="fastqsanger" />
+<param name="min_2mer_repeats" value="8" />
+<param name="min_3mer_repeats" value="8" />
+<param name="min_4mer_repeats" value="8" />
+<param name="min_5mer_repeats" value="8" />
+<param name="min_6mer_repeats" value="8" />
+<param name="primer_options" value="custom" />
+<param name="primer_opt_size" value="25" />
+<param name="primer_min_size" value="21" />
+<param name="primer_max_size" value="30" />
+<param name="primer_min_gc" value="40.0" />
+<param name="primer_max_gc" value="60.0" />
+<param name="primer_gc_clamp" value="3" />
+<param name="primer_max_end_gc" value="5" />
+<param name="primer_min_tm" value="60.0" />
+<param name="primer_max_tm" value="80.0" />
+<param name="primer_opt_tm" value="68.0" />
+<param name="primer_pair_max_diff_tm" value="3.0" />
+<param name="report_bad_primer_ranges" value="true" />
+<expand macro="output_illumina_microsat_summary_bad_ranges" />
+<output name="output_pal_summary" compare="re_match" file="illuminaPE_microsats_bad_ranges.out.re_match" />
+<output name="output_bad_primer_read_ids" file="illuminaPE_bad_primer_read_ids.out" />
+</test>
+<!-- Test with bad n-mers specified -->
+<test expect_failure="true">
+<param name="platform_type" value="illumina" />
+<param name="filters" value="" />
+<param name="assembly" value="false" />
+<param name="min_2mer_repeats" value="8" />
+<param name="min_3mer_repeats" value="8" />
+<param name="min_4mer_repeats" value="0" />
+<param name="min_5mer_repeats" value="8" />
+<param name="min_6mer_repeats" value="8" />
+<param name="input_fastq_r1" value="illuminaPE_r1_no_microsats.fq" ftype="fastqsanger" />
+<param name="input_fastq_r2" value="illuminaPE_r2_no_microsats.fq" ftype="fastqsanger" />
+<assert_stderr>
+	<has_text text="Minimum number of 4-mers cannot be zero if number of 5-mers is non-zero" />
+</assert_stderr>
+</test>
 <!-- Test with 454 input -->
 <test>
 <param name="platform_type" value="454" />
 <param name="input_fasta" value="454_in.fa" ftype="fasta" />
 <expand macro="output_454_microsat_summary" />
 present in high-quality assembly
 Pal_finder runs the primer3_core program; information on the settings used in
 primer3_core can be found in the Primer3 manual at
 http://primer3.sourceforge.net/primer3_manual.htm
+-------------
+.. class:: infomark
+**Known issues**
+.. class:: warning
+**Low number of reads used for microsatellite detection/bad primer product size ranges**
+For some datasets pal_finder may generate 'bad' product size ranges (where the
+lower limit exceeds the upper limit) for one or more reads, for input into
+primer3_core. In these cases primer3_core will terminate prematurely, which can
+result in a substantially lower number of reads being used for microsatellite
+detection and potentially sub-optimal primer design.
+The number of reads generating the bad size ranges are reported in the
+*Summary of microsat types* output dataset as 'readsWithBadRanges'. Ideally
+the reported value should be zero.
+The conditions which cause this issue within pal_finder are still unclear,
+however we believe it to be associated with short or low quality reads. If this
+problem affects your data then:
+* Ensure that the input data are sufficiently trimmed and filtered (using
+e.g. the Trimmomatic tool) before rerunning pal_finder.
+* A list of read IDs for which pal_finder generates bad product size ranges can
+be output by turning on *Output IDs for input reads which generate bad primer
+ranges*. This outputs an additional dataset with a list of read IDs which can
+be used to remove read pairs from the input Fastq files (using e.g. the *Filter
+sequences by ID* tool) before rerunning pal_finder.
+.. class:: warning
+**Pal_finder takes a long time to run for large input datasets**
+pal_finder was originally developed using MiSeq data, and is not optimised for
+working with the larger Fastqs that are output from other platforms such as
+HiSeq and NextSeq. As a consequence pal_finder may take a very long time to
+complete when operating on larger datasets.
+If this is a problem then the tool can be run using a subset of the input reads
+by unchecking the *Use all reads...* option and entering either an integer number
+of reads to use, or a decimal fraction (e.g. 0.5 will select 50% of the reads).
 -------------
 .. class:: infomark

Mercurial > repos > pjbriggs > pal_finder

comparison pal_finder_wrapper.xml @ 8:4e625d3672ba draft