comparison spolpred.xml @ 0:5402893569cb draft

planemo upload commit 870da8582a7bc43817b1de0720397ae60a8efef6-dirty
author nml
date Tue, 15 Dec 2015 14:19:42 -0500
<?xml version="1.0"?>
<tool id="spolpred" name="SpolPred" version="1.0.0">
    <description>predict Mycobacterium tuberculosis spoligotypes from FASTQ reads</description>
    <requirements>
        <requirement type="package" version="1.0.0">spolpred</requirement>
    </requirements>
    <command interpreter="bash">

#set $output=$input_file.name

spolpred.sh "$input_file.name" $input_file

-l $read_length -b $type_reads -d $more_details -s $screening_options.stop_screening

#if $screening_options.stop_screening == "on":
-a $screening_options.screening_threshold
#end if

-m $matching_threshold

    </command>
    <inputs>
        <param name="input_file" type="data" format="fastqsanger" label="FASTQ input file"/>

        <param name="read_length" type="integer" label="Read length [35, 1000]" value="75">
            <validator type="in_range" min="35" max="1000" message="Must be between 35 and 1000 (inclusive)"/>
        </param>

        <param name="type_reads" type="select" label="Type of input reads">
            <option value="d">Direct</option>
            <option value="r">Reverse</option>
        </param>

        <param name="more_details" type="select" label="Level of processing output detail"
               help="If set to High, processing details, including the number of processed reads and the
               number of spacer sequences found, are written to the job's STDOUT">
            <option value="on">High</option>
            <option value="off">Normal</option>
        </param>

        <conditional name="screening_options">
            <param name="stop_screening" type="select" label="Read screening"
                   help="Used to end read processing when Screening Threshold is reached">
                <option value="on">Perform read screening</option>
                <option value="off">Do not perform read screening</option>
            </param>
            <when value="on">
                <param name="screening_threshold" type="integer" label="Screening threshold" value="50"
                       help="Average number of spacer occurrences used to stop screening">
                    <validator type="in_range" min="0" max="inf" message="Must be at least 0"/>
                </param>
            </when>
            <when value="off"/>
        </conditional>

        <param name="matching_threshold" type="integer" label="Matching threshold" value="4"
               help="Minimum number of occurrences required to call a spacer present; spacers found fewer times are called absent">
            <validator type="in_range" min="1" max="inf" message="Must be at least 1"/>
        </param>

    </inputs>
    <outputs>
        <data name="outfile" format="tabular" from_work_dir="output.txt"/>
    </outputs>

    <tests>
        <test>
            <param/>
            <output/>
        </test>
    </tests>

    <help>
**Frequently Asked Questions**

**SpolPred only accepts one FASTQ file. What if I have paired-end reads?**

Forward and reverse read files can be merged into one by using the Perl script
shuffleSequences_fastq.pl provided with the Velvet software suite. The SpolPred run will then take longer
than when using only the forward or reverse reads. In our dataset (see the Methods section of the SpolPred
paper for details), the forward file alone had enough reads to find all present spacers and infer the octal
code for 49 out of 51 samples. Whether to merge the files should therefore be decided based on the sample's
coverage depth.
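
As a rough illustration of what that merging produces, the Python sketch below interleaves two FASTQ inputs record by record (F1, R1, F2, R2, ...). It is not shuffleSequences_fastq.pl itself; it assumes plain four-line FASTQ records with mates listed in the same order in both files, and `fastq_records` and `interleave_fastq` are hypothetical helper names.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records: header, sequence, '+', qualities."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave_fastq(forward: Iterable[str], reverse: Iterable[str]) -> Iterator[str]:
    """Yield forward and reverse records alternately: F1, R1, F2, R2, ..."""
    pairs = zip_longest(fastq_records(forward), fastq_records(reverse))
    for f_rec, r_rec in pairs:
        # Mates must pair up one-to-one; a leftover record means the files disagree.
        if f_rec is None or r_rec is None:
            raise ValueError("forward and reverse files have different record counts")
        yield from f_rec
        yield from r_rec
```

Because iterating over an open text file yields its lines, the same functions can be applied to files on disk, e.g. `out.writelines(interleave_fastq(f1, f2))` with `f1`, `f2` and `out` opened by the caller. Raising an error on unequal record counts, rather than silently truncating, is a deliberate choice so that mismatched files are noticed.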

**What if I have a FASTA file?**

SpolPred was designed specifically to process raw reads and therefore only supports sequence
files in FASTQ format.

**What is the point of stopping the read screening?**

By default, all reads in the FASTQ file are processed. However, we have observed that a point is
reached beyond which no more reads are needed to infer the octal code; in other words, the number of
spacer occurrences is high enough and stable enough to assume that all present spacers have already been
found. Stopping the program at this point therefore saves time and computing resources. If coverage is low,
stopping the screening is not advisable.

**How do I choose the Screening threshold?**

If you have decided to scan the whole input file, there is no need to set this threshold. The Screening
threshold tells the program when screening should stop, and its appropriate value depends on read
coverage. Running the software once and looking at the number of times each spacer is detected will
provide insight into both the coverage and the most appropriate threshold value.
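
To make the role of the threshold concrete, here is a Python sketch of a screening loop that can stop early. It is not SpolPred's code: it assumes, hypothetically, that the "average number of spacer occurrences" is taken over the spacers detected so far, it counts only exact spacer hits in the read's given orientation, and `screen_reads` and its `spacers` mapping (spacer name to sequence) are illustrative names.

```python
from typing import Dict, Iterable, Optional


def screen_reads(
    reads: Iterable[str],
    spacers: Dict[str, str],
    screening_threshold: Optional[float] = None,
) -> Dict[str, int]:
    """Count reads containing each spacer, optionally stopping early.

    Assumption (not SpolPred's documented rule): screening stops once the
    mean count over spacers seen at least once reaches screening_threshold.
    """
    counts = {name: 0 for name in spacers}
    for read in reads:
        for name, seq in spacers.items():
            if seq in read:  # exact hit only; SNP-tolerant matching omitted
                counts[name] += 1
        if screening_threshold is not None:
            found = [c for c in counts.values() if c > 0]
            if found and sum(found) / len(found) >= screening_threshold:
                break  # remaining reads are never examined
    return counts
```

For example, with spacers {"s1": "AAAA", "s2": "CCCC"} and reads ["xAAAAx", "AAAA", "CCCC", "AAAA", "AAAA"], a full scan returns {"s1": 4, "s2": 1}, whereas screening_threshold=2 stops after the second read and returns {"s1": 2, "s2": 0}. Spacer s2 is missed, which is why a threshold that is too low, or early stopping on low-coverage data, is risky.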

**Why is a Matching threshold required? Are spacers not supposed to occur uniquely?**

The number of times each spacer is found is tracked during screening, and a spacer is called absent when
that number does not reach a user-defined threshold (4 by default). This threshold, here called the
Matching threshold, was implemented because a few spurious matches were found for some absent spacers.
Those false positives are likely related to poor-quality data, such as sequencing errors. In our data set,
no more than 3 false matches were detected for any absent spacer, in contrast to 50-150 matches per
present spacer.
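
The Python sketch below shows how a Matching threshold turns raw counts into presence/absence calls, and how 43 calls are conventionally packed into the 15-digit spoligotype octal code (spacers 1-42 read in groups of three as octal digits, spacer 43 appended as a final 0 or 1). The octal packing follows the standard spoligotyping convention rather than anything stated in this help text, and `call_presence` and `octal_code` are hypothetical names.

```python
from typing import List, Sequence


def call_presence(counts: Sequence[int], matching_threshold: int = 4) -> List[bool]:
    """A spacer is called present only if its count reaches the Matching threshold."""
    return [count >= matching_threshold for count in counts]


def octal_code(present: Sequence[bool]) -> str:
    """Pack 43 presence calls (spacers 1..43, in order) into the 15-digit octal code."""
    if len(present) != 43:
        raise ValueError("expected calls for exactly 43 spacers")
    bits = [1 if p else 0 for p in present]
    # Spacers 1-42: each group of three bits becomes one octal digit (weights 4, 2, 1).
    digits = [str(4 * bits[i] + 2 * bits[i + 1] + bits[i + 2]) for i in range(0, 42, 3)]
    # Spacer 43 is appended on its own as a final 0 or 1.
    digits.append(str(bits[42]))
    return "".join(digits)
```

For example, 43 counts of 120 except for a count of 3 at spacer 6 give an absence call for spacer 6 only, so the second octal digit drops from 7 to 6 while all other digits stay at their maximum.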

**Should I then be worried about false positive matches?**

As long as proper pre-filtering steps are applied to the raw reads, no significant issues are expected.

**Can I change the number of allowed SNPs when querying the spacers?**

This option has not been implemented. Spacer sequences are conserved, and at most one SNP has been
reported to occur in a spacer.

**Why are exact matches output as well?**

The number of read-spacer exact matches, i.e. matches with no SNP allowed, makes it easy to identify
SNPs in spacer sequences. Exact matches are not used when inferring the octal code.
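
To illustrate why both counts are informative, the Python sketch below scans reads for a spacer and tallies reads with an exact hit separately from reads with a hit allowing at most one mismatch (one SNP). It is a naive sliding-window comparison chosen for clarity, not SpolPred's search; it ignores reverse complements, and `hamming`, `match_read` and `count_matches` are hypothetical names.

```python
from typing import Iterable, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def match_read(read: str, spacer: str) -> Tuple[bool, bool]:
    """Return (exact_hit, hit_with_at_most_one_snp) for one read."""
    k = len(spacer)
    exact = spacer in read
    # Slide a window of the spacer's length along the read.
    one_snp = exact or any(
        hamming(read[i:i + k], spacer) == 1 for i in range(len(read) - k + 1)
    )
    return exact, one_snp


def count_matches(reads: Iterable[str], spacer: str) -> Tuple[int, int]:
    """Tally reads with an exact hit and reads with a hit allowing one SNP."""
    exact_total = one_snp_total = 0
    for read in reads:
        exact, one_snp = match_read(read, spacer)
        exact_total += exact  # bools add as 0 or 1
        one_snp_total += one_snp
    return exact_total, one_snp_total
```

A spacer whose one-SNP count is high while its exact count stays near zero most likely carries a SNP in that sample; consistent with the answer above, only the SNP-tolerant counts would feed the octal code.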

Wrapper Author: Mark Iskander
    </help>
    <citations>
        <citation type="doi">10.1093/bioinformatics/bts544</citation>
    </citations>
</tool>