# HG changeset patch
# User nml
# Date 1450207182 18000
# Node ID 5402893569cb1a87f992511ac0667713debe5345
planemo upload commit 870da8582a7bc43817b1de0720397ae60a8efef6-dirty

diff -r 000000000000 -r 5402893569cb spolpred.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/spolpred.sh Tue Dec 15 14:19:42 2015 -0500
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+name=$1
+shift
+
+spolpred "$@"
+
+sed -i "s/^.*\t/$name/" output.txt
+
+exit 0
diff -r 000000000000 -r 5402893569cb spolpred.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/spolpred.xml Tue Dec 15 14:19:42 2015 -0500
@@ -0,0 +1,148 @@
+
+
+ with options and commands
+
+ spolpred
+
+
+
+#set $output=$input_file.name
+
+spolpred.sh "$input_file.name" $input_file
+
+-l $read_length -b $type_reads -d $more_details -s $screening_options.stop_screening
+
+#if $screening_options.stop_screening == "on":
+ -a $screening_options.screening_threshold
+#end if
+
+-m $matching_threshold
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ **Frequently Asked Questions**
+
+ **SpolPred only accepts one FASTQ file, what if I have got paired-end reads?**
+
+ Forward and reverse read files can be merged into one by making use of the Perl script
+ shuffleSequences_fastq.pl provided in the Velvet software suite. The SpolPred run will therefore take longer
+ than when using only forward or reverse reads. In our dataset (read Methods for more details), the forward file
+ had enough reads to find all present spacers and infer the octal code for 49 out of 51 samples. That
+ decision will have to be made depending on the sample coverage depth.
+
+
+
+ **What if I have a FASTA file?**
+
+ SpolPred has been specifically designed to process raw reads and therefore only supports sequence
+ files in FASTQ format.
+
+
+
+ **What is the point of stopping the read screening?**
+
+ By default, all reads in the FASTQ file will be processed. Nevertheless, we have observed that a point is
+ reached when no more reads are needed to infer the octal code; in other words, the number of spacer
+ occurrences is high enough and steady to assume that all present spacers have already been found.
+ Therefore, stopping the program at this point would save time and computer resources. If coverage
+ is low, stopping the scan is not advisable.
+
+
+
+ **How do I choose the Screening threshold?**
+
+ If you have decided to scan the whole input file, there is no need to set such a threshold. The Screening
+ threshold is used to let the program know when the screening should stop. Its value will depend on
+ read coverage. Running the software and looking at the number of times all spacers are detected will
+ provide insight into both the coverage and the most appropriate threshold value.
+
+
+
+ **Why is a Matching threshold required? Are spacers not supposed to occur uniquely?**
+
+ The number of times each spacer is found is tracked during the screening, and absence is assigned when
+ that number does not reach a user-defined threshold (4 times by default). This threshold, here called
+ Matching threshold, had to be implemented because for some absent spacers, a few spurious
+ matches were found. Those false positives are likely to be related to data-quality issues, such as
+ sequencing errors. In our data set, no more than 3 false matches were detected for absent spacers, in
+ contrast to 50-150 found per present spacer.
+
+
+
+ **Should I be worried then about false positive matches?**
+
+ As long as proper pre-filtering steps are applied to the raw reads, no important issues are expected
+ to come up.
+
+
+
+ **Can I change the number of allowed SNPs when querying the spacers?**
+
+ This option has not been implemented. Spacer sequences are conserved and at most one SNP has been
+ reported to occur.
+
+
+
+ **Why are exact matches output as well?**
+
+ The number of read-spacer exact matches, i.e. without allowing SNPs, enables the easy
+ identification of SNPs in spacer sequences. When inferring the octal code, exact matches are not
+ employed.
+
+ Wrapper Author: Mark Iskander
+
+
+ 10.1093/bioinformatics/bts544
+
+
diff -r 000000000000 -r 5402893569cb tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Tue Dec 15 14:19:42 2015 -0500
@@ -0,0 +1,6 @@
+
+
+
+
+
+