# HG changeset patch
# User czlab
# Date 1526611185 14400
# Node ID f3128f4ffe34d8998390ccad521137903b7b8fd6
# Parent 621da360a15513c34d67bc225686cc817875b04a
Deleted selected files

diff -r 621da360a155 -r f3128f4ffe34 fastq2collapse.xml
--- a/fastq2collapse.xml Thu May 17 21:33:10 2018 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,30 +0,0 @@
-
- in FASTQ
-
-
- fastq2collapse.pl -v $input $output
-
-
-
-
-
-
-
-
-
-
-
-.. class:: infomark
-
-**What this tool does**
-
-This tool collapses exact duplicate sequences.
-
-It takes as input files in FASTQ format of filtered and trimmed reads and output files in FASTQ format in which exact PCR duplicates have been collapsed.
-
-
-
-
-
-
-
diff -r 621da360a155 -r f3128f4ffe34 fastqFilter.xml
--- a/fastqFilter.xml Thu May 17 21:33:10 2018 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,101 +0,0 @@
-
-
-
- fastq_filter.pl -v
- #if $sampleIndex.filterBySampleIndex == "yes":
- -index $sampleIndex.sequence
- #end if
- -maxN $maxN -if sanger -f $filterString -of $outputFormat $inputfile $outputfile
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. class:: infomark
-
-**What this tool does**
-
-This tool extracts reads passing quality filters.
-
-It takes as input Sanger FASTQ files and output FASTQ/A files of filtered reads.
-
------
-
-**FASTQ format**
-
-Check quality score in the FASTQ file for the right format.
-
-Reference https://en.wikipedia.org/wiki/FASTQ_format#Quality :
-
-* Sanger format can encode a Phred quality score from 0 to 93 using ASCII 33 to 126.
-* Solexa/Illumina 1.0 format can encode a Solexa/Illumina quality score from -5 to 62 using ASCII 59 to 126.
-
-See http://www.asciitable.com/ for ASCII table.
-
------
-
-**Filter by sample index (optional)**
-
-For users who would like to start from a FASTQ file consisting of multiple libraries.
-
-For example:
-
-If you have six samples with indexes GTCA, GCAT, ACTG, AGCT, GCAT, TCGA, you can extract reads for each library with indicated index sequences (e.g. GTCA, etc.) starting from position 0 in the read. For example, you could specify 0:GTCA, etc.
-
------
-
-**How to set the filter**
-
-You can apply multiple filtering criteria based on the quality scores for each read. They are separated by commas.
-
-Each criterion is composed of four components (e.g. method1:start1-end1:score1,method2:start2-end2:score2)
-
-1. Method: min or mean, which means requirement on minimal or mean score of a region
-2. Start: the first nucleotide to consider (0-based)
-3. End: the last nucleotide to consider (0-based)
-4. Score: the threshold required
-
-**Parameter suggestion**
-
-For example:
-
-* For Standard CLIP protocol filtering: mean:0-29:20 (this specifies a mean score of 20 or above in the first 30 bases, which includes 5 positions with sample indexes and the random barcode, followed by 25 positions with the actual CLIP tag).
-* For iCLIP/BrdU CLIP filtering: mean:0-38:20 (this specifies a mean score of 20 or above in the first 39 bases, which includes 14 positions with sample indexes and the random barcode, followed by 25 positions with the actual CLIP tag).
-
-The reason to filter as such is that low quality reads can introduce mapping errors and background. They will inflate the number of unique tags after removal of PCR duplicates.
-
-
-
-
-
diff -r 621da360a155 -r f3128f4ffe34 trimming3.xml
--- a/trimming3.xml Thu May 17 21:33:10 2018 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,61 +0,0 @@
-
- using FASTX Toolkit
-
-
- fastx_clipper -a $adapterSeq -l $discardShorterThan $discardNonclipped $discardClipped $adapterOnly $keepUnknown
- #if $minAdapterAlignment.minOverlapRequired =="yes":
- -M $minAdapterAlignment.minLen
- #end if
- -v -i $input 2>/dev/null | fastq_quality_trimmer -v -l $discardShorterThan -t $qualityThreshold -o $output
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-.. class:: infomark
-
-**What this tool does**
-
-
-This tool takes as input FASTQ files and output FASTQ files with 3' adapters and extremely low quality bases (e.g. score less than 5) removed.
-
-It is a wrapper of fastx_clipper and fastq_quality_trimmer that are a part of the FASTX Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/).
-
------
-
-**Parameter suggestion for discarding sequences**
-
-We typically require high quality score in barcode and 15 nt of CLIP tags.
-* For standard CLIP: discard sequences shorter than 20 nt (5 nt barcode + 15 nt CLIP tag).
-* For BrdU CLIP: discard sequences shorter than 29 nucleotides (14 nt barcode + 15 nt CLIP tag).
-
-
-
-
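
For readers who want to see how the filter string documented in the removed fastqFilter.xml help (method:start-end:score, comma-separated, 0-based inclusive positions) could be interpreted, here is a minimal Python sketch. It is not the code of fastq_filter.pl. The function names `parse_filter` and `passes` are hypothetical. The Phred+33 offset follows the Sanger encoding described in the help text. Rejecting reads that are too short to cover a region is an assumption, since the help text does not say how such reads are treated.

```python
from statistics import mean


def parse_filter(spec):
    """Parse 'method:start-end:score,...' into (method, start, end, score) tuples."""
    criteria = []
    for item in spec.split(","):
        method, region, score = item.strip().split(":")
        if method not in ("min", "mean"):
            raise ValueError(f"unknown method: {method!r}")
        start, end = (int(x) for x in region.split("-"))
        criteria.append((method, start, end, float(score)))
    return criteria


def passes(quality, criteria, offset=33):
    """Return True if a Sanger (Phred+33) quality string meets every criterion."""
    scores = [ord(c) - offset for c in quality]
    for method, start, end, threshold in criteria:
        region = scores[start:end + 1]  # start and end are 0-based, inclusive
        if len(region) < end - start + 1:
            return False  # assumption: a read too short for the region fails
        value = min(region) if method == "min" else mean(region)
        if value < threshold:  # "score of 20 or above" means the bound is inclusive
            return False
    return True


if __name__ == "__main__":
    standard_clip = parse_filter("mean:0-29:20")
    print(passes("I" * 30, standard_clip))  # 'I' encodes Phred 40
    print(passes("#" * 30, standard_clip))  # '#' encodes Phred 2
```

With the suggested Standard CLIP setting mean:0-29:20, a read whose first 30 quality characters are all `I` (Phred 40) is kept, and one whose first 30 are all `#` (Phred 2) is dropped.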