# HG changeset patch
# User nick
# Date 1448334381 18000
# Node ID d2e46adc199ec81772bb0de575039a903eaaf2bb
planemo upload commit 35b743e6492923c0e2b1e5e434eaf4e56d268108

diff -r 000000000000 -r d2e46adc199e align_families.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/align_families.xml Mon Nov 23 22:06:21 2015 -0500
@@ -0,0 +1,64 @@
+
+
+ from duplex sequencing data
+
+ mafft
+ duplex
+ DUPLEX_DIR
+
+ python \$DUPLEX_DIR/align_families.py $input > $output
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This is for processing duplex sequencing data. It does a multiple sequence alignment on each (single-stranded) family of reads.
+
+-----
+
+**Input**
+
+This expects the output format of the "Make families" tool.
+
+-----
+
+**Output**
+
+The output is a tabular file where each line corresponds to a (single) read.
+
+The columns are::
+
+ 1: barcode (both tags)
+ 2: tag order in barcode ("ab" or "ba")
+ 3: read mate ("1" or "2")
+ 4: read name
+ 5: read sequence, aligned ("-" for gaps)
+ 6: read quality scores, aligned (" " for gaps)
+
+-----
+
+**Alignments**
+
+The alignments are done using MAFFT, specifically the command
+::
+
+ $ mafft --nuc --quiet family.fa > family.aligned.fa
+
+
+
diff -r 000000000000 -r d2e46adc199e duplex.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/duplex.xml Mon Nov 23 22:06:21 2015 -0500
@@ -0,0 +1,61 @@
+
+
+ from duplex sequencing data
+
+ duplex
+ DUPLEX_DIR
+
+ duplex.fa
+ && awk -f \$DUPLEX_DIR/utils/outconv.awk -v target=1 duplex.fa > $output1
+ && awk -f \$DUPLEX_DIR/utils/outconv.awk -v target=2 duplex.fa > $output2
+ ]]>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ keep_sscs
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This is for processing duplex sequencing data. It creates single-strand and duplex consensus reads from aligned read families.
+
+-----
+
+**Input**
+
+This expects the output format of the "Align families" tool.
+
+-----
+
+**Output**
+
+This will output final, duplex consensus reads in two FASTA files (first and second reads in the pairs). Optionally, you can save the single-strand reads too, in a separate FASTA file.
+
+
+
diff -r 000000000000 -r d2e46adc199e make_families.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/make_families.xml Mon Nov 23 22:06:21 2015 -0500
@@ -0,0 +1,83 @@
+
+
+ from duplex sequencing data
+
+ duplex
+ DUPLEX_DIR
+
+ paste $fastq1 $fastq2
+ | paste - - - -
+ | awk -f \$DUPLEX_DIR/make-barcodes.awk -v TAG_LEN=$taglen -v INVARIANT=$invariant
+ | sort
+ > $output
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This tool is for processing raw duplex sequencing data, removing the barcodes and grouping by them into families of reads from the same fragment.
+
+-----
+
+**Output**
+
+The output will be a tabular file where each line corresponds to a pair of input reads.
+
+The columns are::
+
+ 1: barcode (both tags joined and ordered)
+ 2: tag order in barcode ("ab" or "ba")
+ 3: read1 name
+ 4: read1 sequence (minus the tag and invariant sequences)
+ 5: read1 quality scores (minus the same tag and invariant)
+ 6: read2 name
+ 7: read2 sequence (minus the tag and invariant sequences)
+ 8: read2 quality scores (minus the same tag and invariant)
+
+-----
+
+**Barcode creation**
+
+For each pair, the tool will remove the tag at the beginning of each read and create a barcode by concatenating the two tags. The order of the tags is determined by a string comparison so that it will make an identical barcode from pairs of either order. The original tag order will be noted in the second column.
+
+Since pairs from opposite strands will have the same tags, but in the reverse order, this produces the same barcode for reads from the same fragment, regardless of strand. Then a simple sort will group all reads from the same strand together, separated into strands by the different "order" values.
+
+Examples::
+
+ +---------------+-----------------+
+ | input tags    | output          |
+ +-------+-------+-------+---------+
+ | read1 | read2 | order | barcode |
+ +-------+-------+-------+---------+
+ | ATG   | CCT   | ab    | ATGCCT  |
+ +-------+-------+-------+---------+
+ | CCT   | ATG   | ba    | ATGCCT  |
+ +-------+-------+-------+---------+
+
+
+
diff -r 000000000000 -r d2e46adc199e tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Mon Nov 23 22:06:21 2015 -0500
@@ -0,0 +1,22 @@
+
+
+
+
+
+
+
+
+ https://github.com/makrutenko/duplex/archive/master.tar.gz
+ make
+
+ .
+ $INSTALL_DIR
+
+
+ $INSTALL_DIR
+ $INSTALL_DIR
+
+
+
+
+
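
The barcode creation rule described in the make_families.xml help text can be sketched in Python as follows. This is only an illustration, not the contents of make-barcodes.awk: the function name `make_barcode` is hypothetical, the tags are assumed to have already been cut from the start of each read, and the handling of two identical tags (reported here as "ab") is an assumption the help text does not cover.

```python
def make_barcode(tag1, tag2):
    """Return (order, barcode) for the tags taken from read1 and read2.

    The tags are joined in string-sorted order, so a pair and its
    opposite-strand counterpart yield the same barcode. The order value
    records which arrangement the input pair had.
    Assumption: identical tags are reported as "ab".
    """
    if tag1 <= tag2:
        return "ab", tag1 + tag2
    return "ba", tag2 + tag1


# Three hypothetical read pairs, given as (read1 tag, read2 tag).
pairs = [("CCT", "ATG"), ("GGA", "TTC"), ("ATG", "CCT")]

# Build (barcode, order) rows and sort them, mimicking the "| sort" step:
# rows with the same barcode end up adjacent, split by order.
rows = sorted((barcode, order)
              for order, barcode in (make_barcode(t1, t2) for t1, t2 in pairs))

for barcode, order in rows:
    print(barcode, order, sep="\t")
```

The first and third pairs carry the same two tags in opposite order, so they receive the same barcode and sort next to each other, distinguished only by their order value.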