# HG changeset patch
# User iuc
# Date 1482178720 18000
# Node ID aa72470e14f7a5de359bc95f25d51b271a19c994
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/samblaster commit 82097013a9eb5a6161d400e5b6c493113c440687

diff -r 000000000000 -r aa72470e14f7 samblaster.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/samblaster.xml Mon Dec 19 15:18:40 2016 -0500
@@ -0,0 +1,211 @@
+
+ marks duplicates, outputs split reads, discordant read pairs and unmapped reads
+
+ samblaster
+ sambamba
+
+ samblaster --version
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ output
+
+
+ discordantFile
+
+
+ splitterFile
+
+
+ unmappedFile
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ `__ for details
+about the SAM alignment format.
+
+By default, samblaster marks duplicates with SAM FLAG 0x400. The
+**--removeDups** option will instead remove duplicate alignments from the
+output file.
+
+**ALIGNMENT TYPE DEFINITIONS:** Below, we will use the following
+definitions for alignment types. Starting with *samblaster* release
+0.1.22, these definitions are affected by the use of the **-M** option.
+By default, *samblaster* will use the current definitions of alignment
+types as specified in the `SAM
+Specification `__. Namely,
+alignments marked with FLAG 0x100 are considered *secondary*, while
+those marked with FLAG 0x800 are considered *supplemental*. If the
+**-M** option is specified, alignments marked with either FLAG 0x100 or
+0x800 are considered *supplemental*, and no alignments are considered
+*secondary*. A *primary* alignment is always one that is neither
+*secondary* nor *supplemental*. Only *primary* and *supplemental*
+alignments are used to find chimeric (split-read) mappings. The **-M**
+flag is used for backward compatibility with older SAM/BAM files in
+which "chimeric" alignments were marked with FLAG 0x100, and should also
+be used with output from more recent runs of *bwa mem* using its **-M**
+option.
+
+**DISCORDANT READ PAIR IDENTIFICATION:** A **discordant** read pair is
+one that meets all of the following criteria:
+
+1. Both sides of the read pair are mapped (neither FLAG 0x4 nor 0x8 is
+   set).
+2. The *properly paired* FLAG (0x2) is not set.
+3. *Secondary* or *supplemental* alignments are never output as
+   discordant, although a discordant read pair can have such alignments
+   associated with it.
+4. Duplicate read pairs that meet the above criteria will be output as
+   discordant unless the **-e** option is used.
+
+**UNMAPPED/CLIPPED READ IDENTIFICATION:** An **unmapped** or **clipped**
+read is a *primary* alignment that is unaligned over all or part of its
+length, respectively. The lack of a full alignment may be caused by an SV
+breakpoint that falls within the read. Therefore, *samblaster* will
+optionally output such reads to a FASTQ file for re-alignment by a tool,
+such as `YAHA `__, geared toward
+finding split-read mappings. *samblaster* applies the following strategy
+to identify and output unmapped/clipped reads:
+
+1. An **unmapped** read has the *unmapped read* FLAG set (0x4).
+2. A **clipped** read is a mapped read with a CIGAR string that begins
+   or ends with at least **--minClipSize** unaligned bases (CIGAR code S
+   and/or H), and is not from a read that has one or more *supplemental*
+   alignments.
+3. In order for *samblaster* to output the entire sequence for clipped
+   reads, the input SAM file must have soft clipped primary alignments.
+4. *samblaster* will output unmapped/clipped reads into a FASTQ file if
+   QUAL information is available in the input file, and a FASTA file if
+   not.
+5. Unmapped/clipped reads that are part of a duplicate read pair will be
+   output unless the **-e** option is used.
+
+
+**Written by:** Greg Faust (gf4ea@virginia.edu) `Ira Hall Lab,
+University of Virginia `__
+
+**Please cite:** `Faust, G.G.
and Hall, I.M., “\ *SAMBLASTER*: fast
+duplicate marking and structural variant read extraction,”
+*Bioinformatics* Sept. 2014; **30**\ (17):
+2503-2505. `__
+
+**Also see:** `SAMBLASTER\_Supplemental.pdf
+`__
+for additional discussion and statistics about the duplicates marked by
+*samblaster* vs. *Picard* using the NA12878 sample dataset. Click the
+preceding link or download the file from this repository.
+
+ ]]>
+
+ 10.1093/bioinformatics/btu314
+
+
diff -r 000000000000 -r aa72470e14f7 test-data/output.bam
Binary file test-data/output.bam has changed
diff -r 000000000000 -r aa72470e14f7 test-data/splitters.bam
Binary file test-data/splitters.bam has changed
diff -r 000000000000 -r aa72470e14f7 test-data/sr.input.bam
Binary file test-data/sr.input.bam has changed
diff -r 000000000000 -r aa72470e14f7 test-data/sr.input.sam.gz
Binary file test-data/sr.input.sam.gz has changed
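
The FLAG and CIGAR rules described in the help text of this changeset can be sketched in a few lines of Python. This is only an illustration of the documented criteria, not samblaster's own code: the function names, the `compat_m`, `exclude_dups` and `has_supplemental` parameters, and the choice to let 0x800 take precedence when both 0x100 and 0x800 are set are all assumptions made for the example. Whether a read has supplemental alignments cannot be seen from a single record, so the caller is assumed to supply it.

```python
import re

# SAM FLAG bits named in the help text.
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTAL = 0x800

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_type(flag, compat_m=False):
    """Classify a FLAG as 'primary', 'secondary' or 'supplemental'.

    compat_m mirrors the documented -M behaviour: 0x100 and 0x800 both
    count as supplemental, and nothing is secondary.
    """
    if flag & FLAG_SUPPLEMENTAL:  # assumption: 0x800 wins if both bits are set
        return "supplemental"
    if flag & FLAG_SECONDARY:
        return "supplemental" if compat_m else "secondary"
    return "primary"


def is_discordant(flag, compat_m=False, exclude_dups=False):
    """Apply the four discordant-pair criteria to one alignment's FLAG."""
    if alignment_type(flag, compat_m) != "primary":
        return False  # criterion 3: only primary alignments are output
    if flag & (FLAG_UNMAPPED | FLAG_MATE_UNMAPPED):
        return False  # criterion 1: both sides must be mapped
    if flag & FLAG_PROPER_PAIR:
        return False  # criterion 2: must not be properly paired
    if exclude_dups and flag & FLAG_DUPLICATE:
        return False  # criterion 4: duplicates dropped only when asked (-e)
    return True


def end_clip_lengths(cigar):
    """Return (leading, trailing) counts of S/H bases in a CIGAR string."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]

    def clip_run(seq):
        total = 0
        for length, op in seq:
            if op not in ("S", "H"):
                break
            total += length
        return total

    return clip_run(ops), clip_run(reversed(ops))


def is_clipped(flag, cigar, min_clip_size, has_supplemental=False, compat_m=False):
    """Mapped primary alignment clipped by >= min_clip_size at either end."""
    if alignment_type(flag, compat_m) != "primary" or flag & FLAG_UNMAPPED:
        return False
    if has_supplemental:
        return False  # reads with supplemental alignments are not "clipped"
    return max(end_clip_lengths(cigar)) >= min_clip_size
```

For example, `is_discordant(0x41)` is true for a paired, fully mapped, not properly paired primary alignment, while `alignment_type(0x100, compat_m=True)` reports `"supplemental"` as the **-M** description requires.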