view clip_overlap.xml @ 0:79725ecf10a3 draft

"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/bamutil commit 29e40a76f1e249c3ed73f9129ad711beba34eb07"
author iuc
date Mon, 29 Mar 2021 14:15:42 +0000
parents
children 047a20d4258f
line wrap: on
line source

<tool id="bamutil_clip_overlap" name="BamUtil clipOverlap" version="@WRAPPER_VERSION@" profile="@PROFILE@">
    <description></description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements"/>
    <expand macro="edam"/>
    <command detect_errors="exit_code"><![CDATA[
## clipOverlap uses the output file
## extension to determine the output format.
#if $input.ext.endswith('bam'):
    #set tmp_out = 'output.bam'
#else:
    #set tmp_out = 'output.sam'
#end if
trap '>&2 cat output.log' EXIT;
touch 'output.log' &&
bam clipOverlap
--in '$input'
#if str($storeOrig):
    --storeOrig '$storeOrig'
#end if
$stats
#if str($input.ext) == 'qname_sorted.bam':
    --readName
#end if
$overlapsOnly
#if str($excludeFlags):
    --excludeFlags $excludeFlags
#end if
$unmapped
--noPhoneHome
--out '$tmp_out'
2> 'output.log'
&& mv '$tmp_out' '$output'
#if str($stats):
    && cp 'output.log' '$output_stats'
#end if
    ]]></command>
    <inputs>
        <param name="input" type="data" format="sam,bam,qname_sorted.bam" label="Select SAM or BAM file on which to clip overlapping read pairs"/>
        <param argument="--storeOrig" type="text" value="" label="Enter a tag in which to store the original CIGAR" help="Leave blank to skip">
            <sanitizer invalid_char="">
                <valid initial="string.letters,string.digits"/>
            </sanitizer>
        </param>
        <param argument="--stats" type="boolean" truevalue="--stats" falsevalue="" checked="false" label="Output statistics on the overlaps?"/>
        <param argument="--overlapsOnly" type="boolean" truevalue="--overlapsOnly" falsevalue="" checked="false" label="Only output overlapping read pairs?"/>
        <param argument="--excludeFlags" type="integer" optional="true" value="" label="Enter an integer representation of a flag to skip records with any of the specified flags set" help="See the help section below for information about this option"/>
        <param argument="--unmapped" type="boolean" truevalue="--unmapped" falsevalue="" checked="false" label="Mark records that would be completely clipped as unmapped?"/>
    </inputs>
    <outputs>
        <data name="output" format_source="input" metadata_source="input"/>
        <data name="output_stats" format="txt" label="${tool.name} on ${on_string}: Statistics">
            <filter>stats</filter>
        </data>
    </outputs>
    <tests>
        <test expect_num_outputs="1">
            <param name="input" value="input.sam" ftype="sam"/>
            <output name="output" file="output.sam" ftype="sam"/>
        </test>
        <test expect_num_outputs="2">
            <param name="input" value="input.bam" ftype="bam"/>
            <param name="storeOrig" value="6M"/>
            <param name="stats" value="--stats"/>
            <output name="output" file="input.bam" ftype="bam"/>
            <output name="output_stats" file="output_stats.txt" ftype="txt"/>
        </test>
        <test expect_num_outputs="1">
            <param name="input" value="input_qname_sorted.bam" ftype="qname_sorted.bam"/>
            <output name="output" file="output_qname_sorted.bam" ftype="qname_sorted.bam"/>
        </test>
    </tests>
    <help>
**What it does**

Clips overlapping read pairs in a SAM or BAM file based on criteria.

The input file and resulting output file are sorted by coordinate (or readName if specified in the options).

When a read is clipped from the front:

 * the read start position is updated to reflect the clipping
 * the mate's mate start position is updated to reflect the record's new position
 * the record is placed in the output file in the correct location based on the updated position

To handle coordinate-sorted files, SAM/BAM records are buffered up until it is known that all following records will have a later start position. To prevent the program from running away with memory, a limit is set to the number of records that can be buffered, see --poolSize for more information.

When two mates overlap, this tool will clip the record's whose clipped region would have the lowest average quality.

It also checks strand. If a forward strand extends past the end of a reverse strand, that will be clipped. Similarly, if a reverse strand starts before the forward strand, the region prior to the forward strand will be clipped. If the reverse strand occurs entirely before the forward strand, both strands will be entirely clipped. If the --unmapped option is specified, then rather than clipping an entire read, it will be marked as unmapped.
The qualities on the two strands remain unchanged even with clipping.

The excludeFlags option accepts a decimal value and skips the records with the specified flags set.  The default is 3852 (0xF0C hex), so records with any of the following flags set will be skipped:

 * unmapped
 * mate unmapped
 * secondary alignment
 * fails QC checks
 * duplicate
 * supplementary

**Assumptions/ Restrictions**

 * Assumes the file is sorted by Coordinate (or ReadName if using --readName option)
 * Assumes only 2 reads have matching ReadNames (Supplementary and Secondary reads are ignored/skipped by default so will not cause a problem)

  * It matches in pairs, so if there are 3, the first 2 will be matched and compared, but the 3rd won't. If there are 4, the first 2 will be matched and the last 2 will be matched and compared.

 * Only mapped reads will be clipped
 * Assumes that mate information in records are accurate

**Clipping from the front**

The first operation after the softclip will be a Match/Mismatch, meaning that any trailing pads, deletions, insertions, or skips will also be soft clipped.

+------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
| Clip location                                                                            | How it is handled                                                                    |
+================================+=========================================================+======================================================================================+
| If the clip position falls in a skip/deletion                                            | Removes the entire skip/deletion                                                     |
+------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
| If the position immediately after the clip is a skip/deletion                            | Also removes the skip/deletion                                                       |
+------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
| If the position immediately after the clip is an Insert                                  | Softclips the insert                                                                 |
+------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
| If the position immediately after the clip is a Pad                                      | Removes the pad                                                                      |
+------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
| Clip occurs at the last match/mismatch position of the read (the entire read is clipped) | Entire read is soft clipped, 0-based position is left as the original (not modified) |
+------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
| Clip occurs after the read ends                                                          | Entire read is soft clipped, 0-based position is left as the original (not modified) |
+------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
| Clip occurs before the read starts                                                       | Nothing is clipped. The read is not changed.                                         |
+------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+

**Clipping from the back**

+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+
| Clip location                                                              | How it is handled                                                                                                 |
+==================+=========================================================+===================================================================================================================+
| If the clip position falls in a skip/deletion                              | Removes the entire skip/deletion                                                                                  |
+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+
| If the position immediately before the clip is a deletion/skip/pad         | Remove the deletion/skip/pad                                                                                      |
+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+
| If the position immediately before the clip is an insertion                | Leave the insertion, even if it results in a 70M3I27S                                                             |
+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+
| Clip occurs at the first position of the read (the entire read is clipped) | Entire read is soft clipped, preceding insertions remain, 0-based position is left as the original (not modified) |
+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+
| Clip occurs before the read starts                                         | Entire read is soft clipped, 0-based position is left as the original (not modified)                              |
+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+
| Clip occurs after the read ends                                            | Nothing is clipped. The read is not changed.                                                                      |
+----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+

    </help>
    <expand macro="citations"/>
</tool>