artbio_bam_cleaning.xml @ 7:745f529127b8 draft
"planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/artbio_bam_cleaning commit b782130b62b7c74911774b58c7a965a99dee1519"
field | value |
---|---|
author | artbio |
date | Mon, 20 Dec 2021 19:44:29 +0000 |
parents | 999c2b871f36 |
children | b12e50bcddd2 |
<tool id="artbio_bam_cleaning" name="ARTbio bam cleaning" version="1.7+galaxy0">
    <description>on flags, PCR duplicates and MD recalibration</description>
    <macros>
        <import>macro.xml</import>
    </macros>
    <requirements>
        <requirement type="package" version="1.6=hb116620_7">samtools</requirement>
        <requirement type="package" version="0.8.1=h41abebc_0">sambamba</requirement>
        <requirement type="package" version="1.3.5=py39hba5d119_3">freebayes</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" level="fatal" description="Error occurred" />
    </stdio>
    <command detect_errors="exit_code"><![CDATA[
        @pipefail@
        @set_fasta_index@
        #set input_base = 'input'
        ln -f -s $input_bam.metadata.bam_index input.bam.bai &&
        ln -s $input_bam input.bam &&
        sambamba view -h -t \${GALAXY_SLOTS:-2} --filter="mapping_quality >= 1 and not(unmapped) and not(mate_is_unmapped) and not(duplicate)" -f "bam" ${input_base}".bam"
        | bamleftalign --fasta-reference reference.fa -c --max-iterations "5" -
        | samtools calmd -C 50 -b -@ \${GALAXY_SLOTS:-2} - reference.fa
        #if $filter_MQ_255 == 'no':
            > $calmd
        #else if $filter_MQ_255 == 'yes':
            | tee $calmd
            | sambamba view -h -t \${GALAXY_SLOTS:-2} --filter='mapping_quality <= 254' -f 'bam' /dev/stdin > $fullfilter
        #end if
    ]]></command>
    <inputs>
        <expand macro="reference_source_conditional" />
        <param name="input_bam" type="data" format="bam" label="BAM or SAM file to process"/>
        <param name="filter_MQ_255" type="select" label="Discard alignments with mapping quality > 254" display="radio" help="If `No`, generates the calMD output without discarding aberrant MQs generated by the step.
Useful if you need to keep split reads that would be eliminated if `Yes`">
            <option value="yes" selected="true">Yes</option>
            <option value="no">No</option>
        </param>
    </inputs>
    <outputs>
        <data name="calmd" format="bam" label="CalMD filter (for lumpy-smoove)" />
        <data name="fullfilter" format="bam" label="Full filtering (for somatic-varscan)">
            <filter>filter_MQ_255 == "yes"</filter>
        </data>
    </outputs>
    <tests>
        <test>
            <param name="input_bam" value="chr22_sample.bam" ftype="bam" />
            <param name="reference_source_selector" value="history" />
            <param name="ref_file" value="chr22.fa" />
            <output name="calmd" file="calmd.bam" ftype="bam" />
            <output name="fullfilter" file="full.bam" ftype="bam" />
        </test>
        <test>
            <param name="input_bam" value="chr22_sample.bam" ftype="bam" />
            <param name="reference_source_selector" value="history" />
            <param name="filter_MQ_255" value="yes" />
            <param name="ref_file" value="chr22.fa" />
            <output name="calmd" file="calmd.bam" ftype="bam" />
        </test>
    </tests>
    <help>

ARTbio bam cleaning overview
============================

.. class:: infomark

This tool wraps several cleaning steps to produce BAM files suitable for subsequent analyses with lumpy-smoove (or other large structural variant callers) or with somatic-varscan (or other small variant callers, i.e. SNVs and indels).

Workflow
=============

.. class:: infomark

The tool uses the following command line for filtering:

::

    sambamba view -h -t 8 --filter='mapping_quality >= 1 and not(unmapped) and not(mate_is_unmapped) and not(duplicate)' -f 'bam' $input_base".bam" |
    bamleftalign --fasta-reference reference.fa -c --max-iterations "5" - |
    samtools calmd -C 50 -b -@ 4 - reference.fa > $input_base".filt1.dedup.bamleft.calmd.bam"
    sambamba view -h -t 8 --filter='mapping_quality &lt;= 254' -f 'bam' -o $input_base".filt1.dedup.bamleft.calmd.filt2.bam" $input_base".filt1.dedup.bamleft.calmd.bam"

.. class:: warningmark

From version **1.7+galaxy0**, this tool assumes that optical/PCR duplicate alignments in the input BAM are already marked in their FLAG value. If they are not, first run a tool that marks duplicates, for instance samtools markdup or sambamba markdup.

Purpose
--------

This "workflow" tool was written to limit the number of ``python metadata/set.py`` jobs that run at each step of standard Galaxy workflows. These jobs are poorly optimized and can take a considerable amount of time on large datasets, which lowers the overall performance of the workflow.

    </help>
    <citations>
        <citation type="doi">10.1371/journal.pone.0168397</citation>
    </citations>
</tool>
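To make the two `sambamba view --filter` expressions in the command concrete, here is a minimal sketch that re-implements them as bash functions acting on one alignment's SAM FLAG and MAPQ values. It does not read BAM files. The function names `passes_first_filter` and `passes_mq_filter` are hypothetical. The mapping of sambamba's `unmapped`, `mate_is_unmapped` and `duplicate` keywords to FLAG bits 0x4, 0x8 and 0x400 follows the SAM specification; that sambamba tests exactly these bits is an assumption, not something verified against its source. In SAM, a MAPQ of 255 means "mapping quality not available", and the optional `mapping_quality <= 254` step is what removes those alignments.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: re-implements the tool's two sambamba filter
# expressions for a single alignment, given its SAM FLAG and MAPQ as integers.
# It only illustrates the filtering logic; it does not read BAM files.

# FLAG bits as defined in the SAM specification (assumed to be the bits
# tested by sambamba's unmapped, mate_is_unmapped and duplicate keywords).
FLAG_UNMAPPED=0x4        # segment unmapped
FLAG_MATE_UNMAPPED=0x8   # next segment in the template unmapped
FLAG_DUPLICATE=0x400     # PCR or optical duplicate

# Stage 1: mapping_quality >= 1 and not(unmapped)
#          and not(mate_is_unmapped) and not(duplicate)
# Returns 0 if the alignment is kept, 1 if it is discarded.
passes_first_filter() {
    local flag=$1 mapq=$2
    (( mapq >= 1 )) || return 1
    (( (flag & FLAG_UNMAPPED) == 0 )) || return 1
    (( (flag & FLAG_MATE_UNMAPPED) == 0 )) || return 1
    (( (flag & FLAG_DUPLICATE) == 0 )) || return 1
    return 0
}

# Stage 2, applied only when filter_MQ_255 is "yes": mapping_quality <= 254,
# i.e. drop alignments whose MAPQ is 255 ("not available" in SAM).
passes_mq_filter() {
    local mapq=$1
    (( mapq <= 254 ))
}

# Example: classify a few (FLAG, MAPQ) pairs with both stages.
for pair in "99 60" "1123 60" "73 60" "99 0" "99 255"; do
    read -r flag mapq <<< "$pair"
    if passes_first_filter "$flag" "$mapq" && passes_mq_filter "$mapq"; then
        echo "FLAG=$flag MAPQ=$mapq kept"
    else
        echo "FLAG=$flag MAPQ=$mapq discarded"
    fi
done
```

Only the first pair survives. FLAG 1123 is 99 with the duplicate bit (1024) set, FLAG 73 has the mate-unmapped bit (8) set, MAPQ 0 fails `mapping_quality >= 1`, and MAPQ 255 passes stage 1 but is removed by stage 2. This is also why the `calmd` output (stage 1 only) can retain alignments that the `fullfilter` output drops.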