<tool id="overlapping_reads" name="Get overlapping reads" version="3.4.1">
    <description />
    <requirements>
        <requirement type="package" version="0.18.0">pysam</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" level="fatal" description="Tool exception" />
    </stdio>
    <command detect_errors="exit_code"><![CDATA[
        ln -f -s '$input.metadata.bam_index' input.bam.bai &&
        ln -f -s '$input' input.bam &&
        python '$__tool_directory__'/overlapping_reads.py
           --input input.bam
           --minquery '$minquery'
           --maxquery '$maxquery'
           --mintarget '$mintarget'
           --maxtarget '$maxtarget'
           --overlap '$overlap'
           --output '$output'
    ]]></command>
    <inputs>
        <param format="bam" label="Compute signature from this bowtie standard output" name="input" type="data" />
        <param help="'23' = 23 nucleotides" label="Min size of query small RNAs" name="minquery" size="3" type="integer" value="23" />
        <param help="'29' = 29 nucleotides" label="Max size of query small RNAs" name="maxquery" size="3" type="integer" value="29" />
        <param help="'23' = 23 nucleotides" label="Min size of target small RNAs" name="mintarget" size="3" type="integer" value="23" />
        <param help="'29' = 29 nucleotides" label="Max size of target small RNAs" name="maxtarget" size="3" type="integer" value="29" />
        <param help="'10' = 10 nucleotides overlap" label="Overlap (in nt)" name="overlap" size="3" type="integer" value="10" />
    </inputs>
    <outputs>
        <data format="fasta" label="pairable reads" name="output" />
    </outputs>
    <tests>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="23" />
            <param name="maxquery" value="29" />
            <param name="mintarget" value="23" />
            <param name="maxtarget" value="29" />
            <param name="overlap" value="10" />
            <output file="paired.fa" ftype="fasta" name="output" />
        </test>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="20" />
            <param name="maxquery" value="22" />
            <param name="mintarget" value="23" />
            <param name="maxtarget" value="29" />
            <param name="overlap" value="10" />
            <output file="paired_2.fa" ftype="fasta" name="output" />
        </test>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="23" />
            <param name="maxquery" value="29" />
            <param name="mintarget" value="20" />
            <param name="maxtarget" value="22" />
            <param name="overlap" value="10" />
            <output file="paired_3.fa" ftype="fasta" name="output" />
        </test>
        <test>
            <param ftype="bam" name="input" value="sr_bowtie.bam" />
            <param name="minquery" value="20" />
            <param name="maxquery" value="22" />
            <param name="mintarget" value="20" />
            <param name="maxtarget" value="22" />
            <param name="overlap" value="10" />
            <output file="paired_4.fa" ftype="fasta" name="output" />
        </test>
    </tests>
    <help>

**What it does**

Extracts reads that show an overlap signature of the specified length (in nt) and
returns a FASTA file of these "pairable" reads.

See `Antoniewski (2014)`_ for background and details.

.. _Antoniewski (2014): https://link.springer.com/protocol/10.1007%2F978-1-4939-0931-5_12

**Input**

A **sorted** BAM alignment file.

*Query and target sizes:*

For each *query* read (of the specified size) in the BAM alignment, the algorithm searches
for *target* reads (of the specified size) that align to the opposite strand with the
specified overlap (10 nt by default).
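
The search described above can be sketched as follows. This is only an illustration and
not the code of ``overlapping_reads.py``: the function name ``pairable_reads`` and the
``(strand, coord, size)`` tuples are hypothetical, ``coord`` is assumed to be the 1-based
leftmost aligned position, and all reads are assumed to lie on the same chromosome.

```python
from collections import defaultdict


def pairable_reads(reads, overlap=10, query_sizes=range(23, 30),
                   target_sizes=range(23, 30)):
    """Return (query, target) pairs whose 5' ends overlap by `overlap` nt.

    `reads` is an iterable of (strand, coord, size) tuples; `coord` is the
    1-based leftmost aligned position (an assumption of this sketch).
    """
    reads = list(reads)
    # Index candidate targets by strand and by the coordinate of their 5' end.
    targets = defaultdict(list)
    for strand, coord, size in reads:
        if size in target_sizes:
            five_prime = coord if strand == "+" else coord + size - 1
            targets[(strand, five_prime)].append((strand, coord, size))

    pairs = []
    for strand, coord, size in reads:
        if size not in query_sizes:
            continue
        if strand == "+":
            # The 5' end of a minus-strand partner lies overlap - 1 nt downstream.
            key = ("-", coord + overlap - 1)
        else:
            # A plus-strand partner starts overlap - 1 nt upstream of the
            # query's 5' end, which is its rightmost aligned position.
            key = ("+", coord + size - overlap)
        for target in targets.get(key, []):
            pairs.append(((strand, coord, size), target))
    return pairs


# The two reads shown in the Outputs section, reduced to (strand, coord, size).
example = [("-", 5839, 26), ("+", 5855, 23)]
print(pairable_reads(example))
```

With the default size ranges each of the two example reads qualifies both as a query and as
a target, so the sketch reports the pair once in each direction.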

Searching for query reads of 20-22 nt that overlap by 10 nt with target
reads of 23-29 nt is equivalent to searching for query reads of 23-29 nt that overlap by 10 nt
with target reads of 20-22 nt, i.e., searching for siRNAs that pair with piRNAs is equivalent
to searching for piRNAs that pair with siRNAs. In contrast, searching for query reads of 20-22 nt
that overlap by 10 nt with target reads of 23-29 nt is different from searching for query reads of
23-29 nt that overlap by 10 nt with target reads of 23-29 nt, since the number of "heterotypic"
pairs of reads is likely to differ from the number of "homotypic" pairs of reads.

*Overlap:*

The number of nucleotides by which the paired sequences overlap.


**Outputs**

A FASTA file of pairable reads, such as:

>FBgn0000004_17.6|coord=5839|strand -|size=26|nreads=1

TTTTCGTCAATTGTGCCAAATAGGTA

>FBgn0000004_17.6|coord=5855|strand +|size=23|nreads=1

TTGACGAAAATGATCGAGTGGAT


where FBgn0000004_17.6 stands for the chromosome, 5839 stands for the 1-based read position,
'strand -' stands for the lower strand of the chromosome, 26 stands for the size of the sequence and
nreads=1 stands for the number of reads of the sequence in the dataset.

The second sequence in this example corresponds to 1 read that overlaps by 10 nt with
1 read of the first sequence.
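
For downstream processing, the header fields described above can be split with a few lines
of Python. This is a minimal sketch: ``parse_header`` is a hypothetical helper, not part of
the tool, and it assumes that the chromosome name contains no ``|`` character.

```python
def parse_header(header):
    """Split a header such as '>chrom|coord=5839|strand -|size=26|nreads=1'."""
    chrom, coord, strand, size, nreads = header.lstrip(">").split("|")
    return {
        "chrom": chrom,
        "coord": int(coord.split("=")[1]),    # 1-based read position
        "strand": strand.split()[1],          # "+" or "-"
        "size": int(size.split("=")[1]),      # length of the sequence
        "nreads": int(nreads.split("=")[1]),  # reads with this sequence
    }


record = parse_header(">FBgn0000004_17.6|coord=5839|strand -|size=26|nreads=1")
print(record["chrom"], record["coord"], record["strand"])
```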

The tool also writes to the standard output the numbers of pairs of reads that
can be formed simultaneously in silico. Note that these numbers are distinct from the numbers
of pairs of read alignments (as computed by the small_rna_signature tool) when the analysis is
performed with multi-mapping reads.

    </help>
    <citations>
        <citation type="doi">10.1007/978-1-4939-0931-5_12</citation>
    </citations>
</tool>
author	artbio
date	Sat, 22 Oct 2022 23:49:52 +0000