<tool id="rseqc_infer_experiment" name="Infer Experiment" version="@TOOL_VERSION@.1">
    <description>speculates how RNA-seq libraries were configured</description>
    <expand macro="bio_tools"/>
    <macros>
        <import>rseqc_macros.xml</import>
    </macros>

    <expand macro="requirements" />

    <expand macro="stdio" />

    <version_command><![CDATA[infer_experiment.py --version]]></version_command>

    <command><![CDATA[
        infer_experiment.py -i '${input}' -r '${refgene}'
            --sample-size ${sample_size}
            --mapq ${mapq}
            > '${output}'
            ]]>
    </command>

    <inputs>
        <expand macro="bam_param" />
        <expand macro="refgene_param" />
        <expand macro="sample_size_param" />
        <expand macro="mapq_param" />
    </inputs>

    <outputs>
        <data format="txt" name="output" />
    </outputs>

    <tests>
        <test>
            <param name="input" value="pairend_strandspecific_51mer_hg19_chr1_1-100000.bam"/>
            <param name="refgene" value="hg19_RefSeq_chr1_1-100000.bed" ftype="bed12"/>
            <output name="output" file="output.infer_experiment.txt"/>
        </test>
    </tests>

    <help><![CDATA[
infer_experiment.py
+++++++++++++++++++

This program infers how an RNA-seq experiment was configured, in particular how reads were
stranded in strand-specific RNA-seq data. It does this by comparing the strand each read maps
to with the strand of the overlapping gene in the reference gene model.
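The comparison against the gene model starts by finding the genes a read overlaps. The following is a minimal sketch of that lookup, not RSeQC's own implementation. It assumes the gene model has been loaded into hypothetical ``Gene`` records that use BED-style 0-based, half-open coordinates, and it uses a plain linear scan for clarity:

```python
from typing import Iterable, NamedTuple, Set


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates follow the BED convention."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # '+' or '-'


def overlapping_gene_strands(genes: Iterable[Gene], chrom: str, start: int, end: int) -> Set[str]:
    """Return the strands of all genes overlapping the half-open read interval [start, end)."""
    return {
        g.strand
        for g in genes
        if g.chrom == chrom and g.start < end and start < g.end
    }


genes = [Gene("chr1", 100, 200, "+"), Gene("chr1", 150, 300, "-")]
print(overlapping_gene_strands(genes, "chr1", 120, 130))  # {'+'}
```

In this sketch, an empty result means the read lies outside the gene model. A result containing both strands means the read overlaps genes on opposite strands, so the strand of its parental gene is ambiguous.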


Inputs
++++++++++++++

Input BAM/SAM file
    Alignment file in BAM/SAM format.

Reference gene model
    Gene model in BED format.

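Number of usable sampled reads (default=200000)
    Number of usable reads sampled from the SAM/BAM file. More reads give a more accurate estimate but make the program slightly slower.

Minimum mapping quality
    Minimum mapping quality (Phred-scaled) for an alignment to be considered uniquely mapped. Alignments below this threshold are not used.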

Outputs
+++++++

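The output reports the fraction of sampled reads explained by each strand rule. For paired-end RNA-seq,
reads can be stranded in two different ways (as in, for example, the Illumina ScriptSeq protocol).
In the rule names, each term gives the read number, the strand the read maps to, and the strand of
its parental gene. For example, "1+-" means read1 mapped to the '+' strand while its parental gene
is on the '-' strand: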

1. 1++,1--,2+-,2-+

   * read1 mapped to '+' strand indicates parental gene on '+' strand
   * read1 mapped to '-' strand indicates parental gene on '-' strand
   * read2 mapped to '+' strand indicates parental gene on '-' strand
   * read2 mapped to '-' strand indicates parental gene on '+' strand

2. 1+-,1-+,2++,2--

   * read1 mapped to '+' strand indicates parental gene on '-' strand
   * read1 mapped to '-' strand indicates parental gene on '+' strand
   * read2 mapped to '+' strand indicates parental gene on '+' strand
   * read2 mapped to '-' strand indicates parental gene on '-' strand
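The paired-end rules above can be expressed as a small lookup. This is a minimal sketch, assuming each read has already been reduced to its read number, the strand it maps to, and the strand of its parental gene. The function ``classify_paired`` is hypothetical and is not part of RSeQC:

```python
# Rule labels as they appear in the infer_experiment report.
PAIRED_RULE_A = "1++,1--,2+-,2-+"  # read1 on the gene's strand, read2 on the opposite strand
PAIRED_RULE_B = "1+-,1-+,2++,2--"  # read1 on the opposite strand, read2 on the gene's strand


def classify_paired(read_number: int, read_strand: str, gene_strand: str) -> str:
    """Return the paired-end strand rule that one read is consistent with.

    Hypothetical helper: read_number is 1 or 2; strands are '+' or '-'.
    Any other input (e.g. an unknown gene strand) is reported as "other".
    """
    key = f"{read_number}{read_strand}{gene_strand}"
    if key in PAIRED_RULE_A.split(","):
        return PAIRED_RULE_A
    if key in PAIRED_RULE_B.split(","):
        return PAIRED_RULE_B
    return "other"


print(classify_paired(1, "+", "+"))  # 1++,1--,2+-,2-+
print(classify_paired(2, "+", "+"))  # 1+-,1-+,2++,2--
```

Labelling the rule by the full string keeps the result directly comparable with the lines of the report.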

For single-end RNA-seq, reads can also be stranded in two different ways. Here each term gives
only the strand the read maps to and the strand of its parental gene:

1. ++,--

   * read mapped to '+' strand indicates parental gene on '+' strand
   * read mapped to '-' strand indicates parental gene on '-' strand

2. +-,-+

   * read mapped to '+' strand indicates parental gene on '-' strand
   * read mapped to '-' strand indicates parental gene on '+' strand
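The single-end rules reduce to checking whether the read strand matches the gene strand. This is a minimal sketch that uses a hypothetical helper ``classify_single``, which is not part of RSeQC:

```python
# Rule labels as they appear in the infer_experiment report.
SINGLE_RULE_A = "++,--"  # read on the same strand as its parental gene
SINGLE_RULE_B = "+-,-+"  # read on the opposite strand to its parental gene


def classify_single(read_strand: str, gene_strand: str) -> str:
    """Return the single-end strand rule that one read is consistent with.

    Hypothetical helper: strands are '+' or '-'; anything else is "other".
    """
    key = read_strand + gene_strand
    if key in SINGLE_RULE_A.split(","):
        return SINGLE_RULE_A
    if key in SINGLE_RULE_B.split(","):
        return SINGLE_RULE_B
    return "other"


print(classify_single("-", "-"))  # ++,--
print(classify_single("+", "-"))  # +-,-+
```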


Example Output
++++++++++++++

**Example 1** ::

    =========================================================
    This is PairEnd Data

    Fraction of reads explained by "1++,1--,2+-,2-+": 0.4992
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.5008
    Fraction of reads explained by other combinations: 0.0000
    =========================================================

*Conclusion*: This is NOT strand-specific data, because about 50% of reads are explained by "1++,1--,2+-,2-+" and the other 50% by "1+-,1-+,2++,2--".

**Example 2** ::

    ============================================================
    This is PairEnd Data

    Fraction of reads explained by "1++,1--,2+-,2-+": 0.9644
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.0356
    Fraction of reads explained by other combinations: 0.0000
    ============================================================

*Conclusion*: This is strand-specific RNA-seq data. The strand of read1 is consistent with the strand of the gene model, while the strand of read2 is opposite to the strand of the reference gene model.

**Example 3** ::

    =========================================================
    This is SingleEnd Data

    Fraction of reads explained by "++,--": 0.9840
    Fraction of reads explained by "+-,-+": 0.0160
    Fraction of reads explained by other combinations: 0.0000
    =========================================================

*Conclusion*: This is single-end, strand-specific RNA-seq data. The strand of each read is concordant with the strand of its reference gene.
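The conclusions above can be approximated by comparing the two rule fractions. This is a minimal sketch, and the helper ``infer_strandedness`` is hypothetical. Its cutoffs, 0.8 for calling data strand-specific and a gap of at most 0.2 for calling it unstranded, are illustrative assumptions. They are not thresholds defined by RSeQC:

```python
def infer_strandedness(
    fractions: dict[str, float],
    stranded_cutoff: float = 0.8,  # assumed cutoff, not an RSeQC default
    unstranded_gap: float = 0.2,   # assumed cutoff, not an RSeQC default
) -> str:
    """Summarize strand-rule fractions as a conclusion (hypothetical helper).

    ``fractions`` maps each of the two rule labels, plus an optional
    "other" key, to the fraction of sampled reads it explains.
    """
    rules = {rule: frac for rule, frac in fractions.items() if rule != "other"}
    if len(rules) != 2:
        raise ValueError("expected exactly two strand rules")
    # Order the two rules from most to least supported.
    (top_rule, top_frac), (_, low_frac) = sorted(
        rules.items(), key=lambda item: item[1], reverse=True
    )
    if top_frac >= stranded_cutoff:
        return f"strand-specific ({top_rule})"
    if top_frac - low_frac <= unstranded_gap:
        return "not strand-specific"
    return "unclear"


# Fractions taken from the three example reports above.
print(infer_strandedness({"1++,1--,2+-,2-+": 0.4992, "1+-,1-+,2++,2--": 0.5008, "other": 0.0}))
# not strand-specific
print(infer_strandedness({"1++,1--,2+-,2-+": 0.9644, "1+-,1-+,2++,2--": 0.0356, "other": 0.0}))
# strand-specific (1++,1--,2+-,2-+)
print(infer_strandedness({"++,--": 0.9840, "+-,-+": 0.0160, "other": 0.0}))
# strand-specific (++,--)
```

Results between the two cutoffs are reported as "unclear" rather than forced into either category. Such data may warrant a closer look at the library preparation.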

@ABOUT@

]]>
    </help>

    <expand macro="citations" />

</tool>
author	iuc
date	Sat, 18 Dec 2021 19:41:19 +0000