view tin.xml @ 60:1421603cc95b draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/rseqc commit 1dfe55ca83685cadb0ce8f6ebbd8c13232376d1d
author iuc
date Sat, 26 Nov 2022 15:19:14 +0000
parents dbedfc5f5a3c
children 5968573462fa
line wrap: on
line source

<tool id="rseqc_tin" name="Transcript Integrity Number" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@GALAXY_VERSION@">
    <description>
        evaluates RNA integrity at a transcript level
    </description>
    <expand macro="bio_tools"/>
    <macros>
        <import>rseqc_macros.xml</import>
    </macros>

    <expand macro="requirements" />

    <expand macro="stdio" />

    <version_command><![CDATA[tin.py --version]]></version_command>

    <!-- Generate output files here because tin.py removes all instances of "bam"
    in the filename -->
    <command><![CDATA[
        #import re
        #set $input_list = []
        #for $i, $input in enumerate($input):
            #set $safename = re.sub('[^\w\-_]', '_', $input.element_identifier)
            #if $safename in $input_list:
                #set $safename = str($safename) + "." + str($i)
            #end if
            $input_list.append($safename)
            ln -sf '${input}' '${safename}.bam' &&
            ln -sf '${input.metadata.bam_index}' '${safename}.bam.bai' &&
            echo '${safename}.bam' >> 'input_list.txt' &&
        #end for
        tin.py -i 'input_list.txt' --refgene='${refgene}' --minCov=${minCov}
        --sample-size=${samplesize} ${subtractbackground}
        && mv *summary.txt summary.tab
        && mv *tin.xls tin.xls
        ]]>
    </command>

    <inputs>
        <param name="input" type="data" format="bam" multiple="true" label="Input BAM file" help="(--input-file)"/>
        <expand macro="refgene_param" />
        <param name="minCov" type="integer" value="10" label="Minimum coverage (default=10)"
            help="Minimum number of reads mapped to a transcript (--minCov)." />
        <param name="samplesize" type="integer" value="100" label="Sample size (default=100)"
            help="Number of equal-spaced nucleotide positions picked from mRNA.
            Note: if this number is larger than the length of mRNA (L), it will
            be halved until is's smaller than L. (--sample-size)." />
        <param name="subtractbackground" type="boolean" value="false" falsevalue=""
            truevalue="--subtract-background" label="Subtract background noise
            (default=No)" help="Subtract background noise (estimated from
            intronic reads). Only use this option if there are substantial
            intronic reads (--subtract-background)." />
    </inputs>

    <outputs>
        <data name="outputsummary" format="tabular" from_work_dir="summary.tab" label="TIN on ${on_string} (summary)" />
        <data name="outputxls" format="xls" from_work_dir="tin.xls" label="TIN on ${on_string} (tin)" />
    </outputs>

    <!-- PDF Files contain R version, must avoid checking for diff -->
    <tests>
        <test expect_num_outputs="2">
            <param name="input" value="pairend_strandspecific_51mer_hg19_chr1_1-100000.bam"/>
            <param name="refgene" value="hg19_RefSeq_chr1_1-100000.bed" ftype="bed12"/>
            <output name="outputsummary" file="summary.tin.txt" ftype="tabular"/>
            <output name="outputxls" file="output.tin.xls" ftype="xls"/>
        </test>
    </tests>

    <help><![CDATA[
## tin.py

This program is designed to evaluate RNA integrity at transcript level. TIN
(transcript integrity number) is named in analogous to RIN (RNA integrity
number). RIN (RNA integrity number) is the most widely used metric to
evaluate RNA integrity at sample (or transcriptome) level. It is a very
useful preventive measure to ensure good RNA quality and robust,
reproducible RNA sequencing. However, it has several weaknesses:

* RIN score (1 <= RIN <= 10) is not a direct measurement of mRNA quality.
  RIN score heavily relies on the amount of 18S and 28S ribosome RNAs, which
  was demonstrated by the four features used by the RIN algorithm: the
  “total RNA ratio” (i.e. the fraction of the area in the region of 18S and
  28S compared to the total area under the curve), 28S-region height, 28S
  area ratio and the 18S:28S ratio24. To a large extent, RIN score was a
  measure of ribosome RNA integrity. However, in most RNA-seq experiments,
  ribosome RNAs were depleted from the library to enrich mRNA through either
  ribo-minus or polyA selection procedure.

* RIN only measures the overall RNA quality of an RNA sample. However, in  real
  situation, the degradation rate may differs significantly among
  transcripts, depending on factors such as “AU-rich sequence”, “transcript
  length”, “GC content”, “secondary structure” and the “RNA-protein
  complex”. Therefore, RIN is practically not very useful in downstream
  analysis such as adjusting the gene expression count.

* RIN has very limited sensitivity to measure substantially degraded RNA
  samples such as preserved clinical tissues. (ref:
  https://www.scribd.com/document/352764986/DV200-Technote-Truseq-Rna-Access).

To overcome these limitations, we developed TIN, an algorithm that is able
to measure RNA integrity at transcript level. TIN calculates a score (0 <=
TIN <= 100) for each expressed transcript, however, the medTIN (i.e.
meidan TIN score across all the transcripts) can also be used to measure
the RNA integrity at sample level. Below plots demonstrated TIN is a
useful metric to measure RNA integrity in both transcriptome-wise and
transcript-wise, as demonstrated by the high concordance with both RIN and
RNA fragment size (estimated from RNA-seq read pairs).


## Inputs

Input BAM/SAM file
    Alignment file in BAM/SAM format.

Reference gene model
    Gene Model in BED format. Must be standard 12-column BED file.

Minimum coverage
    Minimum number of reads mapped to a tracript (default is 10).

Sample size
    Number of equal-spaced nucleotide positions picked from mRNA. Note: if
    this number is larger than the length of mRNA (L), it will be halved until
    it’s smaller than L (default is 100).

Subtract background
    Subtract background noise (estimated from intronic reads). Only use this
    option if there are substantial intronic reads.


## Outputs

Text
    Table that includes the gene identifier (geneID), chromosome (chrom),
    transcript start (tx_start), transcript end (tx_end), and transcript
    integrity number (TIN).

Example output:

------  -----  ----------  ---------  -------------
geneID  chrom  tx_start    tx_end     TIN
------  -----  ----------  ---------  -------------
ABCC2   chr10   101542354  101611949  67.6446525761
IPMK    chr10   59951277   60027694   86.383618429
RUFY2   chr10   70100863   70167051   43.8967503948
------  -----  ----------  ---------  -------------

@ABOUT@

]]>
    </help>

    <expand macro="citations" />

</tool>