view tin.xml @ 58:1a052c827e88 draft
"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/rseqc commit 57f71aa633a43ab02bbf05acd0c6d7f406e01f1e"
author | iuc |
---|---|
date | Thu, 28 Nov 2019 15:56:37 -0500 |
parents | f437057e46f1 |
children | dbedfc5f5a3c |
<tool id="rseqc_tin" name="Transcript Integrity Number" version="@WRAPPER_VERSION@.1">
    <description>
        evaluates RNA integrity at a transcript level
    </description>

    <macros>
        <import>rseqc_macros.xml</import>
    </macros>

    <expand macro="requirements" />

    <expand macro="stdio" />

    <version_command><![CDATA[tin.py --version]]></version_command>

    <!-- Generate output files here because tin.py removes all instances of "bam" in the filename -->
    <command><![CDATA[
        #import re
        ln -sf '${input}' 'input.bam' &&
        ln -sf '${input.metadata.bam_index}' 'input.bam.bai' &&
        tin.py -i 'input.bam' --refgene='${refgene}' --minCov=${minCov} --sample-size=${samplesize} ${subtractbackground}
        ]]>
    </command>

    <inputs>
        <expand macro="bam_param" />
        <expand macro="refgene_param" />
        <param name="minCov" type="integer" value="10" label="Minimum coverage (default=10)" help="Minimum number of reads mapped to a transcript (--minCov)." />
        <param name="samplesize" type="integer" value="100" label="Sample size (default=100)" help="Number of equal-spaced nucleotide positions picked from mRNA. Note: if this number is larger than the length of mRNA (L), it will be halved until it's smaller than L. (--sample-size)." />
        <param name="subtractbackground" type="boolean" value="false" falsevalue="" truevalue="--subtract-background" label="Subtract background noise (default=No)" help="Subtract background noise (estimated from intronic reads). Only use this option if there are substantial intronic reads (--subtract-background)."
        />
    </inputs>

    <outputs>
        <data name="outputsummary" format="tabular" from_work_dir="input.summary.txt" label="TIN on ${on_string} (summary)" />
        <data name="outputxls" format="xls" from_work_dir="input.tin.xls" label="TIN on ${on_string} (tin)" />
    </outputs>

    <!-- PDF Files contain R version, must avoid checking for diff -->
    <tests>
        <test>
            <param name="input" value="pairend_strandspecific_51mer_hg19_chr1_1-100000.bam"/>
            <param name="refgene" value="hg19_RefSeq_chr1_1-100000.bed"/>
            <output name="outputsummary">
                <assert_contents>
                    <has_line_matching expression="^Bam_file\tTIN\(mean\)\tTIN\(median\)\tTIN\(stdev\)$" />
                    <has_line_matching expression="^input\.bam\t8\.8709677419\d+\t8\.8709677419\d+\t0\.0$" />
                </assert_contents>
            </output>
            <output name="outputxls" file="output.tin.xls"/>
        </test>
    </tests>

    <help><![CDATA[
## tin.py

This program is designed to evaluate RNA integrity at the transcript level. TIN (transcript integrity number) is named by analogy with RIN (RNA integrity number). RIN is the most widely used metric for evaluating RNA integrity at the sample (or transcriptome) level. It is a very useful preventive measure to ensure good RNA quality and robust, reproducible RNA sequencing. However, it has several weaknesses:

* RIN score (1 <= RIN <= 10) is not a direct measurement of mRNA quality. The RIN score relies heavily on the amount of 18S and 28S ribosomal RNAs, as shown by the four features used by the RIN algorithm: the “total RNA ratio” (i.e. the fraction of the area in the region of 18S and 28S compared to the total area under the curve), 28S-region height, 28S area ratio and the 18S:28S ratio. To a large extent, the RIN score is a measure of ribosomal RNA integrity. However, in most RNA-seq experiments, ribosomal RNAs are depleted from the library to enrich mRNA through either a ribo-minus or a polyA selection procedure.
* RIN only measures the overall RNA quality of an RNA sample.
  However, in practice, the degradation rate may differ significantly among transcripts, depending on factors such as “AU-rich sequence”, “transcript length”, “GC content”, “secondary structure” and the “RNA-protein complex”. Therefore, RIN is of little practical use in downstream analysis such as adjusting the gene expression count.
* RIN has very limited sensitivity for measuring substantially degraded RNA samples such as preserved clinical tissues (ref: https://www.scribd.com/document/352764986/DV200-Technote-Truseq-Rna-Access).

To overcome these limitations, we developed TIN, an algorithm that measures RNA integrity at the transcript level. TIN calculates a score (0 <= TIN <= 100) for each expressed transcript; the medTIN (i.e. the median TIN score across all transcripts) can also be used to measure RNA integrity at the sample level. TIN is a useful metric for measuring RNA integrity both transcriptome-wide and per transcript, as demonstrated by its high concordance with both RIN and RNA fragment size (estimated from RNA-seq read pairs).

## Inputs

Input BAM/SAM file
    Alignment file in BAM/SAM format.

Reference gene model
    Gene model in BED format. Must be a standard 12-column BED file.

Minimum coverage
    Minimum number of reads mapped to a transcript (default is 10).

Sample size
    Number of equal-spaced nucleotide positions picked from mRNA. Note: if this number is larger than the length of mRNA (L), it will be halved until it’s smaller than L (default is 100).

Subtract background
    Subtract background noise (estimated from intronic reads). Only use this option if there are substantial intronic reads.

## Outputs

Text
    Table that includes the gene identifier (geneID), chromosome (chrom), transcript start (tx_start), transcript end (tx_end), and transcript integrity number (TIN).
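
As a rough illustration of how the per-transcript table relates to the sample-level medTIN, the following Python sketch reads a tab-separated table with the columns listed above and takes the median of its TIN column. The function name `median_tin` is hypothetical and not part of RSeQC, and the tab-separated layout with a header row is an assumption; tin.py may apply its own filtering when it writes the summary file, so the result need not match the reported median exactly.

```python
import csv
import io
import statistics


def median_tin(handle):
    """Return the median of the TIN column of a tab-separated TIN table.

    Hypothetical helper for illustration only; it is not part of RSeQC.
    Assumes a header row that contains a column named "TIN".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    scores = [float(row["TIN"]) for row in reader]
    return statistics.median(scores)


# Three illustrative rows in the documented column order, written as
# tab-separated text (the delimiter is an assumption, see above).
example = (
    "geneID\tchrom\ttx_start\ttx_end\tTIN\n"
    "ABCC2\tchr10\t101542354\t101611949\t67.6446525761\n"
    "IPMK\tchr10\t59951277\t60027694\t86.383618429\n"
    "RUFY2\tchr10\t70100863\t70167051\t43.8967503948\n"
)

print(median_tin(io.StringIO(example)))
```

To run the same calculation on a real result, open the downloaded TIN table with `open(path, newline="")` and pass the file handle to `median_tin` in place of the `io.StringIO` object.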

Example output:

------  -----  ----------  ---------  -------------
geneID  chrom  tx_start    tx_end     TIN
------  -----  ----------  ---------  -------------
ABCC2   chr10  101542354   101611949  67.6446525761
IPMK    chr10  59951277    60027694   86.383618429
RUFY2   chr10  70100863    70167051   43.8967503948
------  -----  ----------  ---------  -------------

@ABOUT@

]]>
    </help>

    <expand macro="citations" />

</tool>