Mercurial > repos > clifinder > clifinder
view README.rst @ 17:3e3370387441 draft
"planemo upload for repository https://github.com/GReD-Clermont/CLIFinder/ commit 94d5cab008ee4422a8ba63468532d3bb552abcd5"
author | clifinder |
---|---|
date | Fri, 14 Feb 2020 17:03:24 -0500 |
parents | feecd33c8390 |
children | f25d12179c6c |
line wrap: on
line source
.. image:: https://travis-ci.org/GReD-Clermont/CLIFinder.svg?branch=master :target: https://travis-ci.org/GReD-Clermont/CLIFinder CLIFinder v0.5.0 ================ Description ----------- L1 Chimeric Transcripts (LCTs) are transcribed from LINE 1 antisense promoter and include the L1 5’UTR sequence in antisense orientation followed by the adjacent genomic region. CLIFinder v0.4.1 is a Galaxy tool, specifically designed to identify potential LCTs from one or several oriented RNA-seq paired-end reads in the human genome. CLIFinder v0.4.1 is customizable to detect transcripts initiated by different types of repeat elements. Installation ------------ Some tools are used by CLIFinder that must be installed and added to the PATH (listed in CLIFinder.xml). This is easily done through conda with the command: :: conda create -n clifinder samtools=1.9 bedtools=2.26.0gx repeatmasker=4.0.9_p2 bwa=0.7.17 fastx_toolkit=0.0.14 perl=5.26.2 perl-getopt-long=2.50 perl-file-copy-recursive=0.45 perl-parallel-forkmanager=2.02 perl-statistics-r=0.34 r-base=3.5.1 r-plyr=1.8.5 bioconductor-genomicranges=1.34.0 wget=1.20.1 You should then be able to use CLIFinder by activating the conda environment and running the script with: :: conda activate clifinder perl script/CLIFinder.pl Galaxy uses conda to solve tool requirements starting with Galaxy release 16.01 so that is the minimum version required to install this tool (which can be done through the toolshed: https://toolshed.g2.bx.psu.edu/repository?repository_id=5c73d1cf20ab37c3). Usage ----- The command you need to use to run the script is as follows: :: CLIFinder.pl --first <first fastq of paired-end set 1> --name <name 1> --second <second fastq of paired-end set 1> [--first <first fastq of paired-end set 2> --name <name 2> --second <second fastq of paired-end set 2> ...] --ref <reference genome> [--build_ref] --TE <transposable elements> [--build_TE] --html <results.html> --html-path <results directory> [options] Arguments: --first <fastq file> First fastq file to process from paired-end set --name <name> Name of the content to process --second <fastq file> Second fastq file to process from paired-end set --ref <reference> Fasta file containing the reference genome --TE <TE> Fasta file containing the transposable elements --rmsk <text file> Tab-delimited text file (with headers) containing reference repeat sequences (e.g. rmsk track from UCSC) --refseq <text file> Tab-delimited file (with headers) containing reference genes (e.g. RefGene.txt from UCSC) --html Main HTML file where results will be displayed --html-path Folder where results will be stored For any fasta file, if a bwa index is not provided, you should build it through the corresponding '--build_[element]' argument Options: --rnadb <RNA db> Blast database containing RNA sequences (default: empty) --estdb <EST db> Blast database containing EST sequences (default: empty) --size_read <INT> Size of reads (default: 100) --BDir <0|1|2> Orientation of reads (0: undirectional libraries, 1: TEs sequences in first read in pair, 2: TEs sequences in second read in pair) (default: 0) --size_insert <INT> Maximum size of insert tolerated between R1 and R2 for alignment on the reference genome (default: 250) --min_L1 <INT> Minimum number of bp matching for L1 mapping (default: 50) --mis_L1 <INT> Maximum number of mismatches tolerated for L1 mapping (default: 2) --min_unique <INT> Minimum number of consecutive bp not annotated by RepeatMasker (default: 33) --threads <INT> Number of threads (default: 1) For Blast database files, if a fasta is provided, the database can be built with '--build_[db]'. Otherwise, provide a path or URL. "tar(.gz)" files are acceptable, as well as wild card (rna*) URLs.