
Changeset 0:5ebf2354cc9b (2021-10-07)

Next changeset 1:9f2665b32c45 (2021-10-08)

Commit message:
"planemo upload for repository https://github.com/jj-umn/tools-iuc/tree/arriba/tools/arriba commit 52c9f9825debe783339c13bd1da9a42b59747bd2"

added:
arriba.help
arriba.xml
macros.xml

diff -r 000000000000 -r 5ebf2354cc9b arriba.help
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/arriba.help Thu Oct 07 11:47:02 2021 +0000

@@ -0,0 +1,191 @@
+% arriba -h
+[2021-10-06T19:04:33] Launching Arriba 2.1.0
+
+Arriba gene fusion detector
+---------------------------
+Version: 2.1.0
+
+Arriba is a fast tool to search for aberrant transcripts such as gene fusions.
+It is based on chimeric alignments found by the STAR RNA-Seq aligner.
+
+Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
+ -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
+ [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
+ -o fusions.tsv [-O fusions.discarded.tsv] \
+ [OPTIONS]
+
+ -c FILE File in SAM/BAM/CRAM format with chimeric alignments as generated by STAR
+ (Chimeric.out.sam). This parameter is only required, if STAR was run with the
+ parameter '--chimOutType SeparateSAMold'. When STAR was run with the parameter
+ '--chimOutType WithinBAM', it suffices to pass the parameter -x to Arriba and -c
+ can be omitted.
+
+ -x FILE File in SAM/BAM/CRAM format with main alignments as generated by STAR
+ (Aligned.out.sam). Arriba extracts candidate reads from this file.
+
+ -g FILE GTF file with gene annotation. The file may be gzip-compressed.
+
+ -G GTF_FEATURES Comma-/space-separated list of names of GTF features.
+ Default: gene_name=gene_name|gene_id gene_id=gene_id
+ transcript_id=transcript_id feature_exon=exon feature_CDS=CDS
+
+ -a FILE FastA file with genome sequence (assembly). The file may be gzip-compressed. An
+ index with the file extension .fai must exist only if CRAM files are processed.
+
+ -b FILE File containing blacklisted events (recurrent artifacts and transcripts
+ observed in healthy tissue).
+
+ -k FILE File containing known/recurrent fusions. Some cancer entities are often
+ characterized by fusions between the same pair of genes. In order to boost
+ sensitivity, a list of known fusions can be supplied using this parameter. The list
+ must contain two columns with the names of the fused genes, separated by tabs.
+
+ -o FILE Output file with fusions that have passed all filters.
+
+ -O FILE Output file with fusions that were discarded due to filtering.
+
+ -t FILE Tab-separated file containing fusions to annotate with tags in the 'tags' column.
+ The first two columns specify the genes; the third column specifies the tag. The
+ file may be gzip-compressed.
+
+ -p FILE File in GFF3 format containing coordinates of the protein domains of genes. The
+ protein domains retained in a fusion are listed in the column
+ 'retained_protein_domains'. The file may be gzip-compressed.
+
+ -d FILE Tab-separated file with coordinates of structural variants found using
+ whole-genome sequencing data. These coordinates serve to increase sensitivity
+ towards weakly expressed fusions and to eliminate fusions with low evidence.
+
+ -D MAX_GENOMIC_BREAKPOINT_DISTANCE When a file with genomic breakpoints obtained via
+ whole-genome sequencing is supplied via the -d
+ parameter, this parameter determines how far a
+ genomic breakpoint may be away from a
+ transcriptomic breakpoint to consider it as a
+ related event. For events inside genes, the
+ distance is added to the end of the gene; for
+ intergenic events, the distance threshold is
+ applied as is. Default: 100000
+
+ -s STRANDEDNESS Whether a strand-specific protocol was used for library preparation,
+ and if so, the type of strandedness (auto/yes/no/reverse). When
+ unstranded data is processed, the strand can
..
to a short stretch in one of the genes. The
+ 'short_anchor' filter removes these fusions. This parameter sets
+ the threshold in bp for what the filter considers short. Default: 23
+
+ -M MANY_SPLICED_EVENTS The 'many_spliced' filter recovers fusions between genes that
+ have at least this many spliced breakpoints. Default: 4
+
+ -K MAX_KMER_CONTENT The 'low_entropy' filter removes reads with repetitive 3-mers. If
+ the 3-mers make up more than the given fraction of the sequence, then
+ the read is discarded. Default: 0.600000
+
+ -V MAX_MISMATCH_PVALUE The 'mismatches' filter uses a binomial model to calculate a
+ p-value for observing a given number of mismatches in a read. If
+ the number of mismatches is too high, the read is discarded.
+ Default: 0.010000
+
+ -F FRAGMENT_LENGTH When paired-end data is given, the fragment length is estimated
+ automatically and this parameter has no effect. But when single-end
+ data is given, the mean fragment length should be specified to
+ effectively filter fusions that arise from hairpin structures.
+ Default: 200
+
+ -U MAX_READS Subsample fusions with more than the given number of supporting reads. This
+ improves performance without compromising sensitivity, as long as the
+ threshold is high. Counting of supporting reads beyond the threshold is
+ inaccurate, obviously. Default: 300
+
+ -Q QUANTILE Highly expressed genes are prone to produce artifacts during library
+ preparation. Genes with an expression above the given quantile are eligible
+ for filtering by the 'in_vitro' filter. Default: 0.998000
+
+ -e EXONIC_FRACTION The breakpoints of false-positive predictions of intragenic events
+ are often both in exons. True predictions are more likely to have at
+ least one breakpoint in an intron, because introns are larger. If the
+ fraction of exonic sequence between two breakpoints is smaller than
+ the given fraction, the 'intragenic_exonic' filter discards the
+ event. Default: 0.330000
+
+ -T TOP_N Only report viral integration sites of the top N most highly expressed viral
+ contigs. Default: 5
+
+ -C COVERED_FRACTION Ignore virally associated events if the virus is not fully
+ expressed, i.e., less than the given fraction of the viral contig is
+ transcribed. Default: 0.150000
+
+ -l MAX_ITD_LENGTH Maximum length of internal tandem duplications. Note: Increasing
+ this value beyond the default can impair performance and lead to many
+ false positives. Default: 100
+
+ -u Instead of performing duplicate marking itself, Arriba relies on duplicate marking by a
+ preceding program using the BAM_FDUP flag. This makes sense when unique molecular
+ identifiers (UMI) are used.
+
+ -X To reduce the runtime and file size, by default, the columns 'fusion_transcript',
+ 'peptide_sequence', and 'read_identifiers' are left empty in the file containing
+ discarded fusion candidates (see parameter -O). When this flag is set, this extra
+ information is reported in the discarded fusions file.
+
+ -I If assembly of the fusion transcript sequence from the supporting reads is incomplete
+ (denoted as '...'), fill the gaps using the assembly sequence wherever possible.
+
+ -h Print help and exit.
+
+ Code repository: https://github.com/suhrig/arriba
+ Get help/report bugs: https://github.com/suhrig/arriba/issues
+ User manual: https://arriba.readthedocs.io/
+ Please cite: https://doi.org/10.1101/gr.257246.119
+
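
The -k description in arriba.help asks for a list with two tab-separated columns holding the names of the fused genes. A minimal sketch of writing such a file and checking its shape before handing it to Arriba follows. The gene names GENE_A to GENE_D and the file location are placeholders chosen for illustration, not values from this repository.

```shell
#!/bin/sh
# Hypothetical file location and gene names, used only for illustration.
known_fusions="${TMPDIR:-/tmp}/known_fusions.tsv"

# One fusion per line: two gene names separated by a single tab.
printf 'GENE_A\tGENE_B\n' >  "$known_fusions"
printf 'GENE_C\tGENE_D\n' >> "$known_fusions"

# Count lines that do not have exactly two tab-separated fields.
bad_lines=$(awk -F '\t' 'NF != 2 { n++ } END { print n + 0 }' "$known_fusions")
echo "malformed lines: $bad_lines"
```

A count other than 0 would point at lines with missing or extra columns, which is worth catching before a long Arriba run.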

diff -r 000000000000 -r 5ebf2354cc9b arriba.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/arriba.xml Thu Oct 07 11:47:02 2021 +0000

@@ -0,0 +1,242 @@
+<tool id="arriba" name="Arriba" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" python_template_version="3.5">
+    <description>detect gene fusions from STAR aligned RNA-Seq data</description>
+    <macros>
+        <import>macros.xml</import>
+    </macros>
+    <expand macro="requirements" />
+    <expand macro="version_command" />
+    <command detect_errors="exit_code"><![CDATA[
+    arriba
+        -x '$input'
+        #if $chimeric
+        -c '$chimeric'
+        #end if
+        -a '$genome_assembly'
+        -g '$gtf'
+        -b '$blacklist'
+        #if $protein_domains
+        -p '$protein_domains'
+        #end if
+        #if $known_fusions
+        -k '$known_fusions'
+        #end if
+        #if $tags
+        -t '$tags'
+        #end if
+        -o fusions.tsv
+        -O fusions.discarded.tsv
+    ]]></command>
+    <inputs>
+        <param name="input" argument="-x" type="data" format="sam,bam,cram" label="STAR Aligned.out.sam"/>
+        <param name="chimeric" argument="-c" type="data" format="sam,bam,cram" optional="true" label="STAR Chimeric.out.sam">
+            <help><![CDATA[ Only required if STAR was run with the parameter '--chimOutType SeparateSAMold' ]]></help>
+        </param>
+        <param name="genome_assembly" argument="-a" type="data" format="fasta" label="genome assembly fasta"/>
+        <param name="gtf" argument="-g" type="data" format="gtf" label="GTF file with gene annotation"/>
+        <param name="blacklist" argument="-b" type="data" format="tabular" label="File containing blacklisted ranges."/>
+        <param name="protein_domains" argument="-p" type="data" format="gff3" optional="true" label="File containing protein domains."/>
+        <param name="known_fusions" argument="-k" type="data" format="tabular" optional="true" label="File containing known fusions">
+            <help><![CDATA[ File with two tab-separated columns: five-prime region and three-prime region ]]></help>
+        </param>
+        <param name="tags" argument="-t" type="data" format="tabular" optional="true" label="File containing tag names for a fusion."/>
+    </inputs>
+    <outputs>
+        <data name="fusions" format="tabular" label="${tool.name} on ${on_string}: fusions.tsv" from_work_dir="fusions.tsv"/>
+        <data name="discarded" format="tabular" label="${tool.name} on ${on_string}: fusions.discarded.tsv" from_work_dir="fusions.discarded.tsv"/>
+    </outputs>
+    <help><![CDATA[
+
+arriba -h
+[2021-10-06T19:04:33] Launching Arriba 2.1.0
+
+Arriba gene fusion detector
+---------------------------
+Version: 2.1.0
+
+Arriba is a fast tool to search for aberrant transcripts such as gene fusions.
+It is based on chimeric alignments found by the STAR RNA-Seq aligner.
+
+Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
+ -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
+ [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
+ -o fusions.tsv [-O fusions.discarded.tsv] \
+ [OPTIONS]
+
+ -c FILE File in SAM/BAM/CRAM format with chimeric alignments as generated by STAR
+ (Chimeric.out.sam). This parameter is only required, if STAR was run with the
+ parameter '--chimOutType SeparateSAMold'. When STAR was run with the parameter
+ '--chimOutType WithinBAM', it suffices to pass the parameter -x to Arriba and -c
+ can be omitted.
+
+ -x FILE File in SAM/BAM/CRAM format with main alignments as generated by STAR
+ (Aligned.out.sam). Arriba extracts candidate reads from this file.
+
+ -g FILE GTF file with gene annotation. The file may be gzip-compressed.
+
+ -G GTF_FEATURES Comma-/space-separated list of names of GTF features.
+ Default: gene_name=gene_name|gene_id gene_id=gene_id
+ transcript_id=transcript_id feature_exon=exon feature_CDS=CDS
+
+ -a FILE FastA file with genome sequence (assembly). The file may be gzip-compressed. An
..
'short_anchor' filter removes these fusions. This parameter sets
+ the threshold in bp for what the filter considers short. Default: 23
+
+ -M MANY_SPLICED_EVENTS The 'many_spliced' filter recovers fusions between genes that
+ have at least this many spliced breakpoints. Default: 4
+
+ -K MAX_KMER_CONTENT The 'low_entropy' filter removes reads with repetitive 3-mers. If
+ the 3-mers make up more than the given fraction of the sequence, then
+ the read is discarded. Default: 0.600000
+
+ -V MAX_MISMATCH_PVALUE The 'mismatches' filter uses a binomial model to calculate a
+ p-value for observing a given number of mismatches in a read. If
+ the number of mismatches is too high, the read is discarded.
+ Default: 0.010000
+
+ -F FRAGMENT_LENGTH When paired-end data is given, the fragment length is estimated
+ automatically and this parameter has no effect. But when single-end
+ data is given, the mean fragment length should be specified to
+ effectively filter fusions that arise from hairpin structures.
+ Default: 200
+
+ -U MAX_READS Subsample fusions with more than the given number of supporting reads. This
+ improves performance without compromising sensitivity, as long as the
+ threshold is high. Counting of supporting reads beyond the threshold is
+ inaccurate, obviously. Default: 300
+
+ -Q QUANTILE Highly expressed genes are prone to produce artifacts during library
+ preparation. Genes with an expression above the given quantile are eligible
+ for filtering by the 'in_vitro' filter. Default: 0.998000
+
+ -e EXONIC_FRACTION The breakpoints of false-positive predictions of intragenic events
+ are often both in exons. True predictions are more likely to have at
+ least one breakpoint in an intron, because introns are larger. If the
+ fraction of exonic sequence between two breakpoints is smaller than
+ the given fraction, the 'intragenic_exonic' filter discards the
+ event. Default: 0.330000
+
+ -T TOP_N Only report viral integration sites of the top N most highly expressed viral
+ contigs. Default: 5
+
+ -C COVERED_FRACTION Ignore virally associated events if the virus is not fully
+ expressed, i.e., less than the given fraction of the viral contig is
+ transcribed. Default: 0.150000
+
+ -l MAX_ITD_LENGTH Maximum length of internal tandem duplications. Note: Increasing
+ this value beyond the default can impair performance and lead to many
+ false positives. Default: 100
+
+ -u Instead of performing duplicate marking itself, Arriba relies on duplicate marking by a
+ preceding program using the BAM_FDUP flag. This makes sense when unique molecular
+ identifiers (UMI) are used.
+
+ -X To reduce the runtime and file size, by default, the columns 'fusion_transcript',
+ 'peptide_sequence', and 'read_identifiers' are left empty in the file containing
+ discarded fusion candidates (see parameter -O). When this flag is set, this extra
+ information is reported in the discarded fusions file.
+
+ -I If assembly of the fusion transcript sequence from the supporting reads is incomplete
+ (denoted as '...'), fill the gaps using the assembly sequence wherever possible.
+
+ -h Print help and exit.
+
+ Code repository: https://github.com/suhrig/arriba
+ Get help/report bugs: https://github.com/suhrig/arriba/issues
+ User manual: https://arriba.readthedocs.io/
+ Please cite: https://doi.org/10.1101/gr.257246.119
+
+    ]]></help>
+    <expand macro="citations" />
+</tool>
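
The command template in arriba.xml is meant to pass -c, -p, -k and -t only when the corresponding optional input has been supplied. The same logic can be sketched in plain shell, which may help when checking the expected command line by hand. The function name build_arriba_args and the example file names are illustrative assumptions. The function only prints the command line and does not run Arriba.

```shell
#!/bin/sh
# Hypothetical helper: print an arriba command line, adding optional
# flags only for variables that hold a non-empty value.
build_arriba_args() {
    set -- arriba -x "$input"
    # Optional chimeric alignments (only needed for SeparateSAMold runs).
    [ -n "${chimeric:-}" ] && set -- "$@" -c "$chimeric"
    # Inputs the wrapper always passes.
    set -- "$@" -a "$genome_assembly" -g "$gtf" -b "$blacklist"
    # Remaining optional inputs, mirroring the #if blocks of the template.
    [ -n "${protein_domains:-}" ] && set -- "$@" -p "$protein_domains"
    [ -n "${known_fusions:-}" ]   && set -- "$@" -k "$known_fusions"
    [ -n "${tags:-}" ]            && set -- "$@" -t "$tags"
    set -- "$@" -o fusions.tsv -O fusions.discarded.tsv
    echo "$*"
}

# Placeholder file names; only the known-fusions list is supplied here.
input=Aligned.out.bam
genome_assembly=assembly.fa
gtf=annotation.gtf
blacklist=blacklists.tsv
chimeric=""
protein_domains=""
known_fusions=known_fusions.tsv
tags=""

cmd=$(build_arriba_args)
echo "$cmd"
```

With these values the printed command contains -k but none of -c, -p or -t, which is the behaviour the optional parameters in the wrapper are intended to have.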

diff -r 000000000000 -r 5ebf2354cc9b macros.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/macros.xml Thu Oct 07 11:47:02 2021 +0000

@@ -0,0 +1,20 @@
+<macros>
+    <token name="@TOOL_VERSION@">2.1.0</token>
+    <token name="@VERSION_SUFFIX@">0</token>
+
+    <xml name="requirements">
+        <requirements>
+            <requirement type="package" version="@TOOL_VERSION@">arriba</requirement>
+            <yield/>
+        </requirements>
+    </xml>
+    <xml name="citations">
+        <citations>
+            <citation type="doi">10.1101/gr.257246.119</citation>
+            <yield />
+        </citations>
+    </xml>
+    <xml name="version_command">
+        <version_command>arriba -h | grep Version | sed 's/^.* //'</version_command>
+    </xml>
+</macros>
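
The version_command macro pipes the Arriba help text through grep and sed so that only the version number remains. One way to check that pipeline without Arriba installed is to feed it the banner lines quoted in arriba.help. The function fake_arriba_help below is a stand-in for the real `arriba -h` call and is not part of the repository.

```shell
#!/bin/sh
# Stand-in for `arriba -h`: prints the banner lines recorded in arriba.help.
fake_arriba_help() {
    printf '%s\n' \
        '[2021-10-06T19:04:33] Launching Arriba 2.1.0' \
        '' \
        'Arriba gene fusion detector' \
        '---------------------------' \
        'Version: 2.1.0'
}

# Same filter as the macro: keep the line containing "Version", then
# strip everything up to and including the last space on that line.
version=$(fake_arriba_help | grep Version | sed 's/^.* //')
echo "$version"
```

The "Launching Arriba 2.1.0" line does not contain the word "Version", so grep keeps only the "Version: 2.1.0" line and the pipeline yields a single value.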