# HG changeset patch # User jjohnson # Date 1633970451 0 # Node ID 25d207f7ff836d0c07ce6b3886c4ec4c608ecac4 # Parent 7253b367c082ba2bc5fab6b2ab95dae0c69562b6 "planemo upload for repository https://github.com/jj-umn/tools-iuc/tree/arriba/tools/arriba commit e113a79cc67e0bdb168babfe964f34873b2e1303" diff -r 7253b367c082 -r 25d207f7ff83 arriba.xml --- a/arriba.xml Mon Oct 11 01:47:22 2021 +0000 +++ b/arriba.xml Mon Oct 11 16:40:51 2021 +0000 @@ -55,7 +55,7 @@ #end if #end if -a '$genome_assembly' - -g '$gtf' + -g '$annotation' #if $blacklist -b '$blacklist' #else @@ -155,57 +155,8 @@ && samtools index Aligned.sortedByCoord.out.bam #end if #if str($visualization.do_viz) == "yes" -&& draw_fusions.R - --fusions=fusions.tsv - --alignments=Aligned.sortedByCoord.out.bam - --annotation='$gtf' - --output=fusions.pdf - #if $visualization.cytobands - --cytobands='$visualization.cytobands' - #end if - #if $protein_domains - --proteinDomains='$protein_domains' - #end if - ## Visualization Options - #if $visualization.options.transcriptSelection - --transcriptSelection=$visualization.options.transcriptSelection - #end if - #if $visualization.options.minConfidenceForCircosPlot - --minConfidenceForCircosPlot=$visualization.options.minConfidenceForCircosPlot - #end if - #if $visualization.options.showIntergenicVicinity - --showIntergenicVicinity=$visualization.options.showIntergenicVicinity - #end if - #if $visualization.options.squishIntrons - --squishIntrons=$visualization.options.squishIntrons - #end if - #if $visualization.options.mergeDomainsOverlappingBy - --mergeDomainsOverlappingBy=$visualization.options.mergeDomainsOverlappingBy - #end if - #if $visualization.options.printExonLabels - --printExonLabels=$visualization.options.printExonLabels - #end if - #if $visualization.options.render3dEffect - --render3dEffect=$visualization.options.render3dEffect - #end if - #if $visualization.options.optimizeDomainColors - 
--optimizeDomainColors=$visualization.options.optimizeDomainColors - #end if - #if $visualization.options.color1 - --color1=$visualization.options.color1 - #end if - #if $visualization.options.color2 - --color2=$visualization.options.color2 - #end if - #if $visualization.options.pdfWidth - --pdfWidth=$visualization.options.pdfWidth - #end if - #if $visualization.options.pdfHeight - --pdfHeight=$visualization.options.pdfHeight - #end if - #if $visualization.options.fontSize - --fontSize=$visualization.options.fontSize - #end if +#set $fusions = 'fusions.tsv' +&& @DRAW_FUSIONS@ #end if ]]> @@ -243,7 +194,7 @@ - + @@ -433,120 +384,15 @@ - -
- - By default the transcript isoform with the highest coverage is drawn. - Alternatively, the transcript isoform that is provided in the columns - transcript_id1 and transcript_id2 in the given fusions file can be drawn. - Selecting the isoform with the highest coverage usually produces nicer plots, - in the sense that the coverage track is smooth and shows a visible increase in coverage after the fusion breakpoint. - However, the isoform with the highest coverage may not be the one that is involved in the fusion. - Often, genomic rearrangements lead to non-canonical isoforms being transcribed. - For this reason, it can make sense to rely on the transcript selection provided by the columns transcript_id1/2, - which reflect the actual isoforms involved in a fusion. -\ As a third option, the transcripts that are annotated as canonical can be drawn. - Transcript isoforms tagged with appris_principal, appris_candidate, or CCDS are considered canonical. - - - - - - - The fusion of interest is drawn as a solid line in the circos plot. - To give an impression of the overall degree of rearrangement, - all other fusions are drawn as semi-transparent lines in the background. - This option determines which other fusions should be included in the circos plot. - Values specify the minimum confidence a fusion must have to be included. - It usually makes no sense to include low-confidence fusions in circos plots, - because they are abundant and unreliable, and would clutter up the circos plot. - Default: medium - - - - - - - - This option only applies to intergenic breakpoints. - If it is set to a value greater than 0, then the script draws the genes - which are no more than the given distance away from an intergenic breakpoint. - Note that this option is incompatible with squishIntrons. - Default: 0 - - - - Exons usually make up only a small fraction of a gene. - They may be hard to see in the plot. 
i - Since introns are in most situations of no interest in the context of gene fusions, - this switch can be used to shrink the size of introns to a fixed, negligible size. - It makes sense to disable this feature, if breakpoints in introns are of importance. - Default: TRUE - - - - - - - Occasionally, domains are annotated redundantly. - For example, tyrosine kinase domains are frequently annotated as - Protein tyrosine kinase and Protein kinase domain. - In order to simplify the visualization, such domains can be merged into one, - given that they overlap by the given fraction. - The description of the larger domain is used. - Default: 0.9 - - - - By default the number of an exon is printed inside each exon, - which is taken from the attribute exon_number of the GTF annotation. - When a gene has many exons, the boxes may be too narrow to contain the labels, - resulting in unreadable exon labels. In these situations, i - it may be better to turn off exon labels. - Default: TRUE - - - - - - Whether light and shadow should be rendered to give objects a 3D effect. - Default: TRUE - - - - - - By default, the script colorizes domains according to the colors - specified in the file given in --annotation. - This way, coloring of domains is consistent across all proteins. - But since there are more distinct domains than colors, - this can lead to different domains having the same color. - If this option is set to TRUE, the colors are recomputed for each fusion separately. - This ensures that the colors have the maximum distance for each individual fusion, - but they are no longer consistent across different fusions. - Default: FALSE - - - - - - - - - -
- +
- - + + output_fusions_discarded == "yes" @@ -564,13 +410,13 @@ - + - + @@ -583,15 +429,22 @@ - + + + - + + + + + + diff -r 7253b367c082 -r 25d207f7ff83 arriba_draw_fusions.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/arriba_draw_fusions.xml Mon Oct 11 16:40:51 2021 +0000 @@ -0,0 +1,282 @@ + + + + macros.xml + + + + + + + + + + +
+ +
+
+ + + visualization['do_viz'] == "yes" + + + + + + + + + + +
+ +
+ + + + + +
+
+ + +
diff -r 7253b367c082 -r 25d207f7ff83 macros.xml --- a/macros.xml Mon Oct 11 01:47:22 2021 +0000 +++ b/macros.xml Mon Oct 11 16:40:51 2021 +0000 @@ -1,7 +1,6 @@ 2.1.0 0 -dd arriba @@ -17,4 +16,164 @@ arriba -h | grep Version | sed 's/^.* //' + + +
+ + By default the transcript isoform with the highest coverage is drawn. + Alternatively, the transcript isoform that is provided in the columns + transcript_id1 and transcript_id2 in the given fusions file can be drawn. + Selecting the isoform with the highest coverage usually produces nicer plots, + in the sense that the coverage track is smooth and shows a visible increase in coverage after the fusion breakpoint. + However, the isoform with the highest coverage may not be the one that is involved in the fusion. + Often, genomic rearrangements lead to non-canonical isoforms being transcribed. + For this reason, it can make sense to rely on the transcript selection provided by the columns transcript_id1/2, + which reflect the actual isoforms involved in a fusion. + As a third option, the transcripts that are annotated as canonical can be drawn. + Transcript isoforms tagged with appris_principal, appris_candidate, or CCDS are considered canonical. + + + + + + + The fusion of interest is drawn as a solid line in the circos plot. + To give an impression of the overall degree of rearrangement, + all other fusions are drawn as semi-transparent lines in the background. + This option determines which other fusions should be included in the circos plot. + Values specify the minimum confidence a fusion must have to be included. + It usually makes no sense to include low-confidence fusions in circos plots, + because they are abundant and unreliable, and would clutter up the circos plot. + Default: medium + + + + + + + + This option only applies to intergenic breakpoints. + If it is set to a value greater than 0, then the script draws the genes + which are no more than the given distance away from an intergenic breakpoint. + Note that this option is incompatible with squishIntrons. + Default: 0 + + + + Exons usually make up only a small fraction of a gene. + They may be hard to see in the plot.
+ Since introns are in most situations of no interest in the context of gene fusions, + this switch can be used to shrink the size of introns to a fixed, negligible size. + It makes sense to disable this feature, if breakpoints in introns are of importance. + Default: TRUE + + + + + + + Occasionally, domains are annotated redundantly. + For example, tyrosine kinase domains are frequently annotated as + Protein tyrosine kinase and Protein kinase domain. + In order to simplify the visualization, such domains can be merged into one, + given that they overlap by the given fraction. + The description of the larger domain is used. + Default: 0.9 + + + + By default the number of an exon is printed inside each exon, + which is taken from the attribute exon_number of the GTF annotation. + When a gene has many exons, the boxes may be too narrow to contain the labels, + resulting in unreadable exon labels. In these situations, + it may be better to turn off exon labels. + Default: TRUE + + + + + + Whether light and shadow should be rendered to give objects a 3D effect. + Default: TRUE + + + + + + By default, the script colorizes domains according to the colors + specified in the file given in --annotation. + This way, coloring of domains is consistent across all proteins. + But since there are more distinct domains than colors, + this can lead to different domains having the same color. + If this option is set to TRUE, the colors are recomputed for each fusion separately. + This ensures that the colors have the maximum distance for each individual fusion, + but they are no longer consistent across different fusions. + Default: FALSE + + + + + + + + + +
+
+ +draw_fusions.R + --fusions='$fusions' + --alignments='Aligned.sortedByCoord.out.bam' + --annotation='$annotation' + --output=fusions.pdf + #if $visualization.cytobands + --cytobands='$visualization.cytobands' + #end if + #if $protein_domains + --proteinDomains='$protein_domains' + #end if + ## Visualization Options + #if $visualization.options.transcriptSelection + --transcriptSelection=$visualization.options.transcriptSelection + #end if + #if $visualization.options.minConfidenceForCircosPlot + --minConfidenceForCircosPlot=$visualization.options.minConfidenceForCircosPlot + #end if + #if $visualization.options.showIntergenicVicinity + --showIntergenicVicinity=$visualization.options.showIntergenicVicinity + #end if + #if $visualization.options.squishIntrons + --squishIntrons=$visualization.options.squishIntrons + #end if + #if $visualization.options.mergeDomainsOverlappingBy + --mergeDomainsOverlappingBy=$visualization.options.mergeDomainsOverlappingBy + #end if + #if $visualization.options.printExonLabels + --printExonLabels=$visualization.options.printExonLabels + #end if + #if $visualization.options.render3dEffect + --render3dEffect=$visualization.options.render3dEffect + #end if + #if $visualization.options.optimizeDomainColors + --optimizeDomainColors=$visualization.options.optimizeDomainColors + #end if + #if $visualization.options.color1 + --color1=$visualization.options.color1 + #end if + #if $visualization.options.color2 + --color2=$visualization.options.color2 + #end if + #if $visualization.options.pdfWidth + --pdfWidth=$visualization.options.pdfWidth + #end if + #if $visualization.options.pdfHeight + --pdfHeight=$visualization.options.pdfHeight + #end if + #if $visualization.options.fontSize + --fontSize=$visualization.options.fontSize + #end if +
diff -r 7253b367c082 -r 25d207f7ff83 test-data/fusions.tsv --- a/test-data/fusions.tsv Mon Oct 11 01:47:22 2021 +0000 +++ b/test-data/fusions.tsv Mon Oct 11 16:40:51 2021 +0000 @@ -1,2 +1,2 @@ #gene1 gene2 strand1(gene/fusion) strand2(gene/fusion) breakpoint1 breakpoint2 site1 site2 type split_reads1 split_reads2 discordant_mates coverage1 coverage2 confidence reading_frame tags retained_protein_domains closest_genomic_breakpoint1 closest_genomic_breakpoint2 gene_id1 gene_id2 transcript_id1 transcript_id2 direction1 direction2 filters fusion_transcript peptide_sequence read_identifiers -BCR ABL1 +/+ +/+ 22:23632600 9:133729451 CDS/splice-site CDS/splice-site translocation 4 7 0 4 12 high in-frame Mitelman Bcr-Abl_oncoprotein_oligomerisation_domain(100%),C2_domain(100%),PH_domain(100%),RhoGEF_domain(100%)|F-actin_binding(100%),Protein_kinase_domain(100%),SH2_domain(100%),SH3_domain(100%),Variant_SH3_domain(100%) . . ENSG00000186716.15 ENSG00000097007.13 ENST00000305877.8 ENST00000372348.2 downstream upstream . AGCTTCTCCCTGACATCCGTGGAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTGTCCACAGCATTCCGCTGACCATCAATAAGGAAG___ATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAA|AAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAG___GTGAAAAGCTCCGGG SFSLTSVELQMLTNSCVKLQTVHSIPLTINKEDDESPGLYGFLNVIVHSATGFKQSS|kALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLR BCR-ABL1-10,BCR-ABL1-2,BCR-ABL1-24,BCR-ABL1-28,BCR-ABL1-58,BCR-ABL1-60,BCR-ABL1-76,BCR-ABL1-12,BCR-ABL1-18,BCR-ABL1-4,BCR-ABL1-66 +BCR ABL1 +/+ +/+ 22:230999 9:275100 CDS/splice-site CDS/splice-site translocation 1 3 0 3 8 low in-frame . Bcr-Abl_oncoprotein_oligomerisation_domain(100%),C2_domain(100%),RhoGEF_domain(100%)|F-actin_binding(100%),Protein_kinase_domain(100%),SH2_domain(100%),SH3_domain(100%) . . 
ENSG00000186716 ENSG00000097007 ENST00000305877 ENST00000372348 downstream upstream . AGCTTCTCCCTGACATCCGTGGAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTGTCCACAGCATTCCGCTGACCATCAATAAGGAAG___ATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAA|AAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAG___GTGAAAAGCTCCGGG SFSLTSVELQMLTNSCVKLQTVHSIPLTINKEDDESPGLYGFLNVIVHSATGFKQSS|kALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLR BCR-ABL1-4,BCR-ABL1-28,BCR-ABL1-60,BCR-ABL1-76