changeset 5:ee4a90760848 draft

planemo upload for repository https://github.com/COMBAT-TB/tb_variant_filter commit e064fb07acad057d3df849a6f153ed6ef90837f1
author iuc
date Sun, 18 Dec 2022 17:39:30 +0000
parents e7aff4a85df5
children 32f14a2723ec
files tb_variant_filter.xml
diffstat 1 files changed, 17 insertions(+), 14 deletions(-)
--- a/tb_variant_filter.xml	Sun Aug 29 21:46:17 2021 +0000
+++ b/tb_variant_filter.xml	Sun Dec 18 17:39:30 2022 +0000
@@ -1,7 +1,7 @@
-<tool id="tb_variant_filter" name="TB Variant Filter" version="@TOOL_VERSION@+galaxy2" profile="20.09">
+<tool id="tb_variant_filter" name="TB Variant Filter" version="@TOOL_VERSION@+galaxy0" profile="20.09">
     <description>M. tuberculosis H37Rv VCF filter</description>
     <macros>
-        <token name="@TOOL_VERSION@">0.3.5</token>
+        <token name="@TOOL_VERSION@">0.3.6</token>
     </macros>
     <requirements>
         <requirement type="package" version="@TOOL_VERSION@">tb_variant_filter</requirement>
@@ -12,7 +12,7 @@
             #if str($filter_options.show_filter_options) == "yes":
                 --region_filter $filter_options.region_filter
             #else
-                --region_filter pe_ppe,uvp
+                --region_filter farhat_rlc
             #end if
         #end if
         #if "close_to_indel_filter" in str($filters).split(',')
@@ -57,11 +57,12 @@
             <when value="yes">
                 <param argument="--region_filter" type="select" multiple="true" label="Region filters to enable">
                     <!-- if these are changed the code above needs to change to keep the defaults in line with those that are default here -->
-                    <option value="farhat_rlc">Refined Low Confidence regions from Farhat lab</option>
-                    <option value="pe_ppe" selected="true">PE/PPE</option>
+                    <option value="farhat_rlc" selected="true">Refined Low Confidence regions from Farhat lab</option>
+                    <option value="farhat_rlc_lowmap">Refined Low Confidence and Low Mappability regions from Farhat lab (for &lt; 100bp or single ended reads)</option>
+                    <option value="pe_ppe">PE/PPE</option>
                     <option value="tbprofiler">TBProfiler antibiotic resistant genes</option>
                     <option value="mtbseq">MTBseq antibiotic resistant genes</option>
-                    <option value="uvp" selected="true">UVP repeat / insertion sequence sites</option>
+                    <option value="uvp">UVP repeat / insertion sequence sites</option>
                 </param>
                 <param argument="--indel_window_size" type="integer" value="5" label="Window to mask around indels"/>
                 <param argument="--min_percentage_alt" type="float" value="90"
@@ -145,19 +146,21 @@
 It currently has 5 main modes:
 
 1. Filter by region. Mask out variants in certain regions. Region lists available as:
-    1.  Refined Low Confidence (RLC) regions from `Marin et al 2021 <https://www.biorxiv.org/content/10.1101/2021.04.08.438862v1.full>`_
-    2.  PE/PPE genes from `Fishbein et al 2015 <https://onlinelibrary.wiley.com/doi/full/10.1111/mmi.12981>`_
-    3. `TBProfiler <http://tbdr.lshtm.ac.uk/>`_ list of antibiotic resistant genes
-    4. `MTBseq <https://github.com/ngs-fzb/MTBseq_source>`_ list of antibiotic resistant genes
-    5. `UVP <https://github.com/CPTR-ReSeqTB/UVP>`_ list of repetitive loci in M. tuberculosis genome
+    1.  Refined Low Confidence (RLC) regions from `Marin et al 2022 <https://doi.org/10.1093/bioinformatics/btac023>`_
+    2.  Refined Low Confidence (RLC) and Low Mappability regions from `Marin et al 2022 <https://doi.org/10.1093/bioinformatics/btac023>`_
+    3.  PE/PPE genes from `Fishbein et al 2015 <https://onlinelibrary.wiley.com/doi/full/10.1111/mmi.12981>`_
+    4. `TBProfiler <http://tbdr.lshtm.ac.uk/>`_ list of antibiotic resistant genes
+    5. `MTBseq <https://github.com/ngs-fzb/MTBseq_source>`_ list of antibiotic resistant genes
+    6. `UVP <https://github.com/CPTR-ReSeqTB/UVP>`_ list of repetitive loci in M. tuberculosis genome
 2. Filter by window around indels. Masks out variants within a certain distance (by default 5 bases) of an insertion or deletion site.
 3. Filter by percentage of alternate allele bases. Mask out variants with less than a minimum percentage (by default 90%) alternative alleles.
 4. Filter by depth of aligned reads.
 5. Filter out all variants that are not SNV (single nucleotide variants).
 
-For region filtering, the default choice is to use the PE/PPE and UVP regions to mask out variants. `Marin et al 2021 <https://www.biorxiv.org/content/10.1101/2021.04.08.438862v1.full>`_
-from Prof Maha Farhat's lab make a persuasive argument that their smaller list of Refined Low Confidence (RLC) regions is a better argument but this work has not yet been peer
-review so it is included as an option that is not currently the default.
+For region filtering, the default choice is to use the RLC regions. These are based on `Marin et al 2022 <https://doi.org/10.1093/bioinformatics/btac023>`_,
+a study of regions of the M. tuberculosis H37Rv genome where Illumina reads don't map well. If you are using reads shorter than 100 base pairs
+or single-ended reads, you should use the RLC and Low Mappability region filter. The PE/PPE and UVP region filters are retained for backward compatibility,
+but the aforementioned paper has shown that they exclude too much of the genome from analysis.
 
 When used together the effects of the filters are added (i.e. a variant is masked out if it is masked by any of the filters).
     ]]></help>
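
Note on the combining rule in the help text above: a variant is masked if any enabled filter masks it. The following is a minimal sketch of that rule only, not the implementation used by tb_variant_filter. The `Variant` type, the function names, and the region coordinates are all hypothetical and chosen for illustration.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Variant(NamedTuple):
    """Hypothetical, simplified variant record (not the tool's data model)."""
    pos: int         # 1-based position on the reference
    ref: str         # reference allele
    alt: str         # alternate allele
    pct_alt: float   # percentage of reads supporting the alternate allele


# A mask function returns True when the variant should be masked out.
MaskFn = Callable[[Variant], bool]


def in_regions(regions: Sequence[Tuple[int, int]]) -> MaskFn:
    """Mask variants that fall inside any (start, end) interval, inclusive."""
    def mask(v: Variant) -> bool:
        return any(start <= v.pos <= end for start, end in regions)
    return mask


def low_alt_percentage(min_pct: float = 90.0) -> MaskFn:
    """Mask variants with less than min_pct alternate allele support."""
    def mask(v: Variant) -> bool:
        return v.pct_alt < min_pct
    return mask


def not_snv(v: Variant) -> bool:
    """Mask anything that is not a single nucleotide variant."""
    return len(v.ref) != 1 or len(v.alt) != 1


def apply_filters(variants: Sequence[Variant], filters: Sequence[MaskFn]) -> List[Variant]:
    """Keep a variant only if no filter masks it (the filters' effects are added)."""
    return [v for v in variants if not any(mask(v) for mask in filters)]


variants = [
    Variant(pos=150, ref="A", alt="G", pct_alt=99.0),   # inside the made-up region
    Variant(pos=300, ref="C", alt="T", pct_alt=95.0),   # passes every filter
    Variant(pos=400, ref="G", alt="A", pct_alt=50.0),   # too little alt support
    Variant(pos=500, ref="A", alt="AT", pct_alt=99.0),  # insertion, not an SNV
]
# The interval (100, 200) is an invented placeholder, not a real RLC region.
filters = [in_regions([(100, 200)]), low_alt_percentage(90.0), not_snv]
kept = apply_filters(variants, filters)
print([v.pos for v in kept])
```

Because the filters are combined with `any`, enabling an additional filter can only remove more variants, never restore one that another filter has already masked.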