# HG changeset patch
# User iuc
# Date 1671385170 0
# Node ID ee4a907608481f48cdd41980fbdad61a1bbdc5f9
# Parent e7aff4a85df572773f0591f1cfabd6910d7aeb79
planemo upload for repository https://github.com/COMBAT-TB/tb_variant_filter commit e064fb07acad057d3df849a6f153ed6ef90837f1

diff -r e7aff4a85df5 -r ee4a90760848 tb_variant_filter.xml
--- a/tb_variant_filter.xml Sun Aug 29 21:46:17 2021 +0000
+++ b/tb_variant_filter.xml Sun Dec 18 17:39:30 2022 +0000
@@ -1,7 +1,7 @@
-
+
 M. tuberculosis H37Rv VCF filter
- 0.3.5
+ 0.3.6
 tb_variant_filter
@@ -12,7 +12,7 @@
 #if str($filter_options.show_filter_options) == "yes":
 --region_filter $filter_options.region_filter
 #else
- --region_filter pe_ppe,uvp
+ --region_filter farhat_rlc
 #end if
 #end if
 #if "close_to_indel_filter" in str($filters).split(',')
@@ -57,11 +57,12 @@
-
-
+
+
+
- + `_
- 2. PE/PPE genes from `Fishbein et al 2015 `_
- 3. `TBProfiler `_ list of antibiotic resistant genes
- 4. `MTBseq `_ list of antibiotic resistant genes
- 5. `UVP `_ list of repetitive loci in M. tuberculosis genome
+ 1. Refined Low Confidence (RLC) regions from `Marin et al 2022 `_
+ 2. Refined Low Confidence (RLC) and Low Mappability regions from `Marin et al 2022 `_
+ 3. PE/PPE genes from `Fishbein et al 2015 `_
+ 4. `TBProfiler `_ list of antibiotic resistant genes
+ 5. `MTBseq `_ list of antibiotic resistant genes
+ 6. `UVP `_ list of repetitive loci in M. tuberculosis genome
 2. Filter by window around indels. Masks out variants within a certain distance (by default 5 bases) of an insertion or deletion site.
 3. Filter by percentage of alternate allele bases. Mask out variants with less than a minimum percentage (by default 90%) alternative alleles.
 4. Filter by depth of aligned reads.
 5. Filter out all variants that are not SNV (single nucleotide variants).
-For region filtering, the default choice is to use the PE/PPE and UVP regions to mask out variants.
`Marin et al 2021 `_
-from Prof Maha Farhat's lab make a persuasive argument that their smaller list of Refined Low Confidence (RLC) regions is a better argument but this work has not yet been peer
-review so it is included as an option that is not currently the default.
+For region filtering, the default choice is to use the RLC regions. These are based on `Marin et al 2022 `_,
+a study of regions of the M. tuberculosis H37Rv genome where Illumina reads don't map well. If you are using reads shorter than 100 base pairs
+or single-ended reads, you should use the RLC and Low Mappability region filter. The PE/PPE and UVP region filters are retained for backward compatibility
+but the aforementioned paper has shown that they exclude too much of the genome from analysis.
 When used together the effects of the filters are added (i.e. a variant is masked out if it is masked by any of the filters).
 ]]>