# HG changeset patch # User geert-vandeweyer # Date 1601022657 0 # Node ID 5c324f9a4e20b3c3b2f9d409dbe6fb9fc5d8e090 # Parent febc6023d37bf9412e7cfebc06582a107164f6b5 Uploaded diff -r febc6023d37b -r 5c324f9a4e20 README.rst --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/README.rst Fri Sep 25 08:30:57 2020 +0000 @@ -0,0 +1,4 @@ +varAmpliCNV +=========== + +Wrappers for the varAmpliCNV package for HaloPlex CNV calling diff -r febc6023d37b -r 5c324f9a4e20 VarAmpliCNV_Anno.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/VarAmpliCNV_Anno.xml Fri Sep 25 08:30:57 2020 +0000 @@ -0,0 +1,51 @@ + + + cmgantwerpen/varamplicnv:1.0.0 + + + + + + + + + + + + + +**VarAmpliCNV : BED file PreProcessing** + +Preprocessing removes the coordinates of SNP (unwanted) amplicons as well as duplicate coordinates. It also adds region annotations, used during plotting, to the input regions of interest (ROI). + +**Parameters are :** + +* Amplicon Design File (BED) : This is the *exact* BED file provided by HaloPlex, containing the restriction fragments. +* ROI file (BED) : Typically the file provided *to* HaloPlex as the basis of the design. Names in column 4 are used to annotate CNV plots. +* Ignore list (BED) : (optional) Amplicons present in the amplicon design that should be excluded from the analysis. + +**Output files :** + +* DeDuplicated amplicon List (BED) : Use this file in subsequent steps (GC and Counting). +* ROI-Annotated List : This file is needed during CNV calling. + + diff -r febc6023d37b -r 5c324f9a4e20 VarAmpliCNV_CallCNVs.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/VarAmpliCNV_CallCNVs.xml Fri Sep 25 08:30:57 2020 +0000 @@ -0,0 +1,96 @@ + + + cmgantwerpen/varamplicnv:1.0.0 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +**VarAmpliCNV : Call CNVs** + +During CNV calling, read counts are normalized over all samples included during "count merging", a set fraction of variance is removed, and circular binary segmentation (CBS) is applied to identify CNVs. 
If specified, a post-processing step takes amplicon size and overlap into account to estimate the reliability of each event. Passing CNVs are plotted gene by gene. + +**Parameters are :** + +* Sample Amplicon Counts (RData) : The result of the 'Merge Counts' tool. It contains a raw sample-by-amplicon count matrix. +* Amplicon BED file (BED) : This is the *exact* BED file provided by HaloPlex for the library used, with duplicates removed using the VarAmpliCNV "Annotate" tool. +* ROI file (BED) : This is the *exact* BED file provided by HaloPlex for the library used, with duplicates removed and annotated with gene symbols using the VarAmpliCNV "Annotate" tool. +* Amplicon GC Content (txt) : GC content of individual amplicons, used for count correction. Generated with the VarAmpliCNV 'Amplicon GC-Content' tool. +* Sample Genders (txt) : Optional. If specified, gender-specific normalization sets are built for the X and Y chromosomes. The file is tab-separated: SampleName<tab>M/F/U +* Fraction of Variance to Remove : Using an approach similar to principal component analysis (PCA), a preset fraction of noise is removed from the data. Higher values typically result in fewer CNVs. +* Analysis Type : Direct Segmentation applies only CBS and does not plot results. Amplicon Overlap Filtering (AOF) is a post-processing filter intended to improve specificity, and it also enables plotting. The full CBS results are always returned for manual inspection. +* Thresholds : Set minimal values for LogR-based filtering of the called segments. + + +**Output files :** + +* Parameter_settings : Overview of set and derived settings, plus a list of discarded samples. +* Plots.Quality_Measures : Quality metrics: coverage, variance per principal component, GC-coverage correlation. +* Plots.Results : Gene-based CNV plots for segments passing the filters (if AOF is activated). +* Table.Results.Full : Full CBS results. +* Table.Results.Filtered : CBS results filtered on LogR. 
+ + diff -r febc6023d37b -r 5c324f9a4e20 VarAmpliCNV_Count.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/VarAmpliCNV_Count.xml Fri Sep 25 08:30:57 2020 +0000 @@ -0,0 +1,54 @@ + + + cmgantwerpen/varamplicnv:1.0.0 + + + + + + + + + + + + +**VarAmpliCNV : Counting** + +BAM files are parsed for read pairs that exactly match the specified amplicons, based on their start and end positions. + +**Parameters are :** + +* Amplicon Design File (BED) : The de-duplicated amplicon list, generated by "varAmpliCNV Annotate". +* Sample Data (BAM) : The sample read data, provided as a single BAM file or a collection of BAM files. + +**Output files :** + +* Count file (txt) : The amplicon read-count table. +* Unmapped (BAM) : Reads not matching any amplicon. This BAM file can be used to investigate issues. +* Stats (txt) : Overview of the matching performance. Use it to investigate issues. + +If a collection of BAM files is provided, the output files are grouped into collections as well. + + + diff -r febc6023d37b -r 5c324f9a4e20 VarAmpliCNV_GC.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/VarAmpliCNV_GC.xml Fri Sep 25 08:30:57 2020 +0000 @@ -0,0 +1,43 @@ + + + cmgantwerpen/varamplicnv:1.0.0 + + + + + + + + + + + + + +**VarAmpliCNV : GC-calculation** + +Calculate the GC content of entries in a BED file. + +**Parameters are :** + +* Amplicon Design File (BED) : The de-duplicated amplicon list, generated by "varAmpliCNV Annotate". +* Genome Build : Select one of the configured genome builds to extract GC content from. + +**Output files :** + +* GC-content (txt) : Tabular file containing GC information per region. 
+ + + diff -r febc6023d37b -r 5c324f9a4e20 VarAmpliCNV_MergeCounts.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/VarAmpliCNV_MergeCounts.xml Fri Sep 25 08:30:57 2020 +0000 @@ -0,0 +1,39 @@ + + + cmgantwerpen/varamplicnv:1.0.0 + + + + + + + + + + +**VarAmpliCNV : Merge Count files** + +Merge a list of count files from "varAmpliCNV Count" into a single cohort for CNV calling. + +**Parameters are :** + +* Count file (txt) : A list or collection of count files from "varAmpliCNV Count". + + +**Output files :** + +* Count-by-Sample Matrix (RData) : RData object containing counts for all provided samples. + + diff -r febc6023d37b -r 5c324f9a4e20 tool-data/TwoBit.loc.sample --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool-data/TwoBit.loc.sample Fri Sep 25 08:30:57 2020 +0000 @@ -0,0 +1,9 @@ +# This file lists 2bit indices used for GC computation in VarAmpliCNV + +# Whitespace must be a TAB! + +# Make sure to add the path to docker_volumes! + +# +# +#hg19 hg19 Human: hg19/GRCh37 /opt/NGS/References/hg19/2bit/hg19.2bit diff -r febc6023d37b -r 5c324f9a4e20 tool_data_table_conf.xml.sample --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool_data_table_conf.xml.sample Fri Sep 25 08:30:57 2020 +0000 @@ -0,0 +1,7 @@ + + + + short, dbkey, name, value + +
+
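The Counting tool added in this patch parses BAM files for read pairs whose start and end positions exactly match an amplicon. The Python sketch below illustrates only that exact-match rule; it is not the varAmpliCNV implementation. It assumes, hypothetically, that each read pair has already been reduced to the fragment it spans, given as a `(chrom, start, end)` tuple in the same 0-based, half-open coordinates as the de-duplicated BED file. The names `count_amplicons` and `format_count_table`, and the four-column output layout, are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end), assumed 0-based half-open as in BED.
Interval = Tuple[str, int, int]


def count_amplicons(
    amplicons: Iterable[Interval], fragments: Iterable[Interval]
) -> Tuple[Dict[Interval, int], List[Interval], Dict[str, int]]:
    """Hypothetical sketch: count fragments whose start AND end match an amplicon.

    Returns per-amplicon counts (in BED order), the unmatched fragments
    (analogous to the "Unmapped" output) and simple matching statistics
    (analogous to the "Stats" output).
    """
    # Exact matching is a hash lookup on the full coordinate tuple:
    # a fragment that is off by a single base on either end is not counted.
    counts: Dict[Interval, int] = {amp: 0 for amp in amplicons}
    unmatched: List[Interval] = []
    total = 0
    for frag in fragments:
        total += 1
        if frag in counts:
            counts[frag] += 1
        else:
            unmatched.append(frag)
    stats = {
        "total": total,
        "matched": total - len(unmatched),
        "unmatched": len(unmatched),
    }
    return counts, unmatched, stats


def format_count_table(counts: Dict[Interval, int]) -> str:
    """Render counts as TAB-separated lines: chrom, start, end, count (assumed layout)."""
    return "\n".join(f"{c}\t{s}\t{e}\t{n}" for (c, s, e), n in counts.items())


if __name__ == "__main__":
    amplicons = [("chr1", 100, 250), ("chr1", 300, 420)]
    fragments = [("chr1", 100, 250), ("chr1", 100, 250),
                 ("chr1", 300, 420), ("chr1", 101, 250)]
    counts, unmatched, stats = count_amplicons(amplicons, fragments)
    print(format_count_table(counts))
    print(stats)
```

Keying a dictionary on the full coordinate tuple makes the "exact start and end" rule explicit. Extracting fragments from a real BAM file (pairing mates, handling strand and clipping) is outside the scope of this sketch.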
diff -r febc6023d37b -r 5c324f9a4e20 varamplicnv-5bafb1c69d03/README.rst --- a/varamplicnv-5bafb1c69d03/README.rst Fri Sep 25 08:29:36 2020 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,4 +0,0 @@ -varAmpliCNV -=========== - -Wrappers for the varAmpliCNV package for HaloPlex CNV calling diff -r febc6023d37b -r 5c324f9a4e20 varamplicnv-5bafb1c69d03/VarAmpliCNV_Anno.xml --- a/varamplicnv-5bafb1c69d03/VarAmpliCNV_Anno.xml Fri Sep 25 08:29:36 2020 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,51 +0,0 @@ - - - cmgantwerpen/varamplicnv:1.0.0 - - - - - - - - - - - - - -**VarAmpliCNV : BED file PreProcessing** - -Preprocessing includes removing SNP (unwanted amplicons) coordinates and duplicate coordinates. It also adds Region annotations to input region of interest (ROI) used during plotting. - -**Parameters are :** - -* Amplicon Design File (BED) : This is the *exact* BED file provided by HaloPlex, containing the restriction fragments. -* ROI file (BED) : Typically the file provided *to* HaloPlex as the basis of the design. Names in column 4 are used to annotate CNV plots -* Ignore list (BED) : (optional) Provide amplicons present in amplicon design, to exclude during the analysis - -**Output files :** - -* DeDuplicated amplicon List (BED) : Use this file in subsequent steps (GC and Counting). -* ROI-Annotated List : This file is needed during CNV-calling. - - diff -r febc6023d37b -r 5c324f9a4e20 varamplicnv-5bafb1c69d03/VarAmpliCNV_CallCNVs.xml --- a/varamplicnv-5bafb1c69d03/VarAmpliCNV_CallCNVs.xml Fri Sep 25 08:29:36 2020 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,96 +0,0 @@ - - - cmgantwerpen/varamplicnv:1.0.0 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -**VarAmpliCNV : Call CNVs** - -During CNV calling read counts are normalized over all samples inluced during "count merging", a set fraction of variance is removed and circular binary segmentation is applied to identify CNVs. 
If specified, a post-processing step is applied to take amplicon size and overlap into account to estimate reliability of the event. Passing CNVs are plotted gene-by-gene. - -**Parameters are :** - -* Sample Amplicon Counts (RData) : The result from the 'Merge Counts' tool. It contains a raw sample-by-amplicon count matrix. -* Amplicon BED file (BED) : This is the *exact* BED file provided by HaloPlex for the used library, with duplicates removed using the VarAmpliCNV "Annotate" tool. -* ROI file (BED) : This is the *exact* BED file provided by HaloPlex for the used library, with duplicates removed and annotated with gene symbols using the VarAmpliCNV "Annotate" tool. -* Amplicon GC Content (txt) : GC-content of individual amplicons, used for count correction. Generated using VarAmpliCNV 'Amplicon GC-Content' tool. -* Sample Genders (txt) : Optional. If specified, build gender-specific normalization sets for X and Y chromosomes. Format is tab-separated : SamplenName<tab>M/F/U -* Fraction of Variance to Remove : Using an approach similar to Principal component analysis, a preset fraction of noise is removed from the data. Higher values typically result in less CNVs. -* Analysis Type : Direct Segmentation applies only CBS and will not plot results. Amplicon Overlap Filtering is a post-processing filter to improve specificity, and will also enable plotting. The full CBS-results are always returned for manual inspection. -* Thresholds : Set mininal values for LogR-based filtering of the called Segments. - - -**Output files :** - -* Parameter_settings : Overview of set and derived settings + a list of discarded samples -* Plots.Quality_Measures : Quality metrics: coverage, variance by PC, GC-coverage-correlation -* Plots.Results : Gene-based CNV plots for segments passing the filters (if AOF is activated) -* Table.Results.Full : Full CBS results -* Table.Results.Filtered : Filtered CBS results (on LogR). 
- - diff -r febc6023d37b -r 5c324f9a4e20 varamplicnv-5bafb1c69d03/VarAmpliCNV_Count.xml --- a/varamplicnv-5bafb1c69d03/VarAmpliCNV_Count.xml Fri Sep 25 08:29:36 2020 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,54 +0,0 @@ - - - cmgantwerpen/varamplicnv:1.0.0 - - - - - - - - - - - - -**VarAmpliCNV : Counting** - -BAM files are parsed for readpairs exactly matching specified amplicons, based on start and end position. - -**Parameters are :** - -* Amplicon Design File (BED) : The de-duplicated amplicon list, generated by "varAmpliCNV Annotate". -* Sample Data (BAM) : The sample read data, provided as a single BAM file, or a collection of BAM files. - -**Output files :** - -* Count file (txt): the amplicon-read table. -* Unmapped (BAM) : Reads not matching amplicons. This bam file can be used to investigate issues. -* Stats (txt): Overview of the matching performance. Use it to investigate issues. - -In case a collection of BAM files is provided, the output files will be grouped in collections as well. - - - diff -r febc6023d37b -r 5c324f9a4e20 varamplicnv-5bafb1c69d03/VarAmpliCNV_GC.xml --- a/varamplicnv-5bafb1c69d03/VarAmpliCNV_GC.xml Fri Sep 25 08:29:36 2020 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,43 +0,0 @@ - - - cmgantwerpen/varamplicnv:1.0.0 - - - - - - - - - - - - - -**VarAmpliCNV : GC-calculation** - -Calculate the GC-content of entries in a BED file. - -**Parameters are :** - -* Amplicon Design File (BED) : The de-duplicated amplicon list, generated by "varAmpliCNV Annotate". -* Genome Build : Select a genome build from the configured options to extract GC content from. - -**Output files :** - -* GC-content (txt) : Tabular file containing GC information per region. 
- - - diff -r febc6023d37b -r 5c324f9a4e20 varamplicnv-5bafb1c69d03/VarAmpliCNV_MergeCounts.xml --- a/varamplicnv-5bafb1c69d03/VarAmpliCNV_MergeCounts.xml Fri Sep 25 08:29:36 2020 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,39 +0,0 @@ - - - cmgantwerpen/varamplicnv:1.0.0 - - - - - - - - - - -**VarAmpliCNV : Merge Count files** - -Merge a list of count files from "varAmpliCNV Count" into a single cohort for CNV-calling. - -**Parameters are :** - -* Count file (txt) : A list or collection of count files from "varAmpliNCV Count". - - -**Output files :** - -* Count-by-Sample Matrix (Rdata) : Rdata object containing counts for all provided samples. - - diff -r febc6023d37b -r 5c324f9a4e20 varamplicnv-5bafb1c69d03/tool-data/TwoBit.loc.sample --- a/varamplicnv-5bafb1c69d03/tool-data/TwoBit.loc.sample Fri Sep 25 08:29:36 2020 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,9 +0,0 @@ -# This file lists 2bit indices used for GC-computation in VarAmpliCNV - -# white space is TAB ! - -# make sure to add the path to docker_volumes ! - -# -# -#hg19 hg19 Human: hg19/GRCh37 /opt/NGS/References/hg19/2bit/hg19.2bit diff -r febc6023d37b -r 5c324f9a4e20 varamplicnv-5bafb1c69d03/tool_data_table_conf.xml.sample --- a/varamplicnv-5bafb1c69d03/tool_data_table_conf.xml.sample Fri Sep 25 08:29:36 2020 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,7 +0,0 @@ - - - - short, dbkey, name, value - -
-
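The GC-calculation tool, `TwoBit.loc.sample` and `tool_data_table_conf.xml.sample` in this patch together describe a lookup. A genome build is chosen from a TAB-separated table with the columns `short, dbkey, name, value`, and the GC content of each BED region is then computed from that genome. The Python sketch below mirrors that flow under several assumptions, none of which comes from varAmpliCNV itself:

* Each non-comment loc line is parsed as exactly four TAB-separated fields, and lines starting with `#` are skipped.
* The genome is a plain in-memory `{chrom: sequence}` dict standing in for the 2bit file. Reading real 2bit files would need a separate library.
* GC content is computed as (G+C)/(A+C+G+T), ignoring `N` bases. The patch does not say how the tool treats `N`.

The function names `read_loc`, `gc_fraction` and `gc_per_region` are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple


def read_loc(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Parse TwoBit.loc-style lines (short, dbkey, name, value; TAB-separated)."""
    table: Dict[str, Dict[str, str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments, as in the sample file
        # Raises ValueError unless there are exactly four TAB-separated fields.
        short, dbkey, name, value = line.split("\t")
        table[short] = {"dbkey": dbkey, "name": name, "path": value}
    return table


def gc_fraction(seq: str) -> float:
    """(G+C)/(A+C+G+T); N and other symbols are ignored (assumption)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return gc / acgt if acgt else math.nan


def gc_per_region(
    genome: Dict[str, str], bed_lines: Iterable[str]
) -> List[Tuple[str, int, int, float]]:
    """GC fraction for each BED region; BED coordinates are 0-based, half-open."""
    results = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip BED header and comment lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        results.append((chrom, start, end, gc_fraction(genome[chrom][start:end])))
    return results
```

Because the `name` column of the sample entry ("Human: hg19/GRCh37") contains spaces, splitting strictly on TAB, as the sample file's comment requires, keeps that field intact.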