cnv_facets: facets_analysis.xml comparison

comparison facets_analysis.xml @ 6:625038b7d764 draft

planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/main/tools/facets commit 8cced47697e5777fd60dacc60300e770bd409e9d

author	artbio
date	Mon, 06 Oct 2025 15:50:12 +0000
parents	1d56a6b5739f
children	86bcdc94b008


-:1d56a6b5739f
+:625038b7d764
 --output_vcf '$output_vcf'
 --cval $cval
 --min_nhet $min_nhet
 --snp_nbhd $snp_nbhd
 --gbuild '$gbuild'
+#if $merging.merge_select == "yes":
+--enable_merging
+--merge_gap_abs $merging.max_gap_abs
+--merge_gap_rel $merging.max_gap_rel
+#end if
 ]]></command>
 <inputs>
 <param name="pileup" type="data" format="tabular.gz" label="FACETS Pileup File" help="Output from the 'SNP Pileup for FACETS' tool."/>
 <param name="cval" type="float" value="150" label="Critical value for segmentation (cval)"
 <option value="hg19">Human (hg19)</option>
 <option value="hg18">Human (hg18)</option>
 <option value="mm10">Mouse (mm10)</option>
 <option value="mm9">Mouse (mm9)</option>
 </param>
 <param name="snp_nbhd" type="integer" value="300" label="SNP neighborhood size (snp.nbhd)" help="Should match the --pseudo-snps distance used to generate the pileup file. Default is 300."/>
+<conditional name="merging">
+<param name="merge_select" type="select" label="Post-process VCF to merge adjacent segments?" help="Optional step to merge adjacent CNV calls that likely represent a single biological event.">
+<option value="no" selected="true">No</option>
+<option value="yes">Yes</option>
+</param>
+<when value="no"/>
+<when value="yes">
+<param name="max_gap_abs" type="integer" value="1000000" label="Absolute maximum gap to merge (bp)" help="Maximum distance in base pairs allowed between two segments to consider them for merging."/>
+<param name="max_gap_rel" type="float" value="0.5" label="Relative maximum gap to merge (fraction)" help="Maximum relative distance, as a fraction of the average size of the two segments."/>
+</when>
+</conditional>
 </inputs>
 <outputs>
 <data name="output_seg" format="tsv" label="FACETS Segmentation on ${on_string}"/>
 <data name="output_summary" format="tabular" label="FACETS Summary on ${on_string}"/>
 <data name="output_plots" format="png" label="FACETS Plots on ${on_string}"/>
 <output name="output_plots" file="test_sample_01.plots.png" ftype="png" compare="sim_size" delta="20000"/>
 <output name="output_spider" file="test_sample_01.spider.png" ftype="png" compare="sim_size" delta="10000"/>
 <output name="output_vcf" file="test_sample_01.cnv.vcf" ftype="vcf" lines_diff="2" />
 </test>
 </tests>
 <help><![CDATA[
 **What it does**
-This tool runs the `FACETS` R package to perform allele-specific copy number and clonal heterogeneity analysis. It takes the compressed pileup file generated by the "SNP Pileup for FACETS" tool as its primary input.
+This tool runs the `FACETS` R package to perform allele-specific copy number
+and clonal heterogeneity analysis. It takes the compressed pileup file
+generated by the "SNP Pileup for FACETS" tool as its primary input and
+produces a set of standard FACETS outputs, including segmentation calls,
+purity/ploidy estimates, plots, and a VCF file summarizing the CNV events.
-**Outputs**
-- A **Segmentation file (TSV)** with the genomic coordinates of each segment and their associated copy number (TCN, LCN).
-- A **Summary file** with the main estimated parameters (purity, ploidy, etc.).
-- A **CNV calls file (VCF)** listing the detected copy number events in a standard VCF format.
-- A **Plots file (PNG)** with an enhanced visualization of the genome-wide results, including a legend for copy number states.
-- A **Spider Plot (PNG)** for diagnosing the quality of the purity/ploidy model fit.
-]]></help>
+---
+**Primary Parameters**
+These parameters control the core of the FACETS segmentation algorithm.
+- **Critical value for segmentation (cval):** This is the most important
+parameter for controlling the sensitivity of the segmentation. A *higher*
+value (e.g., 200-800) will result in fewer segments and is generally
+recommended for high-density data like Whole Genome Sequencing (WGS).
+A *lower* value (e.g., 50-150) increases sensitivity, resulting in more
+segments, and is more suitable for sparser data like Whole Exome
+Sequencing (WES).
+- **Minimum number of heterozygous SNPs (min.nhet):** This is a quality
+filter. After segmentation, any segment that is supported by fewer
+heterozygous SNPs than this threshold will be discarded. This helps
+to remove unreliable, small segments.
+- **SNP neighborhood size (snp.nbhd):** This parameter defines the genomic
+window (in bp) around a SNP used for local read depth normalization.
+The default value is generally appropriate.
+---
+**Advanced VCF Post-processing: Merging Segments**
+You can optionally enable a post-processing step to merge adjacent CNV
+segments in the output VCF.
+*Why is this useful?*
+Segmentation algorithms can sometimes split a single, large biological event
+(e.g., a 10 Mb deletion) into several smaller, adjacent segments with the
+same copy number state. This feature attempts to correct this by merging
+these segments back together, providing a cleaner and more biologically
+accurate representation of the CNV landscape.
+The merging is controlled by an algorithm using two thresholds:
+- **Absolute maximum gap:** The maximum distance in base pairs allowed
+between two segments to even consider them for merging. This acts as a
+safeguard.
+- **Relative maximum gap:** The maximum distance allowed, expressed as a
+*fraction* of the average size of the two segments. This allows large
+gaps between large segments, but not between small ones, trying to mimic
+how a human expert would interpret the data.
+---
+**Outputs**
+- **Segmentation file (TSV):** The raw segment data with genomic coordinates
+and their associated copy number (TCN, LCN).
+- **Summary file:** The main estimated parameters like purity, ploidy, etc.
+- **CNV calls file (VCF):** A summary of the detected copy number events in
+a standard VCF format, suitable for downstream analysis.
+- **Plots file (PNG):** An enhanced visualization of the genome-wide results.
+- **Spider Plot (PNG):** This is the most important **diagnostic plot** for
+assessing the quality of the FACETS fit.
+On this plot (generated by the `logRlogORspider` function), each
+**circle** is a genomic segment from your data. The **curves** (labeled
+`2-1`, `1-0`, etc.) represent the theoretical positions for integer copy
+number states given the estimated purity and ploidy. A high-confidence
+result is achieved when your data (the circles) align closely with these
+theoretical curves. For a detailed interpretation, please refer to the
+original FACETS publication: Shen and Seshan, *NAR*, 2016.
+]]></help>
 <expand macro="citations"/>
 </tool>
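
The two-threshold merging rule described in the new help text can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the `Segment` class, the function names, the requirement that both thresholds hold at once, and the restriction to segments with the same copy number state on the same chromosome are all assumptions made for the example. The defaults mirror the `max_gap_abs` and `max_gap_rel` values in the XML.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical segment record; not the tool's real data structure."""
    chrom: str
    start: int
    end: int
    state: str  # copy number state label, e.g. "1-0"


def can_merge(a, b, max_gap_abs=1_000_000, max_gap_rel=0.5):
    """Return True if b may be merged into a (a is assumed to precede b)."""
    if a.chrom != b.chrom or a.state != b.state:
        return False
    gap = max(0, b.start - a.end)
    mean_size = ((a.end - a.start) + (b.end - b.start)) / 2
    # Assumption: the gap must satisfy BOTH the absolute and relative limits.
    return gap <= max_gap_abs and gap <= max_gap_rel * mean_size


def merge_segments(segments, max_gap_abs=1_000_000, max_gap_rel=0.5):
    """Greedily merge adjacent compatible segments, left to right."""
    merged = []
    for seg in sorted(segments, key=lambda s: (s.chrom, s.start)):
        if merged and can_merge(merged[-1], seg, max_gap_abs, max_gap_rel):
            prev = merged[-1]
            merged[-1] = Segment(prev.chrom, prev.start,
                                 max(prev.end, seg.end), prev.state)
        else:
            merged.append(seg)
    return merged


example = [
    Segment("chr1", 0, 4_000_000, "1-0"),
    Segment("chr1", 4_500_000, 10_000_000, "1-0"),   # 0.5 Mb gap: merged
    Segment("chr1", 20_000_000, 21_000_000, "1-0"),  # 10 Mb gap: kept apart
    Segment("chr3", 0, 1_000, "1-0"),
    Segment("chr3", 2_000, 3_000, "1-0"),            # gap equals segment size
]
result = merge_segments(example)
```

In this example the first two segments merge because the 500 kb gap is below both the 1 Mb absolute limit and half of their 4.75 Mb average size. The two 1 kb segments on `chr3` stay separate: their 1 kb gap passes the absolute limit but exceeds the relative limit of 500 bp, which is the behaviour the help text describes for small segments.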

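The theoretical curves mentioned in the Spider Plot description can be approximated with the usual tumour/normal mixture model for allele-specific copy number. The sketch below is an assumption-laden illustration, not the code of `logRlogORspider`: the function names are hypothetical, curve labels such as `2-1` are assumed to read as total and minor copy number, and the log-ratio is expressed relative to a sample of average ploidy `ploidy` rather than to the diploid log-ratio offset that FACETS estimates.

```python
import math


def expected_logr(total_cn, purity, ploidy=2.0):
    """Expected log2 ratio for a segment with tumour total copy number total_cn."""
    numerator = 2 * (1 - purity) + purity * total_cn
    denominator = 2 * (1 - purity) + purity * ploidy
    return math.log2(numerator / denominator)


def expected_logor(total_cn, minor_cn, purity):
    """Expected log odds ratio at heterozygous SNPs (requires purity < 1 when minor_cn == 0)."""
    major_cn = total_cn - minor_cn
    return math.log((purity * major_cn + 1 - purity) /
                    (purity * minor_cn + 1 - purity))


# Trace the hypothetical "1-0" curve (one copy, no minor allele) over purity.
curve_1_0 = [
    (expected_logr(1, p / 10), expected_logor(1, 0, p / 10))
    for p in range(1, 10)
]
```

Under these assumptions a balanced diploid segment (`2-1`) sits at the origin for any purity, while a `1-0` segment moves further from the origin as purity increases, which is why circles lying on a curve are read as support for the fitted purity and ploidy.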