cnv_facets: facets_analysis.xml comparison

comparison facets_analysis.xml @ 7:86bcdc94b008 draft

planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/main/tools/facets commit 2da49e9385ddce5c74e077c81a52ff1ea4131b81

author	artbio
date	Wed, 08 Oct 2025 17:41:18 +0000
parents	625038b7d764
children

comparison

-:625038b7d764
+:86bcdc94b008
 #if $merging.merge_select == "yes":
 --enable_merging
 --merge_gap_abs $merging.max_gap_abs
 --merge_gap_rel $merging.max_gap_rel
 #end if
+--vcf_min_nhet $filtering.vcf_min_nhet
+--vcf_min_num_mark $filtering.vcf_min_num_mark
 ]]></command>
 <inputs>
 <param name="pileup" type="data" format="tabular.gz" label="FACETS Pileup File" help="Output from the 'SNP Pileup for FACETS' tool."/>
 <param name="cval" type="float" value="150" label="Critical value for segmentation (cval)"
 <when value="yes">
 <param name="max_gap_abs" type="integer" value="1000000" label="Absolute maximum gap to merge (bp)" help="Maximum distance in base pairs allowed between two segments to consider them for merging."/>
 <param name="max_gap_rel" type="float" value="0.5" label="Relative maximum gap to merge (fraction)" help="Maximum relative distance, as a fraction of the average size of the two segments."/>
 </when>
 </conditional>
+<section name="filtering" title="VCF Output Filtering" expanded="false">
+<param name="vcf_min_nhet" type="integer" value="2" label="Minimum heterozygous SNPs for VCF output" help="Post-filter to remove final segments with fewer than this many heterozygous SNPs."/>
+<param name="vcf_min_num_mark" type="integer" value="3" label="Minimum total markers for VCF output" help="Post-filter to remove final segments with fewer than this many total markers (SNPs). Helps remove SVLEN=0 artifacts."/>
+</section>
 </inputs>
 <outputs>
 <data name="output_seg" format="tsv" label="FACETS Segmentation on ${on_string}"/>
 <data name="output_summary" format="tabular" label="FACETS Summary on ${on_string}"/>
 <data name="output_plots" format="png" label="FACETS Plots on ${on_string}"/>
 <output name="output_plots" file="test_sample_01.plots.png" ftype="png" compare="sim_size" delta="20000"/>
 <output name="output_spider" file="test_sample_01.spider.png" ftype="png" compare="sim_size" delta="10000"/>
 <output name="output_vcf" file="test_sample_01.cnv.vcf" ftype="vcf" lines_diff="2" />
 </test>
 </tests>
 <help><![CDATA[
 **What it does**
 This tool runs the `FACETS` R package to perform allele-specific copy number
 and clonal heterogeneity analysis. It takes the compressed pileup file
 generated by the "SNP Pileup for FACETS" tool as its primary input and
-produces a set of standard FACETS outputs, including segmentation calls,
+produces a set of standard FACETS outputs.
-purity/ploidy estimates, plots, and a VCF file summarizing the CNV events.
 ---
 **Primary Parameters**
 These parameters control the core of the FACETS segmentation algorithm.
 - **Critical value for segmentation (cval):** This is the most important
-parameter for controlling the sensitivity of the segmentation. A *higher*
+parameter for controlling the sensitivity. A *higher* value (e.g., 200-800)
-value (e.g., 200-800) will result in fewer segments and is generally
+results in fewer segments (less sensitive) and is recommended for
-recommended for high-density data like Whole Genome Sequencing (WGS).
+high-density data (WGS). A *lower* value (e.g., 50-150) increases
-A *lower* value (e.g., 50-150) increases sensitivity, resulting in more
+sensitivity and is more suitable for sparser data (WES).
-segments, and is more suitable for sparser data like Whole Exome
-Sequencing (WES).
 - **Minimum number of heterozygous SNPs (min.nhet):** This is a quality
-filter. After segmentation, any segment that is supported by fewer
+filter. Segments supported by fewer heterozygous SNPs than this
-heterozygous SNPs than this threshold will be discarded. This helps
+threshold will be discarded during the initial segmentation pass.
-to remove unreliable, small segments.
-- **SNP neighbourhood size (snp.nbhd):** This parameter defines the genomic
+- **SNP neighbourhood size (snp.nbhd):** Defines the genomic window (in bp)
-window (in bp) around a SNP used for local read depth normalization.
+around a SNP used for local read depth normalization.
-The default value is generally appropriate.
 ---
-**Advanced VCF Post-processing: Merging Segments**
+**Advanced VCF Post-processing**
-You can optionally enable a post-processing step to merge adjacent CNV
+You can optionally enable post-processing steps to refine the final VCF.
-segments in the output VCF.
-*Why is this useful?*
+- **Merging Segments:** This option merges adjacent CNV segments that likely
-Segmentation algorithms can sometimes split a single, large biological event
+represent a single biological event, providing a cleaner and more
-(e.g., a 10 Mb deletion) into several smaller, adjacent segments with the
+biologically accurate output.
-same copy number state. This feature attempts to correct this by merging
-these segments back together, providing a cleaner and more biologically
-accurate representation of the CNV landscape.
-The merging is controlled by an algorithm using two thresholds:
+- **Filtering Segments:** This option removes low-quality or artefactual
+segments based on the number of SNPs supporting them. This is recommended
-- **Absolute maximum gap:** The maximum distance in base pairs allowed
+as FACETS can sometimes report micro-segments that are not biologically
-between two segments to even consider them for merging. This acts as a
+relevant.
-safeguard.
-- **Relative maximum gap:** The maximum distance allowed, expressed as a
-*fraction* of the average size of the two segments. This allows large
-gaps between large segments, but not between small ones, trying to mimic
-how a human expert would interpret the data.
 ---
 **Outputs**
 - **Segmentation file (TSV):** The raw segment data with genomic coordinates
 and their associated copy number (TCN, LCN).
 - **Summary file:** The main estimated parameters like purity, ploidy, etc.
+- **Plots file (PNG):** A genome-wide visualization of the copy number and
+allelic imbalance results across all chromosomes.
+- **Spider Plot (PNG):** The most important **diagnostic plot** for assessing
+the quality of the FACETS fit. See detailed explanation below.
 - **CNV calls file (VCF):** A summary of the detected copy number events in
-a standard VCF format, suitable for downstream analysis.
+a standard VCF format for structural variants. The `ALT` column contains
-- **Plots file (PNG):** An enhanced visualization of the genome-wide results.
+symbolic alleles (`<DEL>`, `<DUP>`). All FACETS-specific details are in
-- **Spider Plot (PNG):** This is the most important **diagnostic plot** for
+the `INFO` field:
-assessing the quality of the FACETS fit.
-On this plot (generated by the `logRlogORspider` function), each
+``SVTYPE``
-**circle** is a genomic segment from your data. The **curves** (labeled
+Type of variant (e.g., DEL, DUP).
-`2-1`, `1-0`, etc.) represent the theoretical positions for integer copy
+``EVENT``
-number states given the estimated purity and ploidy. A high-confidence
+FACETS classification (e.g., HOMOZYG_DEL, CN_LOH).
-result is achieved when your data (the circles) align closely with these
+``TCN``
-theoretical curves. For a detailed interpretation, please refer to the
+Total Copy Number.
-original FACETS publication: Shen and Seshan, *NAR*, 2016.
+``LCN``
-]]></help>
+Lesser Copy Number.
+``NUM_MARK``
+Total number of SNPs in the segment.
+``NHET``
+Number of heterozygous SNPs in the segment.
+**Interpreting the Spider Plot**
+On this plot (generated by the `logRlogORspider` function), each
+**circle** is a genomic segment from your data. The **curves** (labeled
+`2-1`, `1-0`, etc.) represent the theoretical positions for integer copy
+number states. A high-confidence result is achieved when your data (the
+circles) align closely with these curves. For details, refer to the
+original FACETS publication: Shen and Seshan, *NAR*, 2016.
+]]></help>
 <expand macro="citations"/>
 </tool>
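
The two new `--vcf_min_nhet` and `--vcf_min_num_mark` options describe a post-filter on the segments written to the VCF. The tool's own script is not shown in this changeset, so the following Python is only a minimal sketch of the same rule applied downstream to an existing VCF, assuming each record carries `NHET` and `NUM_MARK` in its `INFO` column as the help text states. The function names `parse_info`, `passes_filters` and `filter_vcf_lines` are hypothetical; the defaults (2 and 3) mirror the parameter defaults in the diff.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'SVTYPE=DEL;TCN=1;NHET=4' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag entries carry no value
    return fields


def passes_filters(vcf_line, min_nhet=2, min_num_mark=3):
    """Return True if a VCF data line meets both support thresholds."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th VCF column
    # "Fewer than this many" is removed, so a record is kept when count >= threshold.
    return int(info["NHET"]) >= min_nhet and int(info["NUM_MARK"]) >= min_num_mark


def filter_vcf_lines(lines, min_nhet=2, min_num_mark=3):
    """Keep header lines and the records that pass both thresholds."""
    return [
        line for line in lines
        if line.startswith("#") or passes_filters(line, min_nhet, min_num_mark)
    ]
```

Header lines (starting with `#`) are passed through untouched so the result remains a valid VCF. Records missing either key would raise a `KeyError` here, which seems preferable to silently keeping unsupported segments.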

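The help text removed in this revision explained the merging step in terms of two thresholds: an absolute maximum gap in base pairs, and a relative maximum gap expressed as a fraction of the average size of the two segments. As a rough illustration of how those two limits could combine, here is a sketch in Python. It is not the tool's implementation: the segment dictionary keys are hypothetical, and requiring both thresholds and an identical copy-number state is an assumption drawn from the wording of the old help text. The defaults match `max_gap_abs` and `max_gap_rel` in the diff.

```python
def can_merge(left, right, max_gap_abs=1_000_000, max_gap_rel=0.5):
    """Decide whether two adjacent segments are close enough to merge.

    Each segment is a dict with 'start', 'end', 'tcn' and 'lcn' keys
    (hypothetical field names chosen for this sketch). Both segments are
    assumed to lie on the same chromosome, with `left` upstream of `right`.
    """
    # Assumption: only segments in the same copy-number state are merged.
    if (left["tcn"], left["lcn"]) != (right["tcn"], right["lcn"]):
        return False
    gap = right["start"] - left["end"]
    left_size = left["end"] - left["start"]
    right_size = right["end"] - right["start"]
    mean_size = (left_size + right_size) / 2
    # Both limits must hold: the absolute cap and the size-relative cap.
    return gap <= max_gap_abs and gap <= max_gap_rel * mean_size
```

With the defaults, two 10 Mb segments separated by 400 kb would qualify, while two 100 kb segments separated by 100 kb would not, because the relative limit scales with segment size.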