diff facets_analysis.xml @ 7:86bcdc94b008 draft
planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/main/tools/facets commit 2da49e9385ddce5c74e077c81a52ff1ea4131b81
| author | artbio |
|---|---|
| date | Wed, 08 Oct 2025 17:41:18 +0000 |
| parents | 625038b7d764 |
| children |
--- a/facets_analysis.xml Mon Oct 06 15:50:12 2025 +0000
+++ b/facets_analysis.xml Wed Oct 08 17:41:18 2025 +0000
@@ -23,6 +23,8 @@
             --merge_gap_abs $merging.max_gap_abs
             --merge_gap_rel $merging.max_gap_rel
         #end if
+        --vcf_min_nhet $filtering.vcf_min_nhet
+        --vcf_min_num_mark $filtering.vcf_min_num_mark
     ]]></command>
     <inputs>
         <param name="pileup" type="data" format="tabular.gz" label="FACETS Pileup File" help="Output from the 'SNP Pileup for FACETS' tool."/>
@@ -50,6 +52,10 @@
                 <param name="max_gap_rel" type="float" value="0.5" label="Relative maximum gap to merge (fraction)" help="Maximum relative distance, as a fraction of the average size of the two segments."/>
             </when>
         </conditional>
+        <section name="filtering" title="VCF Output Filtering" expanded="false">
+            <param name="vcf_min_nhet" type="integer" value="2" label="Minimum heterozygous SNPs for VCF output" help="Post-filter to remove final segments with fewer than this many heterozygous SNPs."/>
+            <param name="vcf_min_num_mark" type="integer" value="3" label="Minimum total markers for VCF output" help="Post-filter to remove final segments with fewer than this many total markers (SNPs). Helps remove SVLEN=0 artifacts."/>
+        </section>
     </inputs>
     <outputs>
         <data name="output_seg" format="tsv" label="FACETS Segmentation on ${on_string}"/>
@@ -69,14 +75,13 @@
             <output name="output_vcf" file="test_sample_01.cnv.vcf" ftype="vcf" lines_diff="2" />
         </test>
     </tests>
-<help><![CDATA[
+    <help><![CDATA[
 **What it does**
 
 This tool runs the `FACETS` R package to perform allele-specific copy number
 and clonal heterogeneity analysis. It takes the compressed pileup file
 generated by the "SNP Pileup for FACETS" tool as its primary input and
-produces a set of standard FACETS outputs, including segmentation calls,
-purity/ploidy estimates, plots, and a VCF file summarizing the CNV events.
+produces a set of standard FACETS outputs.
 
 ---
 
@@ -85,45 +90,32 @@
 These parameters control the core of the FACETS segmentation algorithm.
 
 - **Critical value for segmentation (cval):** This is the most important
-  parameter for controlling the sensitivity of the segmentation. A *higher*
-  value (e.g., 200-800) will result in fewer segments and is generally
-  recommended for high-density data like Whole Genome Sequencing (WGS).
-  A *lower* value (e.g., 50-150) increases sensitivity, resulting in more
-  segments, and is more suitable for sparser data like Whole Exome
-  Sequencing (WES).
+  parameter for controlling the sensitivity. A *higher* value (e.g., 200-800)
+  results in fewer segments (less sensitive) and is recommended for
+  high-density data (WGS). A *lower* value (e.g., 50-150) increases
+  sensitivity and is more suitable for sparser data (WES).
 
 - **Minimum number of heterozygous SNPs (min.nhet):** This is a quality
-  filter. After segmentation, any segment that is supported by fewer
-  heterozygous SNPs than this threshold will be discarded. This helps
-  to remove unreliable, small segments.
+  filter. Segments supported by fewer heterozygous SNPs than this
+  threshold will be discarded during the initial segmentation pass.
 
-- **SNP neighbourhood size (snp.nbhd):** This parameter defines the genomic
-  window (in bp) around a SNP used for local read depth normalization.
-  The default value is generally appropriate.
+- **SNP neighbourhood size (snp.nbhd):** Defines the genomic window (in bp)
+  around a SNP used for local read depth normalization.
 
 ---
 
-**Advanced VCF Post-processing: Merging Segments**
+**Advanced VCF Post-processing**
 
-You can optionally enable a post-processing step to merge adjacent CNV
-segments in the output VCF.
+You can optionally enable post-processing steps to refine the final VCF.
 
-*Why is this useful?*
-Segmentation algorithms can sometimes split a single, large biological event
-(e.g., a 10 Mb deletion) into several smaller, adjacent segments with the
-same copy number state. This feature attempts to correct this by merging
-these segments back together, providing a cleaner and more biologically
-accurate representation of the CNV landscape.
+- **Merging Segments:** This option merges adjacent CNV segments that likely
+  represent a single biological event, providing a cleaner and more
+  biologically accurate output.
 
-The merging is controlled by an algorithm using two thresholds:
-
-- **Absolute maximum gap:** The maximum distance in base pairs allowed
-  between two segments to even consider them for merging. This acts as a
-  safeguard.
-- **Relative maximum gap:** The maximum distance allowed, expressed as a
-  *fraction* of the average size of the two segments. This allows large
-  gaps between large segments, but not between small ones, trying to mimic
-  how a human expert would interpret the data.
+- **Filtering Segments:** This option removes low-quality or artefactual
+  segments based on the number of SNPs supporting them. This is recommended
+  as FACETS can sometimes report micro-segments that are not biologically
+  relevant.
 
 ---
 
@@ -132,18 +124,37 @@
 - **Segmentation file (TSV):** The raw segment data with genomic coordinates
   and their associated copy number (TCN, LCN).
 - **Summary file:** The main estimated parameters like purity, ploidy, etc.
+- **Plots file (PNG):** A genome-wide visualization of the copy number and
+  allelic imbalance results across all chromosomes.
+- **Spider Plot (PNG):** The most important **diagnostic plot** for assessing
+  the quality of the FACETS fit. See detailed explanation below.
 - **CNV calls file (VCF):** A summary of the detected copy number events in
-  a standard VCF format, suitable for downstream analysis.
-- **Plots file (PNG):** An enhanced visualization of the genome-wide results.
-- **Spider Plot (PNG):** This is the most important **diagnostic plot** for
-  assessing the quality of the FACETS fit.
-  On this plot (generated by the `logRlogORspider` function), each
-  **circle** is a genomic segment from your data. The **curves** (labeled
-  `2-1`, `1-0`, etc.) represent the theoretical positions for integer copy
-  number states given the estimated purity and ploidy. A high-confidence
-  result is achieved when your data (the circles) align closely with these
-  theoretical curves. For a detailed interpretation, please refer to the
-  original FACETS publication: Shen and Seshan, *NAR*, 2016.
-        ]]></help>
+  a standard VCF format for structural variants. The `ALT` column contains
+  symbolic alleles (`<DEL>`, `<DUP>`). All FACETS-specific details are in
+  the `INFO` field:
+
+  ``SVTYPE``
+      Type of variant (e.g., DEL, DUP).
+  ``EVENT``
+      FACETS classification (e.g., HOMOZYG_DEL, CN_LOH).
+  ``TCN``
+      Total Copy Number.
+  ``LCN``
+      Lesser Copy Number.
+  ``NUM_MARK``
+      Total number of SNPs in the segment.
+  ``NHET``
+      Number of heterozygous SNPs in the segment.
+
+**Interpreting the Spider Plot**
+
+On this plot (generated by the `logRlogORspider` function), each
+**circle** is a genomic segment from your data. The **curves** (labeled
+`2-1`, `1-0`, etc.) represent the theoretical positions for integer copy
+number states. A high-confidence result is achieved when your data (the
+circles) align closely with these curves. For details, refer to the
+original FACETS publication: Shen and Seshan, *NAR*, 2016.
+
+    ]]></help>
     <expand macro="citations"/>
 </tool>
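
The new `--vcf_min_nhet` and `--vcf_min_num_mark` options are described in the help text as a post-filter that removes final segments with fewer than the given number of heterozygous SNPs or total markers. The script that implements the filter is not part of this diff, so the following is only a minimal Python sketch of that rule, assuming the thresholds are compared against the `NHET` and `NUM_MARK` keys of the VCF `INFO` column. The function names `parse_info`, `keep_segment` and `filter_vcf_lines` are hypothetical, and treating a missing key as 0 is an assumption.

```python
# Illustrative sketch only: the wrapper's real filter is not shown in this diff.
# All function names here are hypothetical.

def parse_info(info):
    """Parse a VCF INFO string such as 'SVTYPE=DEL;NHET=5' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Flag-style entries (no '=') are stored as True.
        fields[key] = value if sep else True
    return fields


def keep_segment(info, min_nhet=2, min_num_mark=3):
    """Return True when a segment passes both thresholds.

    Defaults mirror the values in the new 'filtering' section (2 and 3).
    A segment is removed when it has *fewer than* the threshold, so a
    value equal to the threshold is kept. Missing keys count as 0
    (an assumption, not confirmed by the source).
    """
    fields = parse_info(info)
    nhet = int(fields.get("NHET", 0))
    num_mark = int(fields.get("NUM_MARK", 0))
    return nhet >= min_nhet and num_mark >= min_num_mark


def filter_vcf_lines(lines, min_nhet=2, min_num_mark=3):
    """Keep header lines and the records whose INFO column passes the filter."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # header and meta lines are always kept
            continue
        columns = line.rstrip("\n").split("\t")
        if keep_segment(columns[7], min_nhet, min_num_mark):  # column 8 is INFO
            kept.append(line)
    return kept
```

With the defaults, a record carrying `NUM_MARK=2;NHET=2` would be dropped because it has fewer than 3 total markers, while `NUM_MARK=3;NHET=2` would be kept.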
