diff COBRAxy/docs/tools/flux-to-map.md @ 492:4ed95023af20 draft

Uploaded
author francesco_lapi
date Tue, 30 Sep 2025 14:02:17 +0000
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/COBRAxy/docs/tools/flux-to-map.md	Tue Sep 30 14:02:17 2025 +0000
@@ -0,0 +1,467 @@
+# Flux to Map
+
+Visualize metabolic flux data on pathway maps with statistical analysis and color coding.
+
+## Overview
+
+Flux to Map performs statistical analysis on flux distribution data and generates color-coded metabolic pathway maps. It compares flux values between sample groups and highlights significantly different reactions with appropriate colors and line weights.
+
+## Usage
+
+### Command Line
+
+```bash
+flux_to_map -td /path/to/COBRAxy \
+            -input_data_fluxes flux_data.tsv \
+            -input_class_fluxes sample_groups.tsv \
+            -comparison manyvsmany \
+            -test ks \
+            -pv 0.05 \
+            -fc 1.5 \
+            -choice_map ENGRO2 \
+            -generate_svg true \
+            -generate_pdf true \
+            -idop flux_maps/
+```
+
+### Galaxy Interface
+
+Select "Flux to Map" from the COBRAxy tool suite and configure flux analysis and visualization parameters.
+
+## Parameters
+
+### Required Parameters
+
+| Parameter | Flag | Description |
+|-----------|------|-------------|
+| Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory |
+
+### Data Input Parameters
+
+| Parameter | Flag | Description | Default |
+|-----------|------|-------------|---------|
+| Flux Data | `-idf, --input_data_fluxes` | Flux values TSV file | - |
+| Flux Classes | `-icf, --input_class_fluxes` | Sample group labels for fluxes | - |
+| Multiple Flux Files | `-idsf, --input_datas_fluxes` | Multiple flux datasets (space-separated) | - |
+| Flux Names | `-naf, --names_fluxes` | Names for multiple flux datasets | - |
+| Analysis Option | `-op, --option` | Analysis mode (datasets or dataset_class) | - |
+
+### Statistical Parameters
+
+| Parameter | Flag | Description | Default |
+|-----------|------|-------------|---------|
+| Comparison Type | `-co, --comparison` | Statistical comparison mode | manyvsmany |
+| Statistical Test | `-te, --test` | Statistical test method | ks |
+| P-Value Threshold | `-pv, --pValue` | Significance threshold | 0.1 |
+| Adjusted P-values | `-adj, --adjusted` | Apply FDR correction | false |
+| Fold Change | `-fc, --fChange` | Minimum fold change threshold | 1.5 |
+
+### Visualization Parameters
+
+| Parameter | Flag | Description | Default |
+|-----------|------|-------------|---------|
+| Map Choice | `-mc, --choice_map` | Built-in metabolic map | HMRcore |
+| Custom Map | `-cm, --custom_map` | Path to custom SVG map | - |
+| Generate SVG | `-gs, --generate_svg` | Create SVG output | true |
+| Generate PDF | `-gp, --generate_pdf` | Create PDF output | true |
+| Color Map | `-colorm, --color_map` | Color scheme (jet, viridis) | - |
+| Output Directory | `-idop, --output_path` | Results directory | result/ |
+
+### Advanced Parameters
+
+| Parameter | Flag | Description | Default |
+|-----------|------|-------------|---------|
+| Output Log | `-ol, --out_log` | Log file path | - |
+| Control Sample | `-on, --control` | Control group identifier | - |
+
+## Input Formats
+
+### Flux Data File
+
+Tab-separated format with reactions as rows and samples as columns:
+
+```
+Reaction	Sample1	Sample2	Sample3	Control1	Control2
+R00001	15.23	-8.45	22.1	12.8	14.2
+R00002	0.0	12.67	-5.3	8.9	7.4
+R00003	45.8	38.2	51.7	42.1	39.8
+R00004	-12.4	-15.8	-9.2	-11.5	-13.1
+```
+
+### Sample Class File
+
+Group assignment for statistical comparisons:
+
+```
+Sample	Class
+Sample1	Treatment
+Sample2	Treatment  
+Sample3	Treatment
+Control1	Control
+Control2	Control
+```
+
+### Multiple Dataset Format
+
+When using multiple flux files, provide space-separated paths and corresponding names:
+
+```bash
+-idsf "dataset1_flux.tsv dataset2_flux.tsv dataset3_flux.tsv"
+-naf "Condition_A Condition_B Condition_C"
+```
+
+## Statistical Analysis
+
+### Comparison Types
+
+#### manyvsmany
+Compare all possible group pairs:
+- Treatment vs Control
+- Condition_A vs Condition_B
+- Condition_A vs Condition_C
+- Condition_B vs Condition_C
+
+#### onevsrest
+Compare each group against all others combined:
+- Treatment vs (Control + Other)
+- Control vs (Treatment + Other)
+
+#### onevsmany
+Compare one reference group against each other group:
+- Control vs Treatment
+- Control vs Condition_A
+- Control vs Condition_B
+
+### Statistical Tests
+
+| Test | Description | Best For |
+|------|-------------|----------|
+| `ks` | Kolmogorov-Smirnov | Non-parametric, distribution-free |
+| `ttest_p` | Paired t-test | Related samples, normal distributions |
+| `ttest_ind` | Independent t-test | Independent samples, normal distributions |
+| `wilcoxon` | Wilcoxon signed-rank | Non-parametric paired comparisons |
+| `mw` | Mann-Whitney U | Non-parametric independent comparisons |
+
+### Significance Assessment
+
+Reactions are considered significant when:
+1. **P-value** ≤ specified threshold (default: 0.1)
+2. **Fold change** ≥ specified threshold (default: 1.5)
+3. **FDR correction** (if enabled) maintains significance
+
+## Map Visualization
+
+### Built-in Maps
+
+#### HMRcore (Default)
+- **Scope**: Core human metabolic network
+- **Reactions**: ~300 essential reactions
+- **Coverage**: Central carbon, amino acid, lipid metabolism
+- **Use Case**: General overview, publication figures
+
+#### ENGRO2  
+- **Scope**: Extended human genome-scale reconstruction
+- **Reactions**: ~2,000 reactions
+- **Coverage**: Comprehensive metabolic network
+- **Use Case**: Detailed analysis, specialized tissues
+
+#### Custom Maps
+User-provided SVG files with reaction elements:
+```xml
+<rect id="R00001" class="reaction" fill="gray" stroke="black"/>
+<path id="R00002" class="reaction" fill="gray" stroke="black"/>
+```
+
+### Color Coding Scheme
+
+#### Significance Colors
+- **Red Gradient**: Significantly upregulated (positive fold change)
+- **Blue Gradient**: Significantly downregulated (negative fold change)  
+- **Gray**: Not statistically significant
+- **White**: No data available
+
+#### Visual Elements
+- **Line Width**: Proportional to fold change magnitude
+- **Color Intensity**: Proportional to statistical significance (-log10 p-value)
+- **Transparency**: Indicates confidence level
+
+### Color Maps
+
+#### Jet (Default)
+- High contrast color transitions
+- Blue (low) → Green → Yellow → Red (high)
+- Good for identifying extreme values
+
+#### Viridis
+- Perceptually uniform color scale
+- Colorblind-friendly
+- Purple (low) → Blue → Green → Yellow (high)
+
+## Output Files
+
+### Statistical Results
+- `flux_statistics.tsv`: P-values, fold changes, test statistics for all reactions
+- `significant_fluxes.tsv`: Only reactions meeting significance criteria
+- `comparison_summary.txt`: Analysis parameters and summary statistics
+
+### Visualizations
+- `flux_map.svg`: Interactive color-coded pathway map
+- `flux_map.pdf`: High-resolution PDF (if requested)  
+- `flux_map.png`: Raster image (if requested)
+- `legend.svg`: Color scale and statistical significance legend
+
+### Analysis Files
+- `fold_changes.tsv`: Detailed fold change calculations
+- `group_statistics.tsv`: Per-group summary statistics
+- `comparison_matrix.tsv`: Pairwise comparison results
+
+## Examples
+
+### Basic Flux Comparison
+
+```bash
+# Compare treatment vs control fluxes
+flux_to_map -td /opt/COBRAxy \
+            -idf treatment_vs_control_fluxes.tsv \
+            -icf sample_groups.tsv \
+            -co manyvsmany \
+            -te ks \
+            -pv 0.05 \
+            -fc 2.0 \
+            -mc HMRcore \
+            -gs true \
+            -gp true \
+            -idop flux_comparison/
+```
+
+### Multiple Condition Analysis
+
+```bash
+# Compare multiple experimental conditions
+flux_to_map -td /opt/COBRAxy \
+            -idsf "cond1_flux.tsv cond2_flux.tsv cond3_flux.tsv" \
+            -naf "Control Treatment1 Treatment2" \
+            -co onevsrest \
+            -te wilcoxon \
+            -adj true \
+            -pv 0.01 \
+            -fc 1.8 \
+            -mc ENGRO2 \
+            -colorm viridis \
+            -idop multi_condition_flux/
+```
+
+### Custom Map Visualization
+
+```bash
+# Use tissue-specific custom map
+flux_to_map -td /opt/COBRAxy \
+            -idf liver_flux_data.tsv \
+            -icf liver_conditions.tsv \
+            -co manyvsmany \
+            -te ttest_ind \
+            -pv 0.05 \
+            -fc 1.5 \
+            -cm maps/liver_specific_map.svg \
+            -gs true \
+            -gp true \
+            -idop liver_flux_analysis/ \
+            -ol liver_analysis.log
+```
+
+### High-Throughput Analysis
+
+```bash
+# Process multiple datasets with stringent criteria
+flux_to_map -td /opt/COBRAxy \
+            -idsf "exp1.tsv exp2.tsv exp3.tsv exp4.tsv" \
+            -naf "Exp1 Exp2 Exp3 Exp4" \
+            -co manyvsmany \
+            -te ks \
+            -adj true \
+            -pv 0.001 \
+            -fc 3.0 \
+            -mc HMRcore \
+            -colorm jet \
+            -gs true \
+            -gp true \
+            -idop high_throughput_flux/
+```
+
+## Quality Control
+
+### Data Validation
+
+#### Pre-analysis Checks
+- Verify flux value distributions (check for outliers)
+- Ensure sample names match between data and class files
+- Validate reaction coverage across samples
+- Check for missing values and their patterns
+
+#### Statistical Validation  
+- Assess normality assumptions for parametric tests
+- Verify adequate sample sizes per group (n≥3 recommended)
+- Check variance homogeneity between groups
+- Evaluate multiple testing burden
+
+### Result Interpretation
+
+#### Biological Validation
+- Compare results with known pathway activities
+- Check for pathway coherence (related reactions should cluster)
+- Validate against literature or experimental evidence
+- Assess metabolic network connectivity
+
+#### Technical Validation
+- Compare results across different statistical tests
+- Check sensitivity to parameter changes
+- Validate fold change calculations
+- Verify map element correspondence
+
+## Tips and Best Practices
+
+### Data Preparation
+- **Normalization**: Ensure consistent flux units across samples
+- **Filtering**: Remove reactions with excessive missing values (>50%)
+- **Outlier Detection**: Identify and handle extreme flux values
+- **Batch Effects**: Account for technical variation between experiments
+
+### Statistical Considerations
+- Use FDR correction for multiple comparisons (`-adj true`)
+- Choose appropriate statistical tests based on data distribution
+- Consider effect size (fold change) alongside significance
+- Validate results with independent datasets when possible
+
+### Visualization Optimization
+- Select appropriate color maps for your audience
+- Use high fold change thresholds (>2.0) for cleaner maps
+- Export both SVG (editable) and PDF (publication) formats
+- Include comprehensive legends and annotations
+
+### Performance Tips
+- Use HMRcore for faster processing and clearer visualizations
+- Reduce dataset size for initial exploratory analysis
+- Process large datasets in batches if memory constrained
+- Cache intermediate results for parameter optimization
+
+## Integration Workflow
+
+### Upstream Tools
+- [Flux Simulation](flux-simulation.md) - Generate flux distributions for comparison
+- [MAREA](marea.md) - Alternative analysis pathway for RAS/RPS data
+
+### Downstream Analysis
+- Export results to statistical software (R, Python) for advanced analysis
+- Integrate with pathway databases (KEGG, Reactome)
+- Combine with other omics data for systems-level insights
+
+### Typical Pipeline
+
+```bash
+# 1. Generate flux samples from constrained models
+flux_simulation -td /opt/COBRAxy -ms ENGRO2 -in bounds/*.tsv \
+                -ni Sample1,Sample2,Control1,Control2 -a CBS \
+                -ot mean -idop fluxes/
+
+# 2. Analyze and visualize flux differences
+flux_to_map -td /opt/COBRAxy -idf fluxes/mean.csv \
+            -icf sample_groups.tsv -co manyvsmany -te ks \
+            -mc HMRcore -gs true -gp true -idop flux_maps/
+
+# 3. Further analysis with custom scripts
+python analyze_flux_results.py -i flux_maps/ -o final_results/
+```
+
+## Troubleshooting
+
+### Common Issues
+
+**No significant reactions found**
+- Lower p-value threshold (`-pv 0.2`)
+- Reduce fold change requirement (`-fc 1.2`)  
+- Check sample group definitions and sizes
+- Verify flux data quality and normalization
+
+**Map rendering problems**
+- Check SVG map file integrity and format
+- Verify reaction ID matching between data and map
+- Ensure sufficient system memory for large maps
+- Validate XML structure of custom maps
+
+**Statistical test failures**
+- Check data distribution assumptions
+- Verify sufficient sample sizes per group
+- Consider alternative non-parametric tests
+- Examine variance patterns between groups
+
+### Error Messages
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| "Map file not found" | Missing/invalid map path | Check file location and format |
+| "No matching reactions" | ID mismatch between data and map | Verify reaction naming consistency |
+| "Insufficient data" | Too few samples per group | Increase sample sizes or merge groups |
+| "Memory allocation failed" | Large dataset/map combination | Reduce data size or increase system memory |
+
+### Performance Issues
+
+**Slow processing**
+- Use HMRcore instead of ENGRO2 for faster rendering
+- Reduce dataset size for testing
+- Process subsets of reactions separately
+- Monitor system resource usage
+
+**Large output files**
+- Use compressed formats when possible
+- Reduce map resolution for preliminary analysis
+- Export only essential output formats
+- Clean temporary files regularly
+
+## Advanced Usage
+
+### Custom Statistical Functions
+
+Advanced users can implement custom statistical tests by modifying the analysis functions:
+
+```python
+def custom_test(group1, group2):
+    # Custom statistical test implementation
+    statistic, pvalue = your_test_function(group1, group2)
+    return statistic, pvalue
+```
+
+### Batch Processing Script
+
+Process multiple experiments systematically:
+
+```bash
+#!/bin/bash
+experiments=("exp1" "exp2" "exp3" "exp4")
+for exp in "${experiments[@]}"; do
+    flux_to_map -td /opt/COBRAxy \
+                -idf "data/${exp}_flux.tsv" \
+                -icf "data/${exp}_classes.tsv" \
+                -co manyvsmany -te ks -pv 0.05 \
+                -mc HMRcore -gs true -gp true \
+                -idop "results/${exp}/"
+done
+```
+
+### Result Aggregation
+
+Combine results across multiple analyses:
+
+```bash
+# Merge significant reactions across experiments
+python merge_flux_results.py \
+    -i results/exp*/significant_fluxes.tsv \
+    -o combined_significant_reactions.tsv \
+    --method intersection
+```
+
+## See Also
+
+- [Flux Simulation](flux-simulation.md) - Generate input flux distributions
+- [MAREA](marea.md) - Alternative pathway analysis approach
+- [Custom Map Creation Guide](../tutorials/custom-map-creation.md)
+- [Statistical Methods Reference](../tutorials/statistical-methods.md)
\ No newline at end of file