diff COBRAxy/docs/tools/flux-simulation.md @ 547:73f2f7e2be17 draft
Uploaded
| author | francesco_lapi |
|---|---|
| date | Tue, 28 Oct 2025 10:44:07 +0000 |
| parents | fcdbc81feb45 |
| children |
--- a/COBRAxy/docs/tools/flux-simulation.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/tools/flux-simulation.md	Tue Oct 28 10:44:07 2025 +0000
@@ -1,406 +1,184 @@
 # Flux Simulation
 
-Sample metabolic fluxes using constraint-based modeling with CBS or OPTGP algorithms.
+Simulate flux distributions from constraint-based metabolic models using different optimization or sampling strategies.
 
 ## Overview
 
-Flux Simulation performs constraint-based sampling of metabolic flux distributions from constrained models. It supports two sampling algorithms (CBS and OPTGP) and provides comprehensive flux statistics including mean, median, quantiles, pFBA, FVA, and sensitivity analysis.
+Two types of analysis are available:
+- **flux optimization**
+- **flux sampling**
+
+For flux optimization, one of the following methods can be performed: parsimonious-FBA, Flux Variability Analysis, Biomass sensitivity analysis (single reaction knock-out)
+The objective function, a linear combination of fluxes weighted by specific coefficients, depends on the provided metabolic network.
+
+For flux sampling, one of the following methods can be performed: CBS (Corner-based sampling), OPTGP (Improved Artificial Centering Hit-and-Run sampler).
 
-## Usage
+## Galaxy Interface
+
+In Galaxy: **COBRAxy → Flux Simulation**
 
-### Command Line
+1. Select model and upload bounds files
+2. Choose algorithm (CBS/OPTGP) and sampling parameters
+3. Click **Run tool**
+
+## Command-line console
 
 ```bash
-flux_simulation -td /path/to/COBRAxy \
-  -ms ENGRO2 \
-  -in bounds1.tsv,bounds2.tsv \
-  -ni Sample1,Sample2 \
+flux_simulation -ms ENGRO2 \
+  -in bounds/*.tsv \
+  -ni Sample1,Sample2,Sample3 \
   -a CBS \
   -ns 1000 \
-  -nb 1 \
-  -sd 42 \
-  -ot mean,median,quantiles \
-  -ota pFBA,FVA,sensitivity \
-  -idop flux_results/
+  -idop output/
 ```
 
-### Galaxy Interface
-
-Select "Flux Simulation" from the COBRAxy tool suite and configure sampling parameters through the web interface.
-
 ## Parameters
 
-### Required Parameters
-
-| Parameter | Flag | Description |
-|-----------|------|-------------|
-| Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory |
-| Input Bounds | `-in, --input` | Comma-separated list of bounds files |
-| Sample Names | `-ni, --names` | Comma-separated sample names |
-| Algorithm | `-a, --algorithm` | Sampling algorithm (CBS or OPTGP) |
-| Number of Samples | `-ns, --n_samples` | Samples per batch |
-| Number of Batches | `-nb, --n_batches` | Number of sampling batches |
-| Random Seed | `-sd, --seed` | Random seed for reproducibility |
-| Output Types | `-ot, --output_type` | Flux statistics to compute |
-
-### Model Parameters
-
 | Parameter | Flag | Description | Default |
 |-----------|------|-------------|---------|
-| Model Selector | `-ms, --model_selector` | Built-in model (ENGRO2, Custom) | ENGRO2 |
-| Custom Model | `-mo, --model` | Path to custom SBML model | - |
-| Model Name | `-mn, --model_name` | Custom model filename | - |
-
-### Sampling Parameters
-
-| Parameter | Flag | Description | Default |
-|-----------|------|-------------|---------|
-| Algorithm | `-a, --algorithm` | CBS or OPTGP | - |
-| Thinning | `-th, --thinning` | OPTGP thinning parameter | 100 |
-| Samples | `-ns, --n_samples` | Samples per batch | - |
-| Batches | `-nb, --n_batches` | Number of batches | - |
-| Seed | `-sd, --seed` | Random seed | - |
-
-### Output Parameters
-
-| Parameter | Flag | Description | Options |
-|-----------|------|-------------|---------|
-| Output Types | `-ot, --output_type` | Flux statistics | mean,median,quantiles,fluxes |
-| Analysis Types | `-ota, --output_type_analysis` | Additional analyses | pFBA,FVA,sensitivity |
-| Output Path | `-idop, --output_path` | Results directory | flux_simulation/ |
-| Output Log | `-ol, --out_log` | Log file path | - |
+| Model Selector | `-ms` | ENGRO2, Recon, or Custom | ENGRO2 |
+| Input Format | `--model_and_bounds` | Separate files (true) or complete models (false) | true |
+| Input Bounds | `-in` | Bounds files | - |
+| Name Input | `-ni` | Sample names (comma-separated) | - |
+| Algorithm | `-a` | CBS or OPTGP | CBS |
+| Num Samples | `-ns` | Number of samples per batch | 1000 |
+| Num Batches | `-nb` | Number of batches | 1 |
+| Thinning | `-th` | OPTGP thinning parameter | 100 |
+| Output Type | `-ot` | mean, median, quantiles, fluxes | mean,median |
+| FVA Optimality | `--perc_opt` | Optimality fraction (0.0-1.0) | 0.90 |
+| Output Path | `-idop` | Output directory | flux_simulation/ |
 
 ## Algorithms
 
-### CBS (Constraint-Based Sampling)
-
-**Method**: Random objective function optimization
-- Generates random linear combinations of reactions
-- Optimizes using LP solver (GLPK preferred, COBRApy fallback)
-- Fast and memory-efficient
+### CBS (Corner-Based Sampling)
+- Random objective optimization
+- Requires GLPK (recommended) or COBRApy solver
 - Suitable for large models
 
-**Advantages**:
-- High performance with GLPK
-- Good coverage of solution space
-- Robust to model size
+### OPTGP (MCMC Sampling)
+- Markov Chain Monte Carlo
+- Uniform sampling guarantee
+- Requires thinning parameter
 
-### OPTGP (Optimal Growth Perturbation)
+## Input Modes
+
+The tool supports two different input formats:
 
-**Method**: MCMC-based sampling
-- Markov Chain Monte Carlo with growth optimization
-- Requires thinning to reduce autocorrelation
-- More computationally intensive
-- Better theoretical guarantees
+### Mode 1: Model + Bounds (default, `--model_and_bounds true`)
+Upload one base model + multiple bound files (one per sample/context):
+- Base model: Tabular file with reaction structure (from Import Metabolic Model)
+- Bounds: Individual TSV files with sample-specific constraints (from RAS to Bounds)
+- Use when you have RAS-derived bounds for multiple samples
 
-**Advantages**:
-- Uniform sampling guarantee
-- Well-established method
-- Good for smaller models
+### Mode 2: Multiple Complete Models (`--model_and_bounds false`)
+Upload pre-built model files, each already containing integrated bounds:
+- Each file is a complete tabular model with reaction structure + bounds
+- Use when models are already prepared with specific constraints
+- Useful for comparing different modelling scenarios
 
-## Input Formats
+## Input Format
 
-### Bounds Files
-
-Tab-separated format with reaction bounds:
+Bounds files (TSV):
 
 ```
-Reaction lower_bound upper_bound
-R00001 -1000.0 1250.5
-R00002 -650.2 1000.0
-R00003 0.0 2150.8
+reaction lower_bound upper_bound
+R00001 -125.0 125.0
+R00002 -65.0 65.0
 ```
 
-Multiple bounds files can be processed simultaneously by providing comma-separated paths.
-
-### Custom Model File (Optional)
+**File Format Notes:**
+- Use **tab-separated** values (TSV)
+- Column headers must be: reaction, lower_bound, upper_bound
+- Reaction IDs must match model reaction IDs
+- Numeric values for bounds
 
-SBML format metabolic model compatible with COBRApy.
+## Sampling Outputs
 
-## Output Formats
-
-### Flux Statistics
+The tool can generate different types of output from flux sampling:
 
-#### Mean Fluxes (`mean.csv`)
-```
-Reaction Sample1 Sample2 Sample3
-R00001 15.23 -8.45 22.1
-R00002 0.0 12.67 -5.3
-R00003 45.8 38.2 51.7
-```
+| Output Type | Description |
+|-------------|-------------|
+| **mean** | Mean flux across all samples |
+| **median** | Median flux across all samples |
+| **quantiles** | 25th, 50th, 75th percentiles |
+| **fluxes** | Complete flux distributions (all samples, all reactions) |
 
-#### Median Fluxes (`median.csv`)
-```
-Reaction Sample1 Sample2 Sample3
-R00001 14.1 -7.8 21.5
-R00002 0.0 11.9 -4.8
-R00003 44.2 37.1 50.3
-```
+**Note**: The `fluxes` output can be very large for many samples. Use summary statistics (mean/median/quantiles) unless you need the complete distribution.
+
+## Optimization Methods
+
+In alternative to sampling, the tool can perform optimization analyses:
 
-#### Quantiles (`quantiles.csv`)
-```
-Reaction Sample1_q1 Sample1_q2 Sample1_q3 Sample2_q1 ...
-R00001 10.5 14.1 18.7 -12.3 ...
-R00002 -2.1 0.0 1.8 8.9 ...
-R00003 38.9 44.2 49.8 32.1 ...
-```
+| Method | Description | Output |
+|--------|-------------|--------|
+| **FVA** | Flux Variability Analysis | Min/max flux ranges for each reaction |
+| **pFBA** | Parsimonious FBA | Flux distribution with minimal total flux |
+| **sensitivity** | Reaction knockout analysis | Biomass impact of single reaction deletions |
 
-### Additional Analyses
+### FVA Optimality Fraction
 
-#### pFBA (`pFBA.csv`)
-Parsimonious Flux Balance Analysis results:
-```
-Reaction Sample1 Sample2 Sample3
-R00001 12.5 -6.7 19.3
-R00002 0.0 8.9 -3.2
-R00003 41.2 35.8 47.9
-```
+The `--perc_opt` parameter (default: 0.90) controls the optimality constraint for FVA:
+- **1.0**: Only optimal solutions (100% of maximum biomass)
+- **0.90**: Allow suboptimal solutions (≥90% of maximum biomass)
+- **Lower values**: Explore broader flux ranges
+
+## Output
 
-#### FVA (`FVA.csv`)
-Flux Variability Analysis bounds:
-```
-Reaction Sample1_min Sample1_max Sample2_min Sample2_max ...
-R00001 -5.2 35.8 -25.3 8.7 ...
-R00002 -8.9 8.9 0.0 28.4 ...
-R00003 15.6 78.3 10.2 65.9 ...
-```
-
-#### Sensitivity (`sensitivity.csv`)
-Single reaction deletion effects:
-```
-Reaction Sample1 Sample2 Sample3
-R00001 0.98 0.95 0.97
-R00002 1.0 0.87 1.0
-R00003 0.23 0.19 0.31
-```
+- `mean.csv`: Mean flux values
+- `median.csv`: Median flux values
+- `quantiles.csv`: Flux quantiles (25%, 50%, 75%)
+- `fluxes/`: Complete flux distributions (if requested)
+- `fva.csv`: FVA results (if requested)
+- `pfba.csv`: pFBA results (if requested)
+- `sensitivity.csv`: Knockout sensitivity analysis (if requested)
+- `*.log`: Processing log
 
 ## Examples
 
 ### Basic CBS Sampling
 
 ```bash
-# Simple CBS sampling with statistics
-flux_simulation -td /opt/COBRAxy \
-  -ms ENGRO2 \
-  -in sample1_bounds.tsv,sample2_bounds.tsv \
+flux_simulation -ms ENGRO2 \
+  -in bounds/*.tsv \
   -ni Sample1,Sample2 \
   -a CBS \
-  -ns 500 \
-  -nb 2 \
-  -sd 42 \
-  -ot mean,median \
-  -ota pFBA \
-  -idop cbs_results/
-```
-
-### Comprehensive OPTGP Analysis
-
-```bash
-# Full analysis with OPTGP
-flux_simulation -td /opt/COBRAxy \
-  -ms ENGRO2 \
-  -in bounds/*.tsv \
-  -ni Sample1,Sample2,Sample3,Control1,Control2 \
-  -a OPTGP \
-  -th 200 \
   -ns 1000 \
-  -nb 1 \
-  -sd 123 \
-  -ot mean,median,quantiles,fluxes \
-  -ota pFBA,FVA,sensitivity \
-  -idop comprehensive_analysis/ \
-  -ol sampling.log
-```
-
-### Custom Model Sampling
-
-```bash
-# Use custom model with CBS
-flux_simulation -td /opt/COBRAxy \
-  -ms Custom \
-  -mo models/tissue_specific.xml \
-  -mn tissue_specific.xml \
-  -in patient_bounds.tsv \
-  -ni PatientA \
-  -a CBS \
-  -ns 2000 \
-  -nb 5 \
-  -sd 456 \
-  -ot mean,quantiles \
-  -ota FVA,sensitivity \
-  -idop patient_analysis/
-```
-
-### Batch Processing Multiple Conditions
-
-```bash
-# Process multiple experimental conditions
-flux_simulation -td /opt/COBRAxy \
-  -ms ENGRO2 \
-  -in ctrl1.tsv,ctrl2.tsv,treat1.tsv,treat2.tsv \
-  -ni Control1,Control2,Treatment1,Treatment2 \
-  -a CBS \
-  -ns 800 \
-  -nb 3 \
-  -sd 789 \
-  -ot mean,median,fluxes \
-  -ota pFBA,FVA \
-  -idop batch_conditions/
+  -idop output/
 ```
 
-## Algorithm Selection Guide
-
-### Choose CBS When:
-- Large models (>1000 reactions)
-- High sample throughput required
-- GLPK solver available
-- Memory constraints present
-
-### Choose OPTGP When:
-- Theoretical sampling guarantees needed
-- Smaller models (<500 reactions)
-- Sufficient computational resources
-- Publication-quality sampling required
-
-## Performance Optimization
-
-### CBS Optimization
-- Install GLPK and swiglpk for maximum performance
-- Increase batch number rather than samples per batch
-- Monitor memory usage for large models
-
-### OPTGP Optimization
-- Adjust thinning based on model size (100-500)
-- Use parallel processing when available
-- Consider warmup period for chain convergence
-
-### General Tips
-- Use appropriate sample sizes (500-2000 per condition)
-- Balance batches vs samples for memory management
-- Set consistent random seeds for reproducibility
-
-## Quality Control
-
-### Convergence Assessment
-- Compare statistics across batches
-- Check for systematic trends in sampling
-- Validate against known flux ranges
-
-### Statistical Validation
-- Ensure adequate sample sizes (n≥100 recommended)
-- Check for outliers and artifacts
-- Validate against experimental flux data when available
-
-### Output Verification
-- Confirm mass balance constraints satisfied
-- Check thermodynamic consistency
-- Verify biological plausibility of results
-
-## Integration Workflow
-
-### Upstream Tools
-- [RAS to Bounds](ras-to-bounds.md) - Generate constrained bounds from RAS
-- [Import Metabolic Model](import-metabolic-model.md) - Extract model components
-
-### Downstream Tools
-- [Flux to Map](flux-to-map.md) - Visualize flux distributions on metabolic maps
-- [MAREA](marea.md) - Statistical analysis of flux differences
-
-### Typical Pipeline
+### OPTGP Sampling
 
 ```bash
-# 1. Generate sample-specific bounds
-ras_to_bounds -td /opt/COBRAxy -ms ENGRO2 -ir ras.tsv -idop bounds/
+flux_simulation -ms ENGRO2 \
+  -in bounds/*.tsv \
+  -ni Sample1,Sample2 \
+  -a OPTGP \
+  -ns 1000 \
+  -th 200 \
+  -idop output/
+```
 
-# 2. Sample fluxes from constrained models
-flux_simulation -td /opt/COBRAxy -ms ENGRO2 -in bounds/*.tsv \
-  -ni Sample1,Sample2,Sample3 -a CBS -ns 1000 \
-  -ot mean,quantiles -ota pFBA,FVA -idop fluxes/
+### Custom Model with CBS Sampling
 
-# 3. Visualize results on metabolic maps
-flux_to_map -td /opt/COBRAxy -input_data_fluxes fluxes/mean.csv \
-  -choice_map ENGRO2 -idop flux_maps/
+```bash
+flux_simulation -ms Custom \
+  -mo custom_model.xml \
+  -in bounds/*.tsv \
+  -ni Sample1 \
+  -a CBS \
+  -ns 2000 \
+  -idop output/
 ```
 
 ## Troubleshooting
 
-### Common Issues
-
-**CBS sampling fails**
-- GLPK installation issues → Install GLPK and swiglpk
-- Model infeasibility → Check bounds constraints
-- Memory errors → Reduce samples per batch
-
-**OPTGP convergence problems**
-- Poor mixing → Increase thinning parameter
-- Slow convergence → Extend sampling time
-- Chain stuck → Check model feasibility
-
-**Output files missing**
-- Insufficient disk space → Check available storage
-- Permission errors → Verify write permissions
-- Invalid sample names → Check naming conventions
-
-### Error Messages
-
-| Error | Cause | Solution |
-|-------|-------|----------|
-| "GLPK solver failed" | Missing GLPK/swiglpk | Install GLPK libraries |
-| "Model infeasible" | Over-constrained bounds | Relax constraints or check model |
-| "Sampling timeout" | Insufficient time/resources | Reduce sample size or increase resources |
-
-### Performance Issues
-
-**Slow sampling**
-- Use CBS instead of OPTGP for speed
-- Reduce model size if possible
-- Increase system resources
-
-**Memory errors**
-- Lower samples per batch
-- Process samples sequentially
-- Use more efficient data formats
-
-**Disk space issues**
-- Monitor output file sizes
-- Clean intermediate files
-- Use compressed formats when possible
-
-## Advanced Usage
-
-### Custom Sampling Parameters
-
-For fine-tuning sampling behavior, advanced users can modify:
-- Objective function generation (CBS)
-- MCMC parameters (OPTGP)
-- Convergence criteria
-- Output precision and format
-
-### Parallel Processing
-
-```bash
-# Split sampling across multiple cores/nodes
-for i in {1..4}; do
-  flux_simulation -td /opt/COBRAxy -ms ENGRO2 \
-    -in subset_${i}_bounds.tsv \
-    -ni Batch${i} -a CBS -ns 250 \
-    -sd $((42 + i)) -idop batch_${i}/ &
-done
-wait
-```
-
-### Result Aggregation
-
-Combine results from multiple simulation runs:
-
-```bash
-# Merge statistics files
-python merge_flux_results.py -i batch_*/mean.csv -o combined_mean.csv
-```
+| Error | Solution |
+|-------|----------|
+| "GLPK solver failed" | Install GLPK libraries |
+| "Model infeasible" | Check bounds constraints |
 
 ## See Also
 
-- [RAS to Bounds](ras-to-bounds.md) - Generate input constraints
-- [Flux to Map](flux-to-map.md) - Visualize flux results
-- [CBS Algorithm Documentation](/tutorials/cbs-algorithm.md)
-- [OPTGP Algorithm Documentation](/tutorials/optgp-algorithm.md)
\ No newline at end of file
+- [RAS to Bounds](tools/ras-to-bounds)
+- [Flux to Map](tools/flux-to-map)
+- [Built-in Models](reference/built-in-models)
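
The "File Format Notes" added in this revision (tab-separated values, headers `reaction`, `lower_bound`, `upper_bound`, reaction IDs matching the model, numeric bounds) can be checked before a bounds file is passed to `flux_simulation`. The following is a minimal sketch using only the Python standard library; `validate_bounds`, its messages, and the optional `model_reaction_ids` argument are hypothetical names for illustration and are not part of COBRAxy.

```python
import csv
import io

# Header required by the File Format Notes in the revised documentation.
EXPECTED_HEADER = ["reaction", "lower_bound", "upper_bound"]


def validate_bounds(text, model_reaction_ids=None):
    """Return a list of problems found in a bounds TSV given as a string.

    Hypothetical helper, not part of COBRAxy. `model_reaction_ids` is an
    optional set of reaction IDs assumed to come from the model in use.
    """
    problems = []
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be: " + ", ".join(EXPECTED_HEADER)]
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            problems.append(f"line {line_no}: expected 3 tab-separated fields")
            continue
        reaction, lower, upper = row
        try:
            float(lower)
            float(upper)
        except ValueError:
            problems.append(f"line {line_no}: bounds must be numeric")
        if model_reaction_ids is not None and reaction not in model_reaction_ids:
            problems.append(f"line {line_no}: unknown reaction {reaction!r}")
    return problems


# Same rows as the bounds example in the revised documentation.
sample = (
    "reaction\tlower_bound\tupper_bound\n"
    "R00001\t-125.0\t125.0\n"
    "R00002\t-65.0\t65.0\n"
)
print(validate_bounds(sample, {"R00001", "R00002"}))
```

An empty list means the file passed all four checks; each entry otherwise names the offending line, which may help narrow down a "Model infeasible" error to a malformed bounds file.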
