changeset 547:73f2f7e2be17 draft
Uploaded
| author | francesco_lapi |
|---|---|
| date | Tue, 28 Oct 2025 10:44:07 +0000 |
| parents | 01147e83f43c |
| children | 5aef7b860706 |
| files | COBRAxy/README.md COBRAxy/docs/README.md COBRAxy/docs/_sidebar.md COBRAxy/docs/getting-started.md COBRAxy/docs/installation.md COBRAxy/docs/quickstart.md COBRAxy/docs/reference/built-in-models.md COBRAxy/docs/tools/README.md COBRAxy/docs/tools/export-metabolic-model.md COBRAxy/docs/tools/flux-simulation.md COBRAxy/docs/tools/flux-to-map.md COBRAxy/docs/tools/import-metabolic-model.md COBRAxy/docs/tools/marea-cluster.md COBRAxy/docs/tools/marea.md COBRAxy/docs/tools/ras-generator.md COBRAxy/docs/tools/ras-to-bounds.md COBRAxy/docs/tools/rps-generator.md COBRAxy/docs/troubleshooting.md COBRAxy/docs/tutorials/README.md COBRAxy/src/flux_simulation.py COBRAxy/src/test/testing.py COBRAxy/test/testing.py |
| diffstat | 22 files changed, 2238 insertions(+), 4579 deletions(-) |
--- a/COBRAxy/README.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/README.md	Tue Oct 28 10:44:07 2025 +0000
@@ -6,17 +6,18 @@
 A Python-based command-line suite for metabolic flux analysis and visualization, with [Galaxy](http://marea4galaxy.cloud.ba.infn.it/galaxy) integration.

-COBRAxy transforms gene expression and metabolite data into meaningful metabolic insights through flux sampling and interactive pathway maps.
+COBRAxy enables the integration of transcriptomics data with COBRA-based metabolic models, offering a comprehensive framework for studying metabolism in both health and disease. With COBRAxy, users can load and enrich metabolic models by incorporating transcriptomic data and adjusting the model's medium conditions.

 DOC: https://compbtbs.github.io/COBRAxy

 ## Features
+
+- **Galaxy Tools** - Web-based analysis with an intuitive interface
 - **Import/Export** of metabolic models in multiple formats (SBML, JSON, MAT, YAML)
-- **Reaction Activity Scores (RAS)** from gene expression data
-- **Reaction Propensity Scores (RPS)** from metabolite abundance
-- **Flux sampling** with CBS or OptGP algorithms
-- **Statistical analysis** with pFBA, FVA, and sensitivity analysis
-- **Interactive maps** with SVG/PDF export and custom styling
-- **Galaxy tools** for web-based analysis
-- **Built-in models** including ENGRO2 and Recon
+- **Reaction Activity Scores (RAS)** - Compute metabolic activity from gene expression data
+- **Reaction Propensity Scores (RPS)** - Infer metabolic preferences from metabolite abundance
+- **Flux computation** - Compute metabolic flux distributions using different optimization or sampling algorithms
+- **Statistical Analysis** - Identify statistically significant flux differences between groups of samples and report them on an enriched metabolic map
+- **Built-in Models** - Ready-to-use models including ENGRO2 and Recon3D
+
 ## Requirements
@@ -25,19 +26,6 @@
 - **Dependencies**: Automatically installed via pip (COBRApy, pandas, numpy, etc.)
 - **Build tools**: C/C++ compiler (gcc, clang, or MSVC), CMake for compiling Python extensions, pkg-config

-**System dependencies** (install before pip):
-```bash
-# Ubuntu/Debian
-sudo apt-get install build-essential cmake pkg-config libvips libglpk40 glpk-utils
-
-# macOS
-xcode-select --install
-brew install cmake pkg-config vips glpk
-
-# Windows (with Chocolatey)
-choco install cmake visualstudio2022buildtools pkgconfiglite
-```
-
 ### Installation

 **Recommended: Using Conda**
@@ -55,28 +43,16 @@
 cd COBRAxy/src
 pip install .
 ```
-
-### Basic Workflow
+## Galaxy Integration

-```bash
-# 1. Generate RAS from expression data
-ras_generator -td $(pwd) -in expression.tsv -ra ras_output.tsv -rs ENGRO2
-
-# 2. Generate RPS from metabolite data (optional)
-rps_generator -td $(pwd) -id metabolites.tsv -rp rps_output.tsv
+COBRAxy provides Galaxy tool wrappers (`.xml` files) for web-based analysis:

-# 3. Create enriched pathway maps with statistical analysis
-marea -td $(pwd) -using_RAS true -input_data ras_output.tsv -choice_map ENGRO2 -gs true -idop base_maps
-
-# 4. Apply RAS constraints to model for flux simulation
-ras_to_bounds -td $(pwd) -ms ENGRO2 -ir ras_output.tsv -rs true -idop bounds_output
+- Upload data through Galaxy interface
+- Chain tools in visual workflows
+- Share and reproduce analyses

-# 5. Sample metabolic fluxes with constrained model
-flux_simulation -td $(pwd) -ms ENGRO2 -in bounds_output/*.tsv -a CBS -ns 1000 -idop flux_results
+For Galaxy installation and setup, refer to the [official Galaxy documentation](https://docs.galaxyproject.org/).

-# 6. Add flux data to enriched maps
-flux_to_map -td $(pwd) -if flux_results/*.tsv -mp base_maps/*.svg -idop final_maps
-```

 ## Tools
@@ -121,49 +97,27 @@
 Final Enriched Maps
 ```

-## Built-in Models & Data
-
-COBRAxy includes ready-to-use resources:
-
-- **Models**: ENGRO2, Recon (human metabolism)
-- **Gene mappings**: HGNC, Ensembl, Entrez ID conversions
-- **Pathway map**: Pre-styled SVG templates for ENGRO2
-- **Medium compositions**: Standard growth conditions
+### Basic Workflow

-Located in `src/local/` directory for immediate use.
-
-## Command Line Usage
+```bash
+# 1. Generate RAS from expression data
+ras_generator -in expression.tsv -ra ras_output.tsv -rs ENGRO2

-All tools support `--help` for detailed options. Key commands:
-
-### Generate RAS/RPS scores
-```bash
-# From gene expression
-ras_generator -td $(pwd) -in expression.tsv -ra ras_output.tsv -rs ENGRO2
+# 2. Generate RPS from metabolite data (optional)
+rps_generator -id metabolites.tsv -rp rps_output.tsv

-# From metabolite data
-rps_generator -td $(pwd) -id metabolites.tsv -rp rps_output.tsv
-```
+# 3. Create enriched pathway maps with statistical analysis
+marea -using_RAS true -input_data ras_output.tsv -choice_map ENGRO2 -gs true -idop base_maps

-### Flux sampling
-```bash
-flux_simulation -td $(pwd) -ms ENGRO2 -in bounds/*.tsv -a CBS -ns 1000 -idop results/
-```
+# 4. Apply RAS constraints to model for flux simulation
+ras_to_bounds -ms ENGRO2 -ir ras_output.tsv -rs true -idop bounds_output

-### Statistical analysis & visualization
-```bash
-marea -td $(pwd) -using_RAS true -input_data ras.tsv -choice_map ENGRO2 -gs true -idop maps/
-```
-
-## Galaxy Integration
+# 5. Sample metabolic fluxes with constrained model
+flux_simulation -ms ENGRO2 -in bounds_output/*.tsv -a CBS -ns 1000 -idop flux_results

-COBRAxy provides Galaxy tool wrappers (`.xml` files) for web-based analysis:
-
-- Upload data through Galaxy interface
-- Chain tools in visual workflows
-- Share and reproduce analyses
-
-For Galaxy installation and setup, refer to the [official Galaxy documentation](https://docs.galaxyproject.org/).
+# 6. Add flux data to enriched maps
+flux_to_map -if flux_results/*.tsv -mp base_maps/*.svg -idop final_maps
+```

 ## Troubleshooting
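The updated README drives the whole analysis through console scripts, so the six-step workflow in the diff above can also be chained from a short script. A minimal sketch, assuming the entry points are on `PATH` after `pip install` and that an `expression.tsv` input exists; the flags are copied from the README commands, nothing else is verified here:

```python
import subprocess

def run(cmd):
    """Run one pipeline stage and fail fast on errors."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Steps 1, 3 and 4 of the README workflow (same flags as above)
run(["ras_generator", "-in", "expression.tsv", "-ra", "ras_output.tsv", "-rs", "ENGRO2"])
run(["marea", "-using_RAS", "true", "-input_data", "ras_output.tsv",
     "-choice_map", "ENGRO2", "-gs", "true", "-idop", "base_maps"])
run(["ras_to_bounds", "-ms", "ENGRO2", "-ir", "ras_output.tsv",
     "-rs", "true", "-idop", "bounds_output"])
```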
--- a/COBRAxy/docs/README.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/README.md	Tue Oct 28 10:44:07 2025 +0000
@@ -26,7 +26,7 @@
 ## Quick Navigation

-### [Installation](installation.md)
+### [Installation](installation)
 Install COBRAxy and get it running on your system

 ### [Tutorials](tutorials/)
@@ -54,4 +54,4 @@
 - **Issues**: Report bugs and request features on [GitHub](https://github.com/CompBtBs/COBRAxy/issues)
 - **Contributing**: Help improve COBRAxy

-**Ready to start?** Follow the [Installation Guide](installation.md) to get COBRAxy up and running!
\ No newline at end of file
+**Ready to start?** Follow the [Installation Guide](installation) to get COBRAxy up and running!
\ No newline at end of file
--- a/COBRAxy/docs/_sidebar.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/_sidebar.md	Tue Oct 28 10:44:07 2025 +0000
@@ -1,19 +1,16 @@
-<!-- docs/_sidebar.md -->
-
 * [Home](/)
-
 * [Installation](/installation.md)
-
 * [Tutorials](/tutorials/)
   * [Galaxy Setup](/tutorials/galaxy-setup.md)
-
-* [Tools Documentation](/tools/)
-  * [Import Metabolic Model](/tools/import-metabolic-model.md)
-  * [Export Metabolic Model](/tools/export-metabolic-model.md)
+* [Tools](/tools/)
+  * [Import Model](/tools/import-metabolic-model.md)
+  * [Export Model](/tools/export-metabolic-model.md)
   * [RAS Generator](/tools/ras-generator.md)
   * [RPS Generator](/tools/rps-generator.md)
-  * [MAREA](/tools/marea.md)
   * [RAS to Bounds](/tools/ras-to-bounds.md)
   * [Flux Simulation](/tools/flux-simulation.md)
+  * [MAREA](/tools/marea.md)
   * [Flux to Map](/tools/flux-to-map.md)
-  * [MAREA Cluster](/tools/marea-cluster.md)
\ No newline at end of file
+  * [MAREA Cluster](/tools/marea-cluster.md)
+* [Reference](/reference/built-in-models.md)
+* [Troubleshooting](/troubleshooting.md)
\ No newline at end of file
--- a/COBRAxy/docs/getting-started.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/getting-started.md	Tue Oct 28 10:44:07 2025 +0000
@@ -74,7 +74,6 @@
 ```bash
 # Generate RAS from expression data
-# Note: -td is optional and auto-detected after pip install
 ras_generator \
   -in expression_data.tsv \
   -ra ras_output.tsv \
@@ -85,7 +84,6 @@
 ```bash
 # Generate enriched pathway maps
-# Note: -td is optional and auto-detected after pip install
 marea \
   -using_RAS true \
   -input_data ras_output.tsv \
@@ -138,13 +136,12 @@
 Now that you understand the basics:

-1. **[Quick Start Guide](/quickstart.md)** - Complete walkthrough with example data
-2. **[Galaxy Tutorial](/tutorials/galaxy-setup.md)** - Web-based analysis setup
+1. **[Quick Start Guide](quickstart)** - Complete walkthrough with example data
+2. **[Galaxy Tutorial](tutorials/galaxy-setup)** - Web-based analysis setup
 3. **[Tools Reference](/tools/)** - Detailed documentation for each tool
-4. **[Examples](/examples/)** - Real-world analysis examples

 ## Need Help?

-- **[Troubleshooting](/troubleshooting.md)** - Common issues and solutions
+- **[Troubleshooting](troubleshooting)** - Common issues and solutions
 - **[GitHub Issues](https://github.com/CompBtBs/COBRAxy/issues)** - Report bugs or ask questions
-- **[Contributing](/contributing.md)** - Help improve COBRAxy
\ No newline at end of file
+- **[Contributing](contributing)** - Help improve COBRAxy
\ No newline at end of file
--- a/COBRAxy/docs/installation.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/installation.md	Tue Oct 28 10:44:07 2025 +0000
@@ -8,22 +8,6 @@
 - **Operating System**: Linux (recommended), macOS, Windows
 - **Build tools**: C/C++ compiler (gcc, clang, or MSVC), CMake, pkg-config

-## System Dependencies
-
-Install required build tools before installing COBRAxy:
-
-```bash
-# Ubuntu/Debian
-sudo apt-get install build-essential cmake pkg-config libvips libglpk40 glpk-utils
-
-# macOS
-xcode-select --install
-brew install cmake pkg-config vips glpk
-
-# Windows (with Chocolatey)
-choco install cmake visualstudio2022buildtools pkgconfiglite
-```
-
 ## Installation Methods

 ### Recommended: Using Conda
@@ -44,29 +28,6 @@
 pip install .
 ```

-### Alternative: Direct Installation
-
-If you have system dependencies already installed:
-
-```bash
-# Clone the repository
-git clone https://github.com/CompBtBs/COBRAxy.git
-cd COBRAxy/src
-
-# Install COBRAxy
-pip install .
-```
-
-### Development Install
-
-For development or if you want to modify COBRAxy:
-
-```bash
-# Clone and install in editable mode
-git clone https://github.com/CompBtBs/COBRAxy.git
-cd COBRAxy/src
-pip install -e .
-```

 ## Verify Installation
@@ -146,17 +107,17 @@
 After successful installation:

-1. **[Quick Start Guide](/quickstart.md)** - Run your first analysis
-2. **[Tutorial: Galaxy Setup](/tutorials/galaxy-setup.md)** - Set up web interface
+1. **[Quick Start Guide](quickstart)** - Run your first analysis
+2. **[Tutorial: Galaxy Setup](tutorials/galaxy-setup)** - Set up web interface

 ## Getting Help

 If you encounter issues:

-1. Check the [Troubleshooting Guide](/troubleshooting.md)
+1. Check the [Troubleshooting Guide](troubleshooting)
 2. Search [existing issues](https://github.com/CompBtBs/COBRAxy/issues)
 3. Create a [new issue](https://github.com/CompBtBs/COBRAxy/issues/new) with:
    - Your operating system
    - Python version (`python --version`)
    - Complete error message
-   - Installation method used
\ No newline at end of file
+   - Installation method used
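This diff removes the system-dependency and alternative-install sections, leaving the conda + pip route as the documented path, while the docs' "Verify Installation" section is kept. A quick complementary check from Python, assuming the console-script names referenced throughout these docs (everything else is standard library):

```python
import shutil

# COBRApy is pulled in automatically by `pip install .`
import cobra
print("COBRApy version:", cobra.__version__)

# Console scripts the COBRAxy docs reference; adjust if your
# installation exposes different names (assumption, not verified here).
tools = ["ras_generator", "rps_generator", "marea",
         "ras_to_bounds", "flux_simulation", "flux_to_map"]
for tool in tools:
    print(f"{tool:>16}: {shutil.which(tool) or 'NOT FOUND - check PATH/environment'}")
```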
--- a/COBRAxy/docs/quickstart.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/quickstart.md	Tue Oct 28 10:44:07 2025 +0000
@@ -38,7 +38,6 @@
 ```bash
 # Generate RAS scores using built-in ENGRO2 model
-# Note: -td is optional and auto-detected after pip install
 ras_generator \
   -in sample_expression.tsv \
   -ra ras_scores.tsv \
@@ -62,7 +61,6 @@
 ```bash
 # Create pathway maps with statistical analysis
-# Note: -td is optional and auto-detected after pip install
 marea \
   -using_RAS true \
   -input_data ras_scores.tsv \
@@ -97,19 +95,19 @@
 ### Learn More About the Analysis

-- **[Understanding RAS](/tools/ras-generator.md)** - How activity scores are computed
-- **[MAREA Analysis](/tools/marea.md)** - Statistical enrichment methods
+- **[Understanding RAS](tools/ras-generator)** - How activity scores are computed
+- **[MAREA Analysis](tools/marea)** - Statistical enrichment methods
 - **[Data Flow](getting-started.md#analysis-workflows)** - Complete workflow overview

 ### Try Advanced Features

 - **[Flux Sampling](tutorials/workflow.md#flux-simulation-workflow)** - Predict metabolic flux distributions
-- **[Galaxy Interface](/tutorials/galaxy-setup.md)** - Web-based analysis
+- **[Galaxy Interface](tutorials/galaxy-setup)** - Web-based analysis

 ### Use Your Own Data

-- **[Data Formats](/tutorials/data-formats.md)** - Prepare your expression data
-- **[Troubleshooting](/troubleshooting.md)** - Common issues and solutions
+- **[Data Formats](tutorials/data-formats)** - Prepare your expression data
+- **[Troubleshooting](troubleshooting)** - Common issues and solutions

 ## Complete Example Pipeline
@@ -125,7 +123,6 @@
 EOF

 # Run analysis pipeline
-# Note: -td is optional and auto-detected after pip install
 ras_generator -in expression.tsv -ra ras.tsv -rs ENGRO2
 marea -using_RAS true -input_data ras.tsv -choice_map ENGRO2 -gs true -idop maps
@@ -140,4 +137,4 @@
 1. **Check Prerequisites**: Ensure COBRAxy is properly installed
 2. **Verify File Format**: Make sure your data is tab-separated TSV
 3. **Review Logs**: Look for error messages in the terminal output
-4. **Consult Guides**: [Troubleshooting](/troubleshooting.md) and [Installation](/installation.md)
\ No newline at end of file
+4. **Consult Guides**: [Troubleshooting](troubleshooting) and [Installation](installation)
\ No newline at end of file
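The quickstart's troubleshooting list asks for tab-separated input and suggests reviewing outputs. A small sketch for sanity-checking both ends of the example pipeline with pandas, assuming `expression.tsv` and `ras.tsv` are the files from the pipeline above and that both are ID-by-sample tables (an assumption based on the format notes elsewhere in these docs):

```python
import pandas as pd

# Input: genes x samples, tab-separated, gene IDs in the first column
expr = pd.read_csv("expression.tsv", sep="\t", index_col=0)
print(f"expression: {expr.shape[0]} genes x {expr.shape[1]} samples")

# Every non-ID cell should be numeric
numeric = expr.apply(lambda col: pd.to_numeric(col, errors="coerce"))
print("non-numeric cells:", int(numeric.isna().sum().sum()))

# Output of ras_generator: reactions x samples
ras = pd.read_csv("ras.tsv", sep="\t", index_col=0)
print(f"RAS: {ras.shape[0]} reactions x {ras.shape[1]} samples")
```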
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/COBRAxy/docs/reference/built-in-models.md	Tue Oct 28 10:44:07 2025 +0000
@@ -0,0 +1,44 @@
+# Built-in Models
+
+COBRAxy includes two pre-installed metabolic models for human metabolism analysis.
+
+## ENGRO2 (Recommended)
+
+**Best for**: General metabolic analysis
+
+- ~2,000 reactions, ~1,500 metabolites, ~500 genes
+- Balanced coverage
+- Core metabolic pathways well-represented
+- **Use for**: Tissue profiling, disease comparisons, time-series analysis
+
+## Recon (Comprehensive)
+
+**Best for**: Genome-wide studies
+
+- ~10,000 reactions, ~5,000 metabolites, ~2,000 genes
+- Most complete human metabolic network
+- Includes rare and specialized pathways
+- **Use for**: Comprehensive studies, rare diseases
+
+## Usage
+
+```bash
+# Specify model name in any COBRAxy tool
+tool_name --model ENGRO2 [options]
+```
+
+## Technical Notes
+
+**Gene IDs**: All models support HGNC ID (e.g., `HGNC:5`), HGNC Symbol (e.g., `ALDOA`), Ensembl, and Entrez formats.
+
+**GPR Rules**: Gene-Protein-Reaction associations use Boolean logic (`and`, `or`).
+
+**Custom Models**: Use [Import Metabolic Model](../tools/import-metabolic-model) to extract and [Export Metabolic Model](../tools/export-metabolic-model) to create custom models.
+
+## See Also
+
+- [RAS Generator](../tools/ras-generator) - Uses model GPR rules
+- [RAS to Bounds](../tools/ras-to-bounds) - Applies flux constraints to models
+- [Flux Simulation](../tools/flux-simulation) - Samples model flux distributions
+- [Import Metabolic Model](../tools/import-metabolic-model) - Extract model data
+- [Export Metabolic Model](../tools/export-metabolic-model) - Create custom models
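The gene-ID and GPR notes in the new reference page are easy to confirm with COBRApy once a model is at hand (for example one exported via Export Metabolic Model). A short sketch, with `model.xml` as a placeholder path:

```python
import cobra

model = cobra.io.read_sbml_model("model.xml")  # placeholder path
print(len(model.reactions), "reactions |",
      len(model.metabolites), "metabolites |",
      len(model.genes), "genes")

# GPR rules are Boolean expressions over gene IDs ('and' / 'or'),
# e.g. "(GENE1 and GENE2) or GENE3"
for rxn in list(model.reactions)[:5]:
    print(rxn.id, "->", rxn.gene_reaction_rule or "(no GPR)")
```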
--- a/COBRAxy/docs/tools/README.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/tools/README.md	Tue Oct 28 10:44:07 2025 +0000
@@ -2,18 +2,17 @@
 ## Available Tools

-| Tool | Purpose | Input | Output |
-|------|---------|--------|--------|
-| [Import Metabolic Model](tools/import-metabolic-model) | Import and extract model components | SBML/JSON/MAT/YAML model | Tabular model data |
-| [Export Metabolic Model](tools/export-metabolic-model) | Export tabular data to model format | Tabular model data | SBML/JSON/MAT/YAML model |
-| [RAS Generator](tools/ras-generator) | Compute reaction activity scores | Gene expression + GPR rules | RAS values |
-| [RPS Generator](tools/rps-generator) | Compute reaction propensity scores | Metabolite abundance | RPS values |
-| [MAREA](tools/marea) | Statistical pathway enrichment | RAS/RPS data | Enriched maps + statistics |
-| [RAS to Bounds](tools/ras-to-bounds) | Apply RAS constraints to model | RAS + SBML model | Constrained bounds |
-| [Flux Simulation](tools/flux-simulation) | Sample metabolic fluxes | Constrained model | Flux distributions |
-| [Flux to Map](tools/flux-to-map) | Visualize flux data on maps | Flux samples + statistical comparison | Color-coded pathway maps |
-| [MAREA Cluster](tools/marea-cluster) | Cluster analysis | Expression/RAS/RPS/flux data | Sample clusters + validation plots |
-
+| Galaxy Tool | Python script | Purpose | Input | Output |
+|------|---------|---------|--------|--------|
+| [Import Metabolic Model](tools/import-metabolic-model) | importMetabolicModel | Import and extract model components | SBML/JSON/MAT/YAML model | Tabular model data |
+| [Export Metabolic Model](tools/export-metabolic-model) | exportMetabolicModel | Export tabular data to model format | Tabular model data | SBML/JSON/MAT/YAML model |
+| [Expression2RAS](tools/ras-generator) | ras_generator | Compute reaction activity scores | Gene expression + GPR rules | RAS values |
+| [Expression2RPS](tools/rps-generator) | rps_generator | Compute reaction propensity scores | Metabolite abundance | RPS values |
+| [Metabolic Reaction Enrichment Analysis](tools/marea) | marea | Statistical pathway enrichment | RAS/RPS data | Enriched maps + statistics |
+| [RAS2Bounds](tools/ras-to-bounds) | ras_to_bounds | Apply RAS constraints to model | RAS + SBML model | Constrained bounds |
+| [Flux Simulation](tools/flux-simulation) | flux_simulation | Sample metabolic fluxes | Constrained model | Flux distributions |
+| [Metabolic Flux Enrichment Analysis](tools/flux-to-map) | flux_to_map | Visualize flux data on maps | Flux samples + statistical comparison | Color-coded pathway maps |
+| [Cluster Analysis](tools/marea-cluster) | marea_cluster | Cluster analysis | Expression/RAS/RPS/flux data | Sample clusters + validation plots |

 ## Analysis Workflows
@@ -29,39 +28,12 @@
 ### Command Line Usage
 ```bash
 # Basic pattern for all tools
-tool_name -td $(pwd) [tool-specific options]
+tool_name [tool-specific options]

 # Example: Generate RAS scores
-ras_generator -td $(pwd) -in expression.tsv -ra ras_output.tsv -rs ENGRO2
+ras_generator -in expression.tsv -ra ras_output.tsv -rs ENGRO2
 ```

-## Parameter Reference
-
-### File Format Requirements
-
-**Gene Expression Files**
-- Format: TSV (tab-separated values)
-- Structure: Genes (rows) × Samples (columns)
-- First column: Gene IDs (HGNC, Ensembl, etc.)
-- Subsequent columns: Expression values
-
-**Metabolite Files**
-- Format: TSV (tab-separated values)
-- Structure: Metabolites (rows) × Samples (columns)
-- First column: Metabolite names
-- Subsequent columns: Abundance values
-
-**Model Files**
-- Format: SBML (.xml) or tabular rules (.tsv)
-- Content: Metabolic network with GPR rules
-
-### Built-in Models
-
-| Model | Organism | Reactions | Genes | Best For |
-|-------|----------|-----------|-------|----------|
-| **ENGRO2** | Human | ~500 | ~500 | Focused analysis, faster computation |
-| **RECON3D** | Human | ~10,000 | ~2,000 | Comprehensive metabolism |
-
 ## Tool Selection Guide

 ### Choose Your Analysis Path
@@ -77,20 +49,10 @@
 3. [Flux Simulation](tools/flux-simulation) → Sample fluxes
 4. [Flux to Map](tools/flux-to-map) → Create visualizations

-**For Model Exploration**
-1. [Import Metabolic Model](tools/import-metabolic-model) → Extract model info
-2. Analyze model structure and gene coverage
-
 **For Model Creation**
 1. Create/edit tabular model data
 2. [Export Metabolic Model](tools/export-metabolic-model) → Create SBML/JSON/MAT/YAML model

-**For Sample Classification**
-1. Generate RAS/RPS scores
-2. [MAREA Cluster](tools/marea-cluster) → Cluster samples
-
-
-
 ## Troubleshooting

 ### Common Issues Across Tools
@@ -98,7 +60,6 @@
 **Model Issues**
 - Verify model file format and gene ID consistency
 - Check gene ID mapping between data and model
-- Use built-in models to avoid compatibility issues

 ### Getting Help
@@ -106,7 +67,7 @@
 1. Check individual tool documentation
 2. Review parameter requirements and formats
 3. Test with smaller datasets first
-4. Consult [troubleshooting guide](/troubleshooting.md)
+4. Consult [troubleshooting guide](troubleshooting)

 ## Contributing
@@ -116,4 +77,4 @@
 - Contribute usage patterns
 - Fix documentation errors

-Each tool page includes detailed parameter descriptions, examples, and troubleshooting tips. Select a tool from the sidebar to get started!
\ No newline at end of file
+Each tool page includes detailed parameter descriptions, examples, and troubleshooting tips.
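The format requirements dropped from this page (TSV, gene IDs in the first column, one column per sample) still describe what the generators expect. A defensive pre-flight check along those lines, with the file name as a placeholder:

```python
import csv

def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

def check_expression_tsv(path):
    """Lightweight check of a genes-x-samples TSV before running ras_generator."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    header, data = rows[0], rows[1:]
    assert len(header) >= 2, "need a gene-ID column plus at least one sample column"
    assert {len(r) for r in rows} == {len(header)}, "ragged rows detected"
    bad = [(r[0], v) for r in data for v in r[1:] if v and not is_number(v)]
    print(f"{len(data)} genes, {len(header) - 1} samples, {len(bad)} non-numeric cells")

check_expression_tsv("expression.tsv")  # placeholder file name
```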
--- a/COBRAxy/docs/tools/export-metabolic-model.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/tools/export-metabolic-model.md	Tue Oct 28 10:44:07 2025 +0000
@@ -1,493 +1,113 @@
 # Export Metabolic Model

-Export tabular data (CSV/TSV) into COBRA metabolic models in various formats.
+Convert tabular data into a COBRA metabolic model.

 ## Overview

-Export Metabolic Model (exportMetabolicModel) converts structured tabular data containing reaction information into fully functional COBRA metabolic models. This tool enables creation of custom models from spreadsheet data and supports multiple output formats including SBML, JSON, MATLAB, and YAML.
+Export Metabolic Model converts structured tabular data (CSV/TSV) into functional COBRA models in SBML, JSON, MATLAB, or YAML formats.
+
+**Input**: Tabular model data (CSV/TSV)
+**Output**: SBML/JSON/MAT/YAML model files
+
+## Galaxy Interface

-## Usage
+In Galaxy: **COBRAxy → Export Metabolic Model**

-### Command Line
+1. Upload tabular model data file
+2. Select output format (SBML/JSON/MAT/YAML)
+3. Click **Run tool**
+
+## Command Line Usage

 ```bash
-exportMetabolicModel --input model_data.csv \
-    --format sbml \
-    --output custom_model.xml \
-    --out_log conversion.log \
-    --tool_dir /path/to/COBRAxy/src
+exportMetabolicModel \
+    --input model_data.csv \
+    --format sbml \
+    --output custom_model.xml \
+    --out_log conversion.log
 ```

-### Galaxy Interface
-
-Select "Export Metabolic Model" from the COBRAxy tool suite and configure conversion parameters.
-
 ## Parameters

-### Required Parameters
-
 | Parameter | Flag | Description |
 |-----------|------|-------------|
 | Input File | `--input` | Tabular file (CSV/TSV) with model data |
-| Output Format | `--format` | Model format (sbml, json, mat, yaml) |
+| Output Format | `--format` | Model format: sbml, json, mat, yaml |
 | Output File | `--output` | Output model file path |
-| Output Log | `--out_log` | Log file for conversion process |
-
-### Optional Parameters
-
-| Parameter | Flag | Description | Default |
-|-----------|------|-------------|---------|
-| Tool Directory | `--tool_dir` | COBRAxy installation directory | Current directory |
+| Output Log | `--out_log` | Log file |

 ## Input Format

-### Tabular Model Data
-
-The input file must contain structured model information with the following columns:
+Required columns:

 ```csv
-Reaction_ID,GPR_Rule,Reaction_Formula,Lower_Bound,Upper_Bound,Objective_Coefficient,Medium_Member,Compartment,Subsystem
-R00001,GENE1 or GENE2,A + B -> C + D,-1000.0,1000.0,0.0,FALSE,cytosol,Glycolysis
-R00002,GENE3 and GENE4,E <-> F,-1000.0,1000.0,0.0,FALSE,mitochondria,TCA_Cycle
-EX_glc_e,-,glc_e <->,-1000.0,1000.0,0.0,TRUE,extracellular,Exchange
-BIOMASS,GENE5,0.5 A + 0.3 B -> 1 BIOMASS,0.0,1000.0,1.0,FALSE,cytosol,Biomass
+ReactionID,Formula,GPR,lower_bound,upper_bound,ObjectiveCoefficient,InMedium,TranslationIssues
+R00001,A + B -> C + D,GENE1 or GENE2,-1000.0,1000.0,0.0,FALSE,
+EX_glc_e,glc_e <->,-,-1000.0,1000.0,0.0,TRUE,
 ```

-### Required Columns
-
-| Column | Description | Format |
-|--------|-------------|--------|
-| **Reaction_ID** | Unique reaction identifier | String |
-| **Reaction_Formula** | Stoichiometric equation | Metabolite notation |
-| **Lower_Bound** | Minimum flux constraint | Numeric |
-| **Upper_Bound** | Maximum flux constraint | Numeric |
-
-### Optional Columns
-
-| Column | Description | Default |
-|--------|-------------|---------|
-| **GPR_Rule** | Gene-protein-reaction association | Empty string |
-| **Objective_Coefficient** | Biomass/objective weight | 0.0 |
-| **Medium_Member** | Exchange reaction flag | FALSE |
-| **Compartment** | Subcellular location | Empty |
-| **Subsystem** | Metabolic pathway | Empty |
-
-## Output Formats
-
-### SBML (Systems Biology Markup Language)
-- **Format**: XML-based standard
-- **Extension**: `.xml` or `.sbml`
-- **Use Case**: Interoperability with other tools
-- **Advantages**: Widely supported, standardized
-
-### JSON (JavaScript Object Notation)
-- **Format**: COBRApy native JSON
-- **Extension**: `.json`
-- **Use Case**: Python/COBRApy workflows
-- **Advantages**: Human-readable, lightweight
-
-### MATLAB (.mat)
-- **Format**: MATLAB workspace format
-- **Extension**: `.mat`
-- **Use Case**: MATLAB COBRA Toolbox
-- **Advantages**: Direct MATLAB compatibility
-
-### YAML (YAML Ain't Markup Language)
-- **Format**: Human-readable data serialization
-- **Extension**: `.yml` or `.yaml`
-- **Use Case**: Configuration and documentation
-- **Advantages**: Most human-readable format
+**File Format Notes:**
+- Use **comma-separated** (CSV) or **tab-separated** (TSV) values
+- First row must contain column headers
+- Required columns: ReactionID, Formula, lower_bound, upper_bound
+- Optional columns: GPR, ObjectiveCoefficient, InMedium, Pathway_1, Pathway_2

 ## Reaction Formula Syntax

-### Standard Notation
 ```
-# Irreversible reaction
+# Irreversible
 A + B -> C + D

-# Reversible reaction
+# Reversible
 A + B <-> C + D

-# With stoichiometric coefficients
+# With stoichiometry
 2 A + 3 B -> 1 C + 4 D
-
-# Compartmentalized metabolites
-glc_c + atp_c -> g6p_c + adp_c
-```
-
-### Compartment Suffixes
-- `_c`: Cytosol
-- `_m`: Mitochondria
-- `_e`: Extracellular
-- `_r`: Endoplasmic reticulum
-- `_x`: Peroxisome
-- `_n`: Nucleus
-
-### Exchange Reactions
-```
-# Import reaction
-EX_glc_e: glc_e <->
-
-# Export reaction
-EX_co2_e: co2_e <->
 ```

 ## GPR Rule Syntax

-### Logical Operators
-- **AND**: Gene products required together
-- **OR**: Alternative gene products
-- **Parentheses**: Grouping for complex logic
-
-### Examples
 ```
 # Single gene
 GENE1

-# Alternative genes (isozymes)
-GENE1 or GENE2 or GENE3
+# Alternative genes (OR)
+GENE1 or GENE2

-# Required genes (complex)
+# Required complex (AND)
 GENE1 and GENE2

-# Complex logic
-(GENE1 and GENE2) or (GENE3 and GENE4)
+# Nested logic
+(GENE1 and GENE2) or GENE3
 ```

+## Output Formats
+
+- **SBML**: XML standard, maximum compatibility
+- **JSON**: COBRApy native format
+- **MATLAB**: COBRA Toolbox compatibility
+- **YAML**: Human-readable format
+
 ## Examples

-### Create Basic Model
-
-```bash
-# Convert simple CSV to SBML model
-exportMetabolicModel --input simple_model.csv \
-    --format sbml \
-    --output simple_model.xml \
-    --out_log simple_conversion.log \
-    --tool_dir /opt/COBRAxy/src
-```
-
-### Multi-format Export
-
-```bash
-# Create models in all supported formats
-formats=("sbml" "json" "mat" "yaml")
-for fmt in "${formats[@]}"; do
-    exportMetabolicModel --input comprehensive_model.csv \
-        --format "$fmt" \
-        --output "model.$fmt" \
-        --out_log "conversion_$fmt.log" \
-        --tool_dir /opt/COBRAxy/src
-done
-```
-
-### Custom Model Creation
-
-```bash
-# Build tissue-specific model from curated data
-exportMetabolicModel --input liver_reactions.tsv \
-    --format sbml \
-    --output liver_model.xml \
-    --out_log liver_model.log \
-    --tool_dir /opt/COBRAxy/src
-```
-
-### Model Integration Pipeline
+### Basic Export

 ```bash
-# Extract existing model, modify, and recreate
-importMetabolicModel --model ENGRO2 \
-    --out_tabular base_model.csv \
-    --tool_dir /opt/COBRAxy/src
-
-# Edit base_model.csv with custom reactions/constraints
-
-# Create modified model
-exportMetabolicModel --input modified_model.csv \
-    --format sbml \
-    --output custom_model.xml \
-    --out_log custom_creation.log \
-    --tool_dir /opt/COBRAxy/src
 ```

-## Model Validation
-
-### Automatic Checks
-
-The tool performs validation during conversion:
-- **Stoichiometric Balance**: Reaction mass balance
-- **Metabolite Consistency**: Compartment assignments
-- **Bound Validation**: Feasible constraint ranges
-- **Objective Function**: Valid biomass reaction
-
-### Post-conversion Validation
-
-```python
-import cobra
-
-# Load and validate model
-model = cobra.io.read_sbml_model('custom_model.xml')
-
-# Check basic properties
-print(f"Reactions: {len(model.reactions)}")
-print(f"Metabolites: {len(model.metabolites)}")
-print(f"Genes: {len(model.genes)}")
-
-# Test model solvability
-solution = model.optimize()
-print(f"Growth rate: {solution.objective_value}")
-
-# Validate mass balance
-unbalanced = cobra.flux_analysis.check_mass_balance(model)
-if unbalanced:
-    print("Unbalanced reactions found:", unbalanced)
-```
-
-## Integration Workflow
-
-### Upstream Data Sources
-
-#### COBRAxy Tools
-- [Import Metabolic Model](import-metabolic-model.md) - Extract tabular data for modification
-
-#### External Sources
-- **Databases**: KEGG, Reactome, BiGG
-- **Literature**: Manually curated reactions
-- **Spreadsheets**: User-defined custom models
-
-### Downstream Applications
-
-#### COBRAxy Analysis
-- [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom model
-- [Flux Simulation](flux-simulation.md) - Sample fluxes from custom model
-- [MAREA](marea.md) - Analyze custom pathways
-
-#### External Tools
-- **COBRApy**: Python-based analysis
-- **COBRA Toolbox**: MATLAB analysis
-- **OptFlux**: Strain design
-- **Escher**: Pathway visualization
-
-### Typical Pipeline
-
-```bash
-# 1. Start with existing model data
-importMetabolicModel --model ENGRO2 \
-    --out_tabular base_reactions.csv \
-    --tool_dir /opt/COBRAxy/src
-
-# 2. Modify/extend the reaction data
-# Edit base_reactions.csv to add tissue-specific reactions
-
-# 3. Create custom model
-exportMetabolicModel --input modified_reactions.csv \
-    --format sbml \
-    --output tissue_model.xml \
-    --out_log tissue_creation.log \
-    --tool_dir /opt/COBRAxy/src
-
-# 4. Validate and use custom model
-ras_to_bounds --model Custom --input tissue_model.xml \
-    --ras_input tissue_expression.tsv \
-    --idop tissue_bounds/ \
-    --tool_dir /opt/COBRAxy/src
-
-# 5. Perform flux analysis
-flux_simulation --model Custom --input tissue_model.xml \
-    --bounds tissue_bounds/*.tsv \
-    --algorithm CBS --idop tissue_fluxes/ \
-    --tool_dir /opt/COBRAxy/src
-```
-
-## Quality Control
-
-### Input Data Validation
-
-#### Pre-conversion Checks
-- **Format Consistency**: Verify column headers and data types
-- **Reaction Completeness**: Check for missing required fields
-- **Stoichiometric Validity**: Validate reaction formulas
-- **Bound Feasibility**: Ensure lower ≤ upper bounds
-
-#### Common Data Issues
-```bash
-# Check for missing reaction IDs
-awk -F',' 'NR>1 && ($1=="" || $1=="NA") {print "Empty ID in line " NR}' input.csv
-
-# Validate reaction directions
-awk -F',' 'NR>1 && $3 !~ /->|<->/ {print "Invalid formula: " $1 ", " $3}' input.csv
-
-# Check bound consistency
-awk -F',' 'NR>1 && $4>$5 {print "Invalid bounds: " $1 ", LB=" $4 " > UB=" $5}' input.csv
-```
-
-### Model Quality Assessment
-
-#### Structural Properties
-- **Network Connectivity**: Ensure realistic pathway structure
-- **Compartmentalization**: Validate transport reactions
-- **Exchange Reactions**: Verify medium composition
-- **Biomass Function**: Check objective reaction completeness
-
-#### Functional Testing
-```python
-# Test model functionality
-model = cobra.io.read_sbml_model('custom_model.xml')
-
-# Check growth capability
-growth = model.optimize().objective_value
-print(f"Maximum growth rate: {growth}")
-
-# Flux Variability Analysis
-fva_result = cobra.flux_analysis.flux_variability_analysis(model)
-blocked_reactions = fva_result[(fva_result.minimum == 0) & (fva_result.maximum == 0)]
-print(f"Blocked reactions: {len(blocked_reactions)}")
-
-# Essential gene analysis
-essential_genes = cobra.flux_analysis.find_essential_genes(model)
-print(f"Essential genes: {len(essential_genes)}")
-```
-
-## Tips and Best Practices
-
-### Data Preparation
-- **Consistent Naming**: Use systematic metabolite/reaction IDs
-- **Compartment Notation**: Follow standard suffixes (_c, _m, _e)
-- **Balanced Reactions**: Verify mass and charge balance
-- **Realistic Bounds**: Use physiologically relevant constraints
-
-### Model Design
-- **Modular Structure**: Organize reactions by pathway/subsystem
-- **Exchange Reactions**: Include all necessary transport processes
-- **Biomass Function**: Define appropriate growth objective
-- **Gene Associations**: Add GPR rules where available
-
-### Format Selection
-- **SBML**: Choose for maximum compatibility and sharing
-- **JSON**: Use for COBRApy-specific workflows
-- **MATLAB**: Select for COBRA Toolbox integration
-- **YAML**: Pick for human-readable documentation
-
-### Performance Optimization
-- **Model Size**: Balance comprehensiveness with computational efficiency
-- **Reaction Pruning**: Remove unnecessary or blocked reactions
-- **Compartmentalization**: Minimize unnecessary compartments
-- **Validation**: Test model properties before distribution
-
 ## Troubleshooting

-### Common Issues
-
-**Conversion fails with format error**
-- Check CSV/TSV column headers and data consistency
-- Verify reaction formula syntax
-- Ensure numeric fields contain valid numbers
-
-**Model is infeasible after conversion**
-- Check reaction bounds for conflicts
-- Verify exchange reaction setup
-- Validate stoichiometric balance
-
-**Missing metabolites or reactions**
-- Confirm all required columns present in input
-- Check for empty rows or malformed data
-- Validate reaction formula parsing
-
-### Error Messages
-
-| Error | Cause | Solution |
-|-------|-------|----------|
-| "Input file not found" | Invalid file path | Check file location and permissions |
-| "Unknown format" | Invalid output format | Use: sbml, json, mat, or yaml |
-| "Formula parsing failed" | Malformed reaction equation | Check reaction formula syntax |
-| "Model infeasible" | Conflicting constraints | Review bounds and exchange reactions |
-
-### Performance Issues
-
-**Slow conversion**
-- Large input files require more processing time
-- Complex GPR rules increase parsing overhead
-- Monitor system memory usage
-
-**Memory errors**
-- Reduce model size or split into smaller files
-- Increase available system memory
-- Use more efficient data structures
-
-**Output file corruption**
-- Ensure sufficient disk space
-- Check file write permissions
-- Verify format-specific requirements
-
-## Advanced Usage
-
-### Batch Model Creation
-
-```python
-#!/usr/bin/env python3
-import subprocess
-import pandas as pd
-
-# Create multiple tissue-specific models
-tissues = ['liver', 'muscle', 'brain', 'heart']
-base_data = pd.read_csv('base_model.csv')
-
-for tissue in tissues:
-    # Modify base data for tissue specificity
-    tissue_data = customize_for_tissue(base_data, tissue)
-    tissue_data.to_csv(f'{tissue}_model.csv', index=False)
-
-    # Convert to SBML
-    subprocess.run([
-        'exportMetabolicModel',
-        '--input', f'{tissue}_model.csv',
-        '--format', 'sbml',
-        '--output', f'{tissue}_model.xml',
-        '--out_log', f'{tissue}_conversion.log',
-        '--tool_dir', '/opt/COBRAxy/src'
-    ])
-```
-
-### Model Merging
-
-Combine multiple tabular files into comprehensive models:
-
-```bash
-# Merge core metabolism with tissue-specific pathways
-cat core_reactions.csv > combined_model.csv
-tail -n +2 tissue_reactions.csv >> combined_model.csv
-tail -n +2 disease_reactions.csv >> combined_model.csv
-
-# Create merged model
-exportMetabolicModel --input combined_model.csv \
-    --format sbml \
-    --output comprehensive_model.xml \
-    --tool_dir /opt/COBRAxy/src
-```
-
-### Model Versioning
-
-Track model versions and changes:
-
-```bash
-# Version control for model development
-git add model_v1.csv
-git commit -m "Initial model version"
-
-# Create versioned models
-exportMetabolicModel --input model_v1.csv --format sbml \
-    --output model_v1.xml --tool_dir /opt/COBRAxy/src
-exportMetabolicModel --input model_v2.csv --format sbml \
-    --output model_v2.xml --tool_dir /opt/COBRAxy/src
-
-# Compare model versions
-cobra_compare_models model_v1.xml model_v2.xml
-```
+| Error | Solution |
+|-------|----------|
+| "Formula parsing failed" | Check reaction formula syntax |
+| "Model infeasible" | Review bounds and exchange reactions |

 ## See Also

-- [Import Metabolic Model](import-metabolic-model.md) - Extract tabular data from existing models
-- [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom models
-- [Flux Simulation](flux-simulation.md) - Analyze custom models with flux sampling
-- [Model Creation Tutorial](/tutorials/custom-model-creation.md)
-- [COBRA Model Standards](/tutorials/cobra-model-standards.md)
\ No newline at end of file
+- [Import Metabolic Model](tools/import-metabolic-model)
+- [RAS to Bounds](tools/ras-to-bounds)
+- [Flux Simulation](tools/flux-simulation)
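An earlier revision of this page validated the exported model with COBRApy, and that check remains a sensible follow-up to `exportMetabolicModel`; restored here as a short sketch (`model.xml` as in the example above):

```python
import cobra

model = cobra.io.read_sbml_model("model.xml")
print(f"{len(model.reactions)} reactions, "
      f"{len(model.metabolites)} metabolites, {len(model.genes)} genes")

# The model should be solvable if bounds and exchanges are consistent
solution = model.optimize()
print("status:", solution.status, "| objective:", solution.objective_value)

# Unbalanced internal reactions often explain infeasible or odd solutions
for rxn in model.reactions:
    if not rxn.boundary:  # exchange reactions are unbalanced by design
        imbalance = rxn.check_mass_balance()
        if imbalance:
            print("unbalanced:", rxn.id, imbalance)
```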
--- a/COBRAxy/docs/tools/flux-simulation.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/tools/flux-simulation.md	Tue Oct 28 10:44:07 2025 +0000
@@ -1,406 +1,184 @@
 # Flux Simulation

-Sample metabolic fluxes using constraint-based modeling with CBS or OPTGP algorithms.
+Simulate flux distributions from constraint-based metabolic models using different optimization or sampling strategies.

 ## Overview

-Flux Simulation performs constraint-based sampling of metabolic flux distributions from constrained models. It supports two sampling algorithms (CBS and OPTGP) and provides comprehensive flux statistics including mean, median, quantiles, pFBA, FVA, and sensitivity analysis.
+Two types of analysis are available:
+- **flux optimization**
+- **flux sampling**
+
+For flux optimization, one of the following methods can be performed: parsimonious FBA, Flux Variability Analysis, or biomass sensitivity analysis (single reaction knock-out). The objective function, a linear combination of fluxes weighted by specific coefficients, depends on the provided metabolic network.
+
+For flux sampling, one of the following methods can be performed: CBS (Corner-based sampling) or OPTGP (improved Artificial Centering Hit-and-Run sampler).

-## Usage
+## Galaxy Interface
+
+In Galaxy: **COBRAxy → Flux Simulation**

-### Command Line
+1. Select model and upload bounds files
+2. Choose algorithm (CBS/OPTGP) and sampling parameters
+3. Click **Run tool**
+
+## Command Line Usage

 ```bash
-flux_simulation -td /path/to/COBRAxy \
-    -ms ENGRO2 \
-    -in bounds1.tsv,bounds2.tsv \
-    -ni Sample1,Sample2 \
+flux_simulation -ms ENGRO2 \
+    -in bounds/*.tsv \
+    -ni Sample1,Sample2,Sample3 \
     -a CBS \
     -ns 1000 \
-    -nb 1 \
-    -sd 42 \
-    -ot mean,median,quantiles \
-    -ota pFBA,FVA,sensitivity \
-    -idop flux_results/
+    -idop output/
 ```

-### Galaxy Interface
-
-Select "Flux Simulation" from the COBRAxy tool suite and configure sampling parameters through the web interface.
-
 ## Parameters

-### Required Parameters
-
-| Parameter | Flag | Description |
-|-----------|------|-------------|
-| Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory |
-| Input Bounds | `-in, --input` | Comma-separated list of bounds files |
-| Sample Names | `-ni, --names` | Comma-separated sample names |
-| Algorithm | `-a, --algorithm` | Sampling algorithm (CBS or OPTGP) |
-| Number of Samples | `-ns, --n_samples` | Samples per batch |
-| Number of Batches | `-nb, --n_batches` | Number of sampling batches |
-| Random Seed | `-sd, --seed` | Random seed for reproducibility |
-| Output Types | `-ot, --output_type` | Flux statistics to compute |
-
-### Model Parameters
-
 | Parameter | Flag | Description | Default |
 |-----------|------|-------------|---------|
-| Model Selector | `-ms, --model_selector` | Built-in model (ENGRO2, Custom) | ENGRO2 |
-| Custom Model | `-mo, --model` | Path to custom SBML model | - |
-| Model Name | `-mn, --model_name` | Custom model filename | - |
-
-### Sampling Parameters
-
-| Parameter | Flag | Description | Default |
-|-----------|------|-------------|---------|
-| Algorithm | `-a, --algorithm` | CBS or OPTGP | - |
-| Thinning | `-th, --thinning` | OPTGP thinning parameter | 100 |
-| Samples | `-ns, --n_samples` | Samples per batch | - |
-| Batches | `-nb, --n_batches` | Number of batches | - |
-| Seed | `-sd, --seed` | Random seed | - |
-
-### Output Parameters
-
-| Parameter | Flag | Description | Options |
-|-----------|------|-------------|---------|
-| Output Types | `-ot, --output_type` | Flux statistics | mean,median,quantiles,fluxes |
-| Analysis Types | `-ota, --output_type_analysis` | Additional analyses | pFBA,FVA,sensitivity |
-| Output Path | `-idop, --output_path` | Results directory | flux_simulation/ |
-| Output Log | `-ol, --out_log` | Log file path | - |
+| Model Selector | `-ms` | ENGRO2, Recon, or Custom | ENGRO2 |
+| Input Format | `--model_and_bounds` | Separate files (true) or complete models (false) | true |
+| Input Bounds | `-in` | Bounds files | - |
+| Name Input | `-ni` | Sample names (comma-separated) | - |
+| Algorithm | `-a` | CBS or OPTGP | CBS |
+| Num Samples | `-ns` | Number of samples per batch | 1000 |
+| Num Batches | `-nb` | Number of batches | 1 |
+| Thinning | `-th` | OPTGP thinning parameter | 100 |
+| Output Type | `-ot` | mean, median, quantiles, fluxes | mean,median |
+| FVA Optimality | `--perc_opt` | Optimality fraction (0.0-1.0) | 0.90 |
+| Output Path | `-idop` | Output directory | flux_simulation/ |

 ## Algorithms

-### CBS (Constraint-Based Sampling)
-
-**Method**: Random objective function optimization
-- Generates random linear combinations of reactions
-- Optimizes using LP solver (GLPK preferred, COBRApy fallback)
-- Fast and memory-efficient
+### CBS (Corner-Based Sampling)
+- Random objective optimization
+- Requires GLPK (recommended) or COBRApy solver
 - Suitable for large models

-**Advantages**:
-- High performance with GLPK
-- Good coverage of solution space
-- Robust to model size
+### OPTGP (MCMC Sampling)
+- Markov Chain Monte Carlo
+- Uniform sampling guarantee
+- Requires thinning parameter

-### OPTGP (Optimal Growth Perturbation)
+## Input Modes
+
+The tool supports two different input formats:

-**Method**: MCMC-based sampling
-- Markov Chain Monte Carlo with growth optimization
-- Requires thinning to reduce autocorrelation
-- More computationally intensive
-- Better theoretical guarantees
+### Mode 1: Model + Bounds (default, `--model_and_bounds true`)
+Upload one base model + multiple bound files (one per sample/context):
+- Base model: Tabular file with reaction structure (from Import Metabolic Model)
+- Bounds: Individual TSV files with sample-specific constraints (from RAS to Bounds)
+- Use when you have RAS-derived bounds for multiple samples

-**Advantages**:
-- Uniform sampling guarantee
-- Well-established method
-- Good for smaller models
+### Mode 2: Multiple Complete Models (`--model_and_bounds false`)
+Upload pre-built model files, each already containing integrated bounds:
+- Each file is a complete tabular model with reaction structure + bounds
+- Use when models are already prepared with specific constraints
+- Useful for comparing different modelling scenarios

-## Input Formats
+## Input Format

-### Bounds Files
-
-Tab-separated format with reaction bounds:
+Bounds files (TSV):

 ```
-Reaction	lower_bound	upper_bound
-R00001	-1000.0	1250.5
-R00002	-650.2	1000.0
-R00003	0.0	2150.8
+reaction	lower_bound	upper_bound
+R00001	-125.0	125.0
+R00002	-65.0	65.0
 ```

-Multiple bounds files can be processed simultaneously by providing comma-separated paths.
-
-### Custom Model File (Optional)
+**File Format Notes:**
+- Use **tab-separated** values (TSV)
+- Column headers must be: reaction, lower_bound, upper_bound
+- Reaction IDs must match model reaction IDs
+- Numeric values for bounds

-SBML format metabolic model compatible with COBRApy.
+## Sampling Outputs

-## Output Formats
-
-### Flux Statistics
+The tool can generate different types of output from flux sampling:

-#### Mean Fluxes (`mean.csv`)
-```
-Reaction	Sample1	Sample2	Sample3
-R00001	15.23	-8.45	22.1
-R00002	0.0	12.67	-5.3
-R00003	45.8	38.2	51.7
-```
+| Output Type | Description |
+|-------------|-------------|
+| **mean** | Mean flux across all samples |
+| **median** | Median flux across all samples |
+| **quantiles** | 25th, 50th, 75th percentiles |
+| **fluxes** | Complete flux distributions (all samples, all reactions) |

-#### Median Fluxes (`median.csv`)
-```
-Reaction	Sample1	Sample2	Sample3
-R00001	14.1	-7.8	21.5
-R00002	0.0	11.9	-4.8
-R00003	44.2	37.1	50.3
-```
+**Note**: The `fluxes` output can be very large for many samples. Use summary statistics (mean/median/quantiles) unless you need the complete distribution.
+
+## Optimization Methods
+
+As an alternative to sampling, the tool can perform optimization analyses:

-#### Quantiles (`quantiles.csv`)
-```
-Reaction	Sample1_q1	Sample1_q2	Sample1_q3	Sample2_q1	...
-R00001	10.5	14.1	18.7	-12.3	...
-R00002	-2.1	0.0	1.8	8.9	...
-R00003	38.9	44.2	49.8	32.1	...
-```
+| Method | Description | Output |
+|--------|-------------|--------|
+| **FVA** | Flux Variability Analysis | Min/max flux ranges for each reaction |
+| **pFBA** | Parsimonious FBA | Flux distribution with minimal total flux |
+| **sensitivity** | Reaction knockout analysis | Biomass impact of single reaction deletions |

-### Additional Analyses
+### FVA Optimality Fraction

-#### pFBA (`pFBA.csv`)
-Parsimonious Flux Balance Analysis results:
-```
-Reaction	Sample1	Sample2	Sample3
-R00001	12.5	-6.7	19.3
-R00002	0.0	8.9	-3.2
-R00003	41.2	35.8	47.9
-```
+The `--perc_opt` parameter (default: 0.90) controls the optimality constraint for FVA:
+- **1.0**: Only optimal solutions (100% of maximum biomass)
+- **0.90**: Allow suboptimal solutions (≥90% of maximum biomass)
+- **Lower values**: Explore broader flux ranges
+
+## Output

-#### FVA (`FVA.csv`)
-Flux Variability Analysis bounds:
-```
-Reaction	Sample1_min	Sample1_max	Sample2_min	Sample2_max	...
-R00001	-5.2	35.8	-25.3	8.7	...
-R00002	-8.9	8.9	0.0	28.4	...
-R00003	15.6	78.3	10.2	65.9	...
-```
-
-#### Sensitivity (`sensitivity.csv`)
-Single reaction deletion effects:
-```
-Reaction	Sample1	Sample2	Sample3
-R00001	0.98	0.95	0.97
-R00002	1.0	0.87	1.0
-R00003	0.23	0.19	0.31
-```
+- `mean.csv`: Mean flux values
+- `median.csv`: Median flux values
+- `quantiles.csv`: Flux quantiles (25%, 50%, 75%)
+- `fluxes/`: Complete flux distributions (if requested)
+- `fva.csv`: FVA results (if requested)
+- `pfba.csv`: pFBA results (if requested)
+- `sensitivity.csv`: Knockout sensitivity analysis (if requested)
+- `*.log`: Processing log

 ## Examples

 ### Basic CBS Sampling

 ```bash
-# Simple CBS sampling with statistics
-flux_simulation -td /opt/COBRAxy \
-    -ms ENGRO2 \
-    -in sample1_bounds.tsv,sample2_bounds.tsv \
+flux_simulation -ms ENGRO2 \
+    -in bounds/*.tsv \
     -ni Sample1,Sample2 \
     -a CBS \
-    -ns 500 \
-    -nb 2 \
-    -sd 42 \
-    -ot mean,median \
-    -ota pFBA \
-    -idop cbs_results/
-```
-
-### Comprehensive OPTGP Analysis
-
-```bash
-# Full analysis with OPTGP
-flux_simulation -td /opt/COBRAxy \
-    -ms ENGRO2 \
-    -in bounds/*.tsv \
-    -ni Sample1,Sample2,Sample3,Control1,Control2 \
-    -a OPTGP \
-    -th 200 \
-    -ns 1000 \
-    -nb 1 \
-    -sd 123 \
-    -ot mean,median,quantiles,fluxes \
-    -ota pFBA,FVA,sensitivity \
-    -idop comprehensive_analysis/ \
-    -ol sampling.log
-```
-
-### Custom Model Sampling
-
-```bash
-# Use custom model with CBS
-flux_simulation -td /opt/COBRAxy \
-    -ms Custom \
-    -mo models/tissue_specific.xml \
-    -mn tissue_specific.xml \
-    -in patient_bounds.tsv \
-    -ni PatientA \
-    -a CBS \
-    -ns 2000 \
-    -nb 5 \
-    -sd 456 \
-    -ot mean,quantiles \
-    -ota FVA,sensitivity \
-    -idop patient_analysis/
-```
-
-### Batch Processing Multiple Conditions
-
-```bash
-# Process multiple experimental conditions
-flux_simulation -td /opt/COBRAxy \
-    -ms ENGRO2 \
-    -in ctrl1.tsv,ctrl2.tsv,treat1.tsv,treat2.tsv \
-    -ni Control1,Control2,Treatment1,Treatment2 \
-    -a CBS \
-    -ns 800 \
-    -nb 3 \
-    -sd 789 \
-    -ot mean,median,fluxes \
-    -ota pFBA,FVA \
-    -idop batch_conditions/
+    -ns 1000 \
+    -idop output/
 ```

-## Algorithm Selection Guide
-
-### Choose CBS When:
-- Large models (>1000 reactions)
-- High sample throughput required
-- GLPK solver available
-- Memory constraints present
-
-### Choose OPTGP When:
-- Theoretical sampling guarantees needed
-- Smaller models (<500 reactions)
-- Sufficient computational resources
-- Publication-quality sampling required
-
-## Performance Optimization
-
-### CBS Optimization
-- Install GLPK and swiglpk for maximum performance
-- Increase batch number rather than samples per batch
-- Monitor memory usage for large models
-
-### OPTGP Optimization
-- Adjust thinning based on model size (100-500)
-- Use parallel processing when available
-- Consider warmup period for chain convergence
-
-### General Tips
-- Use appropriate sample sizes (500-2000 per condition)
-- Balance batches vs samples for memory management
-- Set consistent random seeds for reproducibility
-
-## Quality Control
-
-### Convergence Assessment
-- Compare statistics across batches
-- Check for systematic trends in sampling
-- Validate against known flux ranges
-
-### Statistical Validation
-- Ensure adequate sample sizes (n≥100 recommended)
-- Check for outliers and artifacts
-- Validate against experimental flux data when available
-
-### Output Verification
-- Confirm mass balance constraints satisfied
-- Check thermodynamic consistency
-- Verify biological plausibility of results
-
-## Integration Workflow
-
-### Upstream Tools
-- [RAS to Bounds](ras-to-bounds.md) - Generate constrained bounds from RAS
-- [Import Metabolic Model](import-metabolic-model.md) - Extract model components
-
-### Downstream Tools
-- [Flux to Map](flux-to-map.md) - Visualize flux distributions on metabolic maps
-- [MAREA](marea.md) - Statistical analysis of flux differences
-
-### Typical Pipeline
+### OPTGP Sampling

 ```bash
-# 1. Generate sample-specific bounds
-ras_to_bounds -td /opt/COBRAxy -ms ENGRO2 -ir ras.tsv -idop bounds/
+flux_simulation -ms ENGRO2 \
+    -in bounds/*.tsv \
+    -ni Sample1,Sample2 \
+    -a OPTGP \
+    -ns 1000 \
+    -th 200 \
+    -idop output/
+```

-# 2. Sample fluxes from constrained models
-flux_simulation -td /opt/COBRAxy -ms ENGRO2 -in bounds/*.tsv \
-    -ni Sample1,Sample2,Sample3 -a CBS -ns 1000 \
-    -ot mean,quantiles -ota pFBA,FVA -idop fluxes/
+### Custom Model with CBS Sampling

-# 3. Visualize results on metabolic maps
-flux_to_map -td /opt/COBRAxy -input_data_fluxes fluxes/mean.csv \
-    -choice_map ENGRO2 -idop flux_maps/
+```bash
+flux_simulation -ms Custom \
+    -mo custom_model.xml \
+    -in bounds/*.tsv \
+    -ni Sample1 \
+    -a CBS \
+    -ns 2000 \
+    -idop output/
 ```

 ## Troubleshooting

-### Common Issues
-
-**CBS sampling fails**
-- GLPK installation issues → Install GLPK and swiglpk
-- Model infeasibility → Check bounds constraints
-- Memory errors → Reduce samples per batch
-
-**OPTGP convergence problems**
-- Poor mixing → Increase thinning parameter
-- Slow convergence → Extend sampling time
-- Chain stuck → Check model feasibility
-
-**Output files missing**
-- Insufficient disk space → Check available storage
-- Permission errors → Verify write permissions
-- Invalid sample names → Check naming conventions
-
-### Error Messages
-
-| Error | Cause | Solution |
-|-------|-------|----------|
-| "GLPK solver failed" | Missing GLPK/swiglpk | Install GLPK libraries |
-| "Model infeasible" | Over-constrained bounds | Relax constraints or check model |
-| "Sampling timeout" | Insufficient time/resources | Reduce sample size or increase resources |
-
-### Performance Issues
-
-**Slow sampling**
-- Use CBS instead of OPTGP for speed
-- Reduce model size if possible
-- Increase system resources
-
-**Memory errors**
-- Lower samples per batch
-- Process samples sequentially
-- Use more efficient data formats
-
-**Disk space issues**
-- Monitor output file sizes
-- Clean intermediate files
-- Use compressed formats when possible
-
-## Advanced Usage
-
-### Custom Sampling Parameters
-
-For fine-tuning sampling behavior, advanced users can modify:
-- Objective function generation (CBS)
-- MCMC parameters (OPTGP)
-- Convergence criteria
-- Output precision and format
-
-### Parallel Processing
-
-```bash
-# Split sampling across multiple cores/nodes
-for i in {1..4}; do
-    flux_simulation -td /opt/COBRAxy -ms ENGRO2 \
-        -in subset_${i}_bounds.tsv \
-        -ni Batch${i} -a CBS -ns 250 \
-        -sd $((42 + i)) -idop batch_${i}/ &
-done
-wait
-```
-
-### Result Aggregation
-
-Combine results from multiple simulation runs:
-
-```bash
-# Merge statistics files
-python merge_flux_results.py -i batch_*/mean.csv -o combined_mean.csv
-```
+| Error | Solution |
+|-------|----------|
+| "GLPK solver failed" | Install GLPK libraries |
+| "Model infeasible" | Check bounds constraints |

 ## See Also

-- [RAS to Bounds](ras-to-bounds.md) - Generate input constraints
-- [Flux to Map](flux-to-map.md) - Visualize flux results
-- [CBS Algorithm Documentation](/tutorials/cbs-algorithm.md)
-- [OPTGP Algorithm Documentation](/tutorials/optgp-algorithm.md)
\ No newline at end of file
+- [RAS to Bounds](tools/ras-to-bounds)
+- [Flux to Map](tools/flux-to-map)
+- [Built-in Models](reference/built-in-models)
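The CBS entry above boils down to "optimize a random linear objective, then repeat". A conceptual COBRApy sketch of a single CBS-style draw — not the tool's actual implementation, and `model.xml` is a placeholder path:

```python
import numpy as np
import cobra

model = cobra.io.read_sbml_model("model.xml")  # placeholder path
rng = np.random.default_rng(42)

# One CBS-style sample: maximize a random linear combination of fluxes
coefs = rng.normal(size=len(model.reactions))
model.objective = model.problem.Objective(
    sum(c * r.flux_expression for c, r in zip(coefs, model.reactions)),
    direction="max",
)
solution = model.optimize()
print("status:", solution.status)
print(solution.fluxes.head())  # one flux vector; the tool repeats this -ns times
```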
--- a/COBRAxy/docs/tools/flux-to-map.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/tools/flux-to-map.md	Tue Oct 28 10:44:07 2025 +0000
@@ -1,467 +1,125 @@
 # Flux to Map

-Visualize metabolic flux data on pathway maps with statistical analysis and color coding.
+Visualize flux distributions on metabolic pathway maps.

 ## Overview

-Flux to Map performs statistical analysis on flux distribution data and generates color-coded metabolic pathway maps. It compares flux values between sample groups and highlights significantly different reactions with appropriate colors and line weights.
+This tool analyzes statistical differences in reaction fluxes between groups of samples, as returned by the Flux Simulation tool. The results can be visualized on an SVG metabolic map.
+
+## Galaxy Interface
+
+In Galaxy: **COBRAxy → Metabolic Flux Enrichment Analysis**

-## Usage
+1. Upload flux data and sample class file
+2. Select the map and configure the comparison type
+3. Click **Run tool**

-### Command Line
+## Command Line Usage

 ```bash
-flux_to_map -td /path/to/COBRAxy \
-    -input_data_fluxes flux_data.tsv \
-    -input_class_fluxes sample_groups.tsv \
-    -comparison manyvsmany \
-    -test ks \
-    -pv 0.05 \
-    -fc 1.5 \
+flux_to_map -input_data fluxes.csv \
+    -input_class classes.csv \
     -choice_map ENGRO2 \
-    -generate_svg true \
-    -generate_pdf true \
-    -idop flux_maps/
+    -comparison manyvsmany \
+    -pvalue 0.05 \
+    -idop output/
 ```

-### Galaxy Interface
-
-Select "Flux to Map" from the COBRAxy tool suite and configure flux analysis and visualization parameters.
-
 ## Parameters

-### Required Parameters
-
-| Parameter | Flag | Description |
-|-----------|------|-------------|
-| Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory |
-
-### Data Input Parameters
-
 | Parameter | Flag | Description | Default |
 |-----------|------|-------------|---------|
-| Flux Data | `-idf, --input_data_fluxes` | Flux values TSV file | - |
-| Flux Classes | `-icf, --input_class_fluxes` | Sample group labels for fluxes | - |
-| Multiple Flux Files | `-idsf, --input_datas_fluxes` | Multiple flux datasets (space-separated) | - |
-| Flux Names | `-naf, --names_fluxes` | Names for multiple flux datasets | - |
-| Analysis Option | `-op, --option` | Analysis mode (datasets or dataset_class) | - |
-
-### Statistical Parameters
-
-| Parameter | Flag | Description | Default |
-|-----------|------|-------------|---------|
-| Comparison Type | `-co, --comparison` | Statistical comparison mode | manyvsmany |
-| Statistical Test | `-te, --test` | Statistical test method | ks |
-| P-Value Threshold | `-pv, --pValue` | Significance threshold | 0.1 |
-| Adjusted P-values | `-adj, --adjusted` | Apply FDR correction | false |
-| Fold Change | `-fc, --fChange` | Minimum fold change threshold | 1.5 |
-
-### Visualization Parameters
-
-| Parameter | Flag | Description | Default |
-|-----------|------|-------------|---------|
-| Map Choice | `-mc, --choice_map` | Built-in metabolic map | HMRcore |
-| Custom Map | `-cm, --custom_map` | Path to custom SVG map | - |
-| Generate SVG | `-gs, --generate_svg` | Create SVG output | true |
-| Generate PDF | `-gp, --generate_pdf` | Create PDF output | true |
-| Color Map | `-colorm, --color_map` | Color scheme (jet, viridis) | - |
-| Output Directory | `-idop, --output_path` | Results directory | result/ |
-
-### Advanced Parameters
-
-| Parameter | Flag | Description | Default |
-|-----------|------|-------------|---------|
-| Output Log | `-ol, --out_log` | Log file path | - |
-| Control Sample | `-on, --control` | Control group identifier | - |
+| Input Data | `-input_data` | Flux data file | - |
+| Input Class | `-input_class` | Sample class definitions | - |
+| Map Choice | `-choice_map` | ENGRO2, Recon, or Custom | ENGRO2 |
+| Custom Map | `-custom_map` | Path to custom SVG map | - |
+| Comparison | `-comparison` | manyvsmany, onevsrest, onevsmany | manyvsmany |
+| P-value | `-pvalue` | Significance threshold | 0.05 |
+| FDR Correction | `-fdr` | Apply FDR correction | true |
+| Test Type | `-test_type` | t, wilcoxon, ks | t |
+| Color Map | `--color_map` | Color scheme: viridis or jet | viridis |
+| Output Path | `-idop` | Output directory | flux_to_map/ |

 ## Input Formats

-### Flux Data File
-
-Tab-separated format with reactions as rows and samples as columns:
+### Flux Data

 ```
-Reaction	Sample1	Sample2	Sample3	Control1	Control2
-R00001	15.23	-8.45	22.1	12.8	14.2
-R00002	0.0	12.67	-5.3	8.9	7.4
-R00003	45.8	38.2	51.7	42.1	39.8
-R00004	-12.4	-15.8	-9.2	-11.5	-13.1
+Reaction	Sample1	Sample2	Sample3
+R00001	12.5	8.5	14.2
+R00002	-6.5	13.5	7.2
 ```

-### Sample Class File
-
-Group assignment for statistical comparisons:
+### Sample Classes

 ```
-Sample	Class
-Sample1	Treatment
-Sample2	Treatment
+SampleID	Class
+Sample1	Control
+Sample2	Treatment
 Sample3	Treatment
-Control1	Control
-Control2	Control
-```
-
-### Multiple Dataset Format
-
-When using multiple flux files, provide space-separated paths and corresponding names:
-
-```bash
--idsf "dataset1_flux.tsv dataset2_flux.tsv dataset3_flux.tsv"
--naf "Condition_A Condition_B Condition_C"
 ```

-## Statistical Analysis
-
-### Comparison Types
-
-#### manyvsmany
-Compare all possible group pairs:
-- Treatment vs Control
-- Condition_A vs Condition_B
-- Condition_A vs Condition_C
-- Condition_B vs Condition_C
-
-#### onevsrest
-Compare each group against all others combined:
-- Treatment vs (Control + Other)
-- Control vs (Treatment + Other)
-
-#### onevsmany
-Compare one reference group against each other group:
-- Control vs Treatment
-- Control vs Condition_A
-- Control vs Condition_B
+**Note on Metabolic Map**
+We provide a default SVG map for the ENGRO2 model. If another model is used, we suggest uploading a custom SVG map.

-### Statistical Tests
+**File Format Notes:**
+- Use **tab-separated** values (TSV) or **comma-separated** (CSV)
+- First row must contain column headers
+- Sample names must match between flux data and class file
+- Class names should not contain spaces

-| Test | Description | Best For |
-|------|-------------|----------|
-| `ks` | Kolmogorov-Smirnov | Non-parametric, distribution-free |
-| `ttest_p` | Paired t-test | Related samples, normal distributions |
-| `ttest_ind` | Independent t-test | Independent samples, normal distributions |
-| `wilcoxon` | Wilcoxon signed-rank | Non-parametric paired comparisons |
-| `mw` | Mann-Whitney U | Non-parametric independent comparisons |
+## Statistical Tests

-### Significance Assessment
-
-Reactions are considered significant when:
-1. **P-value** ≤ specified threshold (default: 0.1)
-2. **Fold change** ≥ specified threshold (default: 1.5)
-3. **FDR correction** (if enabled) maintains significance
-
-## Map Visualization
-
-### Built-in Maps
-
-#### HMRcore (Default)
-- **Scope**: Core human metabolic network
-- **Reactions**: ~300 essential reactions
-- **Coverage**: Central carbon, amino acid, lipid metabolism
-- **Use Case**: General overview, publication figures
+- **t**: Student's t-test (parametric, assumes normality)
+- **wilcoxon**: Wilcoxon/Mann-Whitney (non-parametric)
+- **ks**: Kolmogorov-Smirnov (distribution-free)

-#### ENGRO2
-- **Scope**: Extended human genome-scale reconstruction
-- **Reactions**: ~2,000 reactions
-- **Coverage**: Comprehensive metabolic network
-- **Use Case**: Detailed analysis, specialized tissues
-
-#### Custom Maps
-User-provided SVG files with reaction elements:
-```xml
-<rect id="R00001" class="reaction" fill="gray" stroke="black"/>
-<path id="R00002" class="reaction" fill="gray" stroke="black"/>
-```
+## Comparison Types

-### Color Coding Scheme
-
-#### Significance Colors
-- **Red Gradient**: Significantly upregulated (positive fold change)
-- **Blue Gradient**: Significantly downregulated (negative fold change)
-- **Gray**: Not statistically significant
-- **White**: No data available
-
-#### Visual Elements
-- **Line Width**: Proportional to fold change magnitude
-- **Color Intensity**: Proportional to statistical significance (-log10 p-value)
-- **Transparency**: Indicates confidence level
+- **manyvsmany**: All pairwise class comparisons
+- **onevsrest**: Each class vs all others
+- **onevsmany**: One reference vs multiple classes

-### Color Maps
-
-#### Jet (Default)
-- High contrast color transitions
-- Blue (low) → Green → Yellow → Red (high)
-- Good for identifying extreme values
-
-#### Viridis
-- Perceptually uniform color scale
-- Colorblind-friendly
-- Purple (low) → Blue → Green → Yellow (high)
-
-## Output Files
+## Output

-### Statistical Results
-- `flux_statistics.tsv`: P-values, fold changes, test statistics for all reactions
-- `significant_fluxes.tsv`: Only reactions meeting significance criteria
-- `comparison_summary.txt`: Analysis parameters and summary statistics
+- `*_map.svg`: Annotated pathway maps
+- `comparison_results.tsv`: Statistical results
+- `*.log`: Processing log

-### Visualizations
-- `flux_map.svg`: Interactive color-coded pathway map
-- `flux_map.pdf`: High-resolution PDF (if requested)
-- `flux_map.png`: Raster image (if requested)
-- `legend.svg`: Color scale and statistical significance legend
-
-### Analysis Files
-- `fold_changes.tsv`: Detailed fold change calculations
-- `group_statistics.tsv`: Per-group summary statistics
-- `comparison_matrix.tsv`: Pairwise comparison results
-
 ## Examples

 ### Basic Flux Comparison

 ```bash
-# Compare treatment vs control fluxes
-flux_to_map -td /opt/COBRAxy \
-    -idf treatment_vs_control_fluxes.tsv \
-    -icf sample_groups.tsv \
-    -co manyvsmany \
-    -te ks \
-    -pv 0.05 \
-    -fc 2.0 \
-    -mc HMRcore \
-    -gs true \
-    -gp true \
-    -idop flux_comparison/
-```
-
-### Multiple Condition Analysis
-
-```bash
-# Compare multiple experimental conditions
-flux_to_map -td /opt/COBRAxy \
-    -idsf "cond1_flux.tsv cond2_flux.tsv cond3_flux.tsv" \
-    -naf "Control Treatment1 Treatment2" \
-    -co onevsrest \
-    -te wilcoxon \
-    -adj true \
-    -pv 0.01 \
-    -fc 1.8 \
-    -mc ENGRO2 \
-    -colorm viridis \
-    -idop multi_condition_flux/
-```
-
-### Custom Map Visualization
-
-```bash
-# Use tissue-specific custom map
-flux_to_map -td /opt/COBRAxy \
-    -idf liver_flux_data.tsv \
-    -icf
liver_conditions.tsv \ - -co manyvsmany \ - -te ttest_ind \ - -pv 0.05 \ - -fc 1.5 \ - -cm maps/liver_specific_map.svg \ - -gs true \ - -gp true \ - -idop liver_flux_analysis/ \ - -ol liver_analysis.log -``` - -### High-Throughput Analysis - -```bash -# Process multiple datasets with stringent criteria -flux_to_map -td /opt/COBRAxy \ - -idsf "exp1.tsv exp2.tsv exp3.tsv exp4.tsv" \ - -naf "Exp1 Exp2 Exp3 Exp4" \ - -co manyvsmany \ - -te ks \ - -adj true \ - -pv 0.001 \ - -fc 3.0 \ - -mc HMRcore \ - -colorm jet \ - -gs true \ - -gp true \ - -idop high_throughput_flux/ +flux_to_map -input_data fluxes.csv \ + -input_class classes.csv \ + -choice_map ENGRO2 \ + -comparison manyvsmany \ + -pvalue 0.05 \ + -idop results/ ``` -## Quality Control - -### Data Validation - -#### Pre-analysis Checks -- Verify flux value distributions (check for outliers) -- Ensure sample names match between data and class files -- Validate reaction coverage across samples -- Check for missing values and their patterns - -#### Statistical Validation -- Assess normality assumptions for parametric tests -- Verify adequate sample sizes per group (n≥3 recommended) -- Check variance homogeneity between groups -- Evaluate multiple testing burden - -### Result Interpretation - -#### Biological Validation -- Compare results with known pathway activities -- Check for pathway coherence (related reactions should cluster) -- Validate against literature or experimental evidence -- Assess metabolic network connectivity - -#### Technical Validation -- Compare results across different statistical tests -- Check sensitivity to parameter changes -- Validate fold change calculations -- Verify map element correspondence - -## Tips and Best Practices - -### Data Preparation -- **Normalization**: Ensure consistent flux units across samples -- **Filtering**: Remove reactions with excessive missing values (>50%) -- **Outlier Detection**: Identify and handle extreme flux values -- **Batch Effects**: Account for technical variation between experiments - -### Statistical Considerations -- Use FDR correction for multiple comparisons (`-adj true`) -- Choose appropriate statistical tests based on data distribution -- Consider effect size (fold change) alongside significance -- Validate results with independent datasets when possible - -### Visualization Optimization -- Select appropriate color maps for your audience -- Use high fold change thresholds (>2.0) for cleaner maps -- Export both SVG (editable) and PDF (publication) formats -- Include comprehensive legends and annotations - -### Performance Tips -- Use HMRcore for faster processing and clearer visualizations -- Reduce dataset size for initial exploratory analysis -- Process large datasets in batches if memory constrained -- Cache intermediate results for parameter optimization - -## Integration Workflow - -### Upstream Tools -- [Flux Simulation](flux-simulation.md) - Generate flux distributions for comparison -- [MAREA](marea.md) - Alternative analysis pathway for RAS/RPS data - -### Downstream Analysis -- Export results to statistical software (R, Python) for advanced analysis -- Integrate with pathway databases (KEGG, Reactome) -- Combine with other omics data for systems-level insights - -### Typical Pipeline +### With Custom Map ```bash -# 1. Generate flux samples from constrained models -flux_simulation -td /opt/COBRAxy -ms ENGRO2 -in bounds/*.tsv \ - -ni Sample1,Sample2,Control1,Control2 -a CBS \ - -ot mean -idop fluxes/ - -# 2. 
Analyze and visualize flux differences -flux_to_map -td /opt/COBRAxy -idf fluxes/mean.csv \ - -icf sample_groups.tsv -co manyvsmany -te ks \ - -mc HMRcore -gs true -gp true -idop flux_maps/ - -# 3. Further analysis with custom scripts -python analyze_flux_results.py -i flux_maps/ -o final_results/ +flux_to_map -input_data fluxes.csv \ + -input_class classes.csv \ + -choice_map Custom \ + -custom_map pathway.svg \ + -comparison onevsrest \ + -test_type wilcoxon \ + -idop results/ ``` ## Troubleshooting -### Common Issues - -**No significant reactions found** -- Lower p-value threshold (`-pv 0.2`) -- Reduce fold change requirement (`-fc 1.2`) -- Check sample group definitions and sizes -- Verify flux data quality and normalization - -**Map rendering problems** -- Check SVG map file integrity and format -- Verify reaction ID matching between data and map -- Ensure sufficient system memory for large maps -- Validate XML structure of custom maps - -**Statistical test failures** -- Check data distribution assumptions -- Verify sufficient sample sizes per group -- Consider alternative non-parametric tests -- Examine variance patterns between groups - -### Error Messages - -| Error | Cause | Solution | -|-------|-------|----------| -| "Map file not found" | Missing/invalid map path | Check file location and format | -| "No matching reactions" | ID mismatch between data and map | Verify reaction naming consistency | -| "Insufficient data" | Too few samples per group | Increase sample sizes or merge groups | -| "Memory allocation failed" | Large dataset/map combination | Reduce data size or increase system memory | - -### Performance Issues - -**Slow processing** -- Use HMRcore instead of ENGRO2 for faster rendering -- Reduce dataset size for testing -- Process subsets of reactions separately -- Monitor system resource usage - -**Large output files** -- Use compressed formats when possible -- Reduce map resolution for preliminary analysis -- Export only essential output formats -- Clean temporary files regularly - -## Advanced Usage - -### Custom Statistical Functions - -Advanced users can implement custom statistical tests by modifying the analysis functions: - -```python -def custom_test(group1, group2): - # Custom statistical test implementation - statistic, pvalue = your_test_function(group1, group2) - return statistic, pvalue -``` - -### Batch Processing Script - -Process multiple experiments systematically: - -```bash -#!/bin/bash -experiments=("exp1" "exp2" "exp3" "exp4") -for exp in "${experiments[@]}"; do - flux_to_map -td /opt/COBRAxy \ - -idf "data/${exp}_flux.tsv" \ - -icf "data/${exp}_classes.tsv" \ - -co manyvsmany -te ks -pv 0.05 \ - -mc HMRcore -gs true -gp true \ - -idop "results/${exp}/" -done -``` - -### Result Aggregation - -Combine results across multiple analyses: - -```bash -# Merge significant reactions across experiments -python merge_flux_results.py \ - -i results/exp*/significant_fluxes.tsv \ - -o combined_significant_reactions.tsv \ - --method intersection -``` +| Error | Solution | +|-------|----------| +| "No matching reactions" | Verify reaction ID consistency | +| "Insufficient data" | Increase sample sizes | ## See Also -- [Flux Simulation](flux-simulation.md) - Generate input flux distributions -- [MAREA](marea.md) - Alternative pathway analysis approach -- [Custom Map Creation Guide](/tutorials/custom-map-creation.md) -- [Statistical Methods Reference](/tutorials/statistical-methods.md) \ No newline at end of file +- [MAREA](tools/marea) +- [Flux 
Simulation](tools/flux-simulation) +- [Built-in Models](reference/built-in-models)
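
Before running the comparisons above, it can help to confirm that the flux table and the class file agree on sample names, since mismatches are a common cause of empty results. The following is a minimal pre-flight check, not part of the tool itself; the file names follow the examples above, and `sep="\t"` assumes tab-separated input as described in the format notes:

```python
# Check that flux-table columns and class-file SampleIDs match before calling flux_to_map.
import pandas as pd

fluxes = pd.read_csv("fluxes.csv", sep="\t", index_col=0)   # reactions x samples
classes = pd.read_csv("classes.csv", sep="\t")              # columns: SampleID, Class

flux_samples = set(fluxes.columns)
class_samples = set(classes["SampleID"])

unmatched = flux_samples.symmetric_difference(class_samples)
if unmatched:
    print("Unmatched sample names:", sorted(unmatched))
else:
    # Group sizes matter too: each class needs enough samples for the chosen test.
    print(classes["Class"].value_counts())
```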
--- a/COBRAxy/docs/tools/import-metabolic-model.md	Mon Oct 27 12:33:08 2025 +0000
+++ b/COBRAxy/docs/tools/import-metabolic-model.md	Tue Oct 28 10:44:07 2025 +0000
@@ -1,387 +1,142 @@
 # Import Metabolic Model
 
-Import and extract metabolic model components into tabular format for analysis and integration.
+Import and extract metabolic model components into tabular format.
 
 ## Overview
 
-Import Metabolic Model (importMetabolicModel) imports metabolic models from various formats (SBML, JSON, MAT, YAML) and extracts key components into comprehensive tabular summaries. This tool processes built-in or custom models, applies medium constraints, handles gene nomenclature conversion, and outputs structured data for downstream analysis.
+Import Metabolic Model loads metabolic models in SBML/JSON/MAT/YAML format and extracts them into a tabular summary for analysis.
+
+**Input**: Model file or built-in models
+**Output**: Tabular data (CSV/TSV)
+
+## Galaxy Interface
 
-## Usage
+In Galaxy: **COBRAxy → Import Metabolic Model**
 
-### Command Line
+1. Select built-in model or upload custom file
+2. Set model name and medium configuration
+3. Click **Run tool**
+
+## Command-line console
 
 ```bash
-importMetabolicModel --model ENGRO2 \
-    --name ENGRO2 \
-    --medium_selector allOpen \
-    --out_tabular model_data.csv \
-    --out_log extraction.log \
-    --tool_dir /path/to/COBRAxy/src
+# Import built-in model
+importMetabolicModel \
+  --model ENGRO2 \
+  --name ENGRO2 \
+  --medium_selector allOpen \
+  --out_tabular model_data.csv \
+  --out_log extraction.log
 ```
 
-### Galaxy Interface
-
-Select "Import Metabolic Model" from the COBRAxy tool suite and configure model extraction parameters.
-
 ## Parameters
 
-### Required Parameters
+### Model Selection
 
 | Parameter | Flag | Description |
 |-----------|------|-------------|
-| Model Name | `--name` | Model identifier for output files |
-| Medium Selector | `--medium_selector` | Medium configuration option |
-| Output Tabular | `--out_tabular` | Output file path (CSV or XLSX) |
-| Output Log | `--out_log` | Log file for processing information |
-| Tool Directory | `--tool_dir` | COBRAxy installation directory |
+| Built-in Model | `--model` | ENGRO2 or Recon |
+| Custom Model | `--input` | Path to SBML/JSON/MAT/YAML file |
 
-### Model Selection Parameters
+**Note**: Use either `--model` OR `--input`.
+
+
+### Required
 
-| Parameter | Flag | Description | Default |
-|-----------|------|-------------|---------|
-| Built-in Model | `--model` | Pre-installed model (ENGRO2, Recon, HMRcore) | - |
-| Custom Model | `--input` | Path to custom SBML/JSON model file | - |
+| Parameter | Flag | Description |
+|-----------|------|-------------|
+| Model Name | `--name` | Model identifier |
+| Medium Selector | `--medium_selector` | Medium configuration (use `allOpen`) |
+| Output Tabular | `--out_tabular` | Output file (CSV/XLSX) |
+| Output Log | `--out_log` | Log file |
 
-**Note**: Provide either `--model` OR `--input`, not both.
- -### Optional Parameters +### Optional | Parameter | Flag | Description | Default | |-----------|------|-------------|---------| | Custom Medium | `--custom_medium` | CSV file with medium constraints | - | - -## Model Selection - -### Built-in Models +| Gene Format | `--gene_format` | Gene ID conversion: Default, ENSG, HGNC_ID, entrez_id | Default | -#### ENGRO2 -- **Species**: Homo sapiens -- **Scope**: Genome-scale reconstruction -- **Reactions**: ~2,000 reactions -- **Metabolites**: ~1,500 metabolites -- **Coverage**: Comprehensive human metabolism +## Built-in Models -#### Recon -- **Species**: Homo sapiens -- **Scope**: Recon3D human reconstruction -- **Reactions**: ~10,000+ reactions -- **Metabolites**: ~5,000+ metabolites -- **Coverage**: Most comprehensive human model +- **ENGRO2**: ~500 reactions (recommended) +- **Recon**: ~10,000 reactions (genome-wide) -#### HMRcore -- **Species**: Homo sapiens -- **Scope**: Core metabolic network -- **Reactions**: ~300 essential reactions -- **Metabolites**: ~200 core metabolites -- **Coverage**: Central carbon and energy metabolism +See [Built-in Models](reference/built-in-models) for details. -### Custom Models +## Supported Formats -Supported formats for custom model import: -- **SBML**: Systems Biology Markup Language (.xml, .sbml) -- **JSON**: COBRApy JSON format (.json) -- **MAT**: MATLAB format (.mat) -- **YML**: YAML format (.yml, .yaml) -- **Compressed**: All formats support .gz, .zip, .bz2 compression +- **Model formats**: SBML (.xml), JSON (.json), MAT (.mat), YAML (.yml) +- **Compression**: .zip, .gz, .bz2 (e.g., `model.xml.gz`) -## Medium Configuration - -### allOpen (Default) -- All exchange reactions unconstrained -- Maximum metabolic flexibility -- Suitable for general analysis - -### Custom Medium -Users can specify custom medium constraints by providing a CSV file with exchange reaction bounds. +Compressed files are automatically detected and extracted. 
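
For reference, the same format handling can be reproduced with COBRApy directly. The sketch below is illustrative, not the tool's internal code; the four loaders are standard `cobra.io` functions, and compressed SBML (`.gz`/`.bz2`/`.zip`) is typically handled by the SBML reader itself:

```python
# Pick a COBRApy loader based on the file extension(s).
from pathlib import Path
import cobra

def load_model(path: str) -> cobra.Model:
    suffixes = [s.lower() for s in Path(path).suffixes]  # e.g. ['.xml', '.gz']
    if ".json" in suffixes:
        return cobra.io.load_json_model(path)
    if ".mat" in suffixes:
        return cobra.io.load_matlab_model(path)
    if ".yml" in suffixes or ".yaml" in suffixes:
        return cobra.io.load_yaml_model(path)
    return cobra.io.read_sbml_model(path)  # .xml/.sbml, optionally compressed

model = load_model("model.xml.gz")
print(len(model.reactions), "reactions,", len(model.metabolites), "metabolites")
```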
## Output Format
 
-### Tabular Summary File
-
-The output contains comprehensive model information in CSV or XLSX format:
+**ENGRO2 model:**
+```
+ReactionID  Formula          GPR             lower_bound  upper_bound  ObjectiveCoefficient  Pathway_1   Pathway_2           InMedium  TranslationIssues
+R00001      A + B -> C + D   GENE1 or GENE2  -1000.0      1000.0       0.0                   Glycolysis  Central_Metabolism  FALSE
+EX_glc_e    glc_e <->        -               -1000.0      1000.0       0.0                   Exchange    Transport           TRUE
+```
 
-#### Column Structure
+**Other models (Recon):**
 ```
-Reaction_ID  GPR_Rule         Reaction_Formula  Lower_Bound  Upper_Bound  Objective_Coefficient  Medium_Member  Compartment    Subsystem
-R00001       GENE1 or GENE2   A + B -> C + D    -1000.0      1000.0       0.0                    FALSE          cytosol        Glycolysis
-R00002       GENE3 and GENE4  E <-> F           -1000.0      1000.0       0.0                    FALSE          mitochondria   TCA_Cycle
-EX_glc_e     -                glc_e <->         -1000.0      1000.0       0.0                    TRUE           extracellular  Exchange
+ReactionID  Formula          GPR             lower_bound  upper_bound  ObjectiveCoefficient  InMedium  TranslationIssues
+R00001      A + B -> C + D   GENE1 or GENE2  -1000.0      1000.0       0.0                   FALSE
+EX_glc_e    glc_e <->        -               -1000.0      1000.0       0.0                   TRUE
 ```
 
-#### Data Fields
+**File Format Notes:**
+- Output is **tab-separated** text (saved as CSV) or an Excel workbook (XLSX)
+- Contains all model information in tabular format
+- Can be edited and converted back into a model using Export Metabolic Model
+
+## Understanding Medium Composition
 
-| Field | Description | Values |
-|-------|-------------|---------|
-| Reaction_ID | Unique reaction identifier | String |
-| GPR_Rule | Gene-protein-reaction association | Logical expression |
-| Reaction_Formula | Stoichiometric equation | Metabolites with coefficients |
-| Lower_Bound | Minimum flux constraint | Numeric (typically -1000) |
-| Upper_Bound | Maximum flux constraint | Numeric (typically 1000) |
-| Objective_Coefficient | Biomass/objective weight | Numeric (0 or 1) |
-| Medium_Member | Exchange reaction flag | TRUE/FALSE |
-| Compartment | Subcellular location | String (for ENGRO2 only) |
-| Subsystem | Metabolic pathway | String |
+Exchange reactions with `InMedium = TRUE` represent nutrients in the medium:
+- **Lower bound**: Uptake rate (negative value, e.g., -10 = uptake 10 mmol/gDW/hr)
+- **Upper bound**: Secretion rate (positive value)
+
+Example:
+```
+EX_glc_e    glc_e <->    -    -10.0    1000.0    0.0    TRUE
+```
+Glucose uptake: 10 mmol/gDW/hr (lower bound = -10)
+
+More info: [COBRApy Media Documentation](https://cobrapy.readthedocs.io/en/latest/media.html)
 
 ## Examples
 
-### Extract Built-in Model Data
+### Extract Built-in Model
 
 ```bash
-# Extract ENGRO2 model with default settings
 importMetabolicModel --model ENGRO2 \
     --name ENGRO2_extraction \
     --medium_selector allOpen \
     --out_tabular ENGRO2_data.csv \
-    --out_log ENGRO2_log.txt \
-    --tool_dir /opt/COBRAxy/src
+    --out_log ENGRO2_log.txt
 ```
 
 ### Process Custom Model
 
 ```bash
-# Extract custom SBML model
-importMetabolicModel --input /data/custom_model.xml \
+importMetabolicModel --input custom_model.xml \
     --name CustomModel \
     --medium_selector allOpen \
-    --out_tabular custom_model_data.csv \
-    --out_log custom_extraction.log \
-    --tool_dir /opt/COBRAxy/src
-```
-
-### Extract Core Model for Quick Analysis
-
-```bash
-# Extract HMRcore for rapid prototyping
-importMetabolicModel --model HMRcore \
-    --name CoreModel \
-    --medium_selector allOpen \
-    --out_tabular core_reactions.csv \
-    --out_log core_log.txt \
-    --tool_dir /opt/COBRAxy/src
-```
-
-### Batch Processing Multiple Models
-
-```bash
-#!/bin/bash
-models=("ENGRO2" "HMRcore" "Recon")
-for model in "${models[@]}"; do
-    importMetabolicModel --model "$model" \
-        --name "${model}_extract" \
--medium_selector allOpen \ - --out_tabular "${model}_data.csv" \ - --out_log "${model}_log.txt" \ - --tool_dir /opt/COBRAxy/src -done + --out_tabular custom_data.csv \ + --out_log custom_log.txt ``` -## Use Cases - -### Model Comparison -Extract multiple models to compare: -- Reaction coverage across different reconstructions -- Gene-reaction associations -- Pathway representation -- Metabolite compartmentalization - -### Data Integration -Prepare model data for: -- Custom analysis pipelines -- Database integration -- Pathway annotation -- Cross-reference mapping - -### Quality Control -Validate model properties: -- Check reaction balancing -- Verify gene associations -- Assess network connectivity -- Identify missing annotations - -### Custom Analysis -Export structured data for: -- Network analysis (graph theory) -- Machine learning applications -- Statistical modeling -- Comparative genomics - -## Integration Workflow - -### Downstream Tools - -The extracted tabular data serves as input for: - -#### COBRAxy Tools -- [RAS Generator](ras-generator.md) - Use extracted GPR rules -- [RPS Generator](rps-generator.md) - Use reaction formulas -- [RAS to Bounds](ras-to-bounds.md) - Use reaction bounds -- [MAREA](marea.md) - Use reaction annotations - -#### External Analysis -- **R/Bioconductor**: Import CSV for pathway analysis -- **Python/pandas**: Load data for network analysis -- **MATLAB**: Process XLSX for modeling -- **Cytoscape**: Network visualization -- **Databases**: Populate reaction databases - -### Typical Pipeline - -```bash -# 1. Extract model components -importMetabolicModel --model ENGRO2 --name ModelData \ - --out_tabular model_components.csv \ - --tool_dir /opt/COBRAxy/src - -# 2. Use extracted data for RAS analysis -ras_generator -td /opt/COBRAxy/src -rs Custom \ - -rl model_components.csv \ - -in expression_data.tsv -ra ras_scores.tsv - -# 3. Apply constraints and sample fluxes -ras_to_bounds -td /opt/COBRAxy/src -ms Custom -mo model_components.csv \ - -ir ras_scores.tsv -idop constrained_bounds/ - -# 4. 
Visualize results -marea -td /opt/COBRAxy/src -input_data ras_scores.tsv \ - -choice_map Custom -custom_map custom.svg -idop results/ -``` - -## Quality Control - -### Pre-extraction Validation -- Verify model file integrity and format -- Check SBML compliance for custom models -- Validate gene ID formats and coverage -- Confirm medium constraint specifications - -### Post-extraction Checks -- **Completeness**: Verify all expected reactions extracted -- **Consistency**: Check stoichiometric balance -- **Annotations**: Validate gene-reaction associations -- **Formatting**: Confirm output file structure - -### Data Validation - -#### Reaction Balancing -```bash -# Check for unbalanced reactions -awk -F'\t' 'NR>1 && $3 !~ /\<->\|->/ {print $1, $3}' model_data.csv -``` - -#### Gene Coverage -```bash -# Count reactions with GPR rules -awk -F'\t' 'NR>1 && $2 != "" {count++} END {print count " reactions with GPR"}' model_data.csv -``` - -#### Exchange Reactions -```bash -# List medium components -awk -F'\t' 'NR>1 && $7 == "TRUE" {print $1}' model_data.csv -``` - -## Tips and Best Practices - -### Model Selection -- **ENGRO2**: Balanced coverage for human tissue analysis -- **HMRcore**: Fast processing for algorithm development -- **Recon**: Comprehensive analysis requiring computational resources -- **Custom**: Organism-specific or specialized models - -### Output Format Optimization -- **CSV**: Lightweight, universal compatibility -- Choose based on downstream analysis requirements - -### Performance Considerations -- Large models (Recon) may require substantial memory -- Consider batch processing for multiple extractions - ## Troubleshooting -### Common Issues - -**Model loading fails** -- Check file format and compression -- Verify SBML/JSON/MAT/YAML validity for custom models -- Ensure sufficient system memory - -**Empty output file** -- Model may contain no reactions -- Check model file integrity -- Verify tool directory configuration - -### Error Messages - -| Error | Cause | Solution | -|-------|-------|----------| -| "Model file not found" | Invalid file path | Check file location and permissions | -| "Unsupported format" | Invalid model format | Use SBML, JSON, MAT, or YAML | -| "Memory allocation error" | Insufficient system memory | Use smaller model or increase memory | - -### Performance Issues - -**Slow processing** -- Large models require more time -- Monitor system resource usage - -**Memory errors** -- Reduce model size if possible -- Process in smaller batches -- Increase available system memory - -**Output file corruption** -- Check disk space availability -- Verify file write permissions -- Monitor for system interruptions - -## Advanced Usage - -### Batch Extraction Script - -```python -#!/usr/bin/env python3 -import subprocess -import sys - -models = ['ENGRO2', 'HMRcore', 'Recon'] - -for model in models: - cmd = [ - 'importMetabolicModel', - '--model', model, - '--name', f'{model}_data', - '--medium_selector', 'allOpen', - '--out_tabular', f'{model}.csv', - '--out_log', f'{model}.log', - '--tool_dir', '/opt/COBRAxy/src' - ] - subprocess.run(cmd, check=True) -``` - -### Database Integration - -Export model data to databases: - -```sql --- Load CSV into PostgreSQL -CREATE TABLE model_reactions ( - reaction_id VARCHAR(50), - gpr_rule TEXT, - reaction_formula TEXT, - lower_bound FLOAT, - upper_bound FLOAT, - objective_coefficient FLOAT, - medium_member BOOLEAN, - compartment VARCHAR(50), - subsystem VARCHAR(100) -); - -COPY model_reactions FROM 'model_data.csv' WITH CSV HEADER; 
-``` +| Error | Solution | +|-------|----------| +| "Model file not found" | Check file path | +| "Unsupported format" | Use SBML, JSON, MAT, or YAML | ## See Also -- [Export Metabolic Model](export-metabolic-model.md) - Export tabular data to model formats -- [RAS Generator](ras-generator.md) - Use extracted GPR rules for RAS computation -- [RPS Generator](rps-generator.md) - Use reaction formulas for RPS analysis -- [Custom Model Tutorial](/tutorials/custom-model-integration.md) \ No newline at end of file +- [Export Metabolic Model](reference/export-metabolic-model) +- [RAS Generator](tools/ras-generator) +- [RPS Generator](tools/rps-generator)
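
As a quick follow-up to the medium notes above, the exported table can be inspected directly with pandas. This is a sketch under the assumptions documented above (tab-separated output; columns named `ReactionID`, `lower_bound`, `InMedium`):

```python
# List medium components and their maximum uptake rates from the exported table.
import pandas as pd

table = pd.read_csv("ENGRO2_data.csv", sep="\t")
in_medium = table["InMedium"].astype(str).str.upper() == "TRUE"  # flag may be bool or text

for _, row in table[in_medium].iterrows():
    uptake = -row["lower_bound"]  # negative lower bound = uptake
    print(f"{row['ReactionID']}: uptake up to {uptake} mmol/gDW/hr")
```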
--- a/COBRAxy/docs/tools/marea-cluster.md Mon Oct 27 12:33:08 2025 +0000 +++ b/COBRAxy/docs/tools/marea-cluster.md Tue Oct 28 10:44:07 2025 +0000 @@ -1,512 +1,74 @@ # MAREA Cluster -Perform clustering analysis on metabolic data to identify sample groups and patterns. +Cluster analysis for metabolic data (RAS/RPS scores, flux distributions). ## Overview -MAREA Cluster performs unsupervised clustering analysis on RAS, RPS, or flux data to identify natural groupings among samples. It supports multiple clustering algorithms (K-means, DBSCAN, Hierarchical) with optional data scaling and validation metrics including elbow plots and silhouette analysis. +MAREA Cluster performs unsupervised clustering on metabolic data using K-means, DBSCAN, or hierarchical algorithms. + +## Galaxy Interface + +In Galaxy: **COBRAxy → Cluster Analysis** -## Usage +1. Upload metabolic data file +2. Select clustering algorithm and parameters +3. Click **Run tool** -### Command Line +## Command-line console ```bash -marea_cluster -td /path/to/COBRAxy \ - -in metabolic_data.tsv \ - -cy kmeans \ - -sc true \ - -k1 2 \ - -k2 8 \ - -el true \ - -si true \ - -idop clustering_results/ \ - -ol cluster.log -``` - -### Galaxy Interface - -Select "MAREA Cluster" from the COBRAxy tool suite and configure clustering parameters through the web interface. - -## Parameters - -### Required Parameters - -| Parameter | Flag | Description | -|-----------|------|-------------| -| Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory | -| Input Data | `-in, --input` | Metabolic data file (TSV format) | - -### Clustering Parameters - -| Parameter | Flag | Description | Default | -|-----------|------|-------------|---------| -| Cluster Type | `-cy, --cluster_type` | Clustering algorithm | kmeans | -| Data Scaling | `-sc, --scaling` | Apply data normalization | true | -| Minimum K | `-k1, --k_min` | Minimum number of clusters | 2 | -| Maximum K | `-k2, --k_max` | Maximum number of clusters | 7 | - -### Analysis Options - -| Parameter | Flag | Description | Default | -|-----------|------|-------------|---------| -| Elbow Plot | `-el, --elbow` | Generate elbow plot for K-means | false | -| Silhouette Analysis | `-si, --silhouette` | Generate silhouette plots | false | - -### DBSCAN Specific Parameters - -| Parameter | Flag | Description | Default | -|-----------|------|-------------|---------| -| Min Samples | `-ms, --min_samples` | Minimum samples per cluster | - | -| Epsilon | `-ep, --eps` | Maximum distance between samples | - | - -### Output Parameters - -| Parameter | Flag | Description | Default | -|-----------|------|-------------|---------| -| Output Path | `-idop, --output_path` | Results directory | clustering/ | -| Output Log | `-ol, --out_log` | Log file path | - | -| Best Cluster | `-bc, --best_cluster` | Best clustering result file | - | - -## Clustering Algorithms - -### K-means -**Method**: Partitional clustering using centroids -- Assumes spherical clusters -- Requires pre-specified number of clusters (k) -- Fast and scalable -- Works well with normalized data - -**Best for**: -- Well-separated, compact clusters -- Large datasets -- When cluster number is approximately known - -### DBSCAN -**Method**: Density-based clustering -- Identifies clusters of varying shapes -- Automatically determines cluster number -- Robust to outliers and noise -- Requires epsilon and min_samples parameters - -**Best for**: -- Irregular cluster shapes -- Datasets with noise/outliers -- Unknown number of clusters - -### 
Hierarchical -**Method**: Agglomerative clustering with dendrograms -- Creates tree-like cluster hierarchy -- No need to specify cluster number initially -- Deterministic results -- Provides multiple resolution levels - -**Best for**: -- Small to medium datasets -- When cluster hierarchy is important -- Exploratory analysis - -## Input Format - -### Metabolic Data File - -Tab-separated format with samples as rows and reactions/metabolites as columns: - -``` -Sample R00001 R00002 R00003 R00004 ... -Sample1 1.25 0.85 1.42 0.78 ... -Sample2 0.65 1.35 0.72 1.28 ... -Sample3 2.15 2.05 0.45 0.52 ... -Control1 1.05 0.98 1.15 1.08 ... -Control2 0.95 1.12 0.88 0.92 ... -``` - -**Requirements**: -- First column: sample identifiers -- Subsequent columns: feature values (RAS, RPS, fluxes) -- Missing values: use 0 or leave empty -- Numeric data only (excluding sample names) - -## Data Preprocessing - -### Scaling Options - -#### Standard Scaling (Recommended) -- Mean centering and unit variance scaling -- Formula: `(x - mean) / std` -- Ensures equal feature contribution -- Required for distance-based algorithms - -#### No Scaling -- Use original data values -- May be appropriate for already normalized data -- Risk of feature dominance by high-magnitude variables - -### Feature Selection - -Consider preprocessing steps: -- Remove low-variance features -- Apply dimensionality reduction (PCA) -- Select most variable reactions/metabolites -- Handle missing data appropriately - -## Output Files - -### Cluster Assignments - -#### Best Clustering Result (`best_clusters.tsv`) -``` -Sample Cluster Silhouette_Score -Sample1 1 0.73 -Sample2 1 0.68 -Sample3 2 0.81 -Control1 0 0.59 -Control2 0 0.62 -``` - -#### All K Results (`clustering_results_k{n}.tsv`) -Individual files for each tested cluster number. 
- -### Validation Metrics - -#### Elbow Plot (`elbow_plot.png`) -- X-axis: Number of clusters (k) -- Y-axis: Within-cluster sum of squares (WCSS) -- Identifies optimal k at the "elbow" point - -#### Silhouette Plots (`silhouette_k{n}.png`) -- Individual sample silhouette scores -- Average silhouette width per cluster -- Overall clustering quality assessment - -### Summary Statistics - -#### Clustering Summary (`clustering_summary.txt`) -``` -Algorithm: kmeans -Scaling: true -Optimal K: 3 -Best Silhouette Score: 0.72 -Number of Samples: 20 -Feature Dimensions: 150 -``` - -#### Cluster Characteristics (`cluster_stats.tsv`) -``` -Cluster Size Centroid_R00001 Centroid_R00002 Avg_Silhouette -0 8 0.95 1.12 0.68 -1 7 1.35 0.82 0.74 -2 5 0.65 1.55 0.69 -``` - -## Examples - -### Basic K-means Clustering - -```bash -# Simple K-means with elbow analysis -marea_cluster -td /opt/COBRAxy \ - -in ras_data.tsv \ +marea_cluster -in metabolic_data.tsv \ -cy kmeans \ -sc true \ -k1 2 \ -k2 10 \ - -el true \ - -si true \ - -idop kmeans_results/ \ - -ol kmeans.log -``` - -### DBSCAN Analysis - -```bash -# Density-based clustering with custom parameters -marea_cluster -td /opt/COBRAxy \ - -in flux_samples.tsv \ - -cy dbscan \ - -sc true \ - -ms 5 \ - -ep 0.5 \ - -idop dbscan_results/ \ - -bc best_dbscan_clusters.tsv \ - -ol dbscan.log -``` - -### Hierarchical Clustering - -```bash -# Hierarchical clustering for small dataset -marea_cluster -td /opt/COBRAxy \ - -in rps_scores.tsv \ - -cy hierarchy \ - -sc true \ - -k1 2 \ - -k2 6 \ - -si true \ - -idop hierarchical_results/ \ - -ol hierarchy.log -``` - -### Comprehensive Clustering Analysis - -```bash -# Compare multiple algorithms -algorithms=("kmeans" "dbscan" "hierarchy") -for alg in "${algorithms[@]}"; do - marea_cluster -td /opt/COBRAxy \ - -in metabolomics_data.tsv \ - -cy "$alg" \ - -sc true \ - -k1 2 \ - -k2 8 \ - -el true \ - -si true \ - -idop "${alg}_clustering/" \ - -ol "${alg}_cluster.log" -done + -idop output/ ``` -## Parameter Optimization - -### K-means Optimization - -#### Elbow Method -1. Run K-means for k = 2 to k_max -2. Plot WCSS vs k -3. Identify "elbow" point where improvement diminishes -4. Select k at elbow as optimal - -#### Silhouette Analysis -1. Compute silhouette scores for each k -2. Select k with highest average silhouette score -3. Validate with silhouette plots -4. 
Ensure clusters are well-separated - -### DBSCAN Parameter Tuning - -#### Epsilon (eps) Selection -- Use k-distance plot to identify knee point -- Start with eps = average distance to k-th nearest neighbor -- Adjust based on cluster quality metrics - -#### Min Samples Selection -- Rule of thumb: min_samples ≥ dimensionality + 1 -- Higher values create denser clusters -- Lower values may increase noise sensitivity - -### Hierarchical Clustering - -#### Linkage Method -- Ward: Minimizes within-cluster variance -- Complete: Maximum distance between clusters -- Average: Mean distance between clusters -- Single: Minimum distance (prone to chaining) - -## Quality Assessment - -### Internal Validation Metrics - -#### Silhouette Score -- Range: [-1, 1] -- >0.7: Strong clustering -- 0.5-0.7: Reasonable clustering -- <0.5: Weak clustering - -#### Calinski-Harabasz Index -- Higher values indicate better clustering -- Ratio of between-cluster to within-cluster variance - -#### Davies-Bouldin Index -- Lower values indicate better clustering -- Average similarity between clusters - -### External Validation - -When ground truth labels available: -- Adjusted Rand Index (ARI) -- Normalized Mutual Information (NMI) -- Homogeneity and Completeness scores +## Parameters -## Biological Interpretation - -### Cluster Characterization - -#### Metabolic Pathway Analysis -- Identify enriched pathways per cluster -- Compare metabolic profiles between clusters -- Relate clusters to biological conditions - -#### Sample Annotation -- Map clusters to experimental conditions -- Identify batch effects or confounders -- Validate with independent datasets - -#### Feature Importance -- Determine reactions/metabolites driving clustering -- Analyze cluster centroids for biological insights -- Connect to known metabolic phenotypes - -## Integration Workflow - -### Upstream Data Sources - -#### COBRAxy Tools -- [RAS Generator](ras-generator.md) - Cluster based on reaction activities -- [RPS Generator](rps-generator.md) - Cluster based on reaction propensities -- [Flux Simulation](flux-simulation.md) - Cluster flux distributions +| Parameter | Flag | Description | Default | +|-----------|------|-------------|---------| +| Input Data | `-in` | Metabolic data TSV file | - | +| Algorithm | `-cy` | kmeans, dbscan, hierarchy | kmeans | +| Scaling | `-sc` | Scale data | false | +| K Min | `-k1` | Minimum clusters (K-means/hierarchy) | 2 | +| K Max | `-k2` | Maximum clusters (K-means/hierarchy) | 10 | +| Epsilon | `-ep` | DBSCAN radius | 0.5 | +| Min Samples | `-ms` | DBSCAN minimum samples | 5 | +| Elbow Plot | `-el` | Generate elbow plot | false | +| Silhouette | `-si` | Compute silhouette scores | false | +| Output Path | `-idop` | Output directory | marea_cluster/ | -#### External Data -- Gene expression matrices -- Metabolomics datasets -- Clinical metadata - -### Downstream Analysis - -#### Supervised Learning -Use cluster labels for: -- Classification model training -- Biomarker discovery -- Outcome prediction +## Input Format -#### Differential Analysis -- Compare clusters with [MAREA](marea.md) -- Identify cluster-specific metabolic signatures -- Pathway enrichment analysis - -### Typical Pipeline - -```bash -# 1. Generate metabolic scores -ras_generator -td /opt/COBRAxy -in expression.tsv -ra ras.tsv - -# 2. Perform clustering analysis -marea_cluster -td /opt/COBRAxy -in ras.tsv -cy kmeans \ - -sc true -k1 2 -k2 8 -el true -si true \ - -idop clusters/ -bc best_clusters.tsv - -# 3. 
Analyze cluster differences -marea -td /opt/COBRAxy -input_data ras.tsv \ - -input_class best_clusters.tsv -comparison manyvsmany \ - -test ks -choice_map ENGRO2 -idop cluster_analysis/ +``` +Reaction Sample1 Sample2 Sample3 +R00001 1.25 0.85 1.42 +R00002 0.65 1.35 0.72 ``` -## Tips and Best Practices - -### Data Preparation -- **Normalization**: Always scale features for distance-based methods -- **Dimensionality**: Consider PCA for high-dimensional data (>1000 features) -- **Missing Values**: Handle appropriately (imputation or removal) -- **Outliers**: Identify and consider removal for K-means - -### Algorithm Selection -- **K-means**: Start here for most applications -- **DBSCAN**: Use when clusters have irregular shapes or noise present -- **Hierarchical**: Choose for small datasets or when hierarchy matters - -### Parameter Selection -- **Start Simple**: Begin with default parameters -- **Use Validation**: Always employ silhouette analysis -- **Cross-Validate**: Test stability across parameter ranges -- **Biological Validation**: Ensure clusters make biological sense - -### Result Interpretation -- **Multiple Algorithms**: Compare results across methods -- **Stability Assessment**: Check clustering reproducibility -- **Biological Context**: Integrate with known sample characteristics -- **Statistical Testing**: Validate cluster differences formally - -## Troubleshooting +**File Format Notes:** +- Use **tab-separated** values (TSV) or **comma-separated** (CSV) +- First row must contain column headers (Reaction, Sample names) +- Numeric values only for metabolic data +- Missing values should be avoided or handled before clustering -### Common Issues - -**Poor clustering quality** -- Check data scaling and normalization -- Assess feature selection and dimensionality -- Try different algorithms or parameters -- Evaluate data structure with PCA/t-SNE - -**Algorithm doesn't converge** -- Increase iteration limits for K-means -- Adjust epsilon/min_samples for DBSCAN -- Check for numerical stability issues -- Verify input data format - -**Memory or performance issues** -- Reduce dataset size or dimensionality -- Use sampling for large datasets -- Consider approximate algorithms -- Monitor system resources - -### Error Messages - -| Error | Cause | Solution | -|-------|-------|----------| -| "Convergence failed" | K-means iteration limit | Increase max iterations or check data | -| "No clusters found" | DBSCAN parameters too strict | Reduce eps or min_samples | -| "Memory allocation error" | Dataset too large | Reduce size or increase memory | -| "Invalid silhouette score" | Single cluster found | Adjust parameters or algorithm | +## Algorithms -### Performance Optimization - -**Large Datasets** -- Use mini-batch K-means for speed -- Sample data for parameter optimization -- Employ dimensionality reduction -- Consider distributed computing - -**High-Dimensional Data** -- Apply feature selection -- Use PCA preprocessing -- Consider specialized algorithms -- Validate results carefully - -## Advanced Usage - -### Custom Distance Metrics - -For specialized applications, modify distance calculations: - -```python -# Custom distance function for metabolic data -def metabolic_distance(x, y): - # Implement pathway-aware distance metric - return custom_distance_value -``` - -### Ensemble Clustering +- **K-means**: Fast, requires number of clusters +- **DBSCAN**: Density-based, handles noise and irregular shapes +- **Hierarchical**: Tree-based, good for small datasets -Combine multiple clustering 
results: - -```bash -# Run multiple algorithms and combine -for method in kmeans dbscan hierarchy; do - marea_cluster -cy $method -in data.tsv -idop ${method}_results/ -done - -# Consensus clustering (requires custom script) -python consensus_clustering.py -i *_results/best_clusters.tsv -o consensus.tsv -``` - -### Interactive Analysis +## Output -Generate interactive plots for exploration: - -```python -import plotly.express as px -import pandas as pd - -# Load clustering results -results = pd.read_csv('best_clusters.tsv', sep='\t') -data = pd.read_csv('metabolic_data.tsv', sep='\t') - -# Interactive scatter plot -fig = px.scatter(data, x='PC1', y='PC2', color=results['Cluster']) -fig.show() -``` +- `clusters.tsv`: Sample assignments +- `silhouette_scores.tsv`: Cluster quality metrics +- `elbow_plot.svg`: Optimal K visualization (K-means) +- `*.log`: Processing log ## See Also -- [MAREA](marea.md) - Statistical analysis of cluster differences -- [RAS Generator](ras-generator.md) - Generate clustering input data -- [Flux Simulation](flux-simulation.md) - Alternative clustering data source -- [Clustering Tutorial](/tutorials/clustering-analysis.md) -- [Validation Methods Reference](/tutorials/cluster-validation.md) \ No newline at end of file +- [MAREA](tools/marea) +- [RAS Generator](tools/ras-generator) +- [Flux Simulation](tools/flux-simulation)
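
For orientation, the K-means scan with silhouette scoring that this tool automates can be sketched with scikit-learn. This is an illustrative sketch, not the tool's own code; it assumes the input follows the format above (reactions as rows, samples as columns), so the matrix is transposed to cluster samples, and the scaling step mirrors `-sc true`:

```python
# Scan k = 2..10 (as with -k1 2 -k2 10) and report silhouette scores.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

data = pd.read_csv("metabolic_data.tsv", sep="\t", index_col=0)
X = StandardScaler().fit_transform(data.T.values)  # samples x reactions, scaled

for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")
```

Higher silhouette values indicate better-separated clusters; the elbow plot produced by the tool gives a complementary view based on within-cluster variance.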
--- a/COBRAxy/docs/tools/marea.md Mon Oct 27 12:33:08 2025 +0000 +++ b/COBRAxy/docs/tools/marea.md Tue Oct 28 10:44:07 2025 +0000 @@ -1,279 +1,151 @@ # MAREA -Metabolic Reaction Enrichment Analysis for statistical comparison and map visualization. +Metabolic Enrichment Analysis and Visualization. ## Overview -MAREA performs statistical enrichment analysis on RAS/RPS data to identify significantly different metabolic reactions between sample groups. It generates enriched pathway maps with color-coded reactions showing statistical significance and fold changes. +MAREA performs statistical comparison of metabolic scores (RAS/RPS) and visualizes results on pathway maps. + +## Galaxy Interface + +In Galaxy: **COBRAxy → Metabolic Reaction Enrichment Analysis** -## Usage +1. Upload RAS/RPS scores and sample class file +2. Select map and configure statistical parameters +3. Click **Run tool** -### Command Line +## Command-line console ```bash -marea -td /path/to/COBRAxy \ - -using_RAS true \ - -input_data ras_data.tsv \ - -input_class class_labels.tsv \ +marea -input_data scores.tsv \ + -input_class classes.csv \ + -choice_map ENGRO2 \ -comparison manyvsmany \ - -test ks \ - -pv 0.05 \ - -fc 1.5 \ - -choice_map ENGRO2 \ - -generate_svg true \ - -idop results/ + -pvalue 0.05 \ + -idop output/ ``` -### Galaxy Interface - -Select "MAREA" from the COBRAxy tool suite and configure analysis parameters through the web interface. - ## Parameters -### Required Parameters - -| Parameter | Flag | Description | -|-----------|------|-------------| -| Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory | - -### Data Input Parameters - -| Parameter | Flag | Description | Default | -|-----------|------|-------------|---------| -| Use RAS | `-using_RAS, --using_RAS` | Include RAS analysis | true | -| RAS Data | `-input_data, --input_data` | RAS scores TSV file | - | -| RAS Classes | `-input_class, --input_class` | Sample group labels | - | -| Multiple RAS | `-input_datas, --input_datas` | Multiple RAS files (space-separated) | - | -| RAS Names | `-names, --names` | Names for multiple datasets | - | -| Use RPS | `-using_RPS, --using_RPS` | Include RPS analysis | false | -| RPS Data | `-input_data_rps, --input_data_rps` | RPS scores TSV file | - | -| RPS Classes | `-input_class_rps, --input_class_rps` | RPS sample groups | - | - -### Statistical Parameters - | Parameter | Flag | Description | Default | |-----------|------|-------------|---------| -| Comparison Type | `-co, --comparison` | Statistical comparison mode | manyvsmany | -| Statistical Test | `-te, --test` | Statistical test method | ks | -| P-Value Threshold | `-pv, --pValue` | Significance threshold | 0.1 | -| Adjusted P-values | `-adj, --adjusted` | Apply FDR correction | false | -| Fold Change | `-fc, --fChange` | Minimum fold change | 1.5 | -| Net Enrichment | `-ne, --net` | Use net enrichment for RPS | false | -| Analysis Option | `-op, --option` | Analysis mode | datasets | - -### Visualization Parameters - -| Parameter | Flag | Description | Default | -|-----------|------|-------------|---------| -| Map Choice | `-choice_map, --choice_map` | Built-in metabolic map | - | -| Custom Map | `-custom_map, --custom_map` | Path to custom SVG map | - | -| Generate SVG | `-gs, --generate_svg` | Create SVG output | true | -| Generate PDF | `-gp, --generate_pdf` | Create PDF output | false | -| Generate PNG | `-gpng, --generate_png` | Create PNG output | false | -| Color Map | `-colorm, --color_map` | Color scheme (jet/viridis) | jet | 
-| Output Directory | `-idop, --output_path` | Results directory | result/ | - -### Advanced Parameters - -| Parameter | Flag | Description | Default | -|-----------|------|-------------|---------| -| Output Log | `-ol, --out_log` | Log file path | - | -| Control Sample | `-on, --control` | Control group identifier | - | +| Input Data | `-input_data` | RAS/RPS scores file | - | +| Input Class | `-input_class` | Sample class definitions | - | +| Map Choice | `-choice_map` | ENGRO2, Recon, or Custom | ENGRO2 | +| Custom Map | `-custom_map` | Path to custom SVG map | - | +| Comparison | `-comparison` | manyvsmany, onevsrest, onevsmany | manyvsmany | +| P-value | `-pvalue` | Significance threshold | 0.05 | +| FDR Correction | `-fdr` | Apply FDR correction | true | +| Test Type | `-test_type` | t, wilcoxon, ks, DESeq | t | +| Net RPS | `--net` | Use net contribution for reversible reactions (RPS only) | false | +| Output Path | `-idop` | Output directory | marea/ | ## Input Formats -### RAS/RPS Data File - -Tab-separated format with reactions as rows and samples as columns: +### Metabolic Scores ``` -Reaction Sample1 Sample2 Sample3 Sample4 -R00001 1.25 0.85 1.42 0.78 -R00002 0.65 1.35 0.72 1.28 -R00003 2.15 2.05 0.45 0.52 +Reaction Sample1 Sample2 Sample3 +R00001 1.25 0.85 1.42 +R00002 0.65 1.35 0.72 ``` -### Class Labels File - -Sample grouping information: +### Sample Classes ``` -Sample Class +SampleID Class Sample1 Control Sample2 Treatment -Sample3 Control -Sample4 Treatment +Sample3 Treatment ``` -## Comparison Types - -### manyvsmany -Compare all possible pairs of groups: -- Group A vs Group B -- Group A vs Group C -- Group B vs Group C - -### onevsrest -Compare each group against all others combined: -- Group A vs (Group B + Group C) -- Group B vs (Group A + Group C) - -### onevsmany -Compare one specific group against each other group separately: -- Control vs Treatment1 -- Control vs Treatment2 +**File Format Notes:** +- Use **tab-separated** values (TSV) or **comma-separated** (CSV) +- First row must contain column headers +- Sample names must match between scores and class file +- Class names should not contain spaces ## Statistical Tests -| Test | Description | Use Case | -|------|-------------|----------| -| `ks` | Kolmogorov-Smirnov | Non-parametric, distribution-free | -| `ttest_p` | Paired t-test | Related samples | -| `ttest_ind` | Independent t-test | Unrelated samples | -| `wilcoxon` | Wilcoxon signed-rank | Non-parametric paired | -| `mw` | Mann-Whitney U | Non-parametric independent | -| `DESeq` | DESeq2-style analysis | Count-like data with dispersion | +- **t**: Student's t-test (parametric, assumes normality) +- **wilcoxon**: Wilcoxon/Mann-Whitney (non-parametric) +- **ks**: Kolmogorov-Smirnov (distribution-free) +- **DESeq**: DESeq2-like test (**RAS only**, requires ≥2 replicates per sample) + +## Comparison Types -## Output Files +- **manyvsmany**: All pairwise comparisons +- **onevsrest**: Each class vs all others +- **onevsmany**: One reference vs multiple classes + +## Advanced Options + +### Net RPS Values -### Statistical Results -- `comparison_stats.tsv`: P-values, fold changes, and test statistics -- `enriched_reactions.tsv`: Significantly enriched reactions only -- `comparison_summary.txt`: Analysis summary and parameters +When analyzing RPS data with reversible reactions, the `--net` parameter controls arrow coloring: + +**When `--net false` (default):** +- Each direction of a reversible reaction colored independently +- Forward and backward contributions 
shown separately -### Visualization -- `pathway_map.svg`: Color-coded metabolic map -- `pathway_map.pdf`: PDF version (if requested) -- `pathway_map.png`: PNG version (if requested) -- `legend.svg`: Color scale and significance indicators +**When `--net true` (RPS only):** +- Arrow tips colored with net contribution of both directions +- Useful for visualizing overall metabolite flow direction + +**Note**: This option only applies to RPS analysis and affects visualization of reversible reactions on metabolic maps. + +## Output + +- `*_map.svg`: Annotated pathway maps +- `comparison_results.tsv`: Statistical results +- `*.log`: Processing log ## Examples -### Basic RAS Analysis +### Basic Analysis ```bash -# Simple two-group comparison -marea -td /opt/COBRAxy \ - -using_RAS true \ - -input_data ras_scores.tsv \ - -input_class sample_groups.tsv \ +marea -input_data ras_scores.tsv \ + -input_class classes.csv \ + -choice_map ENGRO2 \ -comparison manyvsmany \ - -test ks \ - -pv 0.05 \ - -choice_map ENGRO2 \ + -pvalue 0.05 \ -idop results/ ``` -### Combined RAS + RPS Analysis +### Custom Map ```bash -# Multi-modal analysis -marea -td /opt/COBRAxy \ - -using_RAS true \ - -input_data ras_scores.tsv \ - -input_class ras_groups.tsv \ - -using_RPS true \ - -input_data_rps rps_scores.tsv \ - -input_class_rps rps_groups.tsv \ +marea -input_data rps_scores.tsv \ + -input_class classes.csv \ + -choice_map Custom \ + -custom_map pathway.svg \ -comparison onevsrest \ - -test DESeq \ - -adj true \ - -fc 2.0 \ - -choice_map HMRcore \ - -generate_pdf true \ - -idop multimodal_results/ -``` - -### Multiple Dataset Analysis - -```bash -# Compare multiple experiments -marea -td /opt/COBRAxy \ - -using_RAS true \ - -input_datas exp1_ras.tsv exp2_ras.tsv exp3_ras.tsv \ - -names "Experiment1" "Experiment2" "Experiment3" \ - -comparison onevsmany \ - -test wilcoxon \ - -pv 0.01 \ - -custom_map custom_pathway.svg \ - -idop multi_experiment/ + -idop results/ ``` -## Map Visualization - -### Built-in Maps -- **ENGRO2**: Human genome-scale reconstruction -- **HMRcore**: Core human metabolic network -- **Recon**: Recon3D human model - -### Color Coding -- **Red**: Significantly upregulated (high activity) -- **Blue**: Significantly downregulated (low activity) -- **Gray**: Not significant -- **Line Width**: Proportional to fold change magnitude - -### Custom Maps -SVG files with reaction elements having IDs matching your data: -```xml -<rect id="R00001" class="reaction" .../> -<path id="R00002" class="reaction" .../> -``` - -## Quality Control +### Non-parametric Test -### Pre-analysis Checks -- Verify sample names match between data and class files -- Check for missing values and outliers -- Ensure adequate sample sizes per group (n ≥ 3 recommended) - -### Post-analysis Validation -- Review statistical distribution assumptions -- Check multiple testing correction effects -- Validate biological relevance of enriched pathways - -## Tips and Best Practices - -### Statistical Considerations -- Use FDR correction (`-adj true`) for multiple comparisons -- Choose appropriate tests based on data distribution -- Consider effect size alongside significance - -### Visualization Optimization -- Use high fold change thresholds (>2.0) for cleaner maps -- Export both SVG (editable) and PDF (publication-ready) formats -- Adjust color schemes for colorblind accessibility +```bash +marea -input_data scores.tsv \ + -input_class classes.csv \ + -choice_map ENGRO2 \ + -test_type wilcoxon \ + -pvalue 0.01 \ + -fdr true \ + -idop results/ 
+``` ## Troubleshooting -### Common Issues - -**No significant reactions found** -- Lower p-value threshold (`-pv 0.1`) -- Reduce fold change requirement (`-fc 1.2`) -- Check sample group definitions - -**Map rendering errors** -- Verify SVG map file integrity -- Check reaction ID matching between data and map -- Ensure sufficient system memory for large maps - -**Statistical test failures** -- Verify data normality for parametric tests -- Check for sufficient sample sizes -- Consider alternative test methods - -## Integration - -### Upstream Tools -- [RAS Generator](ras-generator.md) - Generate RAS scores -- [RPS Generator](rps-generator.md) - Generate RPS scores - -### Downstream Analysis -- [Flux to Map](flux-to-map.md) - Additional visualization options -- [MAREA Cluster](marea-cluster.md) - Sample clustering analysis +| Error | Solution | +|-------|----------| +| "No matching reactions" | Verify reaction IDs | +| "Insufficient samples" | Increase sample sizes per group | ## See Also -- [Statistical Tests Documentation](/tutorials/statistical-tests.md) -- [Map Customization Guide](/tutorials/custom-maps.md) -- [Multi-modal Analysis Tutorial](/tutorials/multimodal-analysis.md) \ No newline at end of file +- [RAS Generator](tools/ras-generator) +- [RPS Generator](tools/rps-generator) +- [Flux to Map](tools/flux-to-map) +- [Built-in Models](reference/built-in-models)
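
To make the statistics concrete, a two-class comparison in the spirit of MAREA can be sketched with SciPy and statsmodels. This is not the tool's own implementation; file names follow the examples above, the Mann-Whitney U test stands in for the `wilcoxon` option on independent groups, and FDR correction uses Benjamini-Hochberg:

```python
# Per-reaction Mann-Whitney U tests with Benjamini-Hochberg correction.
# Assumes complete (non-missing) score values.
import pandas as pd
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

scores = pd.read_csv("ras_scores.tsv", sep="\t", index_col=0)         # reactions x samples
classes = pd.read_csv("classes.csv", sep="\t", index_col=0)["Class"]  # indexed by SampleID

group_a = scores.loc[:, classes == "Control"]
group_b = scores.loc[:, classes == "Treatment"]

pvals = [mannwhitneyu(group_a.loc[r], group_b.loc[r]).pvalue for r in scores.index]
rejected, padj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

results = pd.DataFrame({"pvalue": pvals, "padj": padj}, index=scores.index)
print(results[rejected].sort_values("padj").head())
```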
--- a/COBRAxy/docs/tools/ras-generator.md Mon Oct 27 12:33:08 2025 +0000 +++ b/COBRAxy/docs/tools/ras-generator.md Tue Oct 28 10:44:07 2025 +0000 @@ -1,294 +1,107 @@ # RAS Generator -Generate Reaction Activity Scores (RAS) from gene expression data and GPR (Gene-Protein-Reaction) rules. +Compute Reaction Activity Scores (RAS) from gene expression data. ## Overview -The RAS Generator computes metabolic reaction activity by: -1. Mapping gene expression to reactions via GPR rules -2. Applying logical operations (AND/OR) for enzyme complexes -3. Producing activity scores for each reaction in each sample +RAS Generator computes reaction activity scores by evaluating GPR rules with gene expression values. + +## Galaxy Interface + +In Galaxy: **COBRAxy → Expression2RAS** -**Input**: Gene expression data + GPR rules -**Output**: Reaction activity scores (RAS) +1. Select built-in model or upload custom GPR rules +2. Upload gene expression data +3. Click **Run tool** + +## Command-line console + +```bash +ras_generator -rs ENGRO2 \ + -in expression_data.tsv \ + -ra ras_scores.tsv \ + -ol ras_generation.log +``` ## Parameters -### Required Parameters - -| Parameter | Short | Type | Description | -|-----------|--------|------|-------------| -| `--input` | `-in` | file | Gene expression dataset (TSV format) | -| `--ras_output` | `-ra` | file | Output file for RAS values | -| `--rules_selector` | `-rs` | choice | Built-in model (ENGRO2, Recon, HMRcore) | - -### Optional Parameters - -| Parameter | Short | Type | Default | Description | -|-----------|--------|------|---------|-------------| -| `--tool_dir` | `-td` | string | auto-detected | COBRAxy installation directory (automatically detected after pip install) | -| `--none` | `-n` | boolean | true | Handle missing gene values | -| `--model_upload` | `-rl` | file | - | Custom GPR rules file | -| `--model_upload_name` | `-rn` | string | - | Custom model name | -| `--out_log` | - | file | log.txt | Output log file | - -> **Note**: After installing COBRAxy via pip, the `--tool_dir` parameter is automatically detected and doesn't need to be specified. +| Parameter | Flag | Description | Default | +|-----------|------|-------------|---------| +| Rules Selector | `-rs` | ENGRO2, Recon, or Custom | ENGRO2 | +| Input Data | `-in` | Gene expression TSV file | - | +| Output RAS | `-ra` | Output RAS scores file | - | +| Output Log | `-ol` | Log file | - | +| Custom Rules | `-rl` | Custom GPR rules file | - | +| Gene Names | `-gn` | Gene ID type | HGNC_Symbol | +| Remove Gene | `-rg` | Remove missing genes | true | +| Ignore NaN | `--none` | Handle missing gene expression | true | ## Input Format -### Gene Expression File -```tsv -Gene_ID Sample_1 Sample_2 Sample_3 Sample_4 -HGNC:5 10.5 11.2 15.7 14.3 -HGNC:10 3.2 4.1 8.8 7.9 -HGNC:15 7.9 8.2 4.4 5.1 -HGNC:25 12.1 13.5 18.2 17.8 +Gene expression file (TSV): + +``` +Gene Sample1 Sample2 Sample3 +ALDOA 125.5 98.3 142.7 +ENO1 85.2 110.4 95.8 +PFKM 200.3 185.6 210.1 ``` -**Requirements**: -- First column: Gene identifiers (HGNC, Ensembl, Entrez, etc.) 
-- Subsequent columns: Expression values (numeric) -- Header row with sample names -- Tab-separated format +**File Format Notes:** +- Use **tab-separated** values (TSV) +- First row must contain column headers (Gene, Sample names) +- Gene names must match the selected gene ID type +- Numeric values only for expression data -### Custom GPR Rules File (Optional) -```tsv -Reaction_ID GPR -R_HEX1 HGNC:4922 -R_PGI HGNC:8906 -R_PFK HGNC:8877 or HGNC:8878 -R_ALDOA HGNC:414 and HGNC:417 -``` +## GPR Rules -## Algorithm Details +- **AND**: All genes required +- **OR**: Any gene sufficient +- Example: `(GENE1 and GENE2) or GENE3` -### GPR Rule Processing +## NaN Handling -**Gene Mapping**: Each gene in the expression data is mapped to reactions via GPR rules. +The `--none` parameter controls how missing gene expression values are treated in GPR rules: -**Logical Operations**: -- **OR**: `Gene1 or Gene2` → `expr1 + expr2` -- **AND**: `Gene1 and Gene2` → `min(expr1, expr2)` +**When `--none true` (default):** +- `(GENE1 and NaN)` → evaluated as `GENE1` value +- `(GENE1 or NaN)` → evaluated as `GENE1` value +- Missing genes don't block reaction activity calculation -**Missing Gene Handling**: -- `-n true`: Ignore missing genes in the GPR rules. -- `-n false`: Missing genes cause reaction score to be NaN - -### RAS Computation +**When `--none false` (strict mode):** +- `(GENE1 and NaN)` → `NaN` (reaction cannot be evaluated) +- `(GENE1 or NaN)` → `NaN` (reaction cannot be evaluated) +- Any missing gene propagates NaN through the entire GPR expression -**Example**: -``` -GPR: (HGNC:5 and HGNC:10) or HGNC:15 -Expression: HGNC:5=10.5, HGNC:10=3.2, HGNC:15=7.9 -RAS = max(min(10.5, 3.2), 7.9) = max(3.2, 7.9) = 7.9 -``` +**Recommendation**: Use default (`true`) for datasets with incomplete gene coverage. ## Output Format -### RAS Values File -```tsv -Reactions Sample_1 Sample_2 Sample_3 Sample_4 -R_HEX1 8.5 9.2 12.1 11.3 -R_PGI 7.3 8.1 6.4 7.2 -R_PFK 15.2 16.8 20.1 18.9 -R_ALDOA 3.2 4.1 4.4 5.1 +``` +Reaction Sample1 Sample2 Sample3 +R00001 125.5 98.3 142.7 +R00002 85.2 110.4 95.8 ``` -**Format**: -- First column: Reaction identifiers -- Subsequent columns: RAS values for each sample -- Missing values represented as "None" +## Examples -## Usage Examples - -### Command Line +### Basic Usage ```bash -# Basic usage with built-in model (after pip install) -ras_generator \ - -in expression_data.tsv \ - -ra ras_output.tsv \ - -rs ENGRO2 - -# With custom model and strict missing gene handling -ras_generator \ - -in expression_data.tsv \ - -ra ras_output.tsv \ - -rl custom_rules.tsv \ - -rn "CustomModel" \ - -n false - -# Explicitly specify tool directory (only needed if not using pip install) -ras_generator -td /path/to/COBRAxy \ - -in expression_data.tsv \ - -ra ras_output.tsv \ - -rs ENGRO2 +ras_generator -rs ENGRO2 \ + -in expression.tsv \ + -ra ras_scores.tsv ``` -### Galaxy Usage - -1. Upload gene expression file to Galaxy -2. Select **RAS Generator** from COBRAxy tools -3. Configure parameters: - - **Input dataset**: Your expression file - - **Rule selector**: ENGRO2 (or other model) - - **Handle missing genes**: Yes/No -4. 
Click **Execute** - -## Built-in Models - -### ENGRO2 (Recommended for most analyses) -- **Scope**: Focused human metabolism -- **Reactions**: ~500 -- **Genes**: ~500 -- **Use case**: Core metabolic analysis - -### Recon (Comprehensive analysis) -- **Scope**: Complete human metabolism -- **Reactions**: ~10,000 -- **Genes**: ~2,000 -- **Use case**: Genome-wide metabolic studies - -## Gene ID Mapping - -COBRAxy supports multiple gene identifier formats: - -| Format | Example | Notes | -|--------|---------|--------| -| **HGNC ID** | HGNC:5 | Recommended, most stable | -| **HGNC Symbol** | ALDOA | Human-readable but may change | -| **Ensembl** | ENSG00000149925 | Version-specific | -| **Entrez** | 226 | Numeric identifier | - -**Recommendation**: Use HGNC IDs for best compatibility and stability. - - - ## Troubleshooting -### Common Issues - -**"Gene not found" warnings** -``` -Solution: Check gene ID format matches model expectations -- Verify gene identifiers (HGNC vs symbols vs Ensembl) -- Use gene mapping tools if needed -- Set -n true to handle missing genes -``` - -**"No computable scores" error** -``` -Solution: Insufficient gene overlap between data and model -- Check gene ID format compatibility -- Verify expression file format -- Try different built-in model -``` - -**Empty output file** -``` -Solution: Check input file format and permissions -- Ensure TSV format with proper headers -- Verify file paths are correct -- Check write permissions for output directory -``` - - - -### Debug Mode - -Enable detailed logging: - -```bash -ras_generator -td /path/to/COBRAxy \ - -in expression_data.tsv \ - -ra ras_output.tsv \ - -rs ENGRO2 \ - --out_log detailed_log.txt -``` - -Check log file for detailed error messages and processing statistics. - -## Validation - -### Check Output Quality - -```python -import pandas as pd - -# Read RAS output -ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0) - -# Basic statistics -print(f"RAS matrix shape: {ras_df.shape}") -print(f"Non-null values: {ras_df.count().sum()}") -print(f"Value range: {ras_df.min().min():.2f} to {ras_df.max().max():.2f}") - -# Check for problematic reactions -null_reactions = ras_df.isnull().all(axis=1).sum() -print(f"Reactions with no data: {null_reactions}") -``` - +| Error | Solution | +|-------|----------| +| "Gene not found" | Check gene ID format | +| "Invalid GPR" | Verify GPR rule syntax | -## Integration with Other Tools - -### Downstream Analysis - -RAS output can be used with: - -- **[MAREA](marea.md)**: Statistical enrichment analysis -- **[RAS to Bounds](ras-to-bounds.md)**: Flux constraint application -- **[MAREA Cluster](marea-cluster.md)**: Sample clustering - -### Preprocessing Options - -Before RAS generation: -- **Normalize** expression data (log2, quantile, etc.) 
-- **Filter** low-expression genes -- **Batch correct** if multiple datasets - -## Advanced Usage - -### Custom Model Integration - -```python -# Create custom GPR rules -custom_rules = { - 'R_CUSTOM1': 'HGNC:5 and HGNC:10', - 'R_CUSTOM2': 'HGNC:15 or HGNC:20' -} +## See Also -# Save as TSV -import pandas as pd -rules_df = pd.DataFrame(list(custom_rules.items()), - columns=['Reaction_ID', 'GPR']) -rules_df.to_csv('custom_rules.tsv', sep='\t', index=False) - -# Use with RAS generator -args = ['-rl', 'custom_rules.tsv', '-rn', 'CustomModel'] -``` - -### Batch Processing - -```python -# Process multiple expression files -expression_files = ['data1.tsv', 'data2.tsv', 'data3.tsv'] - -for i, exp_file in enumerate(expression_files): - output_file = f'ras_output_{i}.tsv' - - args = [ - '-td', '/path/to/COBRAxy', - '-in', exp_file, - '-ra', output_file, - '-rs', 'ENGRO2' - ] - - ras_generator.main(args) - print(f"Processed {exp_file} → {output_file}") -``` - -## References - -- [COBRApy documentation](https://cobrapy.readthedocs.io/) - Underlying metabolic modeling -- [GPR rules format](https://cobrapy.readthedocs.io/en/stable/getting_started.html#gene-protein-reaction-rules) - Standard format specification -- [HGNC database](https://www.genenames.org/) - Gene nomenclature standards \ No newline at end of file +- [RAS to Bounds](tools/ras-to-bounds) +- [MAREA](tools/marea) +- [Built-in Models](reference/built-in-models)
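To make the GPR semantics and the `--none` behaviour described above concrete, here is a toy evaluator. It assumes one common convention (AND as the minimum over gene values, OR as the sum of isoform contributions); treat the exact operators as an illustration rather than COBRAxy's exact implementation.

```python
import math

def ras_and(*values):
    # AND: every subunit is required, so the scarcest gene limits the score.
    vals = [v for v in values if not math.isnan(v)]  # --none true: skip NaN
    return min(vals) if vals else float("nan")

def ras_or(*values):
    # OR: any isoenzyme suffices, so contributions are combined.
    vals = [v for v in values if not math.isnan(v)]  # --none true: skip NaN
    return sum(vals) if vals else float("nan")

expression = {"GENE1": 10.5, "GENE2": 3.2, "GENE3": float("nan")}

# RAS for the rule "(GENE1 and GENE2) or GENE3"
score = ras_or(ras_and(expression["GENE1"], expression["GENE2"]),
               expression["GENE3"])
print(score)  # 3.2: GENE3 is missing and is skipped under --none true
```

With `--none false`, the NaN filtering would be dropped and any missing gene would propagate NaN through the whole rule.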
--- a/COBRAxy/docs/tools/ras-to-bounds.md Mon Oct 27 12:33:08 2025 +0000 +++ b/COBRAxy/docs/tools/ras-to-bounds.md Tue Oct 28 10:44:07 2025 +0000 @@ -4,330 +4,106 @@ ## Overview -The RAS to Bounds tool integrates RAS values into metabolic model flux bounds, creating sample-specific constrained models for flux sampling. This enables personalized metabolic modeling based on gene expression patterns. +RAS to Bounds integrates RAS values into metabolic model flux bounds, creating sample-specific constrained models. + +## Galaxy Interface + +In Galaxy: **COBRAxy → RAS to Bounds** + +1. Select model and upload RAS scores file +2. Configure medium and constraint options +3. Click **Execute** ## Usage -### Command Line - ```bash -ras_to_bounds -td /path/to/COBRAxy \ - -ms ENGRO2 \ +ras_to_bounds -ms ENGRO2 \ -ir ras_scores.tsv \ -rs true \ -mes allOpen \ -idop constrained_bounds/ ``` -### Galaxy Interface - -Select "RAS to Bounds" from the COBRAxy tool suite and configure model and constraint parameters. - ## Parameters -### Required Parameters - -| Parameter | Flag | Description | -|-----------|------|-------------| -| Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory | -| Model Selector | `-ms, --model_selector` | Built-in model (ENGRO2, Custom) | -| RAS Selector | `-rs, --ras_selector` | Enable RAS constraint application | - -### Model Parameters - -| Parameter | Flag | Description | Default | -|-----------|------|-------------|---------| -| Model Selector | `-ms, --model_selector` | Built-in model choice | ENGRO2 | -| Custom Model | `-mo, --model` | Path to custom SBML model | - | -| Model Name | `-mn, --model_name` | Custom model filename | - | - -### Medium Parameters - -| Parameter | Flag | Description | Default | -|-----------|------|-------------|---------| -| Medium Selector | `-mes, --medium_selector` | Medium configuration | allOpen | -| Custom Medium | `-meo, --medium` | Path to custom medium file | - | - -### Constraint Parameters - -| Parameter | Flag | Description | Default | -|-----------|------|-------------|---------| -| RAS Input | `-ir, --input_ras` | RAS scores TSV file | - | -| RAS Names | `-rn, --name` | Sample names for RAS data | - | -| Cell Class | `-cc, --cell_class` | Output cell class information | - | - -### Output Parameters - | Parameter | Flag | Description | Default | |-----------|------|-------------|---------| -| Output Path | `-idop, --output_path` | Directory for constrained bounds | ras_to_bounds/ | -| Output Log | `-ol, --out_log` | Log file path | - | - -## Input Formats - -### RAS Scores File - -Tab-separated format with reactions as rows and samples as columns: +| Model Selector | `-ms` | ENGRO2, Recon, or Custom | ENGRO2 | +| RAS Input | `-ir` | RAS scores TSV file | - | +| RAS Selector | `-rs` | Enable RAS constraints | false | +| Medium Selector | `-mes` | Medium configuration | allOpen | +| Save Models | `--save_models` | Save complete models with bounds | false | +| Output Path | `-idop` | Output directory | ras_to_bounds/ | +| Custom Model | `-mo` | Path to custom SBML model | - | +| Custom Medium | `-meo` | Custom medium file | - | -``` -Reaction Sample1 Sample2 Sample3 Control1 Control2 -R00001 1.25 0.85 1.42 1.05 0.98 -R00002 0.65 1.35 0.72 1.15 1.08 -R00003 2.15 2.05 0.45 0.95 1.12 -``` - -### Custom Model File (Optional) +## Input Format -SBML format metabolic model: -- XML format (.xml, .sbml) -- Compressed formats supported (.xml.gz, .xml.zip, .xml.bz2) -- Must contain valid reaction, metabolite, and gene 
definitions - -### Custom Medium File (Optional) - -Exchange reactions defining growth medium: +RAS scores file (TSV): ``` -reaction -EX_glc__D_e -EX_o2_e -EX_pi_e -EX_nh4_e +Reaction Sample1 Sample2 Sample3 +R00001 1.25 0.85 1.42 +R00002 0.65 1.35 0.72 ``` -## Algorithm - -### Constraint Application - -1. **Base Model Loading**: Load specified metabolic model and medium -2. **Bounds Extraction**: Extract original flux bounds for each reaction -3. **RAS Integration**: For each sample and reaction: - ``` - if RAS > 1.0: - new_upper_bound = original_upper_bound * RAS - if RAS < 1.0: - new_lower_bound = original_lower_bound * RAS - ``` -4. **Bounds Output**: Generate sample-specific bounds files - -### Scaling Rules - -- **RAS > 1**: Upregulated reactions → increased flux capacity -- **RAS < 1**: Downregulated reactions → decreased flux capacity -- **RAS = 1**: No change from original bounds -- **Missing RAS**: Original bounds retained +**File Format Notes:** +- Use **tab-separated** values (TSV) +- First row must contain column headers (Reaction, Sample names) +- Reaction IDs must match model reaction IDs +- Numeric values for RAS scores ## Output Format -### Bounds Files - -One TSV file per sample with constrained bounds: +Bounds files for each sample: ``` -# bounds_Sample1.tsv -Reaction lower_bound upper_bound -R00001 -1000 1250.5 -R00002 -650.2 1000 -R00003 -1000 2150.8 +reaction lower_bound upper_bound +R00001 -125.0 125.0 +R00002 -65.0 65.0 ``` -### Directory Structure +## Output Collections + +The tool generates three types of output: -``` -ras_to_bounds/ -├── bounds_Sample1.tsv -├── bounds_Sample2.tsv -├── bounds_Sample3.tsv -├── bounds_Control1.tsv -├── bounds_Control2.tsv -└── constraints_log.txt -``` +1. **Bounds files** (`ras_to_bounds/`): Individual bound files per sample (TSV format) +2. **Cell classes** (`cell_class`): Sample-to-class mapping file +3. **Complete models** (optional, `saved_models/`): Full tabular models with bounds applied + +To save complete models with integrated bounds, set `--save_models true`. This creates ready-to-use model files that can be directly used with Flux Simulation or other downstream tools. 
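For reference, a generated bounds file can be re-applied to a model with COBRApy, which is essentially what the downstream tools do with this output. A sketch with placeholder file names (an SBML export of the model and one per-sample bounds file):

```python
import cobra
import pandas as pd

# Placeholder paths: adjust to your model export and bounds output.
model = cobra.io.read_sbml_model("ENGRO2.xml")
bounds = pd.read_csv("ras_to_bounds/Sample1.tsv", sep="\t", index_col=0)

# Overwrite each reaction's bounds with the sample-specific values.
for rxn_id, row in bounds.iterrows():
    try:
        rxn = model.reactions.get_by_id(rxn_id)
        rxn.lower_bound = row["lower_bound"]
        rxn.upper_bound = row["upper_bound"]
    except KeyError:
        print(f"Reaction {rxn_id} not in model, skipping")
```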
## Examples -### Basic Usage with Built-in Model - -```bash -# Apply RAS constraints to ENGRO2 model -ras_to_bounds -td /opt/COBRAxy \ - -ms ENGRO2 \ - -ir ras_data.tsv \ - -rs true \ - -idop results/bounds/ -``` - -### Custom Model and Medium +### Basic Usage ```bash -# Use custom model with specific medium -ras_to_bounds -td /opt/COBRAxy \ - -ms Custom \ - -mo models/custom_model.xml \ - -mn custom_model.xml \ - -mes custom \ - -meo media/minimal_medium.tsv \ - -ir patient_ras.tsv \ +ras_to_bounds -ms ENGRO2 \ + -ir ras_scores.tsv \ -rs true \ - -idop personalized_models/ \ - -ol constraints.log -``` - -### Multiple Sample Processing - -```bash -# Process cohort data with sample classes -ras_to_bounds -td /opt/COBRAxy \ - -ms ENGRO2 \ - -ir cohort_ras_scores.tsv \ - -rn "Patient1,Patient2,Patient3,Healthy1,Healthy2" \ - -rs true \ - -cc sample_classes.tsv \ - -idop cohort_bounds/ + -idop output/ ``` -## Built-in Models - -### ENGRO2 -- **Species**: Homo sapiens -- **Scope**: Genome-scale reconstruction -- **Reactions**: ~2,000 reactions -- **Metabolites**: ~1,500 metabolites -- **Use Case**: General human metabolism - -### Custom Model Requirements -- Valid SBML format -- Consistent reaction/metabolite naming -- Proper compartment definitions -- Gene-protein-reaction associations - -## Medium Configurations - -### allOpen (Default) -- All exchange reactions unconstrained -- Maximum metabolic flexibility -- Suitable for exploratory analysis - -### Custom Medium -- User-defined nutrient availability -- Tissue-specific conditions -- Disease-specific constraints - -## Quality Control - -### Pre-processing Checks -- Verify RAS data completeness (recommend >80% reaction coverage) -- Check for extreme RAS values (>10 or <0.1 may indicate issues) -- Validate model consistency and solvability - -### Post-processing Validation -- Confirm bounds files generated for all samples -- Check constraint log for warnings -- Test model feasibility with sample bounds - -## Tips and Best Practices - -### RAS Data Preparation -- **Normalization**: Ensure RAS values are properly normalized (median ~1.0) -- **Filtering**: Remove reactions with consistently missing data -- **Validation**: Check RAS distributions across samples - -### Model Selection -- Use ENGRO2 for general human tissue analysis -- Consider custom models for specific organisms or tissues -- Validate model scope matches your biological question - -### Medium Configuration -- Match medium to experimental conditions -- Use minimal medium for growth requirement analysis -- Consider tissue-specific nutrient availability - -## Integration Workflow - -### Upstream Tools -- [RAS Generator](ras-generator.md) - Generate RAS scores from expression data - -### Downstream Tools -- [Flux Simulation](flux-simulation.md) - Sample fluxes using constrained bounds -- [MAREA](marea.md) - Statistical analysis of constraint effects - -### Typical Pipeline +### With Custom Model ```bash -# 1. Generate RAS from expression data -ras_generator -td /opt/COBRAxy -in expression.tsv -ra ras.tsv - -# 2. Apply RAS constraints to model bounds -ras_to_bounds -td /opt/COBRAxy -ms ENGRO2 -ir ras.tsv -rs true -idop bounds/ - -# 3. Sample fluxes with constraints -flux_simulation -td /opt/COBRAxy -ms ENGRO2 -in bounds/*.tsv -a CBS -idop fluxes/ - -# 4. 
Analyze and visualize results -marea -td /opt/COBRAxy -input_data fluxes/mean.tsv -choice_map ENGRO2 -idop maps/ +ras_to_bounds -ms Custom \ + -mo custom_model.xml \ + -ir ras_scores.tsv \ + -rs true \ + -idop output/ ``` ## Troubleshooting -### Common Issues - -**No bounds files generated** -- Check RAS file format and sample names -- Verify model loading (check model path/format) -- Ensure sufficient disk space for output - -**Model infeasibility after constraints** -- RAS values may be too restrictive -- Consider scaling factor adjustment -- Check medium compatibility with constraints - -**Missing reactions in bounds** -- RAS data may not cover all model reactions -- Original bounds retained for missing reactions -- Consider reaction mapping validation - -### Error Messages - -| Error | Cause | Solution | -|-------|-------|----------| -| "Model not found" | Invalid model path | Check model file location | -| "RAS file invalid" | Malformed TSV format | Verify file structure and encoding | -| "Infeasible solution" | Over-constrained model | Relax RAS scaling or medium constraints | - -### Performance Issues - -**Slow processing** -- Large models may require significant memory -- Consider batch processing for many samples -- Monitor system resource usage - -**Memory errors** -- Reduce model size or split processing -- Increase available system memory -- Use more efficient file formats - -## Advanced Usage - -### Batch Processing Script - -```bash -#!/bin/bash -# Process multiple RAS files -for ras_file in ras_data/*.tsv; do - sample_name=$(basename "$ras_file" .tsv) - ras_to_bounds -td /opt/COBRAxy \ - -ms ENGRO2 \ - -ir "$ras_file" \ - -rs true \ - -idop "bounds_$sample_name/" -done -``` - -### Custom Scaling Functions - -For advanced users, RAS scaling can be customized by modifying the constraint application logic in the source code. +| Error | Solution | +|-------|----------| +| "Model not found" | Check model file path | +| "RAS file invalid" | Verify TSV format | +| "Infeasible solution" | Relax RAS scaling or constraints | ## See Also -- [RAS Generator](ras-generator.md) - Generate input RAS data -- [Flux Simulation](flux-simulation.md) - Use constrained bounds for sampling -- [Import Metabolic Model](import-metabolic-model.md) - Extract model components \ No newline at end of file +- [RAS Generator](tools/ras-generator) +- [Flux Simulation](tools/flux-simulation) +- [Built-in Models](reference/built-in-models)
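For the "Infeasible solution" row in the table above, a quick feasibility probe with COBRApy usually narrows the problem down. A sketch with a placeholder model path; apply one sample's bounds first, as in the snippet shown earlier:

```python
import cobra

model = cobra.io.read_sbml_model("ENGRO2.xml")  # placeholder path
# ... apply one sample's bounds here, as shown above ...

solution = model.optimize()
print(solution.status)  # 'optimal' means the constrained model is feasible

if solution.status != "optimal":
    # Constraints are likely too tight: inspect the exchange bounds,
    # then relax the RAS scaling or the medium configuration.
    for rxn in model.exchanges:
        print(rxn.id, rxn.bounds)
```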
--- a/COBRAxy/docs/tools/rps-generator.md	Mon Oct 27 12:33:08 2025 +0000 +++ b/COBRAxy/docs/tools/rps-generator.md	Tue Oct 28 10:44:07 2025 +0000 @@ -1,165 +1,91 @@ # RPS Generator -Generate Reaction Propensity Scores (RPS) from metabolite abundance data. +Compute Reaction Propensity Scores (RPS) from metabolite abundance data. ## Overview -The RPS Generator computes reaction propensity scores based on metabolite abundance measurements. RPS values indicate how likely metabolic reactions are to be active based on the availability of their substrate and product metabolites. +RPS Generator calculates reaction propensity scores based on metabolite availability in reaction formulas. + +## Galaxy Interface + +In Galaxy: **COBRAxy → RPS Generator** + +1. Select built-in model or upload custom reactions +2. Upload metabolite abundance data +3. Click **Execute** ## Usage -### Command Line - ```bash -rps_generator -td /path/to/COBRAxy \ -  -id metabolite_abundance.tsv \ -  -rp output_rps.tsv \ -  -ol log.txt +rps_generator -rs ENGRO2 \ +  -in metabolite_data.tsv \ +  -rps rps_scores.tsv \ +  -ol rps_generation.log ``` -### Galaxy Interface - -Select "RPS Generator" from the COBRAxy tool suite and upload your metabolite abundance file. - ## Parameters -### Required Parameters - -| Parameter | Flag | Description | -|-----------|------|-------------| -| Tool Directory | `-td, --tool_dir` | Path to COBRAxy installation directory | -| Input Dataset | `-id, --input` | Metabolite abundance TSV file (rows=metabolites, cols=samples) | -| RPS Output | `-rp, --rps_output` | Output file path for RPS scores | - -### Optional Parameters - | Parameter | Flag | Description | Default | |-----------|------|-------------|---------| -| Custom Reactions | `-rl, --model_upload` | Path to custom reactions file | Built-in reactions | -| Output Log | `-ol, --out_log` | Log file for warnings/errors | Standard output | +| Rules Selector | `-rs` | ENGRO2, Recon, or Custom | ENGRO2 | +| Input Data | `-in` | Metabolite abundance TSV file | - | +| Output RPS | `-rps` | Output RPS scores file | - | +| Output Log | `-ol` | Log file | - | +| Custom Rules | `-rl` | Custom reaction formulas file | - | ## Input Format -### Metabolite Abundance File - -Tab-separated values (TSV) format: +Metabolite data file (TSV): ``` Metabolite Sample1 Sample2 Sample3 -glucose 100.5 85.2 92.7 -pyruvate 45.3 38.9 41.2 -lactate 15.8 22.1 18.5 +glc_c 2.5 1.8 3.2 +atp_c 5.2 4.9 5.8 +pyr_c 1.5 2.1 1.8 ``` -**Requirements:** -- First column: metabolite names (case-insensitive) -- Subsequent columns: abundance values for each sample -- Missing values: use 0 or leave empty -- File encoding: UTF-8 - -### Custom Reactions File (Optional) - -If using custom reactions instead of built-in ones: - -``` -ReactionID Reaction -R00001 glucose + ATP -> glucose-6-phosphate + ADP -R00002 glucose-6-phosphate <-> fructose-6-phosphate -``` +**File Format Notes:** +- Use **tab-separated** values (TSV) +- First row must contain column headers (Metabolite, Sample names) +- Metabolite names must include compartment suffix (e.g., _c, _m, _e) +- Numeric values only for abundance data ## Output Format -### RPS Scores File - ``` Reaction Sample1 Sample2 Sample3 -R00001 0.85 0.72 0.79 -R00002 0.45 0.38 0.52 -R00003 0.12 0.28 0.21 +R00001 1.25 0.95 1.42 +R00002 0.85 1.15 0.92 ``` -- Values range from 0 (low propensity) to 1 (high propensity) -- NaN values indicate insufficient metabolite data for that reaction - -## Algorithm - -1.
**Metabolite Matching**: Input metabolite names are matched against internal synonyms -2. **Abundance Normalization**: Raw abundances are normalized per sample -3. **Reaction Scoring**: For each reaction, RPS is computed based on: - - Substrate availability (geometric mean of substrate abundances) - - Product formation potential - - Stoichiometric coefficients - ## Examples ### Basic Usage ```bash -# Generate RPS from metabolite data -rps_generator -td /opt/COBRAxy \ - -id /data/metabolomics.tsv \ - -rp /results/rps_scores.tsv -``` - -### With Custom Reactions - -```bash -# Use custom reaction set -rps_generator -td /opt/COBRAxy \ - -id /data/metabolomics.tsv \ - -rl /custom/reactions.tsv \ - -rp /results/custom_rps.tsv \ - -ol /logs/rps.log +rps_generator -rs ENGRO2 \ + -in metabolites.tsv \ + -rps rps_scores.tsv ``` -## Tips and Best Practices - -### Data Preparation - -- **Metabolite Names**: Use standard nomenclature (KEGG, ChEBI, or common names) -- **Missing Data**: Remove samples with >50% missing metabolites -- **Outliers**: Consider log-transformation for highly variable metabolites -- **Replicates**: Average technical replicates before analysis +### Custom Reactions -### Quality Control - -- Check log file for unmatched metabolite names -- Verify RPS score distributions (should span 0-1 range) -- Compare results with expected pathway activities - -### Integration with Other Tools - -RPS scores are typically used with: -- [MAREA](marea.md) for pathway enrichment analysis -- [Flux to Map](flux-to-map.md) for metabolic map visualization +```bash +rps_generator -rs Custom \ + -rl custom_reactions.csv \ + -in metabolites.tsv \ + -rps rps_scores.tsv +``` ## Troubleshooting -### Common Issues - -**No RPS scores generated** -- Check metabolite name format and spelling -- Verify input file has correct TSV format -- Ensure tool directory contains reaction databases - -**Many NaN values in output** -- Insufficient metabolite coverage for reactions -- Consider using a smaller, more focused reaction set - -**Memory errors** -- Reduce dataset size or split into batches -- Increase available system memory - -### Error Messages - -| Error | Cause | Solution | -|-------|--------|----------| -| "File not found" | Missing input file | Check file path and permissions | -| "Invalid format" | Malformed TSV | Verify column headers and data types | -| "No metabolites matched" | Name mismatch | Check metabolite nomenclature | +| Error | Solution | +|-------|----------| +| "Metabolite not found" | Check metabolite nomenclature | +| "Invalid formula" | Verify reaction formula syntax | ## See Also -- [RAS Generator](ras-generator.md) - Generate reaction activity scores from gene expression -- [MAREA](marea.md) - Statistical analysis and visualization -- [Flux Simulation](flux-simulation.md) - Constraint-based modeling \ No newline at end of file +- [MAREA](tools/marea) +- [RAS Generator](tools/ras-generator) +- [Built-in Models](reference/built-in-models)
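The score behind this tool weighs metabolite availability by reaction stoichiometry. As a rough illustration only, not COBRAxy's exact formula, a propensity-style score for one reaction can be sketched as a stoichiometry-weighted geometric mean of its substrate abundances:

```python
import math

# Toy illustration: abundances and stoichiometric coefficients for one reaction.
abundance = {"glc_c": 2.5, "atp_c": 5.2}   # metabolite -> measured abundance
substrates = {"glc_c": 1, "atp_c": 1}      # metabolite -> stoichiometric coeff.

log_sum, weight = 0.0, 0.0
for met, coeff in substrates.items():
    log_sum += coeff * math.log(abundance[met])
    weight += coeff

rps_like = math.exp(log_sum / weight)  # weighted geometric mean
print(round(rps_like, 3))  # ~3.606 for the values above
```

A geometric mean drops toward zero as soon as any required substrate is scarce, which is the intuition behind propensity-style scoring.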
--- a/COBRAxy/docs/troubleshooting.md Mon Oct 27 12:33:08 2025 +0000 +++ b/COBRAxy/docs/troubleshooting.md Tue Oct 28 10:44:07 2025 +0000 @@ -133,10 +133,9 @@ # Use absolute paths from pathlib import Path -tool_dir = str(Path('/path/to/COBRAxy').absolute()) input_file = str(Path('expression.tsv').absolute()) -args = ['-td', tool_dir, '-in', input_file, ...] +args = ['-in', input_file, ...] ``` **Problem**: Permission denied @@ -317,7 +316,7 @@ f.write(test_data) # Test basic functionality -ras_generator.main(['-td', tool_dir, '-in', 'test_input.tsv', +ras_generator.main(['-in', 'test_input.tsv', '-ra', 'test_output.tsv', '-rs', 'ENGRO2']) ``` @@ -362,12 +361,12 @@ Before reporting issues: -- ✅ Checked this troubleshooting guide -- ✅ Verified installation completeness -- ✅ Tested with built-in example data -- ✅ Searched existing GitHub issues -- ✅ Tried alternative models/parameters -- ✅ Checked file formats and permissions +- Checked this troubleshooting guide +- Verified installation completeness +- Tested with built-in example data +- Searched existing GitHub issues +- Tried alternative models/parameters +- Checked file formats and permissions ## Prevention Tips
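Building on the snippets above, a short smoke test is a good prevention habit: write a tiny expression table, resolve absolute paths, and run the generator end to end. The import line is an assumption about the installed package layout and may need adjusting:

```python
from pathlib import Path

import ras_generator  # assumption: adjust to your installed package layout

# Minimal two-gene, two-sample expression table (tab-separated).
Path("test_input.tsv").write_text(
    "Gene\tS1\tS2\nALDOA\t10.0\t12.0\nENO1\t5.0\t6.0\n"
)

input_file = str(Path("test_input.tsv").absolute())
output_file = str(Path("test_output.tsv").absolute())

# Run against the built-in ENGRO2 rules and peek at the result.
ras_generator.main(["-in", input_file, "-ra", output_file, "-rs", "ENGRO2"])
print(Path(output_file).read_text().splitlines()[:3])
```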
--- a/COBRAxy/docs/tutorials/README.md	Mon Oct 27 12:33:08 2025 +0000 +++ b/COBRAxy/docs/tutorials/README.md	Tue Oct 28 10:44:07 2025 +0000 @@ -4,23 +4,11 @@ ## Available Tutorials -| Tutorial | Duration | Description | -|----------|----------|-------------| -| [Galaxy Setup](tutorials/galaxy-setup) | 30 min | Set up Galaxy for web-based analysis | - -## Web Interface Tutorial - -### [Galaxy Setup Tutorial](tutorials/galaxy-setup) - -Set up a local Galaxy instance with COBRAxy tools for point-and-click analysis. Perfect for users who prefer graphical interfaces and reproducible workflows. - -## Prerequisites - -Before starting the tutorials, make sure you have: - -- ✅ [COBRAxy installed](installation) -- ✅ Basic understanding of metabolic modeling (helpful but not required) -- ✅ Familiarity with command line basics +| Tutorial | Description | +|----------|-------------| +| [Galaxy Setup](tutorials/galaxy-setup) | Set up Galaxy for web-based analysis | ## Tutorial Data @@ -28,22 +16,18 @@ ```bash # Download tutorial data -wget https://github.com/CompBtBs/COBRAxy/releases/download/v1.0/tutorial_data.zip +wget -O tutorial_data.zip https://raw.githubusercontent.com/CompBtBs/COBRAxy/main/data_tutorial/data_tutorial.zip unzip tutorial_data.zip ``` -The tutorial data includes: -- Sample gene expression datasets -- Metabolite abundance data -- Pre-configured Galaxy workflows -- Expected output files for verification +The tutorial data includes sample gene expression datasets (Cancer.txt and Normal.txt). ## Getting Help If you encounter issues during tutorials: 1. Check the specific tutorial's troubleshooting section -2. Refer to the main [Troubleshooting Guide](/troubleshooting.md) +2. Refer to the main [Troubleshooting Guide](troubleshooting) ## Contributing @@ -53,4 +37,4 @@ - [Report issues](https://github.com/CompBtBs/COBRAxy/issues) - Suggest new tutorial topics -Ready to start? Pick your first tutorial above! 🚀 \ No newline at end of file +Ready to start? Pick your first tutorial above! 🚀
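Once unzipped, a quick sanity check confirms the datasets loaded as expected. This assumes the tutorial files are tab-separated expression tables, as used throughout the tool docs:

```python
import pandas as pd

# Cancer.txt and Normal.txt ship with the tutorial data archive.
for name in ("Cancer.txt", "Normal.txt"):
    df = pd.read_csv(name, sep="\t", index_col=0)
    print(name, df.shape)  # rows = genes, columns = samples
```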
--- a/COBRAxy/src/flux_simulation.py Mon Oct 27 12:33:08 2025 +0000 +++ b/COBRAxy/src/flux_simulation.py Tue Oct 28 10:44:07 2025 +0000 @@ -1,646 +1,646 @@ -""" -Flux sampling and analysis utilities for COBRA models. - -This script supports two modes: -- Mode 1 (model_and_bounds=True): load a base model and apply bounds from - separate files before sampling. -- Mode 2 (model_and_bounds=False): load complete models and sample directly. - -Sampling algorithms supported: OPTGP and CBS. Outputs include flux samples -and optional analyses (pFBA, FVA, sensitivity), saved as tabular files. -""" - -import argparse -from typing import List -import os -import pandas as pd -import numpy as np -import cobra -from joblib import Parallel, delayed, cpu_count -from cobra.sampling import OptGPSampler -import sys - -try: - from .utils import general_utils as utils - from .utils import CBS_backend - from .utils import model_utils -except: - import utils.general_utils as utils - import utils.CBS_backend as CBS_backend - import utils.model_utils as model_utils - - -################################# process args ############################### -def process_args(args: List[str] = None) -> argparse.Namespace: - """ - Processes command-line arguments. - - Args: - args (list): List of command-line arguments. - - Returns: - Namespace: An object containing parsed arguments. - """ - parser = argparse.ArgumentParser(usage='%(prog)s [options]', - description='process some value\'s') - - parser.add_argument("-mo", "--model_upload", type=str, - help="path to input file with custom rules, if provided") - - parser.add_argument("-mab", "--model_and_bounds", type=str, - choices=['True', 'False'], - required=True, - help="upload mode: True for model+bounds, False for complete models") - - parser.add_argument("-ens", "--sampling_enabled", type=str, - choices=['true', 'false'], - required=True, - help="enable sampling: 'true' for sampling, 'false' for no sampling") - - parser.add_argument('-ol', '--out_log', - help="Output log") - - parser.add_argument('-td', '--tool_dir', - type=str, - default=os.path.dirname(os.path.abspath(__file__)), - help='your tool directory (default: auto-detected package location)') - - parser.add_argument('-inf', '--input_file', - required=True, - type=str, - help='path to file containing list of input bounds files or complete model files (one per line)') - - parser.add_argument('-nif', '--name_file', - required=True, - type=str, - help='path to file containing list of cell names (one per line)') - - parser.add_argument('-a', '--algorithm', - type=str, - choices=['OPTGP', 'CBS'], - required=True, - help='choose sampling algorithm') - - parser.add_argument('-th', '--thinning', - type=int, - default=100, - required=True, - help='choose thinning') - - parser.add_argument('-ns', '--n_samples', - type=int, - required=True, - help='choose how many samples (set to 0 for optimization only)') - - parser.add_argument('-sd', '--seed', - type=int, - required=True, - help='seed for random number generation') - - parser.add_argument('-nb', '--n_batches', - type=int, - required=True, - help='choose how many batches') - - parser.add_argument('-opt', '--perc_opt', - type=float, - default=0.9, - required=False, - help='choose the fraction of optimality for FVA (0-1)') - - parser.add_argument('-ot', '--output_type', - type=str, - required=True, - help='output type for sampling results') - - parser.add_argument('-ota', '--output_type_analysis', - type=str, - required=False, - help='output type analysis (optimization 
methods)') - - parser.add_argument('-idop', '--output_path', - type=str, - default='flux_simulation/', - help = 'output path for fluxes') - - parser.add_argument('-otm', '--out_mean', - type = str, - required=False, - help = 'output of mean of fluxes') - - parser.add_argument('-otmd', '--out_median', - type = str, - required=False, - help = 'output of median of fluxes') - - parser.add_argument('-otq', '--out_quantiles', - type = str, - required=False, - help = 'output of quantiles of fluxes') - - parser.add_argument('-otfva', '--out_fva', - type = str, - required=False, - help = 'output of FVA results') - parser.add_argument('-otp', '--out_pfba', - type = str, - required=False, - help = 'output of pFBA results') - parser.add_argument('-ots', '--out_sensitivity', - type = str, - required=False, - help = 'output of sensitivity results') - ARGS = parser.parse_args(args) - return ARGS -########################### warning ########################################### -def warning(s :str) -> None: - """ - Log a warning message to an output log file and print it to the console. - - Args: - s (str): The warning message to be logged and printed. - - Returns: - None - """ - with open(ARGS.out_log, 'a') as log: - log.write(s + "\n\n") - print(s) - - -def write_to_file(dataset: pd.DataFrame, path: str, keep_index:bool=False, name:str=None)->None: - """ - Write a DataFrame to a TSV file under path with a given base name. - - Args: - dataset: The DataFrame to write. - name: Base file name (without extension). If None, 'path' is treated as the full file path. - path: Directory path where the file will be saved. - keep_index: Whether to keep the DataFrame index in the file. - - Returns: - None - """ - dataset.index.name = 'Reactions' - if name: - dataset.to_csv(os.path.join(path, name + ".csv"), sep = '\t', index = keep_index) - else: - dataset.to_csv(path, sep = '\t', index = keep_index) - -############################ dataset input #################################### -def read_dataset(data :str, name :str) -> pd.DataFrame: - """ - Read a dataset from a CSV file and return it as a pandas DataFrame. - - Args: - data (str): Path to the CSV file containing the dataset. - name (str): Name of the dataset, used in error messages. - - Returns: - pandas.DataFrame: DataFrame containing the dataset. - - Raises: - pd.errors.EmptyDataError: If the CSV file is empty. - sys.exit: If the CSV file has the wrong format, the execution is aborted. - """ - try: - dataset = pd.read_csv(data, sep = '\t', header = 0, index_col=0, engine='python') - except pd.errors.EmptyDataError: - sys.exit('Execution aborted: wrong format of ' + name + '\n') - if len(dataset.columns) < 2: - sys.exit('Execution aborted: wrong format of ' + name + '\n') - return dataset - - - -def OPTGP_sampler(model: cobra.Model, model_name: str, n_samples: int = 1000, thinning: int = 100, n_batches: int = 1, seed: int = 0) -> None: - """ - Samples from the OPTGP (Optimal Global Perturbation) algorithm and saves the results to CSV files. - - Args: - model (cobra.Model): The COBRA model to sample from. - model_name (str): The name of the model, used in naming output files. - n_samples (int, optional): Number of samples per batch. Default is 1000. - thinning (int, optional): Thinning parameter for the sampler. Default is 100. - n_batches (int, optional): Number of batches to run. Default is 1. - seed (int, optional): Random seed for reproducibility. Default is 0. 
- - Returns: - None - """ - import numpy as np - - # Get reaction IDs for consistent column ordering - reaction_ids = [rxn.id for rxn in model.reactions] - - # Sample and save each batch as numpy file - for i in range(n_batches): - optgp = OptGPSampler(model, thinning, seed) - samples = optgp.sample(n_samples) - - # Save as numpy array (more memory efficient) - batch_filename = f"{ARGS.output_path}/{model_name}_{i}_OPTGP.npy" - np.save(batch_filename, samples.to_numpy()) - - seed += 1 - - # Merge all batches into a single DataFrame - all_samples = [] - - for i in range(n_batches): - batch_filename = f"{ARGS.output_path}/{model_name}_{i}_OPTGP.npy" - batch_data = np.load(batch_filename, allow_pickle=True) - all_samples.append(batch_data) - - # Concatenate all batches - samplesTotal_array = np.vstack(all_samples) - - # Convert back to DataFrame with proper column names - samplesTotal = pd.DataFrame(samplesTotal_array, columns=reaction_ids) - - # Save the final merged result as CSV - write_to_file(samplesTotal.T, ARGS.output_path, True, name=model_name) - - # Clean up temporary numpy files - for i in range(n_batches): - batch_filename = f"{ARGS.output_path}/{model_name}_{i}_OPTGP.npy" - if os.path.exists(batch_filename): - os.remove(batch_filename) - - -def CBS_sampler(model: cobra.Model, model_name: str, n_samples: int = 1000, n_batches: int = 1, seed: int = 0) -> None: - """ - Samples using the CBS (Constraint-based Sampling) algorithm and saves the results to CSV files. - - Args: - model (cobra.Model): The COBRA model to sample from. - model_name (str): The name of the model, used in naming output files. - n_samples (int, optional): Number of samples per batch. Default is 1000. - n_batches (int, optional): Number of batches to run. Default is 1. - seed (int, optional): Random seed for reproducibility. Default is 0. - - Returns: - None - """ - import numpy as np - - # Get reaction IDs for consistent column ordering - reaction_ids = [reaction.id for reaction in model.reactions] - - # Perform FVA analysis once for all batches - df_FVA = cobra.flux_analysis.flux_variability_analysis(model, fraction_of_optimum=0).round(6) - - # Generate random objective functions for all samples across all batches - df_coefficients = CBS_backend.randomObjectiveFunction(model, n_samples * n_batches, df_FVA, seed=seed) - - # Sample and save each batch as numpy file - for i in range(n_batches): - samples = pd.DataFrame(columns=reaction_ids, index=range(n_samples)) - - try: - CBS_backend.randomObjectiveFunctionSampling( - model, - n_samples, - df_coefficients.iloc[:, i * n_samples:(i + 1) * n_samples], - samples - ) - except Exception as e: - utils.logWarning( - f"Warning: GLPK solver has failed for {model_name}. Trying with COBRA interface. 
Error: {str(e)}", - ARGS.out_log - ) - CBS_backend.randomObjectiveFunctionSampling_cobrapy( - model, - n_samples, - df_coefficients.iloc[:, i * n_samples:(i + 1) * n_samples], - samples - ) - - # Save as numpy array (more memory efficient) - batch_filename = f"{ARGS.output_path}/{model_name}_{i}_CBS.npy" - utils.logWarning(batch_filename, ARGS.out_log) - np.save(batch_filename, samples.to_numpy()) - - # Merge all batches into a single DataFrame - all_samples = [] - - for i in range(n_batches): - batch_filename = f"{ARGS.output_path}/{model_name}_{i}_CBS.npy" - batch_data = np.load(batch_filename, allow_pickle=True) - all_samples.append(batch_data) - - # Concatenate all batches - samplesTotal_array = np.vstack(all_samples) - - # Convert back to DataFrame with proper column namesq - samplesTotal = pd.DataFrame(samplesTotal_array, columns=reaction_ids) - - # Save the final merged result as CSV - write_to_file(samplesTotal.T, ARGS.output_path, True, name=model_name) - - # Clean up temporary numpy files - for i in range(n_batches): - batch_filename = f"{ARGS.output_path}/{model_name}_{i}_CBS.npy" - if os.path.exists(batch_filename): - os.remove(batch_filename) - - - -def model_sampler_with_bounds(model_input_original: cobra.Model, bounds_path: str, cell_name: str) -> List[pd.DataFrame]: - """ - MODE 1: Prepares the model with bounds from separate bounds file and performs sampling. - - Args: - model_input_original (cobra.Model): The original COBRA model. - bounds_path (str): Path to the CSV file containing the bounds dataset. - cell_name (str): Name of the cell, used to generate filenames for output. - - Returns: - List[pd.DataFrame]: A list of DataFrames containing statistics and analysis results. - """ - - model_input = model_input_original.copy() - bounds_df = read_dataset(bounds_path, "bounds dataset") - - # Apply bounds to model - for rxn_index, row in bounds_df.iterrows(): - try: - model_input.reactions.get_by_id(rxn_index).lower_bound = row.lower_bound - model_input.reactions.get_by_id(rxn_index).upper_bound = row.upper_bound - except KeyError: - warning(f"Warning: Reaction {rxn_index} not found in model. Skipping.") - - return perform_sampling_and_analysis(model_input, cell_name) - - -def perform_sampling_and_analysis(model_input: cobra.Model, cell_name: str) -> List[pd.DataFrame]: - """ - Common function to perform sampling and analysis on a prepared model. - - Args: - model_input (cobra.Model): The prepared COBRA model with bounds applied. - cell_name (str): Name of the cell, used to generate filenames for output. - - Returns: - List[pd.DataFrame]: A list of DataFrames containing statistics and analysis results. 
- """ - - returnList = [] - - if ARGS.sampling_enabled == "true": - - if ARGS.algorithm == 'OPTGP': - OPTGP_sampler(model_input, cell_name, ARGS.n_samples, ARGS.thinning, ARGS.n_batches, ARGS.seed) - elif ARGS.algorithm == 'CBS': - CBS_sampler(model_input, cell_name, ARGS.n_samples, ARGS.n_batches, ARGS.seed) - - df_mean, df_median, df_quantiles = fluxes_statistics(cell_name, ARGS.output_types) - - if("fluxes" not in ARGS.output_types): - os.remove(ARGS.output_path + "/" + cell_name + '.csv') - - returnList = [df_mean, df_median, df_quantiles] - - df_pFBA, df_FVA, df_sensitivity = fluxes_analysis(model_input, cell_name, ARGS.output_type_analysis) - - if("pFBA" in ARGS.output_type_analysis): - returnList.append(df_pFBA) - if("FVA" in ARGS.output_type_analysis): - returnList.append(df_FVA) - if("sensitivity" in ARGS.output_type_analysis): - returnList.append(df_sensitivity) - - return returnList - -def fluxes_statistics(model_name: str, output_types:List)-> List[pd.DataFrame]: - """ - Computes statistics (mean, median, quantiles) for the fluxes. - - Args: - model_name (str): Name of the model, used in filename for input. - output_types (List[str]): Types of statistics to compute (mean, median, quantiles). - - Returns: - List[pd.DataFrame]: List of DataFrames containing mean, median, and quantiles statistics. - """ - - df_mean = pd.DataFrame() - df_median= pd.DataFrame() - df_quantiles= pd.DataFrame() - - df_samples = pd.read_csv(ARGS.output_path + "/" + model_name + '.csv', sep = '\t', index_col = 0).T - df_samples = df_samples.round(8) - - for output_type in output_types: - if(output_type == "mean"): - df_mean = df_samples.mean() - df_mean = df_mean.to_frame().T - df_mean = df_mean.reset_index(drop=True) - df_mean.index = [model_name] - elif(output_type == "median"): - df_median = df_samples.median() - df_median = df_median.to_frame().T - df_median = df_median.reset_index(drop=True) - df_median.index = [model_name] - elif(output_type == "quantiles"): - newRow = [] - cols = [] - for rxn in df_samples.columns: - quantiles = df_samples[rxn].quantile([0.25, 0.50, 0.75]) - newRow.append(quantiles[0.25]) - cols.append(rxn + "_q1") - newRow.append(quantiles[0.5]) - cols.append(rxn + "_q2") - newRow.append(quantiles[0.75]) - cols.append(rxn + "_q3") - df_quantiles = pd.DataFrame(columns=cols) - df_quantiles.loc[0] = newRow - df_quantiles = df_quantiles.reset_index(drop=True) - df_quantiles.index = [model_name] - - return df_mean, df_median, df_quantiles - -def fluxes_analysis(model:cobra.Model, model_name:str, output_types:List)-> List[pd.DataFrame]: - """ - Performs flux analysis including pFBA, FVA, and sensitivity analysis. The objective function - is assumed to be already set in the model. - - Args: - model (cobra.Model): The COBRA model to analyze. - model_name (str): Name of the model, used in filenames for output. - output_types (List[str]): Types of analysis to perform (pFBA, FVA, sensitivity). - - Returns: - List[pd.DataFrame]: List of DataFrames containing pFBA, FVA, and sensitivity analysis results. 
- """ - - df_pFBA = pd.DataFrame() - df_FVA= pd.DataFrame() - df_sensitivity= pd.DataFrame() - - for output_type in output_types: - if(output_type == "pFBA"): - solution = cobra.flux_analysis.pfba(model) - fluxes = solution.fluxes - df_pFBA.loc[0,[rxn.id for rxn in model.reactions]] = fluxes.tolist() - df_pFBA = df_pFBA.reset_index(drop=True) - df_pFBA.index = [model_name] - df_pFBA = df_pFBA.astype(float).round(6) - elif(output_type == "FVA"): - fva = cobra.flux_analysis.flux_variability_analysis(model, fraction_of_optimum=ARGS.perc_opt, processes=1).round(8) - columns = [] - for rxn in fva.index.to_list(): - columns.append(rxn + "_min") - columns.append(rxn + "_max") - df_FVA= pd.DataFrame(columns = columns) - for index_rxn, row in fva.iterrows(): - df_FVA.loc[0, index_rxn+ "_min"] = fva.loc[index_rxn, "minimum"] - df_FVA.loc[0, index_rxn+ "_max"] = fva.loc[index_rxn, "maximum"] - df_FVA = df_FVA.reset_index(drop=True) - df_FVA.index = [model_name] - df_FVA = df_FVA.astype(float).round(6) - elif(output_type == "sensitivity"): - solution_original = model.optimize().objective_value - reactions = model.reactions - single = cobra.flux_analysis.single_reaction_deletion(model) - newRow = [] - df_sensitivity = pd.DataFrame(columns = [rxn.id for rxn in reactions], index = [model_name]) - for rxn in reactions: - newRow.append(single.knockout[rxn.id].growth.values[0]/solution_original) - df_sensitivity.loc[model_name] = newRow - df_sensitivity = df_sensitivity.astype(float).round(6) - return df_pFBA, df_FVA, df_sensitivity - -############################# main ########################################### -def main(args: List[str] = None) -> None: - """ - Initialize and run sampling/analysis based on the frontend input arguments. - - Returns: - None - """ - - num_processors = max(1, cpu_count() - 1) - - global ARGS - ARGS = process_args(args) - - if not os.path.exists('flux_simulation'): - os.makedirs('flux_simulation') - - # --- Read input files and names from the provided file paths --- - with open(ARGS.input_file, 'r') as f: - ARGS.input_files = [line.strip() for line in f if line.strip()] - - with open(ARGS.name_file, 'r') as f: - ARGS.file_names = [line.strip() for line in f if line.strip()] - - # output types (required) -> list - ARGS.output_types = ARGS.output_type.split(",") if ARGS.output_type else [] - # optional analysis output types -> list or empty - ARGS.output_type_analysis = ARGS.output_type_analysis.split(",") if ARGS.output_type_analysis else [] - - # Determine if sampling should be performed - if ARGS.sampling_enabled == "true": - perform_sampling = True - else: - perform_sampling = False - - print("=== INPUT FILES ===") - print(f"{ARGS.input_files}") - print(f"{ARGS.file_names}") - print(f"{ARGS.output_type}") - print(f"{ARGS.output_types}") - print(f"{ARGS.output_type_analysis}") - print(f"Sampling enabled: {perform_sampling} (n_samples: {ARGS.n_samples})") - - if ARGS.model_and_bounds == "True": - # MODE 1: Model + bounds (separate files) - print("=== MODE 1: Model + Bounds (separate files) ===") - - # Load base model - if not ARGS.model_upload: - sys.exit("Error: model_upload is required for Mode 1") - - base_model = model_utils.build_cobra_model_from_csv(ARGS.model_upload) - - validation = model_utils.validate_model(base_model) - - print("\n=== MODEL VALIDATION ===") - for key, value in validation.items(): - print(f"{key}: {value}") - - # Set solver verbosity to 1 to see warning and error messages only. 
- base_model.solver.configuration.verbosity = 1 - - # Process each bounds file with the base model - results = Parallel(n_jobs=num_processors)( - delayed(model_sampler_with_bounds)(base_model, bounds_file, cell_name) - for bounds_file, cell_name in zip(ARGS.input_files, ARGS.file_names) - ) - - else: - # MODE 2: Multiple complete models - print("=== MODE 2: Multiple complete models ===") - - # Process each complete model file - results = Parallel(n_jobs=num_processors)( - delayed(perform_sampling_and_analysis)(model_utils.build_cobra_model_from_csv(model_file), cell_name) - for model_file, cell_name in zip(ARGS.input_files, ARGS.file_names) - ) - - # Handle sampling outputs (only if sampling was performed) - if perform_sampling: - print("=== PROCESSING SAMPLING RESULTS ===") - - all_mean = pd.concat([result[0] for result in results], ignore_index=False) - all_median = pd.concat([result[1] for result in results], ignore_index=False) - all_quantiles = pd.concat([result[2] for result in results], ignore_index=False) - - if "mean" in ARGS.output_types: - all_mean = all_mean.fillna(0.0) - all_mean = all_mean.sort_index() - write_to_file(all_mean.T, ARGS.out_mean, True) - - if "median" in ARGS.output_types: - all_median = all_median.fillna(0.0) - all_median = all_median.sort_index() - write_to_file(all_median.T, ARGS.out_median, True) - - if "quantiles" in ARGS.output_types: - all_quantiles = all_quantiles.fillna(0.0) - all_quantiles = all_quantiles.sort_index() - write_to_file(all_quantiles.T, ARGS.out_quantiles, True) - else: - print("=== SAMPLING SKIPPED (n_samples = 0 or sampling disabled) ===") - - # Handle optimization analysis outputs (always available) - print("=== PROCESSING OPTIMIZATION RESULTS ===") - - # Determine the starting index for optimization results - # If sampling was performed, optimization results start at index 3 - # If no sampling, optimization results start at index 0 - index_result = 3 if perform_sampling else 0 - - if "pFBA" in ARGS.output_type_analysis: - all_pFBA = pd.concat([result[index_result] for result in results], ignore_index=False) - all_pFBA = all_pFBA.sort_index() - write_to_file(all_pFBA.T, ARGS.out_pfba, True) - index_result += 1 - - if "FVA" in ARGS.output_type_analysis: - all_FVA = pd.concat([result[index_result] for result in results], ignore_index=False) - all_FVA = all_FVA.sort_index() - write_to_file(all_FVA.T, ARGS.out_fva, True) - index_result += 1 - - if "sensitivity" in ARGS.output_type_analysis: - all_sensitivity = pd.concat([result[index_result] for result in results], ignore_index=False) - all_sensitivity = all_sensitivity.sort_index() - write_to_file(all_sensitivity.T, ARGS.out_sensitivity, True) - - return - -############################################################################## -if __name__ == "__main__": +""" +Flux sampling and analysis utilities for COBRA models. + +This script supports two modes: +- Mode 1 (model_and_bounds=True): load a base model and apply bounds from + separate files before sampling. +- Mode 2 (model_and_bounds=False): load complete models and sample directly. + +Sampling algorithms supported: OPTGP and CBS. Outputs include flux samples +and optional analyses (pFBA, FVA, sensitivity), saved as tabular files. 
+""" + +import argparse +from typing import List +import os +import pandas as pd +import numpy as np +import cobra +from joblib import Parallel, delayed, cpu_count +from cobra.sampling import OptGPSampler +import sys + +try: + from .utils import general_utils as utils + from .utils import CBS_backend + from .utils import model_utils +except: + import utils.general_utils as utils + import utils.CBS_backend as CBS_backend + import utils.model_utils as model_utils + + +################################# process args ############################### +def process_args(args: List[str] = None) -> argparse.Namespace: + """ + Processes command-line arguments. + + Args: + args (list): List of command-line arguments. + + Returns: + Namespace: An object containing parsed arguments. + """ + parser = argparse.ArgumentParser(usage='%(prog)s [options]', + description='process some value\'s') + + parser.add_argument("-mo", "--model_upload", type=str, + help="path to input file with custom rules, if provided") + + parser.add_argument("-mab", "--model_and_bounds", type=str, + choices=['True', 'False'], + required=True, + help="upload mode: True for model+bounds, False for complete models") + + parser.add_argument("-ens", "--sampling_enabled", type=str, + choices=['true', 'false'], + required=True, + help="enable sampling: 'true' for sampling, 'false' for no sampling") + + parser.add_argument('-ol', '--out_log', + help="Output log") + + parser.add_argument('-td', '--tool_dir', + type=str, + default=os.path.dirname(os.path.abspath(__file__)), + help='your tool directory (default: auto-detected package location)') + + parser.add_argument('-inf', '--input_file', + required=True, + type=str, + help='path to file containing list of input bounds files or complete model files (one per line)') + + parser.add_argument('-nif', '--name_file', + required=True, + type=str, + help='path to file containing list of cell names (one per line)') + + parser.add_argument('-a', '--algorithm', + type=str, + choices=['OPTGP', 'CBS'], + required=True, + help='choose sampling algorithm') + + parser.add_argument('-th', '--thinning', + type=int, + default=100, + required=True, + help='choose thinning') + + parser.add_argument('-ns', '--n_samples', + type=int, + required=True, + help='choose how many samples (set to 0 for optimization only)') + + parser.add_argument('-sd', '--seed', + type=int, + required=True, + help='seed for random number generation') + + parser.add_argument('-nb', '--n_batches', + type=int, + required=True, + help='choose how many batches') + + parser.add_argument('-opt', '--perc_opt', + type=float, + default=0.9, + required=False, + help='choose the fraction of optimality for FVA (0-1)') + + parser.add_argument('-ot', '--output_type', + type=str, + required=True, + help='output type for sampling results') + + parser.add_argument('-ota', '--output_type_analysis', + type=str, + required=False, + help='output type analysis (optimization methods)') + + parser.add_argument('-idop', '--output_path', + type=str, + default='flux_simulation/', + help = 'output path for fluxes') + + parser.add_argument('-otm', '--out_mean', + type = str, + required=False, + help = 'output of mean of fluxes') + + parser.add_argument('-otmd', '--out_median', + type = str, + required=False, + help = 'output of median of fluxes') + + parser.add_argument('-otq', '--out_quantiles', + type = str, + required=False, + help = 'output of quantiles of fluxes') + + parser.add_argument('-otfva', '--out_fva', + type = str, + required=False, + help = 'output 
of FVA results') + parser.add_argument('-otp', '--out_pfba', + type = str, + required=False, + help = 'output of pFBA results') + parser.add_argument('-ots', '--out_sensitivity', + type = str, + required=False, + help = 'output of sensitivity results') + ARGS = parser.parse_args(args) + return ARGS +########################### warning ########################################### +def warning(s :str) -> None: + """ + Log a warning message to an output log file and print it to the console. + + Args: + s (str): The warning message to be logged and printed. + + Returns: + None + """ + with open(ARGS.out_log, 'a') as log: + log.write(s + "\n\n") + print(s) + + +def write_to_file(dataset: pd.DataFrame, path: str, keep_index:bool=False, name:str=None)->None: + """ + Write a DataFrame to a TSV file under path with a given base name. + + Args: + dataset: The DataFrame to write. + name: Base file name (without extension). If None, 'path' is treated as the full file path. + path: Directory path where the file will be saved. + keep_index: Whether to keep the DataFrame index in the file. + + Returns: + None + """ + dataset.index.name = 'Reactions' + if name: + dataset.to_csv(os.path.join(path, name + ".csv"), sep = '\t', index = keep_index) + else: + dataset.to_csv(path, sep = '\t', index = keep_index) + +############################ dataset input #################################### +def read_dataset(data :str, name :str) -> pd.DataFrame: + """ + Read a dataset from a CSV file and return it as a pandas DataFrame. + + Args: + data (str): Path to the CSV file containing the dataset. + name (str): Name of the dataset, used in error messages. + + Returns: + pandas.DataFrame: DataFrame containing the dataset. + + Raises: + pd.errors.EmptyDataError: If the CSV file is empty. + sys.exit: If the CSV file has the wrong format, the execution is aborted. + """ + try: + dataset = pd.read_csv(data, sep = '\t', header = 0, index_col=0, engine='python') + except pd.errors.EmptyDataError: + sys.exit('Execution aborted: wrong format of ' + name + '\n') + if len(dataset.columns) < 2: + sys.exit('Execution aborted: wrong format of ' + name + '\n') + return dataset + + + +def OPTGP_sampler(model: cobra.Model, model_name: str, n_samples: int = 1000, thinning: int = 100, n_batches: int = 1, seed: int = 0) -> None: + """ + Samples from the OPTGP (Optimal Global Perturbation) algorithm and saves the results to CSV files. + + Args: + model (cobra.Model): The COBRA model to sample from. + model_name (str): The name of the model, used in naming output files. + n_samples (int, optional): Number of samples per batch. Default is 1000. + thinning (int, optional): Thinning parameter for the sampler. Default is 100. + n_batches (int, optional): Number of batches to run. Default is 1. + seed (int, optional): Random seed for reproducibility. Default is 0. 
+def OPTGP_sampler(model: cobra.Model, model_name: str, n_samples: int = 1000, thinning: int = 100, n_batches: int = 1, seed: int = 0) -> None:
+    """
+    Samples the model with the OptGP algorithm (COBRApy's parallel, ACHR-based sampler)
+    and saves the merged results to a tab-separated file.
+
+    Args:
+        model (cobra.Model): The COBRA model to sample from.
+        model_name (str): The name of the model, used in naming output files.
+        n_samples (int, optional): Number of samples per batch. Default is 1000.
+        thinning (int, optional): Thinning interval: keep one sample every `thinning` steps. Default is 100.
+        n_batches (int, optional): Number of batches to run. Default is 1.
+        seed (int, optional): Random seed for reproducibility. Default is 0.
+
+    Returns:
+        None
+    """
+    # Get reaction IDs for consistent column ordering
+    reaction_ids = [rxn.id for rxn in model.reactions]
+
+    # Sample and save each batch as a numpy file
+    for i in range(n_batches):
+        # Keyword arguments matter here: OptGPSampler's second positional
+        # parameter is the number of processes, not the thinning interval.
+        optgp = OptGPSampler(model, thinning=thinning, seed=seed)
+        samples = optgp.sample(n_samples)
+
+        # Save as a numpy array (more memory efficient than CSV)
+        batch_filename = f"{ARGS.output_path}/{model_name}_{i}_OPTGP.npy"
+        np.save(batch_filename, samples.to_numpy())
+
+        # A fresh seed per batch keeps the chains independent
+        seed += 1
+
+    # Merge all batches into a single DataFrame
+    all_samples = []
+    for i in range(n_batches):
+        batch_filename = f"{ARGS.output_path}/{model_name}_{i}_OPTGP.npy"
+        batch_data = np.load(batch_filename, allow_pickle=True)
+        all_samples.append(batch_data)
+
+    # Concatenate all batches
+    samplesTotal_array = np.vstack(all_samples)
+
+    # Convert back to a DataFrame with proper column names
+    samplesTotal = pd.DataFrame(samplesTotal_array, columns=reaction_ids)
+
+    # Save the final merged result as CSV
+    write_to_file(samplesTotal.T, ARGS.output_path, True, name=model_name)
+
+    # Clean up temporary numpy files
+    for i in range(n_batches):
+        batch_filename = f"{ARGS.output_path}/{model_name}_{i}_OPTGP.npy"
+        if os.path.exists(batch_filename):
+            os.remove(batch_filename)
+
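+# Note on batching, as implemented above: each batch restarts OptGP with seed + i,
+# so e.g. n_samples=1000 and n_batches=4 yield 4000 samples drawn from four
+# independent chains rather than one long chain.
+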
+def CBS_sampler(model: cobra.Model, model_name: str, n_samples: int = 1000, n_batches: int = 1, seed: int = 0) -> None:
+    """
+    Samples using the CBS algorithm, which optimizes randomly generated objective
+    functions, and saves the merged results to a tab-separated file.
+
+    Args:
+        model (cobra.Model): The COBRA model to sample from.
+        model_name (str): The name of the model, used in naming output files.
+        n_samples (int, optional): Number of samples per batch. Default is 1000.
+        n_batches (int, optional): Number of batches to run. Default is 1.
+        seed (int, optional): Random seed for reproducibility. Default is 0.
+
+    Returns:
+        None
+    """
+    # Get reaction IDs for consistent column ordering
+    reaction_ids = [reaction.id for reaction in model.reactions]
+
+    # Perform FVA analysis once for all batches
+    df_FVA = cobra.flux_analysis.flux_variability_analysis(model, fraction_of_optimum=0).round(6)
+
+    # Generate random objective functions for all samples across all batches
+    df_coefficients = CBS_backend.randomObjectiveFunction(model, n_samples * n_batches, df_FVA, seed=seed)
+
+    # Sample and save each batch as a numpy file
+    for i in range(n_batches):
+        samples = pd.DataFrame(columns=reaction_ids, index=range(n_samples))
+
+        try:
+            CBS_backend.randomObjectiveFunctionSampling(
+                model,
+                n_samples,
+                df_coefficients.iloc[:, i * n_samples:(i + 1) * n_samples],
+                samples
+            )
+        except Exception as e:
+            utils.logWarning(
+                f"Warning: GLPK solver has failed for {model_name}. Trying with COBRA interface. Error: {str(e)}",
+                ARGS.out_log
+            )
+            CBS_backend.randomObjectiveFunctionSampling_cobrapy(
+                model,
+                n_samples,
+                df_coefficients.iloc[:, i * n_samples:(i + 1) * n_samples],
+                samples
+            )
+
+        # Save as a numpy array (more memory efficient than CSV)
+        batch_filename = f"{ARGS.output_path}/{model_name}_{i}_CBS.npy"
+        utils.logWarning(f"Saved CBS batch to {batch_filename}", ARGS.out_log)
+        np.save(batch_filename, samples.to_numpy())
+
+    # Merge all batches into a single DataFrame
+    all_samples = []
+    for i in range(n_batches):
+        batch_filename = f"{ARGS.output_path}/{model_name}_{i}_CBS.npy"
+        batch_data = np.load(batch_filename, allow_pickle=True)
+        all_samples.append(batch_data)
+
+    # Concatenate all batches
+    samplesTotal_array = np.vstack(all_samples)
+
+    # Convert back to a DataFrame with proper column names
+    samplesTotal = pd.DataFrame(samplesTotal_array, columns=reaction_ids)
+
+    # Save the final merged result as CSV
+    write_to_file(samplesTotal.T, ARGS.output_path, True, name=model_name)
+
+    # Clean up temporary numpy files
+    for i in range(n_batches):
+        batch_filename = f"{ARGS.output_path}/{model_name}_{i}_CBS.npy"
+        if os.path.exists(batch_filename):
+            os.remove(batch_filename)
+
+
+def model_sampler_with_bounds(model_path: str, bounds_path: str, cell_name: str) -> List[pd.DataFrame]:
+    """
+    MODE 1: Loads the base model from file, applies bounds from a separate bounds file and performs sampling.
+
+    Args:
+        model_path (str): Path to the tabular model file.
+        bounds_path (str): Path to the tab-separated file containing the bounds dataset.
+        cell_name (str): Name of the cell, used to generate filenames for output.
+
+    Returns:
+        List[pd.DataFrame]: A list of DataFrames containing statistics and analysis results.
+    """
+    model_input = model_utils.build_cobra_model_from_csv(model_path)
+
+    validation = model_utils.validate_model(model_input)
+
+    print("\n=== MODEL VALIDATION ===")
+    for key, value in validation.items():
+        print(f"{key}: {value}")
+
+    model_input.solver.configuration.verbosity = 1
+
+    bounds_df = read_dataset(bounds_path, "bounds dataset")
+
+    # Apply bounds to the model, reaction by reaction
+    for rxn_index, row in bounds_df.iterrows():
+        try:
+            model_input.reactions.get_by_id(rxn_index).lower_bound = row.lower_bound
+            model_input.reactions.get_by_id(rxn_index).upper_bound = row.upper_bound
+        except KeyError:
+            warning(f"Warning: Reaction {rxn_index} not found in model. Skipping.")
+
+    return perform_sampling_and_analysis(model_input, cell_name)
+
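+# Sketch of a bounds file consumed by model_sampler_with_bounds (hypothetical
+# reaction IDs; the column names follow the attribute access above):
+#
+#   Reactions    lower_bound    upper_bound
+#   R_PGI        -1000.0        1000.0
+#   R_PFK        0.0            1000.0
+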
+ """ + + returnList = [] + + model_input = model_utils.build_cobra_model_from_csv(model_path) + + if ARGS.sampling_enabled == "true": + + if ARGS.algorithm == 'OPTGP': + OPTGP_sampler(model_input, cell_name, ARGS.n_samples, ARGS.thinning, ARGS.n_batches, ARGS.seed) + elif ARGS.algorithm == 'CBS': + CBS_sampler(model_input, cell_name, ARGS.n_samples, ARGS.n_batches, ARGS.seed) + + df_mean, df_median, df_quantiles = fluxes_statistics(cell_name, ARGS.output_types) + + if("fluxes" not in ARGS.output_types): + os.remove(ARGS.output_path + "/" + cell_name + '.csv') + + returnList = [df_mean, df_median, df_quantiles] + + df_pFBA, df_FVA, df_sensitivity = fluxes_analysis(model_input, cell_name, ARGS.output_type_analysis) + + if("pFBA" in ARGS.output_type_analysis): + returnList.append(df_pFBA) + if("FVA" in ARGS.output_type_analysis): + returnList.append(df_FVA) + if("sensitivity" in ARGS.output_type_analysis): + returnList.append(df_sensitivity) + + return returnList + +def fluxes_statistics(model_name: str, output_types:List)-> List[pd.DataFrame]: + """ + Computes statistics (mean, median, quantiles) for the fluxes. + + Args: + model_name (str): Name of the model, used in filename for input. + output_types (List[str]): Types of statistics to compute (mean, median, quantiles). + + Returns: + List[pd.DataFrame]: List of DataFrames containing mean, median, and quantiles statistics. + """ + + df_mean = pd.DataFrame() + df_median= pd.DataFrame() + df_quantiles= pd.DataFrame() + + df_samples = pd.read_csv(ARGS.output_path + "/" + model_name + '.csv', sep = '\t', index_col = 0).T + df_samples = df_samples.round(8) + + for output_type in output_types: + if(output_type == "mean"): + df_mean = df_samples.mean() + df_mean = df_mean.to_frame().T + df_mean = df_mean.reset_index(drop=True) + df_mean.index = [model_name] + elif(output_type == "median"): + df_median = df_samples.median() + df_median = df_median.to_frame().T + df_median = df_median.reset_index(drop=True) + df_median.index = [model_name] + elif(output_type == "quantiles"): + newRow = [] + cols = [] + for rxn in df_samples.columns: + quantiles = df_samples[rxn].quantile([0.25, 0.50, 0.75]) + newRow.append(quantiles[0.25]) + cols.append(rxn + "_q1") + newRow.append(quantiles[0.5]) + cols.append(rxn + "_q2") + newRow.append(quantiles[0.75]) + cols.append(rxn + "_q3") + df_quantiles = pd.DataFrame(columns=cols) + df_quantiles.loc[0] = newRow + df_quantiles = df_quantiles.reset_index(drop=True) + df_quantiles.index = [model_name] + + return df_mean, df_median, df_quantiles + +def fluxes_analysis(model:cobra.Model, model_name:str, output_types:List)-> List[pd.DataFrame]: + """ + Performs flux analysis including pFBA, FVA, and sensitivity analysis. The objective function + is assumed to be already set in the model. + + Args: + model (cobra.Model): The COBRA model to analyze. + model_name (str): Name of the model, used in filenames for output. + output_types (List[str]): Types of analysis to perform (pFBA, FVA, sensitivity). + + Returns: + List[pd.DataFrame]: List of DataFrames containing pFBA, FVA, and sensitivity analysis results. 
+ """ + + df_pFBA = pd.DataFrame() + df_FVA= pd.DataFrame() + df_sensitivity= pd.DataFrame() + + for output_type in output_types: + if(output_type == "pFBA"): + solution = cobra.flux_analysis.pfba(model) + fluxes = solution.fluxes + df_pFBA.loc[0,[rxn.id for rxn in model.reactions]] = fluxes.tolist() + df_pFBA = df_pFBA.reset_index(drop=True) + df_pFBA.index = [model_name] + df_pFBA = df_pFBA.astype(float).round(6) + elif(output_type == "FVA"): + fva = cobra.flux_analysis.flux_variability_analysis(model, fraction_of_optimum=ARGS.perc_opt, processes=1).round(8) + columns = [] + for rxn in fva.index.to_list(): + columns.append(rxn + "_min") + columns.append(rxn + "_max") + df_FVA= pd.DataFrame(columns = columns) + for index_rxn, row in fva.iterrows(): + df_FVA.loc[0, index_rxn+ "_min"] = fva.loc[index_rxn, "minimum"] + df_FVA.loc[0, index_rxn+ "_max"] = fva.loc[index_rxn, "maximum"] + df_FVA = df_FVA.reset_index(drop=True) + df_FVA.index = [model_name] + df_FVA = df_FVA.astype(float).round(6) + elif(output_type == "sensitivity"): + solution_original = model.optimize().objective_value + reactions = model.reactions + single = cobra.flux_analysis.single_reaction_deletion(model) + newRow = [] + df_sensitivity = pd.DataFrame(columns = [rxn.id for rxn in reactions], index = [model_name]) + for rxn in reactions: + newRow.append(single.knockout[rxn.id].growth.values[0]/solution_original) + df_sensitivity.loc[model_name] = newRow + df_sensitivity = df_sensitivity.astype(float).round(6) + return df_pFBA, df_FVA, df_sensitivity + +############################# main ########################################### +def main(args: List[str] = None) -> None: + """ + Initialize and run sampling/analysis based on the frontend input arguments. + + Returns: + None + """ + + num_processors = max(1, cpu_count() - 1) + + global ARGS + ARGS = process_args(args) + + if not os.path.exists('flux_simulation'): + os.makedirs('flux_simulation') + + # --- Read input files and names from the provided file paths --- + with open(ARGS.input_file, 'r') as f: + ARGS.input_files = [line.strip() for line in f if line.strip()] + + with open(ARGS.name_file, 'r') as f: + ARGS.file_names = [line.strip() for line in f if line.strip()] + + # output types (required) -> list + ARGS.output_types = ARGS.output_type.split(",") if ARGS.output_type else [] + # optional analysis output types -> list or empty + ARGS.output_type_analysis = ARGS.output_type_analysis.split(",") if ARGS.output_type_analysis else [] + + # Determine if sampling should be performed + if ARGS.sampling_enabled == "true": + perform_sampling = True + else: + perform_sampling = False + + print("=== INPUT FILES ===") + print(f"{ARGS.input_files}") + print(f"{ARGS.file_names}") + print(f"{ARGS.output_type}") + print(f"{ARGS.output_types}") + print(f"{ARGS.output_type_analysis}") + print(f"Sampling enabled: {perform_sampling} (n_samples: {ARGS.n_samples})") + + if ARGS.model_and_bounds == "True": + # MODE 1: Model + bounds (separate files) + print("=== MODE 1: Model + Bounds (separate files) ===") + + # Load base model + if not ARGS.model_upload: + sys.exit("Error: model_upload is required for Mode 1") + + # Process each bounds file with the base model + results = Parallel(n_jobs=num_processors)( + delayed(model_sampler_with_bounds)(ARGS.model_upload, bounds_file, cell_name) + for bounds_file, cell_name in zip(ARGS.input_files, ARGS.file_names) + ) + + else: + # MODE 2: Multiple complete models + print("=== MODE 2: Multiple complete models ===") + + # Process each complete model 
+############################# main ###########################################
+def main(args: List[str] = None) -> None:
+    """
+    Initialize and run sampling/analysis based on the frontend input arguments.
+
+    Returns:
+        None
+    """
+    num_processors = max(1, cpu_count() - 1)
+
+    global ARGS
+    ARGS = process_args(args)
+
+    # Make sure the (possibly user-specified) output directory exists
+    os.makedirs(ARGS.output_path, exist_ok=True)
+
+    # --- Read input files and names from the provided file paths ---
+    with open(ARGS.input_file, 'r') as f:
+        ARGS.input_files = [line.strip() for line in f if line.strip()]
+
+    with open(ARGS.name_file, 'r') as f:
+        ARGS.file_names = [line.strip() for line in f if line.strip()]
+
+    # output types (required) -> list
+    ARGS.output_types = ARGS.output_type.split(",") if ARGS.output_type else []
+    # optional analysis output types -> list or empty
+    ARGS.output_type_analysis = ARGS.output_type_analysis.split(",") if ARGS.output_type_analysis else []
+
+    # Determine if sampling should be performed
+    perform_sampling = ARGS.sampling_enabled == "true"
+
+    print("=== INPUT FILES ===")
+    print(f"{ARGS.input_files}")
+    print(f"{ARGS.file_names}")
+    print(f"{ARGS.output_type}")
+    print(f"{ARGS.output_types}")
+    print(f"{ARGS.output_type_analysis}")
+    print(f"Sampling enabled: {perform_sampling} (n_samples: {ARGS.n_samples})")
+
+    if ARGS.model_and_bounds == "True":
+        # MODE 1: Model + bounds (separate files)
+        print("=== MODE 1: Model + Bounds (separate files) ===")
+
+        # The base model is mandatory in this mode
+        if not ARGS.model_upload:
+            sys.exit("Error: model_upload is required for Mode 1")
+
+        # Process each bounds file with the base model
+        results = Parallel(n_jobs=num_processors)(
+            delayed(model_sampler_with_bounds)(ARGS.model_upload, bounds_file, cell_name)
+            for bounds_file, cell_name in zip(ARGS.input_files, ARGS.file_names)
+        )
+
+    else:
+        # MODE 2: Multiple complete models
+        print("=== MODE 2: Multiple complete models ===")
+
+        # Process each complete model file
+        results = Parallel(n_jobs=num_processors)(
+            delayed(perform_sampling_and_analysis)(model_file, cell_name)
+            for model_file, cell_name in zip(ARGS.input_files, ARGS.file_names)
+        )
+
+    # Handle sampling outputs (only if sampling was performed)
+    if perform_sampling:
+        print("=== PROCESSING SAMPLING RESULTS ===")
+
+        all_mean = pd.concat([result[0] for result in results], ignore_index=False)
+        all_median = pd.concat([result[1] for result in results], ignore_index=False)
+        all_quantiles = pd.concat([result[2] for result in results], ignore_index=False)
+
+        if "mean" in ARGS.output_types:
+            all_mean = all_mean.fillna(0.0)
+            all_mean = all_mean.sort_index()
+            write_to_file(all_mean.T, ARGS.out_mean, True)
+
+        if "median" in ARGS.output_types:
+            all_median = all_median.fillna(0.0)
+            all_median = all_median.sort_index()
+            write_to_file(all_median.T, ARGS.out_median, True)
+
+        if "quantiles" in ARGS.output_types:
+            all_quantiles = all_quantiles.fillna(0.0)
+            all_quantiles = all_quantiles.sort_index()
+            write_to_file(all_quantiles.T, ARGS.out_quantiles, True)
+    else:
+        print("=== SAMPLING SKIPPED (n_samples = 0 or sampling disabled) ===")
+
+    # Handle optimization analysis outputs (always available)
+    print("=== PROCESSING OPTIMIZATION RESULTS ===")
+
+    # Determine the starting index for optimization results:
+    # if sampling was performed, each result list begins with mean/median/quantiles,
+    # so optimization results start at index 3; otherwise they start at index 0.
+    index_result = 3 if perform_sampling else 0
+
+    if "pFBA" in ARGS.output_type_analysis:
+        all_pFBA = pd.concat([result[index_result] for result in results], ignore_index=False)
+        all_pFBA = all_pFBA.sort_index()
+        write_to_file(all_pFBA.T, ARGS.out_pfba, True)
+        index_result += 1
+
+    if "FVA" in ARGS.output_type_analysis:
+        all_FVA = pd.concat([result[index_result] for result in results], ignore_index=False)
+        all_FVA = all_FVA.sort_index()
+        write_to_file(all_FVA.T, ARGS.out_fva, True)
+        index_result += 1
+
+    if "sensitivity" in ARGS.output_type_analysis:
+        all_sensitivity = pd.concat([result[index_result] for result in results], ignore_index=False)
+        all_sensitivity = all_sensitivity.sort_index()
+        write_to_file(all_sensitivity.T, ARGS.out_sensitivity, True)
+
+    return
+
+##############################################################################
+if __name__ == "__main__": main()
\ No newline at end of file
--- a/COBRAxy/src/test/testing.py Mon Oct 27 12:33:08 2025 +0000 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,806 +0,0 @@ -# This is a general-purpose "testing utilities" module for the COBRAxy tool. -# This code was written entirely by m.ferrari133@campus.unimib.it and then (hopefully) many -# more people contributed by writing tests for this tool's modules, feel free to send an email for -# any questions. - -# How the testing module works: -# The testing module allows you to easily set up unit tests for functions in a module, obtaining -# information on what each method returns, when and how it fails and so on. - -# How do I test a module? -# - create a function at the very bottom, before the __main__ -# - import the stuff you need -# - create a UnitTester instance, follow the documentation -# - fill it up with UnitTest instances, follow the documentation -# - each UnitTest tests the function by passing specific parameters to it and by veryfing the correctness -# of the output via a CheckingMode instance -# - call testModule() on the UnitTester - -# TODO(s): -# - This module was written before the utilities were introduced, it may want to use some of those functions. -# - I never got around to writing a CheckingMode for methods you WANT to fail in certain scenarios, I -# like the name "MustPanic". -# - It's good practice to enforce boolean arguments of a function to be passed as kwargs and I did it a lot -# in the code I wrote for these tool's modules, but the current implementation of UnitTest doesn't allow -# you to pass kwargs to the functions you test. -# - Implement integration tests as well, maybe! - -## Imports: -from typing import Dict, Callable, Type, List -from enum import Enum, auto -from collections.abc import Iterable - -## Generic utilities: -class TestResult: - """ - Represents the result of a test and contains all the relevant information about it. Loosely models two variants: - - Ok: The test passed, no further information is saved besides the target's name. - - Err: The test failed, an error message and further contextual details are also saved. - - This class does not ensure a static proof of the two states' behaviour, their meaning or mutual exclusivity outside - of the :bool property "isPass", meant for outside reads. - """ - def __init__(self, isPass :bool, targetName :str, errMsg = "", details = "") -> None: - """ - (Private) Initializes an instance of TestResult. - - Args: - isPass : distinction between TestResult.Ok (True) and TestResult.Err (False). - targetName : the name of the target object / property / function / module being tested, not always set - to a meaningful value at this stage. - - errMsg : concise error message explaining the test's failure. - details : contextual details about the error. - - Returns: - None : practically, a TestResult instance. - """ - self.isPass = isPass - self.isFail = not isPass # Convenience above all - - self.targetName = targetName - if isPass: return - - self.errMsg = errMsg - self.details = details - - @classmethod - def Ok(cls, targetName = "") -> "TestResult": - """ - Factory method for TestResult.Ok, where all we need to know is that our test passed. - - Args: - targetName : the name of the target object / property / function / module being tested, not always set - to a meaningful value at this stage. - - Returns: - TestResult : a new Ok instance. 
- """ - return cls(True, targetName) - - @classmethod - def Err(cls, errMsg :str, details :str, targetName = "") -> "TestResult": - """ - Factory method for TestResult.Err, where we store relevant error information. - - Args: - errMsg : concise error message explaining the test's failure. - details : contextual details about the error. - targetName : the name of the target object / property / function / module being tested, not always set - to a meaningful value at this stage. - - Returns: - TestResult : a new Err instance. - """ - return cls(False, targetName, errMsg, details) - - def log(self, isCompact = True) -> str: - """ - Dumps all the available information in a :str, ready for logging. - - Args: - isCompact : if True limits the amount of information displayed to the targetName. - - Returns: - str : information about this test result. - - """ - if isCompact: - return f"{TestResult.__name__}::{'Ok' if self.isPass else 'Err'}(Unit test on {self.targetName})" - - logMsg = f"Unit test on {self.targetName} {'passed' if self.isPass else f'failed because {self.errMsg}'}" - if self.details: logMsg += f", {self.details}" - return logMsg - - def throw(self) -> None: - #TODO: finer Exception typing would be desirable - """ - Logs the result information and panics. - - Raises: - Exception : an error containing log information about the test result. - - Returns: - None - - """ - raise Exception(self.log()) - -class CheckingMode: - """ - (Private) Represents a way to check a value for correctness, in the context of "testing" it. - """ - - def __init__(self) -> None: - """ - (Private) Implemented on child classes, initializes an instance of CheckingMode. - - Returns: - None : practically, a CheckingMode instance. - """ - self.logMsg = "CheckingMode base class should not be used directly" - - def __checkPasses__(self, _) -> bool: - """ - (Private) Implemented on child classes, performs the actual correctness check on a received value. - - Returns: - bool : True if the check passed, False if it failed. - """ - return True - - def check(self, value) -> TestResult: - """ - Converts the :bool evaluation of the value's correctness to a TestResult. - - Args: - value : the value to check. - - Returns: - TestResult : the result of the check. - """ - return TestResult.Ok() if self.__checkPasses__(value) else TestResult.Err(self.logMsg, f"got {value} instead") - - def __repr__(self) -> str: - """ - (Private) Implemented on child classes, formats :object as :str. - """ - return self.__class__.__name__ - -class ExactValue(CheckingMode): - """ - CheckingMode subclass variant to be used when the checked value needs to match another exactly. - """ - - #I suggest solving the more complex equality checking edge cases with the "Satisfies" and "MatchingShape" variants. - def __init__(self, value) -> None: - self.value = value - self.logMsg = f"value needed to match {value} exactly" - - def __checkPasses__(self, value) -> bool: - return self.value == value - - def __repr__(self) -> str: - return f"{super().__repr__()}({self.value})" - -class AcceptedValues(CheckingMode): - """ - CheckingMode subclass variant to be used when the checked value needs to appear in a list of accepted values. 
- """ - def __init__(self, *values) -> None: - self.values = values - self.logMsg = f"value needed to be one of these: {values}" - - def __checkPasses__(self, value) -> bool: - return value in self.values - - def __repr__(self) -> str: - return f"{super().__repr__()}{self.values}" - -class SatisfiesPredicate(CheckingMode): - """ - CheckingMode subclass variant to be used when the checked value needs to verify a given predicate, as in - the predicate accepts it as input and returns True. - """ - def __init__(self, pred :Callable[..., bool], predName = "") -> None: - self.pred = pred - self.logMsg = f"value needed to verify a predicate{bool(predName) * f' called {predName}'}" - - def __checkPasses__(self, *params) -> bool: - return self.pred(*params) - - def __repr__(self) -> str: - return f"{super().__repr__()}(T) -> bool" - -class IsOfType(CheckingMode): - """ - CheckingMode subclass variant to be used when the checked value needs to be of a certain type. - """ - def __init__(self, type :Type) -> None: - self.type = type - self.logMsg = f"value needed to be of type {type.__name__}" - - def __checkPasses__(self, value :Type) -> bool: - return isinstance(value, self.type) - - def __repr__(self) -> str: - return f"{super().__repr__()}:{self.type.__name__}" - -class Exists(CheckingMode): - """ - CheckingMode subclass variant to be used when the checked value needs to exist (or not!). Mainly employed as a quick default - check that always passes, it still upholds its contract when it comes to checking for existing properties in objects - without much concern on what value they contain. - """ - def __init__(self, exists = True) -> None: - self.exists = exists - self.logMsg = f"value needed to {(not exists) * 'not '}exist" - - def __checkPasses__(self, _) -> bool: return self.exists - - def __repr__(self) -> str: - return f"{super().__repr__() if self.exists else 'IsMissing'}" - -class MatchingShape(CheckingMode): - """ - CheckingMode subclass variant to be used when the checked value is an object that needs to have a certain shape, - as in to posess properties with a given name and value. Each property is checked for existance and correctness with - its own given CheckingMode. - """ - def __init__(self, props :Dict[str, CheckingMode], objName = "") -> None: - """ - (Private) Initializes an instance of MatchingShape. - - Args: - props : :dict using property names as keys and checking modes for the property's value as values. - objName : label for the object we're testing the shape of. - - Returns: - None : practically, a MatchingShape instance. 
- """ - self.props = props - self.objName = objName - - self.shapeRepr = " {\n" + "\n".join([f" {propName} : {prop}" for propName, prop in props.items()]) + "\n}" - - def check(self, obj :object) -> TestResult: - objIsDict = isinstance(obj, dict) # Python forces us to distinguish between object properties and dict keys - for propName, checkingMode in self.props.items(): - # Checking if the property exists: - if (not objIsDict and not hasattr(obj, propName)) or (objIsDict and propName not in obj): - if not isinstance(checkingMode, Exists): return TestResult.Err( - f"property \"{propName}\" doesn't exist on object {self.objName}", "", self.objName) - - if not checkingMode.exists: return TestResult.Ok(self.objName) - # Either the property value is meant to be checked (checkingMode is anything but Exists) - # or we want the property to not exist, all other cases are handled correctly ahead - - checkRes = checkingMode.check(obj[propName] if objIsDict else getattr(obj, propName)) - if checkRes.isPass: continue - - checkRes.targetName = self.objName - return TestResult.Err( - f"property \"{propName}\" failed check {checkingMode} on shape {obj}", - checkRes.log(isCompact = False), - self.objName) - - return TestResult.Ok(self.objName) - - def __repr__(self) -> str: - return super().__repr__() + self.shapeRepr - -class Many(CheckingMode): - """ - CheckingMode subclass variant to be used when the checked value is an Iterable we want to check item by item. - """ - def __init__(self, *values :CheckingMode) -> None: - self.values = values - self.shapeRepr = " [\n" + "\n".join([f" {value}" for value in values]) + "\n]" - - def check(self, coll :Iterable) -> TestResult: - amt = len(coll) - expectedAmt = len(self.values) - # Length equality is forced: - if amt != expectedAmt: return TestResult.Err( - "items' quantities don't match", f"expected {expectedAmt} items, but got {amt}") - - # Items in the given collection value are paired in order with the corresponding checkingMode meant for each of them - for item, checkingMode in zip(coll, self.values): - checkRes = checkingMode.check(item) - if checkRes.isFail: return TestResult.Err( - f"item in list failed check {checkingMode}", - checkRes.log(isCompact = False)) - - return TestResult.Ok() - - def __repr__(self) -> str: - return super().__repr__() + self.shapeRepr - -class LogMode(Enum): - """ - Represents the level of detail of a logged message. Models 4 variants, in order of increasing detail: - - Minimal : Logs the overall test result for the entire module. - - Default : Also logs all single test fails, in compact mode. - - Detailed : Logs all function test results, in compact mode. - - Pedantic : Also logs all single test results in detailed mode. - """ - Minimal = auto() - Default = auto() - Detailed = auto() - Pedantic = auto() - - def isMoreVerbose(self, requiredMode :"LogMode") -> bool: - """ - Compares the instance's level of detail with that of another. - - Args: - requiredMode : the other instance. - - Returns: - bool : True if the caller instance is a more detailed variant than the other. - """ - return self.value >= requiredMode.value - -## Specific Unit Testing utilities: -class UnitTest: - """ - Represents a unit test, the test of a single function's isolated correctness. - """ - def __init__(self, func :Callable, inputParams :list, expectedRes :CheckingMode) -> None: - """ - (Private) Initializes an instance of UnitTest. - - Args: - func : the function to test. - inputParams : list of parameters to pass as inputs to the function, in order. 
- expectedRes : checkingMode to test the function's return value for correctness. - - Returns: - None : practically, a UnitTest instance. - """ - self.func = func - self.inputParams = inputParams - self.expectedRes = expectedRes - - self.funcName = func.__name__ - - def test(self) -> TestResult: - """ - Tests the function. - - Returns: - TestResult : the test's result. - """ - result = None - try: result = self.func(*self.inputParams) - except Exception as e: return TestResult.Err("the function panicked at runtime", e, self.funcName) - - checkRes = self.expectedRes.check(result) - checkRes.targetName = self.funcName - return checkRes - -class UnitTester: - """ - Manager class for unit testing an entire module, groups single UnitTests together and executes them in order on a - per-function basis (tests about the same function are executed consecutively) giving back as much information as - possible depending on the selected logMode. More customization options are available. - """ - def __init__(self, moduleName :str, logMode = LogMode.Default, stopOnFail = True, *funcTests :'UnitTest') -> None: - """ - (Private) initializes an instance of UnitTester. - - Args: - moduleName : name of the tested module. - logMode : level of detail applied to all messages logged during the test. - stopOnFail : if True, the test stops entirely after one unit test fails. - funcTests : the unit tests to perform on the module. - - Returns: - None : practically, a UnitTester instance. - """ - self.logMode = logMode - self.moduleName = moduleName - self.stopOnFail = stopOnFail - - # This ensures the per-function order: - self.funcTests :Dict[str, List[UnitTest]]= {} - for test in funcTests: - if test.funcName in self.funcTests: self.funcTests[test.funcName].append(test) - else: self.funcTests[test.funcName] = [test] - - def logTestResult(self, testRes :TestResult) -> None: - """ - Prints the formatted result information of a unit test. - - Args: - testRes : the result of the test. - - Returns: - None - """ - if testRes.isPass: return self.log("Passed!", LogMode.Detailed, indent = 2) - - failMsg = "Failed! " - # Doing it this way prevents .log computations when not needed - if self.logMode.isMoreVerbose(LogMode.Detailed): - # Given that Pedantic is the most verbose variant, there's no point in comparing with LogMode.isMoreVerbose - failMsg += testRes.log(self.logMode is not LogMode.Pedantic) - - self.log(failMsg, indent = 2) - - def log(self, msg :str, minRequiredMode = LogMode.Default, indent = 0) -> None: - """ - Prints and formats a message only when the UnitTester instance is set to a level of detail at least equal - to a minimum requirement, given as input. - - Args: - msg : the message to print. - minRequiredMode : minimum detail requirement. - indent : formatting information, counter from 0 that adds 2 spaces each number up - - Returns: - None - """ - if self.logMode.isMoreVerbose(minRequiredMode): print(" " * indent + msg) - - def testFunction(self, name :str) -> TestResult: - """ - Perform all unit tests relative to the same function, plus the surrounding logs and checks. - - Args: - name : the name of the tested function. - - Returns : - TestResult : the overall Ok result of all the tests passing or the first Err. This behaviour is unrelated - to that of the overall testing procedure (stopOnFail), it always works like this for tests about the - same function. 
- """ - self.log(f"Unit testing {name}...", indent = 1) - - allPassed = True - for unitTest in self.funcTests[name]: - testRes = unitTest.test() - self.logTestResult(testRes) - if testRes.isPass: continue - - allPassed = False - if self.stopOnFail: break - - self.log("", LogMode.Detailed) # Provides one extra newline of space when needed, to better format the output - if allPassed: return TestResult.Ok(name) - - if self.logMode is LogMode.Default: self.log("") - return TestResult.Err(f"Unlogged err", "unit test failed", name) - - def testModule(self) -> None: - """ - Runs all the provided unit tests in order but on a per-function basis. - - Returns: - None - """ - self.log(f"Unit testing module {self.moduleName}...", LogMode.Minimal) - - fails = 0 - testStatusMsg = "complete" - for funcName in self.funcTests.keys(): - if self.testFunction(funcName).isPass: continue - fails += 1 - - if self.stopOnFail: - testStatusMsg = "interrupted" - break - - self.log(f"Testing {testStatusMsg}: {fails} problem{'s' * (fails != 1)} found.\n", LogMode.Minimal) - # ^^^ Manually applied an extra newline of space. - -## Unit testing all the modules: -def unit_cobraxy() -> None: - import cobraxy as m - import math - import lxml.etree as ET - import utils.general_utils as utils - - #m.ARGS = m.process_args() - - ids = ["react1", "react2", "react3", "react4", "react5"] - metabMap = utils.Model.ENGRO2.getMap() - class_pat = { - "dataset1" :[ - [2.3, 4, 7, 0, 0.01, math.nan, math.nan], - [math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan], - [2.3, 4, 7, 0, 0.01, 5, 9], - [math.nan, math.nan, 2.3, 4, 7, 0, 0.01], - [2.3, 4, 7, math.nan, 2.3, 0, 0.01]], - - "dataset2" :[ - [2.3, 4, 7, math.nan, 2.3, 0, 0.01], - [2.3, 4, 7, 0, 0.01, math.nan, math.nan], - [math.nan, math.nan, 2.3, 4, 7, 0, 0.01], - [2.3, 4, 7, 0, 0.01, 5, 9], - [math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan]] - } - - unitTester = UnitTester("cobraxy", LogMode.Pedantic, False, - UnitTest(m.name_dataset, ["customName", 12], ExactValue("customName")), - UnitTest(m.name_dataset, ["Dataset", 12], ExactValue("Dataset_12")), - - UnitTest(m.fold_change, [0.5, 0.5], ExactValue(0.0)), - UnitTest(m.fold_change, [0, 0.35], ExactValue("-INF")), - UnitTest(m.fold_change, [0.5, 0], ExactValue("INF")), - UnitTest(m.fold_change, [0, 0], ExactValue(0)), - - UnitTest( - m.Arrow(m.Arrow.MAX_W, m.ArrowColor.DownRegulated, isDashed = True).toStyleStr, [], - ExactValue(";stroke:#0000FF;stroke-width:12;stroke-dasharray:5,5")), - - UnitTest(m.computeEnrichment, [metabMap, class_pat, ids], ExactValue(None)), - - UnitTest(m.computePValue, [class_pat["dataset1"][0], class_pat["dataset2"][0]], SatisfiesPredicate(math.isnan)), - - UnitTest(m.reactionIdIsDirectional, ["reactId"], ExactValue(m.ReactionDirection.Unknown)), - UnitTest(m.reactionIdIsDirectional, ["reactId_F"], ExactValue(m.ReactionDirection.Direct)), - UnitTest(m.reactionIdIsDirectional, ["reactId_B"], ExactValue(m.ReactionDirection.Inverse)), - - UnitTest(m.ArrowColor.fromFoldChangeSign, [-2], ExactValue(m.ArrowColor.DownRegulated)), - UnitTest(m.ArrowColor.fromFoldChangeSign, [2], ExactValue(m.ArrowColor.UpRegulated)), - - UnitTest( - m.Arrow(m.Arrow.MAX_W, m.ArrowColor.UpRegulated).styleReactionElements, - [metabMap, "reactId"], - ExactValue(None)), - - UnitTest(m.getArrowBodyElementId, ["reactId"], ExactValue("R_reactId")), - UnitTest(m.getArrowBodyElementId, ["reactId_F"], ExactValue("R_reactId")), - - UnitTest( - m.getArrowHeadElementId, ["reactId"], - 
Many(ExactValue("F_reactId"), ExactValue("B_reactId"))), - - UnitTest( - m.getArrowHeadElementId, ["reactId_F"], - Many(ExactValue("F_reactId"), ExactValue(""))), - - UnitTest( - m.getArrowHeadElementId, ["reactId_B"], - Many(ExactValue("B_reactId"), ExactValue(""))), - - UnitTest( - m.getElementById, ["reactId_F", metabMap], - SatisfiesPredicate(lambda res : res.isErr and isinstance(res.value, utils.Result.ResultErr))), - - UnitTest( - m.getElementById, ["F_tyr_L_t", metabMap], - SatisfiesPredicate(lambda res : res.isOk and res.unwrap().get("id") == "F_tyr_L_t")), - ).testModule() - -def unit_rps_generator() -> None: - import rps_generator as rps - import math - import pandas as pd - import utils.general_utils as utils - dataset = pd.DataFrame({ - "cell lines" : ["normal", "cancer"], - "pyru_vate" : [5.3, 7.01], - "glu,cose" : [8.2, 4.0], - "unknown" : [3.0, 3.97], - "()atp" : [7.05, 8.83], - }) - - abundancesNormalRaw = { - "pyru_vate" : 5.3, - "glu,cose" : 8.2, - "unknown" : 3.0, - "()atp" : 7.05, - } - - abundancesNormal = { - "pyr" : 5.3, - "glc__D" : 8.2, - "atp" : 7.05, - } - - # TODO: this currently doesn't work due to "the pickle extension problem", see FileFormat class for details. - synsDict = utils.readPickle(utils.FilePath("synonyms", utils.FileFormat.PICKLE, prefix = "./local/pickle files")) - - reactionsDict = { - "r1" : { - "glc__D" : 1 - }, - - "r2" : { - "co2" : 2, - "pyr" : 3, - }, - - "r3" : { - "atp" : 2, - "glc__D" : 4, - }, - - "r4" : { - "atp" : 3, - } - } - - abundancesNormalEdited = { - "pyr" : 5.3, - "glc__D" : 8.2, - "atp" : 7.05, - "co2" : 1, - } - - blackList = ["atp"] # No jokes allowed! - missingInDataset = ["co2"] - - normalRpsShape = MatchingShape({ - "r1" : ExactValue(8.2 ** 1), - "r2" : ExactValue((1 ** 2) * (5.3 ** 3)), - "r3" : ExactValue((8.2 ** 4) * (7.05 ** 2)), - "r4" : SatisfiesPredicate(lambda n : math.isnan(n)) - }, "rps dict") - - UnitTester("rps_generator", LogMode.Pedantic, False, - UnitTest(rps.get_abund_data, [dataset, 0], MatchingShape({ - "pyru_vate" : ExactValue(5.3), - "glu,cose" : ExactValue(8.2), - "unknown" : ExactValue(3.0), - "()atp" : ExactValue(7.05), - "name" : ExactValue("normal") - }, "abundance series")), - - UnitTest(rps.get_abund_data, [dataset, 1], MatchingShape({ - "pyru_vate" : ExactValue(7.01), - "glu,cose" : ExactValue(4.0), - "unknown" : ExactValue(3.97), - "()atp" : ExactValue(8.83), - "name" : ExactValue("cancer") - }, "abundance series")), - - UnitTest(rps.get_abund_data, [dataset, -1], ExactValue(None)), - - UnitTest(rps.check_missing_metab, [reactionsDict, abundancesNormal.copy()], Many(MatchingShape({ - "pyr" : ExactValue(5.3), - "glc__D" : ExactValue(8.2), - "atp" : ExactValue(7.05), - "co2" : ExactValue(1) - }, "updated abundances"), Many(ExactValue("co2")))), - - UnitTest(rps.clean_metabolite_name, ["4,4'-diphenylmethane diisocyanate"], ExactValue("44diphenylmethanediisocyanate")), - - UnitTest(rps.get_metabolite_id, ["tryptophan", synsDict], ExactValue("trp__L")), - - UnitTest(rps.calculate_rps, [reactionsDict, abundancesNormalEdited, blackList, missingInDataset], normalRpsShape), - - UnitTest(rps.rps_for_cell_lines, [dataset, reactionsDict, blackList, synsDict, "", True], Many(normalRpsShape, MatchingShape({ - "r1" : ExactValue(4.0 ** 1), - "r2" : ExactValue((1 ** 2) * (7.01 ** 3)), - "r3" : ExactValue((4.0 ** 4) * (8.83 ** 2)), - "r4" : SatisfiesPredicate(lambda n : math.isnan(n)) - }, "rps dict"))), - - #UnitTest(rps.main, [], ExactValue(None)) # Complains about sys argvs - ).testModule() - -def 
unit_custom_data_generator() -> None: - import custom_data_generator as cdg - - UnitTester("custom data generator", LogMode.Pedantic, False, - UnitTest(lambda :True, [], ExactValue(True)), # No tests can be done without a model at hand! - ).testModule() - -def unit_utils() -> None: - import utils.general_utils as utils - import utils.rule_parsing as ruleUtils - import utils.reaction_parsing as reactionUtils - - UnitTester("utils", LogMode.Pedantic, False, - UnitTest(utils.CustomErr, ["myMsg", "more details"], MatchingShape({ - "details" : ExactValue("more details"), - "msg" : ExactValue("myMsg"), - "id" : ExactValue(0) # this will fail if any custom errors happen anywhere else before! - })), - - UnitTest(utils.CustomErr, ["myMsg", "more details", 42], MatchingShape({ - "details" : ExactValue("more details"), - "msg" : ExactValue("myMsg"), - "id" : ExactValue(42) - })), - - UnitTest(utils.Bool("someArg").check, ["TrUe"], ExactValue(True)), - UnitTest(utils.Bool("someArg").check, ["FALse"], ExactValue(False)), - UnitTest(utils.Bool("someArg").check, ["foo"], Exists(False)), # should panic! - - UnitTest(utils.Model.ENGRO2.getRules, ["."], IsOfType(dict)), - UnitTest(utils.Model.Custom.getRules, [".", ""], Exists(False)), # expected panic - - # rule utilities tests: - UnitTest(ruleUtils.parseRuleToNestedList, ["A"], Many(ExactValue("A"))), - UnitTest(ruleUtils.parseRuleToNestedList, ["A or B"], Many(ExactValue("A"), ExactValue("B"))), - UnitTest(ruleUtils.parseRuleToNestedList, ["A and B"], Many(ExactValue("A"), ExactValue("B"))), - UnitTest(ruleUtils.parseRuleToNestedList, ["A foo B"], Exists(False)), # expected panic - UnitTest(ruleUtils.parseRuleToNestedList, ["A)"], Exists(False)), # expected panic - - UnitTest( - ruleUtils.parseRuleToNestedList, ["A or B"], - MatchingShape({ "op" : ExactValue(ruleUtils.RuleOp.OR)})), - - UnitTest( - ruleUtils.parseRuleToNestedList, ["A and B"], - MatchingShape({ "op" : ExactValue(ruleUtils.RuleOp.AND)})), - - UnitTest( - ruleUtils.parseRuleToNestedList, ["A or B and C"], - MatchingShape({ "op" : ExactValue(ruleUtils.RuleOp.OR)})), - - UnitTest( - ruleUtils.parseRuleToNestedList, ["A or B and C or (D and E)"], - Many( - ExactValue("A"), - Many(ExactValue("B"), ExactValue("C")), - Many(ExactValue("D"), ExactValue("E")) - )), - - UnitTest(lambda s : ruleUtils.RuleOp(s), ["or"], ExactValue(ruleUtils.RuleOp.OR)), - UnitTest(lambda s : ruleUtils.RuleOp(s), ["and"], ExactValue(ruleUtils.RuleOp.AND)), - UnitTest(lambda s : ruleUtils.RuleOp(s), ["foo"], Exists(False)), # expected panic - - UnitTest(ruleUtils.RuleOp.isOperator, ["or"], ExactValue(True)), - UnitTest(ruleUtils.RuleOp.isOperator, ["and"], ExactValue(True)), - UnitTest(ruleUtils.RuleOp.isOperator, ["foo"], ExactValue(False)), - - # reaction utilities tests: - UnitTest(reactionUtils.ReactionDir.fromReaction, ["atp <=> adp + pi"], ExactValue(reactionUtils.ReactionDir.REVERSIBLE)), - UnitTest(reactionUtils.ReactionDir.fromReaction, ["atp --> adp + pi"], ExactValue(reactionUtils.ReactionDir.FORWARD)), - UnitTest(reactionUtils.ReactionDir.fromReaction, ["atp <-- adp + pi"], ExactValue(reactionUtils.ReactionDir.BACKWARD)), - UnitTest(reactionUtils.ReactionDir.fromReaction, ["atp ??? 
adp + pi"], Exists(False)), # should panic - - UnitTest( - reactionUtils.create_reaction_dict, - [{'shdgd': '2 pyruvate + 1 h2o <=> 1 h2o + 2 acetate', 'sgwrw': '2 co2 + 6 h2o --> 3 atp'}], - MatchingShape({ - "shdgd_B" : MatchingShape({ - "acetate" : ExactValue(2), - "h2o" : ExactValue(1), - }), - - "shdgd_F" : MatchingShape({ - "pyruvate" : ExactValue(2), - "h2o" : ExactValue(1) - }), - - "sgwrw" : MatchingShape({ - "co2" : ExactValue(2), - "h2o" : ExactValue(6), - }) - }, "reaction dict")), - ).testModule() - - rule = "A and B or C or D and (E or F and G) or H" - print(f"rule \"{rule}\" should comes out as: {ruleUtils.parseRuleToNestedList(rule)}") - -def unit_ras_generator() -> None: - import ras_generator as ras - import utils.rule_parsing as ruleUtils - - # Making an alias to mask the name of the inner function and separate the 2 tests: - def opListAlias(op_list, dataset): - ras.ARGS.none = False - return ras.ras_op_list(op_list, dataset) - - ras.ARGS = ras.process_args() - rule = ruleUtils.OpList(ruleUtils.RuleOp.AND) - rule.extend(["foo", "bar", "baz"]) - - dataset = { "foo" : 5, "bar" : 2, "baz" : None } - - UnitTester("ras generator", LogMode.Pedantic, False, - UnitTest(ras.ras_op_list, [rule, dataset], ExactValue(2)), - UnitTest(opListAlias, [rule, dataset], ExactValue(None)), - ).testModule() - -if __name__ == "__main__": - unit_cobraxy() - unit_custom_data_generator() - unit_utils() - unit_ras_generator() \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/COBRAxy/test/testing.py Tue Oct 28 10:44:07 2025 +0000
@@ -0,0 +1,811 @@
+# This is a general-purpose "testing utilities" module for the COBRAxy tool.
+# This code was written entirely by m.ferrari133@campus.unimib.it and then (hopefully) many
+# more people contributed by writing tests for this tool's modules, feel free to send an email for
+# any questions.
+
+# How the testing module works:
+# The testing module allows you to easily set up unit tests for functions in a module, obtaining
+# information on what each method returns, when and how it fails and so on.
+
+# How do I test a module?
+# - create a function at the very bottom, before the __main__
+# - import the stuff you need
+# - create a UnitTester instance, follow the documentation
+# - fill it up with UnitTest instances, follow the documentation
+# - each UnitTest tests the function by passing specific parameters to it and by verifying the correctness
+#   of the output via a CheckingMode instance
+# - call testModule() on the UnitTester
+
+# TODO(s):
+# - This module was written before the utilities were introduced, it may want to use some of those functions.
+# - I never got around to writing a CheckingMode for methods you WANT to fail in certain scenarios, I
+#   like the name "MustPanic".
+# - It's good practice to enforce boolean arguments of a function to be passed as kwargs and I did it a lot
+#   in the code I wrote for these tool's modules, but the current implementation of UnitTest doesn't allow
+#   you to pass kwargs to the functions you test.
+# - Implement integration tests as well, maybe!
+
+## Imports:
+import sys
+import os
+from typing import Dict, Callable, Type, List
+from enum import Enum, auto
+from collections.abc import Iterable
+
+# Add src directory to path to allow imports
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'src'))
+
+## Generic utilities:
+class TestResult:
+    """
+    Represents the result of a test and contains all the relevant information about it. Loosely models two variants:
+    - Ok: The test passed, no further information is saved besides the target's name.
+    - Err: The test failed, an error message and further contextual details are also saved.
+
+    This class does not ensure a static proof of the two states' behaviour, their meaning or mutual exclusivity outside
+    of the :bool property "isPass", meant for outside reads.
+    """
+    def __init__(self, isPass :bool, targetName :str, errMsg = "", details = "") -> None:
+        """
+        (Private) Initializes an instance of TestResult.
+
+        Args:
+            isPass : distinction between TestResult.Ok (True) and TestResult.Err (False).
+            targetName : the name of the target object / property / function / module being tested, not always set
+            to a meaningful value at this stage.
+
+            errMsg : concise error message explaining the test's failure.
+            details : contextual details about the error.
+
+        Returns:
+            None : practically, a TestResult instance.
+        """
+        self.isPass = isPass
+        self.isFail = not isPass # Convenience above all
+
+        self.targetName = targetName
+        if isPass: return
+
+        self.errMsg = errMsg
+        self.details = details
+
+    @classmethod
+    def Ok(cls, targetName = "") -> "TestResult":
+        """
+        Factory method for TestResult.Ok, where all we need to know is that our test passed.
+
+        Args:
+            targetName : the name of the target object / property / function / module being tested, not always set
+            to a meaningful value at this stage.
+
+        Returns:
+            TestResult : a new Ok instance.
+ """ + return cls(True, targetName) + + @classmethod + def Err(cls, errMsg :str, details :str, targetName = "") -> "TestResult": + """ + Factory method for TestResult.Err, where we store relevant error information. + + Args: + errMsg : concise error message explaining the test's failure. + details : contextual details about the error. + targetName : the name of the target object / property / function / module being tested, not always set + to a meaningful value at this stage. + + Returns: + TestResult : a new Err instance. + """ + return cls(False, targetName, errMsg, details) + + def log(self, isCompact = True) -> str: + """ + Dumps all the available information in a :str, ready for logging. + + Args: + isCompact : if True limits the amount of information displayed to the targetName. + + Returns: + str : information about this test result. + + """ + if isCompact: + return f"{TestResult.__name__}::{'Ok' if self.isPass else 'Err'}(Unit test on {self.targetName})" + + logMsg = f"Unit test on {self.targetName} {'passed' if self.isPass else f'failed because {self.errMsg}'}" + if self.details: logMsg += f", {self.details}" + return logMsg + + def throw(self) -> None: + #TODO: finer Exception typing would be desirable + """ + Logs the result information and panics. + + Raises: + Exception : an error containing log information about the test result. + + Returns: + None + + """ + raise Exception(self.log()) + +class CheckingMode: + """ + (Private) Represents a way to check a value for correctness, in the context of "testing" it. + """ + + def __init__(self) -> None: + """ + (Private) Implemented on child classes, initializes an instance of CheckingMode. + + Returns: + None : practically, a CheckingMode instance. + """ + self.logMsg = "CheckingMode base class should not be used directly" + + def __checkPasses__(self, _) -> bool: + """ + (Private) Implemented on child classes, performs the actual correctness check on a received value. + + Returns: + bool : True if the check passed, False if it failed. + """ + return True + + def check(self, value) -> TestResult: + """ + Converts the :bool evaluation of the value's correctness to a TestResult. + + Args: + value : the value to check. + + Returns: + TestResult : the result of the check. + """ + return TestResult.Ok() if self.__checkPasses__(value) else TestResult.Err(self.logMsg, f"got {value} instead") + + def __repr__(self) -> str: + """ + (Private) Implemented on child classes, formats :object as :str. + """ + return self.__class__.__name__ + +class ExactValue(CheckingMode): + """ + CheckingMode subclass variant to be used when the checked value needs to match another exactly. + """ + + #I suggest solving the more complex equality checking edge cases with the "Satisfies" and "MatchingShape" variants. + def __init__(self, value) -> None: + self.value = value + self.logMsg = f"value needed to match {value} exactly" + + def __checkPasses__(self, value) -> bool: + return self.value == value + + def __repr__(self) -> str: + return f"{super().__repr__()}({self.value})" + +class AcceptedValues(CheckingMode): + """ + CheckingMode subclass variant to be used when the checked value needs to appear in a list of accepted values. 
+ """ + def __init__(self, *values) -> None: + self.values = values + self.logMsg = f"value needed to be one of these: {values}" + + def __checkPasses__(self, value) -> bool: + return value in self.values + + def __repr__(self) -> str: + return f"{super().__repr__()}{self.values}" + +class SatisfiesPredicate(CheckingMode): + """ + CheckingMode subclass variant to be used when the checked value needs to verify a given predicate, as in + the predicate accepts it as input and returns True. + """ + def __init__(self, pred :Callable[..., bool], predName = "") -> None: + self.pred = pred + self.logMsg = f"value needed to verify a predicate{bool(predName) * f' called {predName}'}" + + def __checkPasses__(self, *params) -> bool: + return self.pred(*params) + + def __repr__(self) -> str: + return f"{super().__repr__()}(T) -> bool" + +class IsOfType(CheckingMode): + """ + CheckingMode subclass variant to be used when the checked value needs to be of a certain type. + """ + def __init__(self, type :Type) -> None: + self.type = type + self.logMsg = f"value needed to be of type {type.__name__}" + + def __checkPasses__(self, value :Type) -> bool: + return isinstance(value, self.type) + + def __repr__(self) -> str: + return f"{super().__repr__()}:{self.type.__name__}" + +class Exists(CheckingMode): + """ + CheckingMode subclass variant to be used when the checked value needs to exist (or not!). Mainly employed as a quick default + check that always passes, it still upholds its contract when it comes to checking for existing properties in objects + without much concern on what value they contain. + """ + def __init__(self, exists = True) -> None: + self.exists = exists + self.logMsg = f"value needed to {(not exists) * 'not '}exist" + + def __checkPasses__(self, _) -> bool: return self.exists + + def __repr__(self) -> str: + return f"{super().__repr__() if self.exists else 'IsMissing'}" + +class MatchingShape(CheckingMode): + """ + CheckingMode subclass variant to be used when the checked value is an object that needs to have a certain shape, + as in to posess properties with a given name and value. Each property is checked for existance and correctness with + its own given CheckingMode. + """ + def __init__(self, props :Dict[str, CheckingMode], objName = "") -> None: + """ + (Private) Initializes an instance of MatchingShape. + + Args: + props : :dict using property names as keys and checking modes for the property's value as values. + objName : label for the object we're testing the shape of. + + Returns: + None : practically, a MatchingShape instance. 
+ """ + self.props = props + self.objName = objName + + self.shapeRepr = " {\n" + "\n".join([f" {propName} : {prop}" for propName, prop in props.items()]) + "\n}" + + def check(self, obj :object) -> TestResult: + objIsDict = isinstance(obj, dict) # Python forces us to distinguish between object properties and dict keys + for propName, checkingMode in self.props.items(): + # Checking if the property exists: + if (not objIsDict and not hasattr(obj, propName)) or (objIsDict and propName not in obj): + if not isinstance(checkingMode, Exists): return TestResult.Err( + f"property \"{propName}\" doesn't exist on object {self.objName}", "", self.objName) + + if not checkingMode.exists: return TestResult.Ok(self.objName) + # Either the property value is meant to be checked (checkingMode is anything but Exists) + # or we want the property to not exist, all other cases are handled correctly ahead + + checkRes = checkingMode.check(obj[propName] if objIsDict else getattr(obj, propName)) + if checkRes.isPass: continue + + checkRes.targetName = self.objName + return TestResult.Err( + f"property \"{propName}\" failed check {checkingMode} on shape {obj}", + checkRes.log(isCompact = False), + self.objName) + + return TestResult.Ok(self.objName) + + def __repr__(self) -> str: + return super().__repr__() + self.shapeRepr + +class Many(CheckingMode): + """ + CheckingMode subclass variant to be used when the checked value is an Iterable we want to check item by item. + """ + def __init__(self, *values :CheckingMode) -> None: + self.values = values + self.shapeRepr = " [\n" + "\n".join([f" {value}" for value in values]) + "\n]" + + def check(self, coll :Iterable) -> TestResult: + amt = len(coll) + expectedAmt = len(self.values) + # Length equality is forced: + if amt != expectedAmt: return TestResult.Err( + "items' quantities don't match", f"expected {expectedAmt} items, but got {amt}") + + # Items in the given collection value are paired in order with the corresponding checkingMode meant for each of them + for item, checkingMode in zip(coll, self.values): + checkRes = checkingMode.check(item) + if checkRes.isFail: return TestResult.Err( + f"item in list failed check {checkingMode}", + checkRes.log(isCompact = False)) + + return TestResult.Ok() + + def __repr__(self) -> str: + return super().__repr__() + self.shapeRepr + +class LogMode(Enum): + """ + Represents the level of detail of a logged message. Models 4 variants, in order of increasing detail: + - Minimal : Logs the overall test result for the entire module. + - Default : Also logs all single test fails, in compact mode. + - Detailed : Logs all function test results, in compact mode. + - Pedantic : Also logs all single test results in detailed mode. + """ + Minimal = auto() + Default = auto() + Detailed = auto() + Pedantic = auto() + + def isMoreVerbose(self, requiredMode :"LogMode") -> bool: + """ + Compares the instance's level of detail with that of another. + + Args: + requiredMode : the other instance. + + Returns: + bool : True if the caller instance is a more detailed variant than the other. + """ + return self.value >= requiredMode.value + +## Specific Unit Testing utilities: +class UnitTest: + """ + Represents a unit test, the test of a single function's isolated correctness. + """ + def __init__(self, func :Callable, inputParams :list, expectedRes :CheckingMode) -> None: + """ + (Private) Initializes an instance of UnitTest. + + Args: + func : the function to test. + inputParams : list of parameters to pass as inputs to the function, in order. 
+ expectedRes : checkingMode to test the function's return value for correctness. + + Returns: + None : practically, a UnitTest instance. + """ + self.func = func + self.inputParams = inputParams + self.expectedRes = expectedRes + + self.funcName = func.__name__ + + def test(self) -> TestResult: + """ + Tests the function. + + Returns: + TestResult : the test's result. + """ + result = None + try: result = self.func(*self.inputParams) + except Exception as e: return TestResult.Err("the function panicked at runtime", e, self.funcName) + + checkRes = self.expectedRes.check(result) + checkRes.targetName = self.funcName + return checkRes + +class UnitTester: + """ + Manager class for unit testing an entire module, groups single UnitTests together and executes them in order on a + per-function basis (tests about the same function are executed consecutively) giving back as much information as + possible depending on the selected logMode. More customization options are available. + """ + def __init__(self, moduleName :str, logMode = LogMode.Default, stopOnFail = True, *funcTests :'UnitTest') -> None: + """ + (Private) initializes an instance of UnitTester. + + Args: + moduleName : name of the tested module. + logMode : level of detail applied to all messages logged during the test. + stopOnFail : if True, the test stops entirely after one unit test fails. + funcTests : the unit tests to perform on the module. + + Returns: + None : practically, a UnitTester instance. + """ + self.logMode = logMode + self.moduleName = moduleName + self.stopOnFail = stopOnFail + + # This ensures the per-function order: + self.funcTests :Dict[str, List[UnitTest]]= {} + for test in funcTests: + if test.funcName in self.funcTests: self.funcTests[test.funcName].append(test) + else: self.funcTests[test.funcName] = [test] + + def logTestResult(self, testRes :TestResult) -> None: + """ + Prints the formatted result information of a unit test. + + Args: + testRes : the result of the test. + + Returns: + None + """ + if testRes.isPass: return self.log("Passed!", LogMode.Detailed, indent = 2) + + failMsg = "Failed! " + # Doing it this way prevents .log computations when not needed + if self.logMode.isMoreVerbose(LogMode.Detailed): + # Given that Pedantic is the most verbose variant, there's no point in comparing with LogMode.isMoreVerbose + failMsg += testRes.log(self.logMode is not LogMode.Pedantic) + + self.log(failMsg, indent = 2) + + def log(self, msg :str, minRequiredMode = LogMode.Default, indent = 0) -> None: + """ + Prints and formats a message only when the UnitTester instance is set to a level of detail at least equal + to a minimum requirement, given as input. + + Args: + msg : the message to print. + minRequiredMode : minimum detail requirement. + indent : formatting information, counter from 0 that adds 2 spaces each number up + + Returns: + None + """ + if self.logMode.isMoreVerbose(minRequiredMode): print(" " * indent + msg) + + def testFunction(self, name :str) -> TestResult: + """ + Perform all unit tests relative to the same function, plus the surrounding logs and checks. + + Args: + name : the name of the tested function. + + Returns : + TestResult : the overall Ok result of all the tests passing or the first Err. This behaviour is unrelated + to that of the overall testing procedure (stopOnFail), it always works like this for tests about the + same function. 
+ """ + self.log(f"Unit testing {name}...", indent = 1) + + allPassed = True + for unitTest in self.funcTests[name]: + testRes = unitTest.test() + self.logTestResult(testRes) + if testRes.isPass: continue + + allPassed = False + if self.stopOnFail: break + + self.log("", LogMode.Detailed) # Provides one extra newline of space when needed, to better format the output + if allPassed: return TestResult.Ok(name) + + if self.logMode is LogMode.Default: self.log("") + return TestResult.Err(f"Unlogged err", "unit test failed", name) + + def testModule(self) -> None: + """ + Runs all the provided unit tests in order but on a per-function basis. + + Returns: + None + """ + self.log(f"Unit testing module {self.moduleName}...", LogMode.Minimal) + + fails = 0 + testStatusMsg = "complete" + for funcName in self.funcTests.keys(): + if self.testFunction(funcName).isPass: continue + fails += 1 + + if self.stopOnFail: + testStatusMsg = "interrupted" + break + + self.log(f"Testing {testStatusMsg}: {fails} problem{'s' * (fails != 1)} found.\n", LogMode.Minimal) + # ^^^ Manually applied an extra newline of space. + +## Unit testing all the modules: +def unit_cobraxy() -> None: + import cobraxy as m + import math + import lxml.etree as ET + import utils.general_utils as utils + + #m.ARGS = m.process_args() + + ids = ["react1", "react2", "react3", "react4", "react5"] + metabMap = utils.Model.ENGRO2.getMap() + class_pat = { + "dataset1" :[ + [2.3, 4, 7, 0, 0.01, math.nan, math.nan], + [math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan], + [2.3, 4, 7, 0, 0.01, 5, 9], + [math.nan, math.nan, 2.3, 4, 7, 0, 0.01], + [2.3, 4, 7, math.nan, 2.3, 0, 0.01]], + + "dataset2" :[ + [2.3, 4, 7, math.nan, 2.3, 0, 0.01], + [2.3, 4, 7, 0, 0.01, math.nan, math.nan], + [math.nan, math.nan, 2.3, 4, 7, 0, 0.01], + [2.3, 4, 7, 0, 0.01, 5, 9], + [math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan]] + } + + unitTester = UnitTester("cobraxy", LogMode.Pedantic, False, + UnitTest(m.name_dataset, ["customName", 12], ExactValue("customName")), + UnitTest(m.name_dataset, ["Dataset", 12], ExactValue("Dataset_12")), + + UnitTest(m.fold_change, [0.5, 0.5], ExactValue(0.0)), + UnitTest(m.fold_change, [0, 0.35], ExactValue("-INF")), + UnitTest(m.fold_change, [0.5, 0], ExactValue("INF")), + UnitTest(m.fold_change, [0, 0], ExactValue(0)), + + UnitTest( + m.Arrow(m.Arrow.MAX_W, m.ArrowColor.DownRegulated, isDashed = True).toStyleStr, [], + ExactValue(";stroke:#0000FF;stroke-width:12;stroke-dasharray:5,5")), + + UnitTest(m.computeEnrichment, [metabMap, class_pat, ids], ExactValue(None)), + + UnitTest(m.computePValue, [class_pat["dataset1"][0], class_pat["dataset2"][0]], SatisfiesPredicate(math.isnan)), + + UnitTest(m.reactionIdIsDirectional, ["reactId"], ExactValue(m.ReactionDirection.Unknown)), + UnitTest(m.reactionIdIsDirectional, ["reactId_F"], ExactValue(m.ReactionDirection.Direct)), + UnitTest(m.reactionIdIsDirectional, ["reactId_B"], ExactValue(m.ReactionDirection.Inverse)), + + UnitTest(m.ArrowColor.fromFoldChangeSign, [-2], ExactValue(m.ArrowColor.DownRegulated)), + UnitTest(m.ArrowColor.fromFoldChangeSign, [2], ExactValue(m.ArrowColor.UpRegulated)), + + UnitTest( + m.Arrow(m.Arrow.MAX_W, m.ArrowColor.UpRegulated).styleReactionElements, + [metabMap, "reactId"], + ExactValue(None)), + + UnitTest(m.getArrowBodyElementId, ["reactId"], ExactValue("R_reactId")), + UnitTest(m.getArrowBodyElementId, ["reactId_F"], ExactValue("R_reactId")), + + UnitTest( + m.getArrowHeadElementId, ["reactId"], + 
+## Unit testing all the modules:
+def unit_cobraxy() -> None:
+    import cobraxy as m
+    import math
+    import lxml.etree as ET
+    import utils.general_utils as utils
+
+    #m.ARGS = m.process_args()
+
+    ids = ["react1", "react2", "react3", "react4", "react5"]
+    metabMap = utils.Model.ENGRO2.getMap()
+    class_pat = {
+        "dataset1" :[
+            [2.3, 4, 7, 0, 0.01, math.nan, math.nan],
+            [math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan],
+            [2.3, 4, 7, 0, 0.01, 5, 9],
+            [math.nan, math.nan, 2.3, 4, 7, 0, 0.01],
+            [2.3, 4, 7, math.nan, 2.3, 0, 0.01]],
+
+        "dataset2" :[
+            [2.3, 4, 7, math.nan, 2.3, 0, 0.01],
+            [2.3, 4, 7, 0, 0.01, math.nan, math.nan],
+            [math.nan, math.nan, 2.3, 4, 7, 0, 0.01],
+            [2.3, 4, 7, 0, 0.01, 5, 9],
+            [math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan]]
+    }
+
+    unitTester = UnitTester("cobraxy", LogMode.Pedantic, False,
+        UnitTest(m.name_dataset, ["customName", 12], ExactValue("customName")),
+        UnitTest(m.name_dataset, ["Dataset", 12], ExactValue("Dataset_12")),
+
+        UnitTest(m.fold_change, [0.5, 0.5], ExactValue(0.0)),
+        UnitTest(m.fold_change, [0, 0.35], ExactValue("-INF")),
+        UnitTest(m.fold_change, [0.5, 0], ExactValue("INF")),
+        UnitTest(m.fold_change, [0, 0], ExactValue(0)),
+
+        UnitTest(
+            m.Arrow(m.Arrow.MAX_W, m.ArrowColor.DownRegulated, isDashed = True).toStyleStr, [],
+            ExactValue(";stroke:#0000FF;stroke-width:12;stroke-dasharray:5,5")),
+
+        UnitTest(m.computeEnrichment, [metabMap, class_pat, ids], ExactValue(None)),
+
+        UnitTest(m.computePValue, [class_pat["dataset1"][0], class_pat["dataset2"][0]], SatisfiesPredicate(math.isnan)),
+
+        UnitTest(m.reactionIdIsDirectional, ["reactId"], ExactValue(m.ReactionDirection.Unknown)),
+        UnitTest(m.reactionIdIsDirectional, ["reactId_F"], ExactValue(m.ReactionDirection.Direct)),
+        UnitTest(m.reactionIdIsDirectional, ["reactId_B"], ExactValue(m.ReactionDirection.Inverse)),
+
+        UnitTest(m.ArrowColor.fromFoldChangeSign, [-2], ExactValue(m.ArrowColor.DownRegulated)),
+        UnitTest(m.ArrowColor.fromFoldChangeSign, [2], ExactValue(m.ArrowColor.UpRegulated)),
+
+        UnitTest(
+            m.Arrow(m.Arrow.MAX_W, m.ArrowColor.UpRegulated).styleReactionElements,
+            [metabMap, "reactId"],
+            ExactValue(None)),
+
+        UnitTest(m.getArrowBodyElementId, ["reactId"], ExactValue("R_reactId")),
+        UnitTest(m.getArrowBodyElementId, ["reactId_F"], ExactValue("R_reactId")),
+
+        UnitTest(
+            m.getArrowHeadElementId, ["reactId"],
+            Many(ExactValue("F_reactId"), ExactValue("B_reactId"))),
+
+        UnitTest(
+            m.getArrowHeadElementId, ["reactId_F"],
+            Many(ExactValue("F_reactId"), ExactValue(""))),
+
+        UnitTest(
+            m.getArrowHeadElementId, ["reactId_B"],
+            Many(ExactValue("B_reactId"), ExactValue(""))),
+
+        UnitTest(
+            m.getElementById, ["reactId_F", metabMap],
+            SatisfiesPredicate(lambda res : res.isErr and isinstance(res.value, utils.Result.ResultErr))),
+
+        UnitTest(
+            m.getElementById, ["F_tyr_L_t", metabMap],
+            SatisfiesPredicate(lambda res : res.isOk and res.unwrap().get("id") == "F_tyr_L_t")),
+    ).testModule()
+
+def unit_rps_generator() -> None:
+    import rps_generator as rps
+    import math
+    import pandas as pd
+    import utils.general_utils as utils
+    dataset = pd.DataFrame({
+        "cell lines" : ["normal", "cancer"],
+        "pyru_vate" : [5.3, 7.01],
+        "glu,cose" : [8.2, 4.0],
+        "unknown" : [3.0, 3.97],
+        "()atp" : [7.05, 8.83],
+    })
+
+    abundancesNormalRaw = {
+        "pyru_vate" : 5.3,
+        "glu,cose" : 8.2,
+        "unknown" : 3.0,
+        "()atp" : 7.05,
+    }
+
+    abundancesNormal = {
+        "pyr" : 5.3,
+        "glc__D" : 8.2,
+        "atp" : 7.05,
+    }
+
+    # TODO: this currently doesn't work due to "the pickle extension problem", see FileFormat class for details.
+    synsDict = utils.readPickle(utils.FilePath("synonyms", utils.FileFormat.PICKLE, prefix = "./local/pickle files"))
+
+    reactionsDict = {
+        "r1" : {
+            "glc__D" : 1
+        },
+
+        "r2" : {
+            "co2" : 2,
+            "pyr" : 3,
+        },
+
+        "r3" : {
+            "atp" : 2,
+            "glc__D" : 4,
+        },
+
+        "r4" : {
+            "atp" : 3,
+        }
+    }
+
+    abundancesNormalEdited = {
+        "pyr" : 5.3,
+        "glc__D" : 8.2,
+        "atp" : 7.05,
+        "co2" : 1,
+    }
+
+    blackList = ["atp"] # No jokes allowed!
+    missingInDataset = ["co2"]
+
+    normalRpsShape = MatchingShape({
+        "r1" : ExactValue(8.2 ** 1),
+        "r2" : ExactValue((1 ** 2) * (5.3 ** 3)),
+        "r3" : ExactValue((8.2 ** 4) * (7.05 ** 2)),
+        "r4" : SatisfiesPredicate(lambda n : math.isnan(n))
+    }, "rps dict")
+
+    UnitTester("rps_generator", LogMode.Pedantic, False,
+        UnitTest(rps.get_abund_data, [dataset, 0], MatchingShape({
+            "pyru_vate" : ExactValue(5.3),
+            "glu,cose" : ExactValue(8.2),
+            "unknown" : ExactValue(3.0),
+            "()atp" : ExactValue(7.05),
+            "name" : ExactValue("normal")
+        }, "abundance series")),
+
+        UnitTest(rps.get_abund_data, [dataset, 1], MatchingShape({
+            "pyru_vate" : ExactValue(7.01),
+            "glu,cose" : ExactValue(4.0),
+            "unknown" : ExactValue(3.97),
+            "()atp" : ExactValue(8.83),
+            "name" : ExactValue("cancer")
+        }, "abundance series")),
+
+        UnitTest(rps.get_abund_data, [dataset, -1], ExactValue(None)),
+
+        UnitTest(rps.check_missing_metab, [reactionsDict, abundancesNormal.copy()], Many(MatchingShape({
+            "pyr" : ExactValue(5.3),
+            "glc__D" : ExactValue(8.2),
+            "atp" : ExactValue(7.05),
+            "co2" : ExactValue(1)
+        }, "updated abundances"), Many(ExactValue("co2")))),
+
+        UnitTest(rps.clean_metabolite_name, ["4,4'-diphenylmethane diisocyanate"], ExactValue("44diphenylmethanediisocyanate")),
+
+        UnitTest(rps.get_metabolite_id, ["tryptophan", synsDict], ExactValue("trp__L")),
+
+        UnitTest(rps.calculate_rps, [reactionsDict, abundancesNormalEdited, blackList, missingInDataset], normalRpsShape),
+
+        UnitTest(rps.rps_for_cell_lines, [dataset, reactionsDict, blackList, synsDict, "", True], Many(normalRpsShape, MatchingShape({
+            "r1" : ExactValue(4.0 ** 1),
+            "r2" : ExactValue((1 ** 2) * (7.01 ** 3)),
+            "r3" : ExactValue((4.0 ** 4) * (8.83 ** 2)),
+            "r4" : SatisfiesPredicate(lambda n : math.isnan(n))
+        }, "rps dict"))),
+
+        #UnitTest(rps.main, [], ExactValue(None)) # Complains about sys argvs
+    ).testModule()
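+# Note on the expected values above (inferred from the test data; rps_generator holds the
+# authoritative formula): a reaction's RPS is the product of its substrates' abundances,
+# each raised to its stoichiometric coefficient, e.g. for "r3":
+#
+#   RPS(r3) = glc__D^4 * atp^2 = (8.2 ** 4) * (7.05 ** 2)
+#
+# Metabolites missing from the dataset (co2) default to an abundance of 1, and reactions
+# whose only substrates are blacklisted (r4, with atp) come out as NaN.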
+def unit_custom_data_generator() -> None:
+    import custom_data_generator as cdg
+
+    UnitTester("custom data generator", LogMode.Pedantic, False,
+        UnitTest(lambda :True, [], ExactValue(True)), # No tests can be done without a model at hand!
+    ).testModule()
+
+def unit_utils() -> None:
+    import utils.general_utils as utils
+    import utils.rule_parsing as ruleUtils
+    import utils.reaction_parsing as reactionUtils
+
+    UnitTester("utils", LogMode.Pedantic, False,
+        UnitTest(utils.CustomErr, ["myMsg", "more details"], MatchingShape({
+            "details" : ExactValue("more details"),
+            "msg" : ExactValue("myMsg"),
+            "id" : ExactValue(0) # this will fail if any custom errors happen anywhere else before!
+        })),
+
+        UnitTest(utils.CustomErr, ["myMsg", "more details", 42], MatchingShape({
+            "details" : ExactValue("more details"),
+            "msg" : ExactValue("myMsg"),
+            "id" : ExactValue(42)
+        })),
+
+        UnitTest(utils.Bool("someArg").check, ["TrUe"], ExactValue(True)),
+        UnitTest(utils.Bool("someArg").check, ["FALse"], ExactValue(False)),
+        UnitTest(utils.Bool("someArg").check, ["foo"], Exists(False)), # should panic!
+
+        UnitTest(utils.Model.ENGRO2.getRules, ["."], IsOfType(dict)),
+        UnitTest(utils.Model.Custom.getRules, [".", ""], Exists(False)), # expected panic
+
+        # rule utilities tests:
+        UnitTest(ruleUtils.parseRuleToNestedList, ["A"], Many(ExactValue("A"))),
+        UnitTest(ruleUtils.parseRuleToNestedList, ["A or B"], Many(ExactValue("A"), ExactValue("B"))),
+        UnitTest(ruleUtils.parseRuleToNestedList, ["A and B"], Many(ExactValue("A"), ExactValue("B"))),
+        UnitTest(ruleUtils.parseRuleToNestedList, ["A foo B"], Exists(False)), # expected panic
+        UnitTest(ruleUtils.parseRuleToNestedList, ["A)"], Exists(False)), # expected panic
+
+        UnitTest(
+            ruleUtils.parseRuleToNestedList, ["A or B"],
+            MatchingShape({ "op" : ExactValue(ruleUtils.RuleOp.OR)})),
+
+        UnitTest(
+            ruleUtils.parseRuleToNestedList, ["A and B"],
+            MatchingShape({ "op" : ExactValue(ruleUtils.RuleOp.AND)})),
+
+        UnitTest(
+            ruleUtils.parseRuleToNestedList, ["A or B and C"],
+            MatchingShape({ "op" : ExactValue(ruleUtils.RuleOp.OR)})),
+
+        UnitTest(
+            ruleUtils.parseRuleToNestedList, ["A or B and C or (D and E)"],
+            Many(
+                ExactValue("A"),
+                Many(ExactValue("B"), ExactValue("C")),
+                Many(ExactValue("D"), ExactValue("E"))
+            )),
+
+        UnitTest(lambda s : ruleUtils.RuleOp(s), ["or"], ExactValue(ruleUtils.RuleOp.OR)),
+        UnitTest(lambda s : ruleUtils.RuleOp(s), ["and"], ExactValue(ruleUtils.RuleOp.AND)),
+        UnitTest(lambda s : ruleUtils.RuleOp(s), ["foo"], Exists(False)), # expected panic
+
+        UnitTest(ruleUtils.RuleOp.isOperator, ["or"], ExactValue(True)),
+        UnitTest(ruleUtils.RuleOp.isOperator, ["and"], ExactValue(True)),
+        UnitTest(ruleUtils.RuleOp.isOperator, ["foo"], ExactValue(False)),
+
+        # reaction utilities tests:
+        UnitTest(reactionUtils.ReactionDir.fromReaction, ["atp <=> adp + pi"], ExactValue(reactionUtils.ReactionDir.REVERSIBLE)),
+        UnitTest(reactionUtils.ReactionDir.fromReaction, ["atp --> adp + pi"], ExactValue(reactionUtils.ReactionDir.FORWARD)),
+        UnitTest(reactionUtils.ReactionDir.fromReaction, ["atp <-- adp + pi"], ExactValue(reactionUtils.ReactionDir.BACKWARD)),
+        UnitTest(reactionUtils.ReactionDir.fromReaction, ["atp ??? adp + pi"], Exists(False)), # should panic
+
+        UnitTest(
+            reactionUtils.create_reaction_dict,
+            [{'shdgd': '2 pyruvate + 1 h2o <=> 1 h2o + 2 acetate', 'sgwrw': '2 co2 + 6 h2o --> 3 atp'}],
+            MatchingShape({
+                "shdgd_B" : MatchingShape({
+                    "acetate" : ExactValue(2),
+                    "h2o" : ExactValue(1),
+                }),
+
+                "shdgd_F" : MatchingShape({
+                    "pyruvate" : ExactValue(2),
+                    "h2o" : ExactValue(1)
+                }),
+
+                "sgwrw" : MatchingShape({
+                    "co2" : ExactValue(2),
+                    "h2o" : ExactValue(6),
+                })
+            }, "reaction dict")),
+    ).testModule()
+
+    rule = "A and B or C or D and (E or F and G) or H"
+    print(f"rule \"{rule}\" should come out as: {ruleUtils.parseRuleToNestedList(rule)}")
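+# Note on the rule tests above (schematic, inferred from the expected values): parseRuleToNestedList
+# gives "and" precedence over "or", nesting the tighter sub-expressions, so that e.g.
+# "A or B and C or (D and E)" comes out as the equivalent of:
+#
+#   OpList(OR) [ "A", OpList(AND) ["B", "C"], OpList(AND) ["D", "E"] ]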
adp + pi"], Exists(False)), # should panic + + UnitTest( + reactionUtils.create_reaction_dict, + [{'shdgd': '2 pyruvate + 1 h2o <=> 1 h2o + 2 acetate', 'sgwrw': '2 co2 + 6 h2o --> 3 atp'}], + MatchingShape({ + "shdgd_B" : MatchingShape({ + "acetate" : ExactValue(2), + "h2o" : ExactValue(1), + }), + + "shdgd_F" : MatchingShape({ + "pyruvate" : ExactValue(2), + "h2o" : ExactValue(1) + }), + + "sgwrw" : MatchingShape({ + "co2" : ExactValue(2), + "h2o" : ExactValue(6), + }) + }, "reaction dict")), + ).testModule() + + rule = "A and B or C or D and (E or F and G) or H" + print(f"rule \"{rule}\" should comes out as: {ruleUtils.parseRuleToNestedList(rule)}") + +def unit_ras_generator() -> None: + import ras_generator as ras + import utils.rule_parsing as ruleUtils + + # Making an alias to mask the name of the inner function and separate the 2 tests: + def opListAlias(op_list, dataset): + ras.ARGS.none = False + return ras.ras_op_list(op_list, dataset) + + ras.ARGS = ras.process_args() + rule = ruleUtils.OpList(ruleUtils.RuleOp.AND) + rule.extend(["foo", "bar", "baz"]) + + dataset = { "foo" : 5, "bar" : 2, "baz" : None } + + UnitTester("ras generator", LogMode.Pedantic, False, + UnitTest(ras.ras_op_list, [rule, dataset], ExactValue(2)), + UnitTest(opListAlias, [rule, dataset], ExactValue(None)), + ).testModule() + +if __name__ == "__main__": + unit_cobraxy() + unit_custom_data_generator() + unit_utils() + unit_ras_generator() \ No newline at end of file
