cobraxy: COBRAxy/docs/tutorials/python-api.md comparison

comparison COBRAxy/docs/tutorials/python-api.md @ 492:4ed95023af20 draft

Uploaded

author	francesco_lapi
date	Tue, 30 Sep 2025 14:02:17 +0000
parents
children

comparison

equal deleted inserted replaced

-:7a413a5ec566
+:4ed95023af20
+# Python API Tutorial
+Learn how to use COBRAxy tools programmatically in Python scripts and analysis pipelines.
+## Overview
+This tutorial teaches you to integrate COBRAxy into Python workflows by calling tool main functions directly with parsed arguments.
+**Time required**: ~45 minutes
+**Difficulty**: Intermediate
+**Prerequisites**: Basic Python knowledge, COBRAxy installation
+## Understanding COBRAxy Architecture
+### Tool Structure
+Each COBRAxy tool is a Python module with:
+- `main(args)` function that accepts argument list
+- Command-line argument parsing
+- Self-contained execution logic
+```python
+# General pattern for all tools
+import tool_module
+tool_module.main(['-arg1', 'value1', '-arg2', 'value2'])
+```
+### Available Tools
+| Python Module | Purpose | Key Arguments |
+|---------------|---------|---------------|
+| `ras_generator` | Compute reaction activity scores | `-in`, `-ra`, `-rs` |
+| `rps_generator` | Compute reaction propensity scores | `-id`, `-rp` |
+| `marea` | Statistical pathway analysis | `-input_data`, `-choice_map` |
+| `ras_to_bounds` | Apply RAS constraints to model | `-ir`, `-ms`, `-idop` |
+| `flux_simulation` | Sample metabolic fluxes | `-ms`, `-in`, `-a`, `-ns` |
+| `flux_to_map` | Add flux data to maps | `-if`, `-mp`, `-idop` |
+## Setup Your Environment
+### Import Required Modules
+```python
+import sys
+import os
+from pathlib import Path
+# Add COBRAxy to Python path
+cobraxy_path = "/path/to/COBRAxy"
+sys.path.insert(0, cobraxy_path)
+# Import COBRAxy tools
+import ras_generator
+import rps_generator
+import marea
+import ras_to_bounds
+import flux_simulation
+import flux_to_map
+import metabolicModel2Tabular as model_setting
+```
+### Set Working Directory
+```python
+# Set up working directory
+work_dir = Path("/path/to/analysis")
+work_dir.mkdir(exist_ok=True)
+os.chdir(work_dir)
+# COBRAxy tools expect this parameter
+tool_dir = str(Path(cobraxy_path).absolute())
+```
+## Basic Workflow Example
+### Step 1: Prepare Sample Data
+```python
+import pandas as pd
+import numpy as np
+# Create sample gene expression data
+genes = ['HGNC:5', 'HGNC:10', 'HGNC:15', 'HGNC:25', 'HGNC:30']
+samples = ['Control_1', 'Control_2', 'Treatment_1', 'Treatment_2']
+# Generate random expression values
+np.random.seed(42)
+data = np.random.lognormal(mean=2, sigma=1, size=(len(genes), len(samples)))
+# Create DataFrame
+expression_df = pd.DataFrame(data, index=genes, columns=samples)
+expression_df.index.name = 'Gene_ID'
+# Save to file
+expression_file = work_dir / "expression_data.tsv"
+expression_df.to_csv(expression_file, sep='\t')
+print(f"Created sample data: {expression_file}")
+```
+### Step 2: Extract Model Information
+```python
+# Extract model components (optional, for understanding model structure)
+model_args = [
+'-td', tool_dir,
+'-ms', 'ENGRO2',  # Use built-in ENGRO2 model
+'-idop', str(work_dir / 'model_info')
+]
+try:
+model_setting.main(model_args)
+print("✓ Model information extracted")
+except Exception as e:
+print(f"Model extraction failed: {e}")
+```
+### Step 3: Generate RAS Scores
+```python
+# Generate Reaction Activity Scores
+ras_output = work_dir / "ras_scores.tsv"
+ras_args = [
+'-td', tool_dir,
+'-in', str(expression_file),
+'-ra', str(ras_output),
+'-rs', 'ENGRO2',  # Built-in model
+'-n', 'true'  # Handle missing genes
+]
+try:
+ras_generator.main(ras_args)
+print(f"✓ RAS scores generated: {ras_output}")
+except Exception as e:
+print(f"RAS generation failed: {e}")
+raise
+```
+### Step 4: Generate RPS Scores (Optional)
+```python
+# Create sample metabolite data
+metabolites = ['glucose', 'pyruvate', 'lactate', 'ATP', 'NADH']
+met_data = np.random.lognormal(mean=3, sigma=0.5, size=(len(metabolites), len(samples)))
+met_df = pd.DataFrame(met_data, index=metabolites, columns=samples)
+met_df.index.name = 'Metabolite_ID'
+metabolite_file = work_dir / "metabolite_data.tsv"
+met_df.to_csv(metabolite_file, sep='\t')
+# Generate Reaction Propensity Scores
+rps_output = work_dir / "rps_scores.tsv"
+rps_args = [
+'-td', tool_dir,
+'-id', str(metabolite_file),
+'-rp', str(rps_output)
+]
+try:
+rps_generator.main(rps_args)
+print(f"✓ RPS scores generated: {rps_output}")
+except Exception as e:
+print(f"RPS generation warning: {e}")
+# RPS generation might fail with sample data - that's OK
+```
+### Step 5: Statistical Analysis with MAREA
+```python
+# Create enriched pathway maps
+maps_output = work_dir / "pathway_maps"
+marea_args = [
+'-td', tool_dir,
+'-using_RAS', 'true',
+'-input_data', str(ras_output),
+'-choice_map', 'ENGRO2',
+'-gs', 'true',  # Gene set analysis
+'-idop', str(maps_output)
+]
+try:
+marea.main(marea_args)
+print(f"✓ Pathway maps created: {maps_output}")
+except Exception as e:
+print(f"MAREA analysis failed: {e}")
+```
+### Step 6: Flux Simulation Pipeline
+```python
+# Apply RAS constraints to model
+bounds_output = work_dir / "model_bounds"
+bounds_args = [
+'-td', tool_dir,
+'-ms', 'ENGRO2',
+'-ir', str(ras_output),
+'-rs', 'true',  # Use RAS values
+'-idop', str(bounds_output)
+]
+try:
+ras_to_bounds.main(bounds_args)
+print(f"✓ Model constraints applied: {bounds_output}")
+except Exception as e:
+print(f"Bounds generation failed: {e}")
+raise
+# Sample metabolic fluxes
+flux_output = work_dir / "flux_samples"
+flux_args = [
+'-td', tool_dir,
+'-ms', 'ENGRO2',
+'-in', str(bounds_output / "*.tsv"),  # Will be expanded by tool
+'-a', 'CBS',  # Sampling algorithm
+'-ns', '1000',  # Number of samples
+'-idop', str(flux_output)
+]
+try:
+flux_simulation.main(flux_args)
+print(f"✓ Flux samples generated: {flux_output}")
+except Exception as e:
+print(f"Flux simulation failed: {e}")
+```
+### Step 7: Create Final Visualizations
+```python
+# Add flux data to enriched maps
+final_maps = work_dir / "final_visualizations"
+# Check if we have both maps and flux data
+maps_dir = maps_output
+flux_dir = flux_output
+if maps_dir.exists() and flux_dir.exists():
+flux_to_map_args = [
+'-td', tool_dir,
+'-if', str(flux_dir / "*.tsv"),
+'-mp', str(maps_dir / "*.svg"),
+'-idop', str(final_maps)
+]
+try:
+flux_to_map.main(flux_to_map_args)
+print(f"✓ Final visualizations created: {final_maps}")
+except Exception as e:
+print(f"Final mapping failed: {e}")
+else:
+print("Skipping final visualization - missing input files")
+```
+## Advanced Usage Patterns
+### Error Handling and Validation
+```python
+def run_cobraxy_tool(tool_module, args, description):
+"""Helper function to run COBRAxy tools with error handling."""
+try:
+print(f"Running {description}...")
+tool_module.main(args)
+print(f"✓ {description} completed successfully")
+return True
+except Exception as e:
+print(f"✗ {description} failed: {e}")
+return False
+# Usage
+success = run_cobraxy_tool(
+ras_generator,
+ras_args,
+"RAS generation"
+)
+if not success:
+print("Pipeline stopped due to error")
+exit(1)
+```
+### Batch Processing Multiple Datasets
+```python
+def process_dataset(dataset_path, output_dir):
+"""Process a single dataset through COBRAxy pipeline."""
+dataset_name = dataset_path.stem
+out_dir = Path(output_dir) / dataset_name
+out_dir.mkdir(exist_ok=True)
+# Generate RAS
+ras_file = out_dir / "ras_scores.tsv"
+ras_args = [
+'-td', tool_dir,
+'-in', str(dataset_path),
+'-ra', str(ras_file),
+'-rs', 'ENGRO2'
+]
+if run_cobraxy_tool(ras_generator, ras_args, f"RAS for {dataset_name}"):
+# Continue with MAREA analysis
+maps_dir = out_dir / "maps"
+marea_args = [
+'-td', tool_dir,
+'-using_RAS', 'true',
+'-input_data', str(ras_file),
+'-choice_map', 'ENGRO2',
+'-idop', str(maps_dir)
+]
+run_cobraxy_tool(marea, marea_args, f"MAREA for {dataset_name}")
+return out_dir
+# Process multiple datasets
+datasets = [
+"/path/to/dataset1.tsv",
+"/path/to/dataset2.tsv",
+"/path/to/dataset3.tsv"
+]
+results = []
+for dataset in datasets:
+result_dir = process_dataset(Path(dataset), work_dir / "batch_results")
+results.append(result_dir)
+print(f"Processed {len(results)} datasets")
+```
+### Custom Analysis Pipelines
+```python
+class COBRAxyPipeline:
+"""Custom COBRAxy analysis pipeline."""
+def __init__(self, tool_dir, work_dir):
+self.tool_dir = tool_dir
+self.work_dir = Path(work_dir)
+self.work_dir.mkdir(exist_ok=True)
+def run_enrichment_analysis(self, expression_file, model='ENGRO2'):
+"""Run enrichment-focused analysis."""
+# Generate RAS
+ras_file = self.work_dir / "ras_scores.tsv"
+ras_args = ['-td', self.tool_dir, '-in', str(expression_file),
+'-ra', str(ras_file), '-rs', model]
+if not run_cobraxy_tool(ras_generator, ras_args, "RAS generation"):
+return None
+# Run MAREA
+maps_dir = self.work_dir / "enrichment_maps"
+marea_args = ['-td', self.tool_dir, '-using_RAS', 'true',
+'-input_data', str(ras_file), '-choice_map', model,
+'-gs', 'true', '-idop', str(maps_dir)]
+if run_cobraxy_tool(marea, marea_args, "MAREA analysis"):
+return maps_dir
+return None
+def run_flux_analysis(self, expression_file, model='ENGRO2', n_samples=1000):
+"""Run flux sampling analysis."""
+# Generate RAS and apply bounds
+ras_file = self.work_dir / "ras_scores.tsv"
+bounds_dir = self.work_dir / "bounds"
+flux_dir = self.work_dir / "flux_samples"
+# RAS generation
+ras_args = ['-td', self.tool_dir, '-in', str(expression_file),
+'-ra', str(ras_file), '-rs', model]
+if not run_cobraxy_tool(ras_generator, ras_args, "RAS generation"):
+return None
+# Apply bounds
+bounds_args = ['-td', self.tool_dir, '-ms', model, '-ir', str(ras_file),
+'-rs', 'true', '-idop', str(bounds_dir)]
+if not run_cobraxy_tool(ras_to_bounds, bounds_args, "Bounds application"):
+return None
+# Flux sampling
+flux_args = ['-td', self.tool_dir, '-ms', model,
+'-in', str(bounds_dir / "*.tsv"),
+'-a', 'CBS', '-ns', str(n_samples), '-idop', str(flux_dir)]
+if run_cobraxy_tool(flux_simulation, flux_args, "Flux simulation"):
+return flux_dir
+return None
+# Usage
+pipeline = COBRAxyPipeline(tool_dir, work_dir / "custom_analysis")
+# Run enrichment analysis
+enrichment_results = pipeline.run_enrichment_analysis(expression_file)
+if enrichment_results:
+print(f"Enrichment analysis completed: {enrichment_results}")
+# Run flux analysis
+flux_results = pipeline.run_flux_analysis(expression_file, n_samples=500)
+if flux_results:
+print(f"Flux analysis completed: {flux_results}")
+```
+## Integration with Data Analysis Libraries
+### Pandas Integration
+```python
+# Read COBRAxy results back into pandas
+ras_df = pd.read_csv(ras_output, sep='\t', index_col=0)
+print(f"RAS data shape: {ras_df.shape}")
+print(f"Sample statistics:\n{ras_df.describe()}")
+# Filter highly variable reactions
+ras_std = ras_df.std(axis=1)
+variable_reactions = ras_std.nlargest(20).index
+print(f"Most variable reactions: {list(variable_reactions)}")
+```
+### Matplotlib Visualization
+```python
+import matplotlib.pyplot as plt
+import seaborn as sns
+# Visualize RAS distributions
+plt.figure(figsize=(12, 8))
+sns.heatmap(ras_df.iloc[:50], cmap='RdBu_r', center=0, cbar_kws={'label': 'RAS Score'})
+plt.title('Reaction Activity Scores (Top 50 Reactions)')
+plt.xlabel('Samples')
+plt.ylabel('Reactions')
+plt.tight_layout()
+plt.savefig(work_dir / 'ras_heatmap.png', dpi=300)
+plt.show()
+```
+## Best Practices
+### 1. Environment Management
+```python
+# Use pathlib for cross-platform compatibility
+from pathlib import Path
+# Use absolute paths
+tool_dir = str(Path(cobraxy_path).absolute())
+work_dir = Path("/analysis").absolute()
+```
+### 2. Error Handling
+```python
+# Always wrap tool calls in try-except
+try:
+ras_generator.main(ras_args)
+except Exception as e:
+print(f"RAS generation failed: {e}")
+# Log details, cleanup, or alternative action
+```
+### 3. Argument Validation
+```python
+def validate_file_exists(filepath):
+"""Validate input file exists."""
+path = Path(filepath)
+if not path.exists():
+raise FileNotFoundError(f"Input file not found: {filepath}")
+return str(path.absolute())
+# Use before calling tools
+expression_file = validate_file_exists(expression_file)
+```
+## Troubleshooting
+### Common Issues
+**Import errors**
+```python
+# Check if COBRAxy path is correct
+import sys
+print("Python path includes:")
+for p in sys.path:
+print(f"  {p}")
+# Add COBRAxy path
+sys.path.insert(0, "/correct/path/to/COBRAxy")
+```
+**Tool execution failures**
+```python
+# Enable verbose output
+import logging
+logging.basicConfig(level=logging.DEBUG)
+# Check working directory
+print(f"Current directory: {os.getcwd()}")
+print(f"Directory contents: {list(Path('.').iterdir())}")
+```
+**File path issues**
+```python
+# Use absolute paths
+ras_args = [
+'-td', str(Path(tool_dir).absolute()),
+'-in', str(Path(expression_file).absolute()),
+'-ra', str(Path(ras_output).absolute()),
+'-rs', 'ENGRO2'
+]
+```
+## Next Steps
+Now that you can use COBRAxy programmatically:
+1. **[Tools Reference](../tools/)** - Detailed parameter documentation
+2. **[Examples](../examples/)** - Real-world analysis scripts
+3. **Build custom analysis pipelines** for your research needs
+4. **Integrate with workflow managers** like Snakemake or Nextflow
+## Resources
+- [COBRApy Documentation](https://cobrapy.readthedocs.io/) - Underlying metabolic modeling library
+- [Pandas Documentation](https://pandas.pydata.org/) - Data manipulation
+- [Matplotlib Gallery](https://matplotlib.org/gallery/) - Visualization examples
+- [Python Pathlib](https://docs.python.org/3/library/pathlib.html) - Modern path handling

Mercurial > repos > bimib > cobraxy

comparison COBRAxy/docs/tutorials/python-api.md @ 492:4ed95023af20 draft