Mercurial > repos > bimib > cobraxy
view COBRAxy/docs/tutorials/python-api.md @ 492:4ed95023af20 draft
Uploaded
author | francesco_lapi |
---|---|
date | Tue, 30 Sep 2025 14:02:17 +0000 |
parents | |
children |
line wrap: on
line source
# Python API Tutorial Learn how to use COBRAxy tools programmatically in Python scripts and analysis pipelines. ## Overview This tutorial teaches you to integrate COBRAxy into Python workflows by calling tool main functions directly with parsed arguments. **Time required**: ~45 minutes **Difficulty**: Intermediate **Prerequisites**: Basic Python knowledge, COBRAxy installation ## Understanding COBRAxy Architecture ### Tool Structure Each COBRAxy tool is a Python module with: - `main(args)` function that accepts argument list - Command-line argument parsing - Self-contained execution logic ```python # General pattern for all tools import tool_module tool_module.main(['-arg1', 'value1', '-arg2', 'value2']) ``` ### Available Tools | Python Module | Purpose | Key Arguments | |---------------|---------|---------------| | `ras_generator` | Compute reaction activity scores | `-in`, `-ra`, `-rs` | | `rps_generator` | Compute reaction propensity scores | `-id`, `-rp` | | `marea` | Statistical pathway analysis | `-input_data`, `-choice_map` | | `ras_to_bounds` | Apply RAS constraints to model | `-ir`, `-ms`, `-idop` | | `flux_simulation` | Sample metabolic fluxes | `-ms`, `-in`, `-a`, `-ns` | | `flux_to_map` | Add flux data to maps | `-if`, `-mp`, `-idop` | ## Setup Your Environment ### Import Required Modules ```python import sys import os from pathlib import Path # Add COBRAxy to Python path cobraxy_path = "/path/to/COBRAxy" sys.path.insert(0, cobraxy_path) # Import COBRAxy tools import ras_generator import rps_generator import marea import ras_to_bounds import flux_simulation import flux_to_map import metabolicModel2Tabular as model_setting ``` ### Set Working Directory ```python # Set up working directory work_dir = Path("/path/to/analysis") work_dir.mkdir(exist_ok=True) os.chdir(work_dir) # COBRAxy tools expect this parameter tool_dir = str(Path(cobraxy_path).absolute()) ``` ## Basic Workflow Example ### Step 1: Prepare Sample Data ```python import pandas as pd import numpy as np # Create sample gene expression data genes = ['HGNC:5', 'HGNC:10', 'HGNC:15', 'HGNC:25', 'HGNC:30'] samples = ['Control_1', 'Control_2', 'Treatment_1', 'Treatment_2'] # Generate random expression values np.random.seed(42) data = np.random.lognormal(mean=2, sigma=1, size=(len(genes), len(samples))) # Create DataFrame expression_df = pd.DataFrame(data, index=genes, columns=samples) expression_df.index.name = 'Gene_ID' # Save to file expression_file = work_dir / "expression_data.tsv" expression_df.to_csv(expression_file, sep='\t') print(f"Created sample data: {expression_file}") ``` ### Step 2: Extract Model Information ```python # Extract model components (optional, for understanding model structure) model_args = [ '-td', tool_dir, '-ms', 'ENGRO2', # Use built-in ENGRO2 model '-idop', str(work_dir / 'model_info') ] try: model_setting.main(model_args) print("✓ Model information extracted") except Exception as e: print(f"Model extraction failed: {e}") ``` ### Step 3: Generate RAS Scores ```python # Generate Reaction Activity Scores ras_output = work_dir / "ras_scores.tsv" ras_args = [ '-td', tool_dir, '-in', str(expression_file), '-ra', str(ras_output), '-rs', 'ENGRO2', # Built-in model '-n', 'true' # Handle missing genes ] try: ras_generator.main(ras_args) print(f"✓ RAS scores generated: {ras_output}") except Exception as e: print(f"RAS generation failed: {e}") raise ``` ### Step 4: Generate RPS Scores (Optional) ```python # Create sample metabolite data metabolites = ['glucose', 'pyruvate', 'lactate', 'ATP', 'NADH'] met_data = np.random.lognormal(mean=3, sigma=0.5, size=(len(metabolites), len(samples))) met_df = pd.DataFrame(met_data, index=metabolites, columns=samples) met_df.index.name = 'Metabolite_ID' metabolite_file = work_dir / "metabolite_data.tsv" met_df.to_csv(metabolite_file, sep='\t') # Generate Reaction Propensity Scores rps_output = work_dir / "rps_scores.tsv" rps_args = [ '-td', tool_dir, '-id', str(metabolite_file), '-rp', str(rps_output) ] try: rps_generator.main(rps_args) print(f"✓ RPS scores generated: {rps_output}") except Exception as e: print(f"RPS generation warning: {e}") # RPS generation might fail with sample data - that's OK ``` ### Step 5: Statistical Analysis with MAREA ```python # Create enriched pathway maps maps_output = work_dir / "pathway_maps" marea_args = [ '-td', tool_dir, '-using_RAS', 'true', '-input_data', str(ras_output), '-choice_map', 'ENGRO2', '-gs', 'true', # Gene set analysis '-idop', str(maps_output) ] try: marea.main(marea_args) print(f"✓ Pathway maps created: {maps_output}") except Exception as e: print(f"MAREA analysis failed: {e}") ``` ### Step 6: Flux Simulation Pipeline ```python # Apply RAS constraints to model bounds_output = work_dir / "model_bounds" bounds_args = [ '-td', tool_dir, '-ms', 'ENGRO2', '-ir', str(ras_output), '-rs', 'true', # Use RAS values '-idop', str(bounds_output) ] try: ras_to_bounds.main(bounds_args) print(f"✓ Model constraints applied: {bounds_output}") except Exception as e: print(f"Bounds generation failed: {e}") raise # Sample metabolic fluxes flux_output = work_dir / "flux_samples" flux_args = [ '-td', tool_dir, '-ms', 'ENGRO2', '-in', str(bounds_output / "*.tsv"), # Will be expanded by tool '-a', 'CBS', # Sampling algorithm '-ns', '1000', # Number of samples '-idop', str(flux_output) ] try: flux_simulation.main(flux_args) print(f"✓ Flux samples generated: {flux_output}") except Exception as e: print(f"Flux simulation failed: {e}") ``` ### Step 7: Create Final Visualizations ```python # Add flux data to enriched maps final_maps = work_dir / "final_visualizations" # Check if we have both maps and flux data maps_dir = maps_output flux_dir = flux_output if maps_dir.exists() and flux_dir.exists(): flux_to_map_args = [ '-td', tool_dir, '-if', str(flux_dir / "*.tsv"), '-mp', str(maps_dir / "*.svg"), '-idop', str(final_maps) ] try: flux_to_map.main(flux_to_map_args) print(f"✓ Final visualizations created: {final_maps}") except Exception as e: print(f"Final mapping failed: {e}") else: print("Skipping final visualization - missing input files") ``` ## Advanced Usage Patterns ### Error Handling and Validation ```python def run_cobraxy_tool(tool_module, args, description): """Helper function to run COBRAxy tools with error handling.""" try: print(f"Running {description}...") tool_module.main(args) print(f"✓ {description} completed successfully") return True except Exception as e: print(f"✗ {description} failed: {e}") return False # Usage success = run_cobraxy_tool( ras_generator, ras_args, "RAS generation" ) if not success: print("Pipeline stopped due to error") exit(1) ``` ### Batch Processing Multiple Datasets ```python def process_dataset(dataset_path, output_dir): """Process a single dataset through COBRAxy pipeline.""" dataset_name = dataset_path.stem out_dir = Path(output_dir) / dataset_name out_dir.mkdir(exist_ok=True) # Generate RAS ras_file = out_dir / "ras_scores.tsv" ras_args = [ '-td', tool_dir, '-in', str(dataset_path), '-ra', str(ras_file), '-rs', 'ENGRO2' ] if run_cobraxy_tool(ras_generator, ras_args, f"RAS for {dataset_name}"): # Continue with MAREA analysis maps_dir = out_dir / "maps" marea_args = [ '-td', tool_dir, '-using_RAS', 'true', '-input_data', str(ras_file), '-choice_map', 'ENGRO2', '-idop', str(maps_dir) ] run_cobraxy_tool(marea, marea_args, f"MAREA for {dataset_name}") return out_dir # Process multiple datasets datasets = [ "/path/to/dataset1.tsv", "/path/to/dataset2.tsv", "/path/to/dataset3.tsv" ] results = [] for dataset in datasets: result_dir = process_dataset(Path(dataset), work_dir / "batch_results") results.append(result_dir) print(f"Processed {len(results)} datasets") ``` ### Custom Analysis Pipelines ```python class COBRAxyPipeline: """Custom COBRAxy analysis pipeline.""" def __init__(self, tool_dir, work_dir): self.tool_dir = tool_dir self.work_dir = Path(work_dir) self.work_dir.mkdir(exist_ok=True) def run_enrichment_analysis(self, expression_file, model='ENGRO2'): """Run enrichment-focused analysis.""" # Generate RAS ras_file = self.work_dir / "ras_scores.tsv" ras_args = ['-td', self.tool_dir, '-in', str(expression_file), '-ra', str(ras_file), '-rs', model] if not run_cobraxy_tool(ras_generator, ras_args, "RAS generation"): return None # Run MAREA maps_dir = self.work_dir / "enrichment_maps" marea_args = ['-td', self.tool_dir, '-using_RAS', 'true', '-input_data', str(ras_file), '-choice_map', model, '-gs', 'true', '-idop', str(maps_dir)] if run_cobraxy_tool(marea, marea_args, "MAREA analysis"): return maps_dir return None def run_flux_analysis(self, expression_file, model='ENGRO2', n_samples=1000): """Run flux sampling analysis.""" # Generate RAS and apply bounds ras_file = self.work_dir / "ras_scores.tsv" bounds_dir = self.work_dir / "bounds" flux_dir = self.work_dir / "flux_samples" # RAS generation ras_args = ['-td', self.tool_dir, '-in', str(expression_file), '-ra', str(ras_file), '-rs', model] if not run_cobraxy_tool(ras_generator, ras_args, "RAS generation"): return None # Apply bounds bounds_args = ['-td', self.tool_dir, '-ms', model, '-ir', str(ras_file), '-rs', 'true', '-idop', str(bounds_dir)] if not run_cobraxy_tool(ras_to_bounds, bounds_args, "Bounds application"): return None # Flux sampling flux_args = ['-td', self.tool_dir, '-ms', model, '-in', str(bounds_dir / "*.tsv"), '-a', 'CBS', '-ns', str(n_samples), '-idop', str(flux_dir)] if run_cobraxy_tool(flux_simulation, flux_args, "Flux simulation"): return flux_dir return None # Usage pipeline = COBRAxyPipeline(tool_dir, work_dir / "custom_analysis") # Run enrichment analysis enrichment_results = pipeline.run_enrichment_analysis(expression_file) if enrichment_results: print(f"Enrichment analysis completed: {enrichment_results}") # Run flux analysis flux_results = pipeline.run_flux_analysis(expression_file, n_samples=500) if flux_results: print(f"Flux analysis completed: {flux_results}") ``` ## Integration with Data Analysis Libraries ### Pandas Integration ```python # Read COBRAxy results back into pandas ras_df = pd.read_csv(ras_output, sep='\t', index_col=0) print(f"RAS data shape: {ras_df.shape}") print(f"Sample statistics:\n{ras_df.describe()}") # Filter highly variable reactions ras_std = ras_df.std(axis=1) variable_reactions = ras_std.nlargest(20).index print(f"Most variable reactions: {list(variable_reactions)}") ``` ### Matplotlib Visualization ```python import matplotlib.pyplot as plt import seaborn as sns # Visualize RAS distributions plt.figure(figsize=(12, 8)) sns.heatmap(ras_df.iloc[:50], cmap='RdBu_r', center=0, cbar_kws={'label': 'RAS Score'}) plt.title('Reaction Activity Scores (Top 50 Reactions)') plt.xlabel('Samples') plt.ylabel('Reactions') plt.tight_layout() plt.savefig(work_dir / 'ras_heatmap.png', dpi=300) plt.show() ``` ## Best Practices ### 1. Environment Management ```python # Use pathlib for cross-platform compatibility from pathlib import Path # Use absolute paths tool_dir = str(Path(cobraxy_path).absolute()) work_dir = Path("/analysis").absolute() ``` ### 2. Error Handling ```python # Always wrap tool calls in try-except try: ras_generator.main(ras_args) except Exception as e: print(f"RAS generation failed: {e}") # Log details, cleanup, or alternative action ``` ### 3. Argument Validation ```python def validate_file_exists(filepath): """Validate input file exists.""" path = Path(filepath) if not path.exists(): raise FileNotFoundError(f"Input file not found: {filepath}") return str(path.absolute()) # Use before calling tools expression_file = validate_file_exists(expression_file) ``` ## Troubleshooting ### Common Issues **Import errors** ```python # Check if COBRAxy path is correct import sys print("Python path includes:") for p in sys.path: print(f" {p}") # Add COBRAxy path sys.path.insert(0, "/correct/path/to/COBRAxy") ``` **Tool execution failures** ```python # Enable verbose output import logging logging.basicConfig(level=logging.DEBUG) # Check working directory print(f"Current directory: {os.getcwd()}") print(f"Directory contents: {list(Path('.').iterdir())}") ``` **File path issues** ```python # Use absolute paths ras_args = [ '-td', str(Path(tool_dir).absolute()), '-in', str(Path(expression_file).absolute()), '-ra', str(Path(ras_output).absolute()), '-rs', 'ENGRO2' ] ``` ## Next Steps Now that you can use COBRAxy programmatically: 1. **[Tools Reference](../tools/)** - Detailed parameter documentation 2. **[Examples](../examples/)** - Real-world analysis scripts 3. **Build custom analysis pipelines** for your research needs 4. **Integrate with workflow managers** like Snakemake or Nextflow ## Resources - [COBRApy Documentation](https://cobrapy.readthedocs.io/) - Underlying metabolic modeling library - [Pandas Documentation](https://pandas.pydata.org/) - Data manipulation - [Matplotlib Gallery](https://matplotlib.org/gallery/) - Visualization examples - [Python Pathlib](https://docs.python.org/3/library/pathlib.html) - Modern path handling