author francesco_lapi
date Tue, 30 Sep 2025 14:02:17 +0000

# Getting Started

Welcome to COBRAxy! This guide will help you get up and running with metabolic flux analysis.

## What is COBRAxy?

COBRAxy is a toolkit for constraint-based metabolic analysis that connects omics data (gene expression, metabolite abundances) to reaction-level scores, predicted fluxes, and annotated pathway maps. It provides:

- **Data Integration**: Combine gene expression and metabolite data
- **Metabolic Modeling**: Use constraint-based models for flux analysis
- **Visualization**: Generate interactive pathway maps
- **Statistical Analysis**: Perform enrichment and sensitivity analysis

## Core Concepts

### Reaction Activity Scores (RAS)
RAS quantify how active metabolic reactions are based on gene expression data. COBRAxy computes RAS by:
1. Mapping genes to reactions via GPR (Gene-Protein-Reaction) rules
2. Resolving the AND/OR operators in each rule: AND joins the subunits of an enzyme complex, OR joins alternative isoenzymes
3. Producing activity scores for each reaction in each sample
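The rule evaluation above can be sketched as a small recursive-descent evaluator. This is a minimal sketch, not COBRAxy's implementation: it assumes the common convention in which AND (complex subunits) takes the minimum of its operands and OR (isoenzymes) takes their sum, and the names `tokenize` and `reaction_activity` are hypothetical.

```python
import re
from typing import Dict, List


def tokenize(rule: str) -> List[str]:
    """Split a GPR string into '(', ')', 'and', 'or' and gene identifiers."""
    return re.findall(r"\(|\)|[^\s()]+", rule)


def reaction_activity(rule: str, expression: Dict[str, float]) -> float:
    """Evaluate a GPR rule for one sample (assumed convention: AND -> min, OR -> sum)."""
    tokens = tokenize(rule)
    pos = 0

    def parse_or() -> float:
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            value += parse_and()  # isoenzymes: each one can catalyse the reaction
        return value

    def parse_and() -> float:
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            value = min(value, parse_atom())  # complex: limited by the scarcest subunit
        return value

    def parse_atom() -> float:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError(f"unbalanced parentheses in {rule!r}")
            pos += 1  # consume ')'
            return value
        return expression[token]  # gene identifier -> expression value

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r} in {rule!r}")
    return result


# Sample_1 values from the gene expression example in "Data Formats"
sample_1 = {"HGNC:5": 12.5, "HGNC:10": 3.2, "HGNC:15": 7.9}
print(reaction_activity("(HGNC:5 and HGNC:10) or HGNC:15", sample_1))
```

Under this convention the example rule evaluates to min(12.5, 3.2) + 7.9, roughly 11.1. AND binds more tightly than OR, matching the usual reading of GPR strings.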

### Reaction Propensity Scores (RPS)
RPS indicate metabolic preferences based on metabolite abundance:
1. Map metabolites to reactions as substrates/products
2. Weight by stoichiometry and frequency
3. Compute propensity scores using log-normalized formulas
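To make the three steps concrete, here is a purely illustrative scoring function. The weighting is an assumption and not necessarily the formula `rps_generator` uses: each substrate contributes its abundance raised to its stoichiometric coefficient (a mass-action-style product, computed in log space), after dividing the abundance by the number of reactions that consume the metabolite so that widely shared metabolites weigh less. The toy reactions and the function names are hypothetical.

```python
import math
from typing import Dict

# Hypothetical toy network: reaction -> {substrate: stoichiometric coefficient}
REACTIONS: Dict[str, Dict[str, float]] = {
    "HEX": {"glucose": 1},
    "LDH": {"pyruvate": 1},
    "PDH": {"pyruvate": 1},
}


def substrate_frequency(reactions: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Count how many reactions consume each metabolite."""
    freq: Dict[str, int] = {}
    for substrates in reactions.values():
        for met in substrates:
            freq[met] = freq.get(met, 0) + 1
    return freq


def propensity_scores(reactions: Dict[str, Dict[str, float]],
                      abundance: Dict[str, float]) -> Dict[str, float]:
    """Illustrative log-scale score: sum of coeff * log(abundance / frequency)."""
    freq = substrate_frequency(reactions)
    return {
        rxn: sum(coeff * math.log(abundance[met] / freq[met])
                 for met, coeff in substrates.items())
        for rxn, substrates in reactions.items()
    }


# Sample_1 abundances from the metabolite example in "Data Formats"
sample_1 = {"glucose": 100.5, "pyruvate": 45.2, "lactate": 23.9}
print(propensity_scores(REACTIONS, sample_1))
```

This sketch scores substrates only, although step 1 above also maps products; pyruvate is split between LDH and PDH, so both reactions get log(45.2 / 2).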

### Flux Sampling
Sample feasible flux distributions using:
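- **CBS (Corner-Based Sampling)**: Samples vertices (corners) of the feasible flux region by optimizing randomly weighted objective functions
- **OptGP (Optimized General Parallel)**: Hit-and-run (ACHR-type) sampler from COBRApy that runs several chains in parallel, designed for near-uniform sampling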
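Both samplers explore the region of fluxes allowed by the model constraints. The sketch below shows the basic hit-and-run step that ACHR-type samplers such as OptGP build on. It is plain hit-and-run on a bounded polytope `A x <= b`, not OptGP or CBS themselves, and `hit_and_run` and the toy constraints are hypothetical. Real flux samplers also enforce the steady-state condition `S v = 0`, typically by sampling in the null space of the stoichiometric matrix; that step is omitted here. For genome-scale models, COBRApy exposes `cobra.sampling.sample(model, n, method="optgp")`.

```python
import math
import random
from typing import List, Sequence


def hit_and_run(A: Sequence[Sequence[float]], b: Sequence[float],
                x0: Sequence[float], n_samples: int, seed: int = 0) -> List[List[float]]:
    """Sample points from the bounded polytope {x : A x <= b}, starting from feasible x0."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        # 1. Pick a random direction uniformly on the unit sphere
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # 2. Find the segment x + t*d that stays inside every constraint
        t_min, t_max = -math.inf, math.inf
        for row, bi in zip(A, b):
            slope = sum(a * v for a, v in zip(row, d))
            slack = bi - sum(a * xi for a, xi in zip(row, x))
            if slope > 1e-12:
                t_max = min(t_max, slack / slope)
            elif slope < -1e-12:
                t_min = max(t_min, slack / slope)
        # 3. Jump to a uniformly random point on that segment
        t = rng.uniform(t_min, t_max)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples


# Toy fluxes v1, v2 with bounds 0 <= v <= 10 and a shared capacity v1 + v2 <= 12
A = [[1, 0], [-1, 0], [0, 1], [0, -1], [1, 1]]
b = [10, 0, 10, 0, 12]
points = hit_and_run(A, b, x0=[1.0, 1.0], n_samples=1000)
```

Every step stays inside the polytope by construction, because the jump length is drawn only from the interval where all constraints hold.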

## Analysis Workflows

COBRAxy supports two main analysis paths:

### 1. Enrichment Analysis Workflow
```text
# Generate activity scores
ras_generator → RAS values
rps_generator → RPS values

# Statistical enrichment analysis  
marea → Enriched pathway maps
```

**Use when**: You want to identify significantly altered pathways and create publication-ready maps.
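The statistical step compares reaction scores between groups of samples. As a rough illustration only, the sketch below computes a per-reaction log2 fold change of mean RAS between two hypothetical sample groups. The metric, the pseudocount, and the two-fold threshold are assumptions; `marea` may use a different fold-change definition together with a statistical test.

```python
import math
from statistics import mean
from typing import Dict, List


def log2_fold_changes(ras: Dict[str, Dict[str, float]], group_a: List[str],
                      group_b: List[str], pseudocount: float = 1e-6) -> Dict[str, float]:
    """log2(mean RAS in group A / mean RAS in group B) for each reaction."""
    result = {}
    for reaction, per_sample in ras.items():
        mean_a = mean(per_sample[s] for s in group_a)
        mean_b = mean(per_sample[s] for s in group_b)
        # The pseudocount avoids division by zero for inactive reactions
        result[reaction] = math.log2((mean_a + pseudocount) / (mean_b + pseudocount))
    return result


# Hypothetical RAS table: reaction -> {sample: score}
ras = {
    "R1": {"S1": 8.0, "S2": 8.0, "S3": 2.0, "S4": 2.0},
    "R2": {"S1": 5.0, "S2": 5.0, "S3": 5.0, "S4": 5.0},
}
fc = log2_fold_changes(ras, group_a=["S1", "S2"], group_b=["S3", "S4"])
# Flag reactions whose activity changes at least two-fold in either direction
altered = [r for r, v in fc.items() if abs(v) >= 1.0]
```

Here R1 is about four times more active in group A (log2 fold change close to 2) and is flagged, while R2 is unchanged.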

### 2. Flux Simulation Workflow  
```text
# Apply constraints to model
ras_generator → RAS values
ras_to_bounds → Constrained model

# Sample flux distributions
flux_simulation → Flux samples
flux_to_map → Final visualizations
```

**Use when**: You want to predict metabolic flux distributions and study network-wide changes.
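One way RAS values can constrain a model is by shrinking each reaction's flux bounds in proportion to its relative activity. The sketch below assumes that scheme (RAS divided by its maximum across samples, then multiplied into both bounds). `scale_bounds` and the toy reactions are hypothetical, and `ras_to_bounds` may use a different rule, for example for reactions without a GPR.

```python
from typing import Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]


def scale_bounds(bounds: Bounds, ras_sample: Dict[str, float],
                 ras_max: Dict[str, float]) -> Bounds:
    """Scale (lower, upper) bounds by RAS / max RAS; reactions without RAS keep their bounds."""
    scaled = {}
    for reaction, (lb, ub) in bounds.items():
        if reaction in ras_sample and ras_max.get(reaction, 0) > 0:
            # Between 0 and 1 when RAS values are non-negative
            factor = ras_sample[reaction] / ras_max[reaction]
            scaled[reaction] = (lb * factor, ub * factor)
        else:
            scaled[reaction] = (lb, ub)  # no gene evidence: bounds left unchanged
    return scaled


bounds = {"R1": (-1000.0, 1000.0), "R2": (0.0, 1000.0), "EX_glc": (-10.0, 1000.0)}
ras_s1 = {"R1": 2.0, "R2": 8.0}
ras_max = {"R1": 8.0, "R2": 8.0}  # maximum over all samples
print(scale_bounds(bounds, ras_s1, ras_max))
```

The resulting per-sample bounds define the constrained model that a sampler such as the one in `flux_simulation` would then explore.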

## Your First Analysis

Let's run a basic analysis with sample data:

### Step 1: Prepare Your Data

You'll need:
- **Gene expression data**: TSV file with genes (rows) × samples (columns)
- **Metabolic model**: SBML file, or one of the built-in models (ENGRO2, Recon)
- **Metabolite data** (optional): TSV file with metabolites (rows) × samples (columns)

### Step 2: Generate Activity Scores

```bash
# Generate RAS from expression data
ras_generator -td $(pwd) \
  -in expression_data.tsv \
  -ra ras_output.tsv \
  -rs ENGRO2
```

### Step 3: Create Pathway Maps

```bash
# Generate enriched pathway maps
marea -td $(pwd) \
  -using_RAS true \
  -input_data ras_output.tsv \
  -choice_map ENGRO2 \
  -gs true \
  -idop pathway_maps
```
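Steps 2 and 3 can also be chained from a Python script. This sketch reuses only the commands and flags shown above and assumes `ras_generator` and `marea` are on your `PATH`. `build_commands` and `run_pipeline` are hypothetical helpers; with `dry_run=True` the commands are printed instead of executed.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def build_commands(workdir: str) -> List[List[str]]:
    """Return the Step 2 and Step 3 commands as written in this guide."""
    return [
        ["ras_generator", "-td", workdir,
         "-in", "expression_data.tsv", "-ra", "ras_output.tsv", "-rs", "ENGRO2"],
        ["marea", "-td", workdir,
         "-using_RAS", "true", "-input_data", "ras_output.tsv",
         "-choice_map", "ENGRO2", "-gs", "true", "-idop", "pathway_maps"],
    ]


def run_pipeline(workdir: str, dry_run: bool = True) -> None:
    for cmd in build_commands(workdir):
        if dry_run:
            print(shlex.join(cmd))  # show the equivalent shell command
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step


# Path.cwd() plays the role of $(pwd) in the shell examples
run_pipeline(str(Path.cwd()))
```

Passing arguments as a list (rather than one shell string) avoids quoting problems when the working directory contains spaces.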

### Step 4: View Results

Your analysis will generate:
- **RAS values**: `ras_output.tsv` - Activity scores for each reaction
- **Statistical maps**: `pathway_maps/` - SVG files with enrichment visualization
- **Log files**: Detailed execution logs for troubleshooting

## Built-in Models

COBRAxy includes ready-to-use metabolic models:

| Model | Organism | Reactions | Genes | Description |
|-------|----------|-----------|-------|-------------|
| **ENGRO2** | Human | ~2,000 | ~500 | Focused human metabolism model |
| **Recon** | Human | ~10,000 | ~2,000 | Comprehensive human metabolism |

Models are stored in the `local/` directory and include:
- SBML files
- GPR rules  
- Gene mapping tables
- Pathway templates

## Data Formats

### Gene Expression Format
```tsv
Gene_ID	Sample_1	Sample_2	Sample_3
HGNC:5	12.5	8.3	15.7
HGNC:10	3.2	4.1	2.8
HGNC:15	7.9	11.2	6.4
```

### Metabolite Format
```tsv
Metabolite_ID	Sample_1	Sample_2	Sample_3
glucose	100.5	85.3	120.7
pyruvate	45.2	38.1	52.8
lactate	23.9	41.2	19.4
```
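Both tables share one layout: an identifier column followed by one numeric column per sample, separated by tabs. A minimal standard-library reader could look like the sketch below. `read_matrix` is a hypothetical helper, not part of COBRAxy; it rejects missing, duplicate, or non-numeric entries so format problems surface before a tool runs.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_matrix(handle: TextIO) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse an identifier x samples TSV into (sample names, {identifier: values})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    rows: Dict[str, List[float]] = {}
    for line_no, fields in enumerate(reader, start=2):
        if not fields:
            continue  # skip blank lines
        identifier, *values = fields
        if len(values) != len(samples):
            raise ValueError(f"line {line_no}: expected {len(samples)} values, got {len(values)}")
        if identifier in rows:
            raise ValueError(f"line {line_no}: duplicate identifier {identifier!r}")
        rows[identifier] = [float(v) for v in values]  # raises ValueError if not numeric
    return samples, rows


example = "Gene_ID\tSample_1\tSample_2\tSample_3\nHGNC:5\t12.5\t8.3\t15.7\nHGNC:10\t3.2\t4.1\t2.8\n"
samples, genes = read_matrix(io.StringIO(example))
```

For a real file, open it with `open("expression_data.tsv", newline="")` and pass the handle to `read_matrix`; the same function reads the metabolite table.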

## Command Line vs Python API

COBRAxy offers two usage modes:

### Command Line (Quick Analysis)
```bash
# Simple command-line execution
ras_generator -td $(pwd) -in data.tsv -ra output.tsv -rs ENGRO2
```

### Python API (Programming)
```python
import os
import ras_generator  # COBRAxy tool module

# main() takes the same arguments as the command line, as a list of strings
ras_generator.main(['-td', os.getcwd(), '-in', 'data.tsv',
                    '-ra', 'output.tsv', '-rs', 'ENGRO2'])
```

## Next Steps

Now that you understand the basics:

1. **[Quick Start Guide](quickstart.md)** - Complete walkthrough with example data
2. **[Python API Tutorial](tutorials/python-api.md)** - Learn programmatic usage
3. **[Tools Reference](tools/)** - Detailed documentation for each tool
4. **[Examples](examples/)** - Real-world analysis examples

## Need Help?

- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
- **[GitHub Issues](https://github.com/CompBtBs/COBRAxy/issues)** - Report bugs or ask questions
- **[Contributing](contributing.md)** - Help improve COBRAxy