# Getting Started

Welcome to COBRAxy! This guide will help you get up and running with metabolic flux analysis.

## What is COBRAxy?

COBRAxy is a comprehensive toolkit for metabolic flux analysis that bridges the gap between omics data and biological insights. It provides:

- **Data Integration**: Combine gene expression and metabolite data
- **Metabolic Modeling**: Use constraint-based models for flux analysis
- **Visualization**: Generate interactive pathway maps
- **Statistical Analysis**: Perform enrichment and sensitivity analysis

## Core Concepts

### Reaction Activity Scores (RAS)
RAS quantify how active metabolic reactions are based on gene expression data. COBRAxy computes RAS by:
1. Mapping genes to reactions via GPR (Gene-Protein-Reaction) rules
2. Applying logical operations (AND/OR) based on enzyme complexes
3. Producing activity scores for each reaction in each sample
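
The GPR step can be illustrated with a short Python sketch. It assumes a common convention that this guide does not confirm for COBRAxy: `and` (subunits of an enzyme complex) takes the minimum expression value, and `or` (isoenzymes) takes the sum. The function name `evaluate_gpr` and the example rule are hypothetical.

```python
import re


def evaluate_gpr(rule, expression):
    """Score one reaction from its GPR rule and a {gene: expression} mapping.

    Assumed convention (not taken from COBRAxy itself):
    'and' -> minimum of the operands, 'or' -> sum of the operands.
    The rule is assumed to be well formed; unknown genes raise KeyError.
    """
    # Split the rule into parentheses, operators, and gene identifiers.
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            value += parse_and()
        return value

    def parse_and():
        # 'and' binds more tightly than 'or'.
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            value = min(value, parse_atom())
        return value

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return expression[token]

    return parse_or()


# Hypothetical rule over the gene IDs used in the example table of this guide.
expression = {"HGNC:5": 12.5, "HGNC:10": 3.2, "HGNC:15": 7.9}
ras = evaluate_gpr("(HGNC:5 and HGNC:10) or HGNC:15", expression)
print(ras)
```

Running this once per sample column yields one score per reaction per sample, which is the shape of the RAS table described above.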

### Reaction Propensity Scores (RPS)
RPS indicate metabolic preferences based on metabolite abundance:
1. Map metabolites to reactions as substrates/products
2. Weight by stoichiometry and frequency
3. Compute propensity scores using log-normalized formulas
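
As a rough illustration of these three steps, the sketch below scores a single reaction in a single sample. The formula is an assumption made for this example, not COBRAxy's documented one: each substrate contributes its stoichiometric coefficient times the log of its abundance divided by its frequency, where frequency is taken to mean the number of model reactions that use the metabolite. The names `reaction_propensity`, `substrates`, and `frequency` are hypothetical.

```python
import math


def reaction_propensity(substrates, abundances, frequency):
    """Illustrative propensity score for one reaction in one sample.

    substrates: {metabolite: stoichiometric coefficient}
    abundances: {metabolite: measured abundance in the sample}
    frequency:  {metabolite: number of reactions using it} (assumed meaning)

    Assumed formula: sum over substrates of coeff * log(abundance / frequency).
    """
    score = 0.0
    for metabolite, coeff in substrates.items():
        score += coeff * math.log(abundances[metabolite] / frequency[metabolite])
    return score


# Hypothetical reaction consuming 1 glucose and 2 pyruvate.
substrates = {"glucose": 1, "pyruvate": 2}
abundances = {"glucose": 100.5, "pyruvate": 45.2}
frequency = {"glucose": 1, "pyruvate": 2}
rps = reaction_propensity(substrates, abundances, frequency)
print(rps)
```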

### Flux Sampling
Sample feasible flux distributions using:
- **CBS (Corner-Based Sampling)**: Fast sampling of corner points of the feasible flux space
- **OptGP (Optimized General Parallel sampler)**: Parallel hit-and-run sampling for approximately uniform coverage of the flux space
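
To give a feel for what sampling a feasible flux space means, here is a generic hit-and-run sketch in plain Python. It is not the CBS or OptGP implementation used by COBRAxy. It only shows the basic idea of a random walk that stays inside a region defined by linear constraints `A x <= b`. The two-flux toy region and the function name `hit_and_run` are assumptions made for the example.

```python
import math
import random


def hit_and_run(A, b, start, n_samples, seed=0):
    """Random walk inside the bounded region {x : A x <= b}.

    `start` must lie strictly inside the region.
    """
    rng = random.Random(seed)
    x = list(start)
    samples = []
    for _ in range(n_samples):
        # Draw a random direction on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]

        # Find the interval [t_min, t_max] for which x + t*d stays feasible.
        t_min, t_max = -math.inf, math.inf
        for row, bound in zip(A, b):
            ad = sum(r * c for r, c in zip(row, d))
            slack = bound - sum(r * c for r, c in zip(row, x))
            if ad > 1e-12:
                t_max = min(t_max, slack / ad)
            elif ad < -1e-12:
                t_min = max(t_min, slack / ad)

        # Jump to a uniformly chosen point on that feasible segment.
        t = rng.uniform(t_min, t_max)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(tuple(x))
    return samples


# Toy region: 0 <= v1 <= 10, 0 <= v2 <= 10, v1 + v2 <= 12.
A = [[-1, 0], [0, -1], [1, 0], [0, 1], [1, 1]]
b = [0, 0, 10, 10, 12]
samples = hit_and_run(A, b, start=[1.0, 1.0], n_samples=200)
print(len(samples), samples[0])
```

A real metabolic model adds the steady-state constraint on the stoichiometric matrix and has thousands of dimensions, which is why dedicated samplers are used in practice.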

## Analysis Workflows

COBRAxy supports two main analysis paths:

### 1. Enrichment Analysis Workflow
```text
# Generate activity scores
ras_generator → RAS values
rps_generator → RPS values

# Statistical enrichment analysis
marea → Enriched pathway maps
```

**Use when**: You want to identify significantly altered pathways and create publication-ready maps.

### 2. Flux Simulation Workflow
```text
# Apply constraints to model
ras_generator → RAS values
ras_to_bounds → Constrained model

# Sample flux distributions
flux_simulation → Flux samples
flux_to_map → Final visualizations
```

**Use when**: You want to predict metabolic flux distributions and study network-wide changes.

## Your First Analysis

Let's run a basic analysis with sample data:

### Step 1: Prepare Your Data

You'll need:
- **Gene expression data**: TSV file with genes (rows) × samples (columns)
- **Metabolic model**: SBML file or use built-in models (ENGRO2, Recon)
- **Metabolite data** (optional): TSV file with metabolites (rows) × samples (columns)

### Step 2: Generate Activity Scores

```bash
# Generate RAS from expression data
ras_generator -td $(pwd) \
  -in expression_data.tsv \
  -ra ras_output.tsv \
  -rs ENGRO2
```

### Step 3: Create Pathway Maps

```bash
# Generate enriched pathway maps
marea -td $(pwd) \
  -using_RAS true \
  -input_data ras_output.tsv \
  -choice_map ENGRO2 \
  -gs true \
  -idop pathway_maps
```

### Step 4: View Results

Your analysis will generate:
- **RAS values**: `ras_output.tsv` - Activity scores for each reaction
- **Statistical maps**: `pathway_maps/` - SVG files with enrichment visualization
- **Log files**: Detailed execution logs for troubleshooting

## Built-in Models

COBRAxy includes ready-to-use metabolic models:

| Model | Organism | Reactions | Genes | Description |
|-------|----------|-----------|-------|-------------|
| **ENGRO2** | Human | ~2,000 | ~500 | Focused human metabolism model |
| **Recon** | Human | ~10,000 | ~2,000 | Comprehensive human metabolism |

Models are stored in the `local/` directory and include:
- SBML files
- GPR rules
- Gene mapping tables
- Pathway templates

## Data Formats

### Gene Expression Format
```tsv
Gene_ID	Sample_1	Sample_2	Sample_3
HGNC:5	12.5	8.3	15.7
HGNC:10	3.2	4.1	2.8
HGNC:15	7.9	11.2	6.4
```
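
Before running a tool, it can help to confirm that a file really follows this layout. The sketch below is one possible check using only the Python standard library. The function name `read_expression_table` is hypothetical and not part of COBRAxy, and the checks (tab-separated, one ID column, numeric values) are assumptions drawn from the example above.

```python
import csv
import io


def read_expression_table(handle):
    """Parse a genes-by-samples TSV into (sample_names, {gene_id: [values]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    table = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
        # float() raises ValueError on non-numeric cells.
        table[row[0]] = [float(value) for value in row[1:]]
    return samples, table


# In-memory copy of the example table; a real file would be opened with
# open(path, newline="") instead.
example = (
    "Gene_ID\tSample_1\tSample_2\tSample_3\n"
    "HGNC:5\t12.5\t8.3\t15.7\n"
    "HGNC:10\t3.2\t4.1\t2.8\n"
    "HGNC:15\t7.9\t11.2\t6.4\n"
)
samples, table = read_expression_table(io.StringIO(example))
print(samples, len(table))
```

The same function works for the metabolite table, since it has the same rows-by-samples shape.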

### Metabolite Format
```tsv
Metabolite_ID	Sample_1	Sample_2	Sample_3
glucose	100.5	85.3	120.7
pyruvate	45.2	38.1	52.8
lactate	23.9	41.2	19.4
```

## Command Line vs Python API

COBRAxy offers two usage modes:

### Command Line (Quick Analysis)
```bash
# Simple command-line execution
ras_generator -td $(pwd) -in data.tsv -ra output.tsv -rs ENGRO2
```

### Python API (Programming)
```python
import ras_generator
# Call main function with arguments
ras_generator.main(['-td', '/path', '-in', 'data.tsv', '-ra', 'output.tsv', '-rs', 'ENGRO2'])
```
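
When the same call is repeated for several input files, building the argument list in one place keeps the flags consistent. The helper below is a hypothetical convenience and not part of COBRAxy. It only reuses the four flags shown above, and the final call is left as a comment because it assumes `ras_generator` is importable in your environment.

```python
from pathlib import Path


def build_ras_args(tool_dir, expression_file, output_file, model="ENGRO2"):
    """Return the argument list for ras_generator.main (flags as shown above)."""
    return [
        "-td", str(tool_dir),
        "-in", str(expression_file),
        "-ra", str(output_file),
        "-rs", model,
    ]


# Hypothetical batch: one RAS output per expression file.
inputs = [Path("batch1.tsv"), Path("batch2.tsv")]
jobs = [build_ras_args("/path", f, f.with_name(f.stem + "_ras.tsv")) for f in inputs]
for args in jobs:
    print(args)
    # ras_generator.main(args)  # uncomment once ras_generator is importable
```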

## Next Steps

Now that you understand the basics:

1. **[Quick Start Guide](quickstart.md)** - Complete walkthrough with example data
2. **[Python API Tutorial](tutorials/python-api.md)** - Learn programmatic usage
3. **[Tools Reference](tools/)** - Detailed documentation for each tool
4. **[Examples](examples/)** - Real-world analysis examples

## Need Help?

- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
- **[GitHub Issues](https://github.com/CompBtBs/COBRAxy/issues)** - Report bugs or ask questions
- **[Contributing](contributing.md)** - Help improve COBRAxy