comparison COBRAxy/docs/tools/export-metabolic-model.md @ 542:fcdbc81feb45 draft

Uploaded
author francesco_lapi
date Sun, 26 Oct 2025 19:27:41 +0000
parents
children 73f2f7e2be17
541:fa93040a75af 542:fcdbc81feb45
1 # Export Metabolic Model
2
3 Convert tabular data (CSV/TSV) into COBRA metabolic models in several output formats.
4
5 ## Overview
6
7 Export Metabolic Model (`exportMetabolicModel`) converts tabular reaction data into COBRA metabolic models. Use it to build custom models from spreadsheet data; the supported output formats are SBML, JSON, MATLAB, and YAML.
8
9 ## Usage
10
11 ### Command Line
12
```bash
exportMetabolicModel --input model_data.csv \
    --format sbml \
    --output custom_model.xml \
    --out_log conversion.log \
    --tool_dir /path/to/COBRAxy/src
```
20
21 ### Galaxy Interface
22
23 Select "Export Metabolic Model" from the COBRAxy tool suite and configure conversion parameters.
24
25 ## Parameters
26
27 ### Required Parameters
28
29 | Parameter | Flag | Description |
30 |-----------|------|-------------|
31 | Input File | `--input` | Tabular file (CSV/TSV) with model data |
32 | Output Format | `--format` | Model format (sbml, json, mat, yaml) |
33 | Output File | `--output` | Output model file path |
34 | Output Log | `--out_log` | Log file for conversion process |
35
36 ### Optional Parameters
37
38 | Parameter | Flag | Description | Default |
39 |-----------|------|-------------|---------|
40 | Tool Directory | `--tool_dir` | COBRAxy installation directory | Current directory |
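
When the tool is driven from a script, the flags in the two tables above can be assembled programmatically. The following is a minimal sketch, not part of COBRAxy: `build_export_command` and `VALID_FORMATS` are hypothetical names, and the only flags used are the ones documented above.

```python
from typing import List, Optional

# Format names as listed for --format in the table above.
VALID_FORMATS = ("sbml", "json", "mat", "yaml")


def build_export_command(input_path: str, fmt: str, output_path: str,
                         log_path: str,
                         tool_dir: Optional[str] = None) -> List[str]:
    """Assemble an exportMetabolicModel argument list (hypothetical helper)."""
    if fmt not in VALID_FORMATS:
        raise ValueError(f"Unknown format {fmt!r}; use one of {VALID_FORMATS}")
    cmd = [
        "exportMetabolicModel",
        "--input", input_path,
        "--format", fmt,
        "--output", output_path,
        "--out_log", log_path,
    ]
    # --tool_dir is optional; when omitted the tool uses the current directory.
    if tool_dir is not None:
        cmd += ["--tool_dir", tool_dir]
    return cmd


cmd = build_export_command("model_data.csv", "sbml",
                           "custom_model.xml", "conversion.log")
print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with file names that contain spaces.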
41
42 ## Input Format
43
44 ### Tabular Model Data
45
46 The input file holds one reaction per row. The example below shows every supported column; the tables that follow list which columns are required and which are optional.
47
```csv
Reaction_ID,GPR_Rule,Reaction_Formula,Lower_Bound,Upper_Bound,Objective_Coefficient,Medium_Member,Compartment,Subsystem
R00001,GENE1 or GENE2,A + B -> C + D,-1000.0,1000.0,0.0,FALSE,cytosol,Glycolysis
R00002,GENE3 and GENE4,E <-> F,-1000.0,1000.0,0.0,FALSE,mitochondria,TCA_Cycle
EX_glc_e,-,glc_e <->,-1000.0,1000.0,0.0,TRUE,extracellular,Exchange
BIOMASS,GENE5,0.5 A + 0.3 B -> 1 BIOMASS,0.0,1000.0,1.0,FALSE,cytosol,Biomass
```
55
56 ### Required Columns
57
58 | Column | Description | Format |
59 |--------|-------------|--------|
60 | **Reaction_ID** | Unique reaction identifier | String |
61 | **Reaction_Formula** | Stoichiometric equation | Metabolite notation |
62 | **Lower_Bound** | Minimum flux constraint | Numeric |
63 | **Upper_Bound** | Maximum flux constraint | Numeric |
64
65 ### Optional Columns
66
67 | Column | Description | Default |
68 |--------|-------------|---------|
69 | **GPR_Rule** | Gene-protein-reaction association | Empty string |
70 | **Objective_Coefficient** | Biomass/objective weight | 0.0 |
71 | **Medium_Member** | Exchange reaction flag | FALSE |
72 | **Compartment** | Subcellular location | Empty |
73 | **Subsystem** | Metabolic pathway | Empty |
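
The required/optional split above can be checked before running the tool. This is a rough sketch using only the standard library; `load_reaction_rows` is a hypothetical helper, and treating an empty cell the same as a missing column is an assumption, not documented tool behavior.

```python
import csv
import io

# Column names and defaults copied from the two tables above.
REQUIRED = ("Reaction_ID", "Reaction_Formula", "Lower_Bound", "Upper_Bound")
OPTIONAL_DEFAULTS = {
    "GPR_Rule": "",
    "Objective_Coefficient": "0.0",
    "Medium_Member": "FALSE",
    "Compartment": "",
    "Subsystem": "",
}


def load_reaction_rows(handle):
    """Read reaction rows, enforce required columns, fill optional defaults."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")
    rows = []
    for row in reader:
        for column, default in OPTIONAL_DEFAULTS.items():
            # Assumption: an absent column and an empty cell are equivalent.
            if not row.get(column):
                row[column] = default
        rows.append(row)
    return rows


sample = io.StringIO(
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R00001,A + B -> C + D,0.0,1000.0\n"
)
rows = load_reaction_rows(sample)
print(rows[0]["Medium_Member"])
```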
74
75 ## Output Formats
76
77 ### SBML (Systems Biology Markup Language)
78 - **Format**: XML-based standard
79 - **Extension**: `.xml` or `.sbml`
80 - **Use Case**: Interoperability with other tools
81 - **Advantages**: Widely supported, standardized
82
83 ### JSON (JavaScript Object Notation)
84 - **Format**: COBRApy native JSON
85 - **Extension**: `.json`
86 - **Use Case**: Python/COBRApy workflows
87 - **Advantages**: Human-readable, lightweight
88
89 ### MATLAB (.mat)
90 - **Format**: MATLAB workspace format
91 - **Extension**: `.mat`
92 - **Use Case**: MATLAB COBRA Toolbox
93 - **Advantages**: Direct MATLAB compatibility
94
95 ### YAML (YAML Ain't Markup Language)
96 - **Format**: Human-readable data serialization
97 - **Extension**: `.yml` or `.yaml`
98 - **Use Case**: Configuration and documentation
99 - **Advantages**: Most human-readable format
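
The format names passed to `--format` are not always valid file extensions (`sbml` models are usually saved as `.xml`). A small lookup keeps output names consistent. The sketch below is illustrative: `FORMAT_EXTENSIONS` and `output_name` are hypothetical, and where the list above gives two extensions the choice made here is arbitrary.

```python
from pathlib import Path

# Extensions taken from the format descriptions above; for formats with two
# listed extensions (.xml/.sbml, .yml/.yaml) one is picked arbitrarily.
FORMAT_EXTENSIONS = {
    "sbml": ".xml",
    "json": ".json",
    "mat": ".mat",
    "yaml": ".yaml",
}


def output_name(input_path: str, fmt: str) -> str:
    """Derive an output file name from the input name and the chosen format."""
    try:
        extension = FORMAT_EXTENSIONS[fmt]
    except KeyError:
        raise ValueError(f"Unknown format: {fmt}") from None
    return str(Path(input_path).with_suffix(extension))


for fmt in FORMAT_EXTENSIONS:
    print(fmt, "->", output_name("comprehensive_model.csv", fmt))
```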
100
101 ## Reaction Formula Syntax
102
103 ### Standard Notation
```
# Irreversible reaction
A + B -> C + D

# Reversible reaction
A + B <-> C + D

# With stoichiometric coefficients
2 A + 3 B -> 1 C + 4 D

# Compartmentalized metabolites
glc_c + atp_c -> g6p_c + adp_c
```
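
To make the notation concrete, here is a minimal sketch of how such a formula could be turned into a stoichiometry dictionary. It is not the parser used by the tool: `parse_formula` and `parse_side` are hypothetical, and the sketch assumes terms are separated by ` + ` and that a coefficient, when present, is separated from the metabolite ID by whitespace.

```python
from typing import Dict, Tuple


def parse_side(side: str, sign: int) -> Dict[str, float]:
    """Parse one side of a formula; sign is -1 for reactants, +1 for products."""
    stoich: Dict[str, float] = {}
    for term in side.split(" + "):
        parts = term.split()
        if not parts:
            continue  # empty side, as on the right of an exchange reaction
        if len(parts) == 2:
            coefficient, metabolite = float(parts[0]), parts[1]
        elif len(parts) == 1:
            coefficient, metabolite = 1.0, parts[0]
        else:
            raise ValueError(f"Cannot parse term: {term!r}")
        stoich[metabolite] = stoich.get(metabolite, 0.0) + sign * coefficient
    return stoich


def parse_formula(formula: str) -> Tuple[Dict[str, float], bool]:
    """Return (stoichiometry, reversible) for a formula string."""
    # Test "<->" first because it also contains "->".
    if "<->" in formula:
        left, right = formula.split("<->", 1)
        reversible = True
    elif "->" in formula:
        left, right = formula.split("->", 1)
        reversible = False
    else:
        raise ValueError(f"No reaction arrow in: {formula!r}")
    stoich = parse_side(left, -1)
    for metabolite, coefficient in parse_side(right, +1).items():
        stoich[metabolite] = stoich.get(metabolite, 0.0) + coefficient
    return stoich, reversible


print(parse_formula("2 A + 3 B -> 1 C + 4 D"))
```

Reactants carry negative coefficients and products positive ones, which matches the sign convention of a stoichiometric matrix.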
117
118 ### Compartment Suffixes
119 - `_c`: Cytosol
120 - `_m`: Mitochondria
121 - `_e`: Extracellular
122 - `_r`: Endoplasmic reticulum
123 - `_x`: Peroxisome
124 - `_n`: Nucleus
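
The suffix convention can be used to look up a metabolite's compartment from its ID. The mapping below is copied from the list above; `compartment_of` is a hypothetical helper and returns `None` for IDs without a recognized suffix.

```python
from typing import Optional

# Suffix-to-compartment mapping from the list above.
COMPARTMENT_SUFFIXES = {
    "c": "Cytosol",
    "m": "Mitochondria",
    "e": "Extracellular",
    "r": "Endoplasmic reticulum",
    "x": "Peroxisome",
    "n": "Nucleus",
}


def compartment_of(metabolite_id: str) -> Optional[str]:
    """Return the compartment implied by the ID suffix, or None if unknown."""
    base, separator, suffix = metabolite_id.rpartition("_")
    if not separator or not base:
        return None  # no underscore, or nothing before it
    return COMPARTMENT_SUFFIXES.get(suffix)


print(compartment_of("glc_c"))
```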
125
126 ### Exchange Reactions
```
# Import reaction (uptake is negative flux, so it needs Lower_Bound < 0)
EX_glc_e: glc_e <->

# Export reaction (secretion is positive flux, so it needs Upper_Bound > 0)
EX_co2_e: co2_e <->
```
134
135 ## GPR Rule Syntax
136
137 ### Logical Operators
138 - **AND** (written `and` in the examples below): Gene products required together
139 - **OR** (written `or` in the examples below): Alternative gene products
140 - **Parentheses**: Grouping for complex logic
141
142 ### Examples
```
# Single gene
GENE1

# Alternative genes (isozymes)
GENE1 or GENE2 or GENE3

# Required genes (complex)
GENE1 and GENE2

# Complex logic
(GENE1 and GENE2) or (GENE3 and GENE4)
```
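
One way to see what these rules mean is to evaluate them against a set of available genes. The sketch below does this with Python's `ast` module; it is an illustration, not the tool's implementation. `gpr_is_active` is a hypothetical name, gene IDs are assumed to be valid Python identifiers, operators are assumed to be lowercase, and an empty rule (or the `-` placeholder used in the sample table) is assumed to mean "no gene requirement".

```python
import ast
from typing import Set


def gpr_is_active(rule: str, active_genes: Set[str]) -> bool:
    """Evaluate a GPR rule given the set of genes considered present."""
    # Assumption: no rule means the reaction has no gene requirement.
    if rule.strip() in ("", "-"):
        return True
    tree = ast.parse(rule, mode="eval")

    def evaluate(node: ast.AST) -> bool:
        if isinstance(node, ast.BoolOp):
            values = [evaluate(value) for value in node.values]
            # `and` needs every operand, `or` needs at least one.
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id in active_genes
        raise ValueError(f"Unsupported element in GPR rule: {rule!r}")

    return evaluate(tree.body)


rule = "(GENE1 and GENE2) or (GENE3 and GENE4)"
print(gpr_is_active(rule, {"GENE3", "GENE4"}))
```

Walking the syntax tree instead of calling `eval` means that anything other than gene names, `and`, `or`, and parentheses is rejected.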
156
157 ## Examples
158
159 ### Create Basic Model
160
```bash
# Convert simple CSV to SBML model
exportMetabolicModel --input simple_model.csv \
    --format sbml \
    --output simple_model.xml \
    --out_log simple_conversion.log \
    --tool_dir /opt/COBRAxy/src
```
169
170 ### Multi-format Export
171
```bash
# Create models in all supported formats
formats=("sbml" "json" "mat" "yaml")
for fmt in "${formats[@]}"; do
    exportMetabolicModel --input comprehensive_model.csv \
        --format "$fmt" \
        --output "model.$fmt" \
        --out_log "conversion_$fmt.log" \
        --tool_dir /opt/COBRAxy/src
done
```
183
184 ### Custom Model Creation
185
```bash
# Build tissue-specific model from curated data
exportMetabolicModel --input liver_reactions.tsv \
    --format sbml \
    --output liver_model.xml \
    --out_log liver_model.log \
    --tool_dir /opt/COBRAxy/src
```
194
195 ### Model Integration Pipeline
196
```bash
# Extract existing model, modify, and recreate
importMetabolicModel --model ENGRO2 \
    --out_tabular base_model.csv \
    --tool_dir /opt/COBRAxy/src

# Edit base_model.csv with custom reactions/constraints
# and save the result as modified_model.csv

# Create modified model
exportMetabolicModel --input modified_model.csv \
    --format sbml \
    --output custom_model.xml \
    --out_log custom_creation.log \
    --tool_dir /opt/COBRAxy/src
```
212
213 ## Model Validation
214
215 ### Automatic Checks
216
217 The tool performs validation during conversion:
218 - **Stoichiometric Balance**: Reaction mass balance
219 - **Metabolite Consistency**: Compartment assignments
220 - **Bound Validation**: Feasible constraint ranges
221 - **Objective Function**: Valid biomass reaction
222
223 ### Post-conversion Validation
224
```python
import cobra
from cobra.manipulation import check_mass_balance

# Load and validate model
model = cobra.io.read_sbml_model('custom_model.xml')

# Check basic properties
print(f"Reactions: {len(model.reactions)}")
print(f"Metabolites: {len(model.metabolites)}")
print(f"Genes: {len(model.genes)}")

# Test model solvability
solution = model.optimize()
print(f"Growth rate: {solution.objective_value}")

# Validate mass balance (only meaningful if metabolites carry formulas)
unbalanced = check_mass_balance(model)
if unbalanced:
    print("Unbalanced reactions found:", unbalanced)
```
245
246 ## Integration Workflow
247
248 ### Upstream Data Sources
249
250 #### COBRAxy Tools
251 - [Import Metabolic Model](import-metabolic-model.md) - Extract tabular data for modification
252
253 #### External Sources
254 - **Databases**: KEGG, Reactome, BiGG
255 - **Literature**: Manually curated reactions
256 - **Spreadsheets**: User-defined custom models
257
258 ### Downstream Applications
259
260 #### COBRAxy Analysis
261 - [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom model
262 - [Flux Simulation](flux-simulation.md) - Sample fluxes from custom model
263 - [MAREA](marea.md) - Analyze custom pathways
264
265 #### External Tools
266 - **COBRApy**: Python-based analysis
267 - **COBRA Toolbox**: MATLAB analysis
268 - **OptFlux**: Strain design
269 - **Escher**: Pathway visualization
270
271 ### Typical Pipeline
272
```bash
# 1. Start with existing model data
importMetabolicModel --model ENGRO2 \
    --out_tabular base_reactions.csv \
    --tool_dir /opt/COBRAxy/src

# 2. Modify/extend the reaction data
# Edit base_reactions.csv to add tissue-specific reactions
# and save the result as modified_reactions.csv

# 3. Create custom model
exportMetabolicModel --input modified_reactions.csv \
    --format sbml \
    --output tissue_model.xml \
    --out_log tissue_creation.log \
    --tool_dir /opt/COBRAxy/src

# 4. Validate and use custom model
ras_to_bounds --model Custom --input tissue_model.xml \
    --ras_input tissue_expression.tsv \
    --idop tissue_bounds/ \
    --tool_dir /opt/COBRAxy/src

# 5. Perform flux analysis
flux_simulation --model Custom --input tissue_model.xml \
    --bounds tissue_bounds/*.tsv \
    --algorithm CBS --idop tissue_fluxes/ \
    --tool_dir /opt/COBRAxy/src
```
301
302 ## Quality Control
303
304 ### Input Data Validation
305
306 #### Pre-conversion Checks
307 - **Format Consistency**: Verify column headers and data types
308 - **Reaction Completeness**: Check for missing required fields
309 - **Stoichiometric Validity**: Validate reaction formulas
310 - **Bound Feasibility**: Ensure lower ≤ upper bounds
311
312 #### Common Data Issues
```bash
# Check for missing reaction IDs
awk -F',' 'NR>1 && ($1=="" || $1=="NA") {print "Empty ID in line " NR}' input.csv

# Validate reaction directions
awk -F',' 'NR>1 && $3 !~ /->|<->/ {print "Invalid formula: " $1 ", " $3}' input.csv

# Check bound consistency
awk -F',' 'NR>1 && $4>$5 {print "Invalid bounds: " $1 ", LB=" $4 " > UB=" $5}' input.csv
```
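
The `awk` checks split on every comma, so they misread rows with quoted fields that contain commas. As an alternative, the same three checks can be sketched with Python's `csv` module, which handles quoting. `find_issues` is a hypothetical helper; it looks columns up by header name rather than by position.

```python
import csv
import io
from typing import List, Tuple


def find_issues(handle) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for rows that fail basic checks."""
    issues = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        reaction_id = (row.get("Reaction_ID") or "").strip()
        if reaction_id in ("", "NA"):
            issues.append((line_no, "empty reaction ID"))
        # "<->" contains "->", so one test covers both arrows.
        if "->" not in (row.get("Reaction_Formula") or ""):
            issues.append((line_no, "formula has no -> or <->"))
        try:
            lower = float(row["Lower_Bound"])
            upper = float(row["Upper_Bound"])
        except (KeyError, TypeError, ValueError):
            issues.append((line_no, "non-numeric bound"))
        else:
            if lower > upper:
                issues.append((line_no, "lower bound exceeds upper bound"))
    return issues


sample = io.StringIO(
    "Reaction_ID,GPR_Rule,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R1,,A -> B,0,1000\n"
    ",,A <-> B,-1000,1000\n"
    "R3,,A = B,10,5\n"
)
for line_no, message in find_issues(sample):
    print(f"line {line_no}: {message}")
```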
323
324 ### Model Quality Assessment
325
326 #### Structural Properties
327 - **Network Connectivity**: Ensure realistic pathway structure
328 - **Compartmentalization**: Validate transport reactions
329 - **Exchange Reactions**: Verify medium composition
330 - **Biomass Function**: Check objective reaction completeness
331
332 #### Functional Testing
```python
import cobra
from cobra.flux_analysis import find_essential_genes, flux_variability_analysis

# Test model functionality
model = cobra.io.read_sbml_model('custom_model.xml')

# Check growth capability
growth = model.optimize().objective_value
print(f"Maximum growth rate: {growth}")

# Flux Variability Analysis over the whole feasible space
# (fraction_of_optimum=0.0 drops the default requirement of optimal growth,
# so a reaction with zero range is blocked, not merely unused at the optimum)
fva_result = flux_variability_analysis(model, fraction_of_optimum=0.0)
tolerance = 1e-9
blocked_reactions = fva_result[
    (fva_result.minimum.abs() < tolerance) & (fva_result.maximum.abs() < tolerance)
]
print(f"Blocked reactions: {len(blocked_reactions)}")

# Essential gene analysis
essential_genes = find_essential_genes(model)
print(f"Essential genes: {len(essential_genes)}")
```
350
351 ## Tips and Best Practices
352
353 ### Data Preparation
354 - **Consistent Naming**: Use systematic metabolite/reaction IDs
355 - **Compartment Notation**: Follow standard suffixes (_c, _m, _e)
356 - **Balanced Reactions**: Verify mass and charge balance
357 - **Realistic Bounds**: Use physiologically relevant constraints
358
359 ### Model Design
360 - **Modular Structure**: Organize reactions by pathway/subsystem
361 - **Exchange Reactions**: Include all necessary transport processes
362 - **Biomass Function**: Define appropriate growth objective
363 - **Gene Associations**: Add GPR rules where available
364
365 ### Format Selection
366 - **SBML**: Choose for maximum compatibility and sharing
367 - **JSON**: Use for COBRApy-specific workflows
368 - **MATLAB**: Select for COBRA Toolbox integration
369 - **YAML**: Pick for human-readable documentation
370
371 ### Performance Optimization
372 - **Model Size**: Balance comprehensiveness with computational efficiency
373 - **Reaction Pruning**: Remove unnecessary or blocked reactions
374 - **Compartmentalization**: Minimize unnecessary compartments
375 - **Validation**: Test model properties before distribution
376
377 ## Troubleshooting
378
379 ### Common Issues
380
381 **Conversion fails with format error**
382 - Check CSV/TSV column headers and data consistency
383 - Verify reaction formula syntax
384 - Ensure numeric fields contain valid numbers
385
386 **Model is infeasible after conversion**
387 - Check reaction bounds for conflicts
388 - Verify exchange reaction setup
389 - Validate stoichiometric balance
390
391 **Missing metabolites or reactions**
392 - Confirm that all required columns are present in the input
393 - Check for empty rows or malformed data
394 - Validate reaction formula parsing
395
396 ### Error Messages
397
398 | Error | Cause | Solution |
399 |-------|-------|----------|
400 | "Input file not found" | Invalid file path | Check file location and permissions |
401 | "Unknown format" | Invalid output format | Use: sbml, json, mat, or yaml |
402 | "Formula parsing failed" | Malformed reaction equation | Check reaction formula syntax |
403 | "Model infeasible" | Conflicting constraints | Review bounds and exchange reactions |
404
405 ### Performance Issues
406
407 **Slow conversion**
408 - Large input files require more processing time
409 - Complex GPR rules increase parsing overhead
410 - Monitor system memory usage
411
412 **Memory errors**
413 - Reduce model size or split into smaller files
414 - Increase available system memory
415 - Use more efficient data structures
416
417 **Output file corruption**
418 - Ensure sufficient disk space
419 - Check file write permissions
420 - Verify format-specific requirements
421
422 ## Advanced Usage
423
424 ### Batch Model Creation
425
```python
#!/usr/bin/env python3
import subprocess

import pandas as pd


def customize_for_tissue(base_data: pd.DataFrame, tissue: str) -> pd.DataFrame:
    """Placeholder: return the reaction table adapted to one tissue.

    Replace the body with your own filtering or bound adjustments;
    this version returns an unmodified copy of the base table.
    """
    return base_data.copy()


# Create multiple tissue-specific models
tissues = ['liver', 'muscle', 'brain', 'heart']
base_data = pd.read_csv('base_model.csv')

for tissue in tissues:
    # Modify base data for tissue specificity
    tissue_data = customize_for_tissue(base_data, tissue)
    tissue_data.to_csv(f'{tissue}_model.csv', index=False)

    # Convert to SBML; check=True raises if the tool exits with an error
    subprocess.run([
        'exportMetabolicModel',
        '--input', f'{tissue}_model.csv',
        '--format', 'sbml',
        '--output', f'{tissue}_model.xml',
        '--out_log', f'{tissue}_conversion.log',
        '--tool_dir', '/opt/COBRAxy/src'
    ], check=True)
```
450
451 ### Model Merging
452
453 Combine multiple tabular files into comprehensive models:
454
```bash
# Merge core metabolism with tissue-specific pathways
# (tail -n +2 skips the header row of each additional file)
cat core_reactions.csv > combined_model.csv
tail -n +2 tissue_reactions.csv >> combined_model.csv
tail -n +2 disease_reactions.csv >> combined_model.csv

# Create merged model
exportMetabolicModel --input combined_model.csv \
    --format sbml \
    --output comprehensive_model.xml \
    --out_log comprehensive_model.log \
    --tool_dir /opt/COBRAxy/src
```
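
Plain concatenation assumes that every file has the same columns in the same order and that no reaction ID occurs twice. A hedged sketch of a stricter merge is shown below; `merge_tables` is a hypothetical helper, and raising an error on a duplicate ID is one possible policy, not documented tool behavior.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def merge_tables(handles: Iterable) -> Tuple[List[str], List[Dict[str, str]]]:
    """Merge reaction tables, rejecting header mismatches and duplicate IDs."""
    header = None
    merged = []
    seen = set()
    for handle in handles:
        reader = csv.DictReader(handle)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("Column headers differ between input files")
        for row in reader:
            reaction_id = row["Reaction_ID"]
            # Assumed policy: a repeated ID is an error, not an override.
            if reaction_id in seen:
                raise ValueError(f"Duplicate reaction ID: {reaction_id}")
            seen.add(reaction_id)
            merged.append(row)
    return header, merged


core = io.StringIO("Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
                   "R1,A -> B,0,1000\n")
tissue = io.StringIO("Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
                     "R2,B -> C,0,1000\n")
header, merged = merge_tables([core, tissue])
print(len(merged), "reactions merged")
```

The merged rows can then be written out with `csv.DictWriter(out, fieldnames=header)` and passed to `exportMetabolicModel` as a single input file.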
467
468 ### Model Versioning
469
470 Track model versions and changes:
471
```bash
# Version control for model development
git add model_v1.csv
git commit -m "Initial model version"

# Create versioned models
exportMetabolicModel --input model_v1.csv --format sbml \
    --output model_v1.xml --out_log model_v1.log \
    --tool_dir /opt/COBRAxy/src
exportMetabolicModel --input model_v2.csv --format sbml \
    --output model_v2.xml --out_log model_v2.log \
    --tool_dir /opt/COBRAxy/src

# Compare model versions (cobra_compare_models stands in for whichever
# model comparison utility is installed in your environment)
cobra_compare_models model_v1.xml model_v2.xml
```
486
487 ## See Also
488
489 - [Import Metabolic Model](import-metabolic-model.md) - Extract tabular data from existing models
490 - [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom models
491 - [Flux Simulation](flux-simulation.md) - Analyze custom models with flux sampling
492 - [Model Creation Tutorial](/tutorials/custom-model-creation.md)
493 - [COBRA Model Standards](/tutorials/cobra-model-standards.md)