# Tabular to Metabolic Model

Convert tabular data (CSV/TSV) into COBRA metabolic models in various formats.

## Overview

Tabular to Metabolic Model (tabular2MetabolicModel) converts structured tabular data containing reaction information into COBRA metabolic models. It lets you build custom models from spreadsheet data and supports four output formats: SBML, JSON, MATLAB, and YAML.

## Usage

### Command Line

```bash
tabular2MetabolicModel --input model_data.csv \
                       --format sbml \
                       --output custom_model.xml \
                       --out_log conversion.log \
                       --tool_dir /path/to/COBRAxy
```

### Galaxy Interface

Select "Tabular to Metabolic Model" from the COBRAxy tool suite and configure conversion parameters.

## Parameters

### Required Parameters

| Parameter | Flag | Description |
|-----------|------|-------------|
| Input File | `--input` | Tabular file (CSV/TSV) with model data |
| Output Format | `--format` | Model format (sbml, json, mat, yaml) |
| Output File | `--output` | Output model file path |
| Output Log | `--out_log` | Log file for conversion process |

### Optional Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Tool Directory | `--tool_dir` | COBRAxy installation directory | Current directory |

## Input Format

### Tabular Model Data

The input file describes one reaction per row. The example below uses every supported column; only the columns listed under Required Columns are mandatory:

```csv
Reaction_ID,GPR_Rule,Reaction_Formula,Lower_Bound,Upper_Bound,Objective_Coefficient,Medium_Member,Compartment,Subsystem
R00001,GENE1 or GENE2,A + B -> C + D,-1000.0,1000.0,0.0,FALSE,cytosol,Glycolysis
R00002,GENE3 and GENE4,E <-> F,-1000.0,1000.0,0.0,FALSE,mitochondria,TCA_Cycle
EX_glc_e,-,glc_e <->,-1000.0,1000.0,0.0,TRUE,extracellular,Exchange
BIOMASS,GENE5,0.5 A + 0.3 B -> 1 BIOMASS,0.0,1000.0,1.0,FALSE,cytosol,Biomass
```

### Required Columns

| Column | Description | Format |
|--------|-------------|--------|
| **Reaction_ID** | Unique reaction identifier | String |
| **Reaction_Formula** | Stoichiometric equation | Metabolite notation |
| **Lower_Bound** | Minimum flux constraint | Numeric |
| **Upper_Bound** | Maximum flux constraint | Numeric |

### Optional Columns

| Column | Description | Default |
|--------|-------------|---------|
| **GPR_Rule** | Gene-protein-reaction association | Empty string |
| **Objective_Coefficient** | Biomass/objective weight | 0.0 |
| **Medium_Member** | Exchange reaction flag | FALSE |
| **Compartment** | Subcellular location | Empty |
| **Subsystem** | Metabolic pathway | Empty |

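
The column rules in the two tables above can be checked before running the tool. The following is a minimal sketch using only the Python standard library; the function name `load_reaction_rows` is hypothetical, and the way defaults are filled in here is an assumption based on the tables, not a description of what tabular2MetabolicModel does internally.

```python
import csv
import io

REQUIRED = ["Reaction_ID", "Reaction_Formula", "Lower_Bound", "Upper_Bound"]
# Defaults copied from the Optional Columns table.
OPTIONAL_DEFAULTS = {
    "GPR_Rule": "",
    "Objective_Coefficient": "0.0",
    "Medium_Member": "FALSE",
    "Compartment": "",
    "Subsystem": "",
}


def load_reaction_rows(handle, delimiter=","):
    """Read reaction rows, rejecting files that lack a required column."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")
    rows = []
    for row in reader:
        for col, default in OPTIONAL_DEFAULTS.items():
            # Fill optional columns that are absent or left empty.
            if not row.get(col):
                row[col] = default
        rows.append(row)
    return rows


# A file with only the required columns is accepted.
sample = io.StringIO(
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R00001,A + B -> C + D,-1000.0,1000.0\n"
)
rows = load_reaction_rows(sample)
print(rows[0]["Reaction_ID"], rows[0]["Medium_Member"])
```

For TSV input, pass `delimiter="\t"`.
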
## Output Formats

### SBML (Systems Biology Markup Language)
- **Format**: XML-based standard
- **Extension**: `.xml` or `.sbml`
- **Use Case**: Interoperability with other tools
- **Advantages**: Widely supported, standardized

### JSON (JavaScript Object Notation)
- **Format**: COBRApy native JSON
- **Extension**: `.json`
- **Use Case**: Python/COBRApy workflows
- **Advantages**: Human-readable, lightweight

### MATLAB (.mat)
- **Format**: MATLAB workspace format
- **Extension**: `.mat`
- **Use Case**: MATLAB COBRA Toolbox
- **Advantages**: Direct MATLAB compatibility

### YAML (YAML Ain't Markup Language)
- **Format**: Human-readable data serialization
- **Extension**: `.yml` or `.yaml`
- **Use Case**: Configuration and documentation
- **Advantages**: Most human-readable format

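
If a model is already loaded in Python, the same four formats can be written with COBRApy's own I/O functions. The sketch below is an illustration under that assumption, not the tool's implementation: `output_path` and `write_model` are hypothetical helper names, and the extension picked for each format is just one of the options listed above.

```python
from pathlib import Path

# Keys follow the --format values; each extension is one of those listed above.
EXTENSIONS = {"sbml": ".xml", "json": ".json", "mat": ".mat", "yaml": ".yaml"}


def output_path(stem, fmt):
    """Return a file name whose extension matches the requested format."""
    if fmt not in EXTENSIONS:
        raise ValueError(f"Unknown format: {fmt!r}")
    return Path(stem).with_suffix(EXTENSIONS[fmt])


def write_model(model, fmt, stem):
    """Write a COBRApy model in the requested format and return the path."""
    # Imported lazily so output_path() can be used without COBRApy installed.
    import cobra.io

    writers = {
        "sbml": cobra.io.write_sbml_model,
        "json": cobra.io.save_json_model,
        "mat": cobra.io.save_matlab_model,
        "yaml": cobra.io.save_yaml_model,
    }
    path = output_path(stem, fmt)
    writers[fmt](model, str(path))
    return path


print(output_path("liver_model", "sbml"))
```

Note that `Path.with_suffix` replaces anything after the last dot, so pass a stem without an extension.
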
## Reaction Formula Syntax

### Standard Notation
```
# Irreversible reaction
A + B -> C + D

# Reversible reaction
A + B <-> C + D

# With stoichiometric coefficients
2 A + 3 B -> 1 C + 4 D

# Compartmentalized metabolites
glc_c + atp_c -> g6p_c + adp_c
```

### Compartment Suffixes
- `_c`: Cytosol
- `_m`: Mitochondria
- `_e`: Extracellular
- `_r`: Endoplasmic reticulum
- `_x`: Peroxisome
- `_n`: Nucleus

### Exchange Reactions
```
# Import reaction
EX_glc_e: glc_e <->

# Export reaction
EX_co2_e: co2_e <->
```

Both reactions have the same reversible form, so the formula alone does not fix the direction. Under the usual COBRA sign convention the bounds do: a negative `Lower_Bound` permits uptake and a positive `Upper_Bound` permits secretion.

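
To make the notation concrete, here is a small parser for formulas of the form shown above. It is a sketch under stated assumptions (terms separated by ` + `, an optional numeric coefficient followed by whitespace, metabolite IDs without spaces) and is not the parser used by tabular2MetabolicModel; `parse_formula` and `compartment_of` are hypothetical names.

```python
import re

# Suffix meanings taken from the Compartment Suffixes list.
COMPARTMENTS = {
    "c": "Cytosol",
    "m": "Mitochondria",
    "e": "Extracellular",
    "r": "Endoplasmic reticulum",
    "x": "Peroxisome",
    "n": "Nucleus",
}


def parse_formula(formula):
    """Return (stoichiometry, reversible) for a formula such as '2 A + B -> C'."""
    # Test '<->' first because '->' is a substring of it.
    if "<->" in formula:
        left, right = formula.split("<->", 1)
        reversible = True
    elif "->" in formula:
        left, right = formula.split("->", 1)
        reversible = False
    else:
        raise ValueError(f"No reaction arrow in formula: {formula!r}")

    stoichiometry = {}
    # Substrates get negative coefficients, products positive ones.
    for side, sign in ((left, -1.0), (right, 1.0)):
        for term in side.split(" + "):
            term = term.strip()
            if not term:
                continue  # empty side, as in exchange reactions
            match = re.fullmatch(r"(?:(\d+(?:\.\d+)?)\s+)?(\S+)", term)
            if match is None:
                raise ValueError(f"Cannot parse term: {term!r}")
            coefficient = float(match.group(1) or 1.0)
            metabolite = match.group(2)
            stoichiometry[metabolite] = (
                stoichiometry.get(metabolite, 0.0) + sign * coefficient
            )
    return stoichiometry, reversible


def compartment_of(metabolite):
    """Map a suffix such as '_c' to its compartment name, or None if unknown."""
    if "_" not in metabolite:
        return None
    return COMPARTMENTS.get(metabolite.rsplit("_", 1)[-1])


print(parse_formula("2 A + 3 B -> 1 C + 4 D"))
print(parse_formula("glc_e <->"))
print(compartment_of("glc_e"))
```
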
## GPR Rule Syntax

### Logical Operators
- **AND** (written `and`): Gene products required together
- **OR** (written `or`): Alternative gene products
- **Parentheses**: Grouping for complex logic

### Examples
```
# Single gene
GENE1

# Alternative genes (isozymes)
GENE1 or GENE2 or GENE3

# Required genes (protein complex)
GENE1 and GENE2

# Complex logic
(GENE1 and GENE2) or (GENE3 and GENE4)
```

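
The logic of these rules can be illustrated with a short evaluator. This sketch relies on the fact that lowercase `and`, `or`, and parentheses are also valid Python syntax, so the standard `ast` module can parse a rule without executing it. It assumes gene IDs are valid Python identifiers (purely numeric IDs or IDs containing `:` or `.` would need a dedicated tokenizer), and treating an empty rule as active is an assumption as well. `gpr_is_active` is a hypothetical name, not part of COBRAxy.

```python
import ast


def gpr_is_active(rule, expressed_genes):
    """Evaluate a GPR rule against a set of expressed gene IDs."""
    if not rule.strip():
        return True  # assumption: a reaction without a rule is unconstrained

    def evaluate(node):
        if isinstance(node, ast.BoolOp):
            results = [evaluate(value) for value in node.values]
            # 'and' needs every operand, 'or' needs at least one.
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.Name):
            return node.id in expressed_genes
        raise ValueError(f"Unsupported token in GPR rule: {rule!r}")

    # mode="eval" parses a single expression; nothing is executed.
    tree = ast.parse(rule.strip(), mode="eval")
    return evaluate(tree.body)


expressed = {"GENE3", "GENE4"}
print(gpr_is_active("(GENE1 and GENE2) or (GENE3 and GENE4)", expressed))
print(gpr_is_active("GENE1 and GENE2", expressed))
```
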
## Examples

### Create Basic Model

```bash
# Convert simple CSV to SBML model
tabular2MetabolicModel --input simple_model.csv \
                       --format sbml \
                       --output simple_model.xml \
                       --out_log simple_conversion.log
```

### Multi-format Export

```bash
# Create models in all supported formats
formats=("sbml" "json" "mat" "yaml")
for fmt in "${formats[@]}"; do
  tabular2MetabolicModel --input comprehensive_model.csv \
                         --format "$fmt" \
                         --output "model.$fmt" \
                         --out_log "conversion_$fmt.log"
done
```

### Custom Model Creation

```bash
# Build tissue-specific model from curated data
tabular2MetabolicModel --input liver_reactions.tsv \
                       --format sbml \
                       --output liver_model.xml \
                       --out_log liver_model.log \
                       --tool_dir /opt/COBRAxy
```

### Model Integration Pipeline

```bash
# Extract existing model, modify, and recreate
metabolicModel2Tabular --model ENGRO2 --out_tabular base_model.csv

# Copy base_model.csv to modified_model.csv and edit it with custom reactions/constraints

# Create modified model
tabular2MetabolicModel --input modified_model.csv \
                       --format sbml \
                       --output custom_model.xml \
                       --out_log custom_creation.log
```

## Model Validation

### Automatic Checks

The tool performs validation during conversion:
- **Stoichiometric Balance**: Reaction mass balance
- **Metabolite Consistency**: Compartment assignments
- **Bound Validation**: Feasible constraint ranges
- **Objective Function**: Valid biomass reaction

### Post-conversion Validation

```python
import cobra
from cobra.manipulation import check_mass_balance

# Load and validate model
model = cobra.io.read_sbml_model('custom_model.xml')

# Check basic properties
print(f"Reactions: {len(model.reactions)}")
print(f"Metabolites: {len(model.metabolites)}")
print(f"Genes: {len(model.genes)}")

# Test model solvability
solution = model.optimize()
print(f"Growth rate: {solution.objective_value}")

# Validate mass balance (only informative when metabolites carry formulas)
unbalanced = check_mass_balance(model)
if unbalanced:
    print("Unbalanced reactions found:", unbalanced)
```

## Integration Workflow

### Upstream Data Sources

#### COBRAxy Tools
- [Metabolic Model Setting](metabolic-model-setting.md) - Extract tabular data for modification

#### External Sources
- **Databases**: KEGG, Reactome, BiGG
- **Literature**: Manually curated reactions
- **Spreadsheets**: User-defined custom models

### Downstream Applications

#### COBRAxy Analysis
- [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom model
- [Flux Simulation](flux-simulation.md) - Sample fluxes from custom model
- [MAREA](marea.md) - Analyze custom pathways

#### External Tools
- **COBRApy**: Python-based analysis
- **COBRA Toolbox**: MATLAB analysis
- **OptFlux**: Strain design
- **Escher**: Pathway visualization

### Typical Pipeline

```bash
# 1. Start with existing model data
metabolicModel2Tabular --model ENGRO2 \
                       --out_tabular base_reactions.csv

# 2. Modify/extend the reaction data
# Copy base_reactions.csv to modified_reactions.csv and add tissue-specific reactions

# 3. Create custom model
tabular2MetabolicModel --input modified_reactions.csv \
                       --format sbml \
                       --output tissue_model.xml \
                       --out_log tissue_creation.log

# 4. Validate and use custom model
ras_to_bounds --model Custom --input tissue_model.xml \
              --ras_input tissue_expression.tsv \
              --idop tissue_bounds/

# 5. Perform flux analysis
flux_simulation --model Custom --input tissue_model.xml \
                --bounds tissue_bounds/*.tsv \
                --algorithm CBS --idop tissue_fluxes/
```

## Quality Control

### Input Data Validation

#### Pre-conversion Checks
- **Format Consistency**: Verify column headers and data types
- **Reaction Completeness**: Check for missing required fields
- **Stoichiometric Validity**: Validate reaction formulas
- **Bound Feasibility**: Ensure lower ≤ upper bounds

#### Common Data Issues
```bash
# Check for missing reaction IDs
awk -F',' 'NR>1 && ($1=="" || $1=="NA") {print "Empty ID in line " NR}' input.csv

# Validate reaction directions
awk -F',' 'NR>1 && $3 !~ /->|<->/ {print "Invalid formula: " $1 ", " $3}' input.csv

# Check bound consistency
awk -F',' 'NR>1 && $4>$5 {print "Invalid bounds: " $1 ", LB=" $4 " > UB=" $5}' input.csv
```

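
The `awk` checks above split on every comma and address columns by position, so they can misfire when a field is quoted and contains a comma, or when the column order differs from the example. A sketch of the same three checks using Python's `csv` module, which handles quoting and looks columns up by name, might look like this (`check_rows` is a hypothetical helper, not part of COBRAxy):

```python
import csv
import io
import re


def check_rows(handle, delimiter=","):
    """Return a list of human-readable problems found in the reaction table."""
    problems = []
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        line_no = reader.line_num  # source line of the row just read
        reaction_id = (row.get("Reaction_ID") or "").strip()
        if reaction_id in ("", "NA"):
            problems.append(f"line {line_no}: empty Reaction_ID")
        formula = row.get("Reaction_Formula") or ""
        if not re.search(r"<->|->", formula):
            problems.append(f"line {line_no}: invalid formula {formula!r}")
        try:
            lower = float(row.get("Lower_Bound") or "")
            upper = float(row.get("Upper_Bound") or "")
        except ValueError:
            problems.append(f"line {line_no}: non-numeric bound")
            continue
        if lower > upper:
            problems.append(f"line {line_no}: LB={lower} > UB={upper}")
    return problems


sample = io.StringIO(
    'Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound,Subsystem\n'
    'R1,A -> B,0,1000,"Glycolysis, upper"\n'
    'R2,A => B,0,1000,Glycolysis\n'
    'R3,B <-> C,10,-10,TCA_Cycle\n'
)
for problem in check_rows(sample):
    print(problem)
```

The first row passes even though its `Subsystem` value contains a comma; the second and third rows are reported for a malformed arrow and inverted bounds respectively.
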
### Model Quality Assessment

#### Structural Properties
- **Network Connectivity**: Ensure realistic pathway structure
- **Compartmentalization**: Validate transport reactions
- **Exchange Reactions**: Verify medium composition
- **Biomass Function**: Check objective reaction completeness

#### Functional Testing
```python
import cobra

# Test model functionality
model = cobra.io.read_sbml_model('custom_model.xml')

# Check growth capability
growth = model.optimize().objective_value
print(f"Maximum growth rate: {growth}")

# Flux Variability Analysis
fva_result = cobra.flux_analysis.flux_variability_analysis(model)
blocked_reactions = fva_result[(fva_result.minimum == 0) & (fva_result.maximum == 0)]
print(f"Blocked reactions: {len(blocked_reactions)}")

# Essential gene analysis
essential_genes = cobra.flux_analysis.find_essential_genes(model)
print(f"Essential genes: {len(essential_genes)}")
```

## Tips and Best Practices

### Data Preparation
- **Consistent Naming**: Use systematic metabolite/reaction IDs
- **Compartment Notation**: Follow standard suffixes (`_c`, `_m`, `_e`)
- **Balanced Reactions**: Verify mass and charge balance
- **Realistic Bounds**: Use physiologically relevant constraints

### Model Design
- **Modular Structure**: Organize reactions by pathway/subsystem
- **Exchange Reactions**: Include all necessary transport processes
- **Biomass Function**: Define appropriate growth objective
- **Gene Associations**: Add GPR rules where available

### Format Selection
- **SBML**: Choose for maximum compatibility and sharing
- **JSON**: Use for COBRApy-specific workflows
- **MATLAB**: Select for COBRA Toolbox integration
- **YAML**: Pick for human-readable documentation

### Performance Optimization
- **Model Size**: Balance comprehensiveness with computational efficiency
- **Reaction Pruning**: Remove unnecessary or blocked reactions
- **Compartmentalization**: Minimize unnecessary compartments
- **Validation**: Test model properties before distribution

## Troubleshooting

### Common Issues

**Conversion fails with format error**
- Check CSV/TSV column headers and data consistency
- Verify reaction formula syntax
- Ensure numeric fields contain valid numbers

**Model is infeasible after conversion**
- Check reaction bounds for conflicts
- Verify exchange reaction setup
- Validate stoichiometric balance

**Missing metabolites or reactions**
- Confirm that all required columns are present in the input
- Check for empty rows or malformed data
- Validate reaction formula parsing

### Error Messages

| Error | Cause | Solution |
|-------|-------|----------|
| "Input file not found" | Invalid file path | Check file location and permissions |
| "Unknown format" | Invalid output format | Use: sbml, json, mat, or yaml |
| "Formula parsing failed" | Malformed reaction equation | Check reaction formula syntax |
| "Model infeasible" | Conflicting constraints | Review bounds and exchange reactions |

### Performance Issues

**Slow conversion**
- Large input files require more processing time
- Complex GPR rules increase parsing overhead
- Monitor system memory usage

**Memory errors**
- Reduce model size or split into smaller files
- Increase available system memory
- Use more efficient data structures

**Output file corruption**
- Ensure sufficient disk space
- Check file write permissions
- Verify format-specific requirements

## Advanced Usage

### Batch Model Creation

```python
#!/usr/bin/env python3
import subprocess

import pandas as pd


def customize_for_tissue(data: pd.DataFrame, tissue: str) -> pd.DataFrame:
    """Placeholder: replace with your own tissue-specific edits."""
    return data.copy()


# Create multiple tissue-specific models
tissues = ['liver', 'muscle', 'brain', 'heart']
base_data = pd.read_csv('base_model.csv')

for tissue in tissues:
    # Modify base data for tissue specificity
    tissue_data = customize_for_tissue(base_data, tissue)
    tissue_data.to_csv(f'{tissue}_model.csv', index=False)

    # Convert to SBML; check=True raises CalledProcessError if the tool fails
    subprocess.run([
        'tabular2MetabolicModel',
        '--input', f'{tissue}_model.csv',
        '--format', 'sbml',
        '--output', f'{tissue}_model.xml',
        '--out_log', f'{tissue}_conversion.log'
    ], check=True)
```

### Model Merging

Combine multiple tabular files into one comprehensive model. All files must share the same header, since only the first file's header row is kept:

```bash
# Merge core metabolism with tissue-specific pathways
cat core_reactions.csv > combined_model.csv
tail -n +2 tissue_reactions.csv >> combined_model.csv
tail -n +2 disease_reactions.csv >> combined_model.csv

# Create merged model
tabular2MetabolicModel --input combined_model.csv \
                       --format sbml \
                       --output comprehensive_model.xml \
                       --out_log comprehensive_model.log
```

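
Plain concatenation does not notice when two files define the same `Reaction_ID` or use different column orders. A minimal sketch of a stricter merge with the standard `csv` module follows; `merge_tables` is a hypothetical helper, and refusing duplicates outright (rather than letting a later file override an earlier one) is a design choice made for this example.

```python
import csv
import io


def merge_tables(handles, key="Reaction_ID"):
    """Concatenate reaction tables, refusing duplicate IDs and mismatched headers."""
    header = None
    merged = {}
    for handle in handles:
        reader = csv.DictReader(handle)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError(f"Header mismatch: {reader.fieldnames} != {header}")
        for row in reader:
            if row[key] in merged:
                raise ValueError(f"Duplicate {key}: {row[key]}")
            merged[row[key]] = row
    return header, list(merged.values())


# In practice these would be open file handles; StringIO keeps the sketch self-contained.
core = io.StringIO("Reaction_ID,Reaction_Formula\nR1,A -> B\n")
tissue = io.StringIO("Reaction_ID,Reaction_Formula\nR2,B -> C\n")
header, rows = merge_tables([core, tissue])

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```
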
### Model Versioning

Track model versions and changes:

```bash
# Version control for model development
git add model_v1.csv
git commit -m "Initial model version"

# Create versioned models
tabular2MetabolicModel --input model_v1.csv --format sbml --output model_v1.xml --out_log model_v1.log
tabular2MetabolicModel --input model_v2.csv --format sbml --output model_v2.xml --out_log model_v2.log

# Compare model versions
# (cobra_compare_models stands in for a comparison script of your own;
#  it is not a command installed by COBRApy)
cobra_compare_models model_v1.xml model_v2.xml
```

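
Because the tabular files are the source of each model version, one simple way to compare versions is to diff the tables by `Reaction_ID`. The following is a sketch of that idea using only the standard library; `diff_tables` is a hypothetical helper and compares rows as plain strings, so `1000` and `1000.0` count as a change.

```python
import csv
import io


def diff_tables(old_handle, new_handle, key="Reaction_ID"):
    """Return reaction IDs that were added, removed, or changed between versions."""
    old = {row[key]: row for row in csv.DictReader(old_handle)}
    new = {row[key]: row for row in csv.DictReader(new_handle)}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# In practice: open("model_v1.csv") and open("model_v2.csv").
v1 = io.StringIO(
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R1,A -> B,0,1000\n"
    "R2,B -> C,0,1000\n"
)
v2 = io.StringIO(
    "Reaction_ID,Reaction_Formula,Lower_Bound,Upper_Bound\n"
    "R1,A -> B,0,500\n"
    "R3,C -> D,0,1000\n"
)
changes = diff_tables(v1, v2)
print(changes)
```
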
## See Also

- [Metabolic Model Setting](metabolic-model-setting.md) - Extract tabular data from existing models
- [RAS to Bounds](ras-to-bounds.md) - Apply constraints to custom models
- [Flux Simulation](flux-simulation.md) - Analyze custom models with flux sampling
- [Model Creation Tutorial](../tutorials/custom-model-creation.md)
- [COBRA Model Standards](../tutorials/cobra-model-standards.md)