comparison COBRAxy/docs/troubleshooting.md @ 550:4cf00f21f609 draft default tip

Uploaded
author francesco_lapi
date Mon, 03 Nov 2025 14:49:49 +0000
parents 73f2f7e2be17
children
549:4c5fdcefce8e 550:4cf00f21f609
64 64
65 # Windows (using conda) 65 # Windows (using conda)
66 conda install -c conda-forge glpk swiglpk 66 conda install -c conda-forge glpk swiglpk
67 ``` 67 ```
68 68
69 **Problem**: SVG processing errors 69
70 ## Galaxy Tool Issues
71
72 ### Import Metabolic Model
73
74 **Error message**:
70 ```bash 75 ```bash
71 # Install libvips for image processing 76 Traceback (most recent call last):
72 # Ubuntu/Debian: sudo apt-get install libvips 77 File "/export/tool_deps/_conda/envs/mulled-v1-d3fef6bda7daedb89425f527672b54ab0a4be6cfe3c8725b7f8c0948e0c80773/lib/python3.11/site-packages/cobra/io/sbml.py", line 458, in read_sbml_model
73 # macOS: brew install vips 78 return _sbml_to_model(doc, number=number, f_replace=f_replace, **kwargs)
79 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
80 File "/export/tool_deps/_conda/envs/mulled-v1-d3fef6bda7daedb89425f527672b54ab0a4be6cfe3c8725b7f8c0948e0c80773/lib/python3.11/site-packages/cobra/io/sbml.py", line 563, in _sbml_to_model
81 raise CobraSBMLError("No SBML model detected in file.")
82 cobra.io.sbml.CobraSBMLError: No SBML model detected in file.
74 ``` 83 ```
75 84
76 ## Data Format Issues 85 **Meaning:**
86 The Import Metabolic Model tool cannot read the input file as a valid SBML model with FBC annotations.
77 87
78 ### Gene Expression Problems 88 **Suggested Action:**
89 Verify that the input XML file is in proper SBML format and includes all necessary FBC annotations.
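A quick pre-check before re-running the tool can narrow down why `read_sbml_model` rejects a file. The sketch below is not part of COBRAxy or COBRApy: `check_sbml` is a hypothetical helper that uses only the Python standard library, and it assumes the FBC package can be recognised by a namespace URI containing `/fbc/`. It checks only the coarse structure (well-formed XML, an `sbml` root, a `model` element, an FBC namespace declaration), not full SBML validity.

```python
import xml.etree.ElementTree as ET
from io import BytesIO


def check_sbml(source):
    """Return a list of coarse problems found in an SBML file (empty = none found).

    `source` is a file path or a binary file-like object.
    Hypothetical helper for illustration; it does not replace a real SBML validator.
    """
    namespaces = []   # every namespace URI declared in the document
    root = None       # local name of the root element
    has_model = False
    try:
        for event, item in ET.iterparse(source, events=("start-ns", "start")):
            if event == "start-ns":
                namespaces.append(item[1])           # item is (prefix, uri)
            else:
                local = item.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
                if root is None:
                    root = local
                elif local == "model":
                    has_model = True                 # a <model> somewhere below the root
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root != "sbml":
        problems.append(f"root element is <{root}>, expected <sbml>")
    if not has_model:
        problems.append("no <model> element found")
    # Assumption: the FBC package namespace URI contains "/fbc/".
    if not any("/fbc/" in uri for uri in namespaces):
        problems.append("FBC package namespace is not declared")
    return problems


# Usage: an HTML page saved with an .xml extension fails all three structural checks.
for problem in check_sbml(BytesIO(b"<html><body/></html>")):
    print(problem)
```

An empty result only means the file passes these structural checks; the model may still fail to load for other reasons.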
79 90
80 **Problem**: "No computable scores" error 91
81 ``` 92 ### Flux simulation
82 Cause: Gene IDs don't match between data and model 93
83 Solution: 94 **Error message**:
84 1. Check gene ID format (HGNC vs symbols vs Ensembl) 95 ```bash
85 2. Verify first column contains gene identifiers 96 Execution aborted: wrong format of bounds dataset
86 3. Ensure tab-separated format
87 4. Try different built-in model
88 ``` 97 ```
89 98
90 **Problem**: Many "gene not found" warnings 99 **Meaning:**
91 ```python 100 Flux simulation cannot read the bounds of the metabolic model for the constrained simulation problem (optimization or sampling).
92 # Check gene overlap with model 101 This usually happens if the input “Bound file(s): *” is incorrect. For example, it occurs when the **RasToBounds - Cell Class** file is passed instead of the collection of bound files named **"RAS to bounds"**.
93 import pickle
94 genes_dict = pickle.load(open('src/local/pickle files/ENGRO2_genes.p', 'rb'))
95 model_genes = set(genes_dict['hugo_id'].keys())
96 102
97 import pandas as pd 103 **Suggested Action:**
98 data_genes = set(pd.read_csv('expression.tsv', sep='\t').iloc[:, 0]) 104 Check the input files and ensure the correct bounds collection is used.
99
100 overlap = len(model_genes.intersection(data_genes))
101 print(f"Gene overlap: {overlap}/{len(data_genes)} ({overlap/len(data_genes)*100:.1f}%)")
102 ```
103
104 **Problem**: File format not recognized
105 ```tsv
106 # Correct format - tab-separated:
107 Gene_ID Sample_1 Sample_2
108 HGNC:5 10.5 11.2
109 HGNC:10 3.2 4.1
110
111 # Wrong - comma-separated or spaces will fail
112 ```
113
114 ### Model Issues
115
116 **Problem**: Custom model not loading
117 ```
118 Solution:
119 1. Check TSV format with "GPR" column header
120 2. Verify reaction IDs are unique
121 3. Test GPR syntax (use 'and'/'or', proper parentheses)
122 4. Check file permissions and encoding (UTF-8)
123 ```
124
125 ## Tool Execution Errors
126
127
128
129 ### File Path Problems
130
131 **Problem**: "File not found" errors
132 ```python
133 # Use absolute paths
134 from pathlib import Path
135
136 input_file = str(Path('expression.tsv').absolute())
137
138 args = ['-in', input_file, ...]
139 ```
140
141 **Problem**: Permission denied
142 ```bash
143 # Check write permissions
144 ls -la output_directory/
145
146 # Fix permissions
147 chmod 755 output_directory/
148 chmod 644 input_files/*
149 ```
150
151 ### Galaxy Integration Issues
152
153 **Problem**: COBRAxy tools not appearing in Galaxy
154 ```xml
155 <!-- Check tool_conf.xml syntax -->
156 <section id="cobraxy" name="COBRAxy">
157 <tool file="cobraxy/ras_generator.xml" />
158 </section>
159
160 <!-- Verify file paths are correct -->
161 ls tools/cobraxy/ras_generator.xml
162 ```
163
164 **Problem**: Tool execution fails in Galaxy
165 ```
166 Check Galaxy logs:
167 - main.log: General Galaxy issues
168 - handler.log: Job execution problems
169 - uwsgi.log: Web server issues
170
171 Common fixes:
172 1. Restart Galaxy after adding tools
173 2. Check Python environment has COBRApy installed
174 3. Verify file permissions on tool files
175 ```
176
177
178
179 **Problem**: Flux sampling hangs
180 ```bash
181 # Check solver availability
182 python -c "import cobra; print(cobra.Configuration().solver)"
183
184 # Should show: glpk, cplex, or gurobi
185 # Install GLPK if missing:
186 pip install swiglpk
187 ```
188
189 ### Large Dataset Handling
190
191 **Problem**: Cannot process large expression matrices
192 ```python
193 # Process in chunks
194 def process_large_dataset(expression_file, chunk_size=1000):
195 df = pd.read_csv(expression_file, sep='\t')
196
197 for i in range(0, len(df), chunk_size):
198 chunk = df.iloc[i:i+chunk_size]
199 chunk_file = f'chunk_{i}.tsv'
200 chunk.to_csv(chunk_file, sep='\t', index=False)
201
202 # Process chunk
203 ras_generator.main(['-in', chunk_file, ...])
204 ```
205
206 ## Output Validation
207
208 ### Unexpected Results
209
210 **Problem**: All RAS values are zero or null
211 ```python
212 # Debug gene mapping
213 import pandas as pd
214 ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0)
215
216 # Check data quality
217 print(f"Null percentage: {ras_df.isnull().sum().sum() / ras_df.size * 100:.1f}%")
218 print(f"Zero percentage: {(ras_df == 0).sum().sum() / ras_df.size * 100:.1f}%")
219
220 # Check expression data preprocessing
221 expr_df = pd.read_csv('expression.tsv', sep='\t', index_col=0)
222 print(f"Expression range: {expr_df.min().min():.2f} to {expr_df.max().max():.2f}")
223 ```
224
225 **Problem**: RAS values seem too high/low
226 ```
227 Possible causes:
228 1. Expression data not log-transformed
229 2. Wrong normalization method
230 3. Incorrect gene ID mapping
231 4. GPR rule interpretation issues
232
233 Solutions:
234 1. Check expression data preprocessing
235 2. Validate against known control genes
236 3. Compare with published metabolic activity patterns
237 ```
238
239 ### Missing Pathway Maps
240
241 **Problem**: MAREA generates no output maps
242 ```
243 Debug steps:
244 1. Check RAS input has non-null values
245 2. Verify model choice matches RAS generation
246 3. Check statistical significance thresholds
247 4. Look at log files for specific errors
248 ```
249
250 ## Environment Issues
251
252 ### Conda/Virtual Environment Problems
253
254 **Problem**: Tool import fails in virtual environment
255 ```bash
256 # Activate environment properly
257 source venv/bin/activate # Linux/macOS
258 # or
259 venv\Scripts\activate # Windows
260
261 # Verify COBRAxy installation
262 pip list | grep cobra
263 python -c "import cobra; print('COBRApy version:', cobra.__version__)"
264 ```
265
266 **Problem**: Version conflicts
267 ```bash
268 # Create clean environment
269 conda create -n cobraxy python=3.9
270 conda activate cobraxy
271
272 # Install COBRAxy fresh
273 cd COBRAxy/src
274 pip install -e .
275 ```
276
277 ### Cross-Platform Issues
278
279 **Problem**: Windows path separator issues
280 ```python
281 # Use pathlib for cross-platform paths
282 from pathlib import Path
283
284 # Instead of: '/path/to/file'
285 # Use: str(Path('path') / 'to' / 'file')
286 ```
287
288 **Problem**: Line ending issues (Windows/Unix)
289 ```bash
290 # Convert line endings if needed
291 dos2unix input_file.tsv # Unix
292 unix2dos input_file.tsv # Windows
293 ```
294
295 ## Debugging Strategies
296
297 ### Enable Detailed Logging
298
299 ```python
300 import logging
301 logging.basicConfig(level=logging.DEBUG)
302
303 # Many tools accept log file parameter
304 args = [..., '--out_log', 'detailed.log']
305 ```
306
307 ### Test with Small Datasets
308
309 ```python
310 # Create minimal test case
311 test_data = """Gene_ID Sample1 Sample2
312 HGNC:5 10.0 15.0
313 HGNC:10 5.0 8.0"""
314
315 with open('test_input.tsv', 'w') as f:
316 f.write(test_data)
317
318 # Test basic functionality
319 ras_generator.main(['-in', 'test_input.tsv',
320 '-ra', 'test_output.tsv', '-rs', 'ENGRO2'])
321 ```
322
323 ### Check Dependencies
324
325 ```python
326 # Verify all required packages
327 required_packages = ['cobra', 'pandas', 'numpy', 'scipy']
328
329 for package in required_packages:
330 try:
331 __import__(package)
332 print(f"✓ {package}")
333 except ImportError:
334 print(f"✗ {package} - MISSING")
335 ```
336 105
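For the `wrong format of bounds dataset` error under Flux simulation, it can help to inspect each file in the bounds collection before launching the job. The sketch below is an illustration under stated assumptions, not COBRAxy code: `check_bounds_table` is a hypothetical helper, and it assumes each bounds file is tab-separated with columns named `lower_bound` and `upper_bound`. The actual layout is not stated in this document, so adjust the parameters to match your files.

```python
import csv
import io


def check_bounds_table(handle, delimiter="\t",
                       lower_col="lower_bound", upper_col="upper_bound"):
    """Return a list of problems found in one bounds table (empty = none found).

    `handle` is an open text file. The delimiter and column names are assumptions.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [col for col in (lower_col, upper_col) if col not in header]
    if missing:
        # A file of another type (e.g. a cell-class table) typically fails here.
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            lower = float(row[lower_col])
            upper = float(row[upper_col])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric bound")
            continue
        if lower > upper:
            problems.append(f"line {line_no}: lower bound exceeds upper bound")
    return problems


# Usage: a table without bound columns is reported immediately.
wrong_file = io.StringIO("Reaction\tClass\nR1\tA\n")
print(check_bounds_table(wrong_file))
```

Running this over every file in the collection shows quickly whether a non-bounds dataset was selected by mistake.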
337 ## Getting Help 106 ## Getting Help
338 107
339 ### Information to Include in Bug Reports 108 ### Information to Include in Bug Reports
340 109
366 - Tested with built-in example data 135 - Tested with built-in example data
367 - Searched existing GitHub issues 136 - Searched existing GitHub issues
368 - Tried alternative models/parameters 137 - Tried alternative models/parameters
369 - Checked file formats and permissions 138 - Checked file formats and permissions
370 139
371 ## Prevention Tips
372
373 ### Best Practices
374
375 1. **Use virtual environments** to avoid conflicts
376 2. **Validate input data** before processing
377 3. **Start with small datasets** for testing
378 4. **Keep backups** of working configurations
379 5. **Document successful workflows** for reuse
380 6. **Test after updates** to catch regressions
381
382 ### Data Quality Checks
383
384 ```python
385 def validate_expression_data(filename):
386 """Validate gene expression file format."""
387 df = pd.read_csv(filename, sep='\t')
388
389 # Check basic format
390 assert df.shape[0] > 0, "Empty file"
391 assert df.shape[1] > 1, "Need at least 2 columns"
392
393 # Check numeric data
394 numeric_cols = df.select_dtypes(include=[np.number]).columns
395 assert len(numeric_cols) > 0, "No numeric expression data"
396
397 # Check for missing values
398 null_pct = df.isnull().sum().sum() / df.size * 100
399 if null_pct > 50:
400 print(f"Warning: {null_pct:.1f}% missing values")
401
402 print(f"✓ File valid: {df.shape[0]} genes × {df.shape[1]-1} samples")
403 ```
404 140
405 This troubleshooting guide covers the most common issues. For tool-specific problems, check the individual tool documentation pages. 141 This troubleshooting guide covers the most common issues. For tool-specific problems, check the individual tool documentation pages.