|
492
|
1 # RAS Generator
|
|
|
2
|
|
|
3 Generate Reaction Activity Scores (RAS) from gene expression data and GPR (Gene-Protein-Reaction) rules.
|
|
|
4
|
|
|
5 ## Overview
|
|
|
6
|
|
|
7 The RAS Generator computes metabolic reaction activity by:
|
|
|
8 1. Mapping gene expression to reactions via GPR rules
|
|
|
9 2. Applying logical operations (AND/OR) for enzyme complexes
|
|
|
10 3. Producing activity scores for each reaction in each sample
|
|
|
11
|
|
|
12 **Input**: Gene expression data + GPR rules
|
|
|
13 **Output**: Reaction activity scores (RAS)
|
|
|
14
|
|
|
15 ## Parameters
|
|
|
16
|
|
|
17 ### Required Parameters
|
|
|
18
|
|
|
19 | Parameter | Short | Type | Description |
|
|
|
20 |-----------|--------|------|-------------|
|
|
|
21 | `--input` | `-in` | file | Gene expression dataset (TSV format) |
|
|
|
22 | `--ras_output` | `-ra` | file | Output file for RAS values |
|
|
|
23 | `--rules_selector` | `-rs` | choice | Built-in model (ENGRO2, Recon, HMRcore) |
|
|
|
24
|
|
|
25 ### Optional Parameters
|
|
|
26
|
|
|
27 | Parameter | Short | Type | Default | Description |
|
|
|
28 |-----------|--------|------|---------|-------------|
|
|
542
|
29 | `--tool_dir` | `-td` | string | auto-detected | COBRAxy installation directory (automatically detected after pip install) |
|
|
492
|
30 | `--none` | `-n` | boolean | true | Handle missing gene values |
|
|
|
31 | `--model_upload` | `-rl` | file | - | Custom GPR rules file |
|
|
|
32 | `--model_upload_name` | `-rn` | string | - | Custom model name |
|
|
|
33 | `--out_log` | - | file | log.txt | Output log file |
|
|
|
34
|
|
542
|
35 > **Note**: After installing COBRAxy via pip, the `--tool_dir` parameter is automatically detected and doesn't need to be specified.
|
|
|
36
|
|
492
|
37 ## Input Format
|
|
|
38
|
|
|
39 ### Gene Expression File
|
|
|
40 ```tsv
|
|
|
41 Gene_ID Sample_1 Sample_2 Sample_3 Sample_4
|
|
|
42 HGNC:5 10.5 11.2 15.7 14.3
|
|
|
43 HGNC:10 3.2 4.1 8.8 7.9
|
|
|
44 HGNC:15 7.9 8.2 4.4 5.1
|
|
|
45 HGNC:25 12.1 13.5 18.2 17.8
|
|
|
46 ```
|
|
|
47
|
|
|
48 **Requirements**:
|
|
|
49 - First column: Gene identifiers (HGNC, Ensembl, Entrez, etc.)
|
|
|
50 - Subsequent columns: Expression values (numeric)
|
|
|
51 - Header row with sample names
|
|
|
52 - Tab-separated format
|
|
|
53
|
|
|
54 ### Custom GPR Rules File (Optional)
|
|
|
55 ```tsv
|
|
|
56 Reaction_ID GPR
|
|
|
57 R_HEX1 HGNC:4922
|
|
|
58 R_PGI HGNC:8906
|
|
|
59 R_PFK HGNC:8877 or HGNC:8878
|
|
|
60 R_ALDOA HGNC:414 and HGNC:417
|
|
|
61 ```
|
|
|
62
|
|
|
63 ## Algorithm Details
|
|
|
64
|
|
|
65 ### GPR Rule Processing
|
|
|
66
|
|
|
67 **Gene Mapping**: Each gene in the expression data is mapped to reactions via GPR rules.
|
|
|
68
|
|
|
69 **Logical Operations**:
|
|
538
|
70 - **OR**: `Gene1 or Gene2` → `expr1 + expr2`
|
|
492
|
71 - **AND**: `Gene1 and Gene2` → `min(expr1, expr2)`
|
|
|
72
|
|
|
73 **Missing Gene Handling**:
|
|
538
|
74 - `-n true`: Ignore missing genes in the GPR rules.
|
|
|
75 - `-n false`: Missing genes cause reaction score to be NaN
|
|
492
|
76
|
|
|
77 ### RAS Computation
|
|
|
78
|
|
|
79 **Example**:
|
|
|
80 ```
|
|
|
81 GPR: (HGNC:5 and HGNC:10) or HGNC:15
|
|
|
82 Expression: HGNC:5=10.5, HGNC:10=3.2, HGNC:15=7.9
|
|
|
83 RAS = max(min(10.5, 3.2), 7.9) = max(3.2, 7.9) = 7.9
|
|
|
84 ```
|
|
|
85
|
|
|
86 ## Output Format
|
|
|
87
|
|
|
88 ### RAS Values File
|
|
|
89 ```tsv
|
|
|
90 Reactions Sample_1 Sample_2 Sample_3 Sample_4
|
|
|
91 R_HEX1 8.5 9.2 12.1 11.3
|
|
|
92 R_PGI 7.3 8.1 6.4 7.2
|
|
|
93 R_PFK 15.2 16.8 20.1 18.9
|
|
|
94 R_ALDOA 3.2 4.1 4.4 5.1
|
|
|
95 ```
|
|
|
96
|
|
|
97 **Format**:
|
|
|
98 - First column: Reaction identifiers
|
|
|
99 - Subsequent columns: RAS values for each sample
|
|
|
100 - Missing values represented as "None"
|
|
|
101
|
|
|
102 ## Usage Examples
|
|
|
103
|
|
|
104 ### Command Line
|
|
|
105
|
|
|
106 ```bash
|
|
542
|
107 # Basic usage with built-in model (after pip install)
|
|
|
108 ras_generator \
|
|
492
|
109 -in expression_data.tsv \
|
|
|
110 -ra ras_output.tsv \
|
|
|
111 -rs ENGRO2
|
|
|
112
|
|
|
113 # With custom model and strict missing gene handling
|
|
542
|
114 ras_generator \
|
|
492
|
115 -in expression_data.tsv \
|
|
|
116 -ra ras_output.tsv \
|
|
|
117 -rl custom_rules.tsv \
|
|
|
118 -rn "CustomModel" \
|
|
|
119 -n false
|
|
|
120
|
|
542
|
121 # Explicitly specify tool directory (only needed if not using pip install)
|
|
|
122 ras_generator -td /path/to/COBRAxy \
|
|
|
123 -in expression_data.tsv \
|
|
|
124 -ra ras_output.tsv \
|
|
|
125 -rs ENGRO2
|
|
492
|
126 ```
|
|
|
127
|
|
|
128 ### Galaxy Usage
|
|
|
129
|
|
|
130 1. Upload gene expression file to Galaxy
|
|
|
131 2. Select **RAS Generator** from COBRAxy tools
|
|
|
132 3. Configure parameters:
|
|
|
133 - **Input dataset**: Your expression file
|
|
|
134 - **Rule selector**: ENGRO2 (or other model)
|
|
|
135 - **Handle missing genes**: Yes/No
|
|
|
136 4. Click **Execute**
|
|
|
137
|
|
|
138 ## Built-in Models
|
|
|
139
|
|
|
140 ### ENGRO2 (Recommended for most analyses)
|
|
|
141 - **Scope**: Focused human metabolism
|
|
538
|
142 - **Reactions**: ~500
|
|
492
|
143 - **Genes**: ~500
|
|
538
|
144 - **Use case**: Core metabolic analysis
|
|
492
|
145
|
|
|
146 ### Recon (Comprehensive analysis)
|
|
|
147 - **Scope**: Complete human metabolism
|
|
|
148 - **Reactions**: ~10,000
|
|
|
149 - **Genes**: ~2,000
|
|
538
|
150 - **Use case**: Genome-wide metabolic studies
|
|
492
|
151
|
|
|
152 ## Gene ID Mapping
|
|
|
153
|
|
|
154 COBRAxy supports multiple gene identifier formats:
|
|
|
155
|
|
|
156 | Format | Example | Notes |
|
|
|
157 |--------|---------|--------|
|
|
|
158 | **HGNC ID** | HGNC:5 | Recommended, most stable |
|
|
|
159 | **HGNC Symbol** | ALDOA | Human-readable but may change |
|
|
|
160 | **Ensembl** | ENSG00000149925 | Version-specific |
|
|
|
161 | **Entrez** | 226 | Numeric identifier |
|
|
|
162
|
|
|
163 **Recommendation**: Use HGNC IDs for best compatibility and stability.
|
|
|
164
|
|
|
165
|
|
|
166
|
|
|
167 ## Troubleshooting
|
|
|
168
|
|
|
169 ### Common Issues
|
|
|
170
|
|
|
171 **"Gene not found" warnings**
|
|
|
172 ```
|
|
|
173 Solution: Check gene ID format matches model expectations
|
|
|
174 - Verify gene identifiers (HGNC vs symbols vs Ensembl)
|
|
|
175 - Use gene mapping tools if needed
|
|
538
|
176 - Set -n true to handle missing genes
|
|
492
|
177 ```
|
|
|
178
|
|
|
179 **"No computable scores" error**
|
|
|
180 ```
|
|
|
181 Solution: Insufficient gene overlap between data and model
|
|
|
182 - Check gene ID format compatibility
|
|
|
183 - Verify expression file format
|
|
|
184 - Try different built-in model
|
|
|
185 ```
|
|
|
186
|
|
|
187 **Empty output file**
|
|
|
188 ```
|
|
|
189 Solution: Check input file format and permissions
|
|
|
190 - Ensure TSV format with proper headers
|
|
|
191 - Verify file paths are correct
|
|
|
192 - Check write permissions for output directory
|
|
|
193 ```
|
|
|
194
|
|
|
195
|
|
|
196
|
|
|
197 ### Debug Mode
|
|
|
198
|
|
|
199 Enable detailed logging:
|
|
|
200
|
|
|
201 ```bash
|
|
|
202 ras_generator -td /path/to/COBRAxy \
|
|
|
203 -in expression_data.tsv \
|
|
|
204 -ra ras_output.tsv \
|
|
|
205 -rs ENGRO2 \
|
|
|
206 --out_log detailed_log.txt
|
|
|
207 ```
|
|
|
208
|
|
|
209 Check log file for detailed error messages and processing statistics.
|
|
|
210
|
|
|
211 ## Validation
|
|
|
212
|
|
|
213 ### Check Output Quality
|
|
|
214
|
|
|
215 ```python
|
|
|
216 import pandas as pd
|
|
|
217
|
|
|
218 # Read RAS output
|
|
|
219 ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0)
|
|
|
220
|
|
|
221 # Basic statistics
|
|
|
222 print(f"RAS matrix shape: {ras_df.shape}")
|
|
|
223 print(f"Non-null values: {ras_df.count().sum()}")
|
|
|
224 print(f"Value range: {ras_df.min().min():.2f} to {ras_df.max().max():.2f}")
|
|
|
225
|
|
|
226 # Check for problematic reactions
|
|
|
227 null_reactions = ras_df.isnull().all(axis=1).sum()
|
|
|
228 print(f"Reactions with no data: {null_reactions}")
|
|
|
229 ```
|
|
|
230
|
|
|
231
|
|
|
232 ## Integration with Other Tools
|
|
|
233
|
|
|
234 ### Downstream Analysis
|
|
|
235
|
|
|
236 RAS output can be used with:
|
|
|
237
|
|
|
238 - **[MAREA](marea.md)**: Statistical enrichment analysis
|
|
|
239 - **[RAS to Bounds](ras-to-bounds.md)**: Flux constraint application
|
|
|
240 - **[MAREA Cluster](marea-cluster.md)**: Sample clustering
|
|
|
241
|
|
|
242 ### Preprocessing Options
|
|
|
243
|
|
|
244 Before RAS generation:
|
|
|
245 - **Normalize** expression data (log2, quantile, etc.)
|
|
|
246 - **Filter** low-expression genes
|
|
|
247 - **Batch correct** if multiple datasets
|
|
|
248
|
|
|
249 ## Advanced Usage
|
|
|
250
|
|
|
251 ### Custom Model Integration
|
|
|
252
|
|
|
253 ```python
|
|
|
254 # Create custom GPR rules
|
|
|
255 custom_rules = {
|
|
|
256 'R_CUSTOM1': 'HGNC:5 and HGNC:10',
|
|
|
257 'R_CUSTOM2': 'HGNC:15 or HGNC:20'
|
|
|
258 }
|
|
|
259
|
|
|
260 # Save as TSV
|
|
|
261 import pandas as pd
|
|
|
262 rules_df = pd.DataFrame(list(custom_rules.items()),
|
|
|
263 columns=['Reaction_ID', 'GPR'])
|
|
|
264 rules_df.to_csv('custom_rules.tsv', sep='\t', index=False)
|
|
|
265
|
|
|
266 # Use with RAS generator
|
|
|
267 args = ['-rl', 'custom_rules.tsv', '-rn', 'CustomModel']
|
|
|
268 ```
|
|
|
269
|
|
|
270 ### Batch Processing
|
|
|
271
|
|
|
272 ```python
|
|
|
273 # Process multiple expression files
|
|
|
274 expression_files = ['data1.tsv', 'data2.tsv', 'data3.tsv']
|
|
|
275
|
|
|
276 for i, exp_file in enumerate(expression_files):
|
|
|
277 output_file = f'ras_output_{i}.tsv'
|
|
|
278
|
|
|
279 args = [
|
|
|
280 '-td', '/path/to/COBRAxy',
|
|
|
281 '-in', exp_file,
|
|
|
282 '-ra', output_file,
|
|
|
283 '-rs', 'ENGRO2'
|
|
|
284 ]
|
|
|
285
|
|
|
286 ras_generator.main(args)
|
|
|
287 print(f"Processed {exp_file} → {output_file}")
|
|
|
288 ```
|
|
|
289
|
|
|
290 ## References
|
|
|
291
|
|
|
292 - [COBRApy documentation](https://cobrapy.readthedocs.io/) - Underlying metabolic modeling
|
|
|
293 - [GPR rules format](https://cobrapy.readthedocs.io/en/stable/getting_started.html#gene-protein-reaction-rules) - Standard format specification
|
|
|
294 - [HGNC database](https://www.genenames.org/) - Gene nomenclature standards |