annotate COBRAxy/docs/tools/ras-generator.md @ 542:fcdbc81feb45 draft

Uploaded
author francesco_lapi
date Sun, 26 Oct 2025 19:27:41 +0000
parents fd53d42348bd
children 73f2f7e2be17
# RAS Generator

Generate Reaction Activity Scores (RAS) from gene expression data and GPR (Gene-Protein-Reaction) rules.

## Overview

The RAS Generator computes metabolic reaction activity by:
1. Mapping gene expression to reactions via GPR rules
2. Applying logical operations (AND/OR) for enzyme complexes
3. Producing activity scores for each reaction in each sample

**Input**: Gene expression data + GPR rules
**Output**: Reaction activity scores (RAS)

## Parameters

### Required Parameters

| Parameter | Short | Type | Description |
|-----------|--------|------|-------------|
| `--input` | `-in` | file | Gene expression dataset (TSV format) |
| `--ras_output` | `-ra` | file | Output file for RAS values |
| `--rules_selector` | `-rs` | choice | Built-in model (ENGRO2, Recon, HMRcore) |

### Optional Parameters

| Parameter | Short | Type | Default | Description |
|-----------|--------|------|---------|-------------|
| `--tool_dir` | `-td` | string | auto-detected | COBRAxy installation directory (automatically detected after pip install) |
| `--none` | `-n` | boolean | true | Handle missing gene values: `true` ignores genes missing from the expression data, `false` sets the affected reaction score to NaN |
| `--model_upload` | `-rl` | file | - | Custom GPR rules file |
| `--model_upload_name` | `-rn` | string | - | Custom model name |
| `--out_log` | - | file | log.txt | Output log file |

> **Note**: After installing COBRAxy via pip, the `--tool_dir` parameter is detected automatically and does not need to be specified.

## Input Format

### Gene Expression File
```tsv
Gene_ID	Sample_1	Sample_2	Sample_3	Sample_4
HGNC:5	10.5	11.2	15.7	14.3
HGNC:10	3.2	4.1	8.8	7.9
HGNC:15	7.9	8.2	4.4	5.1
HGNC:25	12.1	13.5	18.2	17.8
```

**Requirements**:
- First column: Gene identifiers (HGNC, Ensembl, Entrez, etc.)
- Subsequent columns: Expression values (numeric)
- Header row with sample names
- Tab-separated format
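
The requirements above can be checked before running the tool. The following is a minimal sketch and not part of COBRAxy: `check_expression_table` is a hypothetical helper name, and the code assumes pandas is installed.

```python
import io

import pandas as pd


def check_expression_table(source):
    """Load a tab-separated expression table and check the requirements above.

    Hypothetical helper for illustration; it is not part of COBRAxy.
    """
    # First column holds the gene identifiers, header row holds sample names.
    table = pd.read_csv(source, sep="\t", index_col=0)
    if table.shape[1] == 0:
        raise ValueError("No sample columns found; is the file tab-separated?")
    non_numeric = [
        column for column in table.columns
        if not pd.api.types.is_numeric_dtype(table[column])
    ]
    if non_numeric:
        raise ValueError(f"Non-numeric expression columns: {non_numeric}")
    return table


# Small in-memory example; pass a file path instead for real data.
example = "Gene_ID\tSample_1\tSample_2\nHGNC:5\t10.5\t11.2\nHGNC:10\t3.2\t4.1\n"
table = check_expression_table(io.StringIO(example))
print(table.shape)  # (2, 2)
```

Running a check like this first tends to give clearer error messages than a failed RAS run, since format problems are reported before any GPR rule is evaluated.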

### Custom GPR Rules File (Optional)
```tsv
Reaction_ID	GPR
R_HEX1	HGNC:4922
R_PGI	HGNC:8906
R_PFK	HGNC:8877 or HGNC:8878
R_ALDOA	HGNC:414 and HGNC:417
```

## Algorithm Details

### GPR Rule Processing

**Gene Mapping**: Each gene in the expression data is mapped to reactions via GPR rules.

**Logical Operations**:
- **OR**: `Gene1 or Gene2` → `expr1 + expr2`
- **AND**: `Gene1 and Gene2` → `min(expr1, expr2)`

**Missing Gene Handling**:
- `-n true`: Genes that appear in a GPR rule but are missing from the expression data are ignored.
- `-n false`: Any missing gene causes the reaction score to be NaN.

### RAS Computation

**Example**:
```
GPR: (HGNC:5 and HGNC:10) or HGNC:15
Expression: HGNC:5=10.5, HGNC:10=3.2, HGNC:15=7.9
RAS = min(10.5, 3.2) + 7.9 = 3.2 + 7.9 = 11.1
```
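
To make the rule evaluation concrete, here is a minimal sketch of a GPR evaluator that follows the Logical Operations mapping listed above (AND as `min`, OR as sum). It is an illustration, not the COBRAxy implementation: the function name `evaluate_gpr`, the `ignore_missing` argument, and the choice to drop missing operands when `ignore_missing` is true are all assumptions.

```python
import math
import re

# Tokens are parentheses or runs of non-space, non-parenthesis characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def evaluate_gpr(rule, expression, ignore_missing=True):
    """Score one GPR rule for one sample; returns NaN if it cannot be computed.

    Hypothetical sketch: AND -> min, OR -> sum, with `and` binding tighter
    than `or`. Missing genes are dropped or propagate NaN (assumed behaviour).
    """
    tokens = TOKEN.findall(rule)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def take():
        nonlocal position
        token = tokens[position]
        position += 1
        return token

    def combine(values, operation):
        present = [value for value in values if not math.isnan(value)]
        if len(present) < len(values) and not ignore_missing:
            return math.nan
        return operation(present) if present else math.nan

    def parse_or():
        values = [parse_and()]
        while peek() == "or":
            take()
            values.append(parse_and())
        return combine(values, sum)

    def parse_and():
        values = [parse_atom()]
        while peek() == "and":
            take()
            values.append(parse_atom())
        return combine(values, min)

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            take()  # consume the closing ")"
            return value
        return float(expression.get(token, math.nan))

    return parse_or()


sample = {"HGNC:5": 10.5, "HGNC:10": 3.2, "HGNC:15": 7.9}
score = evaluate_gpr("(HGNC:5 and HGNC:10) or HGNC:15", sample)
print(f"{score:.1f}")  # 11.1
```

With `ignore_missing=False`, a rule that references a gene absent from `sample` returns NaN, mirroring the `-n false` behaviour described under Missing Gene Handling.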

## Output Format

### RAS Values File
```tsv
Reactions	Sample_1	Sample_2	Sample_3	Sample_4
R_HEX1	8.5	9.2	12.1	11.3
R_PGI	7.3	8.1	6.4	7.2
R_PFK	15.2	16.8	20.1	18.9
R_ALDOA	3.2	4.1	4.4	5.1
```

**Format**:
- First column: Reaction identifiers
- Subsequent columns: RAS values for each sample
- Missing values represented as "None"

## Usage Examples

### Command Line

```bash
# Basic usage with built-in model (after pip install)
ras_generator \
  -in expression_data.tsv \
  -ra ras_output.tsv \
  -rs ENGRO2

# With custom model and strict missing gene handling
ras_generator \
  -in expression_data.tsv \
  -ra ras_output.tsv \
  -rl custom_rules.tsv \
  -rn "CustomModel" \
  -n false

# Explicitly specify tool directory (only needed if not using pip install)
ras_generator -td /path/to/COBRAxy \
  -in expression_data.tsv \
  -ra ras_output.tsv \
  -rs ENGRO2
```

### Galaxy Usage

1. Upload gene expression file to Galaxy
2. Select **RAS Generator** from COBRAxy tools
3. Configure parameters:
   - **Input dataset**: Your expression file
   - **Rule selector**: ENGRO2 (or other model)
   - **Handle missing genes**: Yes/No
4. Click **Execute**

## Built-in Models

### ENGRO2 (Recommended for most analyses)
- **Scope**: Focused human metabolism
- **Reactions**: ~500
- **Genes**: ~500
- **Use case**: Core metabolic analysis

### Recon (Comprehensive analysis)
- **Scope**: Complete human metabolism
- **Reactions**: ~10,000
- **Genes**: ~2,000
- **Use case**: Genome-wide metabolic studies

## Gene ID Mapping

COBRAxy supports multiple gene identifier formats:

| Format | Example | Notes |
|--------|---------|--------|
| **HGNC ID** | HGNC:5 | Recommended, most stable |
| **HGNC Symbol** | ALDOA | Human-readable but may change |
| **Ensembl** | ENSG00000149925 | Version-specific |
| **Entrez** | 226 | Numeric identifier |

**Recommendation**: Use HGNC IDs for best compatibility and stability.

## Troubleshooting

### Common Issues

**"Gene not found" warnings**
```
Solution: Check that the gene ID format matches model expectations
- Verify gene identifiers (HGNC vs symbols vs Ensembl)
- Use gene mapping tools if needed
- Set -n true to handle missing genes
```

**"No computable scores" error**
```
Solution: Insufficient gene overlap between data and model
- Check gene ID format compatibility
- Verify expression file format
- Try a different built-in model
```

**Empty output file**
```
Solution: Check input file format and permissions
- Ensure TSV format with proper headers
- Verify file paths are correct
- Check write permissions for output directory
```
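
When the problem is gene overlap, it can help to measure how many genes referenced by the rules actually appear in the expression data. The sketch below is illustrative only: `genes_in_rule` and `gene_overlap` are hypothetical helper names, the rules are given as an in-memory dictionary in the custom GPR file layout, and the gene list is made up. It applies to custom rules files; how to read the rules of a built-in model is not covered here.

```python
import re


def genes_in_rule(rule):
    """Return the gene identifiers mentioned in one GPR rule string."""
    tokens = re.findall(r"[^\s()]+", rule)
    return {token for token in tokens if token.lower() not in ("and", "or")}


def gene_overlap(expression_genes, rules):
    """Return (fraction of rule genes found, sorted list of missing genes)."""
    model_genes = set()
    for rule in rules.values():
        model_genes |= genes_in_rule(rule)
    found = model_genes & set(expression_genes)
    coverage = len(found) / len(model_genes) if model_genes else 0.0
    return coverage, sorted(model_genes - found)


# Illustrative values only: two rules in the custom GPR file format and a
# made-up list of gene identifiers taken from an expression file.
rules = {
    "R_PFK": "HGNC:8877 or HGNC:8878",
    "R_ALDOA": "HGNC:414 and HGNC:417",
}
expression_genes = ["HGNC:8877", "HGNC:414", "ALDOA"]

coverage, missing = gene_overlap(expression_genes, rules)
print(f"{coverage:.0%} of rule genes found; missing: {missing}")
```

A low coverage value combined with symbol-style identifiers in the expression file (such as `ALDOA` here) usually points to an identifier format mismatch rather than to genuinely absent genes.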

### Debug Mode

Enable detailed logging:

```bash
ras_generator -td /path/to/COBRAxy \
  -in expression_data.tsv \
  -ra ras_output.tsv \
  -rs ENGRO2 \
  --out_log detailed_log.txt
```

Check the log file for detailed error messages and processing statistics.

## Validation

### Check Output Quality

```python
import pandas as pd

# Read RAS output
ras_df = pd.read_csv('ras_output.tsv', sep='\t', index_col=0)

# Basic statistics
print(f"RAS matrix shape: {ras_df.shape}")
print(f"Non-null values: {ras_df.count().sum()}")
print(f"Value range: {ras_df.min().min():.2f} to {ras_df.max().max():.2f}")

# Check for problematic reactions
null_reactions = ras_df.isnull().all(axis=1).sum()
print(f"Reactions with no data: {null_reactions}")
```

## Integration with Other Tools

### Downstream Analysis

RAS output can be used with:

- **[MAREA](marea.md)**: Statistical enrichment analysis
- **[RAS to Bounds](ras-to-bounds.md)**: Flux constraint application
- **[MAREA Cluster](marea-cluster.md)**: Sample clustering

### Preprocessing Options

Before RAS generation:
- **Normalize** expression data (log2, quantile, etc.)
- **Filter** low-expression genes
- **Batch correct** if multiple datasets
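
As a rough illustration of the first two steps, the sketch below filters low-expression genes and applies a log2 transform with pandas and NumPy. The function name `preprocess`, the mean-expression threshold, and the pseudocount are assumptions chosen for the example, not COBRAxy defaults; batch correction is not covered.

```python
import numpy as np
import pandas as pd


def preprocess(expression, min_mean=1.0, pseudocount=1.0):
    """Drop low-expression genes, then apply log2(x + pseudocount).

    Hypothetical helper; threshold and pseudocount are illustrative values.
    """
    # Keep genes whose mean expression across samples reaches the threshold.
    keep = expression.mean(axis=1) >= min_mean
    return np.log2(expression.loc[keep] + pseudocount)


# Made-up raw values: HGNC:5 falls below the threshold and is removed.
raw = pd.DataFrame(
    {"Sample_1": [0.0, 7.0], "Sample_2": [1.0, 15.0]},
    index=pd.Index(["HGNC:5", "HGNC:10"], name="Gene_ID"),
)
processed = preprocess(raw)
print(processed)

# To write the result in the tab-separated layout described under Input Format:
# processed.to_csv("expression_data.tsv", sep="\t")
```

Filtering before the transform keeps the threshold on the original scale; whether that order suits a given dataset is a judgment call.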

## Advanced Usage

### Custom Model Integration

```python
import pandas as pd

# Create custom GPR rules
custom_rules = {
    'R_CUSTOM1': 'HGNC:5 and HGNC:10',
    'R_CUSTOM2': 'HGNC:15 or HGNC:20'
}

# Save as TSV
rules_df = pd.DataFrame(list(custom_rules.items()),
                        columns=['Reaction_ID', 'GPR'])
rules_df.to_csv('custom_rules.tsv', sep='\t', index=False)

# Use with RAS generator
args = ['-rl', 'custom_rules.tsv', '-rn', 'CustomModel']
```

### Batch Processing

```python
# Assumption: the ras_generator module is importable in your environment;
# adjust this import to match your COBRAxy installation.
import ras_generator

# Process multiple expression files
expression_files = ['data1.tsv', 'data2.tsv', 'data3.tsv']

for i, exp_file in enumerate(expression_files):
    output_file = f'ras_output_{i}.tsv'

    # Same options as the command-line examples, passed as an argument list
    args = [
        '-td', '/path/to/COBRAxy',
        '-in', exp_file,
        '-ra', output_file,
        '-rs', 'ENGRO2'
    ]

    ras_generator.main(args)
    print(f"Processed {exp_file} → {output_file}")
```

## References

- [COBRApy documentation](https://cobrapy.readthedocs.io/) - Underlying metabolic modeling
- [GPR rules format](https://cobrapy.readthedocs.io/en/stable/getting_started.html#gene-protein-reaction-rules) - Standard format specification
- [HGNC database](https://www.genenames.org/) - Gene nomenclature standards