author francesco_lapi
date Tue, 28 Oct 2025 11:04:40 +0000
# MAREA Cluster

Cluster analysis for metabolic data (RAS/RPS scores, flux distributions).

## Overview

MAREA Cluster performs unsupervised clustering on metabolic data using K-means, DBSCAN, or hierarchical clustering.

## Galaxy Interface

In Galaxy: **COBRAxy → Cluster Analysis**

1. Upload your metabolic data file.
2. Select the clustering algorithm and its parameters.
3. Click **Run tool**.

## Command-line console

```bash
# K-means on scaled data, with K between 2 and 10
marea_cluster -in metabolic_data.tsv \
  -cy kmeans \
  -sc true \
  -k1 2 \
  -k2 10 \
  -idop output/
```
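
The example above enables scaling with `-sc true`. This page does not state which scaling method the tool applies; the sketch below assumes per-feature standardization (z-scores), a common preprocessing step for distance-based clustering that keeps reactions with large numeric ranges from dominating the distances. `standardize` is a hypothetical helper, not a COBRAxy function.

```python
from statistics import mean, pstdev


def standardize(samples):
    """Z-score each feature (column) across samples.

    Assumption: "scaling" means standardization to zero mean and unit
    variance. `samples` is a list of equal-length numeric rows, one per
    sample. Features with zero spread are set to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*samples))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(row, stats)]
        for row in samples
    ]


# Three samples described by two reactions on very different scales:
# after scaling, both reactions contribute equally to distances.
print(standardize([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]))
```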

## Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Input Data | `-in` | Metabolic data file (TSV or CSV) | - |
| Algorithm | `-cy` | Clustering algorithm: `kmeans`, `dbscan`, or `hierarchy` | `kmeans` |
| Scaling | `-sc` | Scale the data before clustering | `false` |
| K Min | `-k1` | Minimum number of clusters (K-means, hierarchical) | 2 |
| K Max | `-k2` | Maximum number of clusters (K-means, hierarchical) | 10 |
| Epsilon | `-ep` | Neighborhood radius (DBSCAN) | 0.5 |
| Min Samples | `-ms` | Minimum number of samples within the `-ep` radius for a point to be a core point (DBSCAN) | 5 |
| Elbow Plot | `-el` | Generate an elbow plot (K-means) | `false` |
| Silhouette | `-si` | Compute silhouette scores | `false` |
| Output Path | `-idop` | Output directory | `marea_cluster/` |
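
`-ep` and `-ms` only affect DBSCAN. In the standard DBSCAN formulation (the one scikit-learn's `DBSCAN(eps=..., min_samples=...)` implements), a sample is a core point when at least `min_samples` samples, itself included, lie within distance `eps`; clusters grow outward from core points, and samples not reachable from any core point are labeled noise (`-1`). The plain-Python sketch below illustrates that rule with Euclidean distance. It is not the tool's code, and the `dbscan` function is hypothetical.

```python
from math import dist

NOISE = -1


def dbscan(points, eps=0.5, min_samples=5):
    """Return one cluster label per point; NOISE (-1) marks outliers.

    The defaults mirror the -ep and -ms defaults listed in the table.
    """

    def neighbors(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is also core: keep growing
                queue.extend(reachable)
    return labels


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
       (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
       (10.0, 0.0)]
print(dbscan(pts, eps=0.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Raising `eps` or lowering `min_samples` pulls more samples into clusters; the reverse labels more of them as noise.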

## Input Format

```
Reaction Sample1 Sample2 Sample3
R00001 1.25 0.85 1.42
R00002 0.65 1.35 0.72
```

**File Format Notes:**
- Use **tab-separated** (TSV) or **comma-separated** (CSV) values.
- The first row must contain column headers: `Reaction`, followed by the sample names.
- Each remaining row holds one reaction; all data cells must be numeric.
- Remove or impute missing values before clustering.
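
The notes above can be checked before running the tool. The sketch below reads a TSV or CSV file with Python's standard `csv` module, rejects rows with the wrong number of columns and empty, non-numeric, or NaN cells, and transposes the table so that each sample becomes one vector of reaction values. The transposition assumes that samples, not reactions, are the items being clustered, as the `clusters.tsv` description ("Sample assignments") suggests. `load_samples` and `delimiter_for` are hypothetical helpers, and choosing the delimiter from the file extension is an assumed convention.

```python
import csv
import io
import math


def delimiter_for(path):
    # Assumed convention: ".csv" files are comma-separated, anything else is TSV.
    return "," if path.lower().endswith(".csv") else "\t"


def load_samples(handle, delimiter="\t"):
    """Parse a Reaction x Sample table.

    Returns (sample_names, reactions, vectors), where vectors[i] holds the
    value of every reaction for sample_names[i]. Raises ValueError on rows
    with the wrong number of columns and on empty, non-numeric, or NaN cells.
    """
    rows = csv.reader(handle, delimiter=delimiter)
    header = next(rows)
    sample_names = header[1:]
    reactions, matrix = [], []
    for row_no, row in enumerate(rows, start=2):
        if not row:  # ignore blank lines
            continue
        if len(row) != len(header):
            raise ValueError(f"row {row_no}: expected {len(header)} columns, got {len(row)}")
        values = []
        for cell in row[1:]:
            try:
                value = float(cell)
            except ValueError:
                raise ValueError(f"row {row_no}: non-numeric value {cell!r}") from None
            if math.isnan(value):
                raise ValueError(f"row {row_no}: missing value (NaN)")
            values.append(value)
        reactions.append(row[0])
        matrix.append(values)
    # Transpose: one row per reaction -> one vector per sample.
    vectors = [list(col) for col in zip(*matrix)]
    return sample_names, reactions, vectors


example = (
    "Reaction\tSample1\tSample2\tSample3\n"
    "R00001\t1.25\t0.85\t1.42\n"
    "R00002\t0.65\t1.35\t0.72\n"
)
names, reactions, vectors = load_samples(io.StringIO(example))
print(names)    # ['Sample1', 'Sample2', 'Sample3']
print(vectors)  # [[1.25, 0.65], [0.85, 1.35], [1.42, 0.72]]

# With a real file:
# with open(path, newline="") as fh:
#     names, reactions, vectors = load_samples(fh, delimiter_for(path))
```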

## Algorithms

- **K-means**: Fast, but the number of clusters (K) must be chosen in advance; give a range with `-k1` and `-k2`.
- **DBSCAN**: Density-based; needs no K, marks outlying samples as noise, and handles irregularly shaped clusters (tuned with `-ep` and `-ms`).
- **Hierarchical**: Builds a tree of nested clusters; suited to small datasets, since its memory use and run time grow quickly with the number of samples.
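
Because K-means needs K up front, the table exposes a K range (`-k1`, `-k2`) and an optional elbow plot. The sketch below is a minimal Lloyd's-algorithm K-means in standard-library Python that returns cluster labels and the inertia (sum of squared distances from each sample to its cluster center). Computing inertia for each K in a range gives the curve an elbow plot draws, and the "elbow" where it stops dropping sharply is a common heuristic for picking K. The tool's actual initialization, iteration limit, and random seed are not documented here, and `kmeans` and `elbow_curve` are hypothetical names.

```python
import random
from math import dist


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, inertia)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k random samples as seeds
    labels = None
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def elbow_curve(points, k_min=2, k_max=10):
    """Inertia for each K in [k_min, k_max], capped at the number of samples."""
    k_max = min(k_max, len(points))
    return {k: kmeans(points, k)[1] for k in range(k_min, k_max + 1)}


pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2),
       (5.0, 5.0), (5.2, 5.1), (5.1, 5.2)]
labels, inertia = kmeans(pts, 2)
print(labels, round(inertia, 3))
print(elbow_curve(pts, 2, 4))
```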

## Output

- `clusters.tsv`: Cluster assignment for each sample
- `silhouette_scores.tsv`: Silhouette scores (cluster quality metric)
- `elbow_plot.svg`: Elbow plot for choosing K (K-means)
- `*.log`: Processing log
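
The column layout of `silhouette_scores.tsv` is not documented on this page. For reference, the standard silhouette value of a sample is `s = (b - a) / max(a, b)`, where `a` is its mean distance to the other members of its own cluster and `b` is its mean distance to the members of the nearest other cluster; `s` ranges from -1 to 1, and values near 1 mean compact, well-separated clusters. The sketch below computes per-sample values and their mean with Euclidean distance, scoring samples in single-member clusters as 0 (the convention scikit-learn also uses). `silhouette` is a hypothetical helper, not the tool's code.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Return (per-sample silhouette values, mean silhouette)."""
    clusters = set(labels)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("silhouette needs between 2 and n_samples - 1 clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the sample's own cluster.
        same = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:  # single-member cluster: score defined as 0
            scores.append(0.0)
            continue
        a = mean(same)
        # b: mean distance to the closest other cluster.
        b = min(mean(dist(p, q) for q, lab in zip(points, labels) if lab == other)
                for other in clusters if other != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores, mean(scores)


pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2),
       (5.0, 5.0), (5.2, 5.1), (5.1, 5.2)]
_, good = silhouette(pts, [0, 0, 0, 1, 1, 1])
_, mixed = silhouette(pts, [0, 1, 0, 1, 0, 1])
print(round(good, 3), round(mixed, 3))  # the correct grouping scores much higher
```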

## See Also

- [MAREA](tools/marea)
- [RAS Generator](tools/ras-generator)
- [Flux Simulation](tools/flux-simulation)