# MAREA Cluster

Cluster analysis for metabolic data (RAS/RPS scores, flux distributions).

## Overview

MAREA Cluster performs unsupervised clustering on metabolic data using K-means, DBSCAN, or hierarchical algorithms.

## Galaxy Interface

In Galaxy: **COBRAxy → Cluster Analysis**

1. Upload metabolic data file
2. Select clustering algorithm and parameters
3. Click **Run tool**

## Command-line console

```bash
marea_cluster -in metabolic_data.tsv \
  -cy kmeans \
  -sc true \
  -k1 2 \
  -k2 10 \
  -idop output/
```

## Parameters

| Parameter | Flag | Description | Default |
|-----------|------|-------------|---------|
| Input Data | `-in` | Metabolic data TSV file | - |
| Algorithm | `-cy` | Clustering algorithm: `kmeans`, `dbscan`, or `hierarchy` | kmeans |
| Scaling | `-sc` | Scale data before clustering | false |
| K Min | `-k1` | Minimum number of clusters (K-means/hierarchy) | 2 |
| K Max | `-k2` | Maximum number of clusters (K-means/hierarchy) | 10 |
| Epsilon | `-ep` | DBSCAN neighborhood radius | 0.5 |
| Min Samples | `-ms` | DBSCAN minimum samples | 5 |
| Elbow Plot | `-el` | Generate elbow plot | false |
| Silhouette | `-si` | Compute silhouette scores | false |
| Output Path | `-idop` | Output directory | marea_cluster/ |
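
When the tool is driven from a script, the flags in the table above can be assembled programmatically. The following is a minimal sketch, not part of COBRAxy: the `FLAGS` keys and the `build_command` helper are hypothetical names, and it assumes that every boolean flag accepts `true`/`false` the way `-sc` does in the command-line example.

```python
import shlex

# Flags copied from the parameter table; the dictionary keys are
# illustrative names chosen for this sketch, not tool options.
FLAGS = {
    "input": "-in",
    "algorithm": "-cy",
    "scaling": "-sc",
    "k_min": "-k1",
    "k_max": "-k2",
    "epsilon": "-ep",
    "min_samples": "-ms",
    "elbow": "-el",
    "silhouette": "-si",
    "output_dir": "-idop",
}


def build_command(**options):
    """Return an argv list for marea_cluster built from keyword options."""
    argv = ["marea_cluster"]
    for name, value in options.items():
        if name not in FLAGS:
            raise KeyError(f"unknown option: {name}")
        # Assumption: booleans are written as lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv += [FLAGS[name], str(value)]
    return argv


cmd = build_command(
    input="metabolic_data.tsv",
    algorithm="dbscan",
    scaling=True,
    epsilon=0.5,
    min_samples=5,
    output_dir="output/",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` if `marea_cluster` is available on the `PATH`.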

## Input Format

```
Reaction  Sample1  Sample2  Sample3
R00001    1.25     0.85     1.42
R00002    0.65     1.35     0.72
```

**File Format Notes:**
- Use **tab-separated** values (TSV) or **comma-separated** (CSV)
- First row must contain column headers (Reaction, Sample names)
- Numeric values only for metabolic data
- Missing values should be avoided or handled before clustering
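
The notes above can be checked before running the tool. This is a minimal sketch using only the Python standard library; `check_table` is a hypothetical helper, not part of COBRAxy, and it assumes the first column holds the reaction ID and every other column holds one sample.

```python
import csv
import io
import math


def check_table(handle, delimiter="\t"):
    """Return a list of problems found in a metabolic data table."""
    rows = list(csv.reader(handle, delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if len(header) < 2:
        problems.append("header needs an ID column and at least one sample")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        # Skip the ID column; every remaining cell must be a real number.
        for sample, cell in zip(header[1:], row[1:]):
            try:
                value = float(cell)
            except ValueError:
                problems.append(
                    f"line {line_no}, {sample}: non-numeric value {cell!r}"
                )
                continue
            if math.isnan(value):
                problems.append(f"line {line_no}, {sample}: missing value (NaN)")
    return problems


# Second data row has an empty cell for Sample1.
sample = "Reaction\tSample1\tSample2\nR00001\t1.25\t0.85\nR00002\t\t1.35\n"
print(check_table(io.StringIO(sample)))
```

For a CSV file, pass `delimiter=","`. An empty result list means the table satisfies the notes as this sketch interprets them.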

## Algorithms

- **K-means**: Fast; requires the number of clusters, supplied here as a range (`-k1` to `-k2`)
- **DBSCAN**: Density-based; handles noise and irregularly shaped clusters (tuned with `-ep` and `-ms`)
- **Hierarchical**: Tree-based; well suited to small datasets
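
To illustrate why K-means needs a cluster count and what a sweep over a range of K values measures, here is a minimal pure-Python sketch of Lloyd's algorithm. It is not the tool's implementation: the `sq_dist` and `kmeans` helpers, the toy data, and the deterministic seeding with the first `k` points are all assumptions made for the example. Each point stands for one sample's vector of values.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm, seeded with the first k points.

    Returns (labels, inertia), where inertia is the within-cluster
    sum of squared distances to the assigned centroid.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


# Toy data: two well-separated groups of three samples each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]

# Sweep K over a small range, analogous to -k1/-k2.
for k in range(2, 5):
    labels, inertia = kmeans(points, k)
    print(k, round(inertia, 3), labels)
```

Plotting inertia against K gives the curve an elbow plot is based on: the value drops sharply until K reaches the number of natural groups and flattens afterwards.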

## Output

- `clusters.tsv`: Sample assignments
- `silhouette_scores.tsv`: Cluster quality metrics
- `elbow_plot.svg`: Optimal K visualization (K-means)
- `*.log`: Processing log
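
A quick way to inspect the sample assignments is to count how many samples fall in each cluster. The sketch below is illustrative only: the column layout of `clusters.tsv` is not documented here, so it assumes a tab-separated file with one header row, the sample name in the first column, and the cluster label in the last column. `cluster_sizes` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def cluster_sizes(handle):
    """Count samples per cluster label (label assumed to be the last column)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    return Counter(row[-1] for row in reader if row)


# Made-up content in the assumed layout, used in place of a real file.
example = "sample\tcluster\nSample1\t0\nSample2\t1\nSample3\t0\n"
sizes = cluster_sizes(io.StringIO(example))
for label, count in sorted(sizes.items()):
    print(f"cluster {label}: {count} samples")
```

To read a real file, open it with `open(path, newline="")` and pass the handle to `cluster_sizes`, after confirming that the layout matches the assumption above.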

## See Also

- [MAREA](tools/marea)
- [RAS Generator](tools/ras-generator)
- [Flux Simulation](tools/flux-simulation)