Google Summer of Code 2024

COBRAxy: COBRA and MaREA4Galaxy

National Resource for Network Biology (NRNB)

Mentors:

Alex Graudenzi, alex.graudenzi@unimib.it
Chiara Damiani, chiara.damiani@unimib.it
Marco Antoniotti, marco.antoniotti@unimib.it

Contributor:

Luca Milazzo (University of Milano-Bicocca) – lucmil2000@gmail.com, luca.milazzo@epfl.ch

- -

Project Description

- The project focused on developing an advanced Galaxy tool that enhances the data mapping capabilities of MaREA4Galaxy. The extension of this framework includes the analysis of fluxomics data, starting from a metabolic model and progressing to the representation of up-regulated fluxes on a metabolic map. This tool enables users to perform constraint-based enrichment analysis of metabolic pathways. -

The primary goals of the project were:

Create a flux sampling and analysis interface to allow users to work with constraint-based metabolic models (e.g., sampling algorithms, FBA, pFBA, and FVA).
Adapt the existing clustering module to clusterize fluxomics data and implement additional clustering algorithms (e.g., Leiden and Louvain).
Build upon the existing module for visualizing enriched reactions based on RAS to create a new module for enrichment analysis of metabolic pathways based on simulated fluxomics data, and visualize the results on the metabolic map.

- -

What I Did

Updated all existing modules of MaREA4Galaxy to use recent versions of Python libraries, ensuring greater future compatibility.
Modified the "Custom Data Generator" tool to extract rules, reactions, bounds, and medium information from a COBRA model.
Developed the "RAS to Bound" tool, which generates metabolic reaction bounds based on the RAS matrix and a growth medium (either custom or one of 26 pre-defined settings), enabling the creation of cell-specific bounds from a generic metabolic model (e.g., ENGRO2 or a custom model).
Developed the "Flux Simulation" tool, allowing users to sample multiple metabolic models using cell-specific bounds, employing the CBS and OPTGP algorithms. This tool also supports flux analysis using FBA, pFBA, FVA, and biomass sensitivity analysis.
Developed the "Metabolic Flux Enrichment Analysis" tool, which visualizes up-regulated fluxes identified by the "Flux Simulation" tool, compares different sub-classes identified by the clustering tool over fluxomics data, and visualizes all results on the metabolic map.

- -

Current State and Future Extensions

- Currently, the updated MaREA4Galaxy tool allows users to perform constraint-based enrichment analysis of metabolic pathways using RNA-seq profiles by simulating fluxomics. Additionally, users can compare different sub-populations identified by the clustering tool. The architecture minimizes computational costs by handling cell-specific models through a set of bounds, without storing complete COBRA models, which would contain a large amount of redundant information. -

- The implementation of the "Metabolic Flux Enrichment Analysis" tool did not leave enough time to extend the clustering module to new algorithms such as HDBSCAN, Leiden, and Louvain. This is a potential future extension to consider. Moreover, implementing a more advanced clustering grid search could further optimize clustering results. -

- -

About the Code

- I worked on the Mercurial repository of MaREA4Galaxy, where this document is stored. I committed all my changes, as shown by the repository history, though without using any Git-like merge operations due to the limitations of the Mercurial interface. -

- -

Conclusions

- Over the past years, I have focused on biology-related subjects, particularly metabolic fluxes and other omics data such as gene expression datasets. Through this project, I was able to apply the knowledge I have gained in constraint-based modeling, flux sampling, and omics enrichment analysis by expanding the MaREA4Galaxy tool. This experience not only enhanced my programming skills but also deepened my understanding of the real needs of biologists when working with such omics data. -