A Machine Learning Tool for Diagnosing, Monitoring Colorectal Cancer

Scientists aiming to advance cancer diagnostics have developed a machine learning tool that can identify metabolism-related differences in molecular profiles between patients with colorectal cancer and healthy people.

The analysis of biological samples from more than 1,000 people also revealed metabolic shifts associated with changing disease severity and with genetic mutations known to increase the risk for colorectal cancer.

Though there is more analysis to come, the resulting "biomarker discovery pipeline" shows promise as a noninvasive method of diagnosing colorectal cancer and monitoring disease progression, said Jiangjiang Zhu, co-senior author of the study and an associate professor of human sciences at The Ohio State University.

"We believe this is a good tool for disease diagnostics and monitoring, especially because metabolic-based biomarker analysis could also be utilized to monitor treatment effectiveness," said Zhu, also an investigator in The Ohio State University Comprehensive Cancer Center Molecular Carcinogenesis and Chemoprevention Research Program.

"When a patient is taking drug A versus drug B, especially for cancer, time is essential. If they don’t have a good response, we want to know that as soon as possible so we can change the treatment regimen. If metabolites can help indicate a treatment's effectiveness faster than traditional methods like pathology or protein markers, we hope they could be good indicators for doctors who are caring for patients."

The tool is not intended to replace colonoscopy as the gold standard for cancer screening, Zhu said, and further study with additional samples is planned before the pipeline would be ready for translation to a clinical setting.

The research was published recently in the journal iMetaOmics.

This work also represents an advance in machine learning techniques, combining two established methods to design the new platform: partial least squares-discriminant analysis (PLS-DA) for big-picture differentiation of molecular profiles, and an artificial neural network (ANN) that, in this case, pinpoints molecules that improve the platform’s predictive value. The team called the resulting biomarker pipeline PANDA, short for PLS-ANN-DA.
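To make the general idea of pairing the two methods concrete, here is a minimal, illustrative sketch. It is not the PANDA pipeline described in the paper, and the division of roles between the two steps is an assumption made for illustration: the first-component PLS-DA weights rank features by how strongly they covary with the class label, and a tiny neural network is then trained on the top-ranked features. The data are synthetic, and every name in the block (pls_da_weights, train_ann, the "metabolite" columns) is hypothetical.

```python
import math
import random

def pls_da_weights(X, y):
    """First-component PLS-DA weights: w is proportional to Xc^T yc,
    where Xc and yc are the mean-centered features and 0/1 class labels."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    w = [sum((X[i][j] - means[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def sigmoid(z):
    # The tanh form avoids overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def train_ann(X, y, hidden=3, epochs=200, lr=0.05, seed=0):
    """One-hidden-layer network (tanh units, sigmoid output) trained by
    per-sample gradient descent on cross-entropy loss. Returns a predictor."""
    rng = random.Random(seed)
    p = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(W1[k], x)) + b1[k])
             for k in range(hidden)]
        return h, sigmoid(sum(W2[k] * h[k] for k in range(hidden)) + b2)

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x)
            d_out = out - t  # gradient of the loss w.r.t. the output pre-activation
            for k in range(hidden):
                d_h = d_out * W2[k] * (1.0 - h[k] ** 2)  # uses W2 before its update
                W2[k] -= lr * d_out * h[k]
                b1[k] -= lr * d_h
                for j in range(p):
                    W1[k][j] -= lr * d_h * x[j]
            b2 -= lr * d_out

    return lambda x: 1 if forward(x)[1] >= 0.5 else 0

# Synthetic stand-in data (NOT study data): 60 samples, 6 features.
# Only the hypothetical "metabolites" 0 and 1 differ between the classes.
rng = random.Random(42)
n_samples, n_features = 60, 6
y = [i % 2 for i in range(n_samples)]  # 0 = control, 1 = case
X = []
for label in y:
    row = [rng.gauss(0.0, 0.5) for _ in range(n_features)]
    shift = 1.0 if label else -1.0
    row[0] += shift  # higher in cases
    row[1] -= shift  # lower in cases
    X.append(row)

# Step 1: rank features by absolute PLS-DA weight on the first 40 samples.
weights = pls_da_weights(X[:40], y[:40])
top_features = sorted(range(n_features),
                      key=lambda j: abs(weights[j]), reverse=True)[:2]

def select(rows):
    return [[row[j] for j in top_features] for row in rows]

# Step 2: train the network on the selected features, score the held-out 20.
predict = train_ann(select(X[:40]), y[:40])
predictions = [predict(x) for x in select(X[40:])]
accuracy = sum(p == t for p, t in zip(predictions, y[40:])) / len(predictions)
print("selected features:", top_features, "held-out accuracy:", accuracy)
```

In practice, established libraries would replace these hand-written routines, and a real analysis would use cross-validation rather than a single split. The sketch only shows how a projection-based ranking step and a neural network can be chained.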

"We took the best of both worlds and put them together to leverage their strengths and complement each other to offset their potential weaknesses," Zhu said. "We were looking at all kinds of possibilities to tease out the biomarkers that could be predictive or indicative of disease progression and the different stages of the disease. That gave us some strong confidence that this method has great potential for future diagnoses."

Two sets of biological data extracted from blood samples were analyzed: metabolites, products of biochemical reactions that break down food to produce energy and perform other essential functions, and transcripts, RNA readouts of DNA instructions that predict related protein changes.

The biological samples are a significant part of the study’s strength, Zhu said, because they were collected as part of large research projects: The Ohio Colorectal Cancer Prevention Initiative (OCCPI) and an Ohio State Wexner Medical Center clinical laboratory biobank. In all, 626 samples came from people with colorectal cancer - including patients with high-risk genetic mutations. Another 402 samples from age- and gender-matched healthy individuals were obtained by Jieli Li, co-senior study author and associate professor-clinical of pathology in Ohio State’s College of Medicine.

"We, as humans, at different stages of our lives, actually have quite different biochemistry," Zhu said. "This valuable collection of samples enabled us to run high-throughput metabolomics analysis to understand the molecular changes from people who don't have cancer with people who have cancer, and also from early-stage to late-stage disease.

"We also have data from patients with genetic mutations that we can compare to the metabolite data to look at whether metabolic changes are an indication of predictive values for the genetic mutations. To our knowledge, this is the first time this has been done at this scope and scale because we are looking at literally hundreds of patients."

Biomarkers are tricky to rely on for diagnostics across different populations because many conditions affect molecular profiles in living systems. This study therefore highlights several molecular changes with potential, but not certainty, for assessing colorectal cancer's presence and progression in a regionally representative group of patients.

Metabolic pathways linked to purines, a family of compounds needed for DNA formation and degradation, stood out in the analysis: they were more active overall in cancer patients than in healthy controls, and less active at more advanced tumor stages.

"It's certainly an indication that this biomarker may be associated with the underlying mechanisms of cancer biology," Zhu said. "We are cautiously optimistic in saying that we’re not only doing biomarker discovery, but we’re also providing clues for mechanistic investigations."

The team plans to continue analyzing metabolites related to different types of biological signals to refine the PANDA biomarker pipeline.

"Some of the markers we identified are a little bit finicky, and there’s a lot of noise within those signals, but we have pushed the field forward to develop potential next-generation biomarkers and the novel bioinformatics pipeline for colorectal cancer diagnosis and monitoring," Zhu said.

This work was supported by the National Institute of General Medical Sciences, an Ohio State fellowship and Pelotonia, which funded the statewide OCCPI. Zhu is also supported by the Provost’s Scarlet and Gray Associate Professor Program at Ohio State.

Xu R, Jung H, Choueiry F, Zhang S, Pearlman R, Hampel H, Jin N, Li J, Zhu J.
Novel machine-learning bioinformatics reveal distinct metabolic alterations for enhanced colorectal cancer diagnosis and monitoring.
iMetaOmics, 2: e70003, 2025. doi: 10.1002/imo2.70003
