Welcome Evo, Generative AI for the Genome

Brian Hie runs the Laboratory of Evolutionary Design at Stanford, where he works at the crossroads of artificial intelligence and biology. Not long ago, Hie pondered a provocative question: If a tool like ChatGPT can write original sentences based on patterns found in massive collections of previously written words, what happens if we replace written words with genetic code?

The answer to that seemingly simple question has become Evo, a generative AI model that writes genetic code. Hie and his colleagues at the Arc Institute and the University of California, Berkeley, introduced Evo in a paper in the journal Science. Hie says researchers might use Evo to understand how microbial and viral genomes work, to design new proteins, including potential drugs, that have never existed before, and to reprogram microbes to accomplish remarkable tasks, from improving photosynthesis for carbon sequestration and higher crop yields to gobbling up microplastics in the oceans.

"Instead of having to use brute force testing or mining promising sequences from nature, all of which are quite unpredictable, we now have an AI model for generating systems of interest, allowing researchers to focus only on the most promising possibilities," said Hie, assistant professor of chemical engineering. "Evo puts the genomes of whole lifeforms within reach and accelerates the bioengineering design process."

Evo could also deepen our understanding of evolution itself, shed new light on genetic diseases, and point toward new treatments, with much of the exploratory work done on a computer rather than in a lab.

Natural insight

The inspiration comes from nature itself. The instructions for all life are encoded in DNA. A better understanding of the complex interplay among DNA, RNA, and proteins, and of how they have evolved over time, could lead to deeper biological knowledge and the ability to reprogram microbes into useful technologies.

But it is not as easy as it sounds. Even simple microbes typically have genomes of millions of base pairs. Two of Evo's key advances over similar existing tools stand out. The first is a much longer "context window", the length of sequence a model can process at once, which grows from roughly 8,000 base pairs to more than 131,000. The second is single-nucleotide resolution: the model reads and writes DNA one nucleotide, the basic building block of DNA, at a time.
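To make "single-nucleotide resolution" and "context window" concrete, here is a minimal Python sketch under stated assumptions. It maps each base to its own integer token and splits a token stream into window-sized chunks, then compares how many chunks a hypothetical 400,000-bp sequence needs with a roughly 8,000-token window versus a roughly 131,000-token one. The vocabulary, the function names, and the exact window size of 131,072 (2^17) are illustrative assumptions, not Evo's actual code or configuration.

```python
from typing import Iterator, List

# Single-nucleotide vocabulary: each base is its own token (illustrative assumption).
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}

# Assumed window size for illustration (2**17); the article only says
# "more than 131,000 base pairs".
CONTEXT_WINDOW = 131_072


def tokenize(seq: str) -> List[int]:
    """Map a DNA string to one integer token per nucleotide."""
    return [VOCAB[base] for base in seq.upper()]


def context_windows(tokens: List[int], size: int = CONTEXT_WINDOW) -> Iterator[List[int]]:
    """Yield consecutive, non-overlapping chunks of at most `size` tokens."""
    for start in range(0, len(tokens), size):
        yield tokens[start:start + size]


if __name__ == "__main__":
    genome = "ACGT" * 100_000  # hypothetical 400,000-bp sequence
    tokens = tokenize(genome)
    long_chunks = list(context_windows(tokens))
    short_chunks = list(context_windows(tokens, 8_000))
    print(len(tokens), len(long_chunks), len(short_chunks))  # 400000 4 50
```

With one token per base, the window length in tokens equals its length in base pairs. A model with the larger window can therefore see a 400,000-bp stretch in 4 pieces instead of 50, while still making a prediction at every individual nucleotide.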

Evo was trained on 2.7 million prokaryotic and phage genomes, including those of some 80,000 microbes, plus smaller DNA loops known as plasmids, for a total of about 300 billion nucleotides. To guard against the use of Evo to develop bioweapons, the team excluded the genomes of viruses known to infect humans and certain other organisms.

Evo can learn how small changes in a nucleotide sequence affect the evolutionary fitness of a whole organism, and it can generate DNA sequences of more than 1 million base pairs, more than seven times its 131,000-base-pair context window, Hie added. By comparison, the smallest "minimal" bacterial genomes are about 580,000 base pairs long, the researchers note.
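A common way to use a sequence model to judge whether a small change is harmful is to compare the likelihood the model assigns to the mutated sequence with the likelihood it assigns to the original. The minimal Python sketch below illustrates that general likelihood-ratio idea. It substitutes a tiny first-order Markov chain, trained on a hypothetical toy sequence, for a large language model. The training data, the smoothing choice, and all function names are illustrative assumptions, and this is not Evo's actual scoring pipeline.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

BASES = "ACGT"


def train_markov(seqs: List[str], pseudocount: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Estimate P(next base | previous base) with add-`pseudocount` smoothing.

    A toy stand-in for a trained DNA language model (illustrative assumption).
    """
    pair_counts: Counter = Counter()
    for s in seqs:
        pair_counts.update(zip(s, s[1:]))
    probs = {}
    for prev in BASES:
        total = sum(pair_counts[(prev, nxt)] for nxt in BASES) + 4 * pseudocount
        for nxt in BASES:
            probs[(prev, nxt)] = (pair_counts[(prev, nxt)] + pseudocount) / total
    return probs


def log_likelihood(seq: str, probs: Dict[Tuple[str, str], float]) -> float:
    """Sum of log P(base_i | base_{i-1}); the first base is scored as uniform."""
    ll = math.log(0.25)
    for prev, nxt in zip(seq, seq[1:]):
        ll += math.log(probs[(prev, nxt)])
    return ll


def mutation_score(wild_type: str, pos: int, new_base: str,
                   probs: Dict[Tuple[str, str], float]) -> float:
    """Log-likelihood ratio of a point mutant vs. the wild type.

    Negative values mean the model finds the mutant less "natural"; the usual
    assumption is that this correlates with lower fitness.
    """
    mutant = wild_type[:pos] + new_base + wild_type[pos + 1:]
    return log_likelihood(mutant, probs) - log_likelihood(wild_type, probs)


if __name__ == "__main__":
    model = train_markov(["ATGATGATGATGATG"])  # hypothetical training data
    print(mutation_score("ATGATGATG", 1, "C", model))
```

A real model conditions each prediction on far more surrounding sequence than one preceding base, which is where a long context window matters. The scoring step itself, comparing two likelihoods, stays this simple.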

Proof of concept

As a proof of concept of Evo's design capabilities, Hie and colleagues prompted Evo to generate novel synthetic CRISPR-Cas molecular complexes and systems. CRISPR-Cas systems are like tiny molecular machines that use proteins and RNA in tandem to edit DNA. In response, Evo produced a fully functional, previously unknown CRISPR system, which the team validated after testing 11 candidate designs. Evo's CRISPR exploration is the first example of simultaneous protein-RNA codesign using a language model, Hie noted.

Next, Hie is working to expand the length of genomic sequence Evo can process, to gain greater control over its outputs, and to extend his research beyond the microbial world to human and other genomes.

"Evo opens up a lot of very interesting research at the intersection of machine learning and biology," Hie said. "It creates opportunities for discoveries that were previously unimaginable and accelerates our ability to engineer life itself."

Evo is open source and publicly available for interested researchers to download.

The research was supported by the Fannie and John Hertz Foundation; National Science Foundation Graduate Fellowship Program; National Center for Advancing Translational Sciences of the National Institutes of Health; National Institutes of Health; National Science Foundation grants; US DEVCOM Army Research Laboratory grants; Office of Naval Research; Stanford HAI; NXP, Xilinx, LETI-CEA, Intel, IBM, Microsoft, NEC, Toshiba, TSMC, ARM, Hitachi, BASF, Accenture, Ericsson, Qualcomm, Analog Devices, Google Cloud, Salesforce, Total, the HAI-GCP Cloud Credits for Research program, the Stanford Data Science Initiative, and members of the Stanford DAWN project: Meta, Google, and VMWare; the Arc Institute; the Rainwater Foundation; the Curci Foundation; Rose Hill Investigators Program; V. and N. Khosla; S. Altman; anonymous gifts to the Hsu laboratory; V. Gupta; and R. Tonsing.

Nguyen E, Poli M, Durrant MG, Kang B, Katrekar D, Li DB, Bartie LJ, Thomas AW, King SH, Brixi G, Sullivan J, Ng MY, Lewis A, Lou A, Ermon S, Baccus SA, Hernandez-Boussard T, Ré C, Hsu PD, Hie BL.
Sequence modeling and design from molecular to genome scale with Evo.
Science. 2024 Nov 15;386(6723):eado9336. doi: 10.1126/science.ado9336
