Proteins and Natural Language: AI Enables the Design of Novel Proteins

Artificial intelligence (AI) has created new possibilities for designing tailor-made proteins to address problems ranging from medicine to ecology. A research team at the University of Bayreuth led by Prof. Dr. Birte Höcker has now successfully applied a computer-based natural language processing model to protein research. Working autonomously, the ProtGPT2 model designs new proteins that are capable of stable folding and could take over defined functions in larger molecular contexts. The model and its potential are described in Nature Communications.

Natural languages and proteins are similar in structure. Amino acids combine in a multitude of ways to form structures that have specific functions in the living organism - much as words combine in different ways to form sentences that express certain facts. In recent years, numerous approaches have therefore been developed to apply the principles and methods of computational natural language processing to protein research. "Natural language processing has made extraordinary progress thanks to new AI technologies. Today, models of language processing enable machines not only to understand meaningful sentences but also to generate them themselves. Such a model was the starting point of our research. With detailed information concerning about 50 million sequences of natural proteins, my colleague Noelia Ferruz trained the model and enabled it to generate protein sequences on its own. It now understands the language of proteins and can use it creatively. We have found that these creative designs follow the basic principles of natural proteins," says Prof. Dr. Birte Höcker, Head of the Protein Design Group at the University of Bayreuth.
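The idea of treating amino acids like words can be illustrated with a deliberately tiny sketch. The code below is not ProtGPT2 and does not use its architecture or training data. It assumes a simple bigram model, which only learns which residue tends to follow which. The function names `train_bigram` and `generate` and the three training strings are invented for illustration and are not real protein sequences.

```python
import random
from collections import defaultdict

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
START, END = "^", "$"  # hypothetical markers for sequence start and end


def train_bigram(sequences):
    """Count how often each residue follows another in the training set."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        tokens = [START] + list(seq) + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng, max_len=50):
    """Sample a new sequence one residue at a time (autoregressively)."""
    out = []
    current = START
    while len(out) < max_len:
        followers = counts[current]
        residues = list(followers)
        weights = [followers[r] for r in residues]
        nxt = rng.choices(residues, weights=weights, k=1)[0]
        if nxt == END:
            break
        out.append(nxt)
        current = nxt
    return "".join(out)


if __name__ == "__main__":
    # Made-up toy strings, used only to show the mechanics.
    toy_sequences = ["MKTAYIAKQR", "MKVLAAGIVG", "MSTNPKPQRK"]
    model = train_bigram(toy_sequences)
    print(generate(model, random.Random(0)))
```

A bigram model looks back only one residue, whereas the language model described in the article was trained on about 50 million natural sequences. The sketch shares only the basic principle: learn statistics from existing sequences, then sample new ones step by step.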

The language processing model applied to proteins is called "ProtGPT2". It can now be used to design proteins that adopt stable structures through folding and remain functional in this state. In addition, the Bayreuth biochemists have found, through extensive analyses, that the model can even create proteins that do not occur in nature and have possibly never existed in the history of evolution. These findings shed light on the vast space of possible proteins and open a door to designing them in novel and unexplored ways. There is a further advantage: most proteins that have been designed de novo so far have idealised structures. Before such structures can be put to use, they usually must pass through an elaborate functionalisation process - for example by inserting extensions and cavities - so that they can interact with their environment and take on precisely defined functions in larger system contexts. ProtGPT2, on the other hand, generates proteins that have such differentiated structures innately, and are thus already operational in their respective environments.

"Our new model is another impressive demonstration of the systemic affinity of protein design and natural language processing. Artificial intelligence opens up highly interesting and promising possibilities to use methods of language processing for the production of customised proteins. At the University of Bayreuth, we hope to contribute in this way to developing innovative solutions for biomedical, pharmaceutical, and ecological problems," says Prof. Dr. Birte Höcker.

Ferruz N, Schmidt S, Höcker B.
ProtGPT2 is a deep unsupervised language model for protein design.
Nat Commun 13, 4348, 2022. doi: 10.1038/s41467-022-32007-7
