Wednesday, December 4, 2024

AlphaFold’s AI protein structure predictions have limitations

As people around the world marveled in July at the most detailed pictures of the cosmos snapped by the James Webb Space Telescope, biologists got their first glimpses of a different set of images — ones that could help revolutionize life sciences research.

These images show the predicted 3-D shapes of more than 200 million proteins. They were created by AlphaFold, an artificial intelligence system. “You can think of it as covering the entire protein universe,” said Demis Hassabis, cofounder of DeepMind, the London-based company that developed the system, at a July 26 news briefing. Combining several deep-learning methods, the computer program predicts protein shapes by recognizing patterns in structures that have already been solved experimentally using electron microscopes and other techniques.

The AI’s first splash came in 2021, with predictions for 350,000 protein structures, including almost all known human proteins. DeepMind partnered with the European Bioinformatics Institute of the European Molecular Biology Laboratory to make these structures public.

July’s massive new release expanded the library to “almost every organism on the planet that has had its genome sequenced,” Hassabis said. “You can look up a 3-D structure of a protein almost as easily as doing a key word Google search.”

These are predictions, not actual structures. Yet researchers have already used some of the 2021 predictions to help develop potential new malaria vaccines, improve understanding of Parkinson’s disease, work out how to protect honeybee health, gain insight into human evolution and more. DeepMind has also focused AlphaFold’s attention on neglected tropical diseases such as Chagas disease and leishmaniasis. If left untreated, these can lead to debilitating and even fatal complications.

Many scientists were thrilled to see the huge data set released. Others worry that researchers will mistake the predictions for the actual shapes of proteins. There are still things AlphaFold can’t do, and wasn’t designed to do, that need to be tackled before the protein cosmos completely comes into focus.

Having the new catalog open to everyone is “a huge benefit,” says Julie Forman-Kay, a protein biophysicist at the Hospital for Sick Children and the University of Toronto. In many cases, AlphaFold and RoseTTAFold, another AI tool that researchers are excited about, predict shapes that match well with protein profiles from experiments. But, she cautions, “it’s not that way across the board.”

Some proteins are more accurately predicted than others. Erroneous predictions could leave some scientists thinking they understand how a protein works when really, they don’t. Forman-Kay states that it is still crucial to perform rigorous experiments in order to understand how proteins fold. “There’s this sense now that people don’t have to do experimental structure determination, which is not true.”

Plodding progress

Proteins begin as long chains made up of amino acids. They then fold into many curlicues or other 3-D shapes. Some look like the corkscrew ringlets from a 1980s perm, or the pleats in an accordion. Others could be mistaken for a child’s spiraling scribbles.

A protein’s architecture is more than just aesthetics; it can determine how that protein functions. Enzymes, for instance, need a pocket in which to capture small molecules and carry out chemical reactions. Proteins that work in a protein complex (two or more proteins interconnected like parts of a machine) need the right shapes to fit together with their partners.

Knowing the folds, coils and loops of a protein’s shape may help scientists decipher how, for example, a mutation alters that shape to cause disease. This knowledge may also be useful in developing better vaccines or drugs.

For years, scientists have bombarded protein crystals with X-rays, flash-frozen cells and examined them under high-powered electron microscopes, and used other methods to discover the secrets of protein shapes. Such experimental methods take “a lot of personnel time, a lot of effort and a lot of money. So it’s been slow,” says Tamir Gonen, a membrane biophysicist and Howard Hughes Medical Institute investigator at the David Geffen School of Medicine at UCLA.

This meticulous, expensive research has uncovered the 3-D structures of more than 194,000 proteins. Their data files are stored in the Protein Data Bank, which is supported by a consortium of research organizations. But the accelerating pace at which geneticists are deciphering the DNA instructions for making proteins has far outstripped structural biologists’ ability to keep up, says systems biologist Nazim Bouatta of Harvard Medical School. “The question for structural biologists was, how do we close the gap?” he says.

Many researchers have longed for computer programs that could analyze the DNA of a gene to predict how the protein encoded by the gene would be folded into a 3-D form.

AlphaFold is here

Scientists have made significant progress towards this AI goal over many decades. But “until two years ago, we were really a long way from anything like a good solution,” says John Moult, a computational biologist at the University of Maryland’s Rockville campus.

Moult is one of the organizers of a competition called the Critical Assessment of protein Structure Prediction, or CASP. Organizers give competitors a set of proteins for their algorithms to fold and compare the machines’ predictions against experimentally determined structures. Most AIs failed to predict the proteins’ actual shapes accurately.

Then, in 2020, AlphaFold made a big impression, predicting the structures of 90 percent of test proteins with high accuracy, including two-thirds with accuracy rivaling that of experimental methods.

Since its inception in 1994, CASP has focused on deciphering the structures of single proteins. With AlphaFold’s performance, “suddenly, that was essentially done,” Moult says.

Since AlphaFold’s 2021 release, more than half a million scientists have accessed its database, Hassabis said in the news briefing. Some researchers, for example, have used AlphaFold’s predictions to help them get closer to completing a massive biological puzzle: the nuclear pore complex. Nuclear pores are key portals that allow molecules to enter and exit cell nuclei. Without the pores, cells wouldn’t work properly. Each pore is large, relatively speaking, containing approximately 1,000 pieces made of around 30 different proteins. Researchers had previously been able to place around 30 percent of the pieces in this puzzle.

Now that puzzle is more than 60 percent complete. Researchers combined AlphaFold predictions with experimental methods to understand how the pieces fit together, they reported in the June 10, 2022, issue of Science.

Now that AlphaFold has largely solved the folding of single proteins, CASP organizers this year have asked teams to tackle the next challenges: predicting the structures of RNA molecules and modeling the interactions between proteins and other molecules.

For those sorts of tasks, Moult says, deep-learning AI methods “look promising but have not yet delivered the goods.”

Where AI fails

Being able to model protein interactions would be a big advantage because most proteins don’t operate in isolation. They interact with other proteins and other molecules in cells. But AlphaFold’s accuracy at predicting how the shapes of two proteins might change when the proteins interact is “nowhere near” that of its spot-on projections for a slew of single proteins, says Forman-Kay, the University of Toronto protein biophysicist. That’s something AlphaFold’s creators acknowledge too.

The AI was trained to fold proteins by studying the contours of known structures. And far fewer multiprotein complexes than single proteins have been solved experimentally.

Forman-Kay studies proteins that refuse to conform to any one shape. These intrinsically disordered proteins are often as floppy as wet noodles (SN: 2/9/13, p. 26). Some will fold into defined shapes when they interact with other proteins or molecules. And they can take on new shapes when paired with different proteins or molecules.

AlphaFold’s predicted shapes reached a high confidence level for about 60 percent of the wiggly proteins that Forman-Kay and colleagues examined, the team reported in a preliminary study posted February 21 at bioRxiv.org. The program often depicts the shapeshifters as long corkscrews called alpha helices.

Forman-Kay’s group compared AlphaFold’s predictions for three disordered proteins with experimental data. The structure that the AI assigned to a protein called alpha-synuclein resembles the shape the protein takes when it interacts with lipids, the team found. But that’s not the way the protein looks all the time.

For another protein, called eukaryotic translation initiation factor 4E-binding protein 2, AlphaFold predicted a mishmash of the protein’s two shapes when working with two different partners. That Frankenstein structure, which doesn’t exist in actual organisms, could mislead researchers about how the protein works, Forman-Kay and colleagues say.

AlphaFold might also be a bit too rigid in its predictions. A static “structure doesn’t tell you everything about how a protein works,” says Jane Dyson, a structural biologist at the Scripps Research Institute in La Jolla, Calif. Even single proteins with generally well-defined structures aren’t frozen in space. Enzymes, for instance, can undergo tiny shape changes while facilitating chemical reactions.

If you ask AlphaFold to predict an enzyme’s structure, it will display a fixed image that may closely match what scientists have determined using X-ray crystallography, Dyson says. “But [it will] not show you any of the subtleties that are changing as the different partners” interact with the enzyme.

“The dynamics are what Mr. AlphaFold can’t give you,” Dyson says.

Revolution in the making

Computer renderings give biologists a head start on solving problems such as how a drug might interact with a protein. But scientists should remember one thing: “These are models,” not experimentally deciphered structures, says Gonen, at UCLA.

He uses AlphaFold’s protein predictions to help make sense of experimental data, but he worries that researchers will accept the AI’s predictions as gospel. If that happens, “the risk is that it will become harder and harder and harder to justify why you need to solve an experimental structure.” That could lead to reduced funding, talent and other resources for the types of experiments needed to check the computer’s work and forge new ground, he says.

Harvard Medical School’s Bouatta is more optimistic. He thinks that researchers probably don’t need to invest experimental resources in the types of proteins that AlphaFold does a good job of predicting, which should help structural biologists triage where to put their time and money.

“There are proteins for which AlphaFold is still struggling,” Bouatta agrees. He believes that researchers should invest their capital there. “Maybe if we generate more [experimental] data for those challenging proteins, we could use them for retraining another AI system” that could make even better predictions.

He and his colleagues have already reverse engineered AlphaFold to create a version called OpenFold, which researchers can use to tackle other problems, such as those gnarly but crucial protein complexes.

The massive amounts of data generated by the Human Genome Project have opened up new areas of research and made many biological discoveries possible (SN: 2/12/22, p. 22). Bouatta believes that having structural information about 200 million proteins could be similarly revolutionary.

In the future, thanks to AlphaFold and its AI kin, he says, “we don’t even know what sorts of questions we might be asking.”
