“In a way, the problem is solved,” computational biologist John Moult said in late 2020. London-based firm DeepMind had just used its revolutionary artificial intelligence (AI) tool, AlphaFold, to win a biennial competition co-founded by Moult that tests teams’ abilities to predict protein structures – one of biology’s grandest challenges.
Two years later, Moult’s competition, the Critical Assessment of Structure Prediction (CASP), still stands in AlphaFold’s long shadow. The results of this year’s edition (CASP15), presented at a conference in Antalya, Turkey, over the weekend, show that the most successful approaches to predicting protein structures from their amino acid sequences involve AlphaFold, which is based on an AI approach called deep learning. “Everyone uses AlphaFold,” says Yang Zhang, a computational biologist at the University of Michigan at Ann Arbor.
But AlphaFold’s success has opened the floodgates to new challenges in protein structure prediction. Some of these are included in this year’s CASP, and they may require new approaches and more time to address fully. “The low-hanging fruit was picked,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City. “Some of the next problems will be more difficult.”
CASP was founded in 1994 to improve the accuracy of protein structure prediction – an advance that would accelerate efforts to understand the building blocks of cells and speed up drug discovery. In each competition year, teams use computational tools to predict the structures of proteins that have been determined experimentally, using methods such as X-ray crystallography and cryo-electron microscopy, but not yet published.
Entries are judged on how closely the predictions for whole proteins, or for independently folding subunits called domains, match the experimental structures. Some of AlphaFold’s predictions at CASP14 were more or less indistinguishable from the experimental models – the first time such accuracy had been achieved.
Since its debut at CASP14, AlphaFold has become ubiquitous in life-science research. DeepMind released the software’s underlying code in 2021 so that anyone could run the program, and an AlphaFold database updated this year contains predicted structures, of varying quality, for nearly every protein from every organism represented in genomic databases: more than 200 million proteins in total.
The success and newfound ubiquity of AlphaFold posed a challenge for Moult, who works at the University of Maryland in Rockville, and his colleagues as they planned this year’s CASP. “People say, ‘Oh, we don’t need CASP anymore, the problem has been solved.’ And I think that’s exactly the wrong way,” says Moult.
At CASP15, the most successful teams were those that had adapted and built on AlphaFold in various ways, resulting in modest gains in predicting the shape of individual proteins and domains. “The accuracy is already so high that it can hardly be better,” says Moult.
To keep the competition relevant in a post-AlphaFold world, Moult and his team added new challenges and tweaked some existing ones. New tests include modelling how proteins interact with other molecules, such as drugs, and predicting the multiple shapes that some proteins can adopt. CASP has included “complexes” made up of multiple interacting proteins for the past decade, says Moult, but accurately predicting the structures of such assemblies received more emphasis this year.
“That’s the way to go,” Zhang says, because predicting the structures of individual proteins or domains, the bread and butter of past CASPs, has been largely solved by AlphaFold. Predicting the shapes of protein complexes in particular represents an important new challenge for the field, because there is much room for improvement, says Arne Elofsson, a protein bioinformatician at Stockholm University.
AlphaFold was originally developed to predict the shapes of individual proteins. But within days of its release, other scientists showed that the software could be “hacked” to model how multiple proteins interact. In the months since, researchers have developed myriad approaches to improve AlphaFold’s ability to model complexes, and DeepMind itself released a version called AlphaFold-Multimer with this goal.
These efforts appear to have paid off: CASP15 saw a significant increase in the number of accurately predicted complexes compared with previous competitions, largely thanks to methods that adapted AlphaFold. “It’s a new game for us to be with complexes of near-experimental accuracy,” says Moult. “We also have some failures.”
For example, teams made remarkably accurate predictions of a viral molecule of unknown function made up of two identical, intertwined proteins. This type of shape stumped pre-AlphaFold tools, says Ezgi Karaca, a computational structural biologist at the Izmir Biomedicine and Genome Center in Turkey, who assessed the complex predictions. The standard version of AlphaFold could not accurately model the shape of a giant 20-chain bacterial enzyme, but some teams predicted the protein’s structure by applying additional hacks to the network, adds Karaca.
Teams did struggle, however, to predict complexes involving immune molecules called antibodies, including several bound to a SARS-CoV-2 protein, and related molecules called nanobodies. But some teams’ predictions showed glimmers of success, says Karaca, hinting that AlphaFold hacks could prove useful for predicting the shapes of these medically important molecules.
Also notable at this year’s CASP was the absence of DeepMind. The company gave no reason for not participating, but released a brief statement during CASP15 congratulating the participating teams. (At the same time, an update was rolled out to AlphaFold to help researchers compare their progress against the network.)
Some researchers say that entering the competition requires a significant investment of time, which the company might prefer to spend on other challenges. “It would have been nice for us if they had participated,” says Moult. But he adds: “Because the methods are so good, they couldn’t make another big leap.”
Making big improvements on AlphaFold will take time, researchers say, and will probably require fresh advances in machine learning and protein structure prediction. One area under development applies “language models”, like those behind text-prediction tools, to protein structure prediction. But these methods, including one developed by social-media giant Meta, did not fare nearly as well at CASP15 as tools based on AlphaFold.
However, such tools could be useful for predicting how mutations alter a protein’s structure, one of several key challenges in protein structure prediction that have come to the fore thanks to AlphaFold’s success. As a result, the field is no longer focused on a single target, says AlQuraishi. “There’s a whole bunch of those issues.”