AI vs. Viruses: How Machine Learning is Shaping Virology
Luca Gudermaan
The Growing Threat of Viruses and the Role of Machine Learning in Virology
Viruses are a leading cause of infectious disease and, as the recent COVID-19 pandemic showed, can pose a major threat to society: they can overwhelm health care systems, cause large numbers of fatalities, disrupt economies and impair everyday life.
The first human infections with SARS-CoV-2, the infectious agent that causes COVID-19, were observed in late 2019. However, predecessors of this virus are thought to have circulated in bat populations as a reservoir, with pangolins proposed as a possible intermediate host, before the zoonotic spillover to humans. Many other RNA viruses, which generally have higher mutation rates than DNA viruses and are often highly transmissible, currently circulate in animal populations, raising the question of when a similar event might happen again. Moreover, viruses continually evolve to escape the host’s immune mechanisms, which makes it difficult to predict outbreaks and develop lasting treatments.
Multiple research fields, such as molecular virology, epidemiology and immunology, are dedicated to uncovering as much information about these infectious agents and their underlying mechanisms as possible. Advances in computational methods such as machine learning, together with an increase in computing power, have led to the application of artificial intelligence in these fields, opening up new ways of classifying virus genomes, predicting molecular dynamics and discovering treatments.
This article gives an overview of recent machine learning techniques and their applications in virology.
How AI Decodes Evolution by Unraveling Viral Family Trees
The construction of viral family trees, called phylogenetic analysis, is an important tool in virology. It helps researchers understand the origins and past evolution of viruses. Moreover, it can provide hints towards whether or not mutations are changing the genome over time in a way that leads to increased transmissibility, resistance to treatments or escape of vaccines.
Traditionally, phylogenetic analysis is performed with time-consuming computational methods. Researchers use Sanger sequencing or next-generation sequencing to obtain viral genome sequences and align these sequences using tools like ClustalW or MUSCLE. Classical algorithms are then used to construct the phylogenetic tree. This can be done in three ways: by calculating the genetic distance between sequences and grouping similar ones together, by estimating the most likely evolutionary relationship between sequences using maximum likelihood probabilistic models, or by employing Bayesian inference to estimate viral evolution over time while accounting for uncertainty.
These methods are statistically rigorous and work well for small datasets, but they are computationally slow and struggle with large genomic datasets. Machine learning approaches promise to speed up sequence alignment and classification, to improve predictions of viral evolution and to handle large-scale genomic data.
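The distance-based approach can be illustrated with a minimal sketch. It assumes the sequences are already aligned, uses the simple p-distance (the fraction of differing positions) as the genetic distance, and groups sequences with UPGMA, one of several classical clustering schemes. The toy sequences and the names `p_distance`, `upgma` and `toy_alignment` are made up for illustration; real analyses use dedicated phylogenetics software and more realistic substitution models.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned, non-gap positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs):
    """Repeatedly merge the two closest clusters; return the topology as a Newick string."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Size-weighted average of the distances to the two merged clusters
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (d_a * na + d_b * nb) / (na + nb)
        del dist[pair]
        sizes[merged] = na + nb
    return next(iter(sizes)) + ";"


# Hypothetical aligned toy sequences, not real viral genomes
toy_alignment = {
    "iso_A": "ACGTACGT",
    "iso_B": "ACGTACGA",
    "iso_C": "TCGAACCA",
    "iso_D": "TCGAACCT",
}
print(upgma(toy_alignment))  # ((iso_A,iso_B),(iso_C,iso_D));
```

The sketch returns only the tree topology. Branch lengths, bootstrapping and model selection, which matter in practice, are left out to keep the idea visible.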
One example of AI-driven phylogenetic analysis is the open access platform CASTOR, which was introduced by Remita et al. in 2017 and enables collaborative and reproducible classification of virus genomes by machine learning. Furthermore, tools like Pangolin (pangoLEARN) are used to classify SARS-CoV-2 lineages, which helps researchers track outbreaks.
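To give a feel for what machine learning classification of genomes involves, here is a deliberately simple sketch. It is not the method used by CASTOR or pangoLEARN: it assumes k-mer frequencies as features and a nearest-centroid rule as the classifier, and the sequences, the lineage labels and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

K = 2  # k-mer length; real tools typically use richer features
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq):
    """Turn a sequence into a normalised k-mer frequency vector."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[k] for k in KMERS) or 1  # avoid division by zero
    return [counts[k] / total for k in KMERS]


def train_centroids(labelled):
    """labelled: list of (sequence, label). Returns label -> mean k-mer profile."""
    grouped = {}
    for seq, label in labelled:
        grouped.setdefault(label, []).append(kmer_profile(seq))
    return {
        label: [sum(col) / len(col) for col in zip(*profiles)]
        for label, profiles in grouped.items()
    }


def classify(seq, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    profile = kmer_profile(seq)
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


# Made-up training data: two artificial "lineages" with different base composition
training = [
    ("ATATATATTA", "lineage_A"),
    ("AATTATATAT", "lineage_A"),
    ("GCGCGGCCGC", "lineage_B"),
    ("CGCGCCGGCG", "lineage_B"),
]
centroids = train_centroids(training)
print(classify("ATTATATAAT", centroids))
```

The design point is that the classifier works on fixed-length feature vectors rather than on an alignment, which is one reason such approaches scale to large numbers of genomes.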
The machine learning tools mentioned above are employed to classify viruses that have already emerged. Another interesting application of AI in virology is the prediction of future viral mutations. TEMPO is a transformer-based machine learning tool that aims to predict how SARS-CoV-2 might mutate in the future based on historical data. The researchers who published the code (openly accessible at https://github.com/ZJUDataIntelligence/TEMPO) state that their initial experiments suggest that TEMPO can predict SARS-CoV-2 mutations and that it outperforms several traditional methods. Information about future mutations might help identify variants with increased transmissibility or resistance to treatments, which in turn could inform public health strategies and guide the development of vaccines and therapeutics.
Machine Learning in the War Against Viruses: Smarter Drugs, Faster Vaccines
Some viruses mutate rapidly, while antiviral drug discovery has historically been a complex and expensive process that often takes decades from the study of molecular mechanisms to clinical approval. This mismatch is what makes machine learning so important in this field. Machine learning is changing the way drugs are discovered, with the potential to reduce the time, cost and effort needed to identify novel treatments.
The traditional approach consists of researchers identifying viral proteins that are critical for viral replication, screening thousands to millions of chemical compounds that might halt replication, chemically modifying these compounds to improve their efficacy, selectivity and pharmacokinetics, before moving on to preclinical and clinical testing.
With machine learning, parts of this process can be automated, accelerated and optimized. Vast amounts of genomic and proteomic data can now be analyzed in silico to identify new drug targets. AI-powered structure prediction tools like AlphaFold can reduce the need for X-ray crystallography or cryo-EM studies to determine the structure of a target protein, and virtual screening lets researchers concentrate laboratory work on compounds that have a high likelihood of being antiviral. A computational study by Essa Mohammad exemplifies this. In this study, 7,120 compounds were virtually screened, and four promising compounds were identified as likely inhibitors of 3CLpro, an enzyme that is essential for SARS-CoV-2 replication.
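The prioritization step of a virtual screen can be sketched in a few lines. The sketch assumes that each compound already has a predicted binding score from a docking program or a trained model, with more negative values indicating stronger predicted binding, as is the usual convention for docking energies. The compound identifiers, the scores and the `top_hits` helper are invented placeholders and do not come from the study mentioned above.

```python
import heapq


def top_hits(scores, n, threshold):
    """Return up to n compound IDs with the most negative scores at or below threshold."""
    passing = [(score, cid) for cid, score in scores.items() if score <= threshold]
    return [cid for score, cid in heapq.nsmallest(n, passing)]


# Placeholder scores (hypothetical, in kcal/mol) for a handful of made-up compounds
predicted_scores = {
    "cmpd_001": -9.1,
    "cmpd_002": -5.3,
    "cmpd_003": -8.4,
    "cmpd_004": -7.9,
    "cmpd_005": -6.0,
}
print(top_hits(predicted_scores, n=2, threshold=-7.0))  # ['cmpd_001', 'cmpd_003']
```

In a real screen the ranking would be followed by further filters, for example for toxicity or pharmacokinetics, before any compound is tested in the laboratory.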
Conclusion & Outlook
Machine learning is a powerful tool in virology, as it is transforming the way that researchers classify viruses, predict mutations and develop vaccines and treatments. In phylogenetics, it accelerates viral classification and makes outbreaks easier to track, while models like TEMPO may help researchers begin vaccine development before new variants emerge. In antiviral drug discovery, AI-driven approaches can likewise reduce the time and cost associated with developing new treatments.
It is exciting to think about the future integration of advanced AI models with virology. More sophisticated models might identify emerging viral threats with such precision that treatments can be developed before a virus fully emerges. If so, many lives could be saved and the time and cost of developing treatments could be reduced drastically. However, challenges such as data biases, the interpretability of AI models and equitable access to these technologies must be addressed to fully realize the potential of AI in virology.
By leveraging the power of AI, we are entering a new era of virus research: one where machine learning can help us stay ahead of viral evolution and prepare better for future pandemics.