The Use of Machine Learning to Solve Problems in Computational Chemistry
Zhuoqin Yu is a PhD student in computational chemistry at Michigan State University. She works under Dr. Kenneth M. Merz Jr., applying machine learning to problems in computational chemistry and biology. Her research focuses on determining protein-ligand complex structures by developing a regression model that predicts the NMR chemical shift perturbations (CSPs) induced in a protein by ligand binding.
Nuclear magnetic resonance (NMR) is a method that uses radio signals to detect the structure of molecular compounds. CSPs are the changes in these signals, which reflect structural changes within the molecular compounds. By comparing the experimentally observed signals with the signals predicted for any given set of structural coordinates, Yu is able to “determine the coordinates, or where the atoms are within a protein.” Yu has developed an open-source Python package, which is available on GitHub, a code-hosting platform used by engineers, programmers, and computer scientists. Her work has opened up new opportunities for improving the quality of protein-ligand complex structures using NMR-derived information.
Ideally, her research will be applied to the development and improvement of pharmaceuticals. Yu has applied her method to two complex systems and demonstrated that it can distinguish native ligand poses from decoys and refine protein-ligand complex structures. The ability to map experimental NMR signal changes to structural changes is highly useful to the industry. In fact, one pharmaceutical company used her method in its own trials, where it “performed perfectly.” The method can also be used in further trials that test how the molecular components of medications interact with the drug target within our bodies.
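For readers curious how comparing predicted and observed signals can separate a native pose from decoys, here is a minimal Python sketch. It is not taken from Yu's package: the function names, the use of root-mean-square error as the agreement score, and the toy CSP values are all assumptions made for illustration. In practice, the predicted CSPs for each candidate pose would come from a trained regression model.

```python
import math


def csp_rmse(observed, predicted):
    """Root-mean-square error between observed and predicted CSPs (ppm).

    Hypothetical helper: RMSE is an assumed agreement score, not
    necessarily the one used in the published method.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted CSP lists must have equal length")
    if not observed:
        raise ValueError("at least one CSP value is required")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rank_poses(observed, predicted_by_pose):
    """Return (pose_name, rmse) pairs sorted from best to worst agreement."""
    scores = [
        (name, csp_rmse(observed, predicted))
        for name, predicted in predicted_by_pose.items()
    ]
    return sorted(scores, key=lambda item: item[1])


# Toy, made-up numbers for illustration only (not experimental data).
observed_csps = [0.12, -0.05, 0.30, 0.00]
predicted_csps = {
    "pose_a": [0.10, -0.04, 0.28, 0.01],
    "pose_b": [0.40, 0.20, -0.10, 0.15],
    "pose_c": [0.00, 0.00, 0.00, 0.00],
}

ranking = rank_poses(observed_csps, predicted_csps)
best_pose = ranking[0][0]
print(best_pose)
```

In this toy setup, the pose whose predicted CSPs sit closest to the observed values ranks first. Refinement can be thought of along similar lines: adjust the coordinates, re-predict the CSPs, and keep the changes that improve agreement.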
Given the nature of this project and its heavy reliance on data science, HPCC is an integral part of Yu’s research. Little data is available for the specific conditions of her project, so Yu must use quantum mechanics (QM) to generate data by calculating NMR signals and modeling the link between those signals and existing structures of protein-ligand complexes. She is also applying some of these data science techniques to another research project, in which she uses deep learning to predict the toxicity of drug molecules. All in all, Yu relies on HPCC services to process large amounts of data and to run her machine learning algorithms and simulations.
Yu’s interest in applying data science to problems in computational chemistry began when she entered graduate school and grew stronger as her research progressed. Computational chemistry appeals to her because it combines research, critical thinking, modeling, and machine learning. “I am very lucky,” Yu said, “to be equipped with solid data science skills especially on machine learning within a computational chemistry lab.” On the whole, Yu’s passion for her work shines through in all she has accomplished.