ICER Student Highlights: Anna Yannakopoulos

Using Machine Learning to Decode Complex Diseases

Anna Yannakopoulos is a graduate student in the Department of Computational Science, Mathematics, and Engineering at MSU. Under the advisement of Arjun Krishnan, she works on uncovering the genetic basis for disease in humans. To do this, Yannakopoulos utilizes machine learning techniques. Essentially, she compiles massive quantities of data from studies, papers, and text mining1 to feed into various algorithms which are able to use the data to locate patterns and trends within biological systems. 

Yannakopoulos’s studies focus primarily on complex diseases that are caused by numerous different genes whose interactions result in a patient’s symptoms. This complexity also frequently causes symptoms to manifest in different ways depending on the patient’s unique genetic makeup. ASD (autism spectrum disorder) is a good example of a complex disorder because despite having an underlying set of symptoms, the complexity of the condition causes a different presentation in each patient. In order to demystify the genetics underlying these complex diseases, Yannakopoulos is using algorithms that train computers to find genetic patterns that are specific to different diseases. 

Given how Yannakopoulos’s research is rooted in machine learning, she makes heavy use of HPCC services to process the quantities of data needed to perform her research. While her research does not reach the standard of “Big Data” set by corporations, the quantity is significant enough that she mentioned how her “computer would explode” if she tried to perform the calculations on a standard system. She stated the largest challenge faced in her research is facilitating compatibility between programs, or “getting software you didn’t write to play nice with other software you didn’t write.” In other words, she needs to use many different programs in order to work with her data and algorithms. These programs are often written by others and may contain bugs or be incompatible with one another. This can serve as a roadblock to her research. However, despite these challenges Yannakopoulos is so far quite successful in her endeavors within the field. 

Ideally, this research will give biologists a list of certain genes that Yannakopoulos and others are confident relate to different diseases.  From there, biologists will be able to use that information to determine how those genes correlate with complex conditions. Additionally, the sorts of algorithms being developed in the process can be applied to a number of fields: it’s not exclusively limited to biological sciences and medicine. Through the identification of underlying genetics associated with complex diseases, Yannakopolous’s research is a stepping stone towards being able to provide more accurate diagnoses and more effective treatment in individuals who present symptoms associated with complex conditions. 

Yannakopolous’s interest in the field was piqued as she already had a passion for exploring the capabilities of computing systems and computational work. What drove her to this project specifically was the many applications for computational science within biology and the undeniable and global importance of understanding human health.


1 Text mining: the act of obtaining information from unstructured text to use as an additional source of data. Unstructured text is data that is undefined by a previous model. Examples of unstructured text include emails, videos, audio files, and social media messages.