Putting Together the Puzzle of Topological Data Analysis

As a high school student, Dr. Liz Munch decided she did not want to be a math nerd and opted to pursue a music degree instead. Now an assistant professor jointly appointed in the Department of Computational Mathematics, Science & Engineering (CMSE) and the Department of Mathematics, Dr. Munch has found her calling in an exciting area of math known as topological data analysis (TDA).

The field of TDA is only about 20 years old, but it emerged from much older fields of math such as algebraic topology. TDA can quantify the shape and structure of data in an automated fashion, which can shorten the time to scientific discovery.

“Up until recently, this was a much more esoteric branch of mathematics with beautiful math and theorems hidden in symbols and pictures that are very accessible to mathematicians but are much harder to take outside of that realm,” explains Dr. Munch. “So, the more recent work has been interested in finding ways of taking the tools and ideas from the more theoretical mathematical side and bringing them closer to things like statistics and machine learning in a way that we can analyze.”

Fitting together the puzzle pieces of TDA and, more broadly, of interdisciplinary work comes naturally to Dr. Munch. As an enthusiast of puzzles and rule-based worlds, she finds her field of study to be an appealing intersection of the ways she naturally operates.

In her interdisciplinary research, Dr. Munch first figures out what questions researchers are trying to answer on the biology or engineering side and what tools are available on the mathematical side to help answer those questions.

“A lot of the application side research, in particular biology, tends to work with a hypothesis-driven model where they have a question, they develop some sort of experiment to try to answer the question, and then they make conclusions,” says Dr. Munch. “A lot of the work that I do is much more on a data-driven side where I'm generating theorems and trying to understand how these tools work and what sort of questions they can answer.”

In one of the Munch Lab projects, Dr. Munch works with plant biologists looking at X-ray CT scans of plants. Encoding information about the shape and structure of a plant and matching it with the plant's genotype and gene expression information provides data showing how changing a plant's genes can change its shape.

Dr. Munch offers plant height as an example application of this work. If a plant grows too tall in certain conditions, it might fall over. Modifying the plant's genes could make it grow shorter or with a wider base, which could result in greater crop yield.

“If I'm interested in encoding information about a shape, I can pick a particular direction and then watch how the shape changes as I look at slices that go in that direction,” Dr. Munch explains. “And if I encode that information in enough directions, I can take all those representations and put them together to mathematically promise that will encode all the information about the shape.”
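The directional idea in the quote above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Munch Lab's code: the article does not name a specific method, so the sketch uses one common choice, the Euler characteristic (vertices minus edges plus triangles) of the part of a shape lying below a sweeping threshold. The function name `euler_curve`, the toy square shapes, and the threshold values are all hypothetical.

```python
import math


def euler_curve(vertices, edges, triangles, angle, thresholds):
    """Sweep a 2D shape along one direction and record its Euler characteristic.

    The shape is given as a triangulated mesh (an assumption of this sketch):
    `vertices` are (x, y) points, `edges` and `triangles` are tuples of
    vertex indices. For each threshold t, only the pieces lying entirely at
    height <= t along the chosen direction are counted.
    """
    dx, dy = math.cos(angle), math.sin(angle)
    # Height of each vertex measured along the chosen direction.
    heights = [x * dx + y * dy for x, y in vertices]

    curve = []
    for t in thresholds:
        n_vertices = sum(1 for h in heights if h <= t)
        n_edges = sum(1 for a, b in edges if max(heights[a], heights[b]) <= t)
        n_triangles = sum(
            1 for tri in triangles if max(heights[i] for i in tri) <= t
        )
        # Euler characteristic of the slice seen so far: V - E + F.
        curve.append(n_vertices - n_edges + n_triangles)
    return curve


# Two toy shapes with the same outline: a hollow square and a filled square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
hollow_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
filled_edges = hollow_edges + [(0, 2)]
filled_triangles = [(0, 1, 2), (0, 2, 3)]

thresholds = [-0.5, 0.0, 0.5, 1.0]

# Sweep left to right (angle 0 points along the x-axis).
hollow_curve = euler_curve(square, hollow_edges, [], 0.0, thresholds)
filled_curve = euler_curve(square, filled_edges, filled_triangles, 0.0, thresholds)

print(hollow_curve)  # [0, 1, 1, 0]
print(filled_curve)  # [0, 1, 1, 1]
```

The two curves agree until the sweep reaches the far side of the square, where the hollow shape closes into a loop and the filled one does not. Repeating the sweep for many angles and collecting the resulting lists of numbers gives the kind of computer-friendly summary of a shape that the quote describes.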

The resulting information about the shape can be encoded in a computer for use in statistical and machine learning tools. Automating this process significantly improves the speed of research and helps to reduce issues related to user bias. High-performance computing capacity is necessary to run these computationally intensive tools.

“ICER's compute resources have been incredibly useful in our research,” says Dr. Munch. “I've also made a lot of use of the support staff because having access to people online to ask questions and try to figure out how to actually write our code better, access the resources, or get stuff working has been incredibly helpful.”