Grant Awarded for Accessible Supercomputer: MSU Data Machine
Although the phrase “high-performance computing” traditionally conjures images of a computer scientist more so than a social scientist, the reality is that many diverse areas of research benefit from access to the advanced computational capacity offered by a high-performance computing center (HPCC). The National Science Foundation has awarded a $399,865 Campus Cyberinfrastructure grant to MSU that will enable researchers from diverse academic backgrounds to utilize the campus HPCC facilitated by the Institute for Cyber-Enabled Research (ICER).
Technological advancements have led to an explosion of data for researchers to employ in machine learning (ML) and artificial intelligence (AI), particularly in fields of study where computing has not been widely used. With the Campus Cyberinfrastructure grant, ICER will create the MSU Data Machine, an accessible supercomputer optimized for such data-intensive research and for ML and AI applications.
Dr. Brian O’Shea, Principal Investigator for the Campus Cyberinfrastructure grant and Director of ICER, noted that the technical optimization of the MSU Data Machine is paired with a comprehensive outreach and training program to ensure access for researchers from fields that do not typically use high-performance computing in their workflows.
“The machine will include large amounts of memory to facilitate user-friendly data analysis, low latency solid-state storage that is optimized for working with small files and complex access patterns, and graphics processing units (GPUs) that are well-suited for ML and AI applications,” Dr. O’Shea stated. “We will ensure that this resource is maximally accessible to researchers and instructors through tools like Open OnDemand, which provides a graphical interface that is much easier to use than the standard command-line interface, modern cloud-informed system and user tools, and usage policies that promote interactive data analysis over a batch queue-based system.”
Melissa Woo, Executive Vice President for Administration and Chief Information Officer, said the accessibility built into the MSU Data Machine is essential to the University’s commitment to an inclusive and equitable campus culture.
“Expanding high-performance computing access to researchers across our university speaks to the very core of MSU’s strategic effort to empower excellence, advance equity, and expand impact,” said Dr. Woo. “The MSU Data Machine affirms our commitment as a leading research institution and greatly enhances the ability of our researchers to collaboratively tackle global challenges.”
The MSU Data Machine will address the unique needs of researchers in areas such as microbiology, social dynamics, ecology, and remote imaging, and can ultimately lead to substantial scientific advances. Four research groups, led by co-principal investigators of the Campus Cyberinfrastructure grant, will be the first users of the MSU Data Machine. Once efficacy has been established, the machine will be made available to the broader MSU community.
The MSU SpaCE Lab (led by Dr. Phoebe Zarnetske, associate professor, MSU IBIO/EEB, and PI, MSU Institute for Biodiversity, Ecology, Evolution, and Macrosystems (IBEEM)) studies what drives biodiversity. Projects include combining data from the National Ecological Observatory Network (NEON) with satellite remote sensing and modeling to explain and predict changes in bird, tree, fish, mammal, and insect biodiversity from local to continental scales. Credit: National Science Foundation, https://www.nsf.gov/news/news_images.jsp?cntn_id=121207&org=NSF
Dr. Phoebe Zarnetske, Associate Professor in the Department of Integrative Biology, has seen her progress hindered by a lack of resources for big data processing and interactive computing. The MSU Data Machine will provide the resources needed to further her research.
“Big data are essential to help explain and predict natural phenomena including patterns of biodiversity, impacts of climate change on genes to ecosystems, and feedbacks among ecology, evolution, and behavior,” said Dr. Zarnetske. “By combining data from satellites, gene sequences, and observations of organisms from public science efforts like iNaturalist or the National Ecological Observatory Network (NEON), we can advance both fundamental knowledge and applied questions that are essential for sustainable management and conservation of Earth’s ecosystems in time and space. The MSU Data Machine enables integrative and conservation biology to expand to bigger scales in research and teaching, facilitating knowledge, discovery, and more robust forecasts of how life is changing on Earth.”
Weekly trips to grocery stores in Detroit mapped from over 10,000 simulation data points. The simulations were run on a local PC and took a week to finish. The image was generated using open-source software and open data. The Data Machine will allow for a significant increase in execution speed and better rendering of results. Funded by the U.S. National Science Foundation, Human and Social Dynamics program, grant SES 0624263 (PI I. Vojnovic).
The MSU Data Machine will provide Dr. Arika Ligmann-Zielinska, Associate Professor in the Department of Geography, Environment, and Spatial Sciences, with the necessary resources for developing efficient workflows that are impossible with the current high-performance computing resources.
“High-performance computing has gained a lot of attention in recent years in the Social Sciences,” Dr. Ligmann-Zielinska noted. “The sheer volume of data from satellites, social media, federal databases, or government and commercial clearinghouses provides new avenues for research. However, their use is hindered due to inadequate training in the analysis and visualization. Advanced open-source software came from obscurity in the last decade, but its use in social sciences is limited. There is a pressing need to educate the social science community in high-level programming augmented with user-friendly interfaces. The MSU Data Machine will be a stepping stone in making high-performance computing available to the broader social science community through more intuitive interfaces for data processing and training the second generation of social science scholars.”
Large-scale direct numerical simulation of turbulent river flow bounded by a permeable bed, directly resolving hyporheic exchange for extraction of pore-scale physics. From the Turbulence Simulation and Modeling (TSM) Lab.
Dr. Junlin Yuan, an Assistant Professor in the Department of Mechanical Engineering, will be able to greatly accelerate work in the Yuan lab by utilizing the large memory available on the MSU Data Machine for machine learning algorithms.
“The discovery of new flow physics increasingly depends on ‘expensive’ simulations that generate many terabytes of raw data,” explained Dr. Yuan. “Machine learning emerges as a promising tool to dissect and reduce these data to be used in system modeling, control, and design. The MSU Data Machine will unify and greatly accelerate the data processing and turbulence modeling, and facilitate ‘barrier-free’ education and community outreach relating to computational fluid dynamics.”
This figure shows work by the Schrenk Lab on microbial species in groundwater and their relationship to the environmental characteristics of the water, demonstrating how big data approaches might be used.
One of the major challenges faced by Dr. Matthew Schrenk, an Associate Professor jointly appointed in the Department of Earth and Environmental Sciences and the Department of Microbiology and Molecular Genetics, is the variable level of experience students have with coding. The MSU Data Machine’s user-friendly interface will reduce the barrier to entry for high-performance computing.
“The Earth and Environmental Sciences are an area defined by the grand scales that they cover, ranging from nanoseconds to billions of years and from molecules to planets,” said Dr. Schrenk. “The new campus cyberinfrastructure award will help us to train our students to work with data across these scales, providing them with new opportunities in the workforce, and potentially new perspectives through the integration of data sets from different disciplines.”
The deployment of the MSU Data Machine will provide low-barrier access to computational resources for researchers and instructors across MSU. The technical optimization is joined with user training and support structures that will facilitate data-intensive research and instruction and ultimately contribute to the development of a globally competitive STEM workforce.