National Science Foundation Grant Awarded for the MSU Data Hub

An open server rack with a row of storage hardware pulled out and illuminated by green lights.

Michigan State University (MSU) researchers will have a new option to store, share, and archive research data thanks to a nearly $630,000 grant from the National Science Foundation. The grant, which supports a collaboration among the Institute for Cyber-Enabled Research (ICER), IT Services, MSU’s Research Technology Support Facility (RTSF), the MSU Commons, and the MSU Libraries, establishes the MSU Data Hub. The Data Hub will support a diverse range of data-intensive science applications across MSU’s community of scholars.

The principal investigator (PI) for this grant is Dr. Brian O’Shea, ICER Director (on sabbatical). He is joined by co-PIs Ronald Henry, Director of RTSF; Donald DuRousseau, Executive Director for Research Cyberinfrastructure; Kathleen Fitzpatrick, Director of Knowledge Commons; and Jonathan Barber, Data Librarian.

The MSU Data Hub

Data-intensive research is exploding, particularly in the life sciences, remote sensing, and areas where machine learning and AI are poised to make transformative contributions. This growth presents a tremendous challenge for researchers because of the complexity of storing large datasets for analysis, sharing them with collaborators, and archiving them for long-term use. These issues are increasingly urgent because interdisciplinary teams, which are crucial to solving important societal problems, lack the tools to collaborate on shared datasets.

The MSU Data Hub will tackle these challenges by creating a central resource for all MSU scholars and students to securely store and share datasets, making them easily available for processing, analysis, and visualization.

Collaboration is paramount to the MSU Data Hub. Through a partnership with the Open Science Data Federation, the Data Hub will support national research infrastructure and foster working relationships across institutions. To assist MSU scholars in achieving the objectives of findability, accessibility, interoperability, and reusability (FAIR), the Data Hub will also integrate with the MSU Commons, which is the University’s open-access digital repository.

By providing a reliable platform on which researchers can build large datasets and connecting that platform to a public-facing system, the MSU Data Hub will foster a culture of openness among MSU researchers. Additionally, its connection to the larger Knowledge Commons will empower researchers globally to locate and use valuable datasets, accelerating the pace of discovery and innovation.

Tech Specs

The MSU Data Hub is expected to provide about 8.5 petabytes of usable data storage. For perspective, that storage volume can hold roughly 94,000 feature-length movies in 4K quality.
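
As a rough sanity check on that comparison, the short Python sketch below works out the per-movie file size implied by the two figures quoted above. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes); the function name is hypothetical and used only for illustration. Under those assumptions, each movie comes out to about 90 GB.

```python
# Back-of-the-envelope check of the movie comparison.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
PETABYTE = 10**15
GIGABYTE = 10**9

usable_bytes = 8.5 * PETABYTE   # usable capacity quoted in the article
movie_count = 94_000            # number of 4K movies quoted in the article


def implied_movie_size_gb(capacity_bytes, movies):
    """Return the average file size per movie, in gigabytes."""
    return capacity_bytes / movies / GIGABYTE


size_gb = implied_movie_size_gb(usable_bytes, movie_count)
print(f"Implied size per 4K movie: about {size_gb:.0f} GB")
```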

The Data Hub is being built on an open-source storage platform that allows data to be automatically distributed across the entire storage cluster, which improves performance compared to older, single-node file systems. Critically, the Data Hub will use fast, reliable, open-source tools for transferring large amounts of data.

Research Implications

After deployment and testing, the MSU Data Hub is expected to be available campus-wide in the spring of 2026 to support researchers in many domains. The primary scientific partner, RTSF, will be the first group onboarded to test the new cyberinfrastructure. RTSF is a scientific facilities consortium that provides technical and scientific assistance for MSU and national research communities. Its core facilities produce huge amounts of raw data, currently hundreds of terabytes per year, with exponential growth expected in the next few years. Additionally, the need for researchers to integrate data from multiple core facilities creates another data management challenge.

The MSU Data Hub’s storage space and transfer methods will ease these challenges, resulting in a significant increase in scientific productivity.

Cryo-EM is a core facility that creates highly detailed 3D maps of everything from atoms to organisms. In 2023, these images required 0.5 to 1 TB of digital storage per day. That figure is expected to increase to 10-15 TB per day in 2025 as new hardware enables higher-quality data capture. The storage space provided by the MSU Data Hub will be essential to accommodate this massive data output.
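
To put those daily rates in yearly terms, the sketch below converts them to petabytes per year. It assumes decimal units (1 PB = 1,000 TB) and, as a deliberate upper bound, data collection on all 365 days of the year; the facility's actual operating schedule is not stated here, and the function name is hypothetical. Under those assumptions, the expected 2025 rates correspond to several petabytes per year, compared with a few hundred terabytes per year at the 2023 rates.

```python
# Convert the quoted Cryo-EM daily data rates into yearly volumes.
# Assumptions: decimal units (1 PB = 1000 TB) and acquisition on every
# day of the year, which gives an upper bound rather than a forecast.
TB_PER_PB = 1000
DAYS_PER_YEAR = 365


def yearly_volume_pb(tb_per_day, days=DAYS_PER_YEAR):
    """Return the yearly data volume in petabytes for a daily rate in TB."""
    return tb_per_day * days / TB_PER_PB


# (low, high) daily rates in TB, as quoted in the article
rates = {"2023": (0.5, 1.0), "2025 (expected)": (10.0, 15.0)}

for label, (low, high) in rates.items():
    print(f"{label}: {yearly_volume_pb(low):.2f} to "
          f"{yearly_volume_pb(high):.2f} PB per year")
```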

Another core facility, the Genomics Core, provides nucleic acid sequencing, consultation, and outreach services. Its sequencing data will be more easily archived and better protected against accidental loss thanks to the longer storage periods available through the Data Hub.

The Bioinformatics Core supports research by analyzing complex data sets and providing a meaningful biological context for the results. Current data transfer processes are cumbersome for researchers. The MSU Data Hub will use a fast, secure service to transfer data, which will be instrumental for the Bioinformatics Core because it frequently integrates data from multiple sources.

Finally, the Mass Spectrometry and Metabolomics Core (MSMC) performs analyses using mass spectrometry. The MSMC staff plans to work with the Data Hub user support team to improve automated metadata collection and to educate users about FAIR data principles, making it easier for them to deposit their data into both the Data Hub and public repositories.

The MSU Data Hub represents a leap forward for researchers who need to store, share, and archive data from data-intensive research. By providing a centralized, secure platform, the Data Hub will address the pressing challenges of collaboration across disciplines. This cyberinfrastructure will not only support MSU’s diverse research community but also foster innovation by integrating with national research infrastructure and adhering to FAIR data principles. As the Data Hub enables smooth data sharing and long-term preservation, it will accelerate discoveries that tackle complex societal challenges.