NVIDIA’s Grace Hopper Nodes Arrive at MSU’s Data Center to Push Boundaries of Computational Research

The top of a computer node is removed to reveal its internal components.

On the cutting edge of computer chip technology, the NVIDIA Grace Hopper Superchip promises to advance computational research beyond what was possible with previous hardware. The Institute for Cyber-Enabled Research (ICER) has installed four Grace Hopper nodes and one Grace CPU node for MSU’s high-performance computing users to accelerate their research.

The Grace Hopper Superchip is named after the trailblazing computer scientist and naval officer Grace Hopper, whose long list of accolades and accomplishments includes creating programming languages based on words rather than symbols, making programming more accessible to non-experts. NVIDIA’s Grace Hopper Superchip combines the company’s existing Grace central processing unit (CPU), which is also available at ICER, with its Hopper graphics processing unit (GPU) architecture.

Boxes containing Grace Hopper nodes sit on a dolly in the loading dock.

Although computational research is trending toward general-purpose GPUs because of their better performance at lower cost and power consumption, some researchers cannot take advantage of GPUs because their software does not map well onto the constraints of GPU hardware architecture. By combining a sophisticated CPU, GPU, and high-speed memory architecture, the Grace Hopper Superchip aims to solve many of the performance issues researchers encounter with standard hardware.

Brian O’Shea, director of ICER and professor in the College of Engineering and the College of Natural Science, described ICER’s mission of providing MSU scholars with opportunities to experiment with novel hardware, helping them solve their research problems more quickly and opening further opportunities for research and grant funding.

“The NVIDIA Grace Hopper Superchip will be particularly exciting for researchers who can use GPUs for at least part of their application but also need lots of memory,” stated O’Shea. “For example, people doing machine learning, deep learning, and generative AI should see substantial benefit, and we also expect that researchers with molecular dynamics and fluid dynamics codes will be able to see significant speedups of the code too.”

The Vermaas lab, led by Assistant Professor Josh Vermaas of the Department of Biochemistry & Molecular Biology, was the first research group to test the new nodes. So far, they have used them to quantify how tightly the electron-transfer proteins involved in photosynthesis bind to one another. These proteins carry energy that drives metabolism in plants and algae.

“For molecular simulation systems, we are limited to taking very short two femtosecond [two quadrillionths of a second] timesteps to capture the dynamics at the molecular scale,” Vermaas explained. “So, in order to study biological processes that might take milliseconds or longer to happen, we need to add biases.”

In this context, “biases” refers to extra force added to make something happen faster. An analogy would be walking a dog on a leash: the dog still gets to do dog-like things and explore, but the human is biasing the dog to move along a predetermined route. At the molecular scale, the Vermaas lab guides molecules along a chosen path rather than letting them linger near their starting configurations, as unbiased molecules tend to do on simulation timescales.
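The leash analogy can be illustrated with a toy simulation: a single particle in a double-well potential, steered from one well to the other by an extra harmonic force. This is a minimal sketch in dimensionless reduced units, assuming a steered-MD-style harmonic bias; every name and parameter here is illustrative and none come from the Vermaas lab’s actual code.

```python
# Toy 1-D biased molecular dynamics sketch. Real simulations take ~2 fs
# timesteps, as Vermaas notes; here we use dimensionless "reduced" units
# so the example stays self-contained. All parameters are illustrative.

DT = 0.005       # integration timestep (reduced units)
DAMPING = 0.98   # crude per-step friction so the particle settles
K_BIAS = 10.0    # stiffness of the harmonic "leash"

def system_force(x):
    """Force from a double-well potential U(x) = (x^2 - 1)^2: two stable
    states at x = -1 and x = +1, separated by a barrier at x = 0."""
    return -4.0 * x * (x * x - 1.0)

def bias_force(x, target):
    """The 'leash': an extra harmonic force pulling x toward a moving
    target, so the barrier crossing happens in a practical number of steps."""
    return -K_BIAS * (x - target)

def step(x, v, f, target):
    """One velocity-Verlet step (mass = 1) with the bias added to the
    physical force and simple damping applied to the velocity."""
    v_half = v + 0.5 * DT * f
    x_new = x + DT * v_half
    f_new = system_force(x_new) + bias_force(x_new, target)
    v_new = (v_half + 0.5 * DT * f_new) * DAMPING
    return x_new, v_new, f_new

# Steer the particle from the left well (x = -1) over the barrier to the
# right well (x = +1) by slowly sweeping the bias target.
x, v = -1.0, 0.0
f = system_force(x) + bias_force(x, -1.0)
n_steps = 2000
for i in range(n_steps):
    target = -1.0 + 2.0 * (i + 1) / n_steps  # sweeps from -1 to +1
    x, v, f = step(x, v, f, target)
# The biased particle ends up near x = +1; an unbiased run of the same
# length would typically stay trapped in the left well.
```

Note the design point the article raises: in real MD engines the `system_force` part runs on the GPU, while bias terms like `bias_force` have traditionally run on the CPU, which is exactly where the CPU/GPU communication bottleneck Vermaas describes arises.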

“The key thing that the Grace Hopper nodes enable for us is to permit biased simulations to happen much faster,” said Vermaas. “While molecular dynamics has long used GPUs to get great performance, that performance was mostly limited to the ‘easy’ case where we were not adding a bias, since these biases are largely implemented on the CPU. CPU/GPU communication latency and bandwidth thus become a bottleneck when running biased simulations. This key bottleneck is substantially alleviated on Grace Hopper nodes since the CPU/GPU communication happens much faster.”

Angela Wilson, John A. Hannah Distinguished Professor of Chemistry, has also been an early adopter of these systems. Wilson’s group is engaged in areas including quantum mechanical and quantum dynamical method development, thermochemical and spectroscopic studies of small molecules, protein modeling and drug design, catalysis design, environmental challenges such as CO2 and PFAS, heavy element and transition metal chemistry, and mechanical properties of materials.

“We are very excited to have Grace Hopper nodes at MSU,” reported Wilson. “We believe that they will have a significant impact upon much of our research.”

Two system administrators install a new Grace Hopper node in the data center.

In addition to being the director of ICER, O’Shea is a computational astrophysicist and plasma physicist who uses ICER’s high-performance computing center (HPCC) for his research. 

“Solving the most pressing questions in my field requires the most powerful computational hardware we can access,” said O’Shea. “My research group is excited to experiment with this hardware because we think that the innovative approach to the system’s memory (in particular, its high memory bandwidth and the ability of the GPUs to access all of the memory on the node directly rather than using the CPU as an intermediary) is going to accelerate our plasma simulation codes tremendously.”

Preliminary experiments from O’Shea’s research group show a 1.7 to 1.8 times performance improvement over ICER’s NVIDIA A100 (“Ampere”) GPUs without any workflow tuning specific to the Grace Hopper nodes. Once they fine-tune their process for these nodes, they hope to see even larger gains.

Access to the Grace Hopper nodes is temporarily restricted while they are in beta testing. Soon, all ICER users will be able to experiment with their own workflows on the nodes. Stay tuned to our Topic of the Month page to learn more about accessing these nodes by April 2024.