The Language of a Genome, Decoded through High Performance Computing

Dr. Shin-Han Shiu, Associate Professor
Department of Plant Biology, Michigan State University

Dr. Shin-Han Shiu studies genetics and evolution. Using computer simulations and experiments with plants, he and his colleagues explore how genes express themselves. Under what conditions does a given gene turn on? They seek to understand the “regulatory logic” of genes. Dr. Shiu says the goal of this work, together with other related projects, is to learn “the language of a genome.” With 25,000 genes in a plant, the possible combinations run into the millions and billions. “The problem is non-trivial,” he says with a smile. Dr. Shiu uses the resources of MSU’s High Performance Computing Center (HPCC) to explore the possibilities.
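
To give a feel for the scale involved, the short Python sketch below counts how many distinct groups of genes can be drawn from 25,000. It assumes, purely for illustration, that a "combination" means an unordered pair or triple of genes; the article does not say how the lab actually enumerates its search space, and the function name is hypothetical.

```python
from math import comb

GENE_COUNT = 25_000  # gene count quoted in the article


def gene_combinations(group_size: int, gene_count: int = GENE_COUNT) -> int:
    """Return the number of unordered groups of `group_size` genes.

    Assumption: a "combination" is a subset of genes in which order
    does not matter. This is an illustration, not the lab's method.
    """
    return comb(gene_count, group_size)


if __name__ == "__main__":
    for size in (2, 3):
        print(f"groups of {size} genes: {gene_combinations(size):,}")
```

Under this assumption, pairs alone already number in the hundreds of millions, and each additional gene in a group multiplies the count by several thousand.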

A single typical simulation run would take six months to complete on a conventional PC. Dr. Shiu and colleagues submit up to 100 jobs at once to the HPCC, and they can complete 50,000 simulation jobs in three weeks.

Dr. Shiu says the HPCC was a major factor in attracting him to MSU in the first place. “The HPCC enables our lab to compete with researchers at some of the leading facilities in the United States and in Europe.” He came to MSU after doing research at the University of Chicago, where he received a fellowship from the National Institutes of Health.

“Genes form families, large and small,” explains Dr. Shiu; plants have some 10,000 to 15,000 such families. Genome size, the total amount of genetic material, varies greatly across plants and animals. The lab studies a plant known as thale cress (Arabidopsis thaliana), whose genome has 150 million bases. The human genome (Homo sapiens) has about 3 billion. By contrast, the genome of the marbled lungfish (Protopterus aethiopicus) has 132 billion bases. “The run time grows logarithmically, so we start out by studying simpler living organisms,” he explains.
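
The genome sizes quoted above can be compared directly. The sketch below simply divides the figures given in the article; the dictionary and function names are made up for this illustration, and the base counts are the article's rounded values rather than precise measurements.

```python
# Approximate genome sizes in bases, as quoted in the article.
GENOME_SIZES = {
    "thale cress": 150_000_000,
    "human": 3_000_000_000,
    "marbled lungfish": 132_000_000_000,
}


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times bigger one quoted genome is than another."""
    return GENOME_SIZES[larger] / GENOME_SIZES[smaller]


if __name__ == "__main__":
    for organism in ("human", "marbled lungfish"):
        ratio = size_ratio(organism, "thale cress")
        print(f"{organism}: {ratio:.0f}x the thale cress genome")
```

By these figures, the human genome is 20 times the size of the thale cress genome, and the marbled lungfish genome is 880 times its size.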

The simulations allow the researchers to predict how genes will express themselves. The lab takes the theoretical results and then performs experiments to test the theories. “Sometimes the experiments yield unexpected results, and then we have to do more simulations and come up with new theories,” he explains. This feedback between theory and experiment is a common thread among HPCC users.

The Shiu Lab (sometimes referred to as the Evolutionary Genomics Lab) is in the Plant Biology Building. Walk into the lab and you’ll see students tending trays of various plants under grow lights. The lab employs a wide range of people, from postdocs and visiting scholars to undergraduates and even high school students. Collaborators include MSU faculty in Computer Science and Engineering and in Statistics and Probability, as well as researchers at other universities. The lab receives funding from the National Science Foundation and from MSU. Visit Dr. Shiu’s website.

Dr. Shiu and his colleagues have recently published five papers that involve the use of HPCC resources in peer-reviewed journals related to genomics, and two papers in journals that feature creative uses of graphics to explain complex scientific concepts.

Dr. Shiu’s first computer was an Apple II. He remembers playing Space Invaders as a child. He learned the Java programming language while a PhD student at the University of Wisconsin. Now he and his colleagues rely on computing capacity thousands of times more powerful than the computers he encountered in his academic career. When he decided to join MSU’s faculty, he contemplated the capacity of the HPCC: “The idea of a computer with 512 processors and 512 gigabytes of memory just blew me away,” he says.