Genome-Wide Association Analyses
Dr. Ian Dworkin, an assistant professor in the Department of Zoology, studies the intersection of evolutionary genomics and phenomics, which involves massive amounts of DNA sequence data as well as high-dimensional phenotypic data. One model system in his lab is the wing shape of the fruit fly, Drosophila melanogaster, for which the lab has a geometric representation in 58 dimensions. With tens of thousands of individuals phenotyped, even a simple computational analysis would take days on an ordinary machine, but using HPCC speeds up the process considerably.
Dworkin and his lab use HPCC most intensively for genome-wide association analyses, mapping this 58-dimensional phenotypic data onto genome sequence data containing millions of genetic variants. They can parallelize these analyses, trading off the number of jobs against the run time of each job. However, because the data are highly multivariate, even relatively simple analyses can become untenable on off-the-shelf machines, so the lab relies on the large memory capacity available at the HPCC.
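To make the trade-off between number of jobs and run time concrete, here is a minimal Python sketch. It is not the lab's actual pipeline: the function names, the variant count, and the per-variant timing are all hypothetical, and it assumes each variant can be tested independently so the work divides evenly across jobs.

```python
import math


def plan_chunks(n_variants, n_jobs):
    """Split variant indices 0..n_variants-1 into n_jobs near-equal
    half-open ranges (start, end), one range per job."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    base, extra = divmod(n_variants, n_jobs)
    chunks = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs each take one additional variant.
        size = base + (1 if job < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def estimated_hours_per_job(n_variants, n_jobs, seconds_per_variant):
    """Rough wall-clock estimate for the largest chunk, assuming a
    constant (hypothetical) cost per variant and no scheduling overhead."""
    largest_chunk = math.ceil(n_variants / n_jobs)
    return largest_chunk * seconds_per_variant / 3600.0


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the trade-off.
    n_variants = 2_000_000
    seconds_per_variant = 0.5
    for n_jobs in (1, 10, 100, 1000):
        hours = estimated_hours_per_job(n_variants, n_jobs, seconds_per_variant)
        print(f"{n_jobs:>5} jobs -> about {hours:.2f} hours per job")
```

More jobs shorten each job's run time but do not reduce the total compute, and each job still needs enough memory to hold the full multivariate phenotype data alongside its slice of variants.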
In addition, Dworkin's lab uses HPCC to run evolution experiments with digital organisms using the Avida software package. By running Avida on the HPCC, they can evolve dozens or even hundreds of populations of digital organisms simultaneously, which allows them to perform powerful tests of predictions made by evolutionary theory. “We also analyze massive genomic datasets at HPCC, because its computing power allows us to perform intensive and complex analyses much more quickly than we could on our own machines,” says Dworkin.