Jumping Genes: Quantifying the Hidden World of Transposable Elements
Scott Teresi is a few days away from graduating with his Ph.D., but he has already left his mark on the field of bioinformatics with a novel tool that helps researchers study a feature of DNA colloquially known as “jumping genes.” His tool has the potential to improve our understanding of processes in any organism with DNA, from the domestication of plants to diseases like cancer. Teresi, a graduate researcher in the Department of Horticulture and the Genetics & Genome Sciences program at Michigan State University, created an open-source software tool, which he named TE Density, to fill a critical gap in the bioinformatics community.
These “jumping genes” are formally known as transposable elements (TEs, or transposons): repetitive DNA sequences that can move from one location to another within a genome. They exist primarily to make more copies of themselves, and unlike regular genes, they generally do nothing useful for their host genomes. However, their movements, which work through cut-and-paste or copy-and-paste mechanisms, play a significant role in creating genetic and genomic diversity.
TEs are most widely known for creating harmful mutations that can lead to diseases like cancer. However, in rare instances, TEs can create “good” or “useful” variation. For example, TEs are implicated in fruit color variation in strawberries, oranges, and grapes, a variety of stress and defense responses in other plants, and the domestication of corn. In mammals, TEs have contributed to the evolution of pregnancy and the evolution of complex gene regulatory networks.
Because TEs are so mutagenic, host genomes have mechanisms that suppress their replication and movement; however, these mechanisms are not perfect, and a small minority of TEs remain active, even in your genome. While most TEs are located far from genes, in regions of the chromosome that can be thought of as gene deserts, some lie near genes, and some even preferentially target gene-rich regions.
“If a TE happens to insert itself into a gene or regulatory sequence, it can drastically alter the normal expression patterns of one or more genes,” Teresi explains. “This would not be good!”
The location of a TE profoundly affects its capacity to create genetic variation, so understanding how TEs are distributed in a genome, particularly their positions relative to genes, is critical for assessing their impact. A better understanding of these patterns would have many downstream implications, from genome evolution at broad scales to applied fields such as cancer genetics, plant breeding, and gene therapy.
This was the impetus for Teresi to develop TE Density, which provides a reproducible and flexible way to quantify TE content near genes. By reporting TE presence relative to individual genes, rather than the traditional genome-wide estimates of TE content, Teresi’s tool offers insight into which genes might be affected by TE activity.
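To make the idea of gene-centric TE quantification concrete, here is a minimal Python sketch. It assumes that “density” means the fraction of base pairs in a window that are covered by TE annotations, computed separately upstream of, within, and downstream of a gene. The `Interval` and `Gene` types, the function names, and the 1-based inclusive coordinate convention are hypothetical illustrations, not TE Density’s actual interface, which also breaks density down by TE type and is built to handle whole genomes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A span on one chromosome, 1-based inclusive coordinates (hypothetical format)."""
    start: int
    stop: int

    def __len__(self) -> int:
        # Empty or inverted spans have length 0.
        return max(0, self.stop - self.start + 1)


@dataclass(frozen=True)
class Gene(Interval):
    strand: str = "+"


def merge(intervals: list[Interval]) -> list[Interval]:
    """Collapse overlapping or adjacent TEs so shared bases are counted once."""
    merged: list[Interval] = []
    for iv in sorted(intervals, key=lambda x: x.start):
        if merged and iv.start <= merged[-1].stop + 1:
            merged[-1] = Interval(merged[-1].start, max(merged[-1].stop, iv.stop))
        else:
            merged.append(iv)
    return merged


def window_density(window: Interval, merged_tes: list[Interval]) -> float:
    """Fraction of bases in `window` covered by at least one TE (0.0 for empty windows)."""
    if len(window) == 0:
        return 0.0
    covered = 0
    for te in merged_tes:
        overlap = Interval(max(window.start, te.start), min(window.stop, te.stop))
        covered += len(overlap)  # 0 when the TE lies outside the window
    return covered / len(window)


def te_density(gene: Gene, tes: list[Interval], window_size: int) -> dict[str, float]:
    """TE density upstream, inside, and downstream of one gene for one window size.

    Assumption: chromosome length is unknown, so right-hand windows are not
    clipped at the chromosome end; left-hand windows are clipped at position 1.
    """
    merged = merge(tes)
    left = Interval(max(1, gene.start - window_size), gene.start - 1)
    right = Interval(gene.stop + 1, gene.stop + window_size)
    # "Upstream" depends on strand: for minus-strand genes it lies to the right.
    upstream, downstream = (left, right) if gene.strand == "+" else (right, left)
    return {
        "upstream": window_density(upstream, merged),
        "intragenic": window_density(Interval(gene.start, gene.stop), merged),
        "downstream": window_density(downstream, merged),
    }


if __name__ == "__main__":
    gene = Gene(start=1001, stop=2000, strand="+")
    tes = [Interval(501, 800), Interval(700, 900), Interval(1501, 1600), Interval(2101, 2300)]
    for w in (500, 1000):
        print(w, te_density(gene, tes, w))
    # 500 {'upstream': 0.8, 'intragenic': 0.1, 'downstream': 0.4}
    # 1000 {'upstream': 0.4, 'intragenic': 0.1, 'downstream': 0.2}
```

Sweeping the window size, as in the loop above, is one simple way to ask at what distance from a gene TE content changes. A real tool would also need to clip windows at chromosome ends and avoid rescanning every TE for every gene, which matters for large genomes.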
“The beauty of TE Density is that it can be applied to any organism, from plants to animals to bacteria,” Teresi explains. “It brings a new level of precision to the study of TE-driven gene regulation and opens up a wealth of possibilities for future research.”
Teresi’s initial interest in TEs developed during his time as an undergraduate at the College of William and Mary working with Dr. Joshua Puzey, and through his experience in MSU’s Plant Genomics REU with Dr. Pat Edger.
“Previous research has shown a general trend: TE presence is negatively correlated with gene expression, but there remain a lot of unanswered questions,” Teresi says. “At what distance do we see an effect? Does this change for the various TE types? How are TEs distributed relative to genes? How can we identify TEs that are influencing gene expression?”
Building on these questions during his time in the Plant Genomics REU, Teresi developed prototype versions of the software, making extensive use of the High-Performance Computing Center (HPCC) at MSU’s Institute for Cyber-Enabled Research (ICER).
“At first, I developed the software to run on this one strawberry genome, which I used because the Edger Lab is broadly interested in the evolution of strawberries, and the genome was small and of high quality. But despite my best efforts, the software was not performant at all,” Teresi laughs. “It took almost a week to run; however, I did see some interesting results.”
Among those results, he found evidence of unusual TE associations near sugar genes. This caught his attention because TEs are generally located far from genes, and when they are close, they are associated with reduced gene expression. It raised an interesting question: why would sugar genes be associated with features that typically reduce gene expression? Teresi speculated that in rare cases, TEs can act as novel regulatory sequences that enhance gene expression. He noted that while this is quite infrequent, it does happen, and the exceptions to the rule are often the most intriguing.
“An alternative explanation for this finding is that TEs are indeed reducing sugar production in favor of energy expenditure towards other traits such as fruit size,” Teresi says.
As a graduate student in Pat Edger’s lab, Teresi refined the software and published it in the journal Mobile DNA.
“One of my main priorities was using better data structures and documenting the code, so that it can be run on larger genomes such as blueberry and corn,” Teresi says. “I also wanted to make it accessible to non-plant biologists.”
Teresi credits ICER’s compute and support resources as an essential part of his academic journey.
“If I didn’t have access to the HPCC and its wealth of resources, I am not sure I would have been able to build momentum on TE Density,” Teresi says. “ICER resources allowed me to experiment, develop a passion for coding, and see this project through. Along with providing the hardware I needed, the support resources were also instrumental. Nick Panchy, a research consultant at ICER, helped me generate test datasets and configure the software I needed.”
TEs can make up a significant portion of plant and animal genomes, with estimates ranging from around 5% to 85% depending on the species. For example, around 45% of the human genome consists of TEs, while in some plants, such as corn, TEs can constitute up to 85% of the genome. The amount of TE sequence in a genome varies greatly across the tree of life, and even between closely related species. Furthermore, TEs come in a tremendous diversity of types, each with its own mechanism of movement, size, structural characteristics, and impact. This sheer diversity, coupled with a lack of standardized tools and approaches for analyzing gene-centric TE variation, has limited researchers’ ability to compare TE content between genomes.
By creating TE Density, Teresi aims to fill some of the knowledge gaps that have long existed in this field. By giving researchers the ability to quantify the complexity and variability of TE events, the tool brings them one step closer to identifying applications of TEs for beneficial uses like gene therapy, environmental adaptation, and plant breeding.