XSEDE Science Successes

Comet Tree of Life

XSEDE Resources Help Grow a New ‘Tree of Life’

A new ‘tree of life’ depicting the evolution of life on Earth, including more than 1,000 new types of bacteria and Archaea lurking in the planet's nooks and crannies, was made possible with the help of supercomputing resources available through the National Science Foundation's eXtreme Science and Engineering Discovery Environment (XSEDE) program, as well as a phylogenetics "science gateway" available on those resources.

The new tree, published April 11 in the new journal Nature Microbiology and widely publicized throughout the general press, reinforces that the life we see around us – plants, animals, humans and other so-called eukaryotes – represents only a tiny percentage of the world's biodiversity.

"The tree of life is one of the most important organizing principles in biology," said Jill Banfield, a University of California Berkeley professor of earth and planetary science and of environmental science, policy, and management, and the study's principal investigator. "The new depiction will be of use not only to biologists who study microbial ecology, but also biochemists searching for novel genes and researchers studying evolution and earth history."

Researchers used the CIPRES (CyberInfrastructure for Phylogenetic RESearch) gateway, a web-based portal that allows researchers to explore evolutionary relationships between species.

"The CIPRES Science Gateway was critical to our work," said Laura Hug, who computed the trees at UC Berkeley and is now a biology faculty member at the University of Waterloo, Canada. "Previous attempts to infer the trees presented severe problems with run time, memory allocation and a lack of parallelized implementation of RAxML (Randomized Axelerated Maximum Likelihood, a popular program for phylogenetic analysis of large datasets). No run had successfully finished prior to our introduction to CIPRES."

Access to supercomputers was also a key part of completing this study, helping researchers investigate relationships by comparing DNA sequence information between species. This type of analysis is becoming more powerful as the number of available DNA sequences increases rapidly, with new, larger data sets requiring higher levels of computational power.
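To give a rough sense of what comparing DNA sequences between species involves, the sketch below computes a simple pairwise "p-distance" (the fraction of aligned positions that differ) for three short sequences. This is an illustration only, not the study's method: the researchers used RAxML, a maximum likelihood program, on far larger data sets. The species names, the sequences, and the `p_distance` helper are all made up for this example.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of comparable aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


# Hypothetical, made-up aligned sequences (not data from the study).
alignment = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTTC",
    "species_C": "TCGAACGGTC",
}

# Compare every pair of species once.
distances = {
    (s, t): p_distance(alignment[s], alignment[t])
    for s, t in combinations(alignment, 2)
}

for (s, t), d in distances.items():
    print(f"{s} vs {t}: {d:.2f}")
```

Even in this toy case the number of comparisons grows with the square of the number of sequences, which hints at why analyses spanning thousands of genomes call for supercomputing resources.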

In using the CIPRES gateway, the researchers relied on two other XSEDE resources: Gordon, the first high-performance supercomputer to use massive amounts of flash-based SSD (solid state drive) storage, and Comet, a petascale supercomputer designed to transform advanced scientific computing by expanding access and capacity among traditional as well as non-traditional research domains. The two jobs ran for a total of about five days, using 48 cores. Both Gordon and Comet, the result of two National Science Foundation grants, are housed at the San Diego Supercomputer Center at the University of California San Diego.

Charles Darwin first sketched a tree of life in 1837 as he sought ways to show how animals and plants are related to one another. The idea took root in the 19th century, with the tips of the twigs representing life on Earth today, while the branches connecting them to the trunk implied evolutionary relationships among these creatures.

Banfield and Hug, along with more than a dozen other researchers who have sequenced new microbial species, gathered 1,011 previously unpublished genomes to add to already known genome sequences of organisms representing the major groups of life on Earth. Their investigation, representing the total diversity among all sequenced genomes, produced a tree with branches dominated by bacteria, especially by uncultivated bacteria.

A second view of the tree grouped organisms by their evolutionary distance from one another rather than current taxonomic definitions, making clear that about one-third of all biodiversity comes from cultivated bacteria, one-third from uncultivated bacteria, and one-third from Archaea and Eukaryotes.

Added Hug: "I spent over a month attempting to conduct these jobs on other servers with no success – the jobs always failed prior to finishing. CIPRES was invaluable in troubleshooting our analyses."

"The CIPRES gateway allows scientists to conduct their research in significantly shorter times without having to understand how to operate supercomputers," said Mark Miller, principal investigator of the CIPRES gateway and an SDSC researcher.

Written by Robert Sanders (UC Berkeley) and Warren Froelich (SDSC)