XSEDE Science Successes
Science Gateway shares biomolecular data with nation's scientists and researchers
Published on April 25, 2017 by Faith Singer-Villalobos
Our bodies are made of biomolecules like proteins, nucleic acids, fats and sugars. These biomolecules are folded into specific 3D structures -- predetermined by the DNA and RNA sequences that build them -- which allows them to do everything they need to do in our bodies.
Biomolecules are frequently long and can bend in lots of different ways, creating an immense number of possible forms. For scientists trying to understand how a protein works, or how to design a biomolecule that accomplishes a specific action, the task of determining what it might look like in 3D is daunting.
To deal with this problem, scientists have developed computer algorithms that are clever enough to map out biomolecules' 3D forms, or create entirely new ones, based on their DNA or RNA sequence. However, doing so requires powerful supercomputers and specialized software that can take advantage of them.
One of the most widely used such programs is Rosetta. Originally developed as a structure prediction tool more than 17 years ago in the laboratory of David Baker at the University of Washington, Rosetta has been adapted to solve a wide range of common computational macromolecular problems. It has enabled notable scientific advances in computational biology, including protein design, enzyme design, ligand docking, and structure predictions for biological macromolecules and macromolecular complexes.
"The structure prediction problem is to take a sequence and ask, 'What does it look like?'" said Jeffrey Gray, a professor of Chemical and Biomolecular Engineering at Johns Hopkins University and a collaborator on the project. "The design problem asks 'What sequence would fold into this structure?' That's at the heart of Rosetta, but Rosetta does a lot of other things," Gray said.
Over the years, Rosetta evolved from a single tool, to a collection of tools, to a large collaboration called RosettaCommons, which includes more than 50 government laboratories, institutes, and research centers (only nonprofits).
THE ROSIE SCIENCE GATEWAY
Most recently, with support from the National Science Foundation (NSF), it has morphed once again into ROSIE: the Rosetta Online Server that Includes Everyone. ROSIE is an easy-to-use web interface (also known as a 'gateway') that provides access to the Rosetta software suite and encapsulates the body of rapidly evolving tools for the 3D structure prediction and high-resolution design of proteins, nucleic acids, and a growing number of non-natural polymers that were created by members of the RosettaCommons.
"The idea was to take this collaboration of 50 labs and institutions and make a single gateway," Gray said. "Rather than duplicating the work that everyone else was doing we agreed to work together. We decided to use NSF resources for the back end to provide the computational power. Now, it's easy to maintain 18 different web servers."
First described in PLOS One in May 2013, ROSIE continues to add new elements. In January 2017, a team of researchers, including Gray, reported in Nature Protocols on the latest additions to the gateway: antibody modeling and docking tools called RosettaAntibody and SnugDock that can run fully automated via the ROSIE web server or manually, with user control, on a personal computer or cluster.
Currently, the ROSIE gateway serves approximately 5,000 users and has run more than 30,000 jobs.
Some of the calculations enabled by ROSIE require 10 minutes of compute time; others require 200 computer processing hours. With several thousand users, the computing needs quickly add up.
"XSEDE [the Extreme Science and Engineering Discovery Environment] was a natural fit for a shared national resource that allows many different scientists to do science using large compute facilities," Gray said.
Initially funded by a five-year, $110-million grant from NSF, XSEDE is the most advanced, powerful, and robust collection of integrated advanced digital resources and services in the world. It is a single virtual system that scientists can use to interactively share computing resources, data, and expertise.
The Stampede supercomputer at the Texas Advanced Computing Center (TACC), one of the resources allocated through XSEDE, provides the lion's share of the computing power. Gray had used TACC resources as a graduate student in Texas in the late 1990s, so he knew about TACC and some of the other NSF supercomputing facilities.
"We've been using Stampede and applied for it through XSEDE," Gray said. "We have a Stampede allocation for my lab and we have a separate allocation for ROSIE."
Stampede serves as the back-end computing system for the thousands of researchers who use ROSIE. It has provided roughly two million compute hours for the project since 2013. Though scientists may not be aware that they are using a supercomputer, the project could not be as successful without a massive, on-demand supercomputer humming away in the background.
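For a rough sense of scale, the usage figures quoted above can be combined in a back-of-the-envelope calculation. This is only a sketch: it assumes the user count, the job count, and the compute hours all describe the same ROSIE workload over the same period, which the article does not state, and the variable names are illustrative.

```python
# Figures quoted in the article (all approximate).
users = 5_000              # "approximately 5,000 users"
jobs = 30_000              # "more than 30,000 jobs"
compute_hours = 2_000_000  # "roughly two million compute hours" since 2013

# Assumption: all three figures cover the same workload and time span.
hours_per_job = compute_hours / jobs
hours_per_user = compute_hours / users
jobs_per_user = jobs / users

print(f"Average compute hours per job:  {hours_per_job:.1f}")
print(f"Average compute hours per user: {hours_per_user:.0f}")
print(f"Average jobs per user:          {jobs_per_user:.0f}")
```

Under that assumption the average job costs on the order of tens of compute hours, which sits between the 10-minute and 200-hour extremes mentioned earlier.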
In Gray's own lab, he is exploring the structure and interactions of membrane proteins, which behave differently than many other types of proteins because they are in a bilayer of fatty lipids. How proteins interact and fold inside the cell membrane is an open question that his lab is trying to solve.
"The other big new thrust in the lab is glycoproteins," Gray said. "Most of the proteins in your body have sugars attached to them, which makes them glycoproteins. Traditionally, people ignored the glycans, but they are very important to cancer, heart disease, diabetes, aging, and infectious diseases. We're adding carbohydrates into the structure, and modeling their effects on protein folding and binding interactions using the Rosetta software and the Stampede supercomputer."
GETTING HELP FROM XSEDE'S EXPERTS
Beyond providing raw computing power to the nation's researchers, XSEDE also runs an Extended Collaboration Support Service (ECSS) program, which pairs researchers with cyberinfrastructure experts who have a variety of expertise. ECSS experts, many with advanced degrees in domain areas, are available for collaborations lasting months to a year to help researchers fundamentally advance their use of XSEDE resources.
"There were a couple of places that we needed ECSS's help," Gray said. "One was setting up the ROSIE science gateway. To run a gateway there are many security concerns: you have people logging in from different locations, and the computer cluster is a hacking target. To assuage this concern, the software engineer who developed ROSIE worked with TACC staff to make sure the gateway worked properly. That was very successful."
In addition, Gray and other researchers needed the ability to write their own code in Rosetta beyond simply running canned software. Thus, Gray also worked with ECSS to install PyRosetta, the Rosetta Python module created in Gray's lab.
"It's a Python interface to all of the Rosetta tools," Gray said. "It allows people to make their own customized scripts for tailored modeling."
PyRosetta is installed on Stampede as a module so that a scientist who is more of an expert can log into Stampede, load the module, and have access to all of the Rosetta code and functionality, allowing them to tailor their own scripts for their own particular molecules or designs that they are trying to calculate.
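As an illustration of the kind of customized script this makes possible, here is a minimal sketch, not the workflow used by Gray's lab or by ROSIE. It assumes PyRosetta is importable in the current Python environment; the helper names `clean_sequence` and `score_sequence` are hypothetical and exist only for this example.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes of the 20 standard amino acids


def clean_sequence(raw):
    """Strip whitespace, upper-case, and reject non-standard residue letters."""
    sequence = "".join(raw.split()).upper()
    unknown = sorted(set(sequence) - VALID_RESIDUES)
    if not sequence or unknown:
        raise ValueError(f"unsupported residues: {unknown or 'empty sequence'}")
    return sequence


def score_sequence(raw):
    """Build a pose from a sequence and return (residue count, Rosetta score)."""
    # Imported here so clean_sequence() still works where PyRosetta is absent.
    import pyrosetta

    pyrosetta.init("-mute all")  # start Rosetta with quiet logging
    pose = pyrosetta.pose_from_sequence(clean_sequence(raw))
    scorefxn = pyrosetta.get_fa_scorefxn()  # default full-atom score function
    return pose.total_residue(), scorefxn(pose)
```

On a system where PyRosetta is available, calling `score_sequence("ACDEFGHIK")` would return the residue count and the full-atom score of the unrefined starting model. A real study would follow this with sampling and refinement steps chosen for the molecule in question.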
"I'm hugely grateful for NSF, XSEDE and TACC for making these resources available," Gray said. "We spent so many years and had so many students put all of their research effort into making great tools to model and design biomolecules and you want other people to be able to use it. However, biomolecular prediction and design requires tremendous computing time, so having XSEDE there makes it possible for us to share our tools broadly and allow them to have impact across the scientific community."
As ROSIE and the community it supports continue to grow, so do its computing needs.
"There's a huge life sciences community out there that wants to perform structural predictions on their biomolecules, but we can't handle it all with the current demand on Stampede," Gray said.
For that reason, Gray is eagerly awaiting Stampede2, TACC's newest supercomputer, which is due to come online later in 2017, "so we have the capacity to handle the great demand for computing time."
Jeffrey Gray, professor of Chemical and Biomolecular Engineering at Johns Hopkins University