The National Science Foundation's eXtreme Digital (XD) program is making new infrastructure and next-generation digital services available to researchers and educators. They'll use that infrastructure to handle the huge volumes of digital information that are now a part of their work--the results of supercomputing simulations, the data generated by large scientific instruments such as telescopes, and the existing data that can be mined from a host of public sources.

Many of the supercomputers and high-end visualization and data analysis resources connected by the XSEDE project are supported by NSF's eXtreme Digital program.

Other projects that are part of the NSF eXtreme Digital program include:

  • The XD Technology Database, which is publicly available, allows XSEDE users and other technology providers to submit their tools for evaluation by the XSEDE team and suggest other technologies that would be of use to the XSEDE user community.

  • XSEDE Metrics on Demand offers tools to benchmark user satisfaction and resource use across the XSEDE project.

  • FutureSystems, a distributed, high-performance test-bed, allows scientists to collaboratively develop and test innovative approaches to parallel, grid, and cloud computing.

XSEDE also partners with organizations outside of the NSF eXtreme Digital program. These relationships improve the quality and diversity of the resources and services available to the open scientific research community through XSEDE. They also help expand the XSEDE community to new groups and research teams. These XSEDE partners include:

  • Open Science Grid brings together computing and storage resources from U.S. campuses and research communities into a common, shared-grid infrastructure. OSG will help provide the high-throughput resources that many research teams need.

  • PRACE supports a pan-European computing infrastructure and includes 19 member countries. By partnering, PRACE and XSEDE will provide the technical and administrative means for international scientific collaborations. The organizations also will work together on joint user support and training activities.


NSF Proposal Title Start Date End Date Abstract
Mainstreaming Volunteer Computing October 1, 2011 September 30, 2016
This award funds the continued operations and further development of Einstein@Home and its software infrastructure, the Berkeley Open Infrastructure for Network Computing (BOINC). Einstein@Home is one of the largest and most powerful computers on the planet. It searches astrophysical data for the weak signals from spinning neutron stars. Unlike a normal supercomputer, the computing power of Einstein@Home comes from ordinary home computers and laptops that have been "signed up" by about 300,000 members of the general public. When otherwise idle, these computers automatically download observational data over the Internet from Einstein@Home servers, search the data for the weak signals from spinning neutron stars, and return the results of the analysis to the servers.

Neutron stars are exotic objects: they represent the most compact form that a star can take before it collapses into a black hole. Since they were discovered in 1967, about two thousand neutron stars have been found (including several discovered in 2010 and 2011 by Einstein@Home). Neutron star observations provide a unique view into the behavior of matter at extreme pressures and densities, and into the nature of gravitation when gravity is very strong. Under certain circumstances, neutron stars can be emitters of pulsing radio waves (pulsars). Einstein@Home exploits the unique capabilities of the Arecibo Radio Observatory, the largest and most sensitive single-dish radio telescope in the world, to search for these signals. It is possible that neutron stars can also emit gravitational waves. Gravitational waves were first predicted by Einstein in 1916 but have never been directly detected. Einstein@Home can search the data from gravitational wave detectors such as those of the Laser Interferometer Gravitational-wave Observatory (LIGO) for these signals. Einstein@Home also supports the BOINC software infrastructure to benefit dozens of computationally intensive projects in other areas of science that also exploit volunteer distributed computing. And it is a remarkable tool for scientific outreach: Einstein@Home allows hundreds of thousands of ordinary citizens from around the world to participate in and make meaningful contributions to cutting-edge scientific research.
SI2-SSI: SciDaaS -- Scientific data management as a service for small/medium labs April 1, 2012 March 31, 2016
The SciDaaS project will develop and operate a suite of innovative research data management services for the NSF community. These services, to be accessible at, will allow research laboratories to outsource a range of time-consuming research data management functions, including storage and movement, publication, and metadata management. SciDaaS research will investigate what services are most needed by NSF researchers; how best to present these services to integrate with diverse research laboratory environments; and how these services are used in practice across different research communities.

SciDaaS will greatly reduce the cost to the individual researcher of acquiring and operating sophisticated scientific data management capabilities. In so doing, it has the potential to dramatically expand use of advanced information technology in NSF research and thus accelerate discovery across many fields of science and engineering. By providing a platform for researchers to publicly share data at an incremental cost, SciDaaS will also reduce barriers to free exchange among researchers and contribute to the democratization of science.
Collaborative Research: Integrated HPC Systems Usage and Performance of Resources Monitoring and Modeling (SUPReMM- SUNY Buffalo) 7/1/2012 6/30/2015
Today's high-performance computing systems are a complex combination of software, processors, memory, networks, and storage systems characterized by frequent disruptive technological advances. In this environment, system managers, users and sponsors find it difficult if not impossible to know if optimal performance of the infrastructure is being realized, or even if all subcomponents are functioning properly. Users of such systems are often engaged in science at the extreme where system uncertainties can significantly delay or even confound the scientific investigations. Critically, for systems based on open source software, which include a large fraction of XSEDE resources, the data and information necessary to use and manage these complex systems are not available. HPC centers and their users are to some extent flying blind, without a clear understanding of system behavior. Anomalous behavior has to be diagnosed and remedied with incomplete and sparse data. It is difficult for users to assess the effectiveness with which they are using the available resources to generate knowledge in their sciences. NSF lacks a comprehensive knowledge base to evaluate the effectiveness of its investments in HPC systems.
This award will address this problem through the creation of a comprehensive set of tools for developing the needed knowledge bases. This will be accomplished by building on and combining work on HPC systems monitoring and reporting currently underway at the University at Buffalo under the Technology Audit Service (TAS) of the XSEDE project and University of Texas/Texas Advanced Computing Center (TACC) as part of the Ranger Technology Insertion effort with many elements of existing monitoring and analysis tools. The PIs will provide the knowledge bases required to understand the current operations of XSEDE, to enhance and increase the productivity of all of the stakeholders of XSEDE (service providers, users and sponsors), and ultimately to provide open source tools to greatly increase the operational efficiency and productivity of HPC systems in general.
Collaborative Research: Integrated HPC Systems Usage and Performance of Resources Monitoring and Modeling (SUPReMM- UT-Austin) 7/1/2012 6/30/2015
Today's high-performance computing systems are a complex combination of software, processors, memory, networks, and storage systems characterized by frequent disruptive technological advances. In this environment, system managers, users and sponsors find it difficult if not impossible to know if optimal performance of the infrastructure is being realized, or even if all subcomponents are functioning properly. Users of such systems are often engaged in science at the extreme where system uncertainties can significantly delay or even confound the scientific investigations. Critically, for systems based on open source software, which include a large fraction of XSEDE resources, the data and information necessary to use and manage these complex systems are not available. HPC centers and their users are to some extent flying blind, without a clear understanding of system behavior. Anomalous behavior has to be diagnosed and remedied with incomplete and sparse data. It is difficult for users to assess the effectiveness with which they are using the available resources to generate knowledge in their sciences. NSF lacks a comprehensive knowledge base to evaluate the effectiveness of its investments in HPC systems.
This award will address this problem through the creation of a comprehensive set of tools for developing the needed knowledge bases. This will be accomplished by building on and combining work on HPC systems monitoring and reporting currently underway at the University at Buffalo under the Technology Audit Service (TAS) of the XSEDE project and University of Texas/Texas Advanced Computing Center (TACC) as part of the Ranger Technology Insertion effort with many elements of existing monitoring and analysis tools. The PIs will provide the knowledge bases required to understand the current operations of XSEDE, to enhance and increase the productivity of all of the stakeholders of XSEDE (service providers, users and sponsors), and ultimately to provide open source tools to greatly increase the operational efficiency and productivity of HPC systems in general.
SUPREMM tool July 1, 2012 June 30, 2015
Today's high-performance computing systems are a complex combination of software, processors, memory, networks, and storage systems characterized by frequent disruptive technological advances. In this environment, system managers, users and sponsors find it difficult if not impossible to know if optimal performance of the infrastructure is being realized, or even if all subcomponents are functioning properly. Users of such systems are often engaged in science at the extreme where system uncertainties can significantly delay or even confound the scientific investigations. Critically, for systems based on open source software, which include a large fraction of XSEDE resources, the data and information necessary to use and manage these complex systems are not available. HPC centers and their users are to some extent flying blind, without a clear understanding of system behavior. Anomalous behavior has to be diagnosed and remedied with incomplete and sparse data. It is difficult for users to assess the effectiveness with which they are using the available resources to generate knowledge in their sciences. NSF lacks a comprehensive knowledge base to evaluate the effectiveness of its investments in HPC systems.

This award will address this problem through the creation of a comprehensive set of tools for developing the needed knowledge bases. This will be accomplished by building on and combining work on HPC systems monitoring and reporting currently underway at the University at Buffalo under the Technology Audit Service (TAS) of the XSEDE project and University of Texas/Texas Advanced Computing Center (TACC) as part of the Ranger Technology Insertion effort with many elements of existing monitoring and analysis tools. The PIs will provide the knowledge bases required to understand the current operations of XSEDE, to enhance and increase the productivity of all of the stakeholders of XSEDE (service providers, users and sponsors), and ultimately to provide open source tools to greatly increase the operational efficiency and productivity of HPC systems in general.
Center for Trustworthy Scientific Cyberinfrastructure (CTSC) October 1, 2012 June 30, 2015
The Center for Trustworthy Scientific Cyberinfrastructure (CTSC) will transform and improve the practice of cybersecurity and hence trustworthiness of NSF scientific cyberinfrastructure. CTSC will provide readily available cybersecurity expertise and services, as well as leadership and coordination across a broad range of NSF scientific cyberinfrastructure projects via a series of engagements with NSF cyberinfrastructure projects and a broader ongoing education, outreach and training effort.

Intellectual Merit: CTSC will advance the state of cybersecurity practice across the community by analyzing gaps in cybersecurity technology to provide guidance to researchers and developers, addressing the application of software assessment to complicated cyberinfrastructure software stacks, and fostering broadly the transition of cybersecurity research to practice. Broader Impact: Scientific computing and confidence in its results rely on trustworthy cyberinfrastructure. The CTSC mission is to help provide the trustworthy cyberinfrastructure that science requires across the ecosystem of NSF. CTSC's work will impact science through dozens of cyberinfrastructure projects over the project's lifetime. Additionally, CTSC will perform workforce development in the area of cyberinfrastructure cybersecurity through EOT activities: training, undergraduate curriculum development, and student education.
Latin America-US Institute 2013: Methods in Computational Discovery for Multidimensional Problem Solving November 1, 2012 October 31, 2013
This Pan-American Advanced Studies Institutes (PASI) award, jointly supported by the NSF and the Department of Energy (DOE), will take place July 2013 at the Universidad del Valle in Guatemala. Organized by Dr. Marshall S. Poole, Professor in the Department of Communications at the University of Illinois, Urbana-Champaign, the PASI aims to introduce junior researchers to methods in computation-based discovery (CBD). In searching for solutions to major problems (e.g., biodiversity, modeling of natural systems, water ecology, and others), researchers across the natural and social sciences as well as the humanities and arts are generating massive and/or highly complex data sets that extend well beyond humans' capacities to perceive or analyze without sophisticated technological augmentation. CBD allows researchers to gather, transform, and analyze data from a range of sources, including, for example, sensors, video archives, telescopes, and supercomputers. Thus, access to advanced computational resources and also to sophisticated skills in data acquisition, management, transformation, visualization, analytics, and preservation is highly valued by researchers. For example, sophisticated visualization tools and techniques enhance human understanding of extreme, complex and/or abstract data sets, making it easier to see patterns and relationships and to form or test hypotheses.

The Institute will focus on CBD technical and analytical methods and help investigators apply these to their own research. Key goals are to (1) expand participants' knowledge of high performance computing (HPC) and specialized tools and techniques that support CBD involving massive or complex data sets; (2) provide hands-on experience in exploring large and complex data sets using easily accessible desktop open source tools; (3) bring researchers from underrepresented populations into the CBD field; and (4) foster new collegial partnerships that stimulate both national and international co-operative research among the presenters and attendees. In addition, the PASI will also provide up-to-date information on the deliberations of the PASI to a wider audience through a web page to disseminate results and reports of the meeting.
EAGER proposal: Toward a Distributed Knowledge Environment for Research into Cyberinfrastructure: Data, Tools, Measures, and Models for Multidimensional Innovation Network Analysis September 1, 2013 August 31, 2015
Although many virtual organizations (VO) are quite effective, not all VO practitioners are effective in each area, and there is no organized body of knowledge or set of "best practices" among VOs to draw upon for key issues. Therefore centers are likely not as effective as they could be. This proposal involves the creation of an online knowledge exchange. This Virtual Organization Resources and Toolkits Exchange (VORTEX) would provide leaders of virtual organizations with resources about running virtual organizations and access to relevant organizational scientists. VORTEX is intended to aid in building a community among virtual organization leaders so that they can collaborate, share, and learn with and from each other.

Specific Objectives of the work include development, evaluation, and improvement of an online Virtual Organization Resources and Toolkits Exchange (VORTEX) environment to aid scientists and engineers to more effectively lead virtual organizations. This type of environment is necessary in order to:

(1) Connect leaders of virtual organizations with appropriate organization scientists;

(2) Provide online educational and reference materials for issues associated with managing virtual organizations; and

(3) Establish a center for leaders of virtual organizations to share and collaborate with each other.
Multiscale Software for Quantum Simulations in Materials Design, Nano Science and Technology September 1, 2013 August 31, 2016
The emergence of petascale computing platforms brings unprecedented opportunities for transformational research through simulation. However, future breakthroughs will depend on the availability of high-end simulation software, which will fully utilize these unparalleled resources and provide the long-sought third avenue for scientific progress in key areas of national interest. This award will deliver a set of open source petascale quantum simulation tools in the broad areas of materials design, nano science and nanotechnology. Materials prediction and design are key aspects of the recently created Materials Genome initiative, which seeks to "deploy advanced materials at least twice as fast, at a fraction of the cost." Computational materials design is the critical aspect of that initiative, which relies on computation guiding experiments. The outcomes of the latter will in turn lead to follow-up computation in an iterative feedback loop. Nanoscience, which studies properties of materials and processes on the fundamental scale of nanometers, promises development of materials and systems with radically new properties. However, the nanoscale properties are hard to measure and even harder to predict theoretically. Only simulations that can fully account for the complexity and variability at that fundamental scale stand a chance of predicting and utilizing the macroscopic properties that emerge. This truly requires petascale resources and efficient petascale software tools.

This award will develop software tools built on the real-space multigrid (RMG) software suite and distribute them to the national user community. The RMG code already scales to 128,000 CPU cores and 18,000 GPU nodes. The award will further enhance RMG through development of new iterative methods with improved convergence, optimization of additional modules for existing and new petascale computing platforms, and creation of easy-to-use interfaces to the main codes. Workshops in RMG usage will be conducted at XSEDE workshops and other meetings of NSF supercomputing centers. RMG will be distributed through a web portal, which will also contain user forums and video tutorials, recorded at live user sessions. A library of representative examples for the main petascale platforms will be maintained. RMG will enable quantum simulations of unprecedented size, enabling studies of the building blocks of functional nano or bio-nano structures, which often involve thousands of atoms and must be described with the requisite fidelity. The development of petascale quantum simulation software and its user community will lead to cross-fertilization of ideas both within and across fields. Students and postdocs trained in this area will have significant opportunities for advancement and making a substantial impact on their own.
MRI: Acquisition of SuperMIC-- A Heterogeneous Computing Environment to Enable Transformation of Computational Research and Education in the State of Louisiana October 1, 2013 September 30, 2016
This is an award to acquire a compute cluster at LSU. The computer is a heterogeneous HPC cluster named SuperMIC containing both Intel Xeon Phi and NVIDIA Kepler K20X GPU (graphics processing unit) accelerators. The intent is to conduct research on programming such clusters while advancing projects that are dependent on HPC. The efforts range from modeling conditions that threaten coastal environments and testing mitigation techniques to simulating the motions of tumors/organs in cancer patients due to respiratory actions to aid radiotherapy planning and management. The burden of learning highly complex hybrid programming models presents an enormous software development crisis and demands a better solution. SuperMIC will serve as the development platform to extend current programming frameworks, such as Cactus, by incorporating GPU and Xeon Phi methods. Such frameworks allow users to move seamlessly from serial to multi-core to distributed parallel platforms without changing their applications, and yet achieve high performance. The SuperMIC project will include training and education at all levels, from a Beowulf boot camp for high school students to more than 20 annual LSU workshops and computational sciences distance learning courses for students at LONI (Louisiana Optical Network Initiative) and LA-SiGMA (Louisiana Alliance for Simulation-Guided Materials Applications) member institutions. These include Southern University, Xavier University, and Grambling State University - all historically black colleges and universities (HBCU) which have large underrepresented minority enrollments. The SuperMIC cluster will be used in the LSU and LA-SiGMA REU and RET programs. It will impact the national HPC community through resources committed to the NSF XSEDE program and the Southeastern Universities Research Association SURAgrid. SuperMIC will commit 40% of the usage of the machine to the XSEDE XRAC allocation committee.
Open Gateway Computing Environments Science Gateways Platform as a Service (OGCE SciGaP) October 1, 2013 September 30, 2018
Science Gateways are virtual environments that dramatically accelerate scientific discovery by enabling scientific communities to utilize distributed computational and data resources (that is, cyberinfrastructure). Successful Science Gateways provide access to sophisticated and powerful resources, while shielding their users from the resources' complexities. Given Science Gateways' demonstrated impact on progress in many scientific fields, it is important to remove barriers to the creation of new gateways and make it easier to sustain them. The Science Gateway Platform (SciGaP) project will create a set of hosted infrastructure services that can be easily adopted by gateway providers to build new gateways based on robust and reliable open source tools. The proposed work will transform the way Science Gateways are constructed by significantly lowering the development overhead for communities requiring access to cyberinfrastructure, and support the efficient utilization of shared resources.

SciGaP will transform access to large scale computing and data resources by reducing development time of new gateways and by accelerating scientific research for communities in need of access to large-scale resources. SciGaP's adherence to open community and open governance principles of the Apache Software Foundation will assure open source software access and open operation of its services. This will give all project stakeholders a voice in the software and will clear the proprietary fog that surrounds cyberinfrastructure services. The benefits of SciGaP services are not restricted to scientific fields, but can be used to accelerate progress in any field of endeavor that is limited by access to computational resources. SciGaP services will be usable by a community of any size, whether it is an individual, a lab group, a department, an institution, or an international community. SciGaP will help train a new generation of cyberinfrastructure developers in open source development, providing these early career developers with the ability to make publicly documented contributions to gateway software and to bridge the gap between academic and non-academic development.
Sustaining Globus Toolkit for the NSF Community (Sustain-GT) October 1, 2013 September 30, 2018
Science and engineering depend increasingly on the ability to collaborate and federate resources across distances. This observation holds whether a single investigator is accessing a remote computer, a small team is analyzing data from an engineering experiment, or an international collaboration is involved in a multi-decade project such as the Large Hadron Collider (LHC). Any distributed collaboration and resource federation system requires methods for authentication and authorization, data movement, and remote computation. Of the many solutions that have been proposed to these problems, the Globus Toolkit (GT) has proven the most persistently applicable across multiple fields, geographies, and project scales. GT resource gateway services and client libraries are used by tens of thousands of people every day to perform literally tens of millions of tasks at thousands of sites, enabling discovery across essentially every science and engineering discipline supported by the NSF. As new, innovative techniques and technologies for collaboration and scientific workflows are developed, and as new computing and instrument resources are added to the national cyberinfrastructure, these technologies and other improvements must be added and integrated into GT so that it can continue to provide an advanced and robust technology for solving scientific research problems.

The Sustain-GT project builds on past success to ensure that GT resource gateway services will continue to meet the challenges faced by NSF science and engineering communities. These challenges include: multiple-orders-of-magnitude increases in the volume of data generated, stored, and transmitted; much bigger computer systems and correspondingly larger and more complex computations; much faster networks; many more researchers, educators, and students engaged in data-intensive and computational research; and rapidly evolving commodity Web and Cloud computing environments. With the help of a new User Requirements Board, Sustain-GT will respond to community demands to evolve the GT resource gateway services with superior functionality, scalability, availability, reliability, and manageability. Sustain-GT will also provide the NSF community with high quality support and rapid-response bug fix services, as is required to sustain a heavily used, production system like GT.
CC-NIE Integration: Developing Applications with Networking Capabilities via End-to-End SDN (DANCES) January 1, 2014 December 31, 2015
The DANCES project team of network engineers, application developers, and research scientists is implementing a software-defined networking (SDN)-enabled end-to-end environment to optimize support for scientific data transfer. DANCES accomplishes this optimization by integrating high-performance computing job scheduling and the network control capabilities offered by SDN with data movement applications in an end-to-end network infrastructure. This integration provides access to control mechanisms for managing network bandwidth. The control of network resources enabled by SDN enhances application stability, predictability and performance, thereby improving overall network utilization. The motivation for the DANCES project is to apply the advantages of advanced network services to the problem of congested metropolitan and campus networks. DANCES uses XSEDENet across Internet2 in conjunction with OpenFlow-enabled network switches installed at the collaborating sites as the end-to-end hardware and software substrate.

Knowledge gained through DANCES is being disseminated through educational programs offered by the participating institutions and at existing community workshops, meetings, and conferences. The insights and experience obtained through DANCES will promote a better understanding of the technical requirements for supporting end-to-end SDN across wide area and campus cyberinfrastructure. The resulting SDN-enabled applications will make the request and configuration of high bandwidth connections easily accessible to end users and improve network performance and predictability for supporting a wide range of applications.
A Large-Scale, Community-Driven Experimental Environment for Cloud Research October 1, 2014 September 30, 2017
A persistent problem facing academic cloud research is the lack of infrastructure and data to perform experimental research: large-scale hardware is needed to investigate the scalability of cloud infrastructure and applications, heterogeneous hardware is needed to investigate algorithmic and implementation tradeoffs, fully-configurable software environments are needed to investigate the performance of virtualization techniques and the differences between cloud software stacks, and data about how clouds are used is needed to evaluate virtual machine scheduling and data placement algorithms.

The Chameleon project will address these needs by providing a large-scale, fully configurable experimental testbed driven by the needs of the cloud research and education communities. The testbed, and the ecosystem associated with it, will enable researchers to explore a range of cloud research challenges, from large scale to small scale, including exploring low-level problems in hardware architecture, systems research, network configuration, and software design, or at higher levels of abstraction looking at cloud scheduling, cloud platforms, and cloud applications.

Chameleon will significantly enhance the ability of the computing research community to understand the behavior of Internet scale cloud systems, and to develop new software, ideas and algorithms for the cloud environment. As the tremendous shift to cloud as the primary means of providing computing infrastructure continues, a large-scale testbed tailored to researchers' needs is essential to the continued relevance of a large fraction of computing research.

The project is led by the University of Chicago and includes partners from the Texas Advanced Computing Center (TACC), Northwestern University, the Ohio State University, and the University of Texas at San Antonio, comprising a highly qualified and experienced team, with research leaders from the cloud and networking world blended with providers of production quality cyberinfrastructure. The team includes members from the NSF-supported FutureGrid project and from the GENI community, both forerunners of the NSFCloud solicitation under which this project is funded.

The Chameleon testbed will be deployed at the University of Chicago (UC) and the Texas Advanced Computing Center (TACC) and will consist of 650 multi-core cloud nodes, 5 PB of total disk space, and leverage a 100 Gbps connection between the sites. While a large part of the testbed will consist of homogeneous hardware to support large-scale experiments, a portion of it will support heterogeneous units allowing experimentation with high-memory, large-disk, low-power, GPU, and co-processor units. The project will also leverage existing FutureGrid hardware at UC and TACC in its first year to provide a transition period for the existing FutureGrid community of experimental users.

To support a broad range of experiments, with requirements ranging from a high degree of control to ease of use, the project will support a graduated configuration system allowing full user configurability of the stack, from provisioning of bare metal and network interconnects to delivery of fully functioning cloud environments. In addition, to facilitate experiments, Chameleon will support a set of services designed to meet researchers' needs, including support for experimental management, reproducibility, and repositories of trace and workload data of production cloud workloads.

To facilitate the latter, the project will form a set of partnerships with commercial as well as academic clouds, such as Rackspace and Open Science Data Cloud (OSDC). It will also partner with other testbeds, notably GENI and INRIA's Grid5000 testbed, and reach out to the user community to shape the policy and direction of the testbed.

The Chameleon project will bring a new dimension and scale of resources to the CS community who wish to educate their students about design, implementation, operation and applications of cloud computing, a critical skillset for future computing professionals. It will enhance the understanding and application of experimental methodology in computer science and generate new educational materials and resources, with the participation of, and for, Minority Serving Institution (MSI) students.
MRI: Acquisition of a National CyberGIS Facility for Computing- and Data-Intensive Geospatial Research and Education October 1, 2014 September 30, 2017
Collaborative, interactive, and scalable knowledge discovery, in the form of processing and visualizing massive amounts of complex geospatial data and performing associated analysis and simulation, has become essential to fulfilling the important role of the emerging and vibrant interdisciplinary field of CyberGIS -- geographic information science and systems (GIS) based on advanced cyberinfrastructure -- in enabling computing- and data-intensive research and education across a broad swath of academic disciplines with significant societal impacts.

This project supports these activities by establishing the CyberGIS Facility as an innovative instrument equipped with capabilities that include high-performance data access with large disk storage, cutting-edge computing configured with advanced graphics processing units, and visualization supported with fast network and dynamically provisioned cloud computing resources. The CyberGIS Facility represents a groundbreaking advance in the broad context of advanced cyberinfrastructure and geospatial sciences and technologies. The Facility enables researchers to solve a diverse set of major and complex scientific problems (e.g., climate and weather predictions, emergency management, and environmental and energy sustainability) in multidisciplinary, bio, engineering, geo, and social sciences that would otherwise be impossible or difficult to tackle. Extensive advances in various education and training efforts (e.g., new courses, cross-disciplinary curricula, and online learning materials) help to produce a next-generation workforce for fostering CyberGIS-enabled discoveries and innovations. Facility users represent a wide range of disciplines and conduct leading-edge research sponsored by various agencies and organizations (e.g., NSF, Environmental Protection Agency, National Institutes of Health, National Aeronautics and Space Administration, and U.S. Geological Survey), which highlight the impact that this project has in enabling broad and significant scientific advances.
Acquisition of an Extreme GPU cluster for Interdisciplinary Research October 1, 2014 September 30, 2017
Stanford University requests $3,500,000 over 36 months to acquire an extreme GPU HPC cluster, called X-GPU, comprising 54 compute nodes built using the Cray Hydra technology with FDR InfiniBand. Each node has Intel Haswell 12-core processors; 8 NVIDIA Kepler cards; 128 GB of DDR4 memory; a 120 GB SSD; and two 1 TB hard drives. X-GPU is an energy-efficient computational facility providing almost a petaflop of computational power. It will be used by 1) at least 25 research groups representing more than 100 students and postdoctoral researchers at Stanford across 15 departments and 4 schools, 2) at least 8 collaborators from at least 7 other institutions across the nation, and 3) as many as hundreds of national researchers through the NSF-sponsored XSEDE allocation system. The PIs plan to offer 25% of X-GPU to XSEDE to offset the impact of the planned retirement of Keeneland, the current XSEDE resource providing heterogeneous parallel computing with CPUs and GPUs to the national community.
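
As a rough illustration of the scale described above, the per-node figures can be multiplied out to cluster-wide totals. The sketch below uses only the node count, per-node hardware, and 25% XSEDE share stated in the abstract; the names are illustrative, and it assumes every node is identically configured. It does not attempt to derive the "almost a petaflop" figure, since per-GPU performance is not given.

```python
# Figures taken from the abstract; all names here are illustrative.
NODES = 54
GPUS_PER_NODE = 8
MEMORY_GB_PER_NODE = 128
SSD_GB_PER_NODE = 120
HDD_TB_PER_NODE = 2 * 1   # two 1 TB hard drives per node
XSEDE_SHARE = 0.25        # fraction of X-GPU offered to XSEDE


def cluster_totals():
    """Aggregate per-node hardware into cluster-wide totals.

    Assumes all 54 nodes share the same configuration.
    """
    total_gpus = NODES * GPUS_PER_NODE
    return {
        "gpus": total_gpus,
        "memory_gb": NODES * MEMORY_GB_PER_NODE,
        "ssd_gb": NODES * SSD_GB_PER_NODE,
        "hdd_tb": NODES * HDD_TB_PER_NODE,
        # The XSEDE share expressed as an equivalent number of GPUs.
        "xsede_gpu_equivalent": total_gpus * XSEDE_SHARE,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the cluster holds 432 GPUs in total, and the 25% share offered to XSEDE corresponds to the capacity of 108 of them.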

Identified scientific outcomes enabled by this instrument include, but are not limited to, astrophysics and cosmology, bioinformatics and biology, materials modeling, and climate modeling. The researchers have already invested significant efforts to develop modeling and simulation codes that can demonstrate high performance on GPU-accelerated clusters. The PIs plan to develop software infrastructure and educational materials to help the national community in the transition to fine-grained parallel thinking and algorithm design, which is critical to effectively use this novel high-performance, low-cost, energy-efficient architecture.
The Centrality of Advanced Digitally-ENabled Science: CADENS October 1, 2014 September 30, 2017
Computational data science is at a turning point in its
history. Never before has there been such a challenge to meet the growing demands of digital computing, to fund infrastructure and attract diverse, trained personnel to the field. The methods and technologies that define this evolving field are central to modern science. In fact, advanced methods of computational and data-enabled discovery have become so pervasive that they are referred to as paradigm shifts in the conduct of science. A goal of this Project is to increase digital science literacy and raise awareness about the Centrality of Advanced Digitally ENabled Science (CADENS) in the discovery process. Digitally enabled scientific investigations often result in a treasure trove of data used for analysis. This project leverages these valuable resources to generate insightful visualizations that provide the core of a series of science education outreach programs targeted to the broad public, educational and professional communities. From the deep well of discoveries generated at the frontiers of advanced digitally enabled scientific investigation, this project will produce and disseminate a body of data visualizations and scalable media products that demonstrate advanced scientific methods. In the process, these outreach programs will give audiences a whole new look at the world around them. The project calls for the production and evaluation of two principal initiatives. The first initiative, HR (high-resolution) Science, centers on the production and distribution of three ultra-high-resolution digital films to be premiered at giant screen full-dome theaters; these programs will be scaled for wide distribution to smaller theaters and include supplemental educator guides. The second initiative, Virtual Universe, includes a series of nine high-definition (HD) documentary programs. Both initiatives will produce and feature data visualizations and the CADENS narratives to support an integrated set of digital media products. 
The packaged outreach programs will be promoted and made available to millions through established global distribution channels. Expanding access to data visualization is an essential component of the Project. Through a call for participation (CFP), the Project provides new opportunities for researchers to work with the project team and technical staff for the purpose of creating and broadly distributing large-scale data visualizations in various formats and resolutions. The project will feature these compelling, informative visualizations in the outreach programs described above. A Science Advisory Committee will participate in the CFP science selections and advise the Project team. The project calls for an independent Program Evaluation and Assessment Plan (PEAP) to iteratively review visualizations and the outreach programs that will target broad, diverse audiences. The project launches an expansive outreach effort to increase digital science literacy and to convey forefront scientific research while expanding researchers' access to data visualization. The project leverages and integrates disparate visualization efforts to create a new optimized large-scale workflow for high-resolution museum displays and broad public venues. The PEAP evaluations will measure progress toward project goals and will reveal new information about visualization's effectiveness to move a field forward and to develop effective outreach models. The project specifically targets broad audiences in places where they seek high-quality encounters with science: at museums, universities, K-16 schools, and the web. This distribution effort includes creating and widely disseminating the project outreach programs and supplemental educator guides. The project visualizations, program components, HD documentaries, educational and evaluation materials will be promoted, distributed and made freely available for academic, educational and promotional use.
Dissemination strategies include proactively distributing to rural portable theaters, 4K television, professional associations, educators, decision-makers, and conferences. To help address the critical challenge of attracting women and underrepresented minorities to STEM fields, the Project will support a Broadening Participation in Visualization workshop and will leverage successful XSEDE/Blue Waters mechanisms to recruit under-represented faculty and students at minority-serving and majority-serving institutions and to disseminate the Project programs and materials among diverse institutions and communities.
CloudLab: Flexible Scientific Infrastructure to Support Fundamental Advances in Cloud Architectures and Applications October 1, 2014 September 30, 2017
Many of the ideas that drive modern cloud computing, such as server virtualization, network slicing, and robust distributed storage, arose from the research community. But because today's clouds have particular, non-malleable implementations of these ideas "baked in," they are unsuitable as facilities in which to conduct research on future cloud architectures. This project creates CloudLab, a facility that will enable fundamental advances in cloud architecture. CloudLab will not be a cloud; CloudLab will be large-scale, distributed scientific infrastructure on top of which many different clouds can be built. It will support thousands of researchers and run hundreds of different, experimental clouds simultaneously. The Phase I CloudLab deployment will provide data centers at Clemson (with Dell equipment), Utah (HP), and Wisconsin (Cisco), with each industrial partner collaborating to explore next-generation ideas for cloud architectures.

CloudLab will be a place where researchers can try out ideas using any cloud software stack they can imagine. It will accomplish this by running at a layer below cloud infrastructure: it will provide isolated, bare-metal access to a set of resources that researchers can use to bring up their own clouds. These clouds may run instances of today's popular stacks, modest modifications to them, or something entirely new. CloudLab will not be tied to any particular cloud stack, and will support experimentation on multiple stacks in parallel.

The impact of cloud computing outside the field of computer science has been substantial: it has enabled a new generation of applications and services with direct impacts on society at large. CloudLab is positioned to have an immediate and substantial impact on the research community by providing access to the resources it needs to shape the future of clouds. Cloud architecture research, enabled by CloudLab, will empower a new generation of applications and services which will bring direct benefit to the public in areas of national priority such as medicine, smart grids, and natural disaster early warning and response.
RUI: CAREER Organizational Capacity and Capacity Building for Cyberinfrastructure Diffusion August 1, 2015 August 31, 2020
The vision behind advanced cyberinfrastructure (CI) is that its development, acquisition, and provision will transform science and engineering in the 21st century. However, CI diffusion is full of challenges, because the adoption of the material objects also requires the adoption of a set of related behavioral practices and philosophical ideologies. Most critically, CI-enabled virtual organizations (VOs) often lack the full range of organizational capacity to effectively integrate and support the complex web of objects, practices, and ideologies as a holistic innovation.

This project examines the various manifestations of CI-related objects, practices, and ideologies, and the ways they support CI implementation in scientific VOs. Using grounded theory analysis of interviews and factor analysis of survey data, this project will develop and validate a robust framework/measure of organizational capacity for CI diffusion. The project's empirical focus will be the NSF-funded Extreme Science and Engineering Discovery Environment (XSEDE), a nationwide network of distributed high-performance computing resources. Interviews and surveys will solicit input from domain scientists, computational technologists, and supercomputer center administrators (across e-science projects, institutions, and disciplines) who have experience with adopting and using CI tools within the XSEDE ecosystem. The project will generate a series of capacity building strategies to help VOs increase the organizational capacity necessary to fully adopt CI. Findings will help NSF and other federal agencies to improve existing and future CI investments. This project may also have implications for open-source and commercial technologies that harness big data for complex simulations, modeling, and visualization analysis.
MRI Collaborative Consortium: Acquisition of a Shared Supercomputer by the Rocky Mountain Advanced Computing Consortium September 1, 2015 August 31, 2018
A cluster supercomputer is deployed by the University of Colorado Boulder (CU-Boulder) and Colorado State University (CSU) for the Rocky Mountain Advanced Computing Consortium (RMACC). This high-performance computing (HPC) system supports multiple research groups across the Rocky Mountain region in fields including astrophysics, bioinformatics, chemistry, computational fluid dynamics, earth system science, life science, material science, physics, and social sciences with advanced computing capabilities. It also provides a platform to investigate and address the impact of many-core processors on the applications that support research in these fields.

The system integrates nodes populated with Intel's conventional multicore Xeon processors and Many-Integrated-Core (MIC) 'Knights Landing' Phi processors interconnected by Intel's new Omni-Path networking technology. Users of the new HPC system have access to existing data management services including data storage, data sharing, metadata consulting, and data publishing, leveraging the NSF-funded high-performance networking infrastructure and long term storage system, as well as additional cyberinfrastructure, at CU-Boulder and CSU. The many-core feature of this HPC system enhances graduate and undergraduate students' education and training as they develop, deploy, test, and run optimized applications for next generation many-core architectures. Training for researchers and students is provided through workshops appropriate for introducing diverse audiences to the efficient and effective use of HPC systems, the challenges of vectorization for single core performance, shared memory parallelism, and issues of data management. Additionally, advanced workshops on large-scale distributed computing, high-throughput computing, and data-intensive computing are offered during the year and at the annual RMACC student-centric HPC Symposium. The Symposium brings together hundreds of students, researchers, and professionals from universities, national laboratories and industry to exchange ideas and best practices in all areas of cyberinfrastructure. For-credit HPC classes will be delivered for online participation, educating the next generation of computational scientists in state-of-the-art computational techniques.
EarthCube RCN: Collaborative Research: Research Coordination Network for High-Performance Distributed Computing in the Polar Sciences September 1, 2015 August 31, 2017
One of the major current challenges with polar cyberinfrastructure is managing and fully exploiting the volume of high-resolution commercial imagery now being collected over the polar regions. This data can be used to understand the changes in polar regions due to climate change and other processes. The potential global socio-economic costs of these impacts make it an urgent priority to better understand polar systems. Understanding the mechanisms that underlie polar climate change and the links between polar and global climate systems requires a combination of field data, high-resolution observations from satellites, airborne imagery, and computer model outputs. Computational approaches have the potential to support faster and more fine-grained integration and analysis of these and other data types, thus increasing the efficiency of analyzing and understanding the complex processes. This project will support advances in computing tools and techniques that will enable the Polar Sciences Community to address significant challenges, in both the short and long term.

The impact of this project will be in the improvements in the ability to utilize advanced cyberinfrastructure and high-performance distributed computing to fundamentally alter the scale, sophistication and scope of polar science problems that will be addressed. This project will not implement those changes but will identify and lay the groundwork for such impact across the Polar Sciences. The Project personnel will identify primary barriers to the uptake of high-performance and distributed computing and will help alleviate them through a combination of community-based solutions and training. The project will also produce a roadmap detailing a credible and effective way to meet the long-term computing challenges faced by the Polar Science community and possible plans to effectively address them. This project will establish mechanisms for community engagement, which include gathering technical requirements for polar cyberinfrastructure and supporting and training early career scientists and graduate students.
Fostering Successful Innovative Large-Scale, Distributed Science and Engineering Projects through Integrated Collaboration September 1, 2015 August 31, 2016
Large-scale, innovative science and engineering requires collaboration across geographically-distributed, multidisciplinary teams; however, it is very difficult for projects to maintain the intellectual cohesion, tight coordination, and integration necessary to manage scalable "virtual" work that is distributed.

The goal of this project is to help teams succeed by maximizing efficiency, effectiveness, and innovativeness. This proposal is to develop the capacity for leaders of Centers, Institutes, Labs and other collaborations to plan and pursue transformative research agendas in order to truly create breakthroughs in smart and connected health, cyberphysical systems, smart cities, cybersecurity, big data, environmental sustainability, and across the domains of basic research. Training in design and management of such collaborations will be provided and tools and techniques will be developed. This work will draw on lessons learned from organization science to develop a customized curriculum to help large-scale science and engineering teams effectively and efficiently collaborate at this scale. Using the developed materials, the first of a series of workshops targeting potential principal investigators interested in large-scale Computer and Information Science and Engineering (CISE)-related projects will be conducted in the spring of 2016.
BD Hubs: Midwest: "SEEDCorn: Sustainable Enabling Environment for Data Collaboration," in response to the NSF Big Data Regional Innovation Hubs (BD Hubs): Accelerating the Big Data Innovation Ecosystem (NSF 15-562) solicitation October 1, 2015 September 30, 2018
Catalyzed by the NSF Big Data Hub program, the Universities of Illinois, Indiana, Michigan, North Dakota, and Iowa State University have created a flexible regional Midwest Big Data Hub (MBDH), with a network of diverse and committed regional supporting partners (including colleges, universities, and libraries; non-profit organizations; industry; city, state and federal government organizations who bring data projects from multiple private, public, and government sources and funding agencies). The NSF-funded SEEDCorn project will be the foundational project to energize the activities of MBDH, leveraging partner activities and resources, coordinating existing projects, initiating 20-30 new public-private partnerships, sharing best practices and data policies, starting pilots, and helping to acquire funding. The result of SEEDCorn will be a sustainable hub of Big Data activities across the region and across the nation that enable research communities to better tackle complex science, engineering, and societal challenges, that support competitiveness of US industry, and that enable decision makers to make more informed decisions on topics ranging from public policy to economic development.

The MBDH is focusing on specific strengths and themes of importance to the Midwest across three sectors: Society (including smart cities and communities, network science, business analytics), Natural & Built World (including food, energy, water, digital agriculture, transportation, advanced manufacturing), and Healthcare and Biomedical Research (which spans patient care to genomics). Integrative "rings" connect all spokes and will be organized around themes of specific MBDH strengths, including (a) Data Science, where computational and statistical approaches can be developed and integrated with domain knowledge and societal considerations that support the underlying needs of "data to knowledge," (b) services, infrastructure, and tools needed to collect, store, link, serve, and analyze complex data collections, to support pilot projects, and ultimately provide production-level data services across the hub, and (c) educational activities needed to advance the knowledge base and train a new generation of data science-enabled specialists and a more general workforce in the practice and use of data science and services.
"Secure Data Architecture: Shared Intelligence Platform for Protecting our National Cyberinfrastructure," in response to the NSF Cybersecurity Innovation for Cyberinfrastructure (NSF 15-549) solicitation December 1, 2015 November 30, 2018
This research is expected to significantly enhance the security of campus and research networks. It addresses the emerging security challenge of open, unrestricted access to campus research networks, but beyond that it lays the foundation for an evolvable intelligence sharing network with the very real potential for national-scale analysis of that intelligence. Further, it will supply cybersecurity researchers with a rich real-world intelligence source upon which to test their theories, tools, and techniques. The research will produce a new kind of virtual security appliance that will significantly enhance the security posture of open science networks so that advanced high-performance network-based research can be carried out free of performance lags induced by more traditional security controls.

This research will integrate prior research results, expertise and security products from both the National Science Foundation and the Department of Energy to advance the security infrastructure available for open science networks, also known as Science DMZs. Further, the effort will actively promote sharing of intelligence among Science DMZ participants as well as with national academic computational resources and organizations that wish to participate. Beyond meeting the security needs of campus-based DMZs, the effort will lay the foundation for an intelligence sharing infrastructure that will provide a significant benefit to the cybersecurity research community, making possible the collection, annotation, and open distribution of national-scale security intelligence to help test and validate ongoing security research.
CILogon 2.0 project, in response to the NSF Cybersecurity Innovation for Cyberinfrastructure (NSF 15-549) solicitation January 1, 2016 December 31, 2018
When scientists work together, they use web sites and other
software to share their ideas and data. To ensure the integrity of their work, these systems require the scientists to log in and verify that they are part of the team working on a particular science problem. Too often, the identity and access verification process is a stumbling block for the scientists. Scientific research projects are forced to invest time and effort into developing and supporting Identity and Access Management (IdAM) services, distracting them from the core goals of their research collaboration. The "CILogon 2.0" project provides an IdAM platform that enables scientists to work together to meet their IdAM needs more effectively so they can allocate more time and effort to their core mission of scientific research. To ensure that the project makes a real contribution to scientific collaborations, the researchers have partnered with the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration, the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) Physics Frontiers Center, and the Data Observation Network for Earth (DataONE). The project also provides training and outreach to additional scientific collaborations, and the project supports integration with the Extreme Science and Engineering Discovery Environment (XSEDE), which provides a national-scale cyberinfrastructure for scientific research in the US.

Prior to the "CILogon 2.0" project, the CILogon and COmanage projects separately developed platforms for federated identity management and collaborative organization management. Federated identity management enables researchers to use their home organization identities to access cyberinfrastructure, rather than requiring yet another username and password to log on. Collaborative organization management enables research projects to define user groups for authorization to collaboration platforms (e.g., wikis, mailing lists, and domain applications). The "CILogon 2.0" project integrates and expands on the existing CILogon and COmanage software to provide an integrated Identity and Access Management (IdAM) platform for cyberinfrastructure. This IdAM platform serves the unique needs of research collaborations, namely the need to dynamically form collaboration groups across organizations and countries, sharing access to data, instruments, compute clusters, and other resources to enable scientific discovery. The project provides a software-as-a-service platform to ease integration with cyberinfrastructure, while making all software components publicly available under open source licenses to enable re-use.
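
The two ideas combined above, federated identities and collaboration-managed groups, can be pictured with a small data-structure sketch. This is not the CILogon or COmanage API; every class, method, and value below is hypothetical and exists only to illustrate how a home-organization identity might be mapped to group-based authorization.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FederatedIdentity:
    """An identity asserted by a researcher's home organization (hypothetical model)."""
    home_organization: str  # the identity provider, e.g. a university
    subject: str            # the user identifier that organization asserts


@dataclass
class Collaboration:
    """A research collaboration that manages its own groups (hypothetical model)."""
    name: str
    groups: dict = field(default_factory=dict)           # group name -> set of identities
    resource_groups: dict = field(default_factory=dict)  # resource -> set of allowed group names

    def add_member(self, group, identity):
        """Place an identity in a collaboration-defined group."""
        self.groups.setdefault(group, set()).add(identity)

    def grant(self, resource, group):
        """Allow every member of a group to use a resource."""
        self.resource_groups.setdefault(resource, set()).add(group)

    def is_authorized(self, identity, resource):
        """True if the identity belongs to any group granted access to the resource."""
        allowed = self.resource_groups.get(resource, set())
        return any(identity in self.groups.get(g, set()) for g in allowed)


if __name__ == "__main__":
    collab = Collaboration("example-collaboration")
    alice = FederatedIdentity("university-a.example", "alice")
    bob = FederatedIdentity("institute-b.example", "bob")
    collab.add_member("analysts", alice)
    collab.grant("wiki", "analysts")
    print(collab.is_authorized(alice, "wiki"))
    print(collab.is_authorized(bob, "wiki"))
```

The point of the sketch is that no per-resource username or password appears anywhere: access follows from the home-organization identity plus membership in a group the collaboration itself defines.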
DIBBs: Merging Science and Cyberinfrastructure Pathways: The Whole Tale March 1, 2016 February 28, 2021
Scholarly publications today are still mostly
disconnected from the underlying data and code used to produce the published results and findings, despite an increasing recognition of the need to share all aspects of the research process. As data become more open and transportable, a second layer of research output has emerged, linking research publications to the associated data, possibly along with its provenance. This trend is rapidly followed by a new third layer: communicating the process of inquiry itself by sharing a complete computational narrative that links method descriptions with executable code and data, thereby introducing a new era of reproducible science and accelerated knowledge discovery. In the Whole Tale (WT) project, all of these components are linked and accessible from scholarly publications. The third layer is broad, encompassing numerous research communities through science pathways (e.g., in astronomy, life and earth sciences, materials science, social science), and deep, using interconnected cyberinfrastructure pathways and shared technologies.

The goal of this project is to strengthen the second layer of research output, and to build a robust third layer that integrates all parts of the story, conveying the holistic experience of reproducible scientific inquiry by (1) exposing existing cyberinfrastructure through popular frontends, e.g., digital notebooks (IPython, Jupyter), traditional scripting environments, and workflow systems; (2) developing the necessary 'software glue' for seamless access to different backend capabilities, including from DataNet federations and Data Infrastructure Building Blocks (DIBBs) projects; and (3) enhancing the complete data-to-publication lifecycle by empowering scientists to create computational narratives in their usual programming environments, enhanced with new capabilities from the underlying cyberinfrastructure (e.g., identity management, advanced data access and provenance APIs, and Digital Object Identifier-based data publications). The technologies and interfaces will be developed and stress-tested using a diverse set of data types, technical frameworks, and early adopters across a range of science domains.
Associated Universities, Inc. (AUI) and the National Radio Astronomy Observatory (NRAO) April 1, 2016 September 30, 2026
To enable a world-wide multi-user community to realize research and education programs of the highest caliber, Associated Universities, Inc. (AUI) presents a strategic vision over the next decade to manage, operate, optimize and disseminate results from the world-leading capabilities of the National Radio Astronomy Observatory (NRAO). With the successful construction of the Atacama Large Millimeter/submillimeter Array (ALMA), and the recent enhancement of the Karl G. Jansky Very Large Array (VLA), two new forefront facilities are moving into routine operation with ever-increasing scientific capability. Taken together, these iconic arrays fulfill a major milestone in modern astronomy, encompassing more than an order-of-magnitude leap in observational capabilities for astronomical sources at frequencies between 1 gigahertz and 1 terahertz.

As prioritized by multiple National Research Council Decadal Surveys in Astronomy and Astrophysics, NRAO facilities are tools for the entire scientific community that will empower discoveries across all fields of astrophysics. ALMA enables transformational research into the physics of the cold Universe, regions that are optically dark but shine brightly in the millimeter/submillimeter portion of the electromagnetic spectrum. Within the broad range of science accessible with ALMA, the top-level objectives include imaging the redshifted dust continuum and molecular line emission from evolving galaxies as early as a redshift of z~10 (500 million years after the Big Bang), determining the chemical composition and dynamics of star-forming gas in normal galaxies like the Milky Way but at z~3 (75% of the way across the Universe), and measuring the gas kinematics in young disks in nearby star-forming clouds. ALMA has already demonstrated its revolutionary impact with its dramatic images of planet, star and galaxy formation. These results will accelerate as the full array becomes operational, and with the longest baselines ALMA will achieve an angular resolution of tens of milliarcseconds. ALMA provides one to two orders-of-magnitude improvement over previous facilities in all areas of millimeter- and submillimeter-wave observations, including sensitivity, angular resolution and image fidelity.
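
The link between baseline length and angular resolution can be checked with a back-of-the-envelope estimate. The sketch below uses the standard approximation that an interferometer resolves angles of about wavelength divided by baseline, ignoring order-unity factors. The 16 km baseline and 300 GHz observing frequency are assumptions chosen for illustration; neither figure is given in the abstract.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
# Milliarcseconds per radian: degrees -> arcseconds -> milliarcseconds.
MAS_PER_RADIAN = math.degrees(1.0) * 3600.0 * 1000.0


def wavelength_m(frequency_hz):
    """Wavelength in meters for a given frequency in hertz."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz


def resolution_mas(frequency_hz, baseline_m):
    """Approximate angular resolution (theta ~ lambda / B) in milliarcseconds."""
    return wavelength_m(frequency_hz) / baseline_m * MAS_PER_RADIAN


if __name__ == "__main__":
    # The 1 GHz to 1 THz range quoted for the two arrays, as wavelengths.
    print(f"1 GHz -> {wavelength_m(1e9) * 100:.1f} cm")
    print(f"1 THz -> {wavelength_m(1e12) * 1000:.2f} mm")
    # Assumed values: 300 GHz observing frequency, 16 km baseline.
    print(f"resolution ~ {resolution_mas(300e9, 16_000.0):.1f} mas")
```

With these assumed inputs the estimate comes out at roughly 13 milliarcseconds, in line with the "tens of milliarcseconds" quoted above; higher frequencies or longer baselines would give finer resolution.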

Likewise, at centimeter wavelengths, the broadband VLA has ushered in a new era in radio astronomy, with groundbreaking results published in areas ranging from Galactic proto-stellar clouds to images of the molecular gas in the earliest galaxies. The enhanced VLA is opening new scientific frontiers and explicitly addressing four primary science themes: measuring the strength and topology of cosmic magnetic fields; imaging young stars and massive black holes in dust-enshrouded environments; following the rapid evolution of energetic phenomena; and studying the formation and evolution of stars, galaxies and active galactic nuclei. Improvement over previous performance is up to a factor of 10 in continuum sensitivity and coarsest frequency resolution, and a factor of 1000 or more in finest frequency resolution and the number of frequency channels.

In collaboration with NSF's international partners in ALMA, NRAO will transition ALMA from the current phase of commissioning and early science to full science operations. Already the most capable millimeter/submillimeter facility on the planet, in the next few years ALMA will realize significant new capabilities, further increasing ALMA's scientific productivity. The ALMA Development Program, a key component in the plan for the coming decade, will solicit and support community input and expertise in upgrading ALMA's capabilities throughout its useful lifetime.

Under AUI management, NRAO will implement a staged VLA infrastructure maintenance and development plan to renew and support operation of the VLA beyond the end of the next decade, followed by community-based planning and technical development for the next-generation centimeter-wave facilities. AUI will expand the NRAO Central Development Laboratory (CDL) mission to enhance NSF's existing radio astronomy facilities, to develop technology and expertise needed to build the next generation of radio astronomy instruments, and to benefit the broader economy via technology transfer. In collaboration with the university community, the CDL will support development for both ALMA and VLA and conduct leading-edge, creative research in both core and exploratory technologies that will continue to be vital to the NRAO mission in the coming decade.

With plans for enhanced user support services and new data manipulation and visualization tools, AUI envisions expanding the NRAO user base beyond traditional radio astronomers and enabling multi-wavelength science by researchers and students. AUI will also ensure that NSF's investment in NRAO achieves the broadest possible impact in cutting-edge research and technical innovation, training the next generation of researchers, and inspiring students and the public.

Building on an existing framework of diversity activities, AUI will conduct ambitious programs to transform the participation of underrepresented groups in science and engineering. An Office of Diversity Initiatives will lead programs, including the National Astronomy Consortium and Physics Inspiring the Next Generation, to empower under-represented students to obtain graduate degrees in STEM fields. The enhanced International/National Exchange Program and Chilean Women Graduate Internships will support international student research in radio astronomy. A key AUI objective for the NRAO workforce in the coming decade will be to move toward achieving parity with the nation's demographics for women and people of color.

AUI embraces an integrative approach to education and public outreach (EPO), closely aligned with NRAO research. The EPO plan builds on a comprehensive suite of programs, targeting learners of all ages, broad geographic regions, and traditionally under-represented groups, and incorporates federal STEM education initiatives that identify evidence-based best practices. NRAO will support graduate and undergraduate research, and Jansky postdoctoral fellows will carry out investigations independently, or in collaboration with staff and/or university collaborators, thus building professional relationships between NRAO and academic research groups.

As part of AUI's management and oversight of NRAO, AUI will regularly review the technical, financial, and administrative functioning of NRAO as well as AUI's own governance and business practices. Key Performance Indicators and both qualitative and quantitative assessments will inform NRAO activities and AUI policies to achieve optimal management and operation of NRAO.
Quantum Mechanical Modeling of Major Mantle Materials 8/1/2014 7/31/2017
Geophysics is currently undergoing a transformation with the integration of three distinct modeling fields: computational mineral physics, geodynamics, and seismic tomography. Cyberinfrastructure is enabling a leap in computational capability and is helping to produce huge amounts of data on mineral properties very quickly. Advances in seismic imaging of the Earth's deep interior are providing structural information about convective and thermal patterns in the Earth's mantle. Several fascinating structures holding keys to the nature of the deep Earth are currently being mapped in detail. They are being interpreted within geodynamically consistent scenarios that include detailed properties of Earth forming minerals. Computational mineral physics, a field that evolved from the materials simulation revolution of the late eighties and nineties, helps to integrate these fields by contributing data on realistic mineral properties at extreme conditions of Earth's interior. This project focuses on the synergy between mineral physics and geodynamics. This research is establishing a new modus operandi in geophysics research, a trans-disciplinary dialog, and a global-scale modeling field that starts at the atomic scale. The emergence of this modeling phenomenon illustrates what could become typical in other scientific modeling fields, e.g., atmospheric and ocean science, astrophysics, materials processing, biological systems, etc.
This project will continue a productive line of inquiry in the area of computational mineral physics led by this team of researchers. The ultimate goal of the study is to provide information on mineral properties that is needed to interpret seismic tomography and bolster advanced and more refined geodynamics simulations. Computational mineral physics, in particular, has contributed greatly to the integration of these fields. Results from these types of modeling efforts complement experiments by expanding the pressure and temperature range in which properties can be obtained, and they offer access to atomic-scale phenomena that are sometimes suggestive of new interpretations of experimental and seismological data. This project focuses on strengthening the synergy between computational mineral physics and geodynamics. Sophisticated state-of-the-art quantum mechanical simulations of minerals address key properties of Earth's solid mantle needed to improve the realism of geodynamics simulations. Thermal expansion, thermal conductivity, specific heat, and thermodynamic phase boundaries in mineral aggregates, all from low temperatures (~ 0 K) to near melting temperatures, can now be obtained reliably by means of high-throughput calculations distributed in the Extreme Science and Engineering Discovery Environment (XSEDE). These results are to be integrated directly in simulations to investigate Earth's current state and evolution.
Molecular Sciences Software Institute (MolSSI) that you are proposing in response to the NSF Scientific Software Innovation Institutes (S2I2, NSF 15-553) solicitation 8/1/2016 7/31/2021
The Molecular Sciences Software Institute (MolSSI) will become a focus of scientific research, education and scientific collaboration for the worldwide community of computational molecular scientists. The MolSSI aims to reach these goals by engaging the computational molecular science community in multiple ways to remove barriers between innovations that often occur in small single-researcher groups and the implementation of these ideas in software that is used in the production of science by the entire community. Thus, great ideas will not languish in the "just get the science right" mode, but be incorporated into usable software for the wider community to enable bigger and better molecular science. The MolSSI will catalyze significant advances in software infrastructure, education, standards, and best practices. These advances are critical because they are needed to address the next set of grand challenges in molecular science. Activities catalyzed by the Institute will improve the interoperability of the software used by the community, make it easier to use this software on the varied and heterogeneous computing architectures that currently exist, enable greater scalability of existing and emerging theoretical models, and substantially improve the training of molecular-science students in software design and engineering. Through the range of outreach efforts by its multiple institutions, the MolSSI will engage the community to increase the diversity of its workforce by more effectively attracting and retaining students and faculty from underrepresented groups. All of these endeavors will result in fundamentally and dramatically improved molecular science software and its usage, which will reduce or eliminate the current delays - often by years - in the practical realization of theoretical innovations.
Ultimately, the Institute will enable computational scientists to more easily navigate future disruptive transitions in computing technology, and most importantly, tackle problems that are orders of magnitude larger and more complex than those currently within their grasp and to realize new, more ambitious scientific objectives. This will accelerate the translation of basic science into new technologies essential to the vitality of the economy and environment, and to compete globally with Europe, Japan, and other countries that are making aggressive investments in advanced cyber-infrastructure.
The MolSSI aims to reach these goals by engaging the computational molecular science community in multiple ways to remove barriers between innovations that often occur in small single-principal-investigator groups and the implementation of these ideas in software that is used in the production of science by the entire community. The MolSSI will create a sustainable Molecular Sciences Consortium that will develop use cases and standards for code and data sharing across the software ecosystem and become a focus of scientific research, education and scientific collaboration for the worldwide community of computational molecular scientists. The Institute will create an interdisciplinary team of Software Scientists who will help develop software frameworks, interact with community code developers, collaborate with partners in cyber-infrastructure, form mutually productive coalitions with industry, government labs, and international efforts, and ultimately serve as future experts and leaders. In addition, the Institute will support and mentor a cohort of Software Fellows actively developing code infrastructure in research groups across the U.S., and, in turn, they will engage in MolSSI outreach and education activities within the larger molecular science community. Through a range of multi-institutional outreach efforts, the Institute will engage the community to increase the diversity of its workforce by more effectively attracting and retaining students and faculty from underrepresented groups. The Institute will educate the next generation of software developers by providing workshops, summer schools, on-line forums, and a Professional Master's program in molecular simulation and software engineering. MolSSI will be guided by an internal Board of Directors and an external Science and Software Advisory Board, both composed of leaders in the field, who will work together with the Software Scientists and Fellows to establish the key software priorities.
MolSSI will be sustained by a mix of labor contributed by the community, revenue from education programs and license revenues. In summary, the MolSSI's ultimate impact will be in the translation of basic science into future technological advances essential to the economy, environment, and human health.
Science Gateways Software Institute for NSF Scientific Software Innovation Institutes (S2I2, NSF 15-553) solicitation 8/1/2016 7/31/2021
Science gateways are user-friendly web portals that make advanced computing, data, networking and scientific instrumentation accessible and easily usable by scientists at all levels, including students, thereby revolutionizing how research and education are done in science. For example, scientists are conducting biomedical studies through Galaxy, a science gateway for data-intensive biomedical research, as well as engaging citizens in investigating lion density using Snapshot Serengeti, a science gateway for citizen science. By being easily accessible via the Web, science gateways expand and democratize access to supercomputers, telescopes, sensor networks, unique data collections, collaborative spaces that enable the multidisciplinary collaborations needed to solve complex problems, and analysis capabilities. Thus, science gateways expand and broaden participation in science - an important goal of the National Science Foundation (NSF). By increasing participation, science gateways increase the NSF's return on investment in advanced technologies and facilities. The Science Gateways Community Institute (SGCI) will speed the development and application of robust, cost-effective, sustainable gateways to address the needs of scientists and engineers across the sciences. The work of the institute will increase the number as well as the effectiveness and usability of gateways to science and engineering. This will result in broader gateway use and more widespread conduct of science by everyone from professionals to citizen scientists, thus directly amplifying the impact of the SGCI. Further, and very importantly, the Institute's community engagement and exchange activities will, over time, increase the audience for its services, and its partnerships with minority professional organizations will ensure involvement in training and workforce development from underrepresented groups.
Science gateways are user-friendly web portals that make advanced computing, data, networking and scientific instrumentation accessible and easily usable by scientists at all levels, including students, thereby revolutionizing how research and education are done in science. Gateways enable scientists to test their assumptions more quickly, providing them more time for deeper thinking about the types of problems that have yet to be solved. In this way, gateways become "research amplifiers". They also enable synthetic science - by using modelling and simulation tools powered by high-performance computing - across ecosystems, geographic distances, methodologies, and disciplines. However, despite the presence of gateways for many years, development of these environments is often done with ad-hoc processes, limiting success, resource efficiency, and long-term impact. Developers of gateways are often unaware that others have solved similar challenges before, and do not know where to turn for advice or expertise. Thus, projects waste money and time re-implementing the more basic functions rather than building the value-added features for their unique audience. Many gateway efforts fail. Some fail early by not understanding how to build communities of users; others fail later by not developing plans for sustainability. The Science Gateways Community Institute (SGCI) has been designed to address the above limitations while providing career paths for gateway developers and for students. The five-component design of the SGCI is the result of several years of studies, including many focus groups and a 5,000-person survey of the research community. Its Incubator component will provide shared expertise in business and sustainability planning, cybersecurity, user interface design, and software engineering practices.
The Extended Developer Support component will provide expert developers for up to one year to projects that request assistance and demonstrate the potential to achieve impacts on their research communities. The Scientific Software Collaborative component will offer a component-based, open-source, extensible framework for gateway design, integration, and services, including gateway hosting and capabilities for external developers to integrate their own software into Institute offerings. The Community Engagement and Exchange component will provide a forum for communication and shared experiences among gateway developers, within NSF, across federal agencies, and internationally. Finally, with its training programs the Workforce Development component will increase the pipeline of gateway developers, with special emphasis on recruiting underrepresented minorities and on helping universities form gateway support groups. In short, the work of the institute will increase the number, ease of use, and effective application of gateways to science and engineering, resulting in broader gateway use and more widespread conduct of science by everyone from professionals to citizen scientists. The Institute's community engagement and exchange activities over time will increase the audience for its services, and its partnerships with minority professional organizations will ensure involvement in training and workforce development from underrepresented groups.
CC* Networking Infrastructure: Building HPRNet (High-Performance Research Network) for advancement of data intensive research and collaboration 3/1/2017 2/28/2019
The sizes of scientific datasets are growing exponentially across all scientific disciplines due to several factors such as improved scientific instrumentation, social media and decreasing costs of storage. To extract real value from these geographically distant datasets, researchers need access to them at high speeds, which is typically not possible with traditional campus networks. The University of Illinois at Chicago (UIC) is building "HPRNet", a high-performance research network providing last-mile connectivity for over 31 research projects. HPRNet not only improves ongoing research productivity, but also sets the stage for future innovations and collaborations. UIC is a public university and minority serving institution (MSI) in the heart of the Chicago area, where HPRNet significantly impacts the research training of underrepresented groups. The project team is working with other NSF and institutionally funded minority training programs on campus to ensure access to HPRNet resources.
For HPRNet's deployment, 13 locations are identified at UIC where 10 to 40 Gigabit uplinks to regional, national and international R&E networks are established. HPRNet builds on the Science DMZ model that works in concert with the current campus research network (CRN) and a special data storage system known as Data Transfer Node (DTN) to deliver high-performance and reliable network paths for data-intensive applications, including high-volume bulk data transfer, remote experiment and/or instrumentation control, cloud computing, data-mining and advanced visualization.
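To give a rough sense of what 10 to 40 Gigabit uplinks mean for bulk data transfer, the following sketch estimates idealized transfer times. The function name, the 1 TB dataset size, and the `efficiency` parameter are illustrative assumptions and do not come from the project description; real throughput also depends on protocol overhead, storage speed, and path congestion.

```python
def transfer_seconds(size_terabytes: float, link_gbps: float,
                     efficiency: float = 1.0) -> float:
    """Idealized time to move a dataset over a link.

    size_terabytes: dataset size in decimal terabytes (1 TB = 1e12 bytes).
    link_gbps: nominal link rate in gigabits per second.
    efficiency: assumed fraction of the nominal rate actually achieved
                (hypothetical knob; 1.0 means a perfect link).
    """
    bits = size_terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    # Compare the two uplink speeds mentioned in the text for a 1 TB dataset.
    for gbps in (10, 40):
        minutes = transfer_seconds(1.0, gbps) / 60
        print(f"1 TB over {gbps} Gbps: about {minutes:.1f} minutes")
```

Under these assumptions, a 1 TB dataset takes about 13 minutes at 10 Gbps and a little over 3 minutes at 40 Gbps.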
Collaborative Research: SI2-SSI: Adding Volunteer Computing to the Research Cyberinfrastructure 8/1/2016 7/31/2017
The aggregate computing power of consumer devices - desktop and laptop computers, tablets, smartphones - far exceeds that of institutional computing resources. "Volunteer computing" uses these consumer devices, volunteered by their owners, to do scientific computing. In addition to providing additional, much-needed computational resources to scientists, volunteer computing publicizes scientific research and engages citizens in science. BOINC is the primary software system for volunteer computing. It was developed at UC Berkeley with NSF support starting in 2002. Until now, BOINC has been based on a model of independent competing projects. Scientists set up their own BOINC servers, port their applications to run on BOINC, and publicize their projects to attract volunteers. There are about 40 such projects, in many areas of science: examples include Einstein@home, CERN, and SETI@home (astrophysics), Rosetta@home and (biomedicine), (climate study), and IBM World Community Grid (multiple applications). Together these projects have about 400,000 active volunteers and 12 PetaFLOPS of computing throughput. This model, while successful to an extent, has reached a limit. The number of projects and volunteers has stagnated. Volunteer computing is supplying lots of computing power, but only to a few research projects. For other scientists, there are two major barriers. First, creating a BOINC project has significant overhead: learning a new technology, creating a public web site, generating publicity, and so on. Second, volunteer computing is risky and uncertain; there is no guarantee that a new project will attract volunteers. This project aims to break these barriers, and to make volunteer computing available to all scientists doing high-throughput computing, by replacing the competing-projects model with a new "central broker" model. The new model has two related parts: 1) the integration of BOINC with existing high-throughput computing facilities such as supercomputing centers and science portals.
Jobs currently run on cluster nodes will be transparently offloaded to volunteer computers. Scientists using these facilities will see faster turnaround times; they'll benefit from volunteer computing without even knowing it's there. 2) The project will change the volunteer interface so that participants sign up for scientific areas and goals rather than for particular projects. For example, a participant might sign up to contribute to cancer research. A central broker, to be developed as part of this project, would dynamically assign their computing resources to projects doing that type of research. This project mobilizes public support for and interest in scientific research by encouraging "volunteer computing" and engaging citizens in the conduct of the research itself. It simultaneously advances NSF's mission to advance science while broadening citizen engagement.
The first year of this project will prototype each of these parts, and will integrate BOINC with TACC and nanoHub. Integrating BOINC with existing HTC systems involves several subtasks: 1) Job routing: modifying existing job processing systems used by TACC and nanoHub (Launcher and Rappture respectively) to decide when a group of jobs should be offloaded to BOINC. This decision might involve the estimated runtime of the jobs, input and output file sizes, data sensitivity, the deadline or priority of the jobs, and the identity of the job submitter. 2) Job format conversion: mapping job descriptions (input/output file specifications, resource and timing requirements) to their BOINC equivalents. 3) Application packaging: adapting existing applications (such as nanoHub's simulation tools and TACC's Autodock) to run under BOINC. We will use BOINC's virtual machine facility, which packages an application as a virtual machine image (VirtualBox or Docker) and a program to be run within the VM. This allows existing Linux applications to run on consumer desktop platforms such as Windows and Mac, as well as providing a strong security sandbox and an efficient application-independent checkpoint/restart mechanism. 4) File handling: moving input and output files between existing storage systems (typically inaccessible from outside firewalls) to Internet-visible servers. This will use existing BOINC components that manage files based on hashes to eliminate duplicate transfer and storage of files. 5) Job monitoring and control: adapting existing web- or command-line based tools for monitoring the progress of batches of jobs, and for aborting jobs, to work with BOINC. This will use existing Web RPCs provided by BOINC for these purposes. This project will carry out these tasks by designing and implementing new software as needed, testing for correctness, performance, and scalability, and deploying it in a production environment. 
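The job-routing subtask above can be illustrated with a minimal sketch. This is not the project's implementation: the `JobBatch` fields mirror the decision factors listed in the text, while the threshold values, the opt-in flag standing in for "the identity of the job submitter", and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobBatch:
    est_runtime_hours: float    # estimated runtime of the jobs
    input_mb: float             # total input file size
    output_mb: float            # total output file size
    sensitive: bool             # data sensitivity
    hours_to_deadline: float    # deadline of the batch
    submitter_opted_in: bool    # stand-in for submitter identity/policy


# Hypothetical thresholds, chosen only for illustration.
MAX_TRANSFER_MB = 500.0     # volunteers often have slow home connections
MIN_RUNTIME_HOURS = 0.25    # very short jobs are not worth the overhead
DEADLINE_SLACK = 4.0        # required ratio of deadline to runtime


def should_offload(batch: JobBatch) -> bool:
    """Decide whether a batch is a candidate for volunteer computing."""
    if batch.sensitive or not batch.submitter_opted_in:
        return False
    if batch.input_mb + batch.output_mb > MAX_TRANSFER_MB:
        return False
    if batch.est_runtime_hours < MIN_RUNTIME_HOURS:
        return False
    # Volunteer hosts are unreliable, so require generous deadline slack.
    return batch.hours_to_deadline >= DEADLINE_SLACK * batch.est_runtime_hours


if __name__ == "__main__":
    batch = JobBatch(2.0, 50.0, 10.0, False, 24.0, True)
    print("offload to volunteers" if should_offload(batch) else "run on cluster")
```

A rule-based filter like this keeps the decision transparent; a production router would presumably also weigh cluster load and job priority.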
The second part of the project - a brokering system for allocating computing power based on volunteer scientific preferences - will be designed and prototyped. This involves several subtasks: 1) Designing a schema for volunteer preferences, including scientific areas and sub-areas, project nationality and institutions, specific projects and applications, inclusions/exclusions, and so on. 2) Designing a schema for assigning attributes to job streams (e.g. their area, sub-area, institution, etc.), and for assigning quotas or priorities to job streams. 3) Designing a relational database for storing the above information. 4) Designing and implementing policies for assigning volunteer resources to job streams in a way that respects volunteer preferences and optimizes quota, fairness, and throughput criteria. This will be implemented as a BOINC "account manager" so that volunteers see a single interface rather than lots of separate projects and web sites.
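As a rough illustration of the brokering idea, the sketch below matches a volunteer's stated preferences against job streams and picks the eligible stream that is furthest below its quota. The class names, fields, and the quota-ratio policy are assumptions made for this example; the project description does not specify the actual schema or assignment policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Volunteer:
    name: str
    areas: Set[str]                               # preferred scientific areas
    excluded_projects: Set[str] = field(default_factory=set)


@dataclass
class JobStream:
    project: str
    area: str
    quota: float        # relative share this stream should receive (> 0)
    assigned: int = 0   # volunteers assigned so far


def assign(volunteer: Volunteer, streams: List[JobStream]) -> Optional[JobStream]:
    """Assign a volunteer to the eligible stream furthest below its quota."""
    eligible = [
        s for s in streams
        if s.area in volunteer.areas
        and s.project not in volunteer.excluded_projects
    ]
    if not eligible:
        return None
    # A lower assigned/quota ratio means the stream is more underserved.
    chosen = min(eligible, key=lambda s: s.assigned / s.quota)
    chosen.assigned += 1
    return chosen


if __name__ == "__main__":
    streams = [
        JobStream("A", "cancer", quota=2.0),
        JobStream("B", "cancer", quota=1.0),
        JobStream("C", "climate", quota=1.0),
    ]
    v = Volunteer("alice", {"cancer"})
    print([assign(v, streams).project for _ in range(3)])
```

With these example quotas, repeated assignments for a cancer-research volunteer alternate toward a 2:1 split between streams A and B, and stream C is never chosen.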
CC* Compute: BioBurst in response to the Campus Cyberinfrastructure (CC*) Program solicitation (NSF 16-567) 2/1/2017 1/31/2018
The goal of the project is to deploy the BioBurst system to enhance the high performance computing capabilities at the University of California, San Diego, with technology designed to accelerate biological and life sciences research. The last few years have seen revolutionary advances in sequencing instruments for decoding genetic materials such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA and RNA carry the genetic code and control the production of proteins essential to life. A byproduct of this revolution in DNA/RNA sequencing technology is the production of vast amounts of data that must be stored and analyzed in order to achieve scientific progress. The manner of conducting these analyses stresses existing research computing systems in ways that must be overcome in order to expand the scope of investigations and reduce the time to results. The BioBurst system aims to augment the campus research computing system with innovative technology to speed up both data access and computation on DNA/RNA sequence data. A better understanding of DNA and RNA has the potential for advancing our Nation's health and well-being, enabling applications such as new insights into the biological mechanisms causing disease, and the development of new biofuels and agriculture products.
The technical goal of the project is to implement a separately scheduled partition of the existing campus research computing system with technology designed to address important classes of bioinformatics computing including genomics, transcriptomics, and immune receptor repertoire analysis. The BioBurst system will incorporate the following major components: (1) an I/O acceleration appliance with 40 terabytes of non-volatile memory and software designed to alleviate the small-block/small-file I/O problem characteristic of many bioinformatics codes; (2) an FPGA-based computational accelerator node that has been demonstrated to perform demultiplexing, read mapping, and variant calling of complete human genomes in 22 minutes; (3) 672 commodity computing cores which will access the I/O accelerator and provide a separately scheduled resource for running bioinformatics applications; (4) integration with a large-scale parallel file system, which supports streaming I/O and has the capacity to stage large amounts of data associated with many bioinformatics studies; and (5) customization of the job scheduler to accommodate bioinformatics workflows, which can consist of hundreds to thousands of jobs submitted by a single user at one time. These components will be integrated as a partition of the existing production research computing system, providing a unique and highly usable resource for researchers across campus. A key objective is to provide bulk computing capacity to conduct on the order of 8,000 whole-genome analyses per year plus the ability for quick turnaround (less than 60 min.) single-genome analyses, and sufficient solid state disk (SSD) capacity for staging associated working sets (200GB - 1TB).
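The capacity figures above can be related with a short back-of-the-envelope calculation. The sketch assumes, purely for illustration, that every whole-genome analysis takes the demonstrated 22 minutes and that analyses run back-to-back on a single accelerator node; the abstract does not say how the 8,000 analyses would actually be distributed across the system.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def accelerator_utilization(genomes_per_year: int,
                            minutes_per_genome: float) -> float:
    """Fraction of a year one node would be busy, assuming serial runs."""
    return genomes_per_year * minutes_per_genome / MINUTES_PER_YEAR


def max_genomes_per_year(minutes_per_genome: int) -> int:
    """Upper bound on serial analyses per year at a fixed per-genome time."""
    return MINUTES_PER_YEAR // minutes_per_genome


if __name__ == "__main__":
    busy = accelerator_utilization(8000, 22)
    print(f"8,000 genomes at 22 min each: {busy:.1%} of the year")
    print(f"Serial upper bound: {max_genomes_per_year(22)} genomes per year")
```

Under these assumptions, 8,000 analyses would occupy roughly one third of a single node's year, which leaves headroom for the quick-turnaround single-genome runs mentioned in the text.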