Sponsors

Abstracts

Abstracts are organized by day and track, in chronological order.

Tuesday, June 13 Abstracts


Education and Outreach

Zecosystem—Cyberinfrastructure-enabled learning ecosystems of 2020
K. Madhavan, G. Bertoline, S. Goasguen, M. Vorvoreanu

Most people spend their lives without ever seriously becoming aware of the complex technological infrastructure driving and supporting their everyday lives. The students and faculty members at most middle and high schools, colleges, and universities are no exception. Cyberinfrastructure and middleware are pervasive in almost every aspect of modern life. They are present in the mobile phones we use to communicate, in the ever-increasing number of broadband home connections, in the climbing number of high definition (HD) television sets being sold, and even in the cars we drive. Advances in science, engineering, socio-cognitive understanding, and technology have launched the human race headlong into a revolution in which the boundaries between how information is constructed, stored, accessed, delivered, and understood have blurred. One critical characteristic of this impending revolution is that the time lapse between the construction of data and the delivery of significant scientific insights is steadily (and in many cases exponentially) decreasing. What is more breathtaking about this infant revolution is that the next generation of scientists (the so-called millennium generation or Generation-Z), currently in middle and high schools or just beginning college, is accelerating this revolution to warp speed. Through the innovative use of emerging communication technologies and cyberinfrastructure-enabled social networks, and by thriving in a world that demands tremendous cognitive flexibility, students are sending a clear signal that the educational system needs to reflect this ongoing revolution. Yet a significant part of our pedagogical approaches remains immune and continues to have its foundations in the pre-information age, if not in the pre-industrial era.

This paper presents a vision for Zecosystem, a framework for a cyberinfrastructure-enabled learning ecosystem built on the fundamental premise that learning experiences of the future will be multi-sensory, will engage multiple technologies and significant computational power invisibly and continuously, and will be completely engaging. We define Zecosystem as the complementary convergence framework of pedagogy, cyberinfrastructure, emerging STEM areas, and social-behavioral sciences to create a student lifestyle-oriented neural learning environment. In this paper, we discuss in depth how many of the key cyber-services offered on the nanoHUB are informed by this vision.

MSI-CIEC: MSI Cyberinfrastructure Empowerment Coalition and the TeraGrid
G. Fox

The Minority-Serving Institution (MSI) Cyberinfrastructure (CI) Empowerment Coalition, MSI-CIEC, is established to accelerate the advancement of e-science and CI, to develop a diverse CI-related science and engineering workforce, and to broaden access to, participation in, and appreciation for CI and e-science, particularly among traditionally underrepresented minority populations. The vision of MSI-CIEC is to advance science, technology, engineering and mathematics (STEM) and the participation of the nation's underrepresented minorities in STEM, particularly e-science, and in the global STEM workforce through minority-serving institutions (MSIs) and the emerging CI. This defines a mission to build and enhance the social and technological mechanisms for meaningful engagement of MSIs in CI. That is, to develop the CI "middleware" resource to encourage, broker, enable and manage meaningful CI initiative and MSI collaborations of mutual benefit for the use, support, deployment, development, and design of CI, to enable the advancement of e-science research and education unlike ever before, and to support the development of the nation's diverse STEM workforce, including the current and next generation of the STEM professoriate in an increasingly diverse society. MSI-CIEC exploits the virtualization and global integration features of CI as a democratizing force that can offer leading edge STEM involvement to all.

MSI-CIEC is a virtual organization (using Grid terminology) under the Alliance for Equity in Higher Education. This ensures its work will have systemic impact on at least 335 Minority Serving Institutions covered by the Hispanic Association of Colleges and Universities, the National Association for Equal Opportunity in Higher Education, and the American Indian Higher Education Consortium. MSI-CIEC is envisaged as largely aimed at supporting the community interested in CI involvement of MSIs; it will lead a few projects but will also provide a scalable implementation of its mission by advising relevant projects led by others. The MSI-CIEC initial project is the Minority-Serving Institutions Cyberinfrastructure Institute (MSI CI2), funded by the NSF CI-Team program as an initial planning and information dissemination activity. This has worked with MSI and CI leaders to identify challenges, opportunities and success stories so as to prepare a pathway forward. We identified some critical features of our future work including:

We will present details of our current and planned activities and how they interact with TeraGrid.

Broadband access and digital equity: the international picture
B. Bracey

High-speed Internet connections now provide users in Europe and North America with a speed and ease of access to entertainment, information and products that is rapidly transforming society. The same technologies could potentially help tackle development goals in many countries of the South by building new markets and contributing to education and health care. Yet for many countries in Africa, Asia and Latin America, especially the poorer countries, this remains an elusive goal, because they have no way to provide the infrastructure. This presentation will provide an overview of the international digital divide, give some examples of cutting edge projects to promote digital equity and suggest ways in which they can be taken to scale.

SCEC Earthworks Science Gateway: Widening SCEC Community Access to the TeraGrid
J. Muench, P. Maechling, H. Francoeur, D. Okaya, Y. Cui

The SCEC Earthworks Science Gateway is designed to allow members of the SCEC geoscience community to perform sophisticated, computationally intensive geophysical research using TeraGrid resources, even if they have no prior experience with high performance computing. The SCEC Earthworks Science Gateway allows users to configure and execute earthquake wave propagation simulations using well validated geophysical models and high performance simulation software. The SCEC Earthworks system generates a series of data sets including surface seismograms and ground motion maps. It also interfaces with the Incorporated Research Institutions in Seismology (IRIS) Data Handling Interface (DHI), which provides the system with access to observed data including earthquake catalogs and seismograms.

Users access the SCEC Earthworks system through a web-based portal built using the GridSphere Portlets engine. Using a portlet-based interface, users can configure, submit, and monitor wave propagation simulations. They can also access the resulting simulation data products. The portlets allow users to browse simulation data products, save configurations, and share simulation results with other users.

All steps in the wave propagation simulations, including mesh generation, wave propagation, and post processing, are run using a grid-based workflow system based on the Virtual Data System (VDS), the Pegasus meta-scheduler system, and the Globus toolkit. These workflow tools perform the backend steps of registering data with an RLS (Replica Location Service) and building, submitting, and monitoring workflows. The metadata for the resulting data products are registered within an MCS (Metadata Catalog Service).

TeraGrid User Portal: An Integrated Interface for TeraGrid User Information & Services
E. Roberts, M. Dahan, J. Boisseau

The TeraGrid comprises many heterogeneous systems that enable high performance computing, data management and storage, scientific visualization, and access to scientific data collections. TeraGrid uses advanced networking systems, software technologies, and operations and support activities to tie these fundamental resources together into a production cyberinfrastructure for research. Allocations, accounting, security, resource monitoring, consulting, and documentation are among the services that each TeraGrid Resource Provider (RP) implements to meet its site-specific needs while operating within the TeraGrid environment. The purpose of the TeraGrid User Portal is to serve as a launch-pad for new users and a control panel for current users by integrating all of these resources, services, and information into a single web interface serving a national community of computational researchers. The first version of the portal addresses the most fundamental issue in integrating TeraGrid resources: extension and integration of existing, centralized TeraGrid accounting and security services, including authentication, monitoring allocation usage, managing resource provider accounts, and registering distinguished names, through a clear and comprehensive portal interface. In addition to these core services, the portal provides simple access to existing TeraGrid user-centric information and services such as documentation, consulting, allocation request and renewal, and resource monitoring. This paper discusses the challenges and presents the approach used for integrating the existing accounting and security services across RPs while ensuring robustness and scalability. The initial user services and information interfaces are also discussed, followed by a plan for adding new features in future releases, including user customization, interactive capabilities, and additional information services.

Supporting "Big Humanities": An Introduction to HASTAC
A. Balsamo

HASTAC (the Humanities, Arts, Science, and Technology Advanced Collaboratory) includes several large-scale humanities computing projects. This presentation will provide a brief overview of key projects and new directions in humanities-based research that involves new cyberstructural projects.

iShare - Bringing the TeraGrid to the User's Desktop
A. Basumallik, X. Ren, R. Eigenmann, S. Goasguen

iShare is an Internet-Sharing system that supports end-users as well as providers of computing resources (applications, data and hardware). iShare allows providers to disseminate resources and users to access these resources in a way that allows open participation. A fully decentralized organization for resource dissemination is enabled via the integration of a peer-to-peer (P2P) system and web standards such as XML and RDF. iShare has an open extensible architecture that allows different access mechanisms and protocols to be plugged in. It delivers a desktop-based environment for publishing and using remote resources, which decouples the computing environment perceived by end users from the underlying physical platforms.

This paper describes the iShare plug-ins, implemented using the Java Commodity Grid (CoG) Kit, that enable TeraGrid resources to be shared and used through the iShare desktop. TeraGrid resources are disseminated through the publication and discovery functionalities that are a part of iShare. For providing users access to TeraGrid resources, iShare allows certificate based authentication, remote job submission using GRAM and file transfers using GridFTP. Additionally, an SRB plug-in also allows users to access data collections across distributed and heterogeneous platforms. Together, these plug-ins enable the end-user to effectively discover and use TeraGrid resources from a desktop.


Science Impact

SCEC/CME CyberShake: Calculating the Probability of Strong Ground Motions on the TeraGrid
P. Maechling, L. Zhao, Y. Cui, E. Deelman, T. Jordan

Researchers from the SCEC Community Modeling Environment (SCEC/CME) project are utilizing the TeraGrid to calculate physics-based probabilistic seismic hazard curves for several sites in the Southern California area. Traditionally, probabilistic seismic hazard analysis (PSHA) is conducted using intensity measure relationships based on empirical attenuation relationships. However, a more physics-based approach using waveform modeling could lead to significant improvements in seismic hazard analysis. Members of the SCEC/CME Project have integrated leading-edge PSHA software tools, SCEC-developed geophysical models, validated anelastic wave modeling software, state-of-the-art computational technologies, and TeraGrid computational and storage resources to calculate several probabilistic seismic hazard curves using 3D waveform-based modeling.

The CyberShake calculations require tens of thousands of CPU hours and multiple terabytes of disk storage for each site. The CyberShake workflows were run on high performance computing systems including multiple TeraGrid sites (currently SDSC and NCSA), and the USC Center for High Performance Computing and Communications. To manage the extensive job scheduling and data requirements, CyberShake utilizes a grid-based scientific workflow system based on the Virtual Data System (VDS), the Pegasus meta-scheduler system, and the Globus toolkit.

In this talk we will discuss the importance of improving probabilistic seismic hazard analysis techniques and the computational techniques that SCEC scientists are bringing to this important task. We will also outline the grid-based scientific workflow system developed on the SCEC/CME that was used in the CyberShake simulations and discuss our experiences running the CyberShake workflows on the TeraGrid.

SCEC TeraShake Simulations: High Resolution Simulations of Large Southern San Andreas Earthquakes Using the TeraGrid
K. Olsen, S. Day, B. Minster, T. Cui, P. Maechling

Researchers working on the SCEC Community Modeling Environment (SCEC/CME) Project have carried out some of the largest and most detailed earthquake simulations completed to date (TeraShake). Using parallel supercomputers at TeraGrid facilities at the San Diego Supercomputer Center (SDSC) and the National Center for Supercomputing Applications (NCSA), we model the ground motions expected from a large earthquake on the southern San Andreas Fault.

The TeraShake calculations simulate 4 minutes of 0-0.5 Hz ground motion in a 180,000 km² area of southern California, at 200 m resolution, for a M 7.7 earthquake along the 199 km section of the San Andreas fault between Cajon Creek and Bombay Beach. The average recurrence interval for large earthquakes with surface rupture on these segments is approximately 200 years, suggesting these segments are due to produce a large rupture. The TeraShake simulations include ruptures propagating both northwest-ward and southeast-ward on the fault.
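To give a feel for the problem size, the figures quoted above can be turned into rough grid counts. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and because the abstract does not state the depth of the model volume, only the number of cells in one horizontal layer is computed.

```python
# Rough grid sizes implied by the figures quoted in the abstract.
AREA_KM2 = 180_000       # simulated surface area (from the abstract)
SPACING_M = 200          # grid resolution (from the abstract)
FAULT_LENGTH_KM = 199    # ruptured fault section (from the abstract)


def surface_cells(area_km2: int, spacing_m: int) -> int:
    """Number of square grid cells needed to tile the surface area once."""
    area_m2 = area_km2 * 1_000_000
    return area_m2 // (spacing_m * spacing_m)


def cells_along_fault(length_km: int, spacing_m: int) -> int:
    """Number of grid cells spanned by the fault section, end to end."""
    return (length_km * 1_000) // spacing_m


if __name__ == "__main__":
    print("cells per horizontal layer:", surface_cells(AREA_KM2, SPACING_M))
    print("cells along the fault:", cells_along_fault(FAULT_LENGTH_KM, SPACING_M))
```

Multiplying the per-layer count by the number of depth layers (not given here) would yield the full 3D mesh size.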

The results show that the chain of sedimentary basins between San Bernardino and downtown Los Angeles forms an effective waveguide that channels Love waves along the southern edge of the San Bernardino and San Gabriel Mountains. Earthquake scenarios in which the guided wave is efficiently excited (scenarios with northward rupture) produce unusually high long-period ground motions over much of the greater Los Angeles region. Intense, localized amplitude modulations arising from variations in waveguide cross-section can be explained to a remarkable level of accuracy in terms of energy conservation for the guided mode.

Design and Implementation of Services for a Synthetic Seismogram Calculation Tool
D. Seber, C. Youn, T. Kaiser, C. Santini, T. Bollman

We have built a user environment that simplifies and provides interactive access to distributed data sources and TeraGrid computational services to study and model earthquake waveforms utilizing realistic 3D Earth models. These data and computational environments are implemented as part of the GEON (Geosciences Network, www.geongrid.org) portal and built using a service-oriented architecture. The system is built so that users do not need experience running large parallel applications. GEON users can log on to the portal, start the application (SYNSEIS, the SYNthetic SEISmogram generation tool), construct their models, and run their jobs on GEON and/or TeraGrid resources. The application is built primarily for seismologists to calculate realistic 3D regional seismic waveforms using a well-tested finite difference code, e3d, which was developed by the Lawrence Livermore National Laboratory (S. Larsen). This system is also designed to be used in the day-to-day activities of researchers, especially EarthScope scientists (www.earthscope.org), who will be accessing seismic data from hundreds of stations every day and need to process the data in a timely fashion. We developed the user-friendly interface of SYNSEIS using Macromedia Flash MX. Within the user interface, we provide several components: an interactive mapping tool and event/station/waveform extraction tools that allow users to seamlessly access remote archives and easily submit jobs to selected TeraGrid sites. A job monitoring service is also built using a very basic service component. Users can easily check the current job status from the queuing system through the Web service client APIs.

Location of Earthquakes in Three-dimensional Media Using a Divide and Conquer Method
G. Pavlis, P. Wang, F. Vernon

We describe a new approach to the large-scale relocation of seismic catalogs based on a variation of conventional multiple event location methods. We associate every event in a catalog with one or more points in a 3D grid of control points. Events associated with each point are located together with an implementation of the Progressive Multiple Event Location (PMEL) algorithm. PMEL estimates a set of path corrections from a given control point to each seismic station along with a set of revised locations. In the estimation we use a set of matrix projectors that allow the unresolvable (absolute) location bias to be extracted from a 3D reference model and project only the components of information that the available data constrain. This allows us to construct empirically derived travel time correction fields in 3D that are highly data adaptive; the density of control points can be very high where seismicity is intense and low where earthquakes are rare. The 3D model is used only to fill in gaps between control points and correct smoothly varying bias problems. The algorithm is highly parallel and we found major improvements in performance were possible on massively parallel computers. To date we have applied this algorithm to data from the Anza network in southern California, the University of Alaska network in southeastern Alaska, and a temporary array deployment in central Asia. Improvements in data fit scale with the aperture of the network, with the smaller aperture Anza data showing a ten-fold improvement in data fit compared to only about a factor of 2 for the other data sets. This difference is attributed to measurement precision. The Anza catalog is dominated by impulsive, local P and S phases, while the other data sets are dominated by emergent, regional phases that are difficult to time by conventional methods. We are dealing with this problem through a novel correlation method that exploits waveform similarity for events located near each other in space.
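The first step described above, associating each event with one or more control points, can be sketched as follows. This is an illustrative assumption rather than the authors' implementation: the abstract does not give the association rule, so a simple distance cutoff (`radius_km`) with a nearest-point fallback is used here, and all names are hypothetical. PMEL itself is not implemented.

```python
import math
from collections import defaultdict


def group_events_by_control_point(events, control_points, radius_km):
    """Map each control point index to the IDs of nearby events.

    events:         dict of event_id -> (x, y, z) in km
    control_points: list of (x, y, z) in km
    radius_km:      assumed association distance (not from the abstract)

    An event with no control point within radius_km is attached to its
    nearest control point, so every event belongs to at least one group.
    """
    groups = defaultdict(list)
    for event_id, xyz in events.items():
        dists = [math.dist(xyz, cp) for cp in control_points]
        in_range = [i for i, d in enumerate(dists) if d <= radius_km]
        if not in_range:
            in_range = [min(range(len(dists)), key=dists.__getitem__)]
        for i in in_range:
            groups[i].append(event_id)
    return dict(groups)


if __name__ == "__main__":
    events = {"a": (0, 0, 0), "b": (9, 0, 0), "c": (5, 0, 0), "d": (50, 0, 0)}
    points = [(0, 0, 0), (10, 0, 0)]
    print(group_events_by_control_point(events, points, radius_km=5))
```

Because each group can be relocated without reference to the others, the groups could be handed to separate workers, which is consistent with the abstract's remark that the algorithm is highly parallel.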

Use of the TeraGrid for High-Performance Subsurface Modeling and Oil Discovery
B. Rutt, T. Kurc, U. Catalyurek, J. Saltz

The main objective of oil reservoir modeling is to understand the reservoir properties and predict oil production to optimize return on investment from a given reservoir, while minimizing environmental effects. An important challenge is to develop accurate models and mechanisms to search a large space of oil production and reservoir parameters. This process is very time consuming and data intensive, since simulation studies evaluate thousands of scenarios using complex models, resulting in multi-terabyte datasets.

The first step in the generation of these simulation studies was identifying resources that could support a large volume of simulation data. Our 3-D seismic simulation model consisted of a total of 30 TB of data when generated. By dividing the data across several TeraGrid sites, we were able to allocate enough space to store the data and balance the computational load. In addition, we ran an oil reservoir simulation which modeled the impact of injecting water and harvesting oil from an oilfield. Once the data generation phase was complete, we queried into the data sets using tools such as VTK for visualization of oil fields, and our STORM infrastructure for query operations into the data files. By running queries simultaneously across multiple sites, we were able to achieve 3 GB/sec of simultaneous I/O bandwidth against seismic data files. We also used a tool that performed optimization functions to find the optimal well placement in an oilfield. In this paper, we will discuss in more detail how we implemented this application using the TeraGrid infrastructure.
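As a rough illustration of what the quoted numbers imply, the time for one full pass over the dataset at the reported aggregate bandwidth can be estimated. This is a sketch only: it assumes decimal units (1 TB = 1,000 GB) and that the 3 GB/sec rate could be sustained over the entire 30 TB, neither of which the abstract states.

```python
DATASET_TB = 30            # total data generated (from the abstract)
BANDWIDTH_GB_PER_S = 3     # aggregate I/O bandwidth (from the abstract)
GB_PER_TB = 1_000          # assumption: decimal units


def full_scan_seconds(dataset_tb: float, bandwidth_gb_per_s: float) -> float:
    """Seconds needed to read the whole dataset once at a constant rate."""
    return dataset_tb * GB_PER_TB / bandwidth_gb_per_s


if __name__ == "__main__":
    seconds = full_scan_seconds(DATASET_TB, BANDWIDTH_GB_PER_S)
    print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")
```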

The Earth System Grid and TeraGrid
D. Middleton, L. Cinquini, I. Foster, D. Williams, D. Bernholdt

Increasingly, climate change research is data intensive, involving the analysis and intercomparison of simulation and observation data from many sources. Continued scientific progress depends upon powerful, effective enabling technologies that allow the core climate science community to coherently manage and publish a diverse collection of what in a few years will be petascale data, such that a broad, global community can access and analyze it. The Earth System Grid (ESG) is a U.S. Department of Energy (DOE) Scientific Discovery Through Advanced Computing (SciDAC) project aimed at making climate simulation data easily accessible to a global climate research community.

We have developed and deployed the ESG system, which now has 2500 registered users and manages 160 TB of data in archives distributed around the nation. From this past year alone, more than 200 scientific journal articles have been published from analyses of data delivered by the ESG. NCAR's emerging participation in the U.S. National Science Foundation's TeraGrid effort opens up important new opportunities for ESG and climate research. There are several scientific drivers to be considered here, ranging from short-term to long-term:

  1. The DOE Climate Science Computational End Station at the Leadership Computing Facility at Oak Ridge National Laboratory (ORNL) will support large-scale simulation of the Earth's climate and the data production will be sizable.
  2. ESG is evolving from a Grid-based distributed data system into a science gateway, with new interfaces that support web-based instantiation, tracking, and diagnosis of modeling activities.
  3. The next major report of the Intergovernmental Panel on Climate Change (IPCC) in 2010 will require major modeling activities at sites around the world, including some on the TeraGrid (e.g. NCAR, ORNL).

This presentation will address ESG as it exists today, how ESG and TeraGrid will complement one another to advance scientific research, and an overview of our plans for the future.

Using PLAPACK and MPICH-G2 to Grid-Enable Bayesian Geostatistical Models
W. He, S. Wang, J. Yan, M. Cowles, M. Armstrong

Bayesian geostatistical models enable statistical analysis and prediction based on data measured at irregularly-spaced geographic locations. The Markov chain Monte Carlo (MCMC) methods needed to fit these models require linear algebra operations that are computationally intensive when the number of measurement locations is large. Because an MCMC sampler typically must be run for thousands of iterations, each requiring numerous operations, the run-time for sequential Bayesian algorithms quickly becomes prohibitive. Even with parallel MCMC algorithms running on single clusters, run-times may be unacceptable, especially when large geographic datasets are analyzed. The TeraGrid provides an ideal platform to develop parallel MCMC algorithms for Bayesian geostatistical models by taking advantage of dynamically configurable Grid resources.

This paper presents a parallel algorithm based on MPICH-G2 on the TeraGrid using PLAPACK for a class of Bayesian geostatistical models. The algorithm is designed to be scalable to the TeraGrid by exploiting its high-end network capabilities as well as using the parallel Cholesky decomposition function available from PLAPACK.
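To show why the Cholesky decomposition dominates the cost of each MCMC iteration, here is a small serial sketch of a Gaussian log-likelihood evaluation. It is not the authors' PLAPACK/MPICH-G2 code: it is plain Python, and the exponential covariance function and its parameters (`sill`, `range_km`, `nugget`) are assumptions chosen only for illustration.

```python
import math


def exp_covariance(locations, sill=1.0, range_km=10.0, nugget=0.1):
    """Covariance matrix under an (assumed) exponential covariance model."""
    n = len(locations)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(locations[i], locations[j])
            cov[i][j] = sill * math.exp(-d / range_km)
        cov[i][i] += nugget
    return cov


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def gaussian_loglik(y, cov):
    """Log density of y under a zero-mean multivariate normal with covariance cov."""
    n = len(y)
    low = cholesky(cov)
    # Forward substitution: solve low * z = y.
    z = []
    for i in range(n):
        z.append((y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


if __name__ == "__main__":
    sites = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
    print(gaussian_loglik([0.5, -0.2, 1.1], exp_covariance(sites)))
```

The factorization costs on the order of n³ operations for n measurement locations and must be repeated whenever the covariance parameters change, which is why a parallel Cholesky routine is the natural target for distribution.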

The scalability of the algorithm is examined by varying physical block sizes and the number of processors used to compute large-scale linear algebra operations. Our algorithm achieves scalable speedups on single TeraGrid clusters. Ongoing experiments are being conducted to verify whether this level of scalability can be sustained when multiple TeraGrid clusters are used.


Technology

The OptIPuter: A National and Global-Scale Cyberinfrastructure for Enabling LambdaGrid Computing
M. Brown, L. Smarr, T. DeFanti, J. Leigh, M. Ellisman, P. Papadopoulos

To facilitate the interactive visualization, analysis, and correlation of massive amounts of data from multiple sites, the NSF-funded OptIPuter project is designing a powerful distributed cyberinfrastructure to support data-intensive scientific research and collaboration. This research exploits a new world in which the central architectural element is optical networking, not computers. This transition is caused by the use of parallelism, as in supercomputing a decade ago. However, this time the parallelism is in multiple wavelengths of light, or lambdas, on single optical fibers, creating supernetworks. Dedicated 1- to 10-Gigabit deterministic network connections are being deployed internationally by the Global Lambda Integrated Facility (GLIF), nationally by the National LambdaRail (NLR), regionally by academic consortia, and locally on campuses, connecting scientists' laboratories to collaborators and/or data sources all over the world, providing researchers with guaranteed bandwidth for data movement, guaranteed latency for visualization/collaboration and data analysis, and guaranteed scheduling for remote instrument control. Bandwidth alone isn't the solution; the OptIPuter is working on new grid-computing paradigms (that is, new middleware, transport protocols and optical signaling, control and management software) to enable applications to dynamically manage lambda resources just as they do any grid resource, creating a Lambda-Grid of interconnected high-performance computers, data storage devices, and instrumentation. This paper summarizes some of the OptIPuter's developments over dedicated end-to-end lightpaths among partner sites in San Diego, Chicago and Amsterdam.

Northwest Indiana Computational Grid (NWICG)
G. Bertoline, C. Hoffmann, N. Bohlmann, D. Sharp, D. Latimer

NWICG is a partnership of researchers and educators at Purdue University-West Lafayette (Purdue), Purdue University-Calumet (Calumet), and the University of Notre Dame, that couples mutual interests among the three campuses with national science and research initiatives, builds a cyber-infrastructure that supports the solution of breakthrough level problems, and enables continuing world-class advances in the underlying technologies of high performance computing.

Our approach begins with building a scalable, high speed, high bandwidth, science-driven computational grid for Northwest Indiana across the three universities in partnership with the Department of Energy's Argonne National Laboratory. This connectivity will build on existing collaborations between the three universities and DOE, and will leverage federal investments made by the National Science Foundation in its TeraGrid program, of which Purdue is a funded member, and in the recent commissioning of "dark fiber" assets between Notre Dame and Argonne.

Available network, computational, storage, visualization, and data resources, as well as specialized facilities, will be expanded, integrated, and shared among the partnering institutions, thereby providing opportunities to energize and connect the research communities, and to promote new scientific knowledge environments in ways not previously possible.

On the infrastructure side, we plan to leverage existing middleware such as iShare, Globus, and SRB. We are committed to simplifying the access to resources for the partners and making the distributed nature of the resources largely transparent. This includes addressing nontraditional issues such as ownership of physical resources and liabilities among the institutions.

Some of the initial research projects funded by NWICG will be described. They include middleware prototyping and development, as well as applications research. For example, we will support distributed, data-driven processing of sensor data, such as state information of the electric power grid, as well as processing data from the CMS experiment in high-energy physics that is currently under construction and will deliver a torrent of data to be analyzed. Also included are high-performance computations, such as simulation of the digital nuclear reactor, global climate simulations, as well as simulating morphogenesis in biological organism development.

Application Hosting Services: Requirements and Architecture
I. Foster

The need to make application code accessible as a Web Service arises frequently in scientific applications. Depending on context, this apparently simple task can introduce a wide range of requirements, including interface generation, authorization of requests, generation of code to dispatch calls to application code, monitoring and management of tasks, data management, and dynamic mapping of application tasks to processors in response to changing workloads. The resulting application hosting services can vary greatly in their architecture and complexity, depending on the requirement(s) to be addressed, the form of the application code(s), the type of task(s) to be executed, and the workload(s) to be supported. Many groups are building relevant components and tools, but no one system meets all needs. With the goal of encouraging collaboration and communication, I review requirements for application hosting services, present an application hosting service architecture, and identify interfaces that we may wish to define to enable interoperability of different tools and systems. I also review existing approaches to building such services.

Automatic Co-Scheduling on the TeraGrid
D. Marcusiu, M. Margo, K. Yoshimoto, P. Kovatch

One of the challenges of harnessing the TeraGrid's cumulative compute power is the capability to schedule and execute parallel jobs across multiple resources at multiple sites simultaneously. In 2005, there were 75 requests for co-scheduled resources. Each one of these requests started with a TeraGrid ticket, user services processing and system administration handling. This careful planning and coordination of many people at multiple sites shows the need for an automatic capability. This paper will discuss the effort to implement an automated, production quality service to provide a co-scheduling capability to TeraGrid users that also allows the sites to achieve their individual scheduling goals. The paper will discuss the prototype implementation and the policies that need to be agreed upon and put into place on TeraGrid resources in order to support such a co-scheduling service. The paper will also discuss plans for providing a general purpose solution to meta-scheduling where meta-scheduling is defined as the capability to submit a job whose resource requirements will be best matched to the appropriate TeraGrid resources.
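The core question a co-scheduler must answer, namely when all requested sites are free at the same time, can be sketched as a search over each site's free windows. This is a toy model under stated assumptions, not the prototype the paper describes: it assumes every site publishes its free intervals as `(start, end)` pairs in a shared time unit, and the function name is hypothetical.

```python
def earliest_common_start(site_free_windows, duration):
    """Earliest time t at which every site is free for [t, t + duration).

    site_free_windows: one list per site of (start, end) free intervals
    duration:          required run length, in the same time unit

    Returns None when no common slot exists. The earliest feasible start
    always coincides with the start of some free window, so only those
    times need to be tested.
    """
    candidates = sorted({start for site in site_free_windows for start, _ in site})
    for t in candidates:
        if all(
            any(start <= t and t + duration <= end for start, end in site)
            for site in site_free_windows
        ):
            return t
    return None


if __name__ == "__main__":
    site_a = [(0, 4), (6, 12)]
    site_b = [(2, 5), (7, 15)]
    print(earliest_common_start([site_a, site_b], duration=3))
```

A production service would also have to reserve the slot at each site and respect local scheduling policies, which is where the policy agreements mentioned above come in.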

Predicting Bounds on the Batch Queuing Delay Experienced by Individual TeraGrid User Jobs in Real Time
R. Wolski, R. Garver, D. Nurmi, J. Brevik

In this talk, we present a new method for providing TeraGrid end-users with real-time predictions of the bounds on queuing delay individual jobs will experience when waiting to be scheduled to a machine partition. Predicting the delay users will experience while waiting for their jobs to be scheduled is a problem that has been studied by both the academic and commercial HPC communities for some time. Our approach, based on a new statistical methodology, predicts bounds on the waiting time (upper or lower) that individual jobs will experience with quantified confidence measures. Thus the predictions made by this system constitute a statistical guarantee of best-case and worst-case waiting delay where the confidence measure quantifies the quality of the guarantee.

We have implemented this new methodology as part of the Network Weather Service and deployed it on TeraGrid where it currently provides real-time bounds predictions. In the talk we will report on the effectiveness of the system which has been in operation as a prototype for approximately 8 months. We will discuss the methodology and its evaluation using batch-queue logs spanning 10 years at the NSF and open DOE supercomputer centers. We will also demonstrate the web interface to the system and make "live" predictions of TeraGrid delay bounds during the presentation from the web page located at http://nws.cs.ucsb.edu/batchq and we will detail the operation of a set of command-line tools that are portable among all ETF architectures.

Our results show that it is possible to predict delay bounds with specified confidence levels for individual jobs in different queues, and for jobs requesting different ranges of processor counts and different maximum execution delays. Using these predictions, users with roaming allocations or with allocations at multiple TeraGrid sites can choose the machine that is most likely to minimize turn-around time. Users can also determine the probability that a job will meet a specified deadline in a particular queue. Finally, the system is portable to all ETF architectures, making it possible for users to consider the use of heterogeneous resources, and to predict which is most likely to impose the shortest waiting time for their jobs.
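The abstract does not spell out the statistical methodology, so the following is only a sketch of one standard nonparametric way to attach a confidence level to an upper bound on waiting time, not necessarily the authors' method. It assumes past wait times in a queue behave like independent draws from one distribution, and it picks the smallest order statistic that exceeds the chosen quantile with the requested confidence, using the binomial distribution. The function names and sample data are hypothetical.

```python
import math
from typing import Optional, Sequence


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def upper_bound_on_quantile(waits: Sequence[float],
                            quantile: float = 0.95,
                            confidence: float = 0.95) -> Optional[float]:
    """Return a wait time that is >= the given quantile with the given confidence.

    The k-th smallest observation is at or above the quantile exactly when
    fewer than k observations fall below it, which happens with probability
    binom_cdf(k - 1, n, quantile). Returns None if the history is too short.
    """
    xs = sorted(waits)
    n = len(xs)
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, quantile) >= confidence:
            return xs[k - 1]
    return None


if __name__ == "__main__":
    history_minutes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up wait times
    # Upper bound on the median wait, with 95% confidence.
    print(upper_bound_on_quantile(history_minutes, quantile=0.5))
```

With only ten observations no order statistic can bound the 0.95 quantile at 95% confidence, so the function returns `None` in that case; more history is needed before such a bound can be quoted.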

A Prediction Service for Grid Computing
W. Smith

Computational grids, such as the TeraGrid, provide users with many possible systems to execute their applications. There are a number of criteria that users or metaschedulers use to select a system, such as where the user has allocations and the software and hardware configuration of the machines. One criterion that is often desired is an estimate of how long a system will take to complete a job.

We address this problem by providing a prediction web service that predicts the main components of the time to complete a job: the amount of time waiting in a batch scheduler queue, the execution time, and the time to transfer files to and from the execution system. This service forms predictions for all of these components based on historical information using an instance-based learning technique. This technique finds past experiences similar to a query and derives a prediction for the query from these experiences.
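A minimal sketch of instance-based prediction, assuming a k-nearest-neighbour scheme: the query job is compared with past jobs and the prediction is the mean run time of the closest matches. The features (`nodes`, `requested_hours`), the distance function, and the value of `k` are illustrative assumptions, not the ones used by the service.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PastJob:
    nodes: int              # number of nodes the job used
    requested_hours: float  # wall-clock limit the user asked for
    actual_hours: float     # run time that was actually observed


def predict_run_time(history: List[PastJob], nodes: int,
                     requested_hours: float, k: int = 3) -> float:
    """Predict run time as the mean over the k most similar past jobs."""
    if not history:
        raise ValueError("no past jobs to learn from")

    def distance(job: PastJob) -> float:
        # Compare node counts on a log scale so 16 vs 32 nodes counts
        # the same as 256 vs 512.
        return math.hypot(math.log2(job.nodes) - math.log2(nodes),
                          job.requested_hours - requested_hours)

    nearest = sorted(history, key=distance)[:k]
    return sum(job.actual_hours for job in nearest) / len(nearest)


if __name__ == "__main__":
    history = [
        PastJob(nodes=16, requested_hours=2.0, actual_hours=1.0),
        PastJob(nodes=16, requested_hours=2.0, actual_hours=1.4),
        PastJob(nodes=512, requested_hours=24.0, actual_hours=20.0),
    ]
    print(predict_run_time(history, nodes=16, requested_hours=2.0, k=2))
```

The same pattern applies to queue wait and file transfer times by swapping in the relevant features and the observed quantity being averaged.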

This paper describes our prediction service and the techniques it uses to form predictions. Further, the paper describes the accuracy of this technique when used to predict queue wait times and execution times. Our experiments on workloads recorded from several academic and government parallel computers have found average errors for execution time predictions to be 30-45 percent of mean run times and average errors for queue wait time predictions to be 75 to 95 percent of mean wait times. We are currently performing further experiments and plan to report the performance of our techniques for one or more TeraGrid systems.

Adaptive Grid-enabled SIMOX Simulation on Japan-US Grid Testbed
Y. Tanaka, H. Takemiya, S. Sekiguchi, S. Ogata, A. Nakano, R. Kalia, P. Vashishta

We propose a reservation-based sustained Grid supercomputing paradigm to enable tightly-coupled computations of considerable scale (involving over 1,000 processors) and duration (over tens of continuous days) on a Grid of geographically distributed parallel supercomputers. The paradigm is demonstrated for an adaptive multiscale simulation application, in which accurate but compute-intensive quantum mechanical (QM) simulations are embedded within a classical molecular dynamics (MD) simulation only when and where high fidelity is required. The adaptive simulation is implemented by a hybrid Grid remote procedure call (GridRPC) + message passing interface (MPI) Grid application framework to combine flexibility (adaptive resource allocation and migration), fault tolerance (automated fault recovery), and efficiency (scalable management of large computing resources). We have achieved an automated execution of multiscale MD/QM simulation on a Grid consisting of 6 supercomputer centers in Japan and the US including the TeraGrid (a total of 150 thousand processor-hours), in which the number of processors changes dynamically on demand and resources are allocated and migrated dynamically according to both reservations and unexpected faults.

Evening Reception, Bistro Lounge, UPCC
GRASS PLUS

Join "Grass Plus" for an evening of New Acoustic Bluegrass Music with world-renowned Michael Lindeau, and Toby Oler and Ryan Deasy, both graduates of the Indiana University School of Music.

Original innovative compositions and traditional Bluegrass classics are woven together with powerful vocals, intricate instrumentation, and a performance synergy that brings audiences to their feet.


Michael Lindeau: Electric and Acoustic Violin, Guitar, and Vocals

Ryan Deasy: Acoustic Bass, Guitar, and Vocals

Toby Oler: Banjo, Guitar, and Vocals


Wednesday, June 14 Abstracts


Education and Outreach

The Open Science Grid TeraGrid Partnership and Interoperability Work
M. Livny, R. Pordes, F. Wuerthwein, L. Grundhoefer

The Open Science Grid (OSG) is a US distributed computing infrastructure, currently consisting of about 50 University and Laboratory sites, that supports scientific computing via an open collaboration of science researchers, software developers and computing, storage and network providers.

OSG is collaborating with the TeraGrid to ensure a consistent and interoperable software base, building on the NMI and VDT software releases. We report on technical activities to enable user organizations to submit jobs and move data across the TeraGrid and OSG infrastructures. We discuss some of the technical security, policy and operational issues that are also being worked on.

vGrid—On-demand Virtual Supercomputing
T. Stef-Praun, S. Goasguen, K. Madhavan

One of the key problems facing educators, students, and beginning scientists is the high barrier of entry that supercomputing and advanced cyberinfrastructure like the TeraGrid represent. In this paper, we discuss the implementation details of a middleware tool funded by the NSF NMI effort that attempts to simplify access and learning of advanced cyberinfrastructure and supercomputing. vGrid—an infrastructure for on-demand virtual supercomputing—was successfully used by over 140 participants as part of the Supercomputing 2005 Education Program. vGrid is a grid system built on virtual resources. This implementation addresses issues of resource allocation to novice users, and provides several benefits such as dedicated grids, with isolation, QoS, and simple control and management.

Computational cycles, storage and bandwidth are in high demand, and increase with the introduction of supercomputing to novice users and to the general public through informal science efforts. The vGrid solution integrates disparate heterogeneous resources into a loosely coupled grid system. Such a computational grid presents to the user standard access interfaces (including authentication, access, execution, and storage) such that it creates the illusion of a supercomputer or the TeraGrid.

The main problem that large, complex systems with many users have to address is the fair and efficient allocation and sharing of the resources. In the case of grids, users and their jobs need to be managed in such a way that maximizes both their experience in the system and the system's resource allocation efficiency. This is generally a very complex problem, as the requirements for each job submitted by the user vary greatly in terms of resource and timing needs, and the impact of a set of jobs sharing the same hardware resource makes it almost impossible to guarantee desired levels of quality of service.

As the allocation problem is clearly NP-hard, trying to compute optimal allocations for a dynamic system, in which users and resources can arrive and leave at any time, can make the overhead of computing the allocation exceed its utility. There have been several efforts to address this problem, and the classic solution is to accept reservations for the resources. Several more efficient and advanced solutions draw on economics and build markets where users compete through bidding to acquire the resources they need. While the outcome of such an implementation is the closest to the ideal, making a market functional implies handling payment strategies and currencies, which is a very high barrier for automated systems, or even for unsophisticated users. vGrid is a first attempt at addressing many of the issues raised here. This paper will discuss these issues in greater depth and will include a real-time demonstration of vGrid.

Adler Planetarium Outreach
Greenberg

*A Web-based Collaboratory for bringing Astronomy Cyberinfrastructure enabled Research into the Classroom*

Gary Greenberg, Northwestern University

This presentation provides an overview of a Web-based astronomy research collaboratory that uses astronomy cyberinfrastructure resources and tools to enable authentic scientific research in high school classrooms across the country. The initiative brings together the resources and experience of The Adler Planetarium and Astronomy Museum, Hands-On Universe at the University of California at Berkeley, the Sloan Digital Sky Survey / National Virtual Observatory at Johns Hopkins University and the Northwestern University Collaboratory Project. An NSF Strategic Technologies for the Internet grant led by Northwestern University has funded the development efforts. An NSF CI-TEAM demonstration project led by the Adler Planetarium is funding a pilot teacher professional development program.

The START Collaboratory integrates access to gigabytes of searchable data and images from the Sloan Digital Sky Survey and SkyServer tools into Web-based collaborative research journals that can be shared and discussed online. From these research journals, students can request observations from a network of Internet controlled telescopes. These observations can be viewed with a Web visualization tool created for visualizing and measuring FITS files. A teacher professional development program is using research scenarios to introduce students to the resources and tools available through the START Collaboratory and to provide a model for network-based collaborative research that engages students, teachers and professional scientists in a virtual community of practice. What distinguishes this approach is being able to bring real data and real tools to students in ways that engage them in authentic research; generating useful scientific results just as professional astronomers do - learning science by doing science. The presentation will also discuss plans for extending this model to bring resources of the National Virtual Observatory (NVO) into the START Collaboratory to support multi-wavelength research scenarios and provide a common collaborative environment for astronomy education and public outreach programs.

Joining the TeraGrid: Things I Wish I'd Known (Panel)
L. McGinnis

Joining the TeraGrid as a Resource Provider requires a significant level of commitment, not only from RP management, but from the operational elements of the provider's organization. Meeting this commitment can be more effective and less stressful if the staff required to perform the integration know what is expected and have access to technical and human resources that can support and inform their work. In October 2005, the TeraGrid GIG management team made a TG Primer available for Resource Providers and other organizations interested in what makes the TeraGrid work. The overview states that the purpose of the document is to provide "an overview of the essential characteristics of computational resources that have been integrated into the TeraGrid system." More recently, the TeraGrid GIG and executive team have begun work on a series of policy documents that further clarify the requirements for becoming a contributing member of the TeraGrid community.

This panel will serve as a "Meet the Team" opportunity, with brief presentations from Working Group chairs (or their representatives). Each speaker will have 5-10 minutes to review the material in their section of the Primer. The remaining time available in the session will be for open discussion with the audience. The overall goals of this session are:

Engaging People in Cyberinfrastructure: A TeraGrid '06 Panel
R. Giles, S. McLean, G. Moses

The EPIC project (Engaging People in Cyberinfrastructure)—which builds on a strong foundation of over 10 years of experience among more than 20 research and education organizations around the country—has been aggressively pursuing the goal of "building human capacity by creating awareness of the opportunities afforded through cyberinfrastructure". The panelists, leaders of EPIC, will share their first-hand experiences at creating appropriate and effective interfaces for tools and technology to serve the needs of diverse communities of practice.

Furthermore, the panel will describe the emerging challenges of bridging the gap between developers and users and will reflect on building and sustaining communities of practice in high performance computing and education. They will share strategies for facilitating collaboration, information sharing, broadening diversity, and scaling up effective programs for preparing future generations of scientists, technologists, engineers and mathematicians.

Finally, the panel will welcome a discussion of how to further engage members of the Cyberinfrastructure community, and beyond, in the coming years.

Service Oriented Learning Services on the nanoHUB: Sakai Integration
K. Chung, K. Madhavan, S. Goasguen

The nanoHUB, operated by the NSF-funded Network for Computational Nanotechnology (NCN), is seen as a model science gateway for integrating discovery and learning in cutting-edge ways. One of the key components of the nanoHUB infrastructure is its ability to deliver learning content that is contextualized and tightly coupled with advanced simulation tools. The service that provides this contextualization is called the learning module service—which aggregates diverse content on the nanoHUB and is compliant with global e-learning specifications such as IMS and Shareable Content Object Reference Model (SCORM) being considered by IEEE for standards status. Since the launch of this service, there is a growing demand from the nanoHUB user community to provide assessment services that can measure learning outcomes as part of the learning modules and simulation tools. The deployment of this assessment service is a major middleware integration challenge that leverages the service-oriented architecture of Sakai and the nanoHUB. This work is funded by the NSF NMI effort.

Sakai is a Java-based collaboration and learning environment for higher education, developed as an open source effort. It is an extensible enterprise framework, built on a component-based architecture and a large stack of open source libraries. This paper discusses in depth the process of integrating assessment services that are available as part of the Sakai environment with the nanoHUB. The integration framework begins with SSO across both environments, followed by automatic user registration within the assessment environment, assessment administration, performance tracking, and score reporting. The current state of the system not only allows users to take quizzes and tests as part of the complex nanoHUB learning module service, but also tracks performance metrics across multiple uses.

Future efforts will focus on using this assessment service to move the nanoHUB towards a "neural learning environment", where the system anticipates the needs of the users. Additional features will allow users to create and maintain a large databank of quiz questions that will automatically monitor and customize themselves to the ability level of the users. The assessment service will also help inform the core nanoHUB middleware about the type of customizations that the novice user requires in order to be a successful nanoHUB simulation user. Plans also include providing nanoHUB users with custom tool and educational content recommendations based on their performances on specific assessment items. As several nanoHUB services start becoming ubiquitous and become available on mobile devices and ink-based environments, the assessment services will be extended to capture performance metrics on a continuous basis. This paper also discusses in depth the future services that are being planned.

Computational Science and Engineering Curriculum Concept Map
Moses

The Engaging People in Cyberinfrastructure (EPIC) grant sponsored a Virtual Institute devoted to researching the focus areas that comprise Computational Science and Engineering, with the intention of drawing an outline for CS&E curriculum. This curriculum was organized using a concept map that relates each focus area to the others. A total of 18 focus areas have been identified. In this talk we will report on the outcome of this CSE VI and indicate how future research and curriculum development can utilize the work that has been done.


Science Impact

McStas Neutron Instrument Simulation on the TeraGrid
M. Chen, J. Cobb, G. Granroth, M. Hagen, J. Kohl, S. Miller

The Neutron Science TeraGrid Gateway (NSTG) focuses on creating environments conducive to productive HPC usage by the neutron science community. The Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL) is scheduled for completion in 2006. It will provide up to 24 neutron instruments for users world-wide performing research in neutron science and material science. Monte Carlo simulations have proven to be a powerful tool for instrument design optimizations (design, enhancement, upgrades). However, traditional serial instrument simulations on a PC or workstation cannot perform the full instrument simulation at sufficient statistical levels rapidly enough to be effectively used in instrument design (although this has been done for moderator development). Collaborating with SNS instrument scientists, the NSTG team at ORNL has designed and developed a software facility to parallelize instrument simulations. This facility has been deployed across the TeraGrid and is available to SNS and other neutron scientists. It not only creates the capability of full instrument simulation for design, but also simulation for testing proposal feasibility, pre-experiment planning, on-the-fly data comparison to resolution convolved models, and automated experiment adjustment.

This paper will explore the design goals, software structure, and ease-of-use features of these software facilities. Furthermore, it will describe the realized speed-up seen from MPI parallelized McStas running high resolution design simulations of the SEQUOIA and HYSPEC instruments at SNS. The future work, including using fast simulation for pre-experiment planning, virtual experiments, and comparing models to data in analysis, will also be discussed.

Neutron Science TeraGrid Gateway
J. Cobb, S. Miller, G. Pike, S. Vazhkudai, M. Hagen

The Oak Ridge National Laboratory TeraGrid Resource Provider (RP) is focused on creating bridging services between TeraGrid Cyberinfrastructure and Neutron Scattering facilities in general and specifically the Spallation Neutron Source (SNS), also located at Oak Ridge. SNS (http://www.sns.gov) is the world's largest pulsed neutron source. It creates warm and cold neutron beams (a few eV to a few meV) with unprecedented intensity for basic materials science studies with application to biology, genomics, nanotechnology, semiconductor technology, chemistry, polymer science, crystallography and many other science areas. The NSTG bridging services include high bandwidth file and data transfer services, local bridging storage, a local compute resource, application orchestration, and integration assistance for both TeraGrid cyberinfrastructure and SNS advanced neutron science software environments. The idiom of interaction is via portal and web services. Collectively, these are referred to as the Neutron Science TeraGrid Gateway (NSTG). The NSTG is also part of the initial TeraGrid complement of Science Gateways.

The SNS will eventually commission over 20 instruments. They will create data at rates orders of magnitude larger than previous neutron facilities. SNS's seven-year, $1.4 billion construction phase ended less than a month before the TeraGrid '06 conference, and the facility is now moving into operations. This presentation will discuss NSTG planned services for SNS and neutron science facilities in general as well as early experience with the SNS's advanced software development group and preliminary data from SNS commissioning experience. It will also discuss future plans for support of user operations of an increasing number of beamline user programs.

Remote, real-time visualization of multidimensional biological images
C. Gilpin, L. Katherine, K. Gaither

Modern biological light and electron microscopy can be used to produce multidimensional images of organisms, cells, organelles and molecules. Confocal microscopy is used to collect Z stacks of 3 channel data often as a time series. In electron tomography a series of images are collected over a large range of small increment tilt angles. With appropriate software, raw 2D data are rendered into 3D volumes. In order to extract essential information from the data, 3D volumes need to be displayed, segmented, rendered and freely rotated and zoomed in real-time at full resolution. Most visualization software available to biological imaging laboratories is designed to operate on a single workstation and is limited by processor speed, memory capacity and graphics hardware. Here we present a potential solution by using the TeraGrid to access both remote multi-processor computation and remote visualization capabilities. Our test platform is Paraview, an open source application that can be run on distributed and shared memory. Paraview can be run in "server mode" where rendering is computed remotely and the 3D volume data are transferred over the network to the host workstation for local viewing. This has proved unsatisfactory due to bandwidth limitations. The alternative approach we have taken is to both render and display the volume on a remote system and transfer the screen image via VNC to the local system. In this way, only screen pixels need to be transferred over a network connection. Example data will be shown and file size and rendering complexity possibilities explored.

A Web Service-Enabled Workflow System for Climate Modeling Data Processing in TeraGrid
R. Kalyanam, L. Zhao, T. Park, S. Goasguen

This paper presents the design and implementation of a TeraGrid-based data workflow system at Purdue University. As a TeraGrid Resource Provider, we have developed and deployed a generic data management infrastructure that provides easy access to multidisciplinary data collections via the TeraGrid network. In addition, several application level data servers have been integrated into the system, providing application-specific functionalities based on the characteristics of each individual dataset. Our data workflow system is built on top of this data management infrastructure, allowing researchers to construct scientific workflows for data discovery, access, transformation, and analysis. Our system consists of JOpera, an open-source workflow engine and visual composer, as well as a set of web service-based data and computation modules. Using the climate modeling data from the Community Climate System Model (CCSM), we present an end-to-end climate simulation data analysis workflow that connects our TeraGrid data management infrastructure to computation resources. It allows researchers to easily analyze climate modeling results using the AMWG (Atmosphere Model Working Group) diagnostics package. Our workflow system will serve as an extensible platform for easy integration and provisioning of additional data collections and modeling tools.

The Grid Application Hosting Environment
P. Coveney

RealityGrid, a grid computing collaboration of the UK Engineering and Physical Sciences Research Council (EPSRC), has released version 1.0.0 of the Application Hosting Environment (AHE).

Easily deployable, AHE is designed to allow scientists to quickly and easily run applications on remote grid resources.

The AHE provides scientists with application-specific services to use grid resources in a rapid and transparent manner, with the scientific objective as the main driver of the activity. It provides resource selection, application launching, workflow execution, provenance and data-recovery.

Researchers can take any existing "legacy" application and easily host it inside the AHE for deployment on the U.S. National Science Foundation TeraGrid as well as the UK National Grid Service.

The AHE is a Perl-based lightweight hosting environment for running unmodified applications on a grid. The application services hosted within the AHE are consistent with the Web Services Resource Framework (WSRF) specification and are interoperable with other WSRF-aware clients.

The AHE client, written in Java, is a consumer of the AHE application services and is designed to be sufficiently light-weight so as to be deployable on PDAs and cellular telephones.

Managing Biomolecular Simulations in a Grid Environment with NAMD-G
M. Gower, J. Cohen, J. Phillips, R. Kufrin, K. Schulten

We describe our experiences designing and deploying NAMD-G, an infrastructure for executing biomolecular simulations using the molecular dynamics code NAMD within the context of a Computational Grid. We motivate this effort through a general outline of the tasks involved in conducting research of this class as traditionally undertaken and follow with a description of the enhancements we perceive to be offered by current developments in Grid technologies. We then describe the specifics of the initial implementation of NAMD-G and provide an example of the use of the system in real-world scientific investigations simulating gas permeation in proteins. We conclude with potential directions for future development of the NAMD-G system.

A Database for Biomolecular Simulations: Challenges in Computing and Storage Resource Management
B. Connelly, J. Sowell, L. Xiao, M. Feig

Molecular dynamics simulations are a widely used computational tool for the study of dynamics in biological macromolecules. Such simulations commonly produce trajectories of atomic coordinates over nanosecond to microsecond simulation time scales with typical data sets ranging from GB to TB. The organization of such data into a publicly accessible database opens up many new research opportunities, in particular the possibility of carrying out comparative analysis between multiple trajectories, and would greatly facilitate collaborative research centered around biomolecular simulations. Furthermore, a database of biomolecular simulations with a suitable interface offers new educational perspectives for the exploration of biomolecular dynamics based on computer simulations.

The design and implementation of a suitable infrastructure for a biomolecular simulation database is described that addresses the particular challenges in providing public access to such large data sets within the restraints of current network infrastructure and user computing resources. An architecture is proposed where access to the simulation data is provided mainly through automated on-demand analysis so that only the analysis results have to be transmitted to the database user. Such a system depends on a coordination of extensive storage and computational resources in a distributed computing environment. First experiences with an initial implementation of the essential components of such a system in the context of the public SimDB database are presented.

Enabling "Fat Queries" on a TeraGrid-powered Systems Biology Data Warehouse
J. Chen

Systems biology is an emerging interdisciplinary scientific research area, which aims to study molecular networks by mining genomics, functional genomics, proteomics, and protein interactomics data collectively known as the "Omics" data. The understanding of intricate interplays between environmental stimuli and genetic predisposition in systems biology, particularly in human disease studies, can help pharmaceutical scientists design drugs with high therapeutic benefits and low toxicological effects, and help biotechnologists develop molecular biomarkers that can monitor the onset and progressions of diseases.

Supported by Indiana University (IU), we are developing a systems biology data warehouse on the IU TeraGrid. We use Oracle10g DBMS as a platform to manage more than 150GB of systems biology data integrated from more than 30 different public and local sources. Examples of these data include genome annotation, gene ontology, protein structures, gene/protein expressions, molecular pathways, transcription factors, protein interaction networks, SNP, and PubMed literature for human and yeast. We use a variety of integration techniques (standard ETL, text wrappers, IBM DB2 data mediators, and semantic webs) to collect data.

The data warehouse is the hub of all our scientific discovery research efforts in systems biology. Using this platform, we can perform queries across many Omics data sets: "fat queries" that no public information retrieval or database system can answer today. Examples of "fat queries" are "Does gene co-expression tend to predict protein interaction?" and "What are the significant functional cross-talks among differentially expressed proteins from Microarray or proteomics experiments?" A case study will be provided.


Technology

Middleware Integration and Deployment: the nanoHUB
S. Goasguen, R. Kennell, R. Figueiredo, S. Adabala, A. Roy, J. Frey, . Ruth, D. Xu

Virtual Organizations (VO) need an infrastructure to conduct their research activities, share their resources and knowledge, and ultimately achieve their research goals. Such an infrastructure calls for the integration and deployment of the latest middleware technologies and information technologies to deliver various services to the community. The nanoHUB is such a VO that delivers services to the computational nanotechnology community. Over the last years the nanoHUB community has grown to several thousand users. The key service of the nanoHUB is the on-line simulation, where application interfaces are embedded directly in the user's web browser. This service is delivered to a wide range of users such as students in their classroom and experimentalists who want to check theoretical models. Furthermore, the TeraGrid is now accessible through the nanoHUB thanks to the middleware efforts of the nanoHUB and the TeraGrid science gateway program. This connection will allow more researchers to use the nanoHUB to access significant compute power and tackle grand challenges in nanotechnology while using user friendly interfaces.

In this paper we will present the internals of the nanoHUB middleware. We will show how virtualization technologies are used to decouple the VO infrastructure from the physical infrastructure (In-VIGO) and how virtual networking (VIOLIN) and virtual machine migration can be used to provision resources on demand. Additionally, we will show how Condor-G and Condor-C are used from within that virtual infrastructure to access Globus-enabled TeraGrid resources. The architecture presented creates a possible science gateway solution that offers the benefits of interoperability with multiple peer grids and demonstrates recent advances in autonomic grid computing.

Prospects for Instrument Integration with TeraGrid Resources
G. Pike, J. Cobb, J. Rome

This paper presents a partial survey of efforts to integrate instruments and facilities into distributed computing environments. We examine several current instrument integration projects to determine common principles for classification based on instrument size, complexity, and integration technology. Using these classification principles, we evaluate the types of projects that are well suited to integration in a grid-enabled environment. We further assess which types of projects are well suited for deployment on the TeraGrid in terms of integration with and benefit from common software stacks, software environment homogeneity across computational resources, data storage and transport tools, data collection integration, and other tools offered in the TeraGrid environment.

Network, Hardware, & Applications: An Integrated Approach to E2E Performance and Diagnostics
C. Rapier, K. Benninger

The very high bandwidth of TeraGrid has reduced the role of the long-haul network component as the primary bottleneck in end-to-end (E2E) performance. Instead, the source of the bottleneck is now commonly found on or very near the end systems. We therefore describe common problems that can significantly affect performance, and relate tools and research developed at PSC that can be used to address these problems. We will demonstrate how PSC's Linux-based disk caching, the NPAD diagnostic tool, and efficient system and application buffering can provide the best E2E user experience.

Managing Credentials on the TeraGrid with MyProxy
J. Basney

MyProxy provides authentication and credential management services for TeraGrid. We describe recent developments with the MyProxy service relevant to the TeraGrid community.

The TeraGrid MyProxy service is now integrated with the TERAGRID.ORG Kerberos authentication service, allowing TeraGrid users to log in with their TeraGrid-wide username and password to obtain credentials for single sign-on authentication to TeraGrid resources.

The new MyProxy Certificate Authority (CA) capability allows TeraGrid users to retrieve short-lived certificates directly from the TeraGrid MyProxy server using their TeraGrid-wide username and password, without needing any previous certificate request or configuration. The TeraGrid MyProxy CA is integrated with the NCSA CA for acceptance across all TeraGrid sites.

MyProxy provides the authentication and credential retrieval capabilities for the TeraGrid User Portal. TeraGrid users can load credentials into their portal session for authenticated access to TeraGrid resources. Portal users can obtain new credentials from the MyProxy CA or can store their existing credentials in the MyProxy repository.

New MyProxy features ease integration with grid portals. The MyProxy server can now trust portals to authenticate users, without requiring an additional MyProxy authentication step. Additionally, MyProxy now supports authentication via the Pubcookie web single sign-on system. MyProxy integration with the Shibboleth web single sign-on system is underway.

The MyProxy service is designed to provide convenient access to credentials across the grid while maintaining the security of private keys by allowing users to obtain short-lived credentials as needed. We have been improving the MyProxy service to meet the needs of the TeraGrid community.

Adaptive MPI on the TeraGrid
R. Reddy, D.C. O'Neal

Adaptive MPI (AMPI) is an extension of the Charm++ library developed by the Parallel Programming Laboratory at the University of Illinois.

AMPI couples processor virtualization with an intelligent runtime system to effect communication optimizations and load balancing. As these are important issues for developers of distributed applications, AMPI's potential to deal with things like network latency within the context of the TeraGrid deserves evaluation.

To this end, a representative stencil-based application was selected to demonstrate AMPI's abilities to overcome obstacles presented by the TeraGrid. Our paper describes the use of Charm++ and AMPI at multiple TeraGrid sites, and reports significant results.

Infrastructure for Adaptive Scientific Applications
A. Adiga, A. Purkayastha

Scientific users are often faced with several choices when selecting algorithms and computational kernels for optimal execution of their application on a compute resource. This problem is further magnified in environments such as the TeraGrid, where users have the option of running their applications on a diverse set of computing resources with a variety of architectures such as IA-64, Power, and Alpha. Determining the optimal choices can often be an expensive proposition, and users typically hand-tune their applications for each target architecture and subsequently have to maintain several versions of their application code. We demonstrate an approach for tackling these issues in large-scale scientific applications, using a relational database to store performance data and providing an interface for the application to obtain and use this data at execution time to select optimal computational kernels. We have prototyped a framework for collecting profile and execution data representing computational kernel performance and storing this data in a relational database on a dedicated, separate archival system. The utility of this framework is demonstrated with a serial adaptive scientific application, which uses performance data to choose between computational kernel options at runtime and so perform a single, optimal execution of the application.
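The runtime selection step described above can be illustrated with a minimal sketch. It assumes a hypothetical table named kernel_runs with columns operation, architecture, kernel, and seconds; neither the schema nor the kernel names come from the paper, and SQLite stands in for whatever relational database the authors used.

```python
import sqlite3


def best_kernel(conn, operation, architecture):
    """Return the kernel with the lowest mean recorded runtime, or None.

    The kernel_runs table and its columns are hypothetical, chosen only
    to illustrate looking up stored performance data at execution time.
    """
    row = conn.execute(
        """
        SELECT kernel, AVG(seconds) AS mean_seconds
        FROM kernel_runs
        WHERE operation = ? AND architecture = ?
        GROUP BY kernel
        ORDER BY mean_seconds ASC
        LIMIT 1
        """,
        (operation, architecture),
    ).fetchone()
    return row[0] if row else None


# Populate an in-memory database with made-up timings for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kernel_runs "
    "(operation TEXT, architecture TEXT, kernel TEXT, seconds REAL)"
)
conn.executemany(
    "INSERT INTO kernel_runs VALUES (?, ?, ?, ?)",
    [
        ("matmul", "IA-64", "blocked", 1.8),
        ("matmul", "IA-64", "blocked", 2.0),
        ("matmul", "IA-64", "naive", 5.1),
        ("matmul", "Power", "naive", 2.2),
        ("matmul", "Power", "blocked", 2.9),
    ],
)
conn.commit()
```

An application would call best_kernel once at startup for each operation and dispatch to the returned kernel, falling back to a default when no data has been recorded for the current architecture.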

Parallel IO Performance Studies and SRB integration with HDF5
A. Cheng, M. Yang, Q. Koziol, . Cao, M. Wan

Fast partial access to objects from very large files in the SDSC Storage Resource Broker (SRB) can be extremely challenging, even when those objects are small. The HDF-SRB model integrates the SRB and the NCSA Hierarchical Data Format (HDF5) to create an access mechanism within the SRB that is more efficient than current methods for accessing object-based file formats. This model integrates two successful technologies, the SDSC SRB and the NCSA HDF, to create a new, more sophisticated distributed data service. The SRB serves as standard middleware to transfer data between the server and client. HDF5 provides interactive and efficient access to datasets or subsets of datasets in large files without bringing entire files onto local machines. A new set of data structures and APIs has been implemented to support such object-level data access. A working prototype of the HDF5-SRB data system has been developed and tested.

HDF5 is a widely used high-performance scientific data IO package that can support parallel IO through MPI-IO. MPI-IO provides a collective IO option to help improve IO performance. HDF5 has supported collective IO through MPI-IO with contiguous array storage since its first release in 1999. Only recently has HDF5 supported collective IO with chunked storage.

However, no comprehensive performance studies have been performed to evaluate the efficiency of collective IO in applications and there are no detailed guidelines to help applications efficiently use different IO and storage options inside HDF5. Some HDF5 applications either avoid using collective IO or misuse collective IO, especially with chunked storage. This part of the presentation will address the above issues as follows:

First, we will present performance comparison results by using the FLASH-IO benchmark with HDF5 and parallel netCDF to illustrate the effectiveness of using collective IO inside HDF5. These results will show that HDF5 and parallel netCDF performance is very similar when collective IO is used in both.

Second, we will present several performance comparisons using different storage and MPI-IO options in HDF5. We hope this will provide some guidelines for applications to do parallel IO efficiently with HDF5.

Third, we will present the internal software management to support collective IO with chunked storage.

Fourth, we will share our experiences in handling the difficulties of utilizing MPI-IO packages on several high performance computing systems.
A Comparative Analysis of Grid Portal Security
D. Del Vecchio, V. Hazelwood, M. Humphrey

Grid portals have recently emerged as a popular paradigm for creating customizable, web-based interfaces to Grid services and resources. Due to the powerful, general-purpose nature of Grid technology, the security of any portal or entry point to such resources cannot be taken lightly. To understand and assess the current state of Grid portal security, we undertake a comparative analysis of the three most popular Grid portal frameworks that are being pursued as frontends to the TeraGrid: GridSphere, OGCE, and Clarens. We explore some general challenges that web applications face in the areas of authentication, authorization, auditing (logging), and session management, then contrast how the different Grid portal implementations address these challenges. We find that although most Grid portals do devote some energy to security concerns, there is still room for improvement, particularly in the areas of secure default configurations and comprehensive logging and auditing support. Our comparative analysis motivates a set of best-practice recommendations for designing, implementing, and configuring secure Grid portals.

International Grid Trust Federation/The Americas Grid Policy Management Authority
J. Marsteller, D. Quesnel

Over the past few years the TeraGrid has grown to include new partners and collaborators. As these organizations join the TeraGrid project, they bring diverse credentials that must be reviewed for accreditation on an individual basis. To streamline Certificate Authority (CA) accreditation and further promote the TAGPMA charter, the TeraGrid is engaging The Americas Grid Policy Management Authority (TAGPMA) to act as an accreditation entity for CAs in the Americas seeking interoperability with TeraGrid resources.

This session will provide background on the TAGPMA and its larger governing organization, the International Grid Trust Federation (IGTF). We will give an overview of the IGTF structure, the regional PMAs, and IGTF goals. The goal of the IGTF is to foster harmonization and synchronization of the various PMAs' policies so that a global trust relationship can be established.

Restricted Community Accounts
K. Price

Community accounts represent a means by which scientific research gateways can allow a dynamic user base to submit jobs on computational resources without requiring each end-user to have an individual account on the resource. However, due to their shared nature these accounts are a potential weak point in the security of both the resource and the end-user's data. The goal of this project is to explore ways to ameliorate these problems by providing a framework that a resource provider can use to restrict accounts. Specifically, the focus is on solutions that are usable on a wide variety of POSIX-compliant systems without any major modifications. Our current thrust is towards utilizing the chroot functionality included in the POSIX standard to sandbox community accounts. We have in alpha release two products geared towards this aim: (1) "chroot_jail" is a utility suite that facilitates the construction, maintenance, and regular security auditing of chroot jails; and (2) "commsh" is a shell that can verify that a jail environment is secure, invoke chroot, and check the requested command against a list of allowable commands.
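The command check attributed to "commsh" above can be sketched as follows. This is not the actual commsh implementation: the allowlist contents, function names, and jail path handling are hypothetical, and the sketch assumes commands are named by absolute path.

```python
import os
import shlex

# Hypothetical allowlist; a real deployment would load this from a
# configuration file maintained by the resource provider.
ALLOWED_COMMANDS = {"/bin/ls", "/usr/bin/qsub", "/usr/bin/qstat"}


def is_allowed(command_line, allowed=ALLOWED_COMMANDS):
    """Return True only if the program named first is on the allowlist."""
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes or similar: refuse rather than guess.
        return False
    if not argv:
        return False
    # Require an exact absolute path so PATH lookups cannot be abused.
    return argv[0] in allowed


def run_restricted(command_line, jail_dir):
    """Enter the jail and replace this process with the requested command.

    os.chroot requires root privileges; a production shell would also
    drop privileges after entering the jail, which this sketch omits.
    """
    if not is_allowed(command_line):
        raise PermissionError("command not permitted: %r" % command_line)
    argv = shlex.split(command_line)
    os.chroot(jail_dir)
    os.chdir("/")
    os.execv(argv[0], argv)
```

Because the command is never passed to a shell, a request such as "/bin/ls; rm -rf /" is parsed as a program literally named "/bin/ls;" and rejected.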


Thursday, June 15 Abstracts


Education and Outreach

Alice: Making Learning to Program Easy and Fun
C. Kelleher

According to the Higher Education Research Association, the number of students interested in majoring in computer science has dropped by nearly 60% over the last five years. Yet computer science currently enables progress across a wide variety of disciplines, from basic science to medicine and education. With such broad impact, it is critical that computer science attract the best and brightest minds. In this talk, I will introduce Alice, a programming environment designed to introduce students to the basics of computer programming that shows promise in rekindling student interest in computer science. Alice allows students to create programs that control the motions of 3D objects in virtual worlds via drag and drop. At the college level, usage of Alice helps improve the grades and retention of beginning computer science students. Further, Alice shows promise in attracting more women to study computer science. Research shows that girls often decide for or against pursuing math and science related disciplines by the end of middle school. To reach middle school girls, we created Storytelling Alice, a version of Alice that introduces girls to programming as a means to the end of creating animated 3D movies, similar in style to those created by Pixar and Dreamworks. Results of a study comparing girls' experiences and behavior using the basic and Storytelling versions of Alice demonstrated that girls using the Storytelling version spend more of their time programming and are more motivated to work on their projects.

Virtual environments for social networking
S. Tettagah

Massively multiplayer online simulation technologies are becoming a valuable tool for educators to evaluate behaviors that were once studied in real-life environments. The synthetic world of Second Life was used as the platform for this study. Second Life is a 3D synthetic world built by its inhabitants. Prior research documents that virtual environments (VEs) are most useful when they are believable to the user. The environment should allow individuals to immerse themselves in an experience that is both functional and easy to relate to. Second Life was developed by its members to depict various representations of real-life events. This research investigates human relations and social presence within the synthetic environment of Second Life. We investigated the social interactions and social presence of 24 class members who had to work on collaborative teams. It is important to learn how these synthetic worlds can be used to investigate social presence, perception, and other aspects of human cognition and behavior. With the growth of simulations and synthetic worlds, these environments may help researchers examine influences on human functioning in ways that have little effect on the lives of the participants. Very few studies have looked at how people react in these environments and how they perceive themselves within them.

Building Virtual Clusters for Running Caltech CMS Private Production on a Widely Distributed Supercomputing Cyberinfrastructure
V. Litvine, E. Walker, J. Gardner

This paper describes our research in designing, developing, and deploying a virtual computing environment to support the submission of up to a million compute-intensive serial jobs to the network-connected compute clusters on the NSF TeraGrid, one of the world's largest distributed cyberinfrastructures for open scientific research. The system implements a scalable, persistent, and robust distributed agent infrastructure that automatically submits and manages job proxies across a widely distributed system. These job proxies contribute resources to virtual clusters created for users on a per-experiment basis, or to physical departmental clusters to augment local scientific computation needs. The specific version of the system described in this paper allows users to build very large virtual Condor pools using the widely distributed resources on the TeraGrid. Up to 100,000 jobs have been submitted through the system to date, enabling approximately 900 teraflops of real scientific computation.

Ensuring Quality Resources for Digital Libraries
P. Jacobs

The Computational Science Education Reference Desk (CSERD), a Pathways project of the National Science, Technology, Engineering and Mathematics Digital Library (NSDL), is ensuring quality by applying verification, validation and accreditation (VV&A) processes to its collection of learning objects for computational science and engineering (CSE).

Along with the growth of the Internet, there has been a growth of interactive materials created by math and science professionals, students, and interested citizens. Much of the material on the Internet has no guarantee of quality, where quality is defined here as not just scientific accuracy but also a well-documented architecture, software that runs, the inclusion of educational materials, and accurate documentation of the intended audience.

CSERD, in response to the growing consensus that the education community needs, wants, directly benefits from and will contribute to the sustainability of a digital library with quality resources, is applying the VV&A review processes to improve the quality of its resources.

The review process for the CSERD resources is divided into the following three stages:

  1. Verification that the simulation is right with respect to its model and that the software will run on all computer systems as stated without failures.
  2. Validation that the science is valid and that the resources are based on current scientific principles and methods.
  3. Accreditation that the resources are appropriate for a given audience and correlated to other standards in the model's content area.
The CI Channel—Capturing Science in Action
K. Walsh, A. Bailey

The Cyberinfrastructure Channel (CI Channel) is a team within the Cyberinfrastructure Outreach (CIO) group that includes staff from Educational Outreach and Enterprise Network Services at the San Diego Supercomputer Center (SDSC). The objective of our presentation is to describe and demonstrate how the CI Channel has created a scalable streaming media resource that allows scientific research teams and user services staff to broadcast project presentations, train users on how to effectively use HPC resources, and share their science with the educational community.

We will also describe the CI Channel Affiliates Program, a scalable infrastructure for centralized streaming and cataloged, searchable archives of distributed scientific project team meetings, workshops, and seminars. The same infrastructure can support project teams in the humanities and social sciences as well.

In short, we will describe the current reality of our original vision, which is to proactively solicit and support the education, training, and outreach needs of scientific researchers within the cyberinfrastructure community.

The vision of the CI Channel is to design, assemble, and create multimedia content that meets the training and outreach needs of the various scientific research communities within CI, and to deliver this content in high-quality, multimedia-rich formats. This content is intended for use within those communities as well as by external groups such as K-20, and is available 24 hours a day through a web-based streaming media service of regularly scheduled multicasts and video-on-demand programs.

Security Challenges of the TeraGrid
A. Singer, J. Barlow, B. Link

The TeraGrid consists of sites in different security domains, each with their own policies and procedures. The TeraGrid has introduced some security interdependencies that had to be resolved between these sites, and history has already shown that the security at one TeraGrid site can have an impact on the TeraGrid as a whole.

The TeraGrid Security Working Group has worked together to respond to security incidents and to identify and resolve security issues with the TeraGrid. However, the security of TeraGrid sites does not depend on the working group alone, but also on the work of the system administrators integrating systems and the software developers providing grid technologies.

This talk will present the TeraGrid security policies and procedures that have been developed, discuss ongoing security issues with the TeraGrid, and provide suggestions for system administrators and developers on how their work can improve the security posture of the TeraGrid as a whole and assist the work of the Security Working Group.


Science Impact

Exploring Large-scale Scientific Applications on the TeraGrid
Z. Lan, Y. Li, J. Lee

In collaboration with domain scientists, we have been exploring two large-scale scientific applications on the TeraGrid over the past few years. One is a cosmology application called ENZO, which is a community code designed for high-resolution, multiphysics, cosmological structure formation simulations using SAMR (Structured Adaptive Mesh Refinement) modeling techniques. The other is a three-dimensional beam dynamics simulation code called Synergia, which is a state-of-the-art beam dynamics modeling code for both linear and circular accelerators with a fully three-dimensional treatment of space charge. Our research aims at improving the performance and efficiency of these large-scale simulations on the TeraGrid through research in two areas: dynamic load balancing and adaptive parameterization. The objective of dynamic load balancing is to ensure that each processor has an appropriate amount of workload with regard to its capacity so as to minimize the overall parallel execution time, while the goal of adaptive parameterization is to dynamically tune data among different physical parameterizations so as to reduce the overall simulation time.

In this paper, we first analyze performance and scalability of both applications on the TeraGrid, and then describe optimization techniques (e.g. dynamic load balancing) that we proposed to resolve potential performance bottlenecks. Finally, we propose a distributed load balancing framework DistDLB to improve the efficiency of large-scale simulations on distributed computing environments. Our preliminary results with cosmology simulations indicate that by considering the heterogeneous and dynamic features of distributed systems, the proposed DistDLB can effectively improve the performance of distributed simulations by 2.56%-79.14%.
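The load balancing objective stated above, giving each processor work in proportion to its capacity, can be illustrated with a small sketch. This is not the DistDLB framework: it is a generic proportional split with largest-remainder rounding, and the function names and capacity values are hypothetical.

```python
def distribute(total_work, capacities):
    """Split integer work units in proportion to processor capacity.

    Uses largest-remainder rounding so the shares sum to total_work.
    """
    if not capacities or any(c <= 0 for c in capacities):
        raise ValueError("capacities must be a non-empty list of positive numbers")
    total_capacity = sum(capacities)
    exact = [total_work * c / total_capacity for c in capacities]
    shares = [int(x) for x in exact]
    leftover = total_work - sum(shares)
    # Hand the remaining units to the processors with the largest
    # fractional parts of their exact share.
    by_remainder = sorted(
        range(len(capacities)),
        key=lambda i: exact[i] - shares[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


def estimated_time(shares, capacities):
    """Parallel time is set by the slowest processor to finish."""
    return max(s / c for s, c in zip(shares, capacities))
```

With capacities [1, 1, 2], for example, 100 work units are split so that the processor with twice the capacity receives twice the work, and all three finish at the same estimated time. A dynamic scheme would repeat this calculation as measured capacities change during the run.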

Massively parallel atomistic simulations of thermal properties of silicon on the TeraGrid
L. Sun, C. Le, F. Saied, D. McWilliams, J. Murthy

A massively parallel program developed for molecular dynamics simulations computes the thermal conductivity of silicon. The physics model involves only short-range forces: each atom interacts only with its nearest neighbors. As a result, no global communication is required in the parallel implementation. The program shows excellent scalability on a variety of TeraGrid architectures from NCSA, PSC, SDSC, and RCAC. It scales up to over 1000 processors. This scalability was achieved by minimizing communication overhead and maintaining load balance across processors. File input and output is performed infrequently and is not a significant factor. A task that takes 30 days on a single processor can now be completed with the parallel program in 2.5 and 5 hours on 1024 Power4+ processors and 1728 BlueGene PowerPC processors, respectively.

Brokering Metaworkflows
S. Hampton, A. Rossi, J. Alameda, S. Parker, G. Daues, B. Jewett, R. Wilhelmson

We have developed a service-oriented architecture (SOA) to manage large numbers of scientific workflows placed on production resources such as those in the TeraGrid consortium. This SOA, which can be characterized by its ability to broker metaworkflows, has a number of key components. First, the user interface, Siege, is the control panel by which the user launches and monitors complex workflow scenarios on grid resources. Siege interacts with the Troll ensemble broker and execution service stack, which is informed by the Vizier information services, which provide many of the details necessary to resolve the execution of complex scientific applications on remote services. The Troll ensemble broker handles a high-level workflow above the grid resources, while workflow local to a particular cluster is handled by the Elf application container, typically running a local workflow language, ogrescript. Elf/ogrescript can handle many of the common workflow patterns normally needed on local computational resources, and uses the NCSA Trebuchet libraries to manage file movement needed for successful job execution. We will describe the SOA, as well as some early application experience with the SOA within projects such as the NSF-funded LEAD ITR project.

Managing Storm Simulation Workflows using LEAD Gateway
S. Marru, M. Christie

An atmospheric scientist who wants to predict a storm using a secured grid system has to deal with security mechanisms that require learning various computational skills which are otherwise irrelevant to their science goals. Constructing, configuring, scheduling, executing, and monitoring a storm simulation workflow on a grid system involves multiple interactions between resource, execution, and monitoring services, among others. The Linked Environments for Atmospheric Discovery (LEAD - http://leadproject.org) gateway, which is a large NSF-funded ITR project, is building cyberinfrastructure to enable scientists to predict mesoscale weather events like tornadoes. To accomplish these goals, LEAD is developing an adaptive, on-demand grid infrastructure that responds to complex weather-driven events. The LEAD Gateway will incorporate the next generation of TeraGrid services and schedulers with a focus on using on-demand compute resources for ensemble weather simulations and TeraGrid storage resources for data mining tasks and services. This paper will focus on components of the LEAD Service Oriented Architecture which will allow users to compose a storm simulation workflow by connecting Fortran applications wrapped as web services. The paper also discusses how a user can launch the composed workflow on TeraGrid resources and monitor its progress. Methods to visualize the storm simulation workflow output are presented. Interfaces dealing with metadata of simulations, observational data, and forecasted data outputs are discussed. The paper also presents information on various other LEAD Gateway capabilities, including an authorization, authentication, and auditing framework which will allow different classes of users to seamlessly access and use TeraGrid resources.


Technology

Bouncer: A Globus Job Forwarder
C. Baumbauer, S. Goasguen, S. Martin

Bouncer is a Globus Resource Allocation Manager (GRAM) jobmanager designed to assist with the interoperability of different grids by acting as a resource broker between various GRAM gatekeepers. Bouncer can also act as a central resource scheduler or broker that redirects jobs to various gatekeepers based on information collected from a known information service such as the Monitoring and Discovery System (MDS). Challenges encountered during the development and running of Bouncer include the passing of environment variables between gatekeepers, the Resource Specification Language (RSL) used to describe the job, the redirection of output, and keeping everything secure. With Bouncer we were able to redirect jobs and obtain results, but fully utilizing it required familiarity with how both the grid running Bouncer and the destination grid were set up. With this added functionality, new policies will be needed to govern how Bouncer obtains information about all of the resources it can redirect to, how users obtain information about what is available on each grid, and how each site tracks usage, both for jobs local to the grid and for jobs bounced to other grid domains.
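The brokering decision described above, choosing a destination gatekeeper from information service data, might look like the following sketch. It is not Bouncer's actual logic: the record fields, the selection rule (most free CPUs), and the contact strings are all hypothetical.

```python
def pick_gatekeeper(resources, cpus_needed):
    """Return the contact string of a gatekeeper that can run the job.

    resources is a list of dicts with hypothetical keys "contact" and
    "free_cpus", standing in for data gathered from an information
    service. Among gatekeepers with enough free CPUs, the one with the
    most free CPUs is chosen; None means no gatekeeper qualifies.
    """
    candidates = [r for r in resources if r["free_cpus"] >= cpus_needed]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["free_cpus"])["contact"]


# Made-up snapshot of three gatekeepers on different grids.
snapshot = [
    {"contact": "gk.site-a.example.org/jobmanager-pbs", "free_cpus": 16},
    {"contact": "gk.site-b.example.org/jobmanager-pbs", "free_cpus": 128},
    {"contact": "gk.site-c.example.org/jobmanager-condor", "free_cpus": 64},
]
```

A real broker would also need to weigh the policy questions raised in the abstract, such as per-site usage tracking, before forwarding a job.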

Desktop E-Science: Enabling Access to the TeraGrid from the Windows Platform
G. Wasson, S. Eswaran, M. Humphrey

The TeraGrid is one of the largest science grid deployments in the US and must support and interoperate with collections of scientists using diverse technologies. Increasingly, scientists are using the Windows platform to store their data because Windows machines are readily available (e.g. there is often one on the scientist's desk) and they run nearly ubiquitous applications for certain types of data analysis (e.g. SQLServer or Excel). These scientists would benefit from better client-side support to allow their applications (and the Windows platform in general) to leverage the computational power of the TeraGrid.

We have developed a set of .NET-based client tools that implement the GRAM and GridFTP protocols for remote execution and data movement on the TeraGrid. These tools are implemented entirely in managed (C#) code making calls only to native Windows OS libraries. This paper describes these tools, their implementation and their use, as well as the supporting security infrastructure. We also show how these tools can be used by Windows-based workflow engines to perform complex scientific experiments that span both native Windows machines and the TeraGrid.

Authentication Framework in the Kepler Workflow System
Z. Guan, I. Altintas, E. Jaeger, M. Jones, N. Mangal, J. Tao, M. Miller

Advances in Grid computing technology and cyberinfrastructure have enabled scientists to explore research issues in a variety of disciplines at scales both finer and greater than ever before. The availability of efficient data collection and analysis tools presents researchers with vast opportunities to process heterogeneous data within a distributed environment. To support the opportunities enabled by available massive computation, the Kepler project provides scientists with an open-source scientific workflow system to manage both data and programs, and to design reusable procedures for scientific experimental tasks. With more workflows executing on multiple secured computing resources, Kepler users need a mechanism to coordinate between different security infrastructures. System-level support was needed to generate and manage the proxy certificates of different resources for the user. The Kepler authentication framework is such a system component: it not only helps users log in to various computing resources and store their corresponding authentications, but also provides workflows with a unified interface to access remote resources with the support of the Grid Security Infrastructure (GSI). This paper describes the motivation, design, and implementation of the Kepler authentication framework. The usage of this framework is also demonstrated with several real-world workflows.

An Overlay File-System for Distributed Computing on the TeraGrid
E. Walker, E. Turner

A unified file name space provides users on the TeraGrid a transparent and consistent computing environment for running jobs across the distributed sites. Such a name space also eases the development and porting of applications across the heterogeneous resources available on the TeraGrid. This paper will introduce the concept of overlay file-systems: a user-space technique for enabling this unified name space across distributed compute sites. It will also describe tools that are using this concept to enhance the productivity of users on the TeraGrid. Overlay file-systems allow file-system services that are not available on the physical file-system to be dynamically provisioned on a per-application or per-user basis. This paper will show examples of this concept in the distributed file-system support for clusters created by GridShell/MyCluster, and in a new remote login facility called USSH. The GridShell/MyCluster tool, already deployed in production on TeraGrid, will allow jobs running in its environment to access remote files locally using an overlay file-system. Also, the new remote login facility, USSH, will provide union file system semantics through an overlay, allowing users to import entire file system partitions across sites on the TeraGrid. Users can then compile and debug their applications across sites transparently, easing the burden of maintaining separate code repositories at each site. The paper concludes with performance data from a set of micro-benchmarks and real use-case scenarios.
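The union semantics mentioned above can be illustrated with a minimal lookup sketch. It shows only the name-resolution rule (earlier layers shadow later ones) over ordinary directories; it does not intercept file-system calls as a real overlay would, and the class and method names are hypothetical rather than taken from GridShell/MyCluster or USSH.

```python
import os
import tempfile


class OverlayView:
    """Read-only union view over several directory trees ("layers")."""

    def __init__(self, layers):
        # Layers are searched in order, so earlier layers take priority.
        self.layers = list(layers)

    def resolve(self, relative_path):
        """Return the real path of the first layer containing the name."""
        for layer in self.layers:
            candidate = os.path.join(layer, relative_path)
            if os.path.exists(candidate):
                return candidate
        return None

    def listdir(self, relative_dir=""):
        """Return the merged, de-duplicated listing across all layers."""
        names = set()
        for layer in self.layers:
            directory = os.path.join(layer, relative_dir)
            if os.path.isdir(directory):
                names.update(os.listdir(directory))
        return sorted(names)
```

In this picture, a local scratch directory would be the first layer and an imported remote partition a later one, so a locally modified source file shadows the remote copy while unmodified files still resolve to the remote site.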

NCSA Trebuchet: A Powerful File Management Interface for TeraGrid
A. Rossi, S. Hampton, E. Wu, D. Adams, J. Alameda

NCSA Trebuchet represents a breakthrough in file management for grid environments such as TeraGrid, both in ease of use and in its capability for advanced file management. Packaged in three different forms (a desktop graphical user interface, an emerging command-line tool, and a high-level Java library), Trebuchet provides some compelling features, such as:

We are working to provide the first public release of Trebuchet, in support of users' needs throughout TeraGrid, through a working group from NCSA's Cyberenvironments and Technologies Directorate and Persistent Infrastructure Directorate dedicated to improving Trebuchet. The group's work has already improved the product, especially by bringing experts in TeraGrid data movement optimizations together with the Trebuchet developers. With high-capability desktop tools such as Trebuchet, we hope to bring the user's desktop computing environment closer to the high-end computing and data environment provided by TeraGrid.

Implementing GPFS-WAN on the TeraGrid
C. Jordan, J. White, P. Kovatch

With fungible allocations spanning multiple sites, TeraGrid (TG) scientists want to be able to run jobs and save their data at multiple sites. Sharing these terabyte-sized data sets across the geographically distributed resources of the TeraGrid posed certain challenges. The General Parallel File System (GPFS) was deployed and mounted at several TG sites, making the same files available at each of these sites. This file system, called GPFS-WAN, was built with 500 TB of disk.

IBM's General Parallel File System (GPFS) was initially developed as a parallel file system for local clusters. The multi-cluster feature of GPFS allows a cluster to participate in both local and remote GPFS clusters.

The use of existing Grid technologies to enable UID-mapping between administrative domains is described in detail, along with the relationship of this work to evolving Grid standards. The experience of pilot projects such as the National Virtual Observatory (NVO) and the Biomedical Informatics Research Network (BIRN), and the production workflows enabled by the global file system, are also related.

It was possible to export GPFS-WAN over TeraGrid and achieve reasonable performance for two reasons: the TeraGrid is a dedicated high-performance network with a limited number of participating sites, and the disk and wide-area network latencies mask each other. GPFS-WAN has been in production on TeraGrid since October 2005. It is mounted on over 1500 nodes on a variety of Linux and AIX platforms.
