ECSS Symposium

The Extended Collaborative Support Services (ECSS) program provides expert assistance in a wide range of
cyberinfrastructure technologies. Any user may request this assistance through the XSEDE allocation process.

The primary goal of this monthly symposium is to allow the over 70 staff members working in ECSS to exchange information about successful techniques used to address challenging science problems. Tutorials on new technologies may also be featured. Two 30-minute, technically-focused talks will be presented each month and will include a brief question and answer period. This series is open to all.

Symposium coordinates:

Third Tuesday of the month, pending scheduling conflicts. Upcoming events will be listed on the website and announcements posted to the News Training category.
1pm Eastern/12pm Central/10am Pacific

These sessions will be recorded. For this large webinar, only the presenters and host will be broadcasting audio. Attendees may submit questions to the presenters through a moderator by sending a chat message.

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/350667546
 
Or iPhone one-tap (US Toll):  +14086380968,350667546# or +16465588656,350667546#
 
Or Telephone:
    Dial: +1 408 638 0968 (US Toll) or +1 646 558 8656 (US Toll)
    Meeting ID: 350 667 546
 

Videos and slides from past presentations are available below.
(Videos not listed below, from prior years, can be found here: 2012, 2013, 2014, 2015)

June 18, 2019

HPC+Jupyter for Computational Chemistry

Presenter(s): Albert Lu (TACC)

Presentation Slides

Methods of computational chemistry have demonstrated remarkable power in predicting materials properties, and are therefore widely used in academic research and industrial applications. In 2018, at TACC for example, chemistry and materials science applications accounted for over 30% of the computational time used on the supercomputer Stampede2. Providing a more intuitive way of performing simulations can not only help lower the learning curve for new users, but also create a different user experience and value. In this presentation, Albert Lu (TACC) will give an overview of interactive computing with Jupyter notebooks, and demonstrate how to set up and run interactive simulation jobs (of LAMMPS) on Stampede2. Related tools for parallel computing (IPython Parallel) and workflow management (Parsl) will also be discussed in this talk.

The Development of a Mobile Augmented Reality Application for Visualizing the Protein Data Bank

Presenter(s): Max Collins (UC Irvine)
Principal Investigator(s): Alan Craig (U. Illinois and Shodor)

Presentation Slides

In 2015-2016, then-undergraduate student Max Collins was in the Blue Waters Student Internship Program. In that internship, he received training in high performance computing and developed a project in conjunction with his mentor, Alan Craig. His project was to create a mobile augmented reality application to visualize the Protein Data Bank. This presentation will discuss the technical details and development process of that application. In addition, Max will address how the internship and this application have affected his schooling and career choices. An early version of the application can be seen in the video on this page: http://www.ncsa.illinois.edu/news/story/blue_waters_intern_visualizes_a_career_in_app_development


May 21, 2019

ECSS CGEM project: Experiences & Beyond

Presenter(s): Kent Milfeld (Texas Advanced Computing Center)

Presentation Slides

Often performance can be analyzed through profilers such as gprof and VTune. At other times it is necessary to observe what is happening in a code with other tools to find performance problems. In this presentation we'll look at a few handy tools used to discover a performance problem in the marine science CGEM code.

SimCCS Science Gateway: Towards Creating a Dynamic Web based Portal for Carbon Capture and Storage

Presenter(s): Sudhakar Pamidighantam (Indiana University)

This presentation will describe the SimCCS Science Gateway, a portal for simulating carbon capture, transport and storage. We will motivate the need for these simulations, discuss where they might be used, and describe the gateway's creation and interfaces in detail. We will also present the evolution of the gateway: from basic optimization of a pipeline network with user-prepared inputs, through a desktop application that drives the workflow, to a web-browser-based interface that drives the workflow using a Django framework integrated with Apache Airavata.


April 16, 2019

The Digital Object Architecture and Enhanced Robust Persistent Identification of Data

Presenter(s): Rob Quick (Indiana University)

The research community's ability to collect and store data has grown much more rapidly than its ability to catalog, make accessible, and make use of data. Recent initiatives in Open Science and Open Data have attempted to address the problems of making data discoverable, accessible and re-usable at internet scales. The Enhanced Robust Persistent Identification of Data (E-RPID) project's goal is to address these deficiencies and enable options for data interoperability and reusability in the current research data landscape by utilizing Persistent Identifiers (PIDs) and a kernel of state information available with PID resolution. Doing this requires integrating a set of preexisting software systems along with a small set of newly developed software solutions. The combination of these software components and the core principles of making data FAIR (findable, accessible, interoperable and reusable) will allow us to use Persistent Identifiers to create an end-to-end fabric capable of realizing the Digital Object Architecture for researchers. This presentation will introduce the audience to the concepts of the Digital Object Architecture, describe the software services necessary to enable this architecture, introduce the existing E-RPID testbed that is available for experimental usage on the Jetstream cloud environment, and describe the diverse set of use cases already using E-RPID to enhance their data accessibility, interoperability and reusability.


March 19, 2019

SeedMe2: Data Sharing Cyberinfrastructure for Researchers

Presenter(s): Amit Chourasia (San Diego Supercomputer Center)

Presentation Slides

Data is an integral part of scientific research, and data size problems have become endemic as computation and analyses produce increasingly large amounts of data, leaving research teams inevitably tasked with managing these rapidly growing data collections. Existing solutions are largely focused upon providing storage space, whether local or in the cloud, and a familiar folder tree-style hierarchy. While these file system solutions work, they separate the data from essential contextual information, such as metadata, descriptive text and equations, job execution parameters, visualizations, and on-going data discussion among the researchers. Important discussions, for instance, remain in email logs or forums, while descriptive text is left in README files or embedded in those same email logs and forums. This distribution of contextual information makes it harder to keep track of it all and keep data from being orphaned or misinterpreted. A more unified approach is needed that keeps data and context together within the same storage system. In this talk I will discuss and interactively demonstrate key features of building blocks for data sharing and data management developed by the SeedMe2 (Stream, Encode, Explore and Disseminate My Experiments) project. It enables research teams to manage, share, search, visualize, and present their data in a web-based environment using an access-controlled, branded, and customizable website they own and control. It supports storing and viewing data in a familiar tree hierarchy, but also supports formatted annotations, lightweight visualizations, and threaded comments on any file/folder. The system can be easily extended and customized to support metadata, job parameters, and other domain and project-specific contextual items. The software is open source and available as an extension to the popular Drupal content management system. Project website with easy trial option: http://dibbs.seedme.org


February 19, 2019

Sustaining Science Gateway Operations through SciGaP Service

Presenter(s): Suresh Marru (Science Gateways Research Center, Indiana University)

Science Gateways dramatically accelerate scientific discovery by providing crucial user- and science-centric points of entry to cyberinfrastructure resources while shielding users from the technicalities of interacting with XSEDE-like distributed infrastructure. XSEDE's Extended Collaborative Support Services (ECSS) has collaborated in making it as easy as possible for scientific communities to create such Science Gateways and help them integrate with XSEDE. However, it is important to sustain these collaborative efforts and assist XSEDE communities in operating these gateways. In this talk we will present ECSS project exemplars which have adopted the hosted Apache Airavata services operated by the NSF-funded Science Gateway Platform (SciGaP) project, thus decreasing the overhead for gateway operations. The talk will conclude by providing references for future ECSS projects to take advantage of an out-of-the-box gateway platform with customizable user interfaces, or to integrate a la carte via direct programmatic access from existing community gateway implementations.

Ansible on the Cloud: A match made in heaven

Presenter(s): Eric Coulter (Science Gateways Research Center, Indiana University)

Presentation Slides

One of the major difficulties facing researchers in getting started with national cyberinfrastructure (CI) is the pain of actually *using* it. For support staff, it is a continual struggle to effectively onboard new users and provide interfaces to compute resources. With the advent of cloud-based research CI, it has become possible to provide highly customized resources for a variety of scientific domains, while at the same time giving access to those resources through gateways. I will discuss how customized infrastructure can enable a wide range of scientific projects, from bioinformatics to real-time data gathering. I will also demonstrate how the use of Ansible makes it relatively easy to create configurable, replicable infrastructure on Jetstream's OpenStack cloud, and provide participants with a starting point for building their own customized infrastructure.


January 15, 2019

Searching through the SRA - A focus on the ECSS work

Presenter(s): Mats Rynge (USC)

Presentation Slides

The Sequence Read Archive (SRA), the world's largest database of sequences, hosts approximately 10 petabases (10^16 bp) of sequence data and is growing at the alarming rate of 10 TB per day. Yet this rich trove of data is inaccessible to most researchers: searching through the SRA requires large storage and computing facilities that are beyond the capacity of most laboratories. Enabling scientists to analyze existing sequence data will provide insight into ecology, medicine, and industrial applications. As a prototype project, we specifically focus on providing a search capability against metagenomic sequences (whole community datasets from different environments). These data represent approximately 46 TB of data in the SRA. We provided two different search algorithms that can be used by domain scientists to explore this data. The presentation includes details on how XSEDE ECSS helped to create a science gateway using the open community science gateway framework Apache Airavata, together with an auto-scaled processing setup using Jetstream and directly mounted Wrangler storage for efficient data access for the growing user community of Searching the SRA.

Hyperglyphs: Pushing the Limits of Glyph Structure to Gain Insight Into Large Datasets

Presenter(s): Jeff Sale (SDSC)

Presentation Slides

The concept of a glyph in scientific visualization is well known and has found numerous applications over the years. However, the limits to the level of complexity of glyph structure have only begun to be fully explored. At the same time, a growing percentage of the big data torrent consists of semi-structured, unstructured, and non-traditional data, presenting a challenge for conventional visualization methods. Some data are so complex it is difficult to know where to begin to gain insight into trends and anomalies hidden within. We need new and innovative ways to visually explore such massive amounts of complex data. In this symposium I will provide a brief history of glyphs in scientific visualization and the conditions in which their use is appropriate and beneficial. Then I will make the case that conventional, simple glyphs should be extended and complexified into what I call 'hyperglyphs', highly complex visual structures designed to encapsulate much more information within a single glyph and which, when thousands are arrayed in an interactive 3D space, can significantly enhance perception and information assimilation, leading to new knowledge and insight. I will provide a wide range of examples from diverse fields including education, physiology, meteorology, public health, and social media.


December 18, 2018

Bioinformatics: Working with Campus Champion Fellows

Presenter(s): Alex Ropelewski (PSC)

Presentation Slides

Problems which require Bioinformatics skills are attractive to a wide variety of researchers, including researchers at Research Intensive Universities as well as researchers at smaller institutions. In this talk, I will highlight two projects involving the analysis of Next Generation Sequencing data that I've worked on with XSEDE Campus Champion Fellows – one involving Cancer Data and one involving Metagenomics. I will conclude the talk with advice for integrating a Campus Champion Fellow into an ECSS project.

Dream Lens: Exploration and Visualization of Large-Scale Generative Design Datasets

Presenter(s): Justin Matejka (Autodesk Research)

Presentation Slides

With traditional Computer Aided Design, users typically create a single model. In contrast, generative design allows users to specify high-level goals and constraints, and then the system can automatically generate hundreds or thousands of candidate designs all meeting the design criteria. Once a large collection of design variations is created, the designer is left with the task of finding the design, or set of designs, which best meets their requirements. This is a complicated task which could require analyzing the structural characteristics and visual aesthetics of the designs. In this talk we present Dream Lens, an interactive visual analysis tool for exploring and visualizing large-scale generative design datasets.


October 16, 2018

PolyRun - Polymer Microstructure Exploration HPC Gateway

Presenter(s): Amit Chourasia (SDSC), Christopher Thompson (Purdue)

Presentation Slides

Polymers are long-chain macromolecules with physical properties that make them appealing for a wide range of uses in structural support, organic electronics, and biomedical applications. The microscopic structure adopted by polymers plays a key role in determining their suitability for advanced applications. Computational simulation tools provide a convenient and powerful method to guide experiments to create desirable structures. In this talk we will discuss ECSS activity to support development of the PolyRun Gateway, which allows both seasoned and non-HPC users to easily perform complex computations and utilize simulations as an aid in designing experiments towards desired materials.

Efficient construction of limit order books for financial markets

Presenter(s): Robert Sinkovits (SDSC)

Presentation Slides

A limit order book (LOB) is a record of unexecuted orders to buy or sell a stock at a specified price. The LOB can then be used as a starting point for deeper analysis of markets, leading to a better understanding of the impact of trading behaviors, suggestions for regulations to make markets more effective or identification of manipulative practices such as quote stuffing. Construction of full-resolution LOBs is computationally demanding and, as a consequence, approximations are often employed. Unfortunately, this limits the utility of the LOBs in the era of high frequency trading. In this collaboration with Mao Ye (U. Illinois), we describe how we were able to first optimize the performance of existing full-resolution LOB construction software to achieve a 100x reduction in run time, and then refactor the software to ultimately improve time to solution by 1000-3000x.
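To make the definition above concrete, here is a minimal sketch of a limit order book as a data structure. It is a toy illustration only, not the software discussed in the talk: the class name, method names, and the use of integer prices (for example, cents) are assumptions made for this example, and it tracks only the total resting quantity at each price level.

```python
from collections import defaultdict


class LimitOrderBook:
    """Toy limit order book: total unexecuted quantity per price level.

    Hypothetical illustration; names and structure are assumptions,
    not taken from the software described in the talk.
    """

    def __init__(self):
        # Map price -> total resting quantity, one map per side.
        self.bids = defaultdict(int)
        self.asks = defaultdict(int)

    def _side(self, side):
        return self.bids if side == "buy" else self.asks

    def add(self, side, price, qty):
        """Record a new unexecuted order at the given price."""
        self._side(side)[price] += qty

    def cancel(self, side, price, qty):
        """Remove quantity from a price level; drop the level when empty."""
        book = self._side(side)
        book[price] -= qty
        if book[price] <= 0:
            del book[price]

    def best_bid(self):
        """Highest price any buyer is currently willing to pay."""
        return max(self.bids) if self.bids else None

    def best_ask(self):
        """Lowest price any seller is currently willing to accept."""
        return min(self.asks) if self.asks else None


book = LimitOrderBook()
book.add("buy", 100, 50)
book.add("buy", 101, 20)
book.add("sell", 103, 30)
book.add("sell", 102, 10)
book.cancel("sell", 102, 10)  # the 102 ask level is now empty and removed
print(book.best_bid(), book.best_ask())
```

A full-resolution book would replay every add, cancel, and execution message in order, which hints at why construction becomes computationally demanding at high message rates.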


September 18, 2018

The XSEDE Monthly HPC Workshops

Presenter(s): John Urbanic (PSC)

Presentation Slides

I will talk about the XSEDE Monthly Workshop Series, which uses the Wide Area Classroom (WAC). It has exceeded 10,500 actual-sitting-in-the-classroom students over the past 5 years, with growth continuing. The HPC topics core to the series will be discussed, as will the benefits of the WAC approach. We will discuss audience satisfaction and demographics as well as the latest improvements and developments. All of this is presented with the intention that many of these techniques will be of use to other XSEDE outreach, training and education efforts.

GISandbox: A Science Gateway for Geospatial Computing

Presenter(s): Davide Del Vento (NCAR)

Presentation Slides

Science gateways provide easy access to domain-specific tools and data. The field of Geographic Information Science and Systems (GIS) uses myriad tools and datasets, which raises challenges in designing a science gateway to meet users' diverse research and teaching needs. GISandbox is a new science gateway designed to meet the needs of researchers and educators leveraging geospatial computing. The GISandbox is built on Jupyter Notebooks to create an easy, open, and flexible platform for geospatial computing. Jupyter Notebooks is a widely used interactive computing environment running in the browser that integrates live code, narrative, equations and images. We extend the Jupyter Notebook platform to enable users to run interactive notebooks on the cloud resource Jetstream or computationally intensive notebooks on the Bridges supercomputer located at the Pittsburgh Supercomputing Center. A novel Job Management platform allows the user to easily submit a Jupyter Notebook for batch execution on Bridges (and eventually Comet), monitor the SLURM job, and retrieve output files. GISandbox Virtual Machines are created in Jetstream's Atmosphere interface and then deployed and configured using a series of Ansible scripts. When properly used, Ansible scripts make it possible to create an easily reproducible and scalable system. In this talk we will highlight use cases of GISandbox, give a bird's-eye view of how we have met their requirements in our implementation, and discuss future plans, including how it could be applied in other domains.


August 21, 2018

OpenTopography: A gateway to high resolution topography data and services

Presenter(s): Choonhan Youn (SDSC)

Presentation Slides

Over the past decade, there has been dramatic growth in the acquisition of publicly funded high-resolution topographic and bathymetric data for scientific, environmental, engineering and planning purposes. Because of the richness of these data sets, they are often extremely valuable beyond the application that drove their acquisition and thus are of interest to a large and varied user community. However, because of the large volumes of data produced by high-resolution mapping technologies such as lidar, it is often difficult to distribute these datasets. Furthermore, the data can be technically challenging to work with, requiring software and computing resources not readily available to many users. Some of the complex algorithms involved require high performance computing resources to run efficiently, especially in an on-demand processing and analysis environment. With steady growth in the number of users and in the use of complex, resource-intensive algorithms to generate derived products from these invaluable datasets, HPC resources are becoming more necessary to meet the increasing demand. By utilizing the Comet XSEDE resource, OpenTopography aims to democratize access to and processing of these high-resolution topographic data.

Development of multiple scattering theory method: the recent progress and applications

Presenter(s): Yang Wang (PSC)

Presentation Slides

Multiple scattering theory is an ab initio electronic structure calculation method in the framework of density functional theory. It differs from other ab initio methods in that it is an all-electron method and is not based on a variational approach. Its advantage of having easy access to the Green function makes it a unique tool for the study of random alloys and electronic transport. In this presentation, I will give a brief overview of multiple scattering theory, and will discuss the recent ECSS projects relevant to the development and applications of the multiple scattering theory method.