ECSS Symposium

The Extended Collaborative Support Services (ECSS) program provides expert assistance in a wide range of
cyberinfrastructure technologies. Any user may request this assistance through the XSEDE allocation process.

The primary goal of this monthly symposium is to allow the more than 70 staff members working in ECSS to exchange information about successful techniques used to address challenging science problems. Tutorials on new technologies may also be featured. Two 30-minute, technically focused talks will be presented each month, each with a brief question-and-answer period. This series is open to all.

Symposium coordinates:

Third Tuesday of the month, barring scheduling conflicts. Upcoming events will be listed on the website and announcements posted to the News Training category.
1pm Eastern/12pm Central/10am Pacific

These sessions will be recorded. For this large webinar, only the presenters and host will be broadcasting audio. Attendees may submit questions to the presenters through a moderator by sending a chat message.

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/350667546
 
Or iPhone one-tap (US Toll):  +14086380968,350667546# or +16465588656,350667546#
 
Or Telephone:
    Dial: +1 408 638 0968 (US Toll) or +1 646 558 8656 (US Toll)
    Meeting ID: 350 667 546
 

Videos and slides from past presentations are available below.
(Videos from prior years, not listed below, can be found here: 2012, 2013, 2014, 2015)

June 16, 2020

Scalable Research Automation using Globus

Presenter(s): Rachana Ananthakrishnan (Globus)

Presentation Slides

REST APIs exposed by the Globus service, combined with high-speed networks and Science DMZs, create a data management platform that can be leveraged to increase efficiency in research workflows. In many cases, current ad hoc or human-centered processes fall short of addressing the needs of researchers as their work becomes more data intensive. As data volumes grow, the overhead introduced by such non-scalable processes hampers core research activities, sometimes to the point where research takes a back seat to wrangling with IT infrastructure. However, technologies exist for reducing this burden and reengineering processes so that they can easily cope with growing data velocity and volume. One such technology is the Globus platform-as-a-service, which facilitates access to advanced data management capabilities and enables integration of those capabilities into existing and new scientific workflows to automate repetitive tasks: data replication, ingest from instruments, backup, archival, data distribution, etc. We will present real-world examples that illustrate how Globus can be used to perform data management tasks at scale, with minimal or no effort on the part of the researcher, including the streamlined data flows of the Advanced Photon Source data sharing system, used to distribute data from light source experiments. We will describe how the Globus platform provides intuitive access to authentication, authorization, sharing, transfer, and synchronization capabilities that can be included in simple scripts or integrated into more full-featured applications.
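As a hedged illustration of the scripting approach described above, the sketch below uses the Globus Python SDK (globus_sdk), which wraps these REST APIs, to automate a recurring transfer. The endpoint UUIDs, paths, and client ID are hypothetical placeholders; the presentation's actual examples may have used different interfaces.

```python
import globus_sdk

# Hypothetical endpoint UUIDs and paths -- substitute your own.
SRC_ENDPOINT = "ddb59aef-6d04-11e5-ba46-22000b92c6ec"
DST_ENDPOINT = "ddb59af0-6d04-11e5-ba46-22000b92c6ec"

# Authenticate with a native-app OAuth2 flow (interactive, for scripts).
CLIENT_ID = "your-app-client-id"  # register at developers.globus.org
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow()
print("Log in at:", auth_client.oauth2_get_authorize_url())
code = input("Paste authorization code: ").strip()
tokens = auth_client.oauth2_exchange_code_for_tokens(code)
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Build a TransferClient and describe the transfer task.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)
task_data = globus_sdk.TransferData(
    tc, SRC_ENDPOINT, DST_ENDPOINT,
    label="nightly instrument ingest",
    sync_level="checksum",  # only copy files whose contents changed
)
task_data.add_item("/instrument/run42/", "/archive/run42/", recursive=True)

# Submit and wait; the Globus service retries transient failures itself.
task = tc.submit_transfer(task_data)
tc.task_wait(task["task_id"], timeout=3600)
```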

Building Source-to-Source Tools for High-Performance Computing

Presenter(s): Chunhua "Leo" Liao (LLNL)

Presentation Slides

Computational scientists face numerous challenges when trying to exploit powerful and complex high-performance computing (HPC) platforms. These challenges arise in multiple aspects including productivity, performance, correctness and so on. In this talk, I will introduce a source-to-source approach to addressing HPC challenges. Our work is based on a unique compiler framework named ROSE. Developed at Lawrence Livermore National Laboratory, ROSE encapsulates advanced compiler analysis and optimization technologies into easy-to-use library APIs so developers can quickly build customized program analysis and transformation tools for C/C++/Fortran and OpenMP programs. Several example tools will be introduced, including the AST inliner, outliner, and a variable move tool. I will also briefly mention ongoing work related to benchmarks, composable tools, and training for compiler/tool developers. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-ABS-810981).


May 19, 2020

Gateway Production Monitoring

Presenter(s): Kenneth Yoshimoto (SDSC)

Presentation Slides

To monitor the health of a production gateway, the Neuroscience Gateway (NSG), two programs were developed to test core gateway functions: data upload, job submission, and output retrieval. NSG uses the Workbench Framework (WF) code base; other gateways built on WF include COSMIC2 and CIPRES. A WF gateway can provide both a non-API web interface and a RESTful API, and NSG makes both available to users. For routine monitoring of production status, programs were written to exercise both interfaces daily. The programs and testing process will be presented.
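By way of illustration, a minimal daily probe of a gateway's RESTful interface might look like the sketch below. The endpoints, credentials, and response fields are hypothetical; the real NSG/CIPRES API differs in its details.

```python
import time
import requests

# Hypothetical base URL and credentials for a WF-style gateway REST API.
BASE = "https://gateway.example.org/api/v1"
AUTH = ("monitor-user", "app-password")

def probe() -> bool:
    """Upload input, submit a tiny job, and fetch output; return success."""
    # 1. Data upload: a small, known-good input file.
    with open("smoke_test_input.txt", "rb") as f:
        r = requests.post(f"{BASE}/data", files={"file": f}, auth=AUTH, timeout=60)
    r.raise_for_status()
    data_id = r.json()["id"]

    # 2. Job submission referencing the uploaded data.
    r = requests.post(f"{BASE}/jobs", json={"tool": "echo", "input": data_id},
                      auth=AUTH, timeout=60)
    r.raise_for_status()
    job_id = r.json()["id"]

    # 3. Poll until the job finishes, then retrieve and check the output.
    for _ in range(60):
        status = requests.get(f"{BASE}/jobs/{job_id}", auth=AUTH, timeout=60).json()
        if status["state"] in ("COMPLETED", "FAILED"):
            break
        time.sleep(30)
    if status["state"] != "COMPLETED":
        return False
    out = requests.get(f"{BASE}/jobs/{job_id}/output", auth=AUTH, timeout=60)
    return out.ok and b"expected marker" in out.content

if __name__ == "__main__":
    print("gateway OK" if probe() else "gateway FAILING")
```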

Essentials for a Successful XRAC Proposal: Code Performance and Scaling

Presenter(s): Lars Koesterke (TACC)

Presentation Slides

Many PIs struggle with putting together a sound computational plan based on code performance and scaling information. In fact, for first-time PIs the most common reason for rejection is an insufficient computational plan. With this new training module we are trying to address that problem. The module attempts to answer two questions: why are scaling and performance data important, and how do reviewers use them; and how can this data be used to put together a computational plan? Currently the module is geared toward traditional HPC communities, and we are working on extending the content toward new communities. The purpose of my talk at the ECSS Symposium is to get staff members on the same page and to raise awareness that a new resource is available that may help educate users struggling to write a successful XRAC proposal.
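To make the kind of arithmetic such a plan contains concrete, here is a hedged sketch with invented numbers: pick the largest node count that keeps parallel efficiency acceptable, then scale up by the planned number of runs plus a safety margin.

```python
# Hypothetical scaling measurements: wall-clock hours per run at each
# node count, converted to node-hours. All numbers are invented.
node_hours_per_run = {
    16: 4.0 * 16,   # 4.0 h on 16 nodes  ->  64.0 node-hours (baseline)
    32: 2.2 * 32,   # 2.2 h on 32 nodes  ->  70.4 node-hours (~91% efficiency)
    64: 1.3 * 64,   # 1.3 h on 64 nodes  ->  83.2 node-hours (~77% efficiency)
}

# Pick the largest node count with acceptable parallel efficiency (>= 80%).
base = node_hours_per_run[16]
chosen = max(n for n, nh in node_hours_per_run.items() if base / nh >= 0.80)

runs = 200          # production runs planned for the allocation year
overhead = 1.10     # 10% margin for failed or restarted jobs
request = runs * node_hours_per_run[chosen] * overhead
print(f"Run at {chosen} nodes; request ~{request:,.0f} node-hours")
# -> Run at 32 nodes; request ~15,488 node-hours
```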


April 21, 2020

Beginner's tutorial on cloud DevOps on Jetstream, focused on Kubernetes and JupyterHub

Presenter(s): Andrea Zonca (SDSC)

Presentation Slides

This symposium assumes no previous knowledge of cloud technologies and will cover the following topics (a sketch of the first step appears after this list):
* Example virtual machine setup with OpenStack command line tools
* Deploying a Kubernetes cluster on Jetstream
* How Kubernetes works: its architecture, and the differences between containers and virtual machines
* Deploying JupyterHub on Jetstream for a workshop
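The tutorial itself uses the OpenStack command line tools; as a hedged Python equivalent for the first topic, the sketch below launches a VM with the openstacksdk library. It assumes a clouds.yaml entry named "jetstream", and the image, flavor, network, and keypair names are placeholders.

```python
import openstack

# Assumes clouds.yaml defines a "jetstream" cloud with your credentials;
# the image, flavor, network, and keypair names below are hypothetical.
conn = openstack.connect(cloud="jetstream")

server = conn.create_server(
    name="k8s-master-0",
    image="Ubuntu-20.04",          # list candidates with conn.image.images()
    flavor="m1.medium",
    network="my-project-network",
    key_name="my-keypair",
    wait=True,                     # block until the server is ACTIVE
)

# Attach a public (floating) IP so we can SSH in and install Kubernetes.
conn.add_auto_ip(server)
print(server.name, "is up")
```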


March 17, 2020

AMP Gateway: A portal for atomic, molecular, and optical physics simulations

Presenter(s): Sudhakar Pamidighantam (Indiana University)

Presentation Slides

We describe the creation of a new Atomic and Molecular Physics science gateway (AMPGateway). The gateway is designed to bring together a subset of the AMP community to work collectively to make their software suites available and easier to use, both by the partners and by others. By necessity, a project such as this requires the developers to work on issues of portability, documentation, and ease of input, as well as making sure the codes can run on a variety of architectures. The gateway was built using the Apache Airavata gateway middleware framework. Initially it was deployed using the Airavata PHP client on the web, but it has since been redeployed under the Django web framework. Here we outline the organization and facilities of the Django deployment and how it has been used, and discuss future directions for the AMP gateway.

Bursting into the public Cloud – Sharing my experience doing it at large scale for IceCube

Presenter(s): Igor Sfiligoi (SDSC)

Presentation Slides

When the needs of a compute workflow spike well in excess of the capacity of a local compute resource, capacity should be temporarily provisioned from somewhere else, both to meet deadlines and to increase scientific output. Public clouds have become an attractive option due to their ability to be provisioned with minimal advance notice. I recently helped IceCube expand their resource pool by a few orders of magnitude, first to 380 PFLOP32s for a few hours and later to 170 PFLOP32s for a whole workday. In the process we moved O(50 TB) of data to and from the clouds, showing that networking is not a limiting factor either. While there was a non-negligible dollar cost involved in each run, the effort involved was quite modest. In this session I will explain what was done and how, alongside an overview of why IceCube needs so much compute.


January 21, 2020

CUDA-Python and RAPIDS for blazing fast scientific computing

Presenter(s): Abe Stern (NVIDIA)

Presentation Slides

We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming.
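As a minimal sketch of the Numba side, the example below compiles a CUDA kernel from Python and launches it on device arrays. It assumes a CUDA-capable GPU with the numba and numpy packages installed; the kernel itself is a generic example, not one from the talk.

```python
import numpy as np
from numba import cuda

@cuda.jit
def axpy(a, x, y, out):
    """out = a*x + y, one element per CUDA thread."""
    i = cuda.grid(1)          # global thread index
    if i < out.size:          # guard against over-provisioned threads
        out[i] = a * x[i] + y[i]

n = 1_000_000
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)

# Move data to the GPU once; the kernel then operates on device arrays.
d_x, d_y = cuda.to_device(x), cuda.to_device(y)
d_out = cuda.device_array_like(d_x)

threads = 256
blocks = (n + threads - 1) // threads
axpy[blocks, threads](np.float32(2.0), d_x, d_y, d_out)

result = d_out.copy_to_host()
assert np.allclose(result, 2.0 * x + y)
```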


December 17, 2019

Extracting Domain Information using Deep Learning

Presenter(s): Amit Gupta (TACC)

Presentation Slides

In this session we will present an overview of our exploration of using deep learning to extract entities of interest from journal article text. Across scientific domains, extracting and curating new knowledge from large bodies of text remains a challenging task. To this end, we have developed a computational tool named DIVE (Domain Informational Vocabulary Extraction) to provide entity extraction and expert curation functionality. The tool has been integrated with the publication pipeline used by the American Society of Plant Biologists. Using the author feedback mechanism in our deployed tool, we were able to create an expert-annotated dataset based on articles submitted over an entire year. This new gold-standard dataset for supervised training now enables us to compare several methods for the entity extraction task. We use the NeuroNER tool to investigate the effectiveness of deep neural networks on this task and contrast it with tools built on a variety of other methods, such as ABNER (which uses conditional random fields) and DIVE (which uses an ensemble of regular expression rules, keyword dictionaries, and ontology files). Our early results from NeuroNER training with author annotations show very promising improvement in predicting the important words in the documents. This makes it an excellent candidate for future development and integration into the DIVE tool.
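To give a flavor of the ensemble style the abstract attributes to DIVE, here is a hedged toy sketch combining a keyword dictionary with a regular-expression rule. The vocabulary and pattern are invented for illustration and are not DIVE's actual rules.

```python
import re

# Invented vocabulary and pattern, just to show the ensemble style:
# curated dictionary lookups plus regex rules, with results unioned.
GENE_DICTIONARY = {"PIN1", "FT", "CONSTANS"}          # curated keyword list
GENE_PATTERN = re.compile(r"\b[A-Z]{2,}\d+[a-z]?\b")  # e.g. ABC1, XYZ12b

def extract_entities(text: str) -> set[str]:
    """Union of dictionary hits and regex-rule hits over the text."""
    tokens = set(re.findall(r"\b\w+\b", text))
    dictionary_hits = tokens & GENE_DICTIONARY
    pattern_hits = set(GENE_PATTERN.findall(text))
    return dictionary_hits | pattern_hits

sentence = "Expression of PIN1 and ABC1 rose while FT stayed constant."
print(extract_entities(sentence))   # {'PIN1', 'ABC1', 'FT'}
```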

The Distant Reader: Reading at scale

Presenter(s): Eric Lease Morgan (Notre Dame)

The Distant Reader is a tool for reading. It takes an arbitrary amount of unstructured data (text) as input, and it outputs sets of structured data for analysis -- reading. Given a corpus of just about any size (hundreds of books or thousands of journal articles), the Distant Reader analyzes the corpus, and outputs a myriad of reports enabling the researcher to use & understand the corpus. Designed with college students, graduate students, scientists, or humanists in mind, the Distant Reader is intended to supplement the traditional reading process. This presentation outlines the problems the Reader is intended to address as well as the way it is implemented on the Jetstream platform with the help of both software and personnel resources from XSEDE. The Distant Reader is freely available for anybody to use at https://distantreader.org


October 15, 2019

On Developing Reusable Software Components for the Advanced Cyberinfrastructure

Presenter(s): Ritu Arora (TACC)

Presentation Slides

Developing reusable software components that can be integrated into unforeseen software projects has the potential to enhance the productivity of the programmers who reuse them. However, the initial cost of developing such components can be higher than developing components for a single use-case. In this talk, we will discuss two reusable software components that were developed for the BOINC@TACC and Gateway-In-a-Box (GIB) projects. One component, Greyfish, is a portable, cloud-based filesystem. The other, Midas, is a tool for automating the generation of Docker images from source code. Both components were initially prototyped for predefined needs and were tightly coupled with the other components they interoperated with. However, after determining that the effort involved in teasing out these components and making them available as stand-alone software was small and could help with the sustainability goals of the aforementioned projects, we refactored them and wrote clear documentation for installing and using them. Doing this helped us improve the software quality: people in the community started using the software, helped us fix some bugs, and helped us improve the documentation. In summary, there is often a direct or indirect cost involved in making software reusable, and this cost may vary from project to project. However, the long-term sustainability and maintenance needs of the project may far outweigh the cost associated with software reusability.
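As a hedged toy sketch of the idea behind a tool like Midas (not its actual logic or templates), the snippet below inspects a source tree and emits a matching Dockerfile:

```python
from pathlib import Path

# Toy illustration only: detect the build system of a source tree and
# emit a plausible Dockerfile. Midas's real detection and templates differ.
def make_dockerfile(src: Path) -> str:
    if (src / "requirements.txt").exists():          # looks like Python
        return "\n".join([
            "FROM python:3.10-slim",
            "COPY . /app",
            "WORKDIR /app",
            "RUN pip install -r requirements.txt",
            'CMD ["python", "main.py"]',
        ])
    if (src / "Makefile").exists():                  # looks like C/C++
        return "\n".join([
            "FROM gcc:12",
            "COPY . /app",
            "WORKDIR /app",
            "RUN make",
            'CMD ["./a.out"]',
        ])
    raise ValueError(f"no recognized build files in {src}")

print(make_dockerfile(Path(".")))
```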

Exploring the Dynamics of a Quantum-Mechanical Compton Generator

Presenter(s): Marty Kandes (SDSC)

Presentation Slides

In 1913, while he was still an undergraduate, American physicist Arthur Compton invented a simple way to measure the rotation rate of the Earth with a tabletop-sized experiment, independent of any astronomical observation. The experiment consisted of a large diameter circular ring of thin glass tubing filled with water and oil droplets. After placing the ring in a plane perpendicular to the surface of the Earth and allowing the fluid mixture of oil and water to come to rest, Compton then abruptly rotated the ring, flipping it 180 degrees about an axis passing through its own plane. The result of the experiment was that the water acquired a measurable drift velocity due to the Coriolis effect arising from the daily rotation of the Earth about its own axis. Compton measured this induced drift velocity by observing the motion of the oil droplets in the water with a microscope. This device, now named after him, is known as a Compton generator. The fundamental research objective of this XSEDE project is to explore the dynamics of a quantum-mechanical analogue to the classical Compton generator experiment through the use of numerical simulations. In this presentation, I describe how the physics of the problem itself drives many of the computational challenges in the simulations; what numerical methods and computational techniques were implemented in the custom simulation code written to explore the problem (and other quantum systems in rotating frames of reference); the performance characteristics and limitations of this code; some challenges in creating a post-simulation visualization pipeline; as well as the latest results and future directions of the project.
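The abstract does not spell out the numerical scheme, but a common method for time-evolving such wavefunctions is the split-step Fourier (split-operator) technique. The sketch below applies it to a 1D particle in a harmonic trap; it is illustrative only and is not necessarily the method the project's custom code uses.

```python
import numpy as np

# Split-step Fourier (Strang splitting) evolution of a 1D wavefunction in
# a harmonic trap: half potential step, full kinetic step in k-space,
# half potential step. Parameters are arbitrary illustrative values.
n, L, dt = 1024, 20.0, 1e-3
hbar = m = omega = 1.0

x = np.linspace(-L / 2, L / 2, n, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)

V = 0.5 * m * omega**2 * x**2                      # trap potential
psi = np.exp(-(x - 1.0) ** 2).astype(complex)      # displaced Gaussian
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * (L / n)) # normalize

half_potential = np.exp(-0.5j * V * dt / hbar)     # e^{-iV dt / 2hbar}
kinetic = np.exp(-0.5j * hbar * k**2 * dt / m)     # e^{-i hbar k^2 dt / 2m}

for _ in range(5000):
    psi = half_potential * psi
    psi = np.fft.ifft(kinetic * np.fft.fft(psi))
    psi = half_potential * psi

print("norm preserved:", np.sum(np.abs(psi) ** 2) * (L / n))
```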


September 17, 2019

The "Morelli Machine": A Proposal Testing a Critical, Algorithmic Approach to Art History

Presenter(s): Paul Rodriguez (SDSC)

Presentation Slides

The Morelli Machine refers to a late-19th-century algorithmic approach to characterizing authorship, which proposed that the fine details of minor items in a painting reveal a painter's particular style. The PIs set out to test the hypothesis that contemporary computer vision techniques could perform this sort of "stylistic" matching. To do this, they sought to mechanize a method that is indigenous to art history and that uses details as a proxy for style. This project approached the question of "style" as one of extracting features that have some discriminatory power for distinguishing paintings or groups of paintings. We used feature discovery from a convolutional network (VGG19) pretrained for object recognition. We processed both whole images and a class of image parts (i.e., mouths), and performed clustering. In this presentation I will review the image preparation steps, the extraction steps, the clustering results, and the cluster evaluation. The upshot is that all convolutional layers do carry discriminatory features, and different layers may capture different kinds of features, with differing degrees of interpretability that can be hard to define.
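A minimal sketch of this kind of pipeline follows: it pools activations from one VGG19 convolutional layer into a feature vector per image and clusters the vectors with k-means. The layer choice, file names, and cluster count are illustrative, not the project's actual settings.

```python
import numpy as np
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image
from sklearn.cluster import KMeans

# Pull activations from one convolutional layer of a pretrained VGG19
# and cluster images by those features.
base = VGG19(weights="imagenet", include_top=False)
extractor = Model(inputs=base.input,
                  outputs=base.get_layer("block4_conv1").output)

def features(path: str) -> np.ndarray:
    """Global-average-pooled activations for one image file."""
    img = image.load_img(path, target_size=(224, 224))
    arr = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(arr, verbose=0).mean(axis=(1, 2)).ravel()

paths = ["painting_01.jpg", "painting_02.jpg", "painting_03.jpg"]  # hypothetical
X = np.stack([features(p) for p in paths])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(dict(zip(paths, labels)))
```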

Improving Science Gateways usage reporting for XSEDE

Presenter(s): Amit Chourasia (SDSC)

Presentation Slides

Science domain-specific gateways have gained wide use by providing easy web-based access to complex cyberinfrastructure, and they consume an increasing proportion of the computational capacity provided by XSEDE. A typical approach is for a Science Gateway to use a single community account with a compute allocation to process compute jobs on behalf of its end users. The computational usage of Science Gateways is compiled from batch job submission systems and reported by the XSEDE service providers. However, this reporting does not capture the user who actually initiated the computation, because the batch systems do not have this information. To overcome this limitation, Science Gateways use a separate pipeline to submit job-specific attributes to XSEDE, which are later joined with the batch-system information submitted by the Service Providers to create detailed usage reports. In this presentation I will describe improvements to the Gateway attribute reporting system, which better serve the needs of the growing Science Gateway community and provide a simpler, streamlined way to report usage and ultimately publish this information via XDMoD.
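To make the attribute pipeline concrete, the sketch below posts a per-job record that a reporting backend could later join with the batch-system record via the local job id. The endpoint, header, and field names here are hypothetical; the actual XSEDE gateway-attributes API differs.

```python
import requests

# Hypothetical endpoint and fields -- illustrative only. The key idea is a
# per-job record carrying the gateway end user (invisible to the batch
# system) plus a join key against the Service Provider's batch record.
ATTRIBUTES_URL = "https://reporting.example.org/gateway-attributes"
API_KEY = "gateway-api-key"

record = {
    "gateway_user": "jdoe@example.edu",   # the actual end user
    "local_job_id": "1234567",            # join key for the batch record
    "resource": "comet.sdsc.xsede.org",
    "submit_time": "2019-09-17T14:02:00Z",
}
resp = requests.post(ATTRIBUTES_URL, json=record,
                     headers={"X-Api-Key": API_KEY}, timeout=30)
resp.raise_for_status()
```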


August 20, 2019

Hadoop and Spark on a Shared Resource

Presenter(s): Byron Gill (PSC)

Presentation Slides

Hadoop, Spark, and the ecosystem of other software that interacts with them are in demand, but many of the assumptions about the typical use case for these programs don't apply to the typical user on a shared HPC cluster. This talk will explore some of the challenges in creating a workable environment within the confines of a shared cluster and describe some of the approaches we've used at PSC to accommodate the needs of our users.
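One common accommodation on a shared cluster is to run a short-lived Spark instance entirely inside a batch allocation rather than as a persistent service. The hedged sketch below does this in local mode on a single node; the paths and memory settings are placeholders, not PSC's actual configuration.

```python
import os
from pyspark.sql import SparkSession

# Run Spark inside the node of a batch job, redirecting its scratch space
# away from the (often small) system /tmp. Paths are hypothetical.
scratch = f"/scratch/{os.environ.get('USER', 'user')}"
spark = (
    SparkSession.builder
    .appName("shared-cluster-demo")
    .master("local[*]")                              # use this node's cores
    .config("spark.local.dir", f"{scratch}/spark-tmp")
    .config("spark.driver.memory", "16g")
    .getOrCreate()
)

# A small word count to prove the stack works end to end.
lines = spark.read.text(f"{scratch}/corpus/*.txt")
counts = (
    lines.rdd.flatMap(lambda row: row.value.split())
    .map(lambda w: (w, 1))
    .reduceByKey(lambda a, b: a + b)
)
print(counts.takeOrdered(10, key=lambda kv: -kv[1]))
spark.stop()
```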

Lessons learned in developing a coupling interface between a kinetic PUI code (Fortran) and a global MHD code (C++)

Presenter(s): Laura Carrington (SDSC)

Presentation Slides

The objective of the PI's team was to obtain a quantitative understanding of the dynamical heliosphere, from its solar origin to its interaction with the local interstellar medium (LISM), by creating a data-driven suite of models of the Sun-to-LISM connection. To accomplish this, I worked to develop a coupling interface between a kinetic PUI code (Fortran) and a global MHD code (C++). The kinetic code models the nonthermal pickup ions (PUIs) created as new populations of neutral atoms are born in the solar wind (SW) and LISM. The PUIs generate turbulence that heats the thermal ions, and they are further accelerated to create anomalous cosmic rays (ACRs). This code was originally serial and designed to compute a single particle trajectory. The coupling allows the PUI code to obtain magnetic field data from a large, parallel global MHD simulation code and to compute ~5000 trajectories in a single run. The challenges of parallelizing the PUI code and coupling its Fortran77 and Fortran90 code with the C++ global MHD code are presented, along with lessons learned in working with mixed-language codes on TACC Stampede2.


June 18, 2019

HPC+Jupyter for Computational Chemistry

Presenter(s): Albert Lu (TACC)

Presentation Slides

Methods of computational chemistry have demonstrated remarkable power in predicting materials properties and are therefore widely utilized in academic research and industrial applications. In 2018 at TACC, for example, over 30% of the computational time used on the supercomputer Stampede2 went to chemistry and materials science applications. Providing a more intuitive way of performing simulations can not only lower the learning curve for new users, but also create a different user experience and value. In this presentation, Albert Lu (TACC) will give an overview of interactive computing with Jupyter notebooks and demonstrate how to set up and run interactive simulation jobs (of LAMMPS) on Stampede2. Related tools for parallel computing (IPython Parallel) and workflow management (Parsl) will also be discussed.
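As a hedged sketch of the interactive pattern, the cell below drives LAMMPS directly from Python through its bundled lammps module, using the standard Lennard-Jones melt example. The talk's actual Stampede2 setup involves additional plumbing (a batch job hosting the notebook server, an SSH tunnel to it, etc.) not shown here.

```python
from lammps import lammps

# Create a LAMMPS instance inside the notebook's Python process.
lmp = lammps()

# Define a small Lennard-Jones melt inline instead of via an input file.
for cmd in [
    "units lj",
    "atom_style atomic",
    "lattice fcc 0.8442",
    "region box block 0 5 0 5 0 5",
    "create_box 1 box",
    "create_atoms 1 box",
    "mass 1 1.0",
    "pair_style lj/cut 2.5",
    "pair_coeff 1 1 1.0 1.0 2.5",
    "velocity all create 1.44 87287",
    "fix 1 all nve",
]:
    lmp.command(cmd)

lmp.command("run 100")           # advance; re-run this cell to continue
print("atoms:", lmp.get_natoms())
```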

The Development of a Mobile Augmented Reality Application for Visualizing the Protein Data Bank

Presenter(s): Max Collins (UC Irvine)
Principal Investigator(s): Alan Craig (U. Illinois and Shodor)

Presentation Slides

In 2015-2016, then-undergraduate student Max Collins participated in the Blue Waters Student Internship Program. In that internship, he received training in high performance computing and developed a project in conjunction with his mentor, Alan Craig. His project was to create a mobile augmented reality application to visualize the Protein Data Bank. This presentation will discuss the technical details and development process of that application. In addition, Max will address how the internship and this application have affected his schooling and career choices. An early version of the application can be seen in the video on this page: http://www.ncsa.illinois.edu/news/story/blue_waters_intern_visualizes_a_career_in_app_development