ECSS Symposium

The Extended Collaborative Support Services (ECSS) program provides expert assistance in a wide range of
cyberinfrastructure technologies. Any user may request this assistance through the XSEDE allocation process.

The primary goal of this monthly symposium is to allow the over 70 staff members working in ECSS to exchange information about successful techniques used to address challenging science problems. Tutorials on new technologies may also be featured. Two 30-minute, technically focused talks will be presented each month, each including a brief question-and-answer period. This series is open to all.

Symposium coordinates:

Third Tuesday of the month, barring scheduling conflicts. Upcoming events will be listed on the website and announcements posted to the News Training category.
1pm Eastern/12pm Central/10am Pacific

These sessions will be recorded. Because this is a large webinar, only the presenters and host will broadcast audio. Attendees may submit questions to the presenters through a moderator by sending a chat message.

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/350667546
 
Or iPhone one-tap (US Toll):  +14086380968,350667546# or +16465588656,350667546#
 
Or Telephone:
    Dial: +1 408 638 0968 (US Toll) or +1 646 558 8656 (US Toll)
    Meeting ID: 350 667 546
 

Videos and slides from past presentations are available below.
(Videos not listed below, from prior years, can be found here: 2012, 2013, 2014, 2015)

September 15, 2020

High Resolution Spatial Temporal Analysis of Whole-Head 306-Channel Magnetoencephalography & 66-Channel Electroencephalography Brain Imaging in Humans During Sleep

Presenter(s): David Shannahoff-Khalsa (UCSD), Mona Wong (SDSC), Jeff Sale (SDSC)

Presentation Slides

In chronobiology, the circadian rhythm is known as the 24-hr sleep-wake cycle. The ultradian rhythm has a shorter cycle, with approximately a 1-3 hour periodicity and considerable variability. This project's goal is to follow up on our earlier EEG work during sleep, and that of others, that has identified a rhythm in how the two cerebral hemispheres alternate in dominance, coupled to the ultradian rhythm of the rapid eye movement (REM) and non-rapid eye movement (NREM) sleep cycle. Here we are also comparing whole-head and regional variations in cerebral dominance to gain better insight into this novel rhythm during sleep. This rhythm of alternating cerebral hemispheric dominance also manifests during the waking state; it is apparently coupled to every major bodily system and now presents as a novel rhythm regulated by the central and autonomic nervous systems via the hypothalamus. With the support of XSEDE ECSS, this project has processed 306-channel magnetoencephalography data that include 3 signal types (1 magnetometer, 2 opposing gradiometers) and 66-channel EEG recordings from 4 normal, healthy sleep subjects. We are analyzing the data to compare the 4 signal types, filtered into 6 frequency bands, over the whole head and 6 discrete regions of the head to see how they vary with the REM and NREM sleep stages. Our analysis includes a relatively new algorithm called Fast Orthogonal Search that is well suited to analyzing periodicity in nonlinear dynamical systems. It also includes unique visualization methods for observing how these patterns of left-minus-right hemisphere power manifest during sleep stages.
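
To make the band-filtered, left-minus-right power measure concrete, here is a minimal SciPy sketch, not the project's actual pipeline: the sampling rate, band edges, and channel-to-hemisphere index lists are all assumed for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0            # sampling rate in Hz (assumed)
low, high = 8.0, 12.0  # one example frequency band (alpha)

def bandpass(data, low, high, fs, order=4):
    """Zero-phase Butterworth band-pass along the time axis."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, data, axis=-1)

def left_minus_right_power(signals, left_idx, right_idx, win=int(fs)):
    """signals: (n_channels, n_samples) array; left_idx / right_idx are
    hypothetical channel-index lists for the two hemispheres."""
    power = bandpass(signals, low, high, fs) ** 2
    # Average power per channel in consecutive 1-second windows.
    n_win = power.shape[-1] // win
    power = power[:, : n_win * win].reshape(power.shape[0], n_win, win).mean(-1)
    # Hemispheric difference as a time series over windows.
    return power[left_idx].mean(0) - power[right_idx].mean(0)
```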


August 18, 2020

RDA Recommendations and Outputs

Presenter(s): Anthony Juehne (RDA Foundation)

Presentation Slides

The RDA was launched in 2013 to fill the identified need for a neutral, collaborative space that gathers the diverse data communities and, through informed consensus, builds the social and technical bridges needed to enable open data sharing. Since its founding, the RDA principles - Open, Consensus, Balance, Harmonization, Community-driven, Non-profit, and Technology-Neutral - have resonated across research communities. RDA membership currently includes over 11,000 participants representing 144 countries from all populated continents, collaborating in 97 working or interest groups. The RDA is focused on actively building outputs that accelerate open data interoperability, sharing, and use. This happens through the development and deployment of two primary output categories: i) technical infrastructure (e.g., tools, models, registries); and ii) social infrastructure (e.g., common standards, best practices, policies). This presentation will discuss an approach to implementing RDA-developed outputs and recommendations across multiple areas of organizational operation, including human development and education, data laws and policies, research practices, data and metadata formats and standards, data sharing workflows, and infrastructure management for enhanced interoperability.

FAIR Data and SEAGrid Gateway: A Research Data Alliance Adoption Project

Presenter(s): Rob Quick (Indiana University)

The Science and Engineering Grid (SEAGrid) Gateway has been an active resource for the computational community since 2016. During this time, the utility of persistent identifiers (PIDs) for research data products, as defined in the FAIR principles for open data, has become prevalent in research communities. At the beginning of 2020, the Research Data Alliance funded an adoption project to integrate RDA outputs and recommendations, focused on issuing PIDs to the data and software components that make up a science workflow within the SEAGrid environment. This presentation will summarize the project and describe the gateway and data infrastructure components required for integration, along with the details of the integration process. The work done in this adoption project can inform future gateway projects that adopt the technical components of FAIR that rely on a persistent identifier resolution infrastructure.
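
As a small illustration of the resolution infrastructure such projects rely on, the sketch below looks up a handle through the Handle System's public REST proxy. The handle value itself is a placeholder, not a PID issued by SEAGrid.

```python
import requests

def resolve_handle(handle: str) -> str | None:
    """Return the URL a handle resolves to, or None if no URL record exists."""
    resp = requests.get(f"https://hdl.handle.net/api/handles/{handle}", timeout=10)
    resp.raise_for_status()
    # The proxy returns a JSON record whose "values" list holds typed entries.
    for value in resp.json().get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None

print(resolve_handle("11500/EXAMPLE-PID"))  # hypothetical handle
```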


June 16, 2020

Scalable Research Automation using Globus

Presenter(s): Rachana Ananthakrishnan (Globus)

Presentation Slides

REST APIs exposed by the Globus service, combined with high-speed networks and Science DMZs, create a data management platform that can be leveraged to increase efficiency in research workflows. In many cases, current ad hoc or human-centered processes fall short of addressing the needs of researchers as their work becomes more data intensive. As data volumes grow, the overhead introduced by such non-scalable processes hampers core research activities, sometimes to the point where research takes a back seat to wrangling with IT infrastructure. However, technologies exist for reducing this burden and reengineering processes such that they can easily cope with growing data velocity and volume. One such technology is the Globus platform-as-a-service, which facilitates access to advanced data management capabilities and enables integration of these capabilities into existing and new scientific workflows to automate repetitive tasks: data replication, ingest from instruments, backup, archival, data distribution, etc. We will present real-world examples that illustrate how Globus can be used to perform data management tasks at scale, with minimal or no effort on the part of the researcher. Examples include the streamlined data flows of the Advanced Photon Source data sharing system, used to distribute data from light source experiments. We will describe how the Globus platform provides intuitive access to authentication, authorization, sharing, transfer, and synchronization capabilities that can be included in simple scripts or integrated into more full-featured applications.
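
As a flavor of the "simple scripts" the talk refers to, here is a minimal sketch using the Globus Python SDK to submit a checksummed directory transfer. The client ID and endpoint UUIDs are placeholders, and this is one of several possible auth flows, not the talk's specific example.

```python
import globus_sdk

# Placeholders: a registered native-app client ID and two endpoint UUIDs.
CLIENT_ID = "YOUR-CLIENT-ID"
SRC_ENDPOINT = "SOURCE-ENDPOINT-UUID"
DST_ENDPOINT = "DESTINATION-ENDPOINT-UUID"

# Interactive native-app login flow.
auth = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth.oauth2_start_flow()
print("Log in at:", auth.oauth2_get_authorize_url())
tokens = auth.oauth2_exchange_code_for_tokens(input("Auth code: "))
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

# Mirror a directory between the two endpoints, verifying with checksums.
tdata = globus_sdk.TransferData(
    tc, SRC_ENDPOINT, DST_ENDPOINT, label="nightly sync", sync_level="checksum"
)
tdata.add_item("/project/raw/", "/archive/raw/", recursive=True)
task = tc.submit_transfer(tdata)
print("Submitted transfer, task ID:", task["task_id"])
```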

Building Source-to-Source Tools for High-Performance Computing

Presenter(s): Chunhua "Leo" Liao (LLNL)

Presentation Slides

Computational scientists face numerous challenges when trying to exploit powerful and complex high-performance computing (HPC) platforms. These challenges arise in multiple aspects, including productivity, performance, and correctness. In this talk, I will introduce a source-to-source approach to addressing HPC challenges. Our work is based on a unique compiler framework named ROSE. Developed at Lawrence Livermore National Laboratory, ROSE encapsulates advanced compiler analysis and optimization technologies into easy-to-use library APIs so developers can quickly build customized program analysis and transformation tools for C/C++/Fortran and OpenMP programs. Several example tools will be introduced, including the AST inliner, the outliner, and a variable-move tool. I will also briefly mention ongoing work related to benchmarks, composable tools, and training for compiler/tool developers. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-ABS-810981).
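
ROSE itself is a C++ framework; purely as a language-neutral analogy for what a source-to-source tool does (parse source into an AST, rewrite the tree, emit new source), here is a tiny Python sketch using the standard-library ast module. It is not ROSE's API.

```python
import ast

class ConstantFolder(ast.NodeTransformer):
    """Fold integer additions like 2 + 3 into a single constant."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform children first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            folded = ast.Constant(node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node

src = "x = 2 + 3\ny = x + 4"
tree = ConstantFolder().visit(ast.parse(src))
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # -> "x = 5\ny = x + 4"
```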


May 19, 2020

Gateway Production Monitoring

Presenter(s): Kenneth Yoshimoto (SDSC)

Presentation Slides

To monitor the function of a production gateway, the Neuroscience Gateway (NSG), two programs were developed to test core gateway functions: data upload, job submission, and output retrieval. NSG uses the Workbench Framework (WF) code base; other gateways using WF are COSMIC2 and CIPRES. A WF gateway can provide both a non-API web interface and a RESTful API, and NSG makes both interfaces available to users. For routine monitoring of production status, programs were written to run a daily test of both interfaces. The programs and testing process will be presented.
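
The shape of such a daily REST-interface check might look like the sketch below. The base URL, endpoints, field names, and credentials are placeholders, not the actual Workbench Framework API.

```python
import time
import requests

BASE = "https://gateway.example.org/api/v1"  # hypothetical base URL
AUTH = ("monitor-user", "app-password")      # hypothetical credentials

def daily_check() -> bool:
    # 1. Upload a small test input file.
    with open("test_input.txt", "rb") as f:
        r = requests.post(f"{BASE}/files", files={"file": f}, auth=AUTH, timeout=30)
    r.raise_for_status()
    upload_id = r.json()["id"]

    # 2. Submit a short test job that consumes the upload.
    r = requests.post(f"{BASE}/jobs",
                      json={"tool": "echo-test", "input": upload_id},
                      auth=AUTH, timeout=30)
    r.raise_for_status()
    job_id = r.json()["id"]

    # 3. Poll until the job finishes, then retrieve the output.
    while requests.get(f"{BASE}/jobs/{job_id}", auth=AUTH, timeout=30).json()["state"] != "DONE":
        time.sleep(60)
    return requests.get(f"{BASE}/jobs/{job_id}/output", auth=AUTH, timeout=30).ok
```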

Essentials for a Successful XRAC Proposal: Code Performance and Scaling

Presenter(s): Lars Koesterke (TACC)

Presentation Slides

Many PIs struggle with putting together a sound computational plan based on code performance and scaling information. In fact, for first-time PIs the most common reason for rejection is an insufficient computational plan. With this new training module we are trying to address this problem. The module attempts to answer two questions: why are scaling and performance data important and how are they used by reviewers, and how can these data be used to put together a computational plan? Currently the module is geared towards traditional HPC communities, and we are working on extending the content towards new communities. The purpose of my talk at the ECSS Symposium is to get staff members on the same page and to raise awareness that there is a new resource available that may help educate users struggling to write a successful XRAC proposal.
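
The scaling evidence reviewers expect reduces to simple arithmetic. The sketch below turns strong-scaling wall-clock times into the speedup and parallel-efficiency figures a computational plan should report; the timing values are invented for illustration.

```python
# cores -> wall-clock seconds from strong-scaling runs (made-up values)
times = {1: 1000.0, 16: 66.0, 64: 18.5, 256: 6.2}

t1 = times[1]
for n, t in sorted(times.items()):
    speedup = t1 / t          # how much faster than one core
    efficiency = speedup / n  # fraction of ideal linear scaling retained
    print(f"{n:>4} cores: speedup {speedup:6.1f}, efficiency {efficiency:5.1%}")
```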


April 21, 2020

Beginner's tutorial on cloud DevOps on Jetstream, focused on Kubernetes and JupyterHub

Presenter(s): Andrea Zonca (SDSC)

Presentation Slides

This session assumes no previous knowledge of cloud technologies and will cover the following topics (a sketch of the first step follows below):

* Example virtual machine setup with OpenStack command line tools
* Deploying a Kubernetes cluster on Jetstream
* How Kubernetes works: architecture and the differences between containers and virtual machines
* Deploying JupyterHub on Jetstream for a workshop
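
For a taste of the first step (launching a VM), here is a hedged sketch using the openstacksdk Python API rather than the CLI the tutorial uses; the cloud profile, image, flavor, network, and keypair names are placeholders.

```python
import openstack

# Reads credentials for a "jetstream" entry from clouds.yaml (assumed profile name).
conn = openstack.connect(cloud="jetstream")

server = conn.create_server(
    name="k8s-master",
    image="JS-API-Featured-Ubuntu20-Latest",  # assumed image name
    flavor="m1.medium",                       # assumed flavor
    key_name="my-keypair",                    # assumed keypair
    network="my-network",                     # assumed network
    wait=True,      # block until the VM is active
    auto_ip=True,   # attach a public IP
)
print("VM ready at", server.public_v4)
```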


March 17, 2020

AMP Gateway: A portal for atomic, molecular, and optical physics simulations

Presenter(s): Sudhakar Pamidighantam (Indiana University)

Presentation Slides

We describe the creation of a new Atomic and Molecular Physics science gateway (AMPGateway). The gateway is designed to bring together a subset of the AMP community to work collectively to make their software suites available and easier to use, by the partners as well as others. By necessity, a project such as this requires the developers to work on issues of portability, documentation, and ease of input, as well as making sure the codes can run on a variety of architectures. The gateway was built using the Apache Airavata gateway middleware framework. Initially it was deployed using the Airavata PHP client on the web, but it has since been redeployed under the Django web framework. Here we outline the organization and facility of the Django deployment, describe how it has been used, and discuss future directions for the AMP Gateway.
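
To illustrate the Django-style organization the abstract mentions (not the actual Airavata Django portal code), a gateway page typically pairs a URL route with a view; the app name, route, and data below are placeholders.

```python
# urls.py -- route an experiment-detail page to a view
from django.urls import path
from myapp import views  # "myapp" is a placeholder app name

urlpatterns = [
    path("experiments/<int:pk>/", views.experiment_detail, name="experiment-detail"),
]

# views.py -- render experiment state for the requested experiment
from django.shortcuts import render

def experiment_detail(request, pk):
    # A real gateway would query the Airavata middleware here;
    # this uses placeholder data for illustration.
    context = {"experiment_id": pk, "status": "QUEUED"}
    return render(request, "amp/experiment_detail.html", context)
```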

Bursting into the public Cloud – Sharing my experience doing it at large scale for IceCube

Presenter(s): Igor Sfiligoi (SDSC)

Presentation Slides

When compute workflow needs spike well in excess of the capacity of a local compute resource, capacity should be temporarily provisioned from somewhere else to both meet deadlines and to increase scientific output. Public Clouds have become an attractive option due to their ability to be provisioned with minimal advance notice. I have recently helped IceCube expand their resource pool by a few orders of magnitude, first to 380 PFLOP32s for a few hours and later to 170 PFLOP32s for a whole workday. In the process we moved O(50 TB) of data to and from the clouds, showing that networking is not a limiting factor, either. While there was a non-negligible dollar cost involved with each, the effort involved was quite modest. In this session I will explain what was done and how, alongside an overview of why IceCube needs so much compute.


January 21, 2020

CUDA-Python and RAPIDS for blazing fast scientific computing

Presenter(s): Abe Stern (NVIDIA)

Presentation Slides

We will introduce Numba and RAPIDS for GPU programming in Python. Numba allows us to write just-in-time compiled CUDA code in Python, giving us easy access to the power of GPUs from a powerful high-level language. RAPIDS is a suite of tools with a Python interface for machine learning and dataframe operations. Together, Numba and RAPIDS represent a potent set of tools for rapid prototyping, development, and analysis for scientific computing. We will cover the basics of each library and go over simple examples to get users started. Finally, we will briefly highlight several other relevant libraries for GPU programming.
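
As a minimal example of the kind the talk covers, here is a just-in-time-compiled elementwise CUDA kernel written entirely in Python with Numba (a generic illustration, not the speaker's material; requires a CUDA-capable GPU).

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)   # absolute thread index across the grid
    if i < x.size:     # guard threads past the end of the array
        out[i] = x[i] + y[i]

n = 1_000_000
x = np.arange(n, dtype=np.float32)
y = 2 * x
out = np.zeros_like(x)

threads = 256
blocks = (n + threads - 1) // threads
# Host arrays are copied to the GPU and written results copied back.
add_kernel[blocks, threads](x, y, out)
print(out[:4])  # [0. 3. 6. 9.]
```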


December 17, 2019

Extracting Domain Information using Deep Learning

Presenter(s): Amit Gupta (TACC)

Presentation Slides

In this session we will present an overview of our exploration of using deep learning to extract entities of interest from journal article text. Across scientific domains, extracting and curating new knowledge from large bodies of text remains a challenging task. To this end, we have developed a computational tool named DIVE (Domain Informational Vocabulary Extraction) to provide entity extraction and expert curation functionality. The tool has been integrated with the publication pipeline used by the American Society of Plant Biologists. Using the author feedback mechanism in our deployed tool, we were able to create an expert-annotated dataset based on articles submitted over an entire year. This new gold-standard dataset for supervised training now enables us to contrast several methods for the entity extraction task. We use the NeuroNER tool to investigate the effectiveness of deep neural networks for this task, and we contrast it with other tools using a variety of different methods, such as ABNER (using CRFs) and DIVE (using an ensemble of regular expression rules, keyword dictionaries, and ontology files). Our early results from NeuroNER training with author annotations show very promising improvements in predicting the important words in the documents. This makes it an excellent candidate for future development and integration into the DIVE tool.
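
To make the rule-based side of the comparison concrete, here is a toy sketch of a DIVE-style ensemble combining a regular expression with a keyword dictionary. The pattern and vocabulary are invented for illustration and are not DIVE's actual rules.

```python
import re

VOCAB = {"arabidopsis", "chloroplast", "photosystem ii"}  # toy term dictionary
GENE_PATTERN = re.compile(r"\b[A-Z]{2,5}\d{1,3}\b")       # toy gene-symbol regex

def extract_entities(text):
    """Tag regex hits as GENE and dictionary matches as TERM."""
    entities = [(m.group(), "GENE") for m in GENE_PATTERN.finditer(text)]
    lowered = text.lower()
    entities += [(term, "TERM") for term in sorted(VOCAB) if term in lowered]
    return entities

print(extract_entities("Expression of ABI5 in Arabidopsis affects Photosystem II."))
# -> [('ABI5', 'GENE'), ('arabidopsis', 'TERM'), ('photosystem ii', 'TERM')]
```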

The Distant Reader: Reading at scale

Presenter(s): Eric Lease Morgan (Notre Dame)

The Distant Reader is a tool for reading. It takes an arbitrary amount of unstructured data (text) as input, and it outputs sets of structured data for analysis -- reading. Given a corpus of just about any size (hundreds of books or thousands of journal articles), the Distant Reader analyzes the corpus, and outputs a myriad of reports enabling the researcher to use & understand the corpus. Designed with college students, graduate students, scientists, or humanists in mind, the Distant Reader is intended to supplement the traditional reading process. This presentation outlines the problems the Reader is intended to address as well as the way it is implemented on the Jetstream platform with the help of both software and personnel resources from XSEDE. The Distant Reader is freely available for anybody to use at https://distantreader.org


October 15, 2019

On Developing Reusable Software Components for the Advanced Cyberinfrastructure

Presenter(s): Ritu Arora (TACC)

Presentation Slides

Developing reusable software components that can be integrated into unforeseen software projects has the potential to enhance the productivity of the programmers who reuse them. However, the initial cost of developing such components can be higher than developing components for a single use case. In this talk, we will discuss a couple of reusable software components that were developed for the BOINC@TACC and Gateway-In-a-Box (GIB) projects. One component, Greyfish, is a portable, cloud-based filesystem. Another, Midas, is a tool for automating the generation of Docker images from source code. Both components were initially prototyped for predefined needs and were tightly coupled with the other components they interoperated with. However, after determining that the effort involved in teasing out these components and making them available as stand-alone software was insignificant and could help with the sustainability goals of the aforementioned projects, we refactored them and wrote clear documentation for installing and using them. Doing this helped us improve the software quality: people in the community started using the software, which helped us fix some bugs and improve the documentation. In summary, there is often a direct or indirect cost involved in making software reusable, and this cost may vary from project to project. However, the long-term sustainability and maintenance needs of the project may far outweigh the cost associated with software reusability.
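
To sketch what a Midas-like "source code to Docker image" step involves (this is not Midas's actual implementation), the snippet below emits a Dockerfile for a Python project and builds it with the Docker CLI; the paths, tag, and entry point are placeholders.

```python
import subprocess
from pathlib import Path

def build_image(src_dir: str, tag: str) -> None:
    """Generate a Dockerfile for a Python project and build the image."""
    dockerfile = Path(src_dir) / "Dockerfile"
    dockerfile.write_text(
        "FROM python:3.11-slim\n"
        "WORKDIR /app\n"
        "COPY . /app\n"
        "RUN pip install --no-cache-dir -r requirements.txt\n"
        'CMD ["python", "main.py"]\n'  # assumes the project's entry point is main.py
    )
    # Requires the docker CLI to be installed and the daemon running.
    subprocess.run(["docker", "build", "-t", tag, src_dir], check=True)

build_image("./my-project", "my-project:latest")  # hypothetical project path
```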

Exploring the Dynamics of a Quantum-Mechanical Compton Generator

Presenter(s): Marty Kandes (SDSC)

Presentation Slides

In 1913, while still an undergraduate, American physicist Arthur Compton invented a simple way to measure the rotation rate of the Earth with a tabletop-sized experiment, independent of any astronomical observation. The experiment consisted of a large diameter circular ring of thin glass tubing filled with water and oil droplets. After placing the ring in a plane perpendicular to the surface of the Earth and allowing the fluid mixture of oil and water to come to rest, Compton then abruptly rotated the ring, flipping it 180 degrees about an axis passing through its own plane. The result of the experiment was that the water acquired a measurable drift velocity due to the Coriolis effect arising from the daily rotation of the Earth about its own axis. Compton measured this induced drift velocity by observing the motion of the oil droplets in the water with a microscope. This device, now named after him, is known as a Compton generator. The fundamental research objective of this XSEDE project is to explore the dynamics of a quantum-mechanical analogue to the classical Compton generator experiment through the use of numerical simulations. In this presentation, I describe how the physics of the problem itself drives many of the computational challenges in the simulations; what numerical methods and computational techniques were implemented in the custom simulation code written to explore the problem (and other quantum systems in rotating frames of reference); the performance characteristics and limitations of this code; some challenges in creating a post-simulation visualization pipeline; as well as the latest results and future directions of the project.


September 17, 2019

The "Morelli Machine": A Proposal Testing a Critical, Algorithmic Approach to Art History

Presenter(s): Paul Rodriguez (SDSC)

Presentation Slides

The Morelli Machine refers to a late-19th-century algorithmic approach to characterizing authorship, which proposed that the fine details of minor items in a painting reveal an artist's particular style. The PIs set out to test the hypothesis that contemporary computer vision techniques could perform this sort of "stylistic" matching. To do this, they sought to mechanize a method that is indigenous to art history and that uses details as a proxy for style. This project approached the question of "style" as one of extracting features that have some discriminatory power for distinguishing paintings or groups of paintings. We used feature discovery from a convolutional network (VGG19) pretrained for object recognition. We processed both whole images and certain classes of image parts (i.e., mouths), and performed clustering. In this presentation I will review the image preparation steps, extraction steps, clustering results, and cluster evaluation. The upshot is that all convolutional layers indeed have discriminatory features, and different layers may have different kinds of features, with different interpretability that may be hard to define.
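
A condensed sketch of this kind of pipeline follows: extract activations from an intermediate layer of a pretrained VGG19 and cluster the pooled feature vectors. The layer cut, cluster count, and file names are illustrative assumptions, not the project's actual choices.

```python
import torch
from torchvision import models, transforms
from sklearn.cluster import KMeans
from PIL import Image

# Truncate VGG19 at an intermediate convolutional layer (index 21 is arbitrary here).
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features[:21].eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def features(path):
    """Average-pool one image's activation map into a fixed-length vector."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        fmap = vgg(img)                       # (1, C, H, W) activations
    return fmap.mean(dim=(2, 3)).squeeze(0)   # C-dimensional descriptor

paths = ["painting_01.jpg", "painting_02.jpg"]  # placeholder image files
X = torch.stack([features(p) for p in paths]).numpy()
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)
```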

Improving Science Gateways usage reporting for XSEDE

Presenter(s): Amit Chourasia (SDSC)

Presentation Slides

Science domain-specific gateways have gained wide use by providing easy web-based access to complex cyberinfrastructure, and Science Gateways are consuming an increasing proportion of the computational capacity provided by XSEDE. A typical approach is for a Science Gateway to use a single community account with a compute allocation to process compute jobs on behalf of its end users. The computational usage of Science Gateways is compiled from batch job submission systems and reported by the XSEDE service providers. However, this reporting does not capture information about the user who actually initiated the computation, as the batch systems do not have this information. To overcome this limitation, Science Gateways use a separate pipeline to submit job-specific attributes to XSEDE, which are later joined with the batch system information submitted by the Service Providers to create detailed usage reports. In this presentation I will describe improvements to the Gateway attribute reporting system, which better serve the needs of the growing Science Gateway community and provide a simpler, streamlined way to report usage and ultimately publish this information via XDMoD.
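
Schematically, the attribute pipeline amounts to the gateway posting per-job user attributes keyed on the local batch job ID, which are later joined with the service provider's batch records. The endpoint, field names, and gateway name below are placeholders, not the real reporting API.

```python
import requests

def report_job(local_job_id: str, gateway_user: str, charge_number: str) -> None:
    """Post per-job attributes after a community-account job completes."""
    payload = {
        "gateway": "example-gateway",   # hypothetical gateway name
        "local_job_id": local_job_id,   # join key against the batch records
        "gateway_user": gateway_user,   # the end user the batch system never sees
        "charge_number": charge_number,
    }
    r = requests.post("https://reporting.example.org/api/jobs",  # placeholder URL
                      json=payload, timeout=30)
    r.raise_for_status()
```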