Last update: June 25, 2020
XSEDE is now accepting Research Allocation Requests for the allocation period beginning October 1, 2020 and ending September 30, 2021. The submission period runs for one month, from June 15, 2020 through July 15, 2020. Review the new XSEDE resources and information below prior to submitting your allocation request through the XSEDE User Portal. Also, consult the Estimated Resource Amounts Available for the current XRAC meeting on the Research allocations page.
First time here? Check out the Resource Info page to learn about the resources available, and then visit the Startup page to get going! Startup, Campus Champions, and Education Allocation requests may be submitted at any time throughout the year.
NEW See the latest webinar from XSEDE, "Code Performance and Scaling". Learn the technical aspects of research allocation proposals, including how best to gather and present scaling and code performance statistics and how to estimate SU requests.
Code Performance and Scaling
Recorded: April 1, 2020
Run time: 1hr 7mins
See the XSEDE Resources Catalog for a complete list of XSEDE compute, visualization and storage resources, and more details on the new systems.
SDSC is pleased to announce its newest supercomputer, Expanse. Expanse will be a Dell integrated cluster, composed of compute nodes with AMD Rome processors and GPU nodes with NVIDIA V100 GPUs (with NVLINK), interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. The Expanse supercomputer will provide three new resources for allocation. Limits noted below are subject to change, so consult the Expanse website for the most up-to-date information.
Expanse Compute: The compute portion of Expanse features AMD Rome processors, interconnected with Mellanox HDR InfiniBand in a hybrid fat-tree topology. There are 728 compute nodes, each with two 64-core AMD EPYC 7742 (Rome) processors for a total of 93,184 cores in the full system. Each compute node features 1TB of NVMe storage, 256GB of DRAM per node, and PCIe Gen4 interfaces. Full bisection bandwidth will be available at the rack level (56 nodes) with HDR100 connectivity to each node. HDR200 switches are used at the rack level and are configured for a 3:1 over-subscription between racks. In addition, Expanse has four 2 TB large memory nodes.
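The core count quoted above follows directly from the node configuration. A minimal Python sketch of that arithmetic, using only the figures stated in the paragraph (the variable names are chosen here for illustration):

```python
# Expanse Compute figures as quoted in the paragraph above.
NODES = 728
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 64        # AMD EPYC 7742 (Rome)
NODES_PER_RACK = 56          # full bisection bandwidth is available at this level

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
total_cores = NODES * cores_per_node
cores_per_rack = NODES_PER_RACK * cores_per_node

print(f"cores per node: {cores_per_node}")      # 128
print(f"total cores:    {total_cores:,}")       # 93,184
print(f"cores per rack: {cores_per_rack:,}")    # 7,168
```

The per-rack figure is derived here rather than stated by SDSC; it simply indicates how many cores share full bisection bandwidth before the 3:1 over-subscription between racks applies.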
There are two allocation request limits for the Expanse Compute resource:
- A maximum request limit of 15M SUs, except for Science Gateway requests, which may request larger amounts (up to 30M SUs)
- A limit on the maximum size of a job set at 4,096 cores, with higher core counts possible by special request
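As a rough illustration, the two limits above can be expressed as a simple pre-submission check. This is only a sketch: the `ExpanseRequest` class and `check_request` function are hypothetical helpers invented for this example, not part of XRAS or any SDSC tool, and the constants are the limits quoted above, which are subject to change.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the list above; subject to change, so treat them as
# illustrative constants rather than authoritative values.
MAX_SUS_STANDARD = 15_000_000
MAX_SUS_GATEWAY = 30_000_000
MAX_CORES_PER_JOB = 4096


@dataclass
class ExpanseRequest:
    """Hypothetical container for the key numbers in a request."""
    requested_sus: int
    largest_job_cores: int
    is_science_gateway: bool = False


def check_request(req: ExpanseRequest) -> List[str]:
    """Return human-readable issues; an empty list means none were found."""
    issues = []
    su_limit = MAX_SUS_GATEWAY if req.is_science_gateway else MAX_SUS_STANDARD
    if req.requested_sus > su_limit:
        issues.append(
            f"Requested {req.requested_sus:,} SUs exceeds the {su_limit:,} SU limit."
        )
    if req.largest_job_cores > MAX_CORES_PER_JOB:
        issues.append(
            f"Largest job uses {req.largest_job_cores:,} cores; more than "
            f"{MAX_CORES_PER_JOB:,} cores requires a special request."
        )
    return issues


if __name__ == "__main__":
    for issue in check_request(ExpanseRequest(20_000_000, 8192)):
        print(issue)
```

A request of 20M SUs with an 8,192-core job would be flagged on both counts, while the same SU amount from a Science Gateway with jobs of at most 4,096 cores would pass.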
Expanse GPU: The GPU component of Expanse has 52 GPU nodes, each containing four NVIDIA V100s (32 GB SXM2), connected via NVLINK, and dual 20-core Intel Xeon 6248 CPUs. Each GPU node has 1.6TB of NVMe storage, 256GB of DRAM, and HDR100 connectivity.
Expanse Projects Storage: Lustre-based allocated storage will be available as part of an allocation request. The filesystem will be available on both the Expanse Compute and GPU resources. Storage resources, as with compute resources, must be requested and justified, both in the XRAS application and the proposal's main document.
Expanse will feature two new innovations: 1) scheduler-based integration with public cloud resources; and 2) composable systems, which support workflows that combine Expanse with external resources such as edge devices, data sources, and high-performance networks.
Since the Expanse AMD Rome CPUs are currently not available for benchmarking, PIs are requested to use Comet (or any comparable system) performance and scaling information in their benchmarking and scaling section. For the Expanse GPU nodes, PIs can use performance information on V100 GPUs (if available) or assume a 1.3X speedup over Comet P100 GPU (or comparable GPU) performance as a conservative estimate. The time requested must be in V100 GPU hours.
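The GPU conversion described above can be sketched in a few lines of Python. This assumes the 1.3X factor is applied by dividing measured P100 GPU hours by the speedup; the function name and the sample workload are hypothetical and used only for illustration.

```python
SPEEDUP_V100_OVER_P100 = 1.3  # conservative factor quoted in the text above


def estimate_v100_gpu_hours(p100_gpu_hours: float,
                            speedup: float = SPEEDUP_V100_OVER_P100) -> float:
    """Convert GPU hours measured on P100s into estimated V100 GPU hours.

    Assumes the same work finishes `speedup` times faster on a V100,
    so the V100 hours are the P100 hours divided by the speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return p100_gpu_hours / speedup


if __name__ == "__main__":
    # Hypothetical workload: 13,000 GPU hours benchmarked on Comet P100s.
    v100_hours = estimate_v100_gpu_hours(13_000)
    print(f"Estimated request: {v100_hours:,.0f} V100 GPU hours")
```

With the hypothetical 13,000 P100 GPU hours, the estimate comes to roughly 10,000 V100 GPU hours. PIs with actual V100 measurements should use those instead of the conversion.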
PIs requesting allocations should consult the Expanse website (https://expanse.sdsc.edu) for additional details and the most current information.
PSC's Bridges-2 platform will address the needs of rapidly evolving research by combining high-performance computing (HPC), high-performance artificial intelligence (HPAI), and high-performance data analytics (HPDA) with a user environment that prioritizes researcher productivity and ease of use.
Hardware highlights of Bridges-2 include HPC nodes with 128 cores and 256 to 512GB of RAM, scalable AI with 8 NVIDIA Tesla V100-32GB SXM2 GPUs per accelerated node and dual-rail HDR-200 InfiniBand between GPU nodes, a high-bandwidth, tiered data management system to support data-driven discovery and community data, and dedicated database and web servers to support persistent databases and domain-specific portals (science gateways).
User environment highlights include interactive access to all node types for development and data analytics; Anaconda support and optimized containers for TensorFlow, PyTorch, and other popular frameworks; and support for high-productivity languages and tools such as Jupyter notebooks, Python, R, and MATLAB, including browser-based (OnDemand) use of Jupyter, Python, and RStudio. A large collection of applications, libraries, and tools will often make it unnecessary for users to install software, and when users would like to install other applications, they can do so independently or with PSC assistance. Novices and experts alike can access compute resources ranging from 1 to 64,512 cores, up to 192 V100-32GB GPUs, and up to 4TB of shared memory.
Bridges-2 will support community datasets and associated tools, or Big Data as a Service (BDaaS), recognizing that democratizing access to data opens the door to unbiased participation in research. Similarly, Bridges-2 is available to support courses at the graduate, undergraduate, and even high school levels. It is also well-suited to interfacing to other data-intensive projects, instruments, and infrastructure.
Bridges-2 will contain three types of nodes: Regular Memory (RM), Extreme Memory (EM), and Graphics Processing Unit (GPU). These are described in turn below.
Bridges-2 Regular Memory (RM) nodes will provide extremely powerful general-purpose computing, machine learning and data analytics, AI inferencing, and pre- and post-processing. Each of Bridges-2's 504 RM nodes will consist of two AMD 7742 "Rome" CPUs (64 cores, 2.25-3.4 GHz, 3.48 Tf/s peak), 256-512 GB of RAM, 3.84 TB NVMe SSD, and one HDR-200 InfiniBand adaptor. 488 Bridges-2 RM nodes have 256 GB RAM, and 16 have 512 GB RAM for more memory-intensive applications. Bridges-2 RM nodes will be HPE Apollo 2000 Gen11 servers.
Bridges-2 Extreme Memory (EM) nodes will provide 4TB of shared memory for genome sequence assembly, graph analytics, statistics, and other applications that need a large amount of memory and for which distributed-memory implementations are not available. Each of Bridges-2's 4 EM nodes will consist of four Intel Xeon Platinum 8260M CPUs, 4 TB of DDR4-2933 RAM, 7.68 TB NVMe SSD, and one HDR-200 InfiniBand adaptor. Bridges-2 EM nodes will be HPE ProLiant DL385 Gen10+ servers.
Bridges-2 GPU nodes will be optimized for scalable artificial intelligence (AI). Each of Bridges-2's 24 GPU nodes will contain 8 NVIDIA Tesla V100-32GB SXM2 GPUs, providing 40,960 CUDA cores and 5,120 tensor cores. In addition, each GPU node will contain two Intel Xeon Gold 6248 CPUs, 512 GB of DDR4-2933 RAM, 7.68 TB NVMe SSD, and two HDR-200 adaptors. Their 400 Gbps connection will enhance scalability of deep learning training across up to 192 GPUs. The GPU nodes can also be used for other applications that make effective use of the V100 GPUs' tensor cores. Bridges-2 GPU nodes will be HPE Apollo 6500 Gen10 servers.
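The aggregate figures quoted for Bridges-2 (64,512 cores, 192 GPUs, 40,960 CUDA cores per GPU node) can be reproduced from the per-node specifications. The sketch below uses the node counts from the text; the per-GPU core counts (5,120 CUDA cores and 640 tensor cores per V100) come from NVIDIA's published V100 specifications rather than from this announcement.

```python
# Node counts and per-node figures as quoted in the text above.
RM_NODES = 504
CPUS_PER_RM_NODE = 2
CORES_PER_CPU = 64            # AMD EPYC 7742

GPU_NODES = 24
GPUS_PER_NODE = 8
HDR200_LINKS_PER_GPU_NODE = 2
GBPS_PER_HDR200_LINK = 200

# Per-GPU figures from NVIDIA's V100 specifications (not from this page).
CUDA_CORES_PER_V100 = 5120
TENSOR_CORES_PER_V100 = 640

rm_cores = RM_NODES * CPUS_PER_RM_NODE * CORES_PER_CPU
total_gpus = GPU_NODES * GPUS_PER_NODE
cuda_cores_per_node = GPUS_PER_NODE * CUDA_CORES_PER_V100
tensor_cores_per_node = GPUS_PER_NODE * TENSOR_CORES_PER_V100
gpu_node_bandwidth_gbps = HDR200_LINKS_PER_GPU_NODE * GBPS_PER_HDR200_LINK

print(f"RM cores:               {rm_cores:,}")               # 64,512
print(f"Total V100 GPUs:        {total_gpus}")               # 192
print(f"CUDA cores per node:    {cuda_cores_per_node:,}")    # 40,960
print(f"Tensor cores per node:  {tensor_cores_per_node:,}")  # 5,120
print(f"GPU node bandwidth:     {gpu_node_bandwidth_gbps} Gbps")  # 400 Gbps
```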
The Bridges-2 Ocean data management system will provide a unified, high-performance filesystem for active project data, archive, and resilience. Ocean will consist of two tiers – disk and tape – transparently managed by HPE DMF (Data Management Framework) as a single, highly usable namespace, and a third all-flash tier will accelerate AI and genomics. Ocean's disk subsystem, for active project data, is a high-performance, internally resilient Lustre parallel filesystem with 15 PB of usable capacity, configured to deliver up to 129 GB/s and 142 GB/s of read and write bandwidth, respectively. Its flash tier will provide 9M IOps and an additional 100 GB/s. The disk and flash tiers will be implemented as HPE ClusterStor E1000 systems. Ocean's tape subsystem, for archive and additional resilience, is a high-performance tape library with 7.2 PB of uncompressed capacity (estimated 8.6 PB compressed, with compression done transparently in hardware with no performance overhead), configured to deliver 50TB/hour. The tape subsystem will be an HPE StoreEver MSL6480 tape library, using LTO-8 Type M cartridges. (The tape library is modular and can be expanded, if necessary, for specific projects.)
Bridges-2, including both its compute nodes and its Ocean data management system, is internally interconnected by HDR-200 InfiniBand in a fat tree Clos topology. Bridges-2 RM and EM nodes each have one HDR-200 link (200 Gbps), and Bridges-2 GPU nodes each have two HDR-200 links (400 Gbps) to support acceleration of deep learning training across multiple GPU nodes.
Bridges-2 will be federated with Neocortex, an innovative system also at PSC that will provide revolutionary deep learning capability, accelerating training by orders of magnitude. This will complement the GPU-enabled scalable AI available on Bridges-2 and provide transformative AI capability for data analysis and to augment simulation and modeling.
More information about the Bridges-2 resource can be found at: https://www.psc.edu/bridges-2
The XRAS developers have updated the allocations submissions interface so that Renewal submissions will now have some data fields pre-populated with values from the prior submission.
The pre-populated values include the title, project roles, fields of science, keywords, and supporting grants that have not expired. Users are strongly encouraged to review these pre-filled values for any that may need updates.
Request End Dates for Education Allocations: Education Allocations can now be requested to align with the semester or training course period being taught. See the Education Allocations page for details.
Continuing this submission period, access to XSEDE storage resources along with compute resources will need to be requested and justified, both in the XSEDE Resource Allocation System (XRAS) and in the body of the proposal's main document. The following XSEDE sites will offer allocatable storage facilities:
- IU/TACC Jetstream - required when requesting IU/TACC Jetstream
- SDSC Data Oasis - required when requesting SDSC Comet or Comet GPU
- TACC Ranch - required when requesting TACC Stampede2
Storage needs have always been part of allocation requests; however, XSEDE will now be enforcing the storage awards in unison with the storage sites. Please visit XSEDE's Storage page for more information.