User Support & Documentation
Data Storage

File Management Overview
Users control where their data resides; selecting appropriate storage is important for efficient management of job output. Data can be stored in a user's home directory, in a temporary location, or in archival (mass) storage.
It is helpful to familiarize yourself with the characteristics of file systems at locations where you choose to store data in order to make the best choices:

Speed
In general, if you plan to move data across sites, use a fast file system (that is, a parallel file system) for temporary storage of intermediate to large quantities of data.

Visibility
The underlying storage locations mounted at each site differ. Environment variables in common use across all sites provide a common syntax for referring to the storage available at each site, hiding the differences in the underlying paths. See the Environment page of this guide for more information on environment variables. However, to move data from one resource to another, you must specify explicit paths rather than environment variables.

Quotas
Home directories at each TeraGrid site have enforced quotas. Scratch and parallel file systems share the total space. In the table below, the space available on scratch and parallel file systems depends on concurrent use. Use the df command to display available space before sending large data outputs to these file systems.

Backup & Purge Policies
Home directories at each site are backed up; scratch, parallel file systems, and archival storage are not. Regardless of the backup and purge policies at individual sites, users are advised to back up valuable data frequently. Backups may be made to the user's local system or to other archival storage on the TeraGrid. In the case of HPSS, requesting two copies of data creates copies on two separate tapes; note that both tapes are physically stored in the same facility. See the Data Storage Policies and Specs table below for the backup and purge policies at each site. To see the detailed data policy of the site you are logged into, type tg-policy -data on the command line. Examples of running tg-policy at various sites are also available.
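
The Quotas advice above (check available space before sending large outputs to a scratch or parallel file system) can also be applied from inside a job script. The following is a minimal Python sketch of that check, not part of any TeraGrid tooling. The function names, the 10% safety margin, and the example directory and size are all assumptions made for illustration. It reports free space on the file system as a whole and does not know about per-user quotas, and because scratch space is shared, the answer can change while a job runs.

```python
import shutil


def free_bytes(path):
    """Return the bytes currently available on the file system holding `path`.

    Comparable to the "Available" column of `df` for that file system.
    Per-user quotas are NOT taken into account.
    """
    return shutil.disk_usage(path).free


def has_room(path, needed_bytes, safety_factor=1.1):
    """Return True if `path` has room for `needed_bytes` plus a margin.

    `safety_factor` is an illustrative assumption (10% headroom), chosen
    because other jobs may be writing to the same shared space.
    """
    if needed_bytes < 0:
        raise ValueError("needed_bytes must be non-negative")
    return free_bytes(path) >= needed_bytes * safety_factor


if __name__ == "__main__":
    output_dir = "."                # hypothetical output directory
    expected_size = 5 * 1024 ** 3   # hypothetical 5 GiB of job output
    if has_room(output_dir, expected_size):
        print("Enough space is available right now.")
    else:
        print("Not enough space; choose another file system or clean up.")
```

A check like this only reduces the chance of a failed write; running df by hand before submitting a large job remains the simplest approach.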

Data Storage File Systems & Policies
Review the table below as well as the Resource page and details at specific sites.
The TeraGrid provides temporary storage locations and permanent (archival or mass) storage, either in a UNIX-like file system (UniTree at NCSA or HPSS at SDSC) or in a collection that is organized for searching and sharing (SRB). This guide points out where interfaces to data and transfer commands vary and provides links to more detailed explanations at individual sites. Please call or send e-mail to the Help Desk for individualized assistance.

Archival Storage
A wide range of data resources is available to researchers; compute allocations include access to a limited amount of archival storage. For larger collections, long-term preservation of data, and staging of data collections, independent data resource collections are available. See the Data Resources page for details and links to allocations information.

Storage Resource Broker (SRB)
The Storage Resource Broker, a data management tool, may be used for storage, replication, archiving, third-party copying, and movement of large TeraGrid data sets across distributed, heterogeneous storage systems. It uses its own set of commands. SRB is available to users in two ways: through a compute allocation, which automatically entitles users to an SRB account, or through an independent data allocation for storing data and data collections on tape or disk, which does not require a compute allocation. Any TeraGrid user can use an SRB client to download data that is available in public collections.

SRB servers and clients on TeraGrid

Obtaining an SRB account (SDSC)
To activate or request an SRB account, go to the SDSC SRB Account Activation page. SDSC users who have received a compute allocation automatically receive SRB accounts, but you must activate your account before using it. For instructions specific to compute allocations, go to the SDSC SRB Account Activation page, select "I have a compute allocation, and I will be using gsi-enabled authentication (certificate proxies)," and then click "View Instructions". Users with only data/SRB allocations should select "I have an SRB data allocation."

Using your SRB account

High Performance Storage System (HPSS)
HPSS is available at SDSC and IU. Each site has an HPSS user guide with additional information.
SDSC

golem - PSC
Golem is an SGI Origin 300 that runs a combined disk and tape archival system. Files moved to golem initially reside on disk. Factors such as file size and time of last access determine when a file is migrated to tape. When you access a migrated file, it is automatically read back in from tape. Golem supports GridFTP transfers using the address tg-gridftp.psc.teragrid.org. For more information, including common commands used for file transfer, visit the golem user guide.

DiskXtender Mass Storage System (MSS) - NCSA
TeraGrid users may also use the DiskXtender mass storage system, housed at NCSA, for permanent storage of large file sets. A proxy certificate is required for access. The globus-url-copy (see the Grid-FTP page or globus-url-copy at NCSA) and UberFTP transfer methods are supported. Note that gsiscp is not supported.

Data Migration Facility - TACC
To provide long-term, reliable data storage, TACC operates a four-processor SGI Origin 2000 with four gigabytes of fast, dynamic RAM and 1.3 terabytes of high-performance, high-availability Fibre Channel RAID-3 disks. This archive system is configured for dedicated file service and uses SGI's Data Migration Facility (DMF) to migrate files to a tape archival system. The disk farm on the Origin 2000 acts as a cache for recently accessed files. Files are permanently stored in two StorageTek PowderHorn 9310 automated cartridge systems. TACC's data archive is exported as a file system named /archive, which is mounted on each supercomputer at TACC via Network File System (NFS). Fast access from the high performance computing (HPC) machines at TACC is provided via the 800 megabit per second HiPPI (High Performance Parallel Interface) local area network. See the TACC archival user guide for more information.
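
As an illustration of the GridFTP transfers to golem described above, and of the earlier point that transfers between resources need explicit paths rather than environment variables, the following Python sketch assembles a globus-url-copy command line without running it. It is a sketch under stated assumptions, not a site-provided tool: the helper names, the SCRATCH_DIR variable, and the remote path are hypothetical, and only the basic "globus-url-copy source-URL destination-URL" form is used. Running the resulting command is assumed to require a valid proxy certificate.

```python
import os
from string import Template
from urllib.parse import quote

# GridFTP address for golem, as given in this guide.
GOLEM_GRIDFTP_HOST = "tg-gridftp.psc.teragrid.org"


def explicit_path(path, env=None):
    """Expand $VARIABLES in a LOCAL path and require an absolute result.

    Raises KeyError if a variable is undefined, so a missing variable
    cannot silently produce a wrong path.
    """
    env = os.environ if env is None else env
    expanded = Template(path).substitute(env)
    if not expanded.startswith("/"):
        raise ValueError("path must be absolute: " + expanded)
    return expanded


def build_copy_command(local_path, remote_path,
                       host=GOLEM_GRIDFTP_HOST, env=None):
    """Return the argument list for a local-to-remote globus-url-copy."""
    # Local variables mean nothing on the remote resource, so the remote
    # path must already be explicit.
    if "$" in remote_path or not remote_path.startswith("/"):
        raise ValueError("remote path must be explicit and absolute")
    source = "file://" + quote(explicit_path(local_path, env))
    destination = "gsiftp://" + host + quote(remote_path)
    return ["globus-url-copy", source, destination]


if __name__ == "__main__":
    # SCRATCH_DIR and both paths below are hypothetical examples.
    demo_env = {"SCRATCH_DIR": "/scratch/alice"}
    command = build_copy_command("$SCRATCH_DIR/run1/out.dat",
                                 "/path/on/golem/out.dat", env=demo_env)
    print(" ".join(command))
```

Building the command as a list keeps each argument separate, so it can be passed to a process launcher without shell quoting problems; consult the golem user guide for the options and remote paths that actually apply.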

The TeraGrid project is funded by the National Science Foundation and includes 11 partners. Please email help@teragrid.org with questions or comments.