SCEC CARC Migration

SCEC has computers and disks located at the USC Center for High Performance Computing (HPC) under a condo arrangement. USC HPC is changing to the Center for Advanced Research Computing (CARC), and the way condo computers and storage are handled will be changing. This page outlines our plans for managing this transition.

CARC Data Migration Documentation

Globus Connect endpoints on HPC and CARC disks

  • "uschpc#hpc-transfer.usc.edu" endpoint
  • "USC CARC User Directories" endpoint

Storage Summary on /home/scec-06

  • 21T johnyu
  • 3.6T kmilner
  • 538G maechlin
  • 5.3T scottcal
  • 32T wsavran

Summary of Migration Plan

  • CARC will continue to run the SCEC condo compute nodes, but not through the main CARC discovery.usc.edu system. To use the SCEC condo nodes, users will log into a system called endeavor.usc.edu and submit jobs to the condo nodes from there.
  • SCEC has two disk systems at HPC, which I'll call old (287 TB) and new (86 TB). We should plan to use the new storage for active projects, including broadband, opensha, and cybershake. We should clean up the new disk and remove any files not used for these active projects, either by transferring them off the disk or by deleting them. Then this new 86 TB system will be mounted on our condo compute nodes and on our SCEC servers (broadband, opensha, moment, shock).
  • Any files on the old storage should be migrated to the CARC /project file system, starting in the next two weeks. We will be charged for the storage we use on CARC at a rate of $40 per TB per year. We may get a period of free storage, but based on current amounts, once the charges start this will cost SCEC $10K+ per year in storage fees. Before we start to pay those fees, we have the option of removing files to reduce the charges.

Current Migration Details

On 9 October 2020, we discussed five main topics with CARC: 1) SCEC storage at HPC, 2) SCEC compute nodes at HPC, 3) shared disk systems, 4) SCEC allocations at HPC, and 5) CARC cloud computing.

SCEC (condo) disk storage:

SCEC has two disk arrays at HPC: a DDN array (287 TB total, 258 TB used; out of maintenance) and a Mellanox array (87 TB total, 64 TB used; under maintenance through 6/2022).

They want us to move the DDN data to discovery.usc.edu, onto what they call /project storage. They plan to charge us $40/TB/year. We think we may get the first year of storage at no cost, but assuming we move everything from the DDN to discovery, this data storage will cost 258 TB * $40/TB/year = $10,320/year.
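
As a quick sanity check on those numbers, the small Python calculation below reproduces the annual charge and shows how removing data before the charges begin would reduce it (the 100 TB reduction is purely hypothetical).

  # Back-of-the-envelope CARC /project storage cost, using the figures above.
  RATE_PER_TB_YEAR = 40   # $/TB/year quoted for CARC /project storage
  ddn_used_tb = 258       # TB currently used on the out-of-maintenance DDN array

  print(f"Migrate everything: ${ddn_used_tb * RATE_PER_TB_YEAR:,}/year")  # $10,320/year

  # Hypothetical: delete or archive 100 TB before the charges begin.
  print(f"After removing 100 TB: ${(ddn_used_tb - 100) * RATE_PER_TB_YEAR:,}/year")  # $6,320/year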

We can migrate the Mellanox data to discovery, and they won't charge us storage fees until our disk system goes out of maintenance.

They would like us to start migrating data from our DDN array to the /project file system on discovery.usc.edu (8 PB of storage) in the next two weeks.

SCEC (condo) compute nodes:

SCEC has 38 compute nodes at HPC. We can currently submit jobs to these nodes by logging into HPC and submitting them to a special “scec” queue.

CARC plans to make these nodes available through a new login host called endeavor.usc.edu. We will be able to log into endeavor and submit jobs to our SCEC compute nodes. The main discovery.usc.edu nodes will not be accessible from endeavor.usc.edu.
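
As a rough sketch of what job submission from endeavor might look like, the Python snippet below writes a minimal Slurm batch script and submits it with sbatch. It assumes the condo nodes will still sit behind a Slurm partition named "scec" (taken from the current queue name); the actual partition and account settings on endeavor are assumptions until CARC confirms them.

  # Sketch: build and submit a minimal Slurm job to the SCEC condo nodes.
  # Run this after logging into endeavor.usc.edu.
  # The "scec" partition name below is assumed from the current queue name.
  import subprocess

  job_script = """#!/bin/bash
  #SBATCH --job-name=scec-test
  #SBATCH --partition=scec
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --time=00:10:00

  hostname
  """

  with open("scec_test.job", "w") as fh:
      fh.write(job_script)

  result = subprocess.run(["sbatch", "scec_test.job"], capture_output=True, text=True)
  print(result.stdout.strip() or result.stderr.strip())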

Shared file systems:

SCEC currently mounts some of our DDN storage on local SCEC computers, including broadband.usc.edu, shock.usc.edu, and opensha.usc.edu. CARC will not allow us to mount their /project file system on these local SCEC servers.

However, CARC will allow us to attach our condo disk arrays that are still under maintenance to endeavor, and they will allow us to mount the storage attached to endeavor on our SCEC servers.

We discussed the following as a possible way to support the storage needs of CyberShake, OpenSHA, and broadband.

We will move all data unrelated to the active broadband, opensha, and cybershake projects off the SCEC condo disks and onto the CARC /project file system. Once files are moved to the CARC /project disks, we will delete them from our SCEC condo disks to free up space.

Then we attach the SCEC condo storage to our condo compute nodes on endeavor, and we mount that condo storage from endeavor onto our SCEC servers OpenSHA, broadband, and shock. This should give the SCEC servers (broadband, opensha, shock) access to the SCEC condo storage.

In this plan, routine users will run jobs on discovery.usc.edu, and write data to the /project file system. We can mount /project file systems on endeavor, so data produced on discovery.usc.edu will be accessible on our SCEC compute nodes.

SCEC CARC Allocations:

Previously, we wrote a USC HPC allocation request each summer and submitted it to HPC. The CARC allocation process for time on CARC nodes appears to work on an “as needed” basis: they give us a small allocation to begin with, and when we need more computing hours we submit a request with a fairly short justification of how many hours we need and why. At this point, these resource requests can be submitted at any time.

CARC Cloud Resources:

CARC expects to offer inexpensive cloud computing resources. There will be some cost, but they should cost less than AWS or other commercial services. A big advantage is that these resources will be located at CARC, will have good network links, and will be able to access the CARC /project file systems. We expect more details on the types and costs of these cloud resources, but they might be a way to upgrade our older local SCEC servers, including shock, moment, strike, broadband, CSEP, and opensha. Before we buy new SCEC servers or repair existing ones, we should evaluate the cost of these CARC cloud resources.

Related Issues

  • Local SCEC Servers
    • shock - CyberShake job submission host
    • moment - CyberShake db manager
    • strike - CyberShake job status and SCECpedia server
    • broadband - broadband development system
    • CSEP Cluster - dedicated CSEP cluster
    • opensha - opensha services including hazard mapping servlets
  • Local SCEC Storage
    • scec01
  • URL accessible SCEC files
    • hypocenter - URL-accessible project storage

Related Entries