HPCC's Big Move

Dear Users,

I’m happy to say that the “Big Move” is well underway. The following has taken place over the past month:

  • The Intel18 cluster is up and in production in the new Data Center.
  • The Intel14 cluster has been moved to the Data Center and is also back in production.
  • By the time you read this, the Intel16 cluster will also have been moved to the Data Center and returned to production.

Besides moving the main equipment to the Data Center, other changes have also taken place:

  • CentOS 7.4 has been installed on all machines, replacing the older CentOS 6. The new image has a number of advantages for running our current system.
  • We have retired Torque/Moab, which served as our scheduler for a decade. We now use only SLURM, an open-source scheduler for which we have purchased maintenance from SchedMD.
  • We have brought up a new scratch file system in the Data Center, which should provide more reliable access to small files, something the older Lustre scratch system lacked. It is an IBM GPFS/Spectrum Scale system with just under 1 PB of storage, mounted at /mnt/gs18.
  • We have put in place a new module system that, in the long run, will make it easier to do your work. This has been the most difficult change for many of you, as well as for our staff, and we are working diligently to improve it for you, our valued users.

The following work remains and will be completed over the next two months or so:

  • A new home file system has been installed in the Data Center and is being brought online. At present it has 4 PB of storage, 3 PB of which are available for general use. Migrating files from the old system to the new one will take some time; the work will be done incrementally, and you will be notified when your files are moved. File systems have been one of our most persistent problems, and this new IBM GPFS/Spectrum Scale system should go a long way toward resolving them.
  • The old Lustre scratch space, /mnt/ls15, will be moved to the Data Center in early December and will be unavailable during the move (a more detailed timeline will follow). In the process, we will significantly update its software to address the problems users currently experience. Lustre remains the best way to move large files quickly, so it will continue to play a role in the future.
  • In late November, as we prepare to move the old Lustre scratch, the default scratch locations (/mnt/scratch and $SCRATCH) will switch to the new scratch system. To avoid any downtime, you may need to migrate some files from Lustre to the new scratch system.
  • Remaining equipment (firewalls, servers, switches, etc.) will also be moved as needed.

This has been an enormous undertaking, to say the least. The HPCC staff, both system administrators and research consultants, have worked very hard to put all of this in place and, despite some hiccups, have done a tremendous job. I thank them for their efforts, and I hope you will join me.

Bill Punch
iCER Associate Director