Go Above and Beyond Using the HPCC’s New Scavenger Queue
On May 11th, 2022, ICER will deploy a new 'scavenger' queue that allows users to run preemptible jobs on idle cores. Scavenger jobs are not subject to the regular queues’ limits on running jobs and do not count against yearly usage totals, but they may be interrupted at any time to allow general or buy-in jobs to run. The general and buy-in queues will continue to function as-is, and no change is required for users who do not want to use the new queue.
With few exceptions, each researcher using the HPCC is limited to running up to 520 jobs or 1,040 cores at one time. Annually, non-buy-in users are limited to a total of 500,000 CPU hours and 10,000 GPU hours. These limits will not apply to jobs submitted to the scavenger queue. Jobs in this queue can start on resources that would otherwise sit idle, which improves research throughput.
As in the general-long queue, scavenger jobs can request a wall time of up to 7 days. Unlike jobs in that queue, however, they may be interrupted whenever their resources are needed for non-scavenger jobs. By default, an interrupted job is requeued, but users can opt for cancellation instead if that better suits their workflow.
The new queue will be available following the upcoming maintenance outage on May 11th. We recommend it only for users whose jobs can checkpoint and restart, or whose workflows can otherwise handle jobs being canceled or requeued.
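For readers unsure what "checkpoint and restart" means in practice, the following is a minimal sketch in Python, not an ICER-provided tool. It assumes a job whose progress can be captured in a small JSON file; the file layout, the function names, and the `interrupt_at` parameter (used here only to simulate an interruption) are all hypothetical. The job saves its state after every step, so a requeued job resumes where it stopped instead of starting over.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place.

    The rename is atomic, so an interruption during the write cannot
    leave a half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(path, n_steps, interrupt_at=None):
    """Sum the integers 0..n_steps-1, checkpointing after each step.

    interrupt_at is a hypothetical stand-in for preemption: the function
    returns None just before that step would run.
    """
    state = load_checkpoint(path)
    while state["next_step"] < n_steps:
        if interrupt_at is not None and state["next_step"] == interrupt_at:
            return None  # simulated interruption; progress is already on disk
        state["total"] += state["next_step"]  # the "work" for this step
        state["next_step"] += 1
        save_checkpoint(path, state)
    return state["total"]
```

Calling `run("checkpoint.json", 10, interrupt_at=4)` stops after four steps, and a later call to `run("checkpoint.json", 10)` picks up at step 4 and finishes the sum. A real job would checkpoint less often than every step if writing its state is expensive, trading some repeated work after an interruption for lower overhead.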
If you would like help determining whether your workflow is a good match for this queue, or if you have any questions, please contact us.
ICER System Administrator