How is job suspend/resume useful?
Job suspend/resume is most useful for starting particularly large jobs promptly and with minimal overhead. Suppose you want to start a full-system job. Normally you would need to either cancel all running jobs or wait for them to terminate. Canceling jobs discards all the work they have done since they started or since their last checkpoint. Waiting for the jobs to terminate can take hours, depending upon your system configuration. A more attractive alternative is to suspend the running jobs, run the full-system job, then resume the suspended jobs. This can easily be accomplished by configuring a special partition (queue) for full-system jobs and using a script to control the process. The script would stop the other partitions, suspend the running jobs in those partitions, and start the full-system partition. The process can be reversed when desired. One can effectively gang schedule (time-slice) multiple jobs using this mechanism, although the algorithms to do so can get quite