How can I easily preserve drained node information between major SLURM updates?
Major SLURM updates generally change the state save files and communication protocols, so a cold start (without preserved state) is usually required. If you have nodes in a DRAIN state and want to keep that information across the upgrade, you can easily build a script with the sinfo command. The following command line reports the Reason field for every node in a DRAIN state and writes the output as scontrol commands that can be executed later to restore that state:

sinfo -t drain -h -o "scontrol update nodename='%N' state=drain reason='%E'"

For example, redirect this output to a file before the upgrade, then run that file with sh after the cold start.

Why doesn't the HealthCheckProgram execute on DOWN nodes?
Hierarchical communications are used to send the message that triggers the HealthCheckProgram. If there are DOWN nodes in the communications hierarchy, messages must be re-routed. This limits SLURM's ability to tightly synchronize execution of the HealthCheckProgram across the cluster, which could adversely affect the performance of parallel applications. CRON or node startup scripts may be better suited to ensuring that the HealthCheckProgram runs on nodes that are DOWN in SLURM.
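As a rough illustration of the CRON approach, the sketch below is a hypothetical wrapper (the name run_healthcheck.sh, its install path, and the --run argument are all made up for this example) that looks up the HealthCheckProgram path in slurm.conf and runs it. It assumes slurm.conf is at /etc/slurm/slurm.conf unless the SLURM_CONF environment variable points elsewhere (the default path depends on how SLURM was built), and that HealthCheckProgram is written as a simple Key=Value line.

```shell
#!/bin/sh
# Hypothetical cron wrapper so the site's HealthCheckProgram also runs on
# nodes that SLURM considers DOWN. A sketch, not an official SLURM tool.
# Assumption: slurm.conf is at /etc/slurm/slurm.conf unless SLURM_CONF is set.
# Example /etc/crontab entry (hypothetical path and interval):
#   */10 * * * * root /usr/local/sbin/run_healthcheck.sh --run

# Print the HealthCheckProgram value from the slurm.conf given as $1.
# slurm.conf keywords are case-insensitive; text after "#" is a comment.
get_healthcheck_program() {
    awk '
        { sub(/#.*/, "") }                       # strip comments
        {
            eq = index($0, "=")
            if (eq == 0) next                    # not a Key=Value line
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)     # trim whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (tolower(key) == "healthcheckprogram") { print val; exit }
        }' "$1"
}

# Locate the configured program and execute it, returning its exit status.
run_healthcheck() {
    conf=${SLURM_CONF:-/etc/slurm/slurm.conf}
    prog=$(get_healthcheck_program "$conf")
    if [ -z "$prog" ] || [ ! -x "$prog" ]; then
        echo "no executable HealthCheckProgram found in $conf" >&2
        return 1
    fi
    "$prog"
}

# Only run when invoked with --run, so the functions can be sourced and tested.
if [ "${1:-}" = "--run" ]; then
    run_healthcheck
    exit $?
fi
```

On nodes that are up, slurmd may also run the same program on its HealthCheckInterval, so the health check program should tolerate being run twice in a short period.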