How can I easily preserve drained node information between major SLURM updates?

Major SLURM updates generally change the state save files and communication protocols, so a cold start (without saved state) is generally required. If you have nodes in a DRAIN state and want to preserve that information, you can build a simple script around the sinfo command. The following command line reports the Reason field for every node in a DRAIN state and writes the output in a form that can be executed later to restore that state:

sinfo -t drain -h -o "scontrol update nodename='%N' state=drain reason='%E'"

31. Why doesn't the HealthCheckProgram execute on DOWN nodes?

Hierarchical communications are used for sending this message. If there are DOWN nodes in the communications hierarchy, messages will need to be re-routed. This limits SLURM's ability to tightly synchronize the execution of the HealthCheckProgram across the cluster, which could adversely impact the performance of parallel applications. The use of CRON or node startup scripts may be better
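Returning to the drained-node question, the sinfo one-liner can be wrapped in a small save/restore script. This is only a sketch: the function names and the default file path are assumptions for illustration, not part of SLURM. It also assumes that no Reason string contains a single quote, since that would break the quoting of the generated commands.

```shell
#!/bin/sh
# Sketch: save and restore DRAIN reasons across a major SLURM upgrade.
# Function names and the default file path are illustrative assumptions.

STATE_FILE="${STATE_FILE:-/var/tmp/slurm_drain_restore.sh}"

# Write one "scontrol update" line per drained node set to STATE_FILE.
# sort -u drops duplicate lines, which can appear when a node belongs
# to more than one partition.
save_drain_state() {
    sinfo -t drain -h -o "scontrol update nodename='%N' state=drain reason='%E'" \
        | sort -u > "$STATE_FILE"
}

# Replay the saved commands once the upgraded slurmctld is running.
restore_drain_state() {
    sh "$STATE_FILE"
}
```

Call save_drain_state before stopping the old daemons, look over the generated file, and call restore_drain_state after the cold start. Keeping the generated commands in a file, instead of piping them straight to a shell, lets you check or edit the reasons before they are reapplied.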

