Why is a node shown in state DOWN when the node has registered for service?
The ReturnToService parameter in slurm.conf controls when a DOWN node is returned to service. With the default value of 0, a node stays DOWN, even after its slurmd daemon registers, until an administrator explicitly returns it to service using the command “scontrol update NodeName=whatever State=RESUME”, where “whatever” is the node's name. Set the value to 1 to have a DOWN node automatically returned to service once its slurmd daemon registers with a valid node configuration. Note that in current Slurm releases a value of 1 applies only to nodes that were set DOWN for being non-responsive, while a value of 2 returns the node to service regardless of why it was set DOWN. See “man slurm.conf” and “man scontrol” for more details.
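As a rough illustration, the relevant slurm.conf line might look like the fragment below. This is only a sketch: the value chosen is an example, not a recommendation, and the comments summarize the behavior described in “man slurm.conf”, which should be checked against the installed version.

```
# slurm.conf fragment (illustrative only)
# ReturnToService=0  (default) node stays DOWN until an administrator resumes it
# ReturnToService=1  DOWN node returns to service when slurmd registers
#                    with a valid configuration
ReturnToService=1
```

A change to slurm.conf generally has to be propagated to the cluster and picked up by the daemons (for example with “scontrol reconfigure”) before it takes effect.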