Why is my MPI job failing due to the locked memory (memlock) limit being too low?
By default, Slurm propagates all of your resource limits at the time of job submission to the spawned tasks. This behavior can be disabled for specific limits using the PropagateResourceLimitsExcept option in the slurm.conf file. For example, PropagateResourceLimitsExcept=MEMLOCK prevents a user's locked memory limit on a login node from being propagated to the compute nodes running their parallel job.

If a user's resource limit is not propagated, the limit in effect for the slurmd daemon is used for the spawned job. A simple way to control this is to ensure that user root has a sufficiently large locked memory limit on the compute nodes (see "man limits.conf") and that slurmd takes full advantage of that limit, e.g. by adding "ulimit -l unlimited" to the /etc/init.d/slurm script used to start slurmd.
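The pieces above can be sketched as the following configuration fragments. This is an illustrative example, not a drop-in configuration: file paths (such as the /etc/init.d/slurm init script) and limit values vary by distribution and site.

```shell
# slurm.conf: do not propagate the submitting user's MEMLOCK limit;
# spawned tasks will inherit slurmd's limit instead.
PropagateResourceLimitsExcept=MEMLOCK

# /etc/security/limits.conf on the compute nodes: give root an
# unlimited locked-memory limit (see "man limits.conf").
root    soft    memlock    unlimited
root    hard    memlock    unlimited

# /etc/init.d/slurm (or your site's slurmd startup script), before
# slurmd is launched, so the daemon runs with the raised limit:
ulimit -l unlimited
```

After restarting slurmd, you can confirm the limit a job actually sees with something like `srun bash -c 'ulimit -l'`, which should report "unlimited".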