
When I checkpoint LAM MPI jobs, all the checkpoints wind up in my $HOME directory. Can I use a different directory?


By default, LAM/MPI stores the checkpoint files for every process in an MPI job in $HOME, unless it was configured at build time with ‘configure --with-cr-file-dir=/somewhere/else’, in which case ‘/somewhere/else’ is the default location. So rebuilding LAM is one (rather slow and painful) way to change where checkpoints are stored.

A much easier solution is to set the LAM ‘cr_base_dir’ SSI parameter for each job that should store its checkpoints in a different directory. You can do this either by setting the ‘$LAM_MPI_SSI_cr_base_dir’ environment variable to the full path of the directory you want to use, or by passing the ‘cr_base_dir’ parameter on the command line:

    $ mpirun -np 2 -ssi cr_base_dir /somewhere/else a.out

See the LAM documentation for more details, especially the “Available MPI Modules | Checkpoint/Restart of MPI Jobs” section in the User’s Guide.
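
The environment-variable route mentioned above might look something like the following sketch. The directory /tmp/lam-checkpoints is only a placeholder chosen for illustration, and creating it in advance is a precaution rather than something the LAM documentation is known to require. The launch is guarded so the script does nothing harmful on a machine without mpirun or a compiled a.out.

```shell
# Hypothetical checkpoint directory (an assumption for this example);
# replace it with the full path you actually want to use.
CKPT_DIR="/tmp/lam-checkpoints"

# Create the directory up front; whether LAM would create it on its own
# is not stated in the answer, so this is just a precaution.
mkdir -p "$CKPT_DIR"

# Set the cr_base_dir SSI parameter through the environment so that
# jobs started from this shell pick it up.
export LAM_MPI_SSI_cr_base_dir="$CKPT_DIR"

# Launch the job as in the answer, but only if both mpirun and the
# program are actually present.
if command -v mpirun >/dev/null 2>&1 && [ -x ./a.out ]; then
    mpirun -np 2 a.out
else
    echo "mpirun or ./a.out not found; would run: mpirun -np 2 a.out"
fi
```

Exporting the variable affects every job started from that shell, whereas the -ssi command-line form applies to a single mpirun invocation, so the choice depends on how widely you want the setting to apply.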
