Does BLCR support checkpointing parallel/distributed applications?

Not by itself. However, by using checkpoint callbacks (see previous FAQ), some MPI implementations have made themselves checkpointable by BLCR. You can checkpoint/restart an MPI application running across an entire cluster of machines with BLCR, without any application code modifications, if you use one of these MPI implementations (listed alphabetically):

• LAM/MPI 7.x or later
• MPICH-V 1.0.x
• MVAPICH2 0.9.8 or later
• Open MPI 1.3 or later

See the documentation of your specific MPI for usage instructions. In almost all cases you will need to use a tool provided by the MPI implementation to request a checkpoint or restart, rather than using BLCR's cr_checkpoint and cr_restart utilities.

At this time we are aware of at least three other MPI implementations that are working on BLCR support, but our information is not always the latest. If in doubt, check the support channels of your favorite MPI implementation.

Note that any questions about using these MPI implementations with
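
As an illustration of "use the tool provided by the MPI implementation", here is a minimal sketch that only builds the command lines one would typically run with Open MPI. It assumes the Open MPI 1.3-era checkpoint/restart tools (mpirun with -am ft-enable-cr, ompi-checkpoint, ompi-restart); those names are recalled from Open MPI's own documentation, not from this FAQ, so verify them against your installed version. The helper function names, the application path, the PID, and the snapshot handle are all hypothetical.

```python
import shlex


def launch_cmd(app, nprocs):
    """Command to start an MPI job with checkpoint/restart support enabled.

    Assumption: Open MPI 1.3-era syntax, where '-am ft-enable-cr' loads the
    fault-tolerance parameter set that enables BLCR-based checkpointing.
    """
    return ["mpirun", "-am", "ft-enable-cr", "-np", str(nprocs), app]


def checkpoint_cmd(mpirun_pid):
    """Command to request a checkpoint of the whole job.

    Assumption: ompi-checkpoint takes the PID of the mpirun process,
    not the PID of an individual MPI rank.
    """
    return ["ompi-checkpoint", str(mpirun_pid)]


def restart_cmd(snapshot_handle):
    """Command to restart a job from a previously written global snapshot.

    Assumption: ompi-restart takes the snapshot handle reported by
    ompi-checkpoint.
    """
    return ["ompi-restart", snapshot_handle]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for argv in (
        launch_cmd("./my_mpi_app", 4),
        checkpoint_cmd(12345),
        restart_cmd("ompi_global_snapshot_12345.ckpt"),
    ):
        print(shlex.join(argv))
```

Note that none of these commands invoke cr_checkpoint or cr_restart directly: the MPI runtime coordinates the processes across nodes and calls into BLCR on each one. Other implementations in the list above ship their own equivalents, so check their documentation for the exact tool names.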
