Important Notice: Our web hosting provider recently started charging us for additional visits, which was unexpected. In response, we're seeking donations. Depending on the situation, we may explore different monetization options for our Community and Expert Contributors. It's crucial to provide more returns for their expertise and offer more Expert Validated Answers or AI Validated Answers. Learn more about our hosting issue here.

How do I tune small messages in Open MPI v1.1 and later versions?

April 26, 2017later Messages MPI small tune versions

0

Posted

How do I tune small messages in Open MPI v1.1 and later versions?

1 Answer

0

Posted

Starting with Open MPI version 1.1, “short” MPI messages are sent, by default, via RDMA to a limited set of peers (for versions prior to v1.2, only when the shared receive queue is not used). This provides the lowest possible latency between MPI processes. However, this behavior is not enabled between all process peer pairs because it can quickly consume large amounts of resources on nodes (specifically: memory must be individually pre-allocated for each process peer to perform small message RDMA; for large MPI jobs, this can quickly cause individual nodes to run out of memory). Outside the limited set of peers, send/receive semantics are used (meaning that they will generally incur a greater latency, but not consume as many system resources). This behavior is tunable via several MCA parameters: • btl_openib_use_eager_rdma (default value: 1): These both default to 1, meaning that the small message behavior described above (RDMA to a limited set of peers, send/receive to everyone else)