How do I tune small messages in Open MPI v1.1 and later versions?
Starting with Open MPI version 1.1, “short” MPI messages are sent, by default, via RDMA to a limited set of peers (for versions prior to v1.2, only when the shared receive queue is not used). This provides the lowest possible latency between MPI processes.
However, this behavior is not enabled between all process peer pairs because it can quickly consume large amounts of resources on nodes. Specifically, memory must be individually pre-allocated for each process peer to perform small message RDMA; for large MPI jobs, this can quickly cause individual nodes to run out of memory. Outside the limited set of peers, send/receive semantics are used, which generally incur greater latency but do not consume as many system resources.
This behavior is tunable via several MCA parameters:
• btl_openib_use_eager_rdma (default value: 1): The default of 1 means that the small message behavior described above (RDMA to a limited set of peers, send/receive to everyone else) is enabled.
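As a rough illustration of how such a parameter might be set at launch time, the sketch below builds an mpirun command line that passes btl_openib_use_eager_rdma through mpirun's --mca option. The helper name mpirun_command, the application path ./my_mpi_app, and the process count are hypothetical, and the assumption that a value of 0 turns small message RDMA off (falling back to send/receive for all peers) should be checked against the documentation for your Open MPI version.

```python
import shlex


def mpirun_command(app, nprocs, use_eager_rdma=True):
    """Build an mpirun argv that sets btl_openib_use_eager_rdma explicitly.

    Hypothetical helper for illustration; only the MCA parameter name and
    the generic `--mca <name> <value>` form come from Open MPI itself.
    """
    return [
        "mpirun",
        "-np", str(nprocs),
        # 1 keeps the default small message behavior; 0 is assumed to disable it.
        "--mca", "btl_openib_use_eager_rdma", "1" if use_eager_rdma else "0",
        app,
    ]


if __name__ == "__main__":
    # "./my_mpi_app" is a placeholder application name.
    print(shlex.join(mpirun_command("./my_mpi_app", 4, use_eager_rdma=False)))
```

Building the argument list first and joining it only for display keeps the command safe to hand to a process launcher without extra shell quoting.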