1. What is special about MPI performance analysis?
Synchronization among the MPI processes can be a key performance concern. For example, if a serial program spends a lot of time in function foo(), you should optimize foo(). In contrast, if an MPI process spends a lot of time in MPI_Recv(), the optimization target is probably not MPI_Recv() at all. The process is most likely waiting for a message that another process has not yet sent, so you should probably be looking at some other process altogether. Ask yourself, “What is happening on the other processes while this process has the long wait?”

Another issue is that a parallel program (in the case of MPI, a multi-process program) can generate much more performance data than a serial program, because it has many more threads of execution. Managing that volume of data can be a challenge.

2. What are “profiling” and “tracing”?

These terms are sometimes used to refer to two different kinds of performance analysis. In profiling, one aggregates statistics at run time, e.g., the total amount of time spent in MPI, the total number of messages or bytes sent, etc. Data volumes a