What are “profiling” and “tracing”?
These terms are sometimes used to refer to two different kinds of performance analysis. In profiling, one aggregates statistics at run time: for example, the total amount of time spent in MPI, or the total number of messages or bytes sent. Data volumes are small. In tracing, an event history is collected, and it is common to show that history on a timeline display. Tracing data can provide much interesting detail, but data volumes are large.
3. How do I sort out busy wait time from idle wait, user time from system time, and so on?
Don’t. MPI synchronization delays, which are key performance inhibitors you will probably want to study, can show up as user or system time, depending on the MPI implementation, the type of wait, the run-time settings you have chosen, etc. In many cases, it makes the most sense simply to distinguish time spent inside MPI from time spent outside MPI. Elapsed wallclock time will probably be your key metric. Exactly how the MPI implementation spends tim