NUBEAM tracing diagrams from Franklin

The following charts were generated with TAU from a 4-processor run on Franklin.

Trace overview

First, here is a view of the entire 41-second trace of the 4-processor run on Franklin. Some routines have been given their own color, as noted in the legend on the right; all other routines are grouped into "OTHER" and colored black. At both the beginning and the end of the trace, nodes 1-3 are waiting for node 0.

Red is MPI_Allreduce and purple is MPI_Reduce. This trace uses the version of NUBEAM that includes Kumar's change, which replaced some MPI_Allreduce calls with MPI_Reduce.
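To make the difference between the two calls concrete, here is a toy single-process model in Python of what each rank holds afterwards. It is not MPI code: the function names and the per-rank values are hypothetical, and the root is assumed to be rank 0. MPI_Allreduce leaves the combined result on every rank, while MPI_Reduce leaves it only on the root.

```python
import operator
from functools import reduce


def allreduce(values, op=operator.add):
    """Toy model of MPI_Allreduce: every rank receives the combined result."""
    total = reduce(op, values)
    return [total] * len(values)


def reduce_to_root(values, root=0, op=operator.add):
    """Toy model of MPI_Reduce: only `root` receives the result; other ranks get None."""
    total = reduce(op, values)
    return [total if rank == root else None for rank in range(len(values))]


# One hypothetical contribution per rank on a 4-rank run.
contributions = [1.5, 2.0, 0.5, 1.0]
print(allreduce(contributions))       # [5.0, 5.0, 5.0, 5.0]
print(reduce_to_root(contributions))  # [5.0, None, None, None]
```

Replacing MPI_Allreduce with MPI_Reduce is only valid where the non-root ranks never use the reduced value. Because those ranks do not need the result, depending on the MPI implementation they may also be able to leave the call without waiting for the slowest rank.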

At the beginning of the trace there are several MPI_Allreduce calls. Some are fairly well synchronized and cause no delay; others are not. Zooming in shows three calls that cause significant delay on nodes 1-3. For example, node 0 spends 0.92 seconds in PS_NAMEL_STRIP while the other nodes wait in MPI_Allreduce().
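The waiting follows from the semantics of MPI_Allreduce: every rank's result depends on every rank's contribution, so no rank can leave the call before the last rank has entered it. The sketch below is a simplified model under that assumption (it ignores the communication cost of the reduction itself). The arrival times are hypothetical, chosen so that ranks 1-3 reach the call at t = 0 and rank 0 arrives 0.92 s later, mirroring the PS_NAMEL_STRIP example.

```python
def collective_wait_times(arrivals):
    """Per-rank wait in a collective that no rank can leave until all ranks arrive.

    Each rank waits from its own arrival until the last rank arrives.
    Transfer time for the reduction itself is ignored in this model.
    """
    last = max(arrivals)
    return [last - t for t in arrivals]


def straggler(arrivals):
    """Rank that arrives last, i.e. the rank everyone else is waiting for."""
    return arrivals.index(max(arrivals))


# Hypothetical arrival times in seconds, indexed by rank.
arrivals = [0.92, 0.0, 0.0, 0.0]
print(collective_wait_times(arrivals))  # [0.0, 0.92, 0.92, 0.92]
print(straggler(arrivals))              # 0
```

In this model a single slow rank costs every other rank the full length of its delay, which is why one 0.92 s stall on node 0 shows up as 0.92 s of MPI_Allreduce time on each of nodes 1-3.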

http://www.nic.uoregon.edu/~amorris/facets/nubeam/nubeam-trace-overview.png

Desynchronization

These zoomed-in views show that the time nodes 1-3 spend in MPI_Allreduce() is caused by node 0 being delayed in the EZCDF I/O routines:

http://www.nic.uoregon.edu/~amorris/facets/nubeam/nubeam-trace-13-17-detail-noted.png