UEDGE Performance
Instrumentation
To instrument UEDGE with TAU, we found it easiest to wrap each compiler in a shell script that calls the corresponding TAU compiler script.
For example, gcc becomes:
#!/bin/bash
# unwrap
export PATH=`echo "$PATH" | sed -e "s@\`dirname $0\`@@g"`
tau_cc.sh "$@"
g++ becomes:
#!/bin/bash
# unwrap
export PATH=`echo "$PATH" | sed -e "s@\`dirname $0\`@@g"`
tau_cxx.sh "$@"
and gfortran:
#!/bin/bash
# unwrap
export PATH=`echo "$PATH" | sed -e "s@\`dirname $0\`@@g"`
tau_f90.sh "$@"
After creating these files, simply change your make invocation to:
PATH=<path/to/wrappers>:$PATH make install
UEDGE binaries will now be built with TAU.
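The three wrappers differ only in the TAU script they call, so they can be generated in one step. The following is a minimal sketch, not part of TAU or UEDGE: the function name make_tau_wrappers and its directory argument are hypothetical. Each generated file has the same content as the wrappers listed above. The "# unwrap" line removes the wrapper directory from PATH, presumably so that the TAU script finds the real compiler rather than the wrapper again.

```shell
#!/bin/bash
# Sketch: write the gcc, g++, and gfortran wrappers into one directory.
# make_tau_wrappers is a hypothetical helper name, not a TAU or UEDGE command.
make_tau_wrappers() {
    local dir="$1" pair name tau_script
    mkdir -p "$dir" || return 1
    for pair in gcc:tau_cc.sh g++:tau_cxx.sh gfortran:tau_f90.sh; do
        name="${pair%%:*}"        # compiler name that the build system invokes
        tau_script="${pair#*:}"   # TAU compiler script that takes its place
        # The quoted 'EOF' keeps $PATH and $0 literal in the generated file.
        cat > "$dir/$name" <<'EOF'
#!/bin/bash
# unwrap
export PATH=`echo "$PATH" | sed -e "s@\`dirname $0\`@@g"`
EOF
        # Last line of the wrapper: hand all arguments to the TAU script.
        echo "$tau_script \"\$@\"" >> "$dir/$name"
        chmod +x "$dir/$name"
    done
}
```

For example, after calling make_tau_wrappers "$HOME/tau-wrappers" (an illustrative location), the build would be started with PATH=$HOME/tau-wrappers:$PATH make install.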
Running with the Python wrapper
Whether or not UEDGE was built with TAU, you can still run it through the Python profiling interface connected to TAU. Use the following python wrapper script:
#!/bin/bash
LOC=`dirname $0`
# unwrap
export PATH=`echo "$PATH" | sed -e "s@\`dirname $0\`@@g"`
if [ "x$TAU_OUTPUT" = "x" ]; then
    TAU_OUTPUT=$HOME/datarepo/uedge
fi
if [ "$1" = "-c" ] ; then
    python "$@"
else
    # Module name: strip the trailing .py and replace hyphens, which Python cannot import
    ARG=`echo "$1" | sed -e 's/\.py$//' -e 's/-/_/g'`
    cp "$1" "$ARG.py"
    cat "$LOC/taurun.manual.py.skel" | sed -e "s/@PROGRAM@/$ARG/" > taurun.py
    export PROFILEDIR=$TAU_OUTPUT/$ARG
    mkdir -p "$PROFILEDIR"
    python taurun.py
fi
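The wrapper turns the script name into a module name because the generated taurun.py imports the script, and a Python import statement cannot name a module containing a hyphen. The sketch below isolates that step with the ".py" pattern escaped and anchored to the end of the name. The helper name script_to_module and the file names are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# script_to_module is a hypothetical helper that isolates the ARG= step
# of the wrapper: script file name in, importable module name out.
script_to_module() {
    # Remove only a trailing ".py", then turn every "-" into "_".
    echo "$1" | sed -e 's/\.py$//' -e 's/-/_/g'
}

script_to_module rdf-test.py    # prints rdf_test
script_to_module happy.py       # prints happy
```

An unescaped, unanchored pattern such as s/.py//g is riskier: the dot matches any character, so a name like happy.py would be reduced to ha.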
You can choose between taurun.manual.py.skel and taurun.auto.py.skel. The 'auto' script instruments all Python functions. However, nearly all of the time is spent in non-Python routines, so we found this unnecessary.
taurun.manual.py.skel:
#!/usr/bin/env python
import tau
import sys
import pytau
x = pytau.profileTimer("python")
print("taurunner(manual) begin")
pytau.start(x)
import @PROGRAM@
pytau.stop(x)
print("taurunner(manual) end")
taurun.auto.py.skel:
#!/usr/bin/env python
import tau
import sys
def main():
    print("taurunner begin")
    import @PROGRAM@
    print("taurunner end")

tau.run('main()')
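The wrapper shown earlier always reads taurun.manual.py.skel, so switching to the 'auto' skeleton means editing that line. One possible alternative, sketched here under assumptions, is to pass the skeleton name as a parameter. The function render_taurun and its "mode" argument are hypothetical and not part of the original wrapper. The @PROGRAM@ substitution is the same sed expression the wrapper uses.

```shell
#!/bin/bash
# render_taurun is a hypothetical helper: it fills the @PROGRAM@ placeholder
# of the chosen skeleton and writes the resulting Python script to stdout.
#   $1  directory holding the .skel files (LOC in the wrapper)
#   $2  module name to import (ARG in the wrapper)
#   $3  "manual" (default) or "auto"
render_taurun() {
    local loc="$1" module="$2" mode="${3:-manual}"
    local skel="$loc/taurun.$mode.py.skel"
    # Fail early when the requested skeleton does not exist.
    [ -f "$skel" ] || { echo "missing skeleton: $skel" >&2; return 1; }
    sed -e "s/@PROGRAM@/$module/" "$skel"
}
```

Inside the wrapper this could replace the cat | sed line, for example as render_taurun "$LOC" "$ARG" auto > taurun.py.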
Results
Running both the uninstrumented UEDGE and instrumented UEDGE for all of the make check test cases results in the following runtimes:
For the instrumented version, we can see the time split up among various native routines:
Individual profiles for some of the more time-consuming test cases:
rdftest3:
rdftest4_1:
rdftest5_1:
rdftest7_3:






