oprofile

Highly recommended. Click on the above link to get Jonas Larsen's presentation.

Main drawbacks:
  • Linux only
  • You must be able to load a module into the kernel, i.e. have some sort of root access.

gprof

Really, oprofile will save you a lot of time and disk space. If necessary, ask your sysadmin to let you sudo it.

Preparation

Unless you want to profile casapy itself, you will need a standalone app. For example, to profile split I use casasplit instead of casapy, because the output of gprof casapy is not useful.

Make a profilable build

Note: this will still only profile the app itself, not the shared libraries it calls into! Unless you know how to produce a profilable .so, you probably need to statically link the app. Or go try oprofile.

<pre>cd $CASAROOT/code
mkdir build_profile  # or whatever you want to call it.
cd build_profile
cmake -DCMAKE_BUILD_TYPE=Debug ..
make -j 1 VERBOSE=on              # Replace 1 with how many processors you want to use.
</pre>

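The main reason for running make with VERBOSE=on is so you can check that the compiler is using -pg. If -pg does not appear in the compile and link lines, gprof will have nothing to collect, and the flag has to be added to the build's compiler and linker flags. I have only tried this using g++ on Linux.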
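A quick way to do that check is to save the verbose make output and count the lines that carry -pg. This is only a sketch: build.log and the helper name count_pg_lines are made up for illustration and are not part of the CASA build.

```shell
# Hypothetical helper: count the lines of a saved build log that contain -pg.
count_pg_lines() {
    # -c prints the number of matching lines, -w matches -pg only as a whole word,
    # and -e stops grep from treating the pattern itself as an option.
    grep -c -w -e '-pg' "$1"
}
```

For example, run make VERBOSE=on 2>&1 | tee build.log and then count_pg_lines build.log; a result of 0 means nothing was compiled or linked with -pg.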

Run the resulting executable

Remember, from Preparation above, that you probably don't want to use casapy. Use a standalone app that exercises the section of code you are interested in, e.g. casaplotms, casasplit, or a fresh app written for the occasion.

The run will store the profiling information in gmon.out in the current directory.
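Each run overwrites gmon.out, so when comparing several runs it helps to rename the file after each one. Instrumented programs generally only write gmon.out when they exit normally, so a crashed or killed run may leave nothing behind. The helper below is a hypothetical sketch; the name run_profiled and the gmon.LABEL.out naming scheme are my own invention.

```shell
# Hypothetical helper: run a profiled program, then keep its gmon.out under a label.
# Usage: run_profiled LABEL command [args...]
run_profiled() {
    label=$1
    shift
    status=0
    "$@" || status=$?          # run the instrumented executable with its arguments
    if [ -f gmon.out ]; then
        mv gmon.out "gmon.$label.out"
    else
        echo "no gmon.out written (was the build made with -pg?)" >&2
    fi
    return "$status"
}
```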

Achieve enlightenment

<pre>gprof /absolute/path/to/the/executable
</pre>
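gprof prints a lot by default. Here is a minimal sketch for getting just the flat profile, assuming gmon.out is in the current directory unless a second argument says otherwise. The function name flat_profile and the GPROF override variable are illustrative only; -b (brief, no explanatory text) and -p (flat profile only) are standard GNU gprof options.

```shell
# Hypothetical wrapper: print only the flat profile for an executable.
# Usage: flat_profile /absolute/path/to/the/executable [gmon-file]
flat_profile() {
    exe=$1
    gmon=${2:-gmon.out}
    # GPROF can be set to point at a different gprof binary (an assumption of this sketch).
    "${GPROF:-gprof}" -b -p "$exe" "$gmon"
}
```

Piping the result through head -n 20 shows the functions with the most self time first. Replacing -p with -q prints the call graph instead.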

I/O profiling

All of the above, and most parallelization, is fairly pointless if the program is limited by I/O instead of the CPU, so it is worth checking which resource is the bottleneck. Some applications may even be network limited.
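A rough first check, before reaching for any monitor, is to compare CPU time with wall-clock time for one run. The sketch below assumes bash, whose time keyword can print that ratio directly through the %P format in TIMEFORMAT; the function name cpu_percent is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper (bash only): print the percentage of wall-clock time
# that a command spent on the CPU, i.e. (user + sys) / real.
cpu_percent() {
    local TIMEFORMAT='%P'
    # The command's own output is discarded; the timing report, which bash
    # writes to stderr, is redirected to stdout so it can be captured.
    { time "$@" >/dev/null 2>&1; } 2>&1
}
```

A value near 100 (or above, for multithreaded code) suggests the run is CPU bound. A much lower value means it spent most of its time waiting, which is often, though not always, I/O.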

Interactively

Run the program while simultaneously watching a disk monitor, a CPU monitor, and ideally a memory monitor. I like gkrellm because it does all three and more. This gives you a picture of what is happening as a function of time rather than of code. free is also useful for watching swap usage.
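If no graphical monitor is available, the same picture can be roughly approximated by logging a timestamped sample while the program runs. This is a sketch, assuming free (from procps) is installed; monitor_swap and its argument order are hypothetical.

```shell
# Hypothetical helper: run a command and, meanwhile, append swap usage to a log.
# Usage: monitor_swap LOGFILE INTERVAL_SECONDS command [args...]
monitor_swap() {
    log=$1
    interval=$2
    shift 2
    (
        while :; do
            # In free -m output, the third field of the "Swap:" row is used swap in MiB.
            printf '%s swap_used_MiB=%s\n' "$(date +%T)" \
                "$(free -m | awk '/^Swap:/ {print $3}')"
            sleep "$interval"
        done
    ) >> "$log" &
    sampler=$!
    status=0
    "$@" || status=$?                       # the program being watched
    kill "$sampler" 2>/dev/null
    wait "$sampler" 2>/dev/null || true     # reap the sampler quietly
    return "$status"
}
```

The log can then be lined up against the program's own output to see roughly when swapping started.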

strace

strace shows which system calls the program makes, and therefore where it is using system resources. In practice I found the strace log was far larger than my program's output, and I had to stop it before my disk filled up.
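One way to keep the output small is strace's -c option, which prints a per-system-call summary table (calls, errors, time) when the program exits instead of logging every call. The wrapper below is a sketch: the name syscall_summary and the STRACE override variable are illustrative, while -c, -f (follow child processes) and -o (write to a file) are standard strace options.

```shell
# Hypothetical wrapper: write a system call summary for a command to a file.
# Usage: syscall_summary OUTFILE command [args...]
syscall_summary() {
    out=$1
    shift
    # STRACE can be set to point at a different strace binary (an assumption of this sketch).
    "${STRACE:-strace}" -c -f -o "$out" "$@"
}
```

If the full log is still needed, -e trace=read,write restricts it to the named system calls.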

(man strace and/or search the Web for more info.)

-- RobReid - 2010-04-07
Topic revision: r5 - 2010-08-23, JonasLarsen