Highly recommended. Click on the above link to get Jonas Larsen's presentation.
- Linux only
- You must be able to load a module into the kernel, i.e. have some sort of root access.
Really, oprofile will save you a lot of time and disk space. If necessary, ask your sysadmin to let you sudo it.
Unless you want to profile [...] itself, you will need a standalone app, e.g. to profile split I use [...], because the output of [...] is not useful.
Make a profilable build
Note: this will still only profile the app! Unless you know how to produce a profilable .so, you probably need to link the app statically. Otherwise, try oprofile.
mkdir build_profile # or whatever you want to call it.
cd build_profile
cmake -DCMAKE_BUILD_TYPE=Debug ..
make -j 1 VERBOSE=on # Replace 1 with the number of parallel jobs you want to run.
The main reason for running make with VERBOSE=on is so you can check that the compiler is using -pg. I have only tried this using g++ on Linux.
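Scanning a long verbose build by eye is error-prone, so the check for -pg can be sketched as a small helper that counts the matching lines in a saved build log. The function name count_pg_lines and the file name build.log are made up for illustration, and the cmake flags in the closing comment are an assumption about how -pg could be supplied if the build does not already add it.

```shell
# Hypothetical helper: count the lines of a saved build log that pass -pg
# as a separate flag (so a file name such as "b-pg.o" is not counted).
# Note that grep prints 0 and exits non-zero when nothing matches.
count_pg_lines() {
    grep -c -E -e '(^|[[:space:]])-pg([[:space:]]|$)' "$1"
}

# Typical use, from inside the build directory ("build.log" is illustrative):
#   make -j 1 VERBOSE=on 2>&1 | tee build.log
#   count_pg_lines build.log    # 0 means no compile or link line used -pg
#
# Assumption: if the count is 0, the flag can be passed by hand when
# configuring, using standard CMake variables:
#   cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS=-pg -DCMAKE_EXE_LINKER_FLAGS=-pg ..
```

Both the compile lines and the final link line should show -pg; a link step without it generally produces an executable that writes no profile.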
Run the resulting executable
Remember, from Preparation above, that you probably don't want to use [...]. Instead, use a standalone app that exercises the section of code you are interested in, i.e. [...], a fresh app for the occasion...
The run will store the profiling information in gmon.out in the current directory.
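A minimal sketch of the run-and-report step, assuming the executable was built with -pg and that gprof is installed. A program built this way writes gmon.out when it exits normally, and gprof turns that file into a readable report. The function name profile_report and the output file gprof_report.txt are illustrative choices, not part of any tool.

```shell
# Hypothetical wrapper: run a -pg build, then turn gmon.out into a text report.
# Usage: profile_report ./my_app arg1 arg2     ("./my_app" is a placeholder)
profile_report() {
    app=$1
    shift
    rm -f gmon.out                 # discard a stale profile from an earlier run
    "$app" "$@" || return 1        # gmon.out is only written on a normal exit
    if [ ! -f gmon.out ]; then
        echo "no gmon.out: was $app compiled and linked with -pg?" >&2
        return 2
    fi
    # gprof reads the executable for symbol names and gmon.out for the counts.
    gprof "$app" gmon.out > gprof_report.txt
}
```

The flat profile at the top of gprof_report.txt lists where the time went per function, so `head -n 30 gprof_report.txt` is usually a good first look.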
All of the above, and most parallelization, is fairly pointless if the program is limited by I/O instead of the CPU, so it is worth checking which resource is the bottleneck.
Some applications may even be network limited.
Run the program while simultaneously watching a disk monitor, a CPU monitor, and ideally a memory monitor. I like gkrellm because it does all three and more. This will give you a picture of what is happening as a function of time, not code. free is also useful for watching swap usage.
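For a quick number to go with the monitors, one rough check is to compare CPU time with wall-clock time for a whole run. The sketch below assumes bash, whose time keyword can print the ratio directly through TIMEFORMAT; the function name cpu_percent is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and print the percentage of its
# wall-clock time that was spent on the CPU, i.e. (user + system) / real.
# The command's own stdout and stderr are discarded.
cpu_percent() {
    local TIMEFORMAT='%P'
    # bash's time keyword reports on the shell's stderr; the outer
    # redirection sends that report to stdout so it can be captured.
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# Usage ("./my_app" is a placeholder):
#   cpu_percent ./my_app input.dat
```

A value near 100 suggests the run is CPU bound, and a multithreaded program can exceed 100. A low value means the program spent most of its time waiting (disk, network, swap), which is the case where profiling the code is unlikely to help much.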
strace logs every system call the program makes, which shows how it is using system resources. In practice I found the strace log was far larger than my program's output, and I had to stop it before my disk filled up.
(man strace and/or search the Web for more info.)
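One way to keep the strace log from growing out of control is to ask for a summary instead of a full trace, or to trace only a few system calls. The sketch below uses standard strace options (-c, -f, -e trace=, -o); the function names and output file names are illustrative.

```shell
# Hypothetical wrappers around strace that keep the output small.

# Summary only: -c replaces the per-call log with a table of counts, errors
# and time per system call; -f also follows child processes.
strace_summary() {
    strace -c -f -o strace_summary.txt "$@"
}

# Filtered trace: log only read and write calls rather than every system call.
strace_io() {
    strace -f -e trace=read,write -o strace_io.log "$@"
}

# Usage ("./my_app" is a placeholder):
#   strace_summary ./my_app input.dat && cat strace_summary.txt
```

The summary file stays a few dozen lines long however long the program runs. The filtered trace can still become large for an I/O heavy program, so it is worth trying on a short run first.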