Commercial profiling suites are expensive and require some practice before you can use them effectively. This month, I will show how to implement a simple and effective stopwatch class that automatically calculates and reports the execution time of functions, loops, and code blocks.
Designing for Automation and Simplicity
The constructor and destructor of an automatic object execute at a block's beginning and end, respectively. We take advantage of this feature: the stopwatch's constructor starts counting time, and its destructor calculates and reports the total execution time of a certain operation. Profilers offer time resolution of a millisecond or less. To achieve a similar resolution, we will use the clock() function (declared in <time.h>). clock() returns the processor time that the program has used since its outset, measured in clock ticks. A clock tick is a platform-dependent unit of time. The macro CLK_TCK represents the number of clock ticks per second on your machine; standard C and C++ call this macro CLOCKS_PER_SEC, so use that name if your compiler doesn't define CLK_TCK.
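The constructor/destructor ordering that the stopwatch relies on can be observed with a small sketch. The tracer type and the trace_block() function are hypothetical names used only for this illustration; they are not part of the stopwatch class itself.

```cpp
#include <string>
#include <vector>

// Hypothetical helper type: records when its constructor and destructor run.
struct tracer
{
    explicit tracer(std::vector<std::string>& log) : log_(log)
    {
        log_.push_back("constructor");
    }
    ~tracer()
    {
        log_.push_back("destructor");
    }
    std::vector<std::string>& log_;
};

// Returns the order of events observed for one block.
std::vector<std::string> trace_block()
{
    std::vector<std::string> events;
    {
        tracer t(events);               // runs where t is declared, here the block's beginning
        events.push_back("block body"); // the code we would want to time
    }                                   // t is destroyed here, at the block's end
    return events;
}
```

Calling trace_block() yields the events "constructor", "block body", "destructor", in that order. Replacing the two recording calls with calls to clock() gives the stopwatch design described here.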
Our stopwatch class looks like this:
#include <time.h>
class stopwatch
{
public:
stopwatch() : start(clock()){} //start counting time
~stopwatch();
private:
clock_t start;
};
The constructor initializes the member start with the current tick count. We don't define other member functions except for the destructor. The destructor calls clock() again, computes the time elapsed since the object's construction, and displays the results:
#include <iostream>
using namespace std;
stopwatch::~stopwatch()
{
clock_t total = clock()-start; //get elapsed time
cout<<"total of clock ticks for this activity: "<<total<<endl;
cout<<"in seconds: "<< double(total)/CLK_TCK <<endl; //cast before dividing
}
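Note that clock_t and CLK_TCK are typically integers. Therefore, you have to cast at least one of them to double before the division; otherwise, integer division discards the fractional part and any activity shorter than one second is reported as 0.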
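The effect of the cast's position can be shown with a short sketch. The function names below are hypothetical, and plain long parameters stand in for clock_t and the ticks-per-second macro so that the arithmetic can be checked with fixed numbers.

```cpp
// Truncating version: the integer division happens first, so the fraction is lost.
double seconds_truncated(long ticks, long ticks_per_second)
{
    return double(ticks / ticks_per_second);
}

// Intended version: one operand is converted to double before dividing.
double seconds_exact(long ticks, long ticks_per_second)
{
    return double(ticks) / ticks_per_second;
}
```

With 27 ticks at 1000 ticks per second, seconds_truncated(27, 1000) returns 0, whereas seconds_exact(27, 1000) returns 0.027.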
To delay the output on the screen, you can add the following lines to the destructor:
char dummy;
cin >>dummy; //delay output on the screen
You can also write the results to a file to log performance changes in different profiling sessions.
Measuring Performance With the Stopwatch Class
To measure the duration of a code block, create a local instance of the stopwatch class at the block's beginning. For example, suppose you want to measure the duration of the following loop that allocates 5000 string objects on the heap:
string *pstr[5000]; //array of pointers
for (int i=0;i<5000;i++)
{
pstr[i] = new string;
}
Surround the relevant code in a pair of braces and instantiate a stopwatch object at the block's beginning:
{
stopwatch watch; // start measuring time
string *pstr[5000];
for (int i=0;i<5000;i++)
{
pstr[i] = new string;
}
} // watch is destroyed here and reports the results
That's all! When the block begins, the watch starts counting time. When the block exits, the watch's destructor displays the results:
total of clock ticks for this activity: 27
in seconds: 0.027
The loop took 27 milliseconds on my machine. This result may seem impressive. However, what is the performance gain of replacing dynamic allocation with stack allocation? Let's try it and compare the results:
{
stopwatch watch;
for (int i=0;i<5000;i++)
{
string s;//create and destroy a local automatic string
}
}
This time, the results are as follows:
total of clock ticks for this activity: 14
in seconds: 0.014
In other words, the stack version ran in roughly half the time of the heap version (14 ticks versus 27). Considering that our heap version, as opposed to the stack version, didn't count the time needed for destroying the 5000 strings, the results are even more impressive.
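For a like-for-like comparison, the heap version can be timed with the deallocation included. The following is only a sketch: the function heap_with_cleanup() is a hypothetical name, the stopwatch is repeated in compact form (using the standard macro CLOCKS_PER_SEC) so that the block compiles on its own, and no timings are claimed for it.

```cpp
#include <ctime>
#include <iostream>
#include <string>
#include <vector>

// Compact copy of the stopwatch so that this sketch compiles on its own.
class stopwatch
{
public:
    stopwatch() : start(std::clock()) {} // start counting time
    ~stopwatch()
    {
        std::clock_t total = std::clock() - start; // get elapsed time
        std::cout << "total of clock ticks for this activity: " << total << std::endl;
        std::cout << "in seconds: " << double(total) / CLOCKS_PER_SEC << std::endl;
    }
private:
    std::clock_t start;
};

// Hypothetical test function: allocates and then frees `count` strings
// under a single stopwatch. Returns the number of strings freed.
int heap_with_cleanup(int count)
{
    std::vector<std::string*> pstr(count); // pointer storage, set up before timing starts
    int freed = 0;
    {
        stopwatch watch; // start measuring time
        for (int i = 0; i < count; i++)
        {
            pstr[i] = new std::string;
        }
        for (int i = 0; i < count; i++)
        {
            delete pstr[i]; // destruction is now part of the measurement
            ++freed;
        }
    } // watch is destroyed here and reports the results
    return freed;
}
```

Calling heap_with_cleanup(5000) mirrors the loop size used in this article, so its report can be set beside the stack version's.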
You probably noticed that our heap version also had 5000 assignment operations:
pstr[i] = new string;
The stack version didn't include an assignment expression. Could this skew the results? Again, let's try a slightly different form of the heap version:
{
stopwatch watch;
for (int i=0;i<5000;i++)
{
new string; // heap allocation without assignment
}
}
Normally, you wouldn't write such code because it leaks memory abundantly. However, it isolates the allocation operation from other confounding variables. This is common practice in performance tuning. Here are the results of heap allocation without assignment:
total of clock ticks for this activity: 27
in seconds: 0.027
The tick count is unchanged, so the assignment has no measurable effect on performance here.
Performance measurements are tricky. Often, our intuition as developers is misleading: operations that we consider expensive incur no performance penalty at all, whereas seemingly innocuous operations such as dynamic memory allocation prove to be expensive in terms of CPU cycles. Without a reliable time measurement class such as stopwatch, we wouldn't have discovered these facts.