Rosario 3D

My developer blog

Let's do optimization

A lot of programmer start with the idea of optimizing everything, they want to save the last clock cycle to make their program faster even if this cost some readability of the code. Making optimized code make you feel smart and sometime you can say… “WOW!! These are the smartest five line of code I ever wrote”


Don’t optimize, never, as Knuth says “Premature optimization is the root of all evil (in 97% of the cases).” Knuth don’t want us to make slow program, the problem is that probably we don’t know witch part of the program will be slow and we can spend days optimizing a code that is called  only once or that have to wait for a synchronization with another module, in other cases the compiler automatically optimize the code for us. If you start optimizing in the first stage of development most probably you will get a very bad design and unreadable code.

Let’s do a very simple example:

I have a matrix of fixed size…

template<unsigned int size>
class matrix{
   int *data_;
      data_ = new int[size*size];
   ~matrix() {
      delete []data_;
   inline int & operator() (const size_t i, const size_t j){
      return data_[j+i*size];

…and I want to copy is from one place to another. I can do that in a lot of way:

static const unsigned int rowSize = 512;
static const unsigned int buffSize = rowSize*rowSize;
matrix<rowSize> source, destination;
// mem copy
memcpy(destination.data_, source.data_, sizeof(int)*buffSize);
// standard copy algorithm
std::copy(source.data_, source.data_+buffSize, destination.data_);
// single loop
for(unsigned int i = 0; i < buffSize; i++)
   destination.data_[i] = source.data_[i];
// double loop
for(unsigned int i = 0; i < rowSize; i++)
   for(unsigned int j = 0; j < rowSize; j++)
      destination(j, i) = source(j, i);
// I can even try to optimize the double loop
for(register unsigned int i = 0; i < rowSize; i++)
   for(register unsigned int j = 0; j < rowSize; j++)
      destination.data_[j+(i<<9)] = source.data_[j+(i<<9)];

Witch one do you expect to be faster?

If you say memcpy you are right, this is a simple case where you have contiguous  memory using sys library is the best solution. Standard copy is the second with a 5% of overhead. The other three?

“No way, single loop must win, didn’t make any operation and only have only loop.” – The last three solution take the same time to run (a lot more of the library solution). The double loop don’t influence a lot the computation, rowSize is 512, so the outer loop is called only the 0.19% of the times, so no relevant difference between single and double loop. – “But the operation? Single loop already have the index, double loop have to computer the right index. Is expensive!!” – Well actually the CPUs are faster then memory access, making three or four operation don’t slow the computation cause synchronization with cache/memory. – “Ok, but the optimized version must win against the (int, int) operator it call a function.” – Here is where a good design lead to best performance. Matrix is a template, and the (int, int) operator is inlined by the compiler, also, cause the matrix already know the size the compiler can convert the multiplication into a shift operation, so no performance loss even here.

But check the difference between the code, the non optimized double loop is more readable, and is clear what we are doing (try to invert the index to implement a transpose copy to check is optimized double loop is clear), also we don’t have to access to the internal representation of the data (exposing data_ as public is a bad blunder).


So even in a trivial case as this we have seen that savage optimization can lead to bad design. So before optimize you code think twice, and use a profiler to check in witch part of the code you have to waste your precious time. A good class design often is enough.

BTW: The solution to this problem is to create a clone() function that use memcpy. 😛


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: