Colour Fill

These notes show how, given a pointer to a memory buffer (for example the screen), we can clear it to a colour. I am assuming the buffer is in a 32 bit colour format; if you have a different format, these notes can easily be adjusted.

Clearing to White, Black or Grey

The memset function is a very fast way of setting all the bytes in a buffer to the same value. This allows us to clear a buffer to white (255), black (0) or any grey level in between. Unfortunately it will not allow us to set an arbitrary colour, as memset writes the same single byte everywhere whereas our colour is 4 bytes (Red, Green, Blue, Unused / Alpha).

If the pitch of the buffer (the number of bytes from the start of one row to the start of the next) is the same as the number of pixels wide * bytes per pixel, we can set the whole buffer in one call:

memset(buffer,0,bufferWidth*bufferHeight*bytesPerPixel);

Here buffer is a pointer to the area to be filled (e.g. the screen) and bytesPerPixel is 4 for a 32 bit format.
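
As a minimal sketch of the single-call case, the helper below wraps the memset call and checks the no-padding condition first. The name clear_grey and its parameter list are assumptions made for illustration, not part of the original notes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char BYTE;

/* Hypothetical helper: clears a tightly packed buffer to a grey level,
   where 0 is black and 255 is white. */
void clear_grey(BYTE *buffer, int bufferWidth, int bufferHeight,
                int bytesPerPixel, int bufferPitch, BYTE grey)
{
    /* The single-call form is only valid when rows have no padding. */
    assert(bufferPitch == bufferWidth * bytesPerPixel);

    memset(buffer, grey, (size_t)bufferWidth * bufferHeight * bytesPerPixel);
}
```

Because every byte receives the same value, a grey of 128 in a 32 bit format also writes 128 into the unused / alpha byte of each pixel.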

If the pitch of the buffer is not the same as the width in bytes, we will need to do this operation in a loop, one row at a time:

BYTE *temp=buffer;
for (int y=0;y<bufferHeight;y++)
{
    memset(temp,0,bufferWidth*bytesPerPixel);
    temp+=bufferPitch;
}

Where BYTE is a typedef:

typedef unsigned char BYTE;

Here a temp pointer is used to keep track of the memory position for setting a row of pixels. Each time we go down a row we must move the pointer forward by pitch bytes. Note that we only set bufferWidth*bytesPerPixel bytes as that is the size of one row.
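
The row-by-row loop above can be packaged as a function. This is a sketch only; the name clear_grey_rows and the assertion are my own additions. It shows that any padding bytes between the end of one row and the start of the next are left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char BYTE;

/* Hypothetical helper: clears each row separately so that a pitch larger
   than the row size is handled correctly. */
void clear_grey_rows(BYTE *buffer, int bufferWidth, int bufferHeight,
                     int bytesPerPixel, int bufferPitch, BYTE grey)
{
    size_t rowBytes = (size_t)bufferWidth * bytesPerPixel;

    /* The pitch can never be smaller than one row of pixels. */
    assert((size_t)bufferPitch >= rowBytes);

    BYTE *row = buffer;
    for (int y = 0; y < bufferHeight; ++y)
    {
        memset(row, grey, rowBytes); /* only the visible pixels of this row */
        row += bufferPitch;          /* step to the start of the next row */
    }
}
```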

Clearing to an arbitrary colour

To clear our buffer to any colour we cannot simply set all the bytes but must create a colour representation using red, green and blue values. For this we can use the memcpy function that allows a number of bytes to be copied from one memory location to another. We can use this to copy one pixel worth of memory (4 bytes when using a 32 bit format) into the buffer.

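The first step is to pack our colour channels into one 32 bit variable. It is useful to create a DWORD type which is 32 bits. Note that unsigned long is 32 bits on Windows but 64 bits on many other 64 bit platforms; where that matters, uint32_t from <stdint.h> is a portable alternative.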

typedef unsigned long DWORD;

If we are using a 32 bit colour format in the order Alpha, Red, Green, Blue, we need to shift these byte values into the DWORD using bitwise operators. Note that I am using alpha here; for a screen, however, alpha does not exist and the byte is purely padding. Alpha can be used for textures, however.

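// Pack colour into a 32 bit variable, 4 bytes or 32 bits laid out like: 0xAARRGGBB
// Each channel is cast to DWORD first so that shifting by 24 cannot overflow a signed int
DWORD c = ((DWORD)a << 24) | ((DWORD)r << 16) | ((DWORD)g << 8) | (DWORD)b;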
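
To make the packing concrete, here is a small sketch using the fixed-width types from <stdint.h>. The function name pack_argb is an assumption for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: packs four 8 bit channels into one 32 bit value
   laid out as 0xAARRGGBB. */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    /* Widen each channel before shifting so no bits are lost. */
    return ((uint32_t)a << 24) |
           ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |
            (uint32_t)b;
}
```

For example, pack_argb(0xFF, 0x12, 0x34, 0x56) gives 0xFF123456: alpha lands in the top byte and blue in the bottom byte.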

Now that we have the colour packed into a 32 bit variable we can loop through every pixel in our buffer and use memcpy to copy the value into the buffer memory.

DWORD *pnter=(DWORD*)buffer;
for (int y=0;y<bufferHeight;++y)
{
    for (int x=0;x<bufferWidth;++x)
    {
        memcpy(pnter,&c,4);
        ++pnter;
    }
}

This assumes that the buffer pitch is the same as bufferWidth*bytesPerPixel (4). If it is not, then at the end of each row the memory pointer will need to be advanced by the difference in bytes (pitch - bufferWidth*bytesPerPixel).
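
One way to handle a pitch that differs from the row size is to keep a byte pointer to the start of each row and advance it by the pitch, rather than adding the difference at the end of the row. The sketch below does that; fill_colour and its parameters are hypothetical names, and the colour is assumed to be already packed as 0xAARRGGBB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills a 32 bit buffer with colour c, honouring the
   pitch (given in bytes) between rows. */
void fill_colour(uint8_t *buffer, int bufferWidth, int bufferHeight,
                 int bufferPitch, uint32_t c)
{
    assert((size_t)bufferPitch >= (size_t)bufferWidth * sizeof c);

    uint8_t *row = buffer;
    for (int y = 0; y < bufferHeight; ++y)
    {
        uint8_t *pixel = row;
        for (int x = 0; x < bufferWidth; ++x)
        {
            memcpy(pixel, &c, sizeof c); /* copy one pixel (4 bytes) */
            pixel += sizeof c;
        }
        row += bufferPitch; /* skips any padding at the end of the row */
    }
}
```

Working in bytes avoids mixing pixel-sized and byte-sized pointer arithmetic, which is an easy mistake to make when the pitch is given in bytes.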

Optimisations

We can get a slight speed up on the above by using just one loop for the whole memory area and hence reducing instructions:

// pack the colour into a 32 bit variable (0xAARRGGBB)
DWORD c = ((DWORD)a << 24) | ((DWORD)r << 16) | ((DWORD)g << 8) | (DWORD)b;

DWORD *pnter=(DWORD*)buffer;
for (int i=0;i<bufferWidth*bufferHeight*4;i+=4)
{
  memcpy(pnter,&c,4);
  ++pnter;
}

It may be tempting to call a set-pixel function in a loop to set all the pixels in the buffer; however, this will be much slower than the loop above, as each call must recalculate the pixel's memory offset, among other overheads.

When carrying out any operations on buffer data it is important that you process the data linearly (e.g. row by row rather than column by column). This is because in memory the data is laid out linearly as one row after another, and the various caches in your platform will be best utilised if you traverse the data sequentially. If we were to work down each column in turn (x in the outer loop and y in the inner loop) we would be jumping about in the memory buffer and taking little benefit from caching.

There are other methods of increasing the speed of the colour fill operation. One way would be to fill in just half the buffer and then copy the first half of the buffer over the second half. The optimisation here is that we do one large memcpy operation instead of many small ones. Since memcpy is usually heavily optimised, the more data we can copy with it at a time the better. However, be aware of causing cache misses. At the end of the day you need to measure the speed and find the best solution for your platform / game.
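
The half-and-copy idea might be sketched as follows. This assumes a tightly packed 32 bit buffer (pitch equal to width * 4); the name fill_colour_half_copy is hypothetical, and whether it is actually faster than the simple loop is something to measure on your own platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills the first half of the buffer pixel by pixel,
   then copies that half over the remainder with one large memcpy. */
void fill_colour_half_copy(uint8_t *buffer, size_t pixelCount, uint32_t c)
{
    assert(buffer != NULL);

    /* Round the first half up so an odd pixel count is still covered. */
    size_t firstHalf = pixelCount - pixelCount / 2;

    uint8_t *p = buffer;
    for (size_t i = 0; i < firstHalf; ++i)
    {
        memcpy(p, &c, sizeof c);
        p += sizeof c;
    }

    /* p now points at the first unfilled pixel. The remainder is never
       larger than the first half, so the two regions do not overlap. */
    memcpy(p, buffer, (pixelCount - firstHalf) * sizeof c);
}
```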

Further Reading

See the notes on Colour Formats and Pitch / Stride



© 2004-2016 Keith Ditchburn