Blitting

Low level 2D graphics involves copying blocks of memory. For example, to display a texture we must copy the texture's block of memory into the correct part of the screen's block of memory. During the copy we may also want to take an alpha value into account (for transparency). In graphics this process is called a Blit (Bit Block Transfer).

Assumptions / Requirements

In the following notes I refer to source and destination memory blocks. If we are copying a texture to the screen then the source is the texture and the destination is the screen. I prefer to use the idea of source and destination as this does not limit what can be achieved e.g. you may want to copy a texture to another texture in order to carry out some graphic effect.

In order to copy from one block of memory to another we are going to need pointers to each block and the size of the data to copy. If we are copying between two different sized pieces of memory we need to know the pitch (see the Pitch / Stride notes) of each block. In the following notes I assume that pitch is the same as the width in pixels multiplied by the bytes per pixel. If you are using an API like HAPI this will always be the case. However, if you are using something like Direct3D you will need to bear in mind that the pitch may be larger than width in pixels times bytes per pixel, because rows can be padded. In these notes I still use the term pitch as it helps clarify the code.

The diagrams show the source as a blue block and the destination as a green block. Where an area is shown it is shown in red.

Same sized copy

blit1

If the source and destination blocks of memory are the same size and there is no end of line padding (pitch equals width in pixels times bytes per pixel) we can do a very quick copy:

memcpy(destination, source, pitch * height);

Line by Line copy

blit202

If the source is a different size to the destination we will need to copy line by line. We can still use the fast memcpy but only on one line at a time. In order to do this we will need to maintain temporary pointers for both source and destination.

The whole of the source

In this example we copy the whole of the source (e.g. a texture) to the destination (e.g. the screen). So we will need the following data as input to our blit function:

// Pointer to the start of the source memory block e.g. a texture
BYTE *sourcePointer;
// Pointer to the start of the destination memory block e.g. the screen
BYTE *destinationPointer;
// The pitch of the source memory block
int sourcePitch;
// The pitch of the destination memory block
int destinationPitch;
// The width of the source in pixels
int copyWidth;
// The height of the source in pixels
int copyHeight;
// Bytes per pixel - 4 for 32 bit buffers
int bytesPerPixel;

We will create some temporary pointers to advance through the source and destination memory blocks:

BYTE *sourceTemp=sourcePointer;
BYTE *destTemp=destinationPointer;

The next step is to loop through all the rows:

for (int y=0;y<copyHeight;y++)
{

During each loop we need to copy one line of data and then advance the pointers down to the next line.
To copy one line:

memcpy(destTemp,sourceTemp,copyWidth*bytesPerPixel);

To advance the pointers we need to use the pitch (the width of one row in bytes):

destTemp+=destinationPitch;
sourceTemp+=sourcePitch;
}
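Putting the steps above together gives something like the following. This is only a minimal sketch: the function name blitWhole is my own, BYTE is assumed to be an unsigned 8 bit type, and the destination is assumed to be at least as wide and as tall as the source (there is no clipping).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using BYTE = std::uint8_t; // assumption: BYTE is an unsigned 8 bit type

// Hypothetical helper: copies the whole source to the top left of the destination.
// Assumes the destination is at least as large as the source (no clipping is done).
void blitWhole(BYTE *destinationPointer, int destinationPitch,
               const BYTE *sourcePointer, int sourcePitch,
               int copyWidth, int copyHeight, int bytesPerPixel)
{
    assert(copyWidth >= 0 && copyHeight >= 0 && bytesPerPixel > 0);

    // Temporary pointers that walk down the two memory blocks
    const BYTE *sourceTemp = sourcePointer;
    BYTE *destTemp = destinationPointer;

    // Number of bytes actually copied per row (may be less than either pitch)
    const std::size_t bytesPerRow = static_cast<std::size_t>(copyWidth) * bytesPerPixel;

    for (int y = 0; y < copyHeight; y++)
    {
        std::memcpy(destTemp, sourceTemp, bytesPerRow); // copy one line
        destTemp += destinationPitch;                   // down one destination row
        sourceTemp += sourcePitch;                      // down one source row
    }
}
```

For a 32 bit texture of 64 by 64 pixels copied to an 800 pixel wide screen, the call would be along the lines of blitWhole(screen, 800 * 4, texture, 64 * 4, 64, 64, 4).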

A part of the source

blit3

If we only want to copy a part of the source to the destination we are going to need to pass into the blit function the rectangular area of the source to be copied. It is easiest to create a class or structure representing a rectangle for this purpose. This rectangle class would have public member data for left, right, top and bottom rectangle values (using a coordinate system where top left is 0,0). The example below assumes an input source rectangle called sourceRect.

The first step is to set the temporary source pointer to the start of the rectangular area to be copied. We will need to work out an offset in terms of bytes. The offset is going to be pitch bytes for every row down plus the number of bytes across.

int sourceOffset = (sourceRect.top * sourcePitch) + (sourceRect.left * bytesPerPixel);

Note: the brackets above are unnecessary but are used for clarity.

When assigning our temporary pointer we now add the offset:

BYTE *sourceTemp=sourcePointer + sourceOffset;

This sets the pointer to the top left pixel of the area to be copied. The rest of the code is as before. We loop through all the rows (sourceRect.bottom - sourceRect.top), copying one line of the rectangle (sourceRect.right - sourceRect.left pixels wide) at a time e.g.

memcpy(destTemp,sourceTemp,(sourceRect.right - sourceRect.left)*bytesPerPixel);

After each row we advance the pointers in exactly the same way as before since the pitch of one row is always constant whatever the size of data being copied.
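As a rough sketch, the whole of this version could look like the code below. The Rect structure and the name blitPart are my own inventions for illustration, and I assume that right and bottom are exclusive, so that the width is right minus left and the height is bottom minus top. No clipping is done, so the rectangle must lie inside the source and fit inside the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using BYTE = std::uint8_t; // assumption: BYTE is an unsigned 8 bit type

// Hypothetical rectangle type; right and bottom are assumed to be exclusive
struct Rect
{
    int left, right, top, bottom;
};

// Hypothetical helper: copies sourceRect from the source to the top left of the destination
void blitPart(BYTE *destinationPointer, int destinationPitch,
              const BYTE *sourcePointer, int sourcePitch,
              const Rect &sourceRect, int bytesPerPixel)
{
    const int rectWidth = sourceRect.right - sourceRect.left;
    const int rectHeight = sourceRect.bottom - sourceRect.top;
    assert(rectWidth >= 0 && rectHeight >= 0 && bytesPerPixel > 0);

    // Pitch bytes for every row down plus the number of bytes across
    const int sourceOffset = (sourceRect.top * sourcePitch) + (sourceRect.left * bytesPerPixel);

    const BYTE *sourceTemp = sourcePointer + sourceOffset;
    BYTE *destTemp = destinationPointer;
    const std::size_t bytesPerRow = static_cast<std::size_t>(rectWidth) * bytesPerPixel;

    for (int y = 0; y < rectHeight; y++)
    {
        std::memcpy(destTemp, sourceTemp, bytesPerRow); // one line of the rectangle
        destTemp += destinationPitch;                   // pitch is constant per block
        sourceTemp += sourcePitch;
    }
}
```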

To a position on the destination

blit4

If we want to blit our source to an arbitrary position (x, y) on the destination we must create an offset for the destination in the same way as we did for the source above.

int destOffset = (x * bytesPerPixel) + (y * destinationPitch);
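The sketch below shows the destination offset in use, copying the whole of the source to position (x, y). The name blitWholeAt is hypothetical and BYTE is assumed to be an unsigned 8 bit type. Note that nothing here clips the copy, so I am assuming the caller has already made sure the source fits on the destination at that position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using BYTE = std::uint8_t; // assumption: BYTE is an unsigned 8 bit type

// Hypothetical helper: copies the whole source to pixel position (x, y) on the destination.
// No clipping: the caller must ensure the source fits at that position.
void blitWholeAt(BYTE *destinationPointer, int destinationPitch,
                 const BYTE *sourcePointer, int sourcePitch,
                 int copyWidth, int copyHeight, int bytesPerPixel,
                 int x, int y)
{
    assert(x >= 0 && y >= 0 && copyWidth >= 0 && copyHeight >= 0);

    // Bytes across plus pitch bytes for every row down
    const int destOffset = (x * bytesPerPixel) + (y * destinationPitch);

    BYTE *destTemp = destinationPointer + destOffset;
    const BYTE *sourceTemp = sourcePointer;
    const std::size_t bytesPerRow = static_cast<std::size_t>(copyWidth) * bytesPerPixel;

    for (int row = 0; row < copyHeight; row++)
    {
        std::memcpy(destTemp, sourceTemp, bytesPerRow);
        destTemp += destinationPitch;
        sourceTemp += sourcePitch;
    }
}
```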

Pixel by Pixel Copy with Transparency

blit5

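The following notes assume a 32 bit pixel format with alpha, red, green and blue channels of 8 bits each (see Colour Formats). The colour format is ARGB when a pixel is read as a single 32 bit value. On a little-endian platform this means the bytes sit in memory in the order blue, green, red, alpha, which is the order the code below reads them in.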

Calculating destination and source offsets is carried out in exactly the same way as described in the above notes. This time however we need to loop through every pixel:

for (int y=0;y<sourceRect.bottom-sourceRect.top;y++)
{
for (int x=0;x<sourceRect.right-sourceRect.left;x++)

Note: always have the inner loop on x, as this makes the best use of the caches in the platform hardware. Pixels along a row are next to each other in memory, so if we traverse x in the inner loop the next pixels are likely to be in the cache already and we get the maximum speed gain. If the inner loop were on y we would jump forward by a whole row of bytes each time and would be likely to cause cache misses.

In order to carry out a transparent copy we will need to read the source pixel colour and combine it with the destination pixel colour based on the value in the alpha channel.

Reading the colours

BYTE sourceBlue=sourcePnter[0];
BYTE sourceGreen=sourcePnter[1];
BYTE sourceRed=sourcePnter[2];
BYTE sourceAlpha=sourcePnter[3];

BYTE destBlue=destPnter[0];
BYTE destGreen=destPnter[1];
BYTE destRed=destPnter[2];

Our source and destination pointers are BYTE pointers so as we travel through the data we will advance them both by 4 bytes at a time (for a 32 bit format). We can use the array form to access the data because in C++ indexing a pointer is simply a pointer plus an offset. Note that we do not read an alpha channel from the destination. This is because I am assuming a blit to a screen buffer which does not use alpha but does have a byte pad.

Calculating the colour

When the source alpha channel is 0 the resultant colour should be the destination colour (fully transparent). When it is 255 the resultant colour should be the source colour (opaque). Values in between require some of the source and some of the destination. The easiest way of approaching this is to get the alpha value as a value from 0 to 1 and use that to modulate the destination and source colours. The formula per channel is then:

destinationChannel = sourceChannel * mod + destinationChannel * (1.0-mod)

Example

float mod=sourceAlpha/255.0f;
destPnter[0]=(BYTE)( mod * sourceBlue + (1.0f-mod) * destBlue);
destPnter[1]=(BYTE)( mod * sourceGreen + (1.0f-mod) * destGreen);
destPnter[2]=(BYTE) (mod * sourceRed + (1.0f-mod) * destRed);

This code writes to the destination straight after calculating the result.
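Wrapped up as a function, the floating point blend of a single pixel might look like the sketch below. The name blendPixelFloat is my own, BYTE is assumed to be an unsigned 8 bit type, and the bytes are assumed to be in the order blue, green, red, alpha as in the reads above. The fourth destination byte (the pad) is left untouched.

```cpp
#include <cassert>
#include <cstdint>

using BYTE = std::uint8_t; // assumption: BYTE is an unsigned 8 bit type

// Hypothetical helper: blends one 4 byte source pixel over one 4 byte destination pixel.
// Assumed byte order is blue, green, red, alpha; the destination pad byte is not written.
void blendPixelFloat(BYTE *destPnter, const BYTE *sourcePnter)
{
    assert(destPnter != nullptr && sourcePnter != nullptr);

    // Alpha as a value from 0 (transparent) to 1 (opaque)
    const float mod = sourcePnter[3] / 255.0f;

    for (int channel = 0; channel < 3; channel++)
    {
        const float blended = mod * sourcePnter[channel] + (1.0f - mod) * destPnter[channel];
        destPnter[channel] = static_cast<BYTE>(blended); // truncates towards zero
    }
}
```

An alpha of 255 gives exactly the source colour and an alpha of 0 leaves the destination unchanged. Values in between are truncated when converted back to a byte, so they may be one lower than the rounded result.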

A quicker way

We want our blitting to be as fast as possible as we will be calling it many times in our render loop. The above code does a number of floating point operations (including converting each byte to a float and back) which, while fast on modern processors, are generally still slower than using integer arithmetic alone. Since all our values are bytes we can simplify the above to:

destChannel = destChannel + ((sourceAlpha*(sourceChannel-destChannel))>>8);

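Since the alpha varies from 0 to 255, multiplying by it leaves the result 255 times too large, so strictly we should divide by 255. Instead we use the binary shift operator >> to divide the result by 256, which is cheaper than a division. This gives a very slight inaccuracy (for example a fully opaque source value of 255 over a destination value of 0 gives 254) but it is acceptable given the speed increase. Note that sourceChannel-destChannel can be negative, so the calculation must be carried out in a signed type such as int and not in BYTE.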
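The integer version can be sketched as below. The names blendChannel and blendPixelFast are hypothetical and BYTE is assumed to be an unsigned 8 bit type. One thing to be aware of: when the source channel is smaller than the destination channel the product is negative, and right shifting a negative int was implementation-defined before C++20. I am assuming the usual arithmetic shift here, which is what mainstream compilers do.

```cpp
#include <cassert>
#include <cstdint>

using BYTE = std::uint8_t; // assumption: BYTE is an unsigned 8 bit type

// Hypothetical helper: blends one channel using integer arithmetic only.
// The BYTE arguments are promoted to int, so the difference can go negative safely.
// Assumes >> on a negative int is an arithmetic shift (guaranteed from C++20).
inline BYTE blendChannel(BYTE destChannel, BYTE sourceChannel, BYTE sourceAlpha)
{
    return static_cast<BYTE>(destChannel + ((sourceAlpha * (sourceChannel - destChannel)) >> 8));
}

// Hypothetical helper: blends blue, green and red of one pixel (byte order B, G, R, A)
void blendPixelFast(BYTE *destPnter, const BYTE *sourcePnter)
{
    assert(destPnter != nullptr && sourcePnter != nullptr);

    const BYTE sourceAlpha = sourcePnter[3];
    destPnter[0] = blendChannel(destPnter[0], sourcePnter[0], sourceAlpha);
    destPnter[1] = blendChannel(destPnter[1], sourcePnter[1], sourceAlpha);
    destPnter[2] = blendChannel(destPnter[2], sourcePnter[2], sourceAlpha);
}
```

The slight inaccuracy shows up at full alpha: blendChannel(0, 255, 255) works out as (255 * 255) >> 8, which is 254 and not 255.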

Advancing the pointers

Each time we loop on x we need to advance the source and destination pointers by the number of bytes in a pixel. In these examples (using a 32 bit colour format) this is always 4.

destPnter+=4;
sourcePnter+=4;

Once one row of pixels has been processed we need to move down to the next row. Previously we achieved this by adding the appropriate pitch to the pointers. This time, however, the pointers are moved on every pixel, so at the end of a row the source pointer will point just past the end of the copied area on the source row and the destination pointer will point just past the end of the copied area on the destination row. If the source and destination were both the same width as the area being copied we could simply continue, as the pointers would already be in the correct place. This is uncommon, so we will need to advance the pointers by enough bytes to get to the start of the next row (note: we could just recalculate the offset each time but this is slower than simply adding an offset).

Prior to the start of the loop we can calculate how far we will need to advance our pointers at the end of each row to get to the start of the next. For the destination in pixels this will be width of the destination minus the width of the rectangular area being copied.

int endOfLineDestinationOffset = (destinationWidth - (sourceRect.right - sourceRect.left)) * 4;

The result is multiplied by the number of bytes per pixel, in this case 4. A similar offset can be calculated for the source.

At the end of a row we then advance our pointers:

destPnter+=endOfLineDestinationOffset;
sourcePnter+=endOfLineSourceOffset;
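To show how the pieces fit together, here is a sketch of a complete pixel by pixel blit using the integer blend and the end of line offsets. The Rect structure, the name blitAlpha and the sourceWidth parameter are my own additions for illustration. I assume 4 bytes per pixel in the order blue, green, red, alpha, a pitch equal to width times 4 for both blocks, exclusive right and bottom values, and no clipping.

```cpp
#include <cassert>
#include <cstdint>

using BYTE = std::uint8_t; // assumption: BYTE is an unsigned 8 bit type

// Hypothetical rectangle type; right and bottom are assumed to be exclusive
struct Rect
{
    int left, right, top, bottom;
};

// Hypothetical helper: alpha blends sourceRect onto the destination at pixel (x, y).
// Widths are in pixels; pitch is assumed to be width * 4 with no row padding.
void blitAlpha(BYTE *destinationPointer, int destinationWidth,
               const BYTE *sourcePointer, int sourceWidth,
               const Rect &sourceRect, int x, int y)
{
    const int rectWidth = sourceRect.right - sourceRect.left;
    const int rectHeight = sourceRect.bottom - sourceRect.top;
    assert(rectWidth >= 0 && rectWidth <= sourceWidth && x + rectWidth <= destinationWidth);

    // Start of the area on each block
    BYTE *destPnter = destinationPointer + (x * 4) + (y * destinationWidth * 4);
    const BYTE *sourcePnter = sourcePointer + (sourceRect.top * sourceWidth * 4) + (sourceRect.left * 4);

    // Bytes to skip at the end of each row to reach the start of the next
    const int endOfLineDestinationOffset = (destinationWidth - rectWidth) * 4;
    const int endOfLineSourceOffset = (sourceWidth - rectWidth) * 4;

    for (int row = 0; row < rectHeight; row++)
    {
        for (int col = 0; col < rectWidth; col++) // inner loop on x for cache friendliness
        {
            const int sourceAlpha = sourcePnter[3];
            for (int channel = 0; channel < 3; channel++)
            {
                const int dest = destPnter[channel];
                destPnter[channel] = static_cast<BYTE>(
                    dest + ((sourceAlpha * (sourcePnter[channel] - dest)) >> 8));
            }
            destPnter += 4;
            sourcePnter += 4;
        }
        destPnter += endOfLineDestinationOffset;
        sourcePnter += endOfLineSourceOffset;
    }
}
```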

Summary

Copying a whole block of memory in one go is the fastest copy we can do. Copying line by line is slower but still very fast. Copying pixel by pixel is the slowest method of them all but is necessary if we are going to handle transparency. It is wise to create functions for all three methods, as in a game you will want to use the fastest method that suits each situation. For example, if you were copying a background texture over the whole of the screen you would not need transparency and so could use one of the other methods.