Can this Delphi 6 bitmap modification code be sped up with SIMD or another approach?

I have a Delphi 6 application that modifies bitmaps in real time. Currently I am using the code shown below to do quickie brightness boost and contrast changes. If the operation were just an addition or just a multiplication, I could see how SIMD could be used, but since both an addition and a multiplication are involved, and since there is also the Trunc() operation to restrict it to the range of a Byte, I'm not sure if SIMD could be used here. Here are my questions:

  1. Can SIMD be used with this code and do you know of a good code sample I could work from? What kind of a speed boost could I expect?
  2. Would the (potential) padding of the scan lines be a problem?
  3. Any general optimization tips on speeding up the code?


    // A fast version of this function would be to only allow range reductions
    // as a power of 2 and then use shr operations instead of divisions.
    procedure doBrightnessAndContrast(var clip: TBitmap; compressionRatio: Double; shiftValue: Byte);
    var
        p0: PByte;
        x, y: Integer;
    begin
        for y := 0 to clip.Height - 1 do
        begin
            p0 := clip.ScanLine[y];

            // Can't just do the whole buffer as a big block of bytes since the
            // individual scan lines may be padded for CPU alignment.
            for x := 0 to clip.Width - 1 do
            begin
                // Red
                p0^ := IntToByte(Trunc(p0^ * compressionRatio) + shiftValue);
                Inc(p0);
                // Green
                p0^ := IntToByte(Trunc(p0^ * compressionRatio) + shiftValue);
                Inc(p0);
                // Blue
                p0^ := IntToByte(Trunc(p0^ * compressionRatio) + shiftValue);
                Inc(p0);
            end;
        end;
    end;

Sure, SSE or MMX is possible.

In your case, however, you may get almost the same speed improvement by precomputing a 256-entry table from your equations.

Then replace all the per-channel computations with a simple table lookup. My best guess is that on modern processors this will run nearly as fast as MMX/SSE.
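As a rough sketch of the table approach (not tested against your application): the unit name `BrightnessLut`, the type `TByteLookup`, and the procedures `BuildLookup` and `ApplyLookup` are made-up names for illustration. It assumes the bitmap is `pf24bit` (3 bytes per pixel, as your loop implies) and that your `IntToByte` clamps to 0..255, which is reproduced here with an explicit range check.

```delphi
unit BrightnessLut;

interface

uses
  Windows, Graphics;

type
  // One output byte for every possible input byte.
  TByteLookup = array[Byte] of Byte;

procedure BuildLookup(var lut: TByteLookup; compressionRatio: Double; shiftValue: Byte);
procedure ApplyLookup(clip: TBitmap; const lut: TByteLookup);

implementation

// Evaluate the original formula once for each of the 256 possible inputs.
// Rebuild the table only when compressionRatio or shiftValue changes.
procedure BuildLookup(var lut: TByteLookup; compressionRatio: Double; shiftValue: Byte);
var
  i, v: Integer;
begin
  for i := 0 to 255 do
  begin
    v := Trunc(i * compressionRatio) + shiftValue;
    // Assumption: this clamp matches what IntToByte does in the question.
    if v < 0 then
      v := 0
    else if v > 255 then
      v := 255;
    lut[i] := v;
  end;
end;

// Replace every channel byte with its table entry, one scan line at a time.
procedure ApplyLookup(clip: TBitmap; const lut: TByteLookup);
var
  p: PByte;
  x, y, rowBytes: Integer;
begin
  rowBytes := clip.Width * 3; // assumes pf24bit; padding bytes are never touched
  for y := 0 to clip.Height - 1 do
  begin
    p := clip.ScanLine[y];
    for x := 0 to rowBytes - 1 do
    begin
      p^ := lut[p^];
      Inc(p);
    end;
  end;
end;

end.
```

Typical use would be to call `BuildLookup(lut, compressionRatio, shiftValue)` once whenever the settings change, then `ApplyLookup(clip, lut)` for each frame. Since all three channels go through the same formula, the inner loop can walk the row as plain bytes, and fetching each row through `ScanLine` means scan-line padding is not a concern.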

