[Back to GRAPHICS SWAG index]  [Back to Main SWAG index]  [Original]

{
> It seems that the location of the routine in the program also affects
> its performance.  The one closest to the main procedure always seems to
> win. This probably has something to do with caching...

No. I have run many tests (I spent 3 hours on it), and my results are:
First: the position of the procedure relative to the main program has NO
effect on the performance (at least with small programs).
My system looks like this:
        486 DX 50 with 256 kb Cache
        ISA ET4000 W32 VGA-Card
        16 MB RAM

> These numbers are too small to compare, try setting N to 50, and
> swap the position of the routines sometimes. You'll get many different
> results.

Sorry, but when I was testing the routines I had set N to 1000! That
should be big enough. Anyway, since Jannie Hanekom wrote me a message, I ran
the tests again, and these are my results:

 ( net pixel calls per N = 1000 timer ticks, with the call overhead
   subtracted; only useful for comparison, higher is better )
 pix1 :   59.68 M  ( Mem in Pascal )
 pix2 :  158.00 M  ( with lookup table, old proc )
 pix3 :  170.35 M  ( without lookup table )
 pix4 :  186.37 M  ( with lookup table, NEW proc )
 pix5 :  153.31 M  ( with lookup table, optimized by Jannie Hanekom
                     to reduce penalty cycles on memory access )
 pix6 :  213.57 M  ( with lookup table, from Jannie Hanekom,
                     optimized by me )
     { M = Million }


{$A+,B-,D+,E+,F-,G+,I+,L+,N-,O-,P-,Q+,R+,S+,T-,V+,X+,Y+}
{$M 16384,0,655360}

const N = 1000;

var lut : array[0..199] of word;

procedure call(x,y:word; c:byte); begin end;  { empty: measures call overhead }

procedure pix1(x,y:word; c:byte); begin mem[$A000:x+y*320] := c end;

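procedure pix2(x,y:word; c:byte); assembler;
asm
 mov ax,0A000h
 mov es,ax               { ES = video segment }
 mov bx,y
 add bx,bx               { BX = y*2, byte index of lut[y] }
 mov si,x
 mov bx,word ptr lut[bx] { BX = lut[y] = y*320 }
 mov al,c
 mov es:[bx+si],al       { base+index address: y*320 + x }
end;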

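procedure pix3(x,y:word; c:byte); assembler;
{ no lookup table: offset = x + y*256 + y*64 = x + y*320 }
asm
 mov ax,0A000h
 mov es,ax
 mov ah,byte ptr y   { AX = y*256 (AL is still 0 from 0A000h) }
 mov bx,x
 add bx,ax           { BX = x + y*256 }
 shr ax,2            { AX = y*64 (shift by 2 needs a 186+, hence $G+) }
 add bx,ax           { BX = x + y*320 }
 mov al,c
 mov es:[bx],al
end;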

procedure pix4(x,y:word; c:byte); assembler;
{ code from  Andreas Jung  }
asm
 mov ax,0A000h
 mov es,ax
 mov bx,y
 add bx,bx
 mov si,x
 mov bx,word ptr lut[bx]
 mov al,c
 add bx,si
 mov es:[bx],al
end;


Procedure Pix5(X, Y : Word;  C : Byte);  Assembler;
{ code from  Jannie Hanekom  }
Asm
  mov  bx, Y
  add  bx, bx
  mov  es, SegA000
  mov  bx, word ptr lut[bx]  { Note:  BX not changed within 2 cycles }
  add  bx, X
  mov  al, C
  mov  byte ptr es:[bx], al  { Again 1 cycle before memory move }
End;


Procedure Pix6(X, Y : Word;  C : Byte);  Assembler;
{ code from  Jannie Hanekom  }
{ optimized by  Andreas Jung }
Asm
  mov  bx, Y
  add  bx, bx
  mov  ax, 0A000h
  mov  es, ax
  mov  bx, word ptr lut[bx]  { Note:  BX not changed within 2 cycles }
  mov  cx, x
  add  bx, cx
  mov  al, C
  mov  byte ptr es:[bx], al  { Again 1 cycle before memory move }
End;

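var time:longint absolute $0:$46c;   { BIOS timer tick count, about 18.2 ticks per second }
    t,c,p1,p2,p3,p4,p5,p6:longint; i:word;

begin
 write('Filling Look-Up Table');
 for i := 0 to 199 do lut[i] := i*320;

 { c = calls of the empty procedure in N ticks (call + random() overhead only) }
 write(#13#10'Timing Procedure Call');
 randseed := 0; c := 0; t := time; while t = time do; inc(t,N);
 repeat call(random(320),random(200),random(256)); inc(c)
 until time = t;

 asm mov ax,13h; int 10h end;   { mode 13h: 320x200x256 }

 { every test starts on a tick edge and counts calls for exactly N ticks }
 randseed := 0; p1 := 0; t := time; while t = time do; inc(t,N);
 repeat pix1(random(320),random(200),random(256)); inc(p1)
 until time = t;
 randseed := 0; p2 := 0; t := time; while t = time do; inc(t,N);
 repeat pix2(random(320),random(200),random(256)); inc(p2)
 until time = t;
 randseed := 0; p3 := 0; t := time; while t = time do; inc(t,N);
 repeat pix3(random(320),random(200),random(256)); inc(p3)
 until time = t;
 randseed := 0; p4 := 0; t := time; while t = time do; inc(t,N);
 repeat pix4(random(320),random(200),random(256)); inc(p4)
 until time = t;
 randseed := 0; p5 := 0; t := time; while t = time do; inc(t,N);
 repeat pix5(random(320),random(200),random(256)); inc(p5)
 until time = t;
 randseed := 0; p6 := 0; t := time; while t = time do; inc(t,N);
 repeat pix6(random(320),random(200),random(256)); inc(p6)
 until time = t;

 asm mov ax,03h; int 10h end;   { back to 80x25 text mode }

 { net calls per N ticks, with the call + random() overhead subtracted }
 writeln('1 : ',1/(1/p1-1/c):0:0);
 writeln('2 : ',1/(1/p2-1/c):0:0);
 writeln('3 : ',1/(1/p3-1/c):0:0);
 writeln('4 : ',1/(1/p4-1/c):0:0);
 writeln('5 : ',1/(1/p5-1/c):0:0);
 writeln('6 : ',1/(1/p6-1/c):0:0);
end.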
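The three ways of computing the pixel offset used above (multiplication in pix1, the lut table in pix2/pix4/pix5/pix6, and the y*256 + y*64 shifts in pix3) must all give the same offset. Here is a minimal sketch for checking that with Free Pascal on a modern machine. It assumes no DOS is available, so the video segment at $A000 is replaced by an ordinary 64000-byte array; the names Screen, FillLut, OffsMul, OffsLut, OffsShift, PutPixel and AllAgree are hypothetical, chosen only for this sketch.

```pascal
program OffsetCheck;
{ Sketch only: Screen stands in for the mode 13h segment at $A000 }
const
  W = 320;
  H = 200;

var
  Lut: array[0..H - 1] of Word;        { same role as lut in the listing }
  Screen: array[0..W * H - 1] of Byte; { hypothetical stand-in for $A000:0000 }

procedure FillLut;
var y: Word;
begin
  for y := 0 to H - 1 do
    Lut[y] := y * W;
end;

{ pix1: plain multiplication }
function OffsMul(x, y: Word): Word;
begin
  OffsMul := x + y * W;
end;

{ pix2/pix4/pix5/pix6: one table lookup per row }
function OffsLut(x, y: Word): Word;
begin
  OffsLut := x + Lut[y];
end;

{ pix3: y*320 = y*256 + (y*256) shr 2 }
function OffsShift(x, y: Word): Word;
begin
  OffsShift := x + (y shl 8) + ((y shl 8) shr 2);
end;

procedure PutPixel(x, y: Word; c: Byte);
begin
  Screen[OffsLut(x, y)] := c;
end;

{ compares all three offset formulas for every pixel on the screen }
function AllAgree: Boolean;
var x, y: Word;
    ok: Boolean;
begin
  ok := True;
  for y := 0 to H - 1 do
    for x := 0 to W - 1 do
      if (OffsLut(x, y) <> OffsMul(x, y)) or
         (OffsShift(x, y) <> OffsMul(x, y)) then
        ok := False;
  AllAgree := ok;
end;

begin
  FillLut;
  WriteLn('all three offsets agree: ', AllAgree);
  PutPixel(319, 199, 15);                { bottom-right pixel }
  WriteLn('Screen[63999] = ', Screen[63999]);
end.
```

The shift version only works because y*256 has its low 8 bits clear, so shifting it right by 2 gives exactly y*64; pix3 gets y*256 for free by loading y into AH while AL is still 0.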


Jannie Hanekom was right: if a register is changed by the instruction
immediately before a memory move that uses this register in its address, the
processor (a 486 here) spends one penalty cycle calculating the address. So
always try to schedule the code like this:

  add  bx, cx                   { calculate the offset to the pixel in Mem }
  mov  al, C                    { do something else, so the processor can
                                  calculate the offset in Mem }
  mov  byte ptr es:[bx], al     { Now you won't get a penalty cycle, because
                                  bx was changed 2 cycles before the
                                  Mem move! }

If you used this code instead, you WOULD get a penalty cycle:
  mov  al, C
  add  bx, cx
  mov  byte ptr es:[bx], al
because bx is changed in the instruction immediately before the Mem move.
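The scheduling rule described here can be stated mechanically: count one penalty whenever an instruction writes a register that the very next instruction uses to form a memory address. Below is a toy Free Pascal model of just that rule, applied to the two three-instruction sequences above. It assumes nothing else about 486 timing (segment loads, cache behaviour and everything else are ignored), and the names TReg, TInstr, Instr and CountStalls are hypothetical.

```pascal
program AgiModel;
{ Toy model of the rule above: one penalty when an instruction writes a
  register that the very next instruction uses in its memory address }
type
  TReg = (rAL, rBX, rCX);
  TRegSet = set of TReg;
  TInstr = record
    Writes: TRegSet;   { registers the instruction changes }
    AddrRegs: TRegSet; { registers used to form its memory address }
  end;

function Instr(W, A: TRegSet): TInstr;
var r: TInstr;
begin
  r.Writes := W;
  r.AddrRegs := A;
  Instr := r;
end;

{ counts adjacent pairs where the earlier write feeds the later address }
function CountStalls(const Code: array of TInstr): Integer;
var i, n: Integer;
begin
  n := 0;
  for i := 1 to High(Code) do
    if Code[i - 1].Writes * Code[i].AddrRegs <> [] then
      Inc(n);
  CountStalls := n;
end;

begin
  { C is a stack parameter addressed through BP, which none of these
    instructions writes, so BP is left out of the model }
  WriteLn('scheduled:   ', CountStalls([
    Instr([rBX], []),    { add bx, cx                }
    Instr([rAL], []),    { mov al, C                 }
    Instr([], [rBX])])); { mov byte ptr es:[bx], al  }
  WriteLn('unscheduled: ', CountStalls([
    Instr([rAL], []),    { mov al, C                 }
    Instr([rBX], []),    { add bx, cx                }
    Instr([], [rBX])])); { mov byte ptr es:[bx], al  }
end.
```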

It would be very interesting to know which results you get on your
computer, because in my tests pix6 is now the FASTEST routine for putting a
pixel in 320x200x256! Try this with N = 1000, so you won't get much
difference between two runs.
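This is how the test program turns its two counts into comparable numbers: the empty procedure is called c times in N ticks and a pixel routine p times, so one pixel call takes N/p ticks, of which N/c is overhead (the call itself plus the three random() calls). A small Free Pascal sketch of that arithmetic follows; NetCalls and TicksToSeconds are hypothetical names, and it assumes the standard PC timer rate of 1193182/65536 (about 18.2) ticks per second and p < c (otherwise the division breaks down).

```pascal
program NetRate;
const
  TicksPerSecond = 1193182.0 / 65536.0;  { standard PC timer rate, ~18.2 Hz }

{ p = pixel-routine calls counted in N ticks,
  c = empty calls (pure overhead) counted in the same N ticks; needs p < c }
function NetCalls(p, c: LongInt): Double;
begin
  { ticks per pixel call = N/p, overhead ticks per call = N/c,
    so net pixel calls per N ticks = N / (N/p - N/c) = 1 / (1/p - 1/c) }
  NetCalls := 1 / (1 / p - 1 / c);
end;

function TicksToSeconds(n: LongInt): Double;
begin
  TicksToSeconds := n / TicksPerSecond;
end;

begin
  WriteLn('net calls: ', NetCalls(200, 400):0:0);
  WriteLn('N = 1000 ticks lasts about ', TicksToSeconds(1000):0:1, ' s');
end.
```

For example, NetCalls(200, 400) is 400: the empty loop ran twice as often, so half of each pixel call's time was overhead. With N = 1000 each single measurement runs for roughly 55 seconds.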

BTW, if you have found any faster routines, let me know! I'm really
interested in this!

Greetings,
            Andreas.

