program move32_test;
uses Crt;
const
  count = 1000;                      { copies per timing run }
var
  block1, block2: pointer;
  i, time: longint;
  timer: longint absolute $40:$6C;   { BIOS tick counter, about 18.2 ticks/s }
  size: word;
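procedure Move32(var source, dest; count: word); assembler;
asm
        PUSH    DS
        LDS     SI,source       { DS:SI -> source }
        LES     DI,dest         { ES:DI -> destination }
        MOV     CX,count
        SHR     CX,1            { bit 0 of count set: copy one odd byte }
        JNC     @@1
        MOVSB
@@1:    SHR     CX,1            { bit 1 of count set: copy one odd word }
        JNC     @@2
        MOVSW
@@2:    DB      66h             { operand-size prefix turns MOVSW into MOVSD }
        REP     MOVSW           { copy the remaining CX doublewords }
        POP     DS
end;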
{ --- Mainprog --- }
begin
  clrScr;
  getmem(block1, 65010);
  getmem(block2, 65010);
  for size := 65000 to 65003 do
  begin
    writeln('Timing blocks of ', size, ' bytes :');
    writeln(' Timing Move ...');
    time := timer;
    for i := 1 to count do
      move(block1^, block2^, size);
    writeln(' Time for ', count, ' Move''s : ', (timer - time) / 18.2:8:1, ' s');
    writeln(' Timing Move32 ...');
    time := timer;
    for i := 1 to count do
      move32(block1^, block2^, size);
    writeln(' Time for ', count, ' Move32''s : ', (timer - time) / 18.2:8:1, ' s');
  end;
end.
{ -------------------------------------------------------- }
If you can't find anything wrong with it, test it!
Here are the results on a 486DX4-100:
Timing blocks of 65000 bytes :
Timing Move ...
Time for 1000 Move's : 11.0 s
Timing Move32 ...
Time for 1000 Move32's : 3.6 s
Timing blocks of 65001 bytes :
Timing Move ...
Time for 1000 Move's : 11.0 s
Timing Move32 ...
Time for 1000 Move32's : 6.0 s
Timing blocks of 65002 bytes :
Timing Move ...
Time for 1000 Move's : 11.0 s
Timing Move32 ...
Time for 1000 Move32's : 6.0 s
Timing blocks of 65003 bytes :
Timing Move ...
Time for 1000 Move's : 11.0 s
Timing Move32 ...
Time for 1000 Move32's : 6.0 s
About three times faster when the doubleword copy runs on a 4-byte boundary
(block size a multiple of 4), and still almost twice as fast otherwise!
I think that's a nice score...
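
For anyone who wants the figures above as throughput rather than seconds, the arithmetic can be sketched as below. It only reuses the timings quoted above (1000 copies of 65000 bytes); the program name, the Report helper and the choice of 1 MB = 1,000,000 bytes are assumptions made for this illustration.

```pascal
program Throughput;
{ Converts the timings quoted in the post into MB/s and speed-up ratios. }
{ Report is a hypothetical helper, not part of the original test program. }
const
  Count = 1000;       { copies per measurement, as in the test program }
  Size  = 65000.0;    { bytes per copy (the first test case) }

procedure Report(name: string; seconds: real);
begin
  { total bytes / seconds, scaled to millions of bytes per second }
  writeln(name, ': ', Count * Size / seconds / 1.0E6:6:1, ' MB/s');
end;

begin
  Report('Move                    ', 11.0);
  Report('Move32, size mod 4 = 0  ', 3.6);
  Report('Move32, other sizes     ', 6.0);
  writeln('Speed-up, size mod 4 = 0 : ', 11.0 / 3.6:4:2);
  writeln('Speed-up, other sizes    : ', 11.0 / 6.0:4:2);
end.
```

That works out to roughly 5.9 MB/s for Move against about 18.1 MB/s and 10.8 MB/s for Move32, which is where the "three times" and "almost twice" come from. Keep in mind that the BIOS timer only ticks about 18.2 times per second, so the last digit of each timing is not very meaningful.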
EH> For REP MOVSD to work faster the values to be moved have to be on
EH> "32 bit" addresses, that is: both SI and DI have to be a multiple of 4.
EH> You didn't test for that and with the extra MOVSB and MOVSW it might well
EH> be they are on a multiple of 4 + 1 or 3 (as the alignment of TP normally
EH> is on EVEN addresses).
You're right about that, maybe I'll work on it... some day. ;-)
EH> Apart from that you didn't test for overlap (does the move partially
EH> overwrite the bytes TO be moved, because then those bytes have to be
EH> moved first)
I hadn't thought about that. I'm not often moving overlapping blocks though.
Are you sure the TP Move checks for that? (I mean, do you not only assume,
but have you tested it? :-))
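
A quick way to settle the overlap question by experiment rather than by assumption could look like the following. This is a minimal sketch: the program name and buffer size are made up here, and it only tests the upward-overlapping case (destination above source), which is the one a plain forward copy gets wrong.

```pascal
program OverlapTest;
{ Checks whether Move copes with a destination that overlaps the source. }
var
  buf: array[0..15] of byte;
  i: integer;
  ok: boolean;
begin
  for i := 0 to 15 do
    buf[i] := i;
  { Copy bytes 0..7 to positions 4..11: the destination overlaps the }
  { second half of the source, so a naive forward copy would re-read }
  { bytes it has already overwritten.                                }
  Move(buf[0], buf[4], 8);
  ok := true;
  for i := 0 to 7 do
    if buf[i + 4] <> i then
      ok := false;
  if ok then
    writeln('Move handled the overlapping block correctly')
  else
    writeln('Move corrupted the overlapping block');
end.
```

Replacing the Move call with Move32 gives the same check for the routine above, which copies strictly upward.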
EH> and you didn't set a direction flag so it just MIGHT be you're
EH> moving the wrong bytes (mostly the direction flag IS upwards, but it just
EH> might be downwards, which means you're moving the bytes BELOW "ds:si" to
EH> "es:di").
The direction flag is assumed to be cleared in TP. Every procedure that
changes it should clear it again. But it's not forbidden to do a CLD of
course...
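
The two points raised so far (alignment and the direction flag) could be addressed along the following lines. This is a sketch, not a tested drop-in replacement: the name Move32A and the little test harness are invented for this illustration. It aligns the destination only, since source and destination cannot both be aligned when their offsets differ modulo 4. It still copies upward only, so overlapping blocks are not handled, and it assumes a 386 or later running a real-mode Turbo Pascal program.

```pascal
program Move32ATest;
{ Hypothetical variant of Move32: adds CLD and aligns DI to 4 bytes. }
var
  a, b: array[0..99] of byte;
  i: integer;
  ok: boolean;

procedure Move32A(var source, dest; count: word); assembler;
asm
        PUSH    DS
        CLD                     { make sure the string ops count upward }
        LDS     SI,source
        LES     DI,dest
        MOV     DX,count        { DX = bytes still to copy }
        MOV     CX,DI
        NEG     CX
        AND     CX,3            { CX = bytes needed to align DI (0..3) }
        CMP     CX,DX
        JBE     @@1
        MOV     CX,DX           { block is shorter than the alignment run }
@@1:    SUB     DX,CX
        REP     MOVSB           { leading bytes: DI is now a multiple of 4 }
        MOV     CX,DX
        SHR     CX,1
        SHR     CX,1            { CX = number of whole doublewords }
        DB      66h             { operand-size prefix: MOVSW becomes MOVSD }
        REP     MOVSW
        MOV     CX,DX
        AND     CX,3            { 0..3 trailing bytes }
        REP     MOVSB
        POP     DS
end;

begin
  for i := 0 to 99 do
  begin
    a[i] := i;
    b[i] := 255;                { marker value, never present in a }
  end;
  Move32A(a[1], b[3], 90);      { odd offsets on purpose }
  ok := true;
  for i := 0 to 99 do
    if (i >= 3) and (i < 93) then
    begin
      if b[i] <> a[i - 2] then ok := false;   { copied range }
    end
    else if b[i] <> 255 then ok := false;     { must stay untouched }
  if ok then
    writeln('Move32A copied the block correctly')
  else
    writeln('Move32A FAILED');
end.
```

Aligning the destination rather than the source is a judgment call; whether it pays off on a given machine would have to be measured with the timing loop above.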
EH> A complete Move32 has to be much more complicated than this (and much
EH> bigger, thus). Further Move is most often used to/from screen memory and
EH> unless you got a PCI screen card 32-bits moves are not possible to screen
EH> memory (the cpu will automatically do each 32-bit doubleword as 2 16-bits
EH> words, as the bus is only 16 bits).
PCI (and VLB) are becoming more common today, so I don't see the problem...
I tested this by mapping the second block to $A000 in mode 13h, and I found
these _strange_ results with a VLB card:
Timing blocks of 65000 bytes :
Timing Move ...
Time for 1000 Move's : 10.7 s
Timing Move32 ...
Time for 1000 Move32's : 3.4 s
Timing blocks of 65001 bytes :
Timing Move ...
Time for 1000 Move's : 10.7 s
Timing Move32 ...
Time for 1000 Move32's : 5.8 s
Timing blocks of 65002 bytes :
Timing Move ...
Time for 1000 Move's : 10.6 s
Timing Move32 ...
Time for 1000 Move32's : 5.8 s
Timing blocks of 65003 bytes :
Timing Move ...
Time for 1000 Move's : 10.6 s
Timing Move32 ...
Time for 1000 Move32's : 5.8 s
I always thought video RAM was SLOWER than normal RAM???
Do you have an explanation for this?
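
For reference, the video-memory variant of the test mentioned above could be set up roughly like this. It is a sketch under assumptions: the program name and the SetVideoMode helper are invented here, it times only the standard Move (the Move32 calls can be substituted in the same loop), and it assumes a VGA-compatible card and a real-mode Turbo Pascal program.

```pascal
program VideoTarget;
{ Sketch: same timing loop, but the destination is VGA memory. }
const
  count = 1000;
  size  = 65000;                     { as in the first test case }
var
  block1, block2: pointer;
  i, time: longint;
  timer: longint absolute $40:$6C;   { BIOS tick counter, about 18.2 ticks/s }
  elapsed: real;

{ Hypothetical helper: BIOS INT 10h, function 0 = set video mode. }
procedure SetVideoMode(mode: byte); assembler;
asm
        MOV     AH,0
        MOV     AL,mode
        INT     10h
end;

begin
  getmem(block1, size);
  block2 := Ptr($A000, 0);           { start of the VGA graphics window }
  SetVideoMode($13);                 { 320x200, 256 colours }
  time := timer;
  for i := 1 to count do
    move(block1^, block2^, size);    { swap in Move32 here to compare }
  elapsed := (timer - time) / 18.2;
  SetVideoMode($03);                 { back to 80x25 text before printing }
  writeln('Time for ', count, ' Move''s to $A000 : ', elapsed:8:1, ' s');
  freemem(block1, size);
end.
```

The result is printed only after switching back to text mode, so the writeln output does not end up in the graphics screen.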