Code Gems Part 3
This text comes from IMPHOBIA Issue X - June 1995
Welcome and greetings! Prepare for another bunch of coding
tips... By the way, if You have some nice tricks but You don't feel enough
inspiration to write Code Gems part 5, please send 'em to me. I'm running out of
ideas, but I'm sure there's a couple of tricks left :-) I'd like to say a big-
big thanks to my enthusiastic friends who helped me with finishing this article
: Perla, Nicke, Deity, George, Stinyo, Rodrigo, G.O.D., #coders and all I
forgot...
* Correction for the previous part *
In the last issue I wrote that TASM doesn't support the
LOOP instruction using ECX instead of CX in 16-bit code. Well, I was wrong,
sorry about that. Of course, it is able to do that. (I've managed to
throw a glance at an original Tasm book :-) There are two instruction aliases
called LOOPW and LOOPD. The first one always uses CX as counter (independently
of the size of the current code segment), the other one always uses ECX. These
can be used as LOOPWE, LOOPDNE, and so on. JECXZ is also available.
* Calculating the absolute value of AX *
This wonderful 'gem' was developed by Laertis / Nemesis.
cwd
xor ax,dx
sub ax,dx
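For those who want to check the trick on paper first, here is a small C model of the three instructions. It's only a sketch: the function name abs16 is made up for this illustration, and the 16-bit registers are modelled with uint16_t. CWD fills DX with the sign of AX (0 or 0ffffh), so the XOR/SUB pair is either a no-op or a full two's complement negation.

```c
#include <stdint.h>

/* C model of CWD / XOR AX,DX / SUB AX,DX (abs16 is a hypothetical name). */
uint16_t abs16(uint16_t ax)
{
    /* cwd: DX becomes 0FFFFh if AX is negative, 0 otherwise */
    uint16_t dx = (ax & 0x8000u) ? 0xFFFFu : 0x0000u;

    ax ^= dx;                  /* xor ax,dx: inverts all bits if negative */
    ax = (uint16_t)(ax - dx);  /* sub ax,dx: adds 1 if negative           */
    return ax;
}
```

Note that DX is destroyed, and that 8000h stays 8000h, because +32768 doesn't fit in a signed word.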
* Short Compare, part II *
Checking if a register contains 8000h (or 80h or 80000000h):
neg register
jo it_was_8000
NEG sets the overflow flag only for this value, and the content of the
register won't be changed if it was 8000h :-) Any other value gets negated,
of course.
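Why does it work? NEG is a subtraction from zero, and the only 16-bit value whose negation doesn't fit in a signed word is 8000h. The C sketch below models this with the usual overflow rule for a subtraction. The function names are invented for the example, they are not part of any library.

```c
#include <stdint.h>

/* Model of NEG on a 16-bit register: stores the result through 'result'
   and returns the overflow flag (hypothetical helper). */
int neg16_overflow(uint16_t x, uint16_t *result)
{
    uint16_t r = (uint16_t)(0u - x);   /* neg = 0 - x, modulo 65536 */
    *result = r;
    /* Overflow rule for a - b: OF = sign of ((a ^ b) & (a ^ r)).
       With a = 0 this reduces to the sign bit of (x & r). */
    return (x & r & 0x8000u) != 0;
}

/* Counts how many of the 65536 possible values set the overflow flag. */
int count_neg_overflows(void)
{
    int count = 0;
    uint16_t r;
    for (uint32_t x = 0; x <= 0xFFFFu; x++)
        count += neg16_overflow((uint16_t)x, &r);
    return count;
}
```

Exactly one value sets the flag, and for that value the result equals the input.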
* Multi-Segment STOS/MOVS *
In flat real mode it's possible to use multi-segment block movement:
ECX for counting and ESI / EDI for addressing.
ESTOSD macro
db 67h
stosd
endm
For example, this code clears four megabytes of memory (STOSD writes to
ES:EDI, so the start address goes to EDI):
xor eax,eax
mov ecx,100000h
mov edi,200000h
rep estosd
* Pixel drawing in protected mode *
Here comes a 'routine' which sets a pixel to the given value in
256-color mode (parameters: EAX = X coordinate, EBX = Y coordinate, CL = color):
add eax,table[ebx*4]
mov [eax],cl
The only difference from the real mode method is that the TABLE doesn't
contain the 0, 320, 640, etc. values. It contains (a0000 - base of DS),
(a0000 - base of DS + 320), ... There's another version which doesn't change EAX:
mov edx,table[ebx*4]
mov [eax+edx],cl
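Building the TABLE could look something like this C sketch. The names (row_table, build_row_table, pixel_offset) and the idea of passing the DS base as a parameter are assumptions made for the illustration. How You get the base of DS depends on Your DOS extender.

```c
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* One entry per scanline: offset of the row start relative to DS. */
uint32_t row_table[SCREEN_H];

/* ds_base is the linear base address of DS (supplied by the caller). */
void build_row_table(uint32_t ds_base)
{
    for (int y = 0; y < SCREEN_H; y++)
        row_table[y] = 0xA0000u - ds_base + (uint32_t)(SCREEN_W * y);
}

/* Same arithmetic as 'add eax,table[ebx*4]': offset of pixel (x,y) in DS. */
uint32_t pixel_offset(uint32_t x, uint32_t y)
{
    return x + row_table[y];
}
```

Adding the DS base to the returned offset gives the linear address a0000 + 320*y + x, which is exactly what the real mode table produced with a segment of a000.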
* Simple recursive calls *
Sometimes we have to call one subroutine many times like this:
mov cx,4
call waitraster
loop $-3
But this requires a register as cycle counter ;-) There's the other way:
call waitraster4
...
waitraster4:
call waitraster2
waitraster2:
call waitraster
waitraster:
mov dx,03dah
...
ret
Well, this is not really interesting. It just works :-) Now a more usable
example: loading instrument data to the AdLib card.
;Load AdLib instrument. Inputs:
;ds:si: register values (5 words;
; lower byte: data for operator
; 1, higher byte: data for
; operator 2)
;al: adlib port (0,1,2,8,9,a,10h,
; 11h,12h)
loadinstr:
mov dx,388h
add al,0e0h
call double_load
sub al,0c6h
call double_double
add al,1ah
double_double:
call double_load
add al,1ah
double_load:
call final_load
final_load:
mov ah,[si]
inc si
out dx,al
call adlib_address_delay
xchg al,ah
out dx,al
call adlib_data_delay
mov al,ah
add al,3
ret
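It's easy to get lost in the fall-through labels, so here is a rough C model that only tracks which AdLib register indices the routine selects, and in which order. Each 'call X' directly followed by the label 'X:' runs X twice, which is written here as two plain calls. All names are invented for this sketch, and the data bytes and the delays are left out.

```c
#include <stdint.h>

uint8_t adlib_regs[16];   /* register indices selected, in order */
int     adlib_count;
uint8_t al_reg;           /* models AL between the calls */

/* final_load: select register AL, send one data byte, then AL += 3 */
void final_load(void)
{
    adlib_regs[adlib_count++] = al_reg;
    al_reg = (uint8_t)(al_reg + 3);
}

/* 'call final_load' followed by falling into final_load */
void double_load(void) { final_load(); final_load(); }

/* 'call double_load', 'add al,1ah', then falling into double_load */
void double_double(void)
{
    double_load();
    al_reg = (uint8_t)(al_reg + 0x1A);
    double_load();
}

/* Returns the number of bytes sent for one instrument. */
int load_instr(uint8_t port)
{
    adlib_count = 0;
    al_reg = (uint8_t)(port + 0xE0);
    double_load();
    al_reg = (uint8_t)(al_reg - 0xC6);
    double_double();
    al_reg = (uint8_t)(al_reg + 0x1A);
    double_double();
    return adlib_count;
}
```

For port 0 the model selects e0, e3, 20, 23, 40, 43, 60, 63, 80, 83: ten bytes, which matches the five words at ds:si.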
* Hardware scroll with one page *
First a few words about vertical hardware scrolling. The 'standard'
scroll requires at least two pages. In the beginning the first page is visible,
and it's black. Then the screen goes up one row - the first row of the second
page appears at the bottom. Now this row is copied to the 1st row of the 1st
page (which row is now invisible). This process continues until the 2nd page is
entirely visible. At this point the two pages are identical. Now the 1st page is
displayed again and the whole process starts from the beginning. The problem
with it is the memory requirement, which is too big. With this method it's
impossible to make a 640*480 scroll since one page occupies more than 128k of
video memory.
But why do we need two pages? Because the video memory is not
'circular'. I mean if we'd scroll the screen up by one pixel, the 1st row of the
video memory which was on the top of the screen now would be at the bottom. With
this kind of video memory we could do a smooth vertical scroll with a single
page: in the beginning, the screen is black. Now wait for a vertical retrace,
then change the 1st row, and shift the screen up by one row so that the
previously modified row appears at the bottom. Perfect, eh? The question is how
we can make 'circular' memory...
It's a well-known fact that there's a certain
problem with the hardware scroll on TSENG cards: every second page contains
some 'noise' instead of the scroll we expected to see... The cause of this is
the 'memory display start' register (3d4/0c,0d), which works a bit differently
than on other cards. On other cards only the first 256k of the video memory is
ever displayed on the screen, even if the memory display start register (MDSR)
is set close to the end of the 256k. These cards handle this 256k of memory as
a circular buffer, but the TSENG boards don't:
.----------. <- screen -> .----------.
| video    | <-  MDSR  -> | video    |
| memory   |              | memory   |
|          |              |          |
|     3ffff|              |     3ffff|
|----------|              |----------|
|00000     |   TSENG ->   |40000     |
| wraps    |              | continues|
|          |   <- VGA     |          |
`----------´              `----------´
So what we can do is 'emulate' the standard VGA circular buffer with the
LINE COMPARE REGISTER (LCR, 3d4/18h). The function of this register is pretty
simple: when the scanline counter reaches this value, the display address wraps
to 0, the beginning of the video memory:
           .---------.
   MDSR -> | video   |
           | memory  |
           |         |
   line    |    ?????|
compare -> |---------|
register   |00000    |
           | wraps   |
           |         |
           `---------´
The *big* advantage is that it's possible to emulate a circular video memory
shorter than 256k! It should work on all VGA cards. The most elegant way is
to add LCR-changing code to the MDSR modifier routine. With this, the existing
'standard' scrollers can be fixed for TSENG cards too. Remember, the line
compare register is 10 bits wide: bit 8 is located in bit 4 of 3d4/07h and
bit 9 in bit 6 of 3d4/09h.
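Splitting a 10-bit line compare value over the three CRTC registers can be sketched in C like this. The function names are made up for the example. The two merge helpers take the old register content (read back from the card) so that the other bits of 3d4/07h and 3d4/09h stay untouched. The actual port I/O is left out.

```c
#include <stdint.h>

/* Bits 0-7 of the line compare value go straight to CRTC index 18h. */
uint8_t lcr_low_byte(uint16_t line)
{
    return (uint8_t)(line & 0xFFu);
}

/* Bit 8 goes to bit 4 of CRTC index 07h; other bits of old07 are kept. */
uint8_t lcr_merge_reg07(uint8_t old07, uint16_t line)
{
    return (uint8_t)((old07 & 0xEFu) | (((line >> 8) & 1u) << 4));
}

/* Bit 9 goes to bit 6 of CRTC index 09h; other bits of old09 are kept. */
uint8_t lcr_merge_reg09(uint8_t old09, uint16_t line)
{
    return (uint8_t)((old09 & 0xBFu) | (((line >> 9) & 1u) << 6));
}
```

For example, a line compare value of 400 (190h) gives 90h for index 18h, sets bit 4 of index 07h and clears bit 6 of index 09h.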
* Gouraud shading - 2 instructions/pixel *
The main goal of this example is not really
to show G-shading with two instructions ;-) It's rather an example of 'how to
pull down the upper words of 32-bit registers without shifting'. There's often a
need for calculating with fixed-point numbers: a doubleword's upper word is the
whole part, the lower is the fractional part. The problem is that the upper
words of the 32-bit registers are hard to reach. For example, after ADD EAX,EBX
how do we get EAX's upper word? No (quick) way :-( The idea behind the trick is
swapping the upper & lower words, and using ADC instead of ADD:
; EAX & EBX are fixed-point numbers
ror eax,16
ror ebx,16
cycle:
...
adc eax,ebx
stosw
...
loop cycle
The whole part of the fixed-point numbers will be in the lower words :-)
It's very important to preserve the Carry flag between two ADCs (the
instructions in between must not change it), because the carry coming out of
the fractional part is added to the whole part by the next ADC.
Now the Gouraud shading. The following piece of code is only a horizontal
shaded line drawer routine, not the whole poly-filler. Colors are expected to be
fixed-point numbers presented as doublewords with an 8-bit whole part in the
highest byte (this value will appear on the screen) and a 24-bit fractional
part. (24 bits may seem to be a lot, but it's surely more accurate than 8 bits ;-)
;In: eax: end color
; ebx: start color
; ecx: line length
; es:edi: destination
;!!! 32-bit PM version !!!
gou_line:
sub eax,ebx
;Fill edx with the carry flag
rcr eax,1
cdq
rcl eax,1
idiv ecx
;Pull down the upper parts of dwords
rol eax,8
rol ebx,8
xchg ebx,eax
;Calculate the address of the entry
;point in the linearized code
neg ecx
lea ecx,[ecx*2+ecx+320*3+offset gou_linearized]
jmp ecx
gou_linearized:
rept 320
stosb
adc eax,ebx
endm
ret
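To see what the rotated registers do, here is a small C model of the inner stosb/adc loop only. It's a sketch under a few assumptions: the name shade_line is invented, the step is passed in ready-made instead of being computed with IDIV, and the Carry flag is assumed to be clear at the entry point. The carry is modelled with a 64-bit sum.

```c
#include <stdint.h>

/* rol eax,n for 0 < n < 32 */
uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* start and step are 8.24 fixed-point (whole part in the highest byte). */
void shade_line(uint32_t start, uint32_t step, uint8_t *dst, int len)
{
    uint32_t eax = rol32(start, 8);   /* whole part is now in AL */
    uint32_t ebx = rol32(step, 8);
    unsigned cf = 0;                  /* assumption: carry clear on entry */

    for (int i = 0; i < len; i++) {
        dst[i] = (uint8_t)eax;                    /* stosb       */
        uint64_t sum = (uint64_t)eax + ebx + cf;  /* adc eax,ebx */
        eax = (uint32_t)sum;
        cf  = (unsigned)(sum >> 32);              /* carry out of the fraction */
    }
}
```

With a start color of 10 and a step of 0.5 the model writes 10,10,10,11,11,12,12,13, while plain 8.24 arithmetic would give 10,10,11,11,12,12,13,13. The carry reaches the whole part one ADC later, so a pixel can be one shade behind. In a smooth shade nobody will notice it :-)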
Variations: If You want to use it in real mode, then You have to modify
the linearized-code entry point calculation, because there the length of a
stosb/adc pair is four bytes (ADC EAX,EBX needs an operand size prefix):
neg cx
shl cx,2
add cx,320*4+offset gou_linearized
jmp cx
486-optimization fans may think of using some indexed linearized code instead
of stosb :-) In this case take care to set up the lin. code correctly, because
the lengths of 'mov [edi+0],al', 'mov [edi+1],al' and 'mov [edi+200h],al' are
different, so with a rept we won't get equal-length instructions.