FIG - loading a GIF file

TAD/Hugi

Introduction

This article briefly describes the GIF file format, how the LZW decompression works and gives an overview of the supplied 80x86 source-code. Yep, for all you 80x86 people out there... here is finally some code to load a GIF!!

For such a widely used graphics file format the GIF it is still a mysterious Pandora's box of CompuServe Patents and lack of 80x86 friendly info. I see the question "How do I load a GIF?" very often in the various assembly newsgroups, so hopefully this will address the problem.

The ZIP files

No, not staring Gillian Anderson... The ZIP file contains source code for 32-bit Flat/P-mode. I've tried to keep it all simple and give plenty of comments were possible. The code was written for TASM and tested on both TASM 5.0 and MASM 6.11 (sorry, no N-ASM version).

     GIF32.ASM       - 32-bit source code for Flat/pmode 
     GIF.INC         - include file (contains GIF structures & equates) 
     TINYHUGI.GIF    - 640x480x16bit piccy I drew resized to 160x100x8

I used the nice WDOSX dos extender by Michael Tippach for the "GIF32.ASM" 32-bit version of the GIF loader, you can find this at:

http://www.wuschel.demon.co.uk/index.html

You MAY need the DLINK.EXE program by Adam Seychell (depending on your TASM and possibly MASM version). This can be found in the DOS32B35.ZIP at:

http://www.programmersheaven.com/zone5/cat19/index.htm

The GIF file format

The supplied code was written to only load and decompress a single image from a GIF file, so it does NOT deal with all the other extension sub-blocks (like comments, graphic-control data, plain-text or application data). So for animated GIFs you will need to adapt the source code to suit your own needs.

This shouldn't be as bad as it first sounds. The code already skips past these sub-block extensions and does the core work of decompressing the LZW bitstream (which is THE hardest part, IMHO).

A basic GIF87a/GIF89a file looks something like this:

gifhdr       - the GIF header (Date/Version & "FIG" signature ;)

giflsd       - the LSD (Logical Screen Descriptor) 
             - the GCT (Global Color Table, ie. the palette) 

gifxxx       - the various extension blocks (AEX, CEX, PTE etc..) 

giftid       - the TID (The Image Descriptor) 
             - the LCT (Local Color Table, ie. one image's palette) 
             - the TBI (Table Based Image - LZW bitstream) 

As far as I know the extensions and TID/LCT/TBI sections can be freely repeated for GIF files with multiple images. The extensions supply timing information and a method to intergrate text information between images.

A loader overview

The GIF32 works like this:

1) Open the file

2) Read the gifhdr (header) and check the "G" "I" "F" signature)

3) Read the LSD (Logical Screen Descriptor) (you may wish to extract this screen size information from here so you can allocate/select the correct screen size & resolution)

4) Read the GCT (Global Color Table) if any (the GCT_FLAG in the [giflsd.LsdFlags] indicates if GCT is present)

5) Read the next byte and check for extension OR TID signature (02C hex) (you can skip the extension sub-blocks by sitting in a loop and looking for a BlockSize = 0, which marks the end of a section)

6) Read the TID (The Image Descriptor) (this begins with the 02C hex Separator and contains info about the LZW compressed image which follows any optional LCT)

7) Read the LCT (Local Color Table) (this is a local palette used for the next image(s) only, its in the same format as the GCT which is 8-bits per R,G,B component. The LCT_FLAG in the [giftid.TidFlags] denotes if a LCT is present.)

8) Read the LZW code size, then decompress the TBI data sub-blocks. (The first byte, the ImageBits code size, is used to init the LZW decompression code token values. The actual LZW bitstream is broken into sub-blocks of between 1 and 255 bytes, just like the extension sections. Each block is preceeded by a byte count, a 00 means the end of the bitstream.)

The LZW algorithm

Sorry, but I'll skip most of the explaination about how the LZW works (there is plenty of information available on the net), besides you already have some 80x86 source-code. ;)

Oh, okay... here is a rough decompressor..

     ImageBits = 2 ... 8     ; This is the minimum LZW code size 
                             ; (i.e. number of bits in image pixels) 
 
     phraze[4097] bytes      ; used to build up a reversed LZW phraze 
     suffix[4097] bytes      ; last byte of each phraze 
     prefix[4097] words      ; token of the sub-phraze before the suffix 
 
Init: 
     FutureCode = 0 
     OldCode = 0 
     BytesLeft = 0 
 
     CodeBits = ImageBits + 1 
     TopSlot = 1 << CodeBits 
 
     FlushCode = 1 << ImageBits 
     EndCode = FlushCode + 1 
     Slot = FlushCode + 2 
 
     BytesLeft = 0 
     BitPos = 8                      ; any number > 7 !! 
 
Depack: 
     Token = ReadGIFCode()           ; get the next LZW token from the 
                                     ; bitstream. 
 
     if Token = EndCode then done..  ; end of image ? 
 
     if Token = FlushCode then ... 
                     CodeBits = ImageBits + 1 
                     TopSlot = 1 << CodeBits 
                     Slot = EndCode + 1 
 
                     while Token = FlushCode 
                             Token = GetGIFCode() 
                     wend 
 
                     if Token = EndCode then done ... 
 
                     if Token >= Slot then Token = 0 
 
                     OldCode = Token 
                     FutureCode = Token 
                     OutputByte(Token) 
 
     else.. 
 
             Code = Token 
 
             Rev = 4097 
 
             if Code >= Slot then ... 
                             OldCode = Code 
                             FutureCode = Code 
                             Rev - 1 
                             phraze[Rev] = Code 
 
             while Code > EndCode ... 
                     Rev - 1 
                     phraze[Rev] = suffix[Code] 
                     Code = prefix[code] 
             wend 
 
             Rev - 1 
             phraze[Rev] = Code 
 
             if Slot < TopSlot then ... 
                     FutureCode = Code 
                     suffix[Slot] = Code 
                     prefix[Slot] = OldCode 
                     OldCode = Code 
                     Slot + 1 
 
             if Slot >= TopSlot and CodeBits < 12 then ... 
                                                     TopSlot << 1 
                                                     CodeBits + 1 
 
             while Rev < 4097 
                     OutputByte( phraze[Rev] ) 
                     Rev + 1 
             wend 
 
     goto Depack 

That's the main decompression, the only thing missing is the "ReadGIFCode" routine which constructs a N-bit token code from the TBI bitstream.

The GIF Bitstream

The bitstream is stored in an Intel friendly low, high-byte style with bits stored in a bit-0 first, bit-7 last order. The actual bytes which make up the entire bitsteam is broken up into smaller blocks, each block contains between 1 and 255 bytes. Each block is preceeded by a length count, exactly like the other extension blocks. A length count = 0 means the end of the bitstream.

      byte 1    byte 2    byte 3
      
     76543210  76543210  76543210 
     hgfedcba  ponmlkji  --vutsrq    <-- 22 bits abcdefghijklmnopqrstuv 

The above 22 bits (3 bytes) could be stored in a block like this:

               76543210 
     byte 0    00000011      = 3     the length byte 
      
     byte 1    hgfedcba              bits  0...7 
     byte 2    ponmlkji              bits  8...15 
     byte 3    --vutsrq              bits 16...21 

A LZW token consists of between 3 and 12 bits (depending on both the number of ImageBits and the number of phrazes stored in the dictionary). A "GetBits" routine has to deal with both constructing a token code from a number of bitstream bytes and also dealing with crossing the TBI block boundaries.

ReadGIFCode:  
      
     if BitPos > 7 then ... 
                     BitRack = ReadGIFByte() 
                     BitPos = 0 
 
     Token = BitRack >> BitPos 
     GotBits = -BitPos + 8 
      
     while GotBits < CodeBits 
             BitRack = ReadGIFByte() 
             Token = Token or (BitRack << GotBits) 
             GotBits + 8 
     wend 
      
     BitPos = 8 - (GotBits - CodeBits) 
     Token = Token and (TopSlot - 1) 

The following code will read a bitstream byte and deals with the caching of a single TBI block (which is between 1 and 255 bytes in length).

ReadGIFByte: 
             if BytesLeft = 0 then ... 
                             BytesLeft = ReadFileByte() 
                             for n = 1 to BytesLeft 
                                     cache[n-1] = ReadFileByte() 
                             next n 
                             BytePos = 0 
      
             n = cache[BytePos] 
             BytePos + 1 
             return (n) 

Closing words

I think I've said enough here. You know where the source code is and you can always fire up your favourite search engine if you still need some more info about the LZW algorithm or the GIF87a/GIF89a specs...

Happy viewing...

TAD/Hugi