Well, thanks for the replies. They're certainly something to think about.
I think I've finally got my tree updating properly. I've figured out how to determine how many bits to read to find the start of a run but I haven't coded it yet. I can decompress about 5 lines before that becomes a problem.
As for what stuff in the .exe does, I can break it down (at least the major parts) by the starting address of the routine. (For the few who can understand this (thumbsup) ). Actually, I better not. Op2hacker, if you want them, PM me.
00XXXXXX Determines what to do with the codes read from the tree
00XXXXXX Returns the next bit of input in EAX
00XXXXXX Returns the next 8 bits of input encoded in a byte in EAX (used for reading the block starting addresses)
00XXXXXX Returns the next encoded byte. The above two use CALL [EBP+1C]
00XXXXXX Updates the Huffman Tree
00XXXXXX Returns the next code decoded using the tree
This last routine is really cool. It uses a table to represent the tree. The siblings are always next to each other in the table. So by adding 0 or 1 to the index of the left child, you get either the left or right child. The first CALL (in the loop) gets the next bit which controls which branch to follow. You just add this bit to the index and reindex. So if EDI is the index of the current node then
ADD EAX, EDI
MOVSX EDI, WORD PTR DS:[ECX+EAX*2]
is used to traverse a branch of the tree. (ECX is of course the base of the table and EAX gets the value of the next bit.)
It's also interesting to note that a leaf node is determined by the value stored in the data/link part. So a value above 0x273 represents data and values below represent a link to another node. Hence the terminating condition for the loop. To get the data back, it then subtracts 0x273.
I just had to share that little bit. The coolness of it all was killing me B) . Extracting the bits from the stream is also somewhat interesting. Updating the tree is a small nightmare. Especially the parent indexes. It took me awhile to see what I was missing there.
Oh, and the number is bits used for the starting address of a block I believe is always at least 9 and can be up to 14 bits. It also has default values for certain bits once the length of a block gets long enough.
Anyways, back to work for me. Hmm, I probably wouldn't have gotten this done today if it wasn't for that freak snowstorm. <_<
Edit: I've now got it properly working (mostly) for a few of the files. (mines, morale, space, vehicles) The only problem with those files is that the decompressor doesn't know how to detect the end of the files and so it spits out a bunch of garbage at the end. The *tek files I get almost nothing out of yet (except the leading comments and part of the first tech in multitek). I'll have to recheck my code for decompressing runs. The numbers are much too large and are causing problems. Meh. Time to take a break and go watch a movie.
Edit:
Looks like I've got them all decompressed properly now (except the garbage at the end, but that can just be deleted with notepad or something). I was forgetting to wrap my read pointer at 4096. Also, the buffer has to be initialized to spaces so that large values (or negative values that wrap around to large values) point to spaces and not just junk.
Hmm, it's kinda tempting to try writting a compression program sometime. Sometime being the key word here. I don't care to do it anytime soon.
Anyways, has anyone figured out how to put the decompressed files back into the .vol files. I've been using that ReVOLver program but the game just quits while loading up. I've tried editing the .vol file to change the 0x103 in the index record to 0x100 but that didn't work either. (I assume that's how it knows the files are compressed.)