Author Topic: Intel Assembly (Read 4234 times)

Hooman · « **on:** August 31, 2009, 01:29:59 AM »

For the two or so people that might understand or care...

Why the heck do so many BIOS functions take addresses in DS:DX? In 16 bit mode, there is no instruction (that I can find) which loads a value using DX as the address (or segment offset, rather). The usual encodings have completely different meanings in 16 bit mode. Things you expect to work in 32 bit mode, like "MOV AL, [EDX]" have no corresponding 16 bit equivalent "MOV AL, [DX]". Instead, the encoding you'd expect to be that instruction would actually mean "MOV AL, [BP + SI]". I was a bit baffled at first when NASM told me I couldn't do what I wanted to do, but checking the Intel docs for the ModR/M byte encodings, I found rather strangely that it was right.
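Here is a rough illustration of the encoding clash, as a minimal sketch that only decodes the mod = 00 row of the ModR/M table (no displacement, no prefixes). The byte 0x02 is the ModR/M byte NASM emits for "MOV AL, [EDX]" (8A 02) in 32-bit code, and the same byte lands on [BP + SI] under 16-bit addressing. Notice that DX never appears among the 16-bit choices at all. The function name is just made up for this example.

```python
# Register names in ModR/M register-number order (000..111).
REGS32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

# Meaning of the rm field when mod == 00 under 16-bit addressing.
# rm = 110 with mod = 00 is a bare 16-bit displacement, not [BP].
RM16_MOD00 = ["BX + SI", "BX + DI", "BP + SI", "BP + DI",
              "SI", "DI", "disp16", "BX"]


def decode_mod00(modrm: int, address_size: int) -> str:
    """Decode the memory operand of a ModR/M byte whose mod field is 00.

    Illustration only: other mod values, displacements and prefixes are not handled.
    """
    mod = modrm >> 6
    rm = modrm & 0b111
    if mod != 0:
        raise ValueError("this sketch only handles mod == 00")
    if address_size == 16:
        return "[" + RM16_MOD00[rm] + "]"
    if address_size == 32:
        if rm == 0b100:
            return "[SIB follows]"   # rm = 100 is the escape to a SIB byte
        if rm == 0b101:
            return "[disp32]"        # rm = 101 with mod = 00 is a bare disp32
        return "[" + REGS32[rm] + "]"
    raise ValueError("address_size must be 16 or 32")


# "MOV AL, [EDX]" is 8A 02 in 32-bit code; same ModR/M byte, different mode:
print(decode_mod00(0x02, 32))  # [EDX]
print(decode_mod00(0x02, 16))  # [BP + SI]
```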

Another oddity is in the SIB byte encodings. What's the difference between encodings 0x24, 0x64, 0xA4, and 0xE4? They all equate to an address of "[ESP]": the index field is 100, which means "no index", so the scale bits in the top two positions are simply ignored. Indeed, OllyDbg decodes all of them as the same thing. I suppose if you wanted a really obscure way to store 2 bits somewhere unexpected, you could put them in the scale field of the SIB byte.
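To make that concrete, here's a small sketch of a SIB decoder plus a helper (hypothetical, just for this post) that stuffs 2 bits into the scale field of an [ESP] SIB byte. It assumes the ModR/M mod field is 01 or 10, so a base field of 101 really means EBP and the mod = 00 "no base" special case can be ignored.

```python
REGS32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_sib(sib: int) -> str:
    """Decode a SIB byte, assuming ModR/M mod is 01 or 10.

    Layout: bits 7-6 scale, bits 5-3 index, bits 2-0 base.
    """
    scale = 1 << (sib >> 6)          # 00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    parts = [REGS32[base]]
    if index != 0b100:               # index 100 means "no index", so scale is unused
        parts.append(f"{REGS32[index]}*{scale}" if scale > 1 else REGS32[index])
    return "[" + " + ".join(parts) + "]"


def esp_sib_with_hidden_bits(bits: int) -> int:
    """Hypothetical helper: an [ESP] SIB byte carrying 2 spare bits in its scale field."""
    if not 0 <= bits <= 3:
        raise ValueError("only 2 bits fit in the scale field")
    return (bits << 6) | (0b100 << 3) | 0b100


for b in (0x24, 0x64, 0xA4, 0xE4):
    print(hex(b), decode_sib(b))     # all four decode to [ESP]
```

Reading the hidden bits back is just "sib >> 6" on any SIB byte whose index and base fields are both 100.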

Actually, that reminds me of an idea I had for compilers to mark their output by encoding bits into fields such as these, or by carefully choosing registers during register allocation when there is a choice between several. That could be an interesting way to watermark programs and later check for unauthorized distribution, or perhaps to track down the author of a virus (if the compiler automatically embedded a machine-specific GUID or MAC address).

Of course, that whole SIB thing is odd in that you even need a SIB byte to index off of ESP. You'd think ESP would be the register least in need of a SIB byte, since it can't even be used as the index in one, yet the ModR/M byte encodings force a SIB byte on you whenever ESP is the base. That means loading a value at an offset from the stack pointer takes one extra byte compared to using any other register as the base. I suppose they needed an escape somewhere to mark the use of a SIB byte, and that slot is the one they picked. Still, it makes the whole Frame Pointer Optimization seem a bit questionable, since every ESP-relative access pays that extra byte.
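A quick sketch of the size difference, assuming 32-bit addressing and only the "MOV r32, [base + disp8]" form (opcode 8B /r with mod = 01). The function name is made up for this example; it just builds the bytes by hand so you can see where the extra SIB byte comes from.

```python
REGS32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3, "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def encode_mov_load_disp8(dst: str, base: str, disp: int) -> bytes:
    """Encode MOV r32, [base + disp8] (opcode 8B /r) using 32-bit addressing.

    Always uses mod = 01 (8-bit displacement), so EBP needs no special case here.
    """
    if not -128 <= disp <= 127:
        raise ValueError("this sketch only handles 8-bit displacements")
    reg = REGS32[dst]
    rm = REGS32[base]
    modrm = (0b01 << 6) | (reg << 3) | rm
    out = bytes([0x8B, modrm])
    if rm == 0b100:
        # rm = 100 is the SIB escape, so an ESP base needs a SIB byte:
        # scale 00, index 100 (none), base 100 (ESP) -> 0x24.
        out += bytes([0x24])
    return out + disp.to_bytes(1, "little", signed=True)


print(encode_mov_load_disp8("EAX", "EBP", 8).hex(" "))  # 8b 45 08
print(encode_mov_load_disp8("EAX", "ESP", 8).hex(" "))  # 8b 44 24 08
```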

instigator · « **Reply #1 on:** August 31, 2009, 09:01:12 AM »

my brain only understood half of that post XD props

I call conspiracy and backdoor registers!

CK9 · « **Reply #2 on:** August 31, 2009, 11:42:06 AM »

simple: They decided to take a new route with that part of the coding

Hooman · « **Reply #3 on:** September 02, 2009, 01:29:24 PM »

As for another way to hide a bit here and there, the direction bit of register to register moves will work.

There are different encoding types for MOV, such as "MOV Eb, Gb" and "MOV Gb, Eb".

The "E" means:

Quote

A ModR/M byte follows the opcode and specifies the operand. The operand is either a general-purpose register or a memory address. If it is a memory address, the address is computed from a segment register and any of the following values: a base register, an index register, a scaling factor, a displacement.

The "G" means:

Quote

The reg field of the ModR/M byte selects a general register (for example, AX (000)).

Hence, to store to memory, you'd use the "MOV Ev, Gv" type (the full-size counterpart of "MOV Eb, Gb"), such as "MOV [ESP+10], EAX". To load from memory, you'd use the "MOV Gv, Ev" type, such as "MOV EAX, [ESP+10]". These encodings differ only by the so-called direction bit (bit 1) of the MOV opcode: 88 vs. 8A for the byte forms, and 89 vs. 8B for the full-size forms.

However, if you're doing a register to register transfer, either form can be used by flipping the direction bit of the MOV opcode encoding (switching between the two types listed), and then swapping the register indexes. Essentially this gives you full control over what you want that direction bit to be. Plus, that would be fairly easy to scan for, as you just need to check for register to register moves.
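Here's a minimal sketch of that trick, assuming 32-bit register operands only (opcodes 89 and 8B with mod = 11). Both helpers are hypothetical names for this post: one picks the encoding based on the bit you want to hide, the other recovers the operands and the hidden bit.

```python
REGS32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def encode_mov_rr(dst: int, src: int, hidden_bit: int) -> bytes:
    """Encode MOV dst, src (32-bit registers) with a chosen direction bit.

    hidden_bit = 0 -> opcode 89 (MOV Ev, Gv): rm = dst, reg = src
    hidden_bit = 1 -> opcode 8B (MOV Gv, Ev): reg = dst, rm = src
    """
    if hidden_bit:
        return bytes([0x8B, 0xC0 | (dst << 3) | src])
    return bytes([0x89, 0xC0 | (src << 3) | dst])


def decode_mov_rr(code: bytes):
    """Return (dst, src, hidden_bit) for a register-to-register 89/8B MOV."""
    opcode, modrm = code[0], code[1]
    if opcode not in (0x89, 0x8B) or modrm >> 6 != 0b11:
        raise ValueError("not a register-to-register 89/8B MOV")
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    d = (opcode >> 1) & 1            # the direction bit
    return (reg, rm, d) if d else (rm, reg, d)


bits = [1, 0, 1, 1]
code = [encode_mov_rr(0, 3, b) for b in bits]   # MOV EAX, EBX four times
print(" ".join(c.hex() for c in code))          # 8bc3 89d8 8bc3 8bc3
print([decode_mov_rr(c)[2] for c in code])      # [1, 0, 1, 1]
```

Both encodings execute identically, which is the whole point: the CPU doesn't care, but a scanner can read the bit back out.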

Eddy-B · « **Reply #4 on:** September 03, 2009, 12:12:38 PM »

..No comment..

Hooman · « **Reply #5 on:** September 03, 2009, 10:49:46 PM »

Technically, that's a comment.

Eddy-B · « **Reply #6 on:** September 04, 2009, 08:43:45 AM »

(it won't accept an empty reply)

Hidiot · « **Reply #7 on:** October 25, 2009, 05:36:57 AM »

After a bit of study on the matter, I think I'm starting to understand something.

Regarding the use of DS:DX in 16-bit mode, does using it so much mean DX is used wastefully? As in, could they have used one of the other offset registers, ones that are more commonly used for this and probably easier to "translate" into 32-bit?

Just curious, because if DX is used because the other, better suited offset registers were already taken, then I'd be curious to know a better way to store extra values within a single data segment.

If I said something stupid, do tell me. It's all in the process of learning.

Hooman · « **Reply #8 on:** October 25, 2009, 04:11:08 PM »

I'm not entirely sure what you've said, but here goes an attempt at clarifying things.

From the perspective of user mode code, it really doesn't matter whether the BIOS functions accept values in DX or say, BX. It's probably going to be the same either way to load that value. However, from the perspective of the BIOS code, if you actually want to load a value pointed to by a register, and that address was in DX, you'd need to move it to another register before using it, such as BX (or SI, or DI, or maybe BP if you also need a displacement). It seems like the BIOS code could have been shorter if they'd used a different register to accept the pointer value.

Of course this probably doesn't matter if that pointer value was being passed to a hardware device, such as a DMA controller, or a hard disk controller. Perhaps that has something to do with the decision.

This has nothing to do with 32 bit code. Generally speaking, code is written for a specific processor mode, and will not work in a different mode. In particular, the size of immediate constants following instructions changes depending on the processor mode, which would shift the instruction boundaries of the entire instruction stream if you tried to run it in a different mode. That would almost certainly not do what you want, as bytes belonging to one instruction would get combined with or split from their neighbours and then be interpreted differently.
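As a toy example of that desync, here's a sketch that only understands the B8+r "MOV reg, imm" opcodes and ignores prefixes. The function name is made up for this post. The immediate is 4 bytes in 32-bit code but only 2 bytes in 16-bit code, so the same bytes split into different instructions.

```python
def split_mov_imm(code: bytes, operand_size: int) -> list:
    """Split a run of B8+r (MOV reg, imm) instructions into per-instruction bytes.

    Only opcodes B8-BF are understood; the immediate is 2 bytes in 16-bit code
    and 4 bytes in 32-bit code (operand-size prefixes are ignored).
    """
    imm_len = {16: 2, 32: 4}[operand_size]
    out, i = [], 0
    while i < len(code):
        if not 0xB8 <= code[i] <= 0xBF:
            # Anything else is outside this sketch: report the leftover bytes.
            out.append(code[i:])
            break
        out.append(code[i:i + 1 + imm_len])
        i += 1 + imm_len
    return out


code32 = bytes.fromhex("b8 01 00 00 00 bb 02 00 00 00")  # MOV EAX, 1 / MOV EBX, 2
print([c.hex() for c in split_mov_imm(code32, 32)])
# ['b801000000', 'bb02000000']
print([c.hex() for c in split_mov_imm(code32, 16)])
# ['b80100', '0000bb02000000']
```

In 16-bit mode the first instruction becomes "MOV AX, 1", and the leftover "00 00" then decodes as "ADD [BX+SI], AL", so everything after it is out of step.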

You might get away with using the same code in two different modes if you avoided instructions with immediate constants, and also avoided memory access instructions whose encoding formats differ widely between modes, but that really leaves you with very little to work with. You'd probably be stuck with PUSH/POP for memory access.

If you're really interested, take a look at "Appendix A: Opcode Map" from the Intel document "IA32 Vol 2 - Instruction Set Reference". The first 6 (ADD) instruction formats should provide examples of what I'm talking about. You'll need the tables on previous pages to decode the meaning of the addressing and operand letters.
