Programs Under the Hood...Part 8: Disassembling Stuff
Posted by: dargueta
 

Welcome back to Part 8 of Programs Under the Hood. Today we're going to disassemble a BIOS interrupt to get a real-world example of how programs are structured, and we'll see if we can convert some of it to C/C++ code. (I apologize for the large line breaks. For some reason, they just appeared.)

 

POPPING THE HOOD-DISASSEMBLING A BIOS ROUTINE

A few issues back I mentioned that the BIOS provides a lot of basic functionality to assembly-language programs. It's sort of like a primitive library, albeit one that's slow as hell because it's designed to be compatible with everything. Nowadays, performance-minded programs bypass it altogether and call functions in device drivers and system libraries instead. However, since the BIOS came first, it's interesting to see how it's built and how it works. We're going to disassemble part of interrupt 10H, which handles video functions such as plotting pixels and changing video modes.

 

FINDING THE ENTRY POINT

Every interrupt has a starting address, like a function. The addresses aren't fixed, though. They vary with the BIOS version and with whatever DOS and your drivers have hooked. So how do you find out where INT 10H is on your computer? By asking DOS, of course. Flipping through an interrupt reference, I found that INT 21H, subfunction 35H gives the entry point, or starting address, of an interrupt:

 

INTERRUPT 21H SUBFUNCTION 35H

Description: Returns the entry point of the specified interrupt.

Arguments:

      AH = 35H (the function identifier)

      AL = interrupt number

Returns: Interrupt starting address is in ES:BX.
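Under the hood, what subfunction 35H hands back is essentially an entry from the real-mode interrupt vector table, which starts at linear address 0. Each vector takes 4 bytes, with the offset word first and the segment word second, so vector n lives at address n*4. Here's a minimal C sketch of that lookup. It assumes we already have a copy of the first 1KB of memory in a byte array; the names ivt_lookup and struct far_ptr are made up for this example.

```c
#include <stdint.h>

/* A real-mode segment:offset pair (hypothetical helper type). */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Reads vector 'n' from a copy of the interrupt vector table.
   'mem' must hold at least the first 1024 bytes of memory. */
struct far_ptr ivt_lookup(const uint8_t *mem, uint8_t n)
{
    const uint8_t *entry = mem + (unsigned)n * 4;   /* 4 bytes per vector */
    struct far_ptr p;
    /* x86 is little-endian: the low byte of each word comes first. */
    p.offset  = (uint16_t)(entry[0] | (entry[1] << 8));
    p.segment = (uint16_t)(entry[2] | (entry[3] << 8));
    return p;
}
```

For example, if 0000:0040 held the bytes A9 08 10 02, ivt_lookup(mem, 0x10) would return 0210:08A9.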

 

We need to call this interrupt and examine the output. But how? I really don't feel like writing an entire program, most of which would be spent converting binary numbers into human-readable text. This is where debug comes in handy. It lets you write a program and run it on the spot, stepping through each instruction and examining the contents of the registers at each step if you like. That's perfect! We only need two or three lines of code instead of who knows how many.
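To give you an idea of the busywork debug saves us, here's roughly what printing one 16-bit register as four hex digits looks like in C. This is only a sketch: format_hex4 is a name I made up, and a real program would still need code to get the result onto the screen.

```c
#include <stdint.h>

/* Writes 'value' as exactly four uppercase hex digits plus a
   terminating NUL into 'out', which must have room for 5 chars. */
void format_hex4(uint16_t value, char out[5])
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 3; i >= 0; i--) {
        out[i] = digits[value & 0xF];   /* lowest nibble goes rightmost */
        value >>= 4;
    }
    out[4] = '\0';
}
```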

Remember that debug is a dumb command-line utility. We need to tell it we want to start assembling a program by typing A 100, which means "assemble beginning at address 100H." (Debug treats every number as hexadecimal, so you never type the H suffix.) COM programs, by the way, always begin at offset 100H of their segment. The first 256 bytes hold the Program Segment Prefix (PSP), which contains the command line and some other information stuck in by the loader.
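To make the PSP a little more concrete: the command-line tail lives at offset 80H. One byte there gives its length, and the text itself follows at 81H, normally ended by a carriage return. Here's a minimal C sketch that pulls the tail out of a copy of the 256-byte PSP; read_command_tail is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies the command-line tail out of a 256-byte PSP image.
   Returns the number of characters copied (not counting the NUL). */
size_t read_command_tail(const uint8_t psp[256], char *out, size_t out_size)
{
    size_t len = psp[0x80];          /* length byte at offset 80H */
    if (len > 0x7E)                  /* text must fit in 81H..FEH */
        len = 0x7E;
    if (out_size == 0)
        return 0;
    if (len > out_size - 1)          /* leave room for the NUL */
        len = out_size - 1;
    for (size_t i = 0; i < len; i++)
        out[i] = (char)psp[0x81 + i];
    out[len] = '\0';
    return len;
}
```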

So now that we're ready to begin writing our program (more like a stub, since it's not a complete one), we need to do three things: 1) set AH to the number of the function we want to execute, 35H; 2) set AL to the interrupt whose address we want, 10H; 3) call interrupt 21H. A single MOV AX,3510 takes care of the first two steps at once, because AH is the high byte of AX and AL is the low byte. That gives us two instructions, plus a breakpoint to keep debug from executing anything we don't want it to:

 

mov   ax,3510
int   21
int   3
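If loading AH and AL with one MOV seems like a trick, it's just byte packing. The sketch below is only for understanding; you don't type it into debug. It shows in C that AH=35H and AL=10H combine into AX=3510H. The helper names make_ax, ah_of and al_of are hypothetical.

```c
#include <stdint.h>

/* AX is a 16-bit register whose high byte is AH and low byte is AL. */
uint16_t make_ax(uint8_t ah, uint8_t al)
{
    return (uint16_t)((ah << 8) | al);
}

uint8_t ah_of(uint16_t ax) { return (uint8_t)(ax >> 8); }    /* high byte */
uint8_t al_of(uint16_t ax) { return (uint8_t)(ax & 0xFF); }  /* low byte */
```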

 

When you've typed the last instruction, press Enter on an empty line to let debug know you're done. It should return to the prompt, a single dash. Now we need to execute our little stub program. At the prompt, type G =100 and press Enter. Here's what my debug spat out:

 

AX=3510  BX=08A9  CX=0003  DX=0000  SP=FFEC  BP=0000  SI=0090  DI=0000

DS=0000  ES=0210  SS=13EC  CS=13EC  IP=0105   NV UP EI PL NZ NA PE NC

13EC:0105 CC            INT     3

 

The first two lines are a printout of the state of the registers and the CPU flags. The third line shows the next instruction that would be executed. What do you know...it's our breakpoint. If we hadn't put that there, debug would've kept right on going into undefined memory, and who knows what it could be executing, which is kinda dangerous.
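The cryptic pairs at the end of the second line are the flags, one two-letter code per flag: overflow (NV/OV), direction (UP/DN), interrupt (EI/DI), sign (PL/NG), zero (NZ/ZR), auxiliary carry (NA/AC), parity (PO/PE) and carry (NC/CY). The C sketch below decodes a FLAGS register value the same way, using the standard x86 bit positions. The name debug_flags is my own.

```c
#include <stdint.h>
#include <string.h>

/* Writes debug-style flag mnemonics for a FLAGS value into 'out',
   which needs room for at least 24 chars (8 pairs, 7 spaces, NUL). */
void debug_flags(uint16_t flags, char *out)
{
    static const struct { int bit; const char *set, *clear; } f[] = {
        {11, "OV", "NV"}, {10, "DN", "UP"}, {9, "EI", "DI"}, {7, "NG", "PL"},
        {6, "ZR", "NZ"},  {4, "AC", "NA"},  {2, "PE", "PO"}, {0, "CY", "NC"},
    };
    out[0] = '\0';
    for (size_t i = 0; i < sizeof f / sizeof f[0]; i++) {
        if (i > 0)
            strcat(out, " ");
        strcat(out, ((flags >> f[i].bit) & 1) ? f[i].set : f[i].clear);
    }
}
```

With only the interrupt and parity flags set (FLAGS = 0204H), this produces the same NV UP EI PL NZ NA PE NC that debug printed above.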

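But I digress. If you remember the specification for INT 21H, subfunction 35H, the address we're interested in is in ES:BX. Well, what does the output say? ES=0210, BX=08A9, so the entry point of interrupt 10H is 0210:08A9, at least on my computer. (Strictly speaking, segment 0210H is in ordinary RAM. The ROM BIOS lives in upper memory, typically at segment C000H for the video BIOS and F000H for the system BIOS, so this is most likely a handler installed by DOS or a driver that sits in front of the BIOS code. It's still the code that runs when a program calls INT 10H, though.) Let's go see what's lurking there.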

 

DISASSEMBLING WITH DEBUG

Now that we know where INT 10H starts, we can directly disassemble it. Type U 0210:08A9 (or whatever address your computer returned) and press enter. You should get the following (I slightly modified it to make it more easily readable):

 

0210:08A9   CMP     BYTE PTR CS:[08A7],02

0210:08AF   JNZ     08B6

0210:08B1   CALL    0806

0210:08B4   JB      0915

0210:08B6   CMP     BYTE PTR CS:[08A7],01

0210:08BC   JZ      091B

0210:08BE   CMP     AH,00

0210:08C1   JZ      08F3

0210:08C3   CMP     AH,1C

0210:08C6   JA      08D3

0210:08C8   CMP     AH,04

 

I know what you're saying: Whoop-dee-doo. I have no idea what this does. Neither did I, at first.

First of all, this is only the first 32 bytes or so of the interrupt, because that's how much debug's U command disassembles when you don't give it a length. If you type U again and press Enter, it picks up where it left off and disassembles the next chunk. (You can also ask for more at once by adding a length, as in U 0210:08A9 L80.) Let's take an in-depth look at the code we have now:

 

0210:08A9 CMP     BYTE PTR CS:[08A7],02

0210:08AF JNZ     08B6

The CMP instruction compares the byte at CS:08A7 with the value 2. JNZ is the same as JNE, which means jump if not equal. So if that byte is anything other than 2, execution skips ahead to offset 08B6. In other words, the instructions from 08B1 up to 08B6 run only when the byte at 0210:08A7 equals 2. We can easily rewrite this as an if statement in C/C++ code:

 

if(*((BYTE *)(0x00029A7)) == 2)
{
      //execute statements at 0210:08B1 through 0210:08B5
}
//continue executing at 0210:08B6

 

Notice that we have to hard-code the address, convert it to a pointer, then dereference it to get what we want. Let's try to make this a little more readable:

 

const BYTE *pbValue = (const BYTE *)0x00029A7;

if(*pbValue == 2)
{
      //execute statements at 0210:08B1 through 0210:08B5
}
//continue executing at 0210:08B6

 

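Isn't that easier to read? But wait...where'd the 0x00029A7 come from? Why didn't I just write 0210:08A7? Answer: standard C/C++ has no notation for segment:offset addresses. (Old 16-bit DOS compilers added nonstandard far pointers for this, such as Turbo C's MK_FP macro, but we'll stick to the standard language.) A 32-bit program uses linear addresses instead, meaning it ignores segments entirely and just uses 32-bit offsets. Why? Let me explain: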

If you recall from a previous issue of Programs Under the Hood, I mentioned that because of the way the 8086's segmentation works, you could only address 1MB of memory. Intel processors today still have that limit when they run in real mode, in order to stay backwards compatible with 16-bit applications. To get around this severe limitation, 32-bit programs run in protected mode. There, the operating system sets up every segment with a base address of 0 (a "flat" memory model), and programs use 32-bit offsets, allowing up to 4GB of memory to be addressed. (If you want to use more memory, you need a 64-bit processor, which raises the theoretical limit to 2^64 bytes, about 16.8 million terabytes!)

So how does one convert a 16-bit segment-offset address into a 32-bit linear address and vice-versa? Luckily, there's a simple formula for this:

 

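LinAddr = segment*16 + offset

segment = LinAddr / 16     (the same as LinAddr >> 4)
offset  = LinAddr % 16     (the same as LinAddr AND 0FH)

The second pair gives the "normalized" segment:offset form, whose offset is always between 0 and F.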

 

Let's try converting the address we encountered in our program. Our segment is 0210H, our offset is 08A7H, so:

 

LinAddr = (segment)*16 + (offset)

LinAddr = (0210H)*10H + (08A7H)

LinAddr = 02100H + 08A7H

LinAddr = 00029A7H

 

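We will be doing this with every address we encounter from now on. (I must warn you, though, that most linear addresses have up to 4096 equivalent segment:offset pairs, so converting back won't necessarily give you the address you started with. Converting 00029A7H back with segment = LinAddr / 16 and offset = LinAddr % 16, for example, gives 029A:0007, not 0210:08A7, even though both point at the same byte.)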
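Here's the same pair of conversions as a C sketch; the function and type names are mine. It does plain real-mode arithmetic and doesn't try to handle addresses at or above 1MB.

```c
#include <stdint.h>

struct seg_off {
    uint16_t segment;
    uint16_t offset;
};

/* segment:offset -> linear address (segment * 16 + offset). */
uint32_t to_linear(uint16_t segment, uint16_t offset)
{
    return (uint32_t)segment * 16u + offset;
}

/* linear address -> normalized segment:offset (offset 0..F).
   Only meaningful for linear addresses below 100000H (1MB). */
struct seg_off to_seg_off(uint32_t linear)
{
    struct seg_off a;
    a.segment = (uint16_t)(linear >> 4);
    a.offset  = (uint16_t)(linear & 0xF);
    return a;
}
```

to_linear(0x0210, 0x08A7) gives 0x29A7, and to_seg_off(0x29A7) gives back 029A:0007, the normalized form of the same address.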

Going back to the disassembly, we encounter the following instruction:

 

0210:08B1   CALL    0806

0210:08B4   JB      0915

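Okay...we know what this does. It calls a function located at 0210:0806, right? So how do we figure out what that function does? It'd take too long to disassemble the whole thing here, so I'll just say that it copies a buffer in DS:SI into the screen text buffer at B000:0000 if AX is a certain value. Now for the jump statement: jump if below to offset 0915. After a CMP, "below" means an unsigned less-than, but JB really just tests the carry flag (it's the same instruction as JC). Right after a CALL, that most likely means the function at 0806 uses the carry flag to report something. That's a common BIOS and DOS convention; many INT 21H functions, for example, set the carry flag on error. When the flag is set, the handler skips ahead to 0915.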

0210:08B6   CMP     BYTE PTR CS:[08A7],01

0210:08BC   JZ      091B

Again, we reference the mysterious byte at 0210:08A7. This time we check to see if it's equal to 1. If it is, we jump to offset 091B.
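Putting the pieces so far together, here's a hedged C sketch of the handler from 08A9 through 08BC. A modern C program can't read CS:08A7 or call 0806, so the sketch takes the flag byte as a parameter. It also takes a boolean standing in for the carry flag that JB tests after CALL 0806. The enum and function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Where control goes next (offsets within segment 0210H). */
enum next_stop { GOTO_0915, GOTO_091B, FALL_INTO_08BE };

/* flag:       the byte at CS:08A7
   carry_0806: whether CALL 0806 returned with the carry flag set
               (only consulted when the call actually happens) */
enum next_stop int10_entry(uint8_t flag, bool carry_0806)
{
    if (flag == 2) {            /* CMP CS:[08A7],02 / JNZ 08B6 */
        /* CALL 0806 would run here */
        if (carry_0806)         /* JB 0915 */
            return GOTO_0915;
    }
    if (flag == 1)              /* CMP CS:[08A7],01 / JZ 091B */
        return GOTO_091B;
    return FALL_INTO_08BE;      /* on to the checks on AH */
}
```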

 

0210:08BE   CMP     AH,00

0210:08C1   JZ      08F3

0210:08C3   CMP     AH,1C

0210:08C6   JA      08D3

0210:08C8   CMP     AH,04

0210:08CB   JZ      08ED
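Here's a literal translation of those six instructions into C, as a sketch. The function name is hypothetical. Instead of actually jumping, it returns the offset that control would reach. JA is an unsigned comparison, which comparing a uint8_t gives us for free.

```c
#include <stdint.h>

/* Returns the offset (within segment 0210H) that control reaches
   for a given function number in AH. */
uint16_t dispatch_ah(uint8_t ah)
{
    if (ah == 0x00)          /* CMP AH,00 / JZ 08F3 */
        return 0x08F3;
    else if (ah > 0x1C)      /* CMP AH,1C / JA 08D3 (unsigned) */
        return 0x08D3;
    else if (ah == 0x04)     /* CMP AH,04 / JZ 08ED */
        return 0x08ED;
    else
        return 0x08CD;       /* fall through to the next instruction */
}
```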

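More mundane comparisons, as you can see. If you haven't noticed, we're comparing AH (the INT 10H function number) to different values, then jumping to different locations. Is this 1) a switch block, or 2) a series of if-else statements? Look closely at the second comparison. What is the jump instruction? JA, or jump if above. A C case label can't express a relative comparison like a > 5; only an if statement can, so the simplest reading is a series of if-else statements. Be careful with that argument, though. Compilers often emit JA or JB when they compile a switch, either to range-check the value before using a jump table or to binary-search the case values, and CMP AH,1C followed by JA looks a lot like such a range check. Since this handler was probably written by hand in assembly, I'll treat it as if-else statements, because they map one-to-one onto the jumps. Moving on: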

 

0210:08CD C4C4          LES     AX,SP

0210:08CF 42            INC     DX

0210:08D0 EB43          JMP     0915

 

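I'm sure you can figure out the INC and JMP, but what about this mysterious LES? Normally, LES loads a register and ES together from a 4-byte far pointer in memory. The offset word goes into the register and the segment word goes into ES. But LES only accepts a memory operand, and LES AX,SP names a register instead; on a 286 or later, that encoding raises an invalid-opcode exception. So C4C4 probably isn't meant to be an LES at all. My best guess is that this machine is running DOS under Windows, whose DOS emulator (NTVDM) uses the byte pair C4 C4 as a special escape into its own 32-bit code, with the following byte (the 42 that debug shows as INC DX) as a parameter. Either way, here's how LES and its relatives are normally used: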

 

;ES:DI loaded from DWORD at SS:[BP+04]

les   di,[bp+04]

;DS:SI loaded from DWORD at DS:[BX]
;(16-bit code can't address memory through DX; only BX, BP, SI and DI work)

lds   si,[bx]

;FS:EAX loaded from 48-bit pointer at DS:[2*EBX+ECX+6].

;To use FS or GS, as well as this more complex memory

;addressing scheme, you need at least a 32-bit processor.

lfs   eax,[2*ebx+ecx+6]
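If it helps, here's what the first example, LES DI,[BP+04], does when spelled out in C. This is a sketch under simplifying assumptions: memory is a flat byte array indexed by linear address, and struct regs is a made-up stand-in for a few of the CPU's registers.

```c
#include <stdint.h>

/* A made-up subset of the 8086 register file. */
struct regs {
    uint16_t di, bp, es, ss;
};

/* Emulates LES DI,[BP+disp]. BP-based addresses default to SS. */
void les_di_bp(struct regs *r, const uint8_t *mem, uint16_t disp)
{
    uint16_t off = (uint16_t)(r->bp + disp);       /* offset wraps at 64KB */
    uint32_t lin = (uint32_t)r->ss * 16u + off;    /* linear SS:[BP+disp] */
    r->di = (uint16_t)(mem[lin] | (mem[lin + 1] << 8));       /* offset word */
    r->es = (uint16_t)(mem[lin + 2] | (mem[lin + 3] << 8));   /* segment word */
}
```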

 

The rest of the function goes on for quite a while, so this is a good place to stop. Before I finish, I'll leave you with some tips for disassembling programs by hand:

 

TIPS AND TRICKS FOR DISASSEMBLING STUFF

  1. Keep track of memory references. If a program reads from or writes to a memory address, write that down and see what's there. Sometimes you'll be surprised and find an actual string; most times it'll be some binary number.
  2. Write down all addresses of function calls as you come across them.
  3. Write down all addresses of jumps as you come across them, and note whether each one is conditional or unconditional. Follow all unconditional forward jumps. Often they jump over a data section, which will give you bogus instructions if you try to disassemble it.
  4. Draw yourself a model of the stack and keep track of where everything is. Sometimes you can figure out the purpose of some of the variables just by seeing how they're used.
  5. Go back and disassemble all functions as if each of them were its own program (i.e., go back to step 1 and repeat everything). If you figure out what a function does, give it a name and write that down along with its address, the arguments that may be passed to it, and its return values.
  6. Go back and disassemble from the beginning of each address jumped to. For example, if you find an instruction that says JZ 0389, you would disassemble beginning at 0389.
  7. WATCH YOUR ADDRESSES. Being off by one byte changes everything. For example, observe the following:

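Actual code (bytes 0F 84 BF F9 26 23 05):

jz near  -0641H
and      ax,es:[di]

Disassembled code if you start one byte late (at the 84):

test     [bx+26F9],bh
and      ax,ds:[di]

The 26 byte, which was the ES: segment override, has been swallowed into TEST's displacement, so the AND now reads from DS instead of ES.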

 

See the difference? With some other instructions, this could be really risky. You could overwrite your own variables, call interrupt functions you never meant to call, and basically put your computer at risk. Moral of the story: Watch your addresses.
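Tips 3 and 6 together amount to what's usually called recursive-traversal disassembly. You start at the entry point, follow fall-throughs and jump targets, and never decode bytes that nothing reaches. Here's a rough C sketch of the bookkeeping. To keep it short, it assumes the instructions have already been decoded into a table; struct insn, the kind values, and trace_from are all made up for this example. It also ignores CALLs and computed jumps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* How control leaves an instruction (hypothetical, simplified). */
enum kind { SEQ, JCC, JMP, RET };

struct insn {
    uint16_t  addr;    /* offset of the instruction */
    uint8_t   len;     /* its length in bytes */
    enum kind kind;
    uint16_t  target;  /* jump target, used by JCC and JMP only */
};

#define MAX_WORK 256   /* enough for tables of up to 127 entries */

static const struct insn *find_insn(const struct insn *tab, size_t n,
                                    uint16_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].addr == addr)
            return &tab[i];
    return NULL;  /* nothing decoded at this address */
}

/* Marks reached[i] for every table entry reachable from 'entry' and
   returns how many there are. Entries never reached are likely data. */
size_t trace_from(const struct insn *tab, size_t n, uint16_t entry,
                  bool *reached)
{
    uint16_t work[MAX_WORK];
    size_t top = 0, count = 0;

    for (size_t i = 0; i < n; i++)
        reached[i] = false;
    work[top++] = entry;

    while (top > 0) {
        const struct insn *in = find_insn(tab, n, work[--top]);
        if (in == NULL || reached[in - tab])
            continue;                       /* unknown or already seen */
        reached[in - tab] = true;
        count++;
        /* Tip 6: disassemble from every jump target. */
        if ((in->kind == JCC || in->kind == JMP) && top < MAX_WORK)
            work[top++] = in->target;
        /* Tip 3: only ordinary instructions and conditional jumps fall
           through; after a JMP or RET the next bytes may be data. */
        if ((in->kind == SEQ || in->kind == JCC) && top < MAX_WORK)
            work[top++] = (uint16_t)(in->addr + in->len);
    }
    return count;
}
```

Consider a toy table where a JMP at 0102 skips four bytes of data at 0104. trace_from reaches the four real instructions and leaves the bogus 0104 entry unmarked, which is exactly the mistake a straight top-to-bottom disassembly would make.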

 

That's all for now. Next time, I'll show you some functions we need to write to get this disassembler project off the ground. Oh, and by the way, because of time constraints, I'm writing it in C/C++.

