Programs Under the Hood Part 6: Getting Functional

Posted by: dargueta in Untagged  on

   I realize that I’ve been throwing a lot at you all at once. I now promise to go slower, easier, and put in more diagrams. Today we’re going to learn about the stack, stack frames and functions.

USING A STACK
Just about every program needs a stack, or a section of memory that is used sort of like temporary storage. It’s not like allocating memory, because the stack is already part of the program. The function of the stack is to:
  • Hold arguments passed to a function, as well as the return address
  • Hold all the local variables of a function. Once the function exits, these are gone. Poof.
  • Temporarily save data needed later in the program
So how does the stack work? If you don’t know about memory segmentation or CPU registers, please see my previous blogs. If you do remember, keep reading.
The stack is essential when using functions. But how are functions called? Here’s a generic outline:
  • Arguments (if any) are pushed onto the stack. If the calling convention says so, some of them may be placed in registers instead for faster access (see __fastcall on MSDN). Normally, however, the stack is used (see __stdcall on MSDN).
  • The CALL instruction is executed. This does two things in one instruction:
    • The return address is pushed onto the stack. This is the address that the program will resume execution at when the function exits. Unlike function arguments, which may go in registers, this always goes on the stack.
    • The processor jumps to the address indicated and begins execution.
  • Somewhere along the line the processor hits the RET (return from function) instruction. It pops the return address off the stack into the instruction pointer, and voilà! Execution resumes back where we started. It’s interesting to note that badly behaved functions can change the return address by directly modifying the stack either intentionally or accidentally, causing execution to resume at the wrong place. We’ll examine this in greater detail a bit later.
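To make the CALL/RET dance concrete, here is a minimal sketch in C of a toy processor that does nothing but call and return. Everything in it (ToyCpu, cpu_call, cpu_ret, the 16-slot stack) is made up for illustration, and the instruction length handed to cpu_call is an assumption; a real CPU works out the return address on its own.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16   /* hypothetical stack size, in 32-bit slots */

typedef struct {
    uint32_t stack[STACK_SLOTS];
    int      sp;   /* index of the top slot; STACK_SLOTS means "empty" */
    uint32_t ip;   /* instruction pointer */
} ToyCpu;

static void cpu_init(ToyCpu *c, uint32_t start_ip) {
    c->sp = STACK_SLOTS;
    c->ip = start_ip;
}

static void cpu_push(ToyCpu *c, uint32_t value) {
    assert(c->sp > 0);             /* the toy's own overflow check */
    c->stack[--c->sp] = value;
}

static uint32_t cpu_pop(ToyCpu *c) {
    assert(c->sp < STACK_SLOTS);   /* the toy's own underflow check */
    return c->stack[c->sp++];
}

/* CALL: push the address of the next instruction, then jump to the target. */
static void cpu_call(ToyCpu *c, uint32_t target, uint32_t insn_len) {
    cpu_push(c, c->ip + insn_len);
    c->ip = target;
}

/* RET: pop whatever is on top of the stack into the instruction pointer. */
static void cpu_ret(ToyCpu *c) {
    c->ip = cpu_pop(c);
}
```

With ip at 0x1000, cpu_call(&c, 0x0008F398, 5) leaves 0x1005 on top of the stack and cpu_ret puts it back into ip. Overwrite that slot in between and cpu_ret jumps to whatever you wrote there, which is exactly the badly behaved function described above.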
 A typical C program will at some point call some function. For example, this line of code 

printf("ERROR %08X: %s\n", dwErrorNumber, szString);

might get compiled into something like this:
(All addresses and values used in this example are completely arbitrary.)

;push address of szString
push        1A2C485F
;push the contents of dwErrorNumber onto the stack. (Here we assume that
;dwErrorNumber is located at address 0D30F914.) Notice that you have to
;declare how large the memory operand is.
push        DWORD PTR [0D30F914]
;push the address of the string constant. Remember that even string
;constants that don’t have variable names MUST be in memory somewhere.
push        001475FE
;call the printf function
call        0008F398
;when printf returns, execution resumes here.

I want you to notice two things:

  • What’s with the DWORD PTR thingy? Whenever you have a memory operand whose size can’t be figured out from context, you need to declare explicitly how big it is. They take the form of: BYTE PTR, WORD PTR, DWORD PTR…etc. Examples:
;don’t need declarator because EAX is 32 bits by definition, and operand sizes must match
mov         eax,[48101E90]
;this constant can be any size, so we need to know how big the memory operand is.
add         DWORD PTR [01253014], 57h
;don’t need declarator here either, registers are both 16 bits by definition
sub         ax,cx

 

  • The arguments are pushed in reverse order. That way, the first argument is at the top of the stack, and the last argument is at the bottom.  I’ll show you what I mean:
Imagine the program stack as a stack of dishes. When you add a plate to the stack, we say you’ve pushed the plate onto the stack. When you take a plate off, we say you’ve popped a plate off the stack. Since you can’t take a plate out from the bottom, the first plate you put in the stack is the last plate you’ll take off. Programmers call this a LIFO (Last In, First Out) data structure.
Before I show you the pretty little diagrams I made, I should mention one thing about Intel processors: you can’t push anything smaller than a word (16 bits) onto the stack. Why? Because with that restriction, every word on the stack begins at an even address (as long as SP starts out even). The early Intel processors could only fetch a word in a single memory access if it sat at an even address; reading a word from an odd address meant reading two consecutive bytes separately and then slapping them together, a time-consuming process. You can push dwords onto the stack (80386 and later), and qwords when running in 64-bit mode.

(I’ve divided the stack into words so that it’s easier to read. Dwords just take up two spaces on the diagrams instead of one. )

[Figure: an empty stack.]
[Figure: the stack with 1A2C485F pushed onto it.]
[Figure: the stack with all the arguments pushed.]
[Figure: the stack after the CALL instruction has executed.]

 By taking a quick look at the diagrams, you realize three things: SP keeps moving, BP doesn’t, and the stack is growing downwards. What’s with that?
  • Why does the stack begin at a high address and grow downward? Well, in the days when all programs were COM programs (i.e. 64 KB in size at most), they had to contain the stack, data, and code all in one segment. Data and code are fine because they don’t change size…but the stack does. Our buddies at Intel decided that programmers could take advantage of the extra space at the end of the segment and use it for their stack. Starting at a low address and growing upwards towards FFFFh would restrict the size of the stack if the starting address selected was too high. In addition, if the program ever changed size, the stack would have to be moved. The solution: start the stack at the end of the segment and have it grow downwards towards the code. If the stack grows too far, it can overwrite some code or data, and in a COM program nothing stops it, but that’s okay. That isn’t a security hole at all.
  • SP (Stack Pointer) points to the top of the stack. When you push a value, the processor decrements SP/ESP and places the desired value on the stack.
  • BP (Base Pointer) marks the base of the current function’s stack frame (more on frames below). Pushing and popping never change it, which is why it stays put in the diagrams. The processor does not use BP to police the stack, though. Running out of stack space is a stack overflow: in a 16-bit real-mode program SP simply keeps going and the stack tramples whatever lies below it, while in protected mode the processor or operating system raises an exception when a push lands outside the memory set aside for the stack. To go with our plate analogy, this is akin to trying to put a plate on top of a stack that has already hit your ceiling. The opposite mistake, popping more than was pushed, is a stack underflow. This is the same as trying to take a plate off your stack of plates and finding that there are none left.
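Here is a minimal sketch in C of a stack that grows downward, just to show the order of operations: a push decrements SP first and then stores, a pop loads first and then increments. The names (ToyStack, toy_push16, toy_pop16) and the 64-byte size are invented for this example, and the overflow and underflow checks are the toy's own; they are not something the CPU does for you this way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64   /* hypothetical size of our toy stack segment */

typedef struct {
    uint8_t  mem[STACK_BYTES];   /* the "stack segment" */
    uint16_t sp;                 /* offset of the top of the stack */
} ToyStack;

/* An empty stack: SP sits just past the highest address. */
static void toy_init(ToyStack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->sp = STACK_BYTES;
}

/* PUSH: decrement SP first, then store the word at the new top. */
static void toy_push16(ToyStack *s, uint16_t value) {
    assert(s->sp >= 2);                  /* toy overflow check */
    s->sp -= 2;
    memcpy(&s->mem[s->sp], &value, 2);
}

/* POP: read the word at the top, then increment SP. */
static uint16_t toy_pop16(ToyStack *s) {
    uint16_t value;
    assert(s->sp <= STACK_BYTES - 2);    /* toy underflow check */
    memcpy(&value, &s->mem[s->sp], 2);
    s->sp += 2;
    return value;
}
```

Starting from an empty stack (sp at 64), two pushes move sp to 62 and then 60, and the two pops hand the values back in the opposite order, last in, first out.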

USING THE STACK IN ASSEMBLY LANGUAGE
You really need to know only two instructions: PUSH and POP. Each takes one operand.
  • Use PUSH with a register, constant, or memory location whose contents you want to push onto the stack. The contents of the operand you specify are unchanged after the push. (The only exception is when pushing SP, which is pointless because it always changes on a push or pop.)
  • For POP, the single operand is the register or memory location you want to dump the top record of the stack into. Anything previously in the register or memory location is overwritten.
 There are a few other stack instructions that you don’t really need to know:
  • PUSHA, PUSHAD – These instructions push all eight general-purpose registers onto the stack, in 16- and 32-bit versions respectively. SP is included: the value pushed is the one it had before the instruction started. There is no 64-bit version; PUSHA and PUSHAD are not available in 64-bit mode.
  • POPA, POPAD – Pop all eight general-purpose registers off of the stack, in 16- and 32-bit versions respectively. The only register not restored is SP; its saved value is discarded.
  • PUSHF, PUSHFD – Push the processor’s flags register onto the stack in 16- or 32-bit versions. In 32-bit code use PUSHFD; PUSHF saves only the lower 16 bits of the flags. The processor flags are used a lot in if-then-else constructs. Some flags can only be modified directly by using the PUSHF/POPF and PUSHFD/POPFD instructions, but doing so may give unexpected results. Be careful when messing with these, and NEVER MIX THE TWO!
  • POPF, POPFD – Pop a word or dword from the stack into the processor’s flags register. The size must match the one used for the push, i.e. pair PUSHF with POPF and PUSHFD with POPFD.
 
GETTING FRAMED – STACK FRAMES AND ACCESSING VARIABLES
How does a function access its arguments? Remember that the stack is not special memory. You can read, write, and (though it’s a bad idea) execute from it, just like any other memory location. We prefer to use the instructions PUSH and POP to modify the stack, though. We could pop the arguments off and on the stack as we need them, but then the order would change and keeping track of what is where would be difficult. Besides, to get to argument n, you’d have to pop off the first n – 1 arguments, use n, then push it back along with the other arguments you didn’t need. Not only is that slow, it’s cumbersome.
The answer: use a pointer and just add a signed offset to access each argument. All you need to know is 1) the size of each argument and 2) the order in which the arguments were pushed onto the stack. With this knowledge, you can figure out where each argument is in relation to BP, and then add or subtract a value from BP to get to the desired variable, as shown in the diagram below.

In order to do this, a few things must happen. Take the following function in C/C++:
 

DWORD
MakeWindowsCrash(DWORD dwCrashCode, LPCSTR pszErrorMessage)
{
      WORD        dwSecondsToHang = 0xFFFF;
      CHAR        pchKeyboardBuffer[32];
      // code to make windows crash...
      return      0x0000DEAD;
}

This function – which probably does exist somewhere – takes two arguments, a DWORD (four bytes) and a pointer to a string, also four bytes in 32-bit code. The combined size of the local variables is 1 WORD (two bytes) plus a 32-element array of CHARs (one byte each), for a total of 34 bytes. Let’s go step by step through the call procedure:
1. We push the arguments onto the stack in reverse order. That means we push pszErrorMessage, then dwCrashCode.
2. Execute the CALL instruction.
   a. The processor pushes the address of the next instruction onto the stack automatically. This is the return address, where execution will resume after MakeWindowsCrash returns.
   b. The processor jumps to the address we provided in the CALL instruction and begins execution of the function.
3. Now we’re inside the code of MakeWindowsCrash. To access the arguments and local variables, we’ll need to create a stack frame, a simple method of using offsets from a base pointer to access variables in memory.
4. Execute function code, do whatever, lah-dee-dah.
5. Remove the stack frame.
6. Return to the calling function.

The general code for creating a stack frame is as follows:
;16-bit version                     ;32-bit version
push        bp                      push        ebp
mov         bp,sp                   mov         ebp,esp
sub         sp,XXX                  sub         esp,XXX

XXX is a number representing the total size in bytes of the local variables. So XXX would be 34 (decimal)  for MakeWindowsCrash. I probably need to answer three questions now:
  1. Why push BP? Well, we need to eventually restore it to what it was before our function was called, otherwise the variables will be at different offsets than expected, and the program will crash, hang, etc.
  2. Why set BP equal to SP? SP moves with every push and pop, so the distance from SP to a given argument keeps changing while the function runs. BP stays where we put it until we restore it, so every argument and local variable sits at a fixed offset from BP for the life of the function. Anchoring BP at the current top of the stack also makes those offsets easy to calculate; for a large program, counting from the beginning of a segment to offset 0xFE3D - or whatever - would become very annoying for assembly language programmers. (Remember, this instruction set was developed in the 70s, when assembly language programming by hand was far more common than it is today.)
There’s also another, not-so-obvious reason. If you take a look at the following example diagram, notice that the offsets are only two digits long, which means that the offset encoded in a read/write instruction only needs to be a single byte long. If you count from address 0, it’s four digits long, requiring two bytes to access the same variables. For functions that frequently access their variables, this cuts down on program size quite a bit.



  3. Why do we subtract from SP if we don’t push anything else onto the stack? If, for example, we were to call a function, ShowBlueScreen, from MakeWindowsCrash without decrementing SP, our local variables, i.e. dwSecondsToHang and pchKeyboardBuffer, would get at least partially overwritten, because the arguments for ShowBlueScreen would get pushed into the same memory locations. By subtracting from SP, we trick the processor into thinking that more stuff has been pushed onto the stack than there really has been, and we can push our arguments to ShowBlueScreen onto the stack with no trouble at all.
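The point under question 2 about short offsets can be boiled down to a one-line rule. This is a small sketch in C for 16-bit [bp+disp] operands only; bp_disp_bytes is a made-up helper name, not anything an assembler exposes.

```c
#include <assert.h>
#include <stdint.h>

/* Number of displacement bytes a 16-bit [bp+disp] operand needs:
   1 if disp fits in a signed byte, otherwise 2. */
static int bp_disp_bytes(int32_t disp) {
    assert(disp >= -32768 && disp <= 32767);   /* 16-bit addressing only */
    return (disp >= -128 && disp <= 127) ? 1 : 2;
}
```

An offset like -0Ah or +04 costs a single byte in every instruction that uses it, while anything beyond -128..127 costs two.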
 A few examples of accessing variables and arguments using the stack:

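;32-bit frame: [ebp] holds the saved EBP and [ebp+04h] the return address
;load LocalVar#1 into eax (locals sit below EBP)
mov         eax,[ebp-0Ah]
;load first argument into edx
mov         edx,[ebp+08h]
;load second argument into esi
mov         esi,[ebp+0Ch]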
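If you want to check the offsets for yourself, here is a rough C sketch that replays the call sequence on a toy stack and then reads values back relative to EBP. ToyCpu, push32, load32 and enter_frame are all hypothetical names, the 256-byte stack is arbitrary, and the layout assumes 32-bit code with a four-byte return address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256   /* hypothetical toy stack size */

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;   /* offset of the top of the stack */
    uint32_t ebp;   /* base of the current frame */
} ToyCpu;

static void push32(ToyCpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &v, 4);
}

/* Read a dword at EBP plus a signed displacement, like mov reg,[ebp+disp]. */
static uint32_t load32(const ToyCpu *c, int32_t disp) {
    uint32_t v;
    memcpy(&v, &c->mem[(int32_t)c->ebp + disp], 4);
    return v;
}

/* What the caller and the function prologue do for a two-argument call. */
static void enter_frame(ToyCpu *c, uint32_t arg1, uint32_t arg2,
                        uint32_t ret_addr, uint32_t locals) {
    push32(c, arg2);        /* arguments go on in reverse order */
    push32(c, arg1);
    push32(c, ret_addr);    /* what CALL would push */
    push32(c, c->ebp);      /* push ebp */
    c->ebp = c->esp;        /* mov ebp,esp */
    assert(c->esp >= locals);
    c->esp -= locals;       /* sub esp,XXX */
}
```

After enter_frame, load32(&c, 0) gives back the saved EBP, load32(&c, 4) the return address, load32(&c, 8) the first argument and load32(&c, 12) the second. The locals live in the bytes between ESP and EBP.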


Couldn’t we just use SP as the index register, and use BP for something else? In 16-bit code, no, because the designers at Intel decided not to implement that; the only registers you can use as pointers are SI, DI, BP, and BX. In 32-bit code any of the general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, and even ESP) can serve as a pointer, but since ESP moves with every push and pop, a fixed EBP is much easier to work with by hand.

GETTING RID OF A STACK FRAME
;16-bit version                     ;32-bit version
add         sp,XXX                  add         esp,XXX
pop         bp                      pop         ebp

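Basically we just need to reverse what we did when creating the stack frame. We don’t need to use MOV because setting BP equal to SP and then changing it immediately when popping off the stack is pointless. Note that this only works if SP is back where it was right after the SUB, i.e. everything the function pushed in the meantime has been popped again.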
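As a sanity check, this C sketch builds a frame and tears it down again on a toy stack, so you can see that ESP and EBP end up exactly where they started and the return address is back on top for RET. The names (ToyCpu, make_frame, remove_frame) and the 128-byte stack are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 128   /* hypothetical toy stack size */

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp, ebp;
} ToyCpu;

static void push32(ToyCpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &v, 4);
}

static uint32_t pop32(ToyCpu *c) {
    uint32_t v;
    assert(c->esp + 4 <= STACK_BYTES);
    memcpy(&v, &c->mem[c->esp], 4);
    c->esp += 4;
    return v;
}

/* push ebp / mov ebp,esp / sub esp,XXX */
static void make_frame(ToyCpu *c, uint32_t locals) {
    push32(c, c->ebp);
    c->ebp = c->esp;
    assert(c->esp >= locals);
    c->esp -= locals;
}

/* add esp,XXX / pop ebp */
static void remove_frame(ToyCpu *c, uint32_t locals) {
    c->esp += locals;
    c->ebp = pop32(c);
}
```

Push a pretend return address, call make_frame(&c, 34) and then remove_frame(&c, 34): ESP is back on the return address and EBP holds the caller's value again, so a RET at this point would go to the right place.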

Well, that’s all for now. I’m not entirely sure when the next blog will be (I’m headed off to college in two weeks), but hopefully it’ll be soon. Questions and comments are welcome.