Programs Under the Hood...Our Friend the CPU (Part 3)
Posted by: dargueta

Hi, everyone, it's me again with the latest installment of Programs Under the Hood. Today we're going to get to know the inner workings of the processor, and start learning a little assembly language.

INTRODUCING THE GENERIC INTEL CPU
Since the 8086 in 1978, the Intel processors have retained a lot of the same characteristics. Pretty much the only thing that has changed that we need to worry about, aside from speed, is the size of the instruction set. The newer the processor, the more instructions (generally) it recognizes. Some of them are useful, such as CPUID, ENTER and LEAVE. Others, like CMPXCHG16B, are rarely used. I, of course, will only teach you the useful stuff.

The generic Intel processor has little sections of really fast memory called registers. These are internal to the processor; you can think of them as scratchpads, each capable of holding a fixed number of bytes of data. Each register has a name so the assembly language programmer can access it. The 16-bit general registers are AX, BX, CX, and DX. You can use these for whatever you want.

Each of these registers is further subdivided into two 8-bit registers. To access the high byte of a 16-bit register, simply replace the X in the register name with an H. To access the low byte, replace the X with an L. Simple, right? So the low byte of the AX register is AL, and the high byte is AH.

[Figure: register division diagram]
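To see the subdivision in action, here is a minimal sketch. It uses MOV, which copies a value into a register and gets a proper introduction further down. The values are arbitrary, picked only for illustration, and the trailing h marks a number as hexadecimal.

```asm
mov   ax,1234h    ; AX = 1234h, so AH = 12h and AL = 34h
mov   al,0ffh     ; only the low byte changes: AX = 12FFh
mov   ah,00h      ; only the high byte changes: AX = 00FFh
```

Writing to AL or AH never disturbs the other half; they are two windows onto the same 16 bits.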

Starting with the 80386, each of the 16-bit registers was extended to 32 bits. Now you have EAX, EBX, ECX, and EDX. The low word of EAX is…EAL? No! It’s still AX. The low byte of EAX is the same as the low byte of AX, which is AL. (If you’re confused, look at the diagram.) What about the high word? Sadly, you can’t access it directly. You either need to write to the entire register, or use a mask and shift bits around. As a side note, the general registers are the only registers with separately named 8-bit halves.
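Here is one way the mask-and-shift approach might look. This is a sketch, not the only way to do it: SHR and SHL (shift right and shift left) have not been introduced in this series yet, AND and OR show up later, and the values and the choice of EBX and ECX as scratch registers are arbitrary.

```asm
; Reading the high word of EAX
mov   eax,12345678h   ; EAX = 12345678h, so AX = 5678h
mov   ebx,eax         ; work on a copy so EAX stays intact
shr   ebx,16          ; shift right 16 bits: EBX = 00001234h, BX = 1234h

; Replacing the high word of EAX without touching AX
and   eax,0000ffffh   ; mask off the high word: EAX = 00005678h
mov   ecx,0000abcdh   ; the new high word, still sitting in the low half
shl   ecx,16          ; shift left 16 bits: ECX = ABCD0000h
or    eax,ecx         ; combine the two halves: EAX = ABCD5678h
```

The copy into EBX is what keeps the original value around; shifting EAX itself would throw the low word away.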

The Intel processor has more registers that are not so general-purpose. There are two kinds: the segment registers and the pointer registers. The segment registers point to a segment in memory, just like their name suggests. They’re each 16 bits and they’re named CS, DS, ES, and SS. Beginning with the 80386, two more 16-bit segment registers were added, FS and GS. CS is the code segment register, and points to the segment the processor is currently executing code from.* DS points to a data segment, SS to the stack segment, and ES to an extra segment. FS and GS in the later processors are two more extra segment registers.

*There is also a hidden register called IP, which stands for instruction pointer. It points to the next instruction to be executed in the current code segment. You can’t access it directly. Together, CS and IP make a full pointer, CS:IP. Read on in the next issue to see what I mean by “pointer”.

I know what you’re thinking…in a COM program, the segment registers all have the same value, because the program takes up one segment. In an EXE program, they usually point to different segments.

But wait, there’s more. The pointer registers are used for pointing to memory locations. (What a surprise.) Like the general registers, they were originally 16 bits, but later extended to 32 bits. They are: SI (source index), DI (destination index), SP (stack pointer), and BP (base pointer), with 32-bit versions being ESI, EDI, etc. So what’re they for? SI and DI are typically used as pointers to buffers and such. SP and BP are both used for the stack; SP points to the top of the stack, and BP conventionally points to the base of the current stack frame, the slice of the stack that belongs to the routine currently running.

Wait a minute…doesn’t SS point to the stack? What do we need SP or BP for? Aha…now we’re getting into segmentation, or the division of memory into segments. That'll be left for the next blog.


TESTING THE WATERS
Let’s look at a few assembly language statements and pick them apart: 

mov   ah,09

What do you think that does? It means move the byte 09h into register AH. MOV is the mnemonic, or name of the instruction, AH is the destination operand, and the number 09h is the source operand. Despite its name, MOV actually copies data from one place to another: it leaves the source untouched. Different instructions have different numbers of operands; most have anywhere from none to two. Very few have three (one form of IMUL, for example, and the double-shift instructions SHLD and SHRD), and you probably won’t use them for a great while. With that knowledge in mind, try and guess what this does:

add   ax,bx 

If you guessed that it adds BX to AX, you’re partially right. Where does it put the result? In the destination operand, in this case AX. In C-style pseudocode, this is equivalent to  AX += BX. What if I wanted to add the contents of, say, ECX and the 32-bit integer at memory location 48C3h? Easy. You can use either of the following, depending on where you want to store the result:

add   ecx,[48c3h] ;store result of addition in ecx
add   [48c3h],ecx ;store result of addition at DS:[48c3h]
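A small worked example may help here. It is only a sketch with made-up values: it uses 16-bit registers instead of ECX to keep the numbers short, borrows the address 48C3h from above, and assumes DS points at a segment where that address is free to scribble on.

```asm
mov   ax,0005h        ; AX = 0005h
mov   bx,0003h        ; BX = 0003h
add   ax,bx           ; AX = 0008h; the source, BX, is still 0003h

mov   [48c3h],ax      ; the word at DS:48C3h is now 0008h
add   [48c3h],bx      ; memory is the destination: the word becomes 000Bh
add   bx,[48c3h]      ; a register is the destination: BX = 0003h + 000Bh = 000Eh
```

In every line, the first operand is the one that changes and the second is only read.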

Try figuring out this one: 

mov   ax,920fh
mov   bx,0dc3eh
MOV   [9342h],Ah  ;<--letter case doesn’t matter
add   al,[bp]     ;<--you can use only BX,BP,SI and DI for this
mov   [bx+si],al  ;<--you can also add two of those registers,
                  ;   but only one of BX/BP plus one of SI/DI.
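If you want to check your answer: as written, the outcome depends on whatever happens to be in BP and SI, because the listing never sets them. The variant below is a sketch that adds two lines (marked "added") to pin them down. The values chosen for BP and SI are arbitrary, and it assumes a COM program, where SS and DS hold the same value; that matters because [bp] is read through SS by default, not DS.

```asm
mov   ax,920fh     ; AX = 920Fh, so AH = 92h and AL = 0Fh
mov   bx,0dc3eh    ; BX = DC3Eh
mov   bp,9342h     ; (added) BP = 9342h, the address written two lines down
mov   si,0002h     ; (added) SI = 0002h
mov   [9342h],ah   ; the byte at DS:9342h is now 92h
add   al,[bp]      ; reads SS:9342h; with SS = DS that is 92h, so AL = 0Fh + 92h = A1h
mov   [bx+si],al   ; BX + SI = DC40h, so the byte at DS:DC40h is now A1h
```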

This does nothing useful, really, but you kind of get the idea of how this is going to work. Some more instructions we’re going to use are SUB, AND, OR, and XOR. These all take two operands, just like MOV and ADD.
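As a quick preview of those four, here is a sketch that runs each one on the same pair of bytes. The values are arbitrary, chosen so the bit patterns are easy to follow, and the results land in different registers only so that none of them overwrites another.

```asm
mov   al,0cch     ; AL = CCh (1100 1100 in binary)
mov   bl,0aah     ; BL = AAh (1010 1010 in binary)

mov   ah,al       ; AH = CCh
and   ah,bl       ; AH = 88h (1000 1000): bits set in both

mov   cl,al       ; CL = CCh
or    cl,bl       ; CL = EEh (1110 1110): bits set in either

mov   ch,al       ; CH = CCh
xor   ch,bl       ; CH = 66h (0110 0110): bits set in exactly one

sub   al,bl       ; AL = CCh - AAh = 22h
```

Like ADD, each of these stores its result in the destination (first) operand and leaves the source alone, which is why BL is still AAh at the end.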

Well, that’s it for today. I didn’t get to do everything that I wanted to do, but I’ll keep plugging along. Next time we’ll get to know debug.exe a little better, and I promise you then we’ll write our first somewhat useful program.

3 Comments
John
June 25, 2008

I think that will require at least one more read to understand. :) How far are you going to take us with these blogs?

dargueta
June 26, 2008

I'm going to go through the entire process of writing this program, and then I intend to start a new series called Creating an Operating System.

Jordan
June 26, 2008

So far your blogs are genius. I can't wait to read the next (and then the next series).
