Programs Under the Hood...Our Friend the CPU (Part 3)
Posted by: dargueta in Untagged on Jun 25, 2008
Hi, everyone, it's me again with the latest installment of Programs Under the Hood. Today we're going to get to know the inner workings of the processor, and start learning a little assembly language.
INTRODUCING THE GENERIC INTEL CPU
Since the 8086 in 1978, Intel processors have retained a lot of the same characteristics. Pretty much the only thing that has changed that we need to worry about, aside from speed, is the size of the instruction set. The newer the processor, the more instructions (generally) it recognizes. Some of them are useful, such as CPUID, ENTER and LEAVE. Others, like CMPXCHG16B, are rarely used. I, of course, will only teach you the useful stuff.
The generic Intel processor has little sections of really fast memory called registers. These are internal to the processor; you can think of them as scratchpads capable of holding a fixed number of bytes of data. Each of these registers has a name so the assembly language programmer can access it.
The 16-bit general registers are AX, BX, CX, and DX. You can use these for whatever you want. Each of these registers is further subdivided into two 8-bit registers. To access the high byte of a 16-bit register, simply replace the X in the register name with an H. To access the low byte, replace the X with an L. Simple, right? So the low byte of the AX register is AL, and the high byte is AH.
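If you'd like to see the AX/AH/AL relationship in action before touching real assembly, here is a minimal Python sketch. It only models the idea: the function names (`split_register`, `write_low_byte`, `write_high_byte`) are made up for illustration and are not part of any real tool, and a register is just a plain integer here.

```python
def split_register(ax):
    """Return (AH, AL): the high and low bytes of a 16-bit AX value."""
    ah = (ax >> 8) & 0xFF   # upper 8 bits
    al = ax & 0xFF          # lower 8 bits
    return ah, al


def write_low_byte(ax, value):
    """Model 'mov al,value': replace AL and leave AH untouched."""
    return (ax & 0xFF00) | (value & 0xFF)


def write_high_byte(ax, value):
    """Model 'mov ah,value': replace AH and leave AL untouched."""
    return (ax & 0x00FF) | ((value & 0xFF) << 8)


ax = 0x920F
ah, al = split_register(ax)
print(f"AX={ax:04X}h  AH={ah:02X}h  AL={al:02X}h")   # AX=920Fh  AH=92h  AL=0Fh

ax = write_high_byte(ax, 0x09)                       # like: mov ah,09
print(f"AX={ax:04X}h")                               # AX=090Fh
```

The point to take away is that AH and AL are not separate storage; writing to either one changes AX.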
Starting with the 80386, each of the 16-bit registers was extended to 32 bits. Now you have EAX, EBX, ECX, and EDX. The low word of EAX is…EAL? No! It’s still AX. The low byte of EAX is the same as the low byte of AX, which is AL. (If you’re confused, look at the diagram.) What about the high word? Sadly, you can’t access it directly. You either need to write to the entire register, or use a mask and shift bits around. As a side note, the general registers are the only ones with separately named high and low bytes.
The Intel processor has more registers that are not so general-purpose. There are two kinds: the segment registers and the pointer registers. The segment registers point to a segment in memory, just like their name suggests. They’re each 16 bits and they’re named CS, DS, ES, and SS. Beginning with the 80386, two more 16-bit segment registers were added, FS and GS. CS is the code segment register, and points to the segment the processor is currently executing.* DS points to a data segment, SS to the stack segment, and ES to an extra segment. FS and GS in the later processors are two more extra segment registers.
*There is also a hidden register called IP, which stands for instruction pointer. It points to the current instruction within the current code segment, and you can’t access it directly. Together, CS and IP make a full pointer, CS:IP. Read on in the next issue to see what I mean by “pointer”.
I know what you’re thinking…in a COM program, these registers all have the same value, because the program takes up one segment. In an EXE program, they usually point to different segments.
But wait, there’s more. The pointer registers are used for pointing to memory locations. (What a surprise.) Like the general registers, they were originally 16 bits, but later extended to 32 bits. They are: SI (source index), DI (destination index), SP (stack pointer), and BP (base pointer), with 32-bit versions being ESI, EDI, ESP, and EBP. So what’re they for? SI and DI are typically used as pointers to buffers and such. SP and BP are both used for the stack; SP points to the top of the stack, and BP usually marks a fixed base position within the stack that the current routine measures from.
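Going back to the high word of EAX for a moment: the "mask and shift" trick mentioned earlier can be sketched in Python like this. Treat it as a model of the arithmetic only; the helper names (`low_word`, `high_word`, `set_high_word`) and the sample values are hypothetical and chosen just for this example.

```python
def low_word(eax):
    """The low 16 bits of EAX, which is what the name AX refers to."""
    return eax & 0xFFFF


def high_word(eax):
    """The high 16 bits of EAX: shift them down, then mask."""
    return (eax >> 16) & 0xFFFF


def set_high_word(eax, value):
    """Replace the high word while keeping AX (the low word) intact."""
    return (eax & 0x0000FFFF) | ((value & 0xFFFF) << 16)


eax = 0x1234920F
print(f"AX        = {low_word(eax):04X}h")       # AX        = 920Fh
print(f"AL        = {eax & 0xFF:02X}h")          # AL        = 0Fh
print(f"high word = {high_word(eax):04X}h")      # high word = 1234h

eax = set_high_word(eax, 0xBEEF)
print(f"EAX       = {eax:08X}h")                 # EAX       = BEEF920Fh
```

On the real processor you would do the same thing with shift and AND instructions, since there is no register name for those upper 16 bits.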
TESTING THE WATERS
Let’s look at a few assembly language statements and pick them apart:
mov ah,09
What do you think that does? It means move the byte 09h into register AH. MOV is the mnemonic, or name of the instruction, AH is the destination operand, and the number 09h is the source operand. Despite the name, MOV actually copies data from one place to another: it leaves the source untouched. Different instructions have different numbers of operands; they can have anywhere from none to two. Very few take three (one form of IMUL, for example), and you probably won’t use those for a great while. With that knowledge in mind, try and guess what this does:
add ax,bx
If you guessed that it adds BX to AX, you’re partially right. Where does it put the result? In the destination operand, in this case AX. In C-style pseudocode, this is equivalent to AX += BX. What if I wanted to add the contents of, say, ECX and the 32-bit integer at memory location 48C3h? Easy. You can use either of the following, depending on where you want to store the result:
add ecx,[48c3h] ;store result of addition in ecx
add [48c3h],ecx ;store result of addition at DS:[48c3h]
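To make the "result goes in the destination operand" rule concrete, here is a small Python model of the three ADD forms above. It is a sketch under a few assumptions: registers are plain integers in a dict, memory is a dict that holds one whole 32-bit integer per address (real memory is byte-addressed), and the helper names are invented for this example. One extra fact the model shows: a result that doesn't fit in the destination wraps around to the destination's size.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def add16(dest, src):
    """Model 'add ax,bx': the 16-bit sum that ends up in the destination."""
    return (dest + src) & MASK16


def add_reg_mem(regs, mem, reg, addr):
    """Model 'add reg,[addr]': the register is the destination."""
    regs[reg] = (regs[reg] + mem[addr]) & MASK32


def add_mem_reg(regs, mem, addr, reg):
    """Model 'add [addr],reg': the memory location is the destination."""
    mem[addr] = (mem[addr] + regs[reg]) & MASK32


print(f"{add16(0x0002, 0x0003):04X}h")   # 0005h
print(f"{add16(0x920F, 0xDC3E):04X}h")   # 6E4Dh (the sum 16E4Dh wrapped to 16 bits)

regs, mem = {"ecx": 5}, {0x48C3: 7}
add_reg_mem(regs, mem, "ecx", 0x48C3)    # add ecx,[48c3h]
print(regs["ecx"], mem[0x48C3])          # 12 7

regs, mem = {"ecx": 5}, {0x48C3: 7}
add_mem_reg(regs, mem, 0x48C3, "ecx")    # add [48c3h],ecx
print(regs["ecx"], mem[0x48C3])          # 5 12
```

In both memory forms the source is left alone, just like with MOV; only the destination changes.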
Try figuring out this one:
mov ax,920fh
mov bx,0dc3eh
MOV [9342h],Ah ;<--letter case doesn’t matter
add al,[bp] ;<--you can use only BX,BP,SI and DI for this
mov [bx+si],al ;<--you can also add two of those registers together,
;   but only one of BX/BP plus one of SI/DI.
This does nothing useful, really, but you kind of get the idea of how this is going to work. Some more instructions we’re going to use are SUB, AND, OR, XOR. These all take two arguments, just like MOV and ADD.
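If you want to check your answer to the five-line snippet, here is a rough Python trace of it. Several things are assumptions on my part, because the snippet never sets them: BP is taken to be 9342h and SI to be 0002h, memory starts out zero-filled, and everything lives in a single 64 KB segment (the COM-program situation described earlier). The function name `run_sample` is made up for this sketch.

```python
def run_sample(bp=0x9342, si=0x0002):
    """Trace the five-instruction sample; bp and si are assumed values."""
    mem = bytearray(0x10000)          # one zero-filled 64 KB segment (assumption)

    ax = 0x920F                       # mov ax,920fh
    bx = 0xDC3E                       # mov bx,0dc3eh
    mem[0x9342] = ax >> 8             # mov [9342h],ah   (stores 92h)

    al = ((ax & 0xFF) + mem[bp]) & 0xFF   # add al,[bp]  (byte result wraps at 8 bits)
    ax = (ax & 0xFF00) | al               # AH is untouched by the add

    addr = (bx + si) & 0xFFFF         # effective address for [bx+si]
    mem[addr] = al                    # mov [bx+si],al

    return ax, addr, mem[addr]


ax, addr, stored = run_sample()
print(f"AX={ax:04X}h")                        # AX=92A1h
print(f"[{addr:04X}h]={stored:02X}h")         # [DC40h]=A1h
```

With different starting values for BP and SI you get different results, which is exactly why the snippet "does nothing useful" on its own.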
Well, that’s it for today. I didn’t get to do everything that I wanted to do, but I’ll keep plugging along. Next time we’ll get to know debug.exe a little better, and I promise you then we’ll write our first somewhat useful program.
Wait a minute…doesn’t SS point to the stack? What do we need SP or BP for? Aha…now we’re getting into segmentation, or the division of memory into segments. That'll be left for the next blog.
How far are you going to take us with these blogs?