NASM Manual NITC

Chapter 1 - Basics of Computer Organization

Machine Language

Machine language consists of instructions in the form of 0s and 1s. Every CPU family has its own machine language. Writing programs directly as combinations of 0s and 1s is very difficult, so we rely on either assembly language or a high-level language for writing programs.

Assembly Language

An assembly language is a low-level programming language for microprocessors and other programmable devices. Assembly language uses a mnemonic to represent each low-level machine instruction.

Why Assembly Language?

The symbolic programming of assembly language is easier to understand than machine language and saves the programmer a lot of time and effort. When you study assembly language, you get a better idea of computer organization and how a program executes in a computer. A carefully written assembly language program can be more efficient than the same program written in a high-level language. Some portions of the Linux kernel and some system software are written in assembly language. In programming languages like C and C++ we can even embed assembly language instructions using constructs like asm(); (refer to Chapter 8 of The Intel Microprocessors by Barry B. Brey for more details).

What is NASM?

The Netwide Assembler (NASM) is an assembler and disassembler for the Intel x86 architecture (explained in a subsequent section). It can be used to write 16-bit, 32-bit (IA-32) and 64-bit (x86-64) programs. NASM is considered to be one of the most popular assemblers for Linux.

Computer Architecture

The basic operational design of a computer is called its architecture. It is a set of rules and methods that describe the functionality, organization and implementation of computer systems. The x86 architecture follows the von Neumann architecture, which is based on the stored-program concept.

Von Neumann Architecture

A von Neumann architecture machine, designed by physicist and mathematician John von Neumann (1903-1957), is a theoretical design for a stored-program computer that serves as the basis for almost all modern computers. A von Neumann machine consists of a central processor with an arithmetic/logic unit and a control unit, a memory, mass storage, and input and output.

x86 Architecture

The x86 architecture is a series of Instruction Set Architectures (ISA) for computer processors. Developed by Intel Corporation, the x86 architecture defines how a processor handles and executes the instructions passed to it by the operating system (OS) and software programs. The x in x86 stands for the varying part of the processor model numbers in the family (8086, 80186, 80286, 80386 and so on). Key features include:

- Provides a logical framework for executing instructions through a processor.
- Allows software programs and instructions to run on any processor in the Intel 8086 family.
- Provides procedures for utilizing and managing the hardware components of a central processing unit (CPU).

Von Neumann Architecture

Historically there have been two types of computers:
- Fixed Program Computers: Their function is very specific and they can't be reprogrammed, e.g. calculators.
- Stored Program Computers: These can be programmed to carry out many different tasks; applications are stored on them, hence the name.

Processor

The processor is the brain of the computer. It performs all mathematical, logical and control operations of a computer. It is the component of the computer which executes all the instructions given to it in the form of programs. It interacts with I/O devices, memory (RAM) and secondary storage devices and thus carries out the instructions given by the user.

The term processor is used interchangeably with the term central processing unit (CPU), although strictly speaking, the CPU is not the only processor in a computer.

Figure: Intel Core i9 processor

Registers

Registers are the most immediately accessible memory units for the processor. They are the fastest among all the types of memory. They reside inside the processor, and the processor can access the contents of any register in a single clock cycle. Registers are the working memory of a processor, i.e., if we want the processor to perform any task, it needs the data to be present in one of its registers.

What is a clock cycle?

In computers, the clock cycle is the amount of time between two pulses of an oscillator. It is a single increment of the central processing unit (CPU) clock during which the smallest unit of processor activity is carried out. The clock cycle helps in determining the speed of the CPU, as it is considered the basic unit of measuring how fast an instruction can be executed by the computer processor.

The series of processors descended from the 8086, such as the 8086, 80186, 80286, 80386, Pentium etc., are referred to as x86 or 80x86 processors. The processors released on or after the 80386 are called i386 processors. They are 32-bit processors internally, so their register sizes are generally 32 bits. In this section we will go through the i386 registers.

Intel maintains backward compatibility of its instruction sets, i.e., we can run a program designed for an old 16-bit machine on a 32-bit machine. That is the reason why we can install a 32-bit OS on a 64-bit PC. The only problem is that the program will not use the complete set of registers and other available resources, and thus it will be less efficient.

A register may hold an instruction, a storage address, or any kind of data (such as a bit sequence or individual characters). The width of the general purpose registers usually matches the word size of the machine: for example, in a 64-bit computer, the general purpose registers are 64 bits in length.

General purpose registers are used to store temporary data within the microprocessor. There are eight general purpose registers: EAX, EBX, ECX, EDX, EBP, ESI, EDI and ESP. We can refer to the lower 16 bits of each of these registers (AX, BX and so on), and for EAX, EBX, ECX and EDX also to the two 8-bit halves of those 16 bits (for example AH and AL) (see image). This maintains the backward compatibility of instruction sets. These registers are also known as the scratchpad area, as the processor uses them to store intermediate values in a calculation and also to store address locations.
The general purpose registers are used as follows:
- EAX: Accumulator Register - Contains the value of some operands in some operations (e.g. multiplication).
- EBX: Base Register - Pointer to some data in the Data Segment.
- ECX: Counter Register - Acts as loop counter, used in string operations etc.
- EDX: Data Register - Used as pointer to I/O ports.
- ESI: Source Index - Acts as source pointer in string operations. It can also act as a pointer in the Data Segment (DS).
- EDI: Destination Index - Acts as destination pointer in string operations. It can also act as a pointer in the Extra Segment (ES).
- ESP: Stack Pointer - Always points to the top of the system stack.
- EBP: Base Pointer - Points to the base of the current stack frame (i.e. the bottom of the part of the stack used by the current procedure).
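
How a 32-bit register relates to its 16-bit and 8-bit parts can be illustrated with simple bit masking. The following is a minimal Python sketch, not NASM code; the function name sub_registers and the sample value are assumptions made purely for illustration.

```python
def sub_registers(eax):
    """Split a 32-bit EAX value into the parts named AX, AH and AL."""
    eax &= 0xFFFFFFFF          # keep only 32 bits, as the register would
    ax = eax & 0xFFFF          # lower 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # upper byte of AX
    al = ax & 0xFF             # lower byte of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


parts = sub_registers(0x12345678)   # sample value, chosen arbitrarily
for name, value in parts.items():
    print(f"{name} = {value:#x}")
```

AX, AH and AL are not separate storage: they are views onto the same bits as EAX, which is why the sketch derives all of them from one value.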
Flags and EIP
FLAGS is a special purpose register inside the CPU that contains the status of the CPU and of the last operation executed by it. Some of the bits in FLAGS need special mention:
- Carry Flag: When the processor does a calculation, if there is a carry out of the most significant bit then the Carry Flag is set to 1.
- Zero Flag: If the result of the last operation was zero, the Zero Flag is set to 1, else it is 0.
- Sign Flag: If the result of the last operation, read as a signed number, is negative, the Sign Flag is set to 1, else it is 0.
- Parity Flag: If there is an even number of ones in the least significant byte of the result of the last operation, the Parity Flag is set to 1.
- Interrupt Flag: Only if the Interrupt Flag is set to 1 will the processor respond to maskable external interrupts.
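
As a rough illustration of how the status flags follow from a result, the sketch below models an 8-bit addition in Python and derives the Carry, Zero, Sign and Parity flags. It is a simplified model, not the processor's actual circuitry; the function name add8_flags is an assumption. Parity here follows the x86 rule that the Parity Flag is 1 when the low byte of the result contains an even number of 1 bits.

```python
def add8_flags(a, b):
    """Add two 8-bit values and report the flags an 8-bit ADD would produce."""
    total = (a & 0xFF) + (b & 0xFF)
    result = total & 0xFF                 # the part that fits in 8 bits
    ones = bin(result).count("1")         # number of 1 bits in the result
    flags = {
        "CF": int(total > 0xFF),          # carry out of the top bit
        "ZF": int(result == 0),           # result is zero
        "SF": (result >> 7) & 1,          # copy of the most significant bit
        "PF": int(ones % 2 == 0),         # even number of 1 bits
    }
    return result, flags


print(add8_flags(0xFF, 0x01))
print(add8_flags(0x70, 0x10))
```

The first call overflows 8 bits and leaves a zero result, so both the Carry and Zero flags are set; the second produces 0x80, whose top bit sets the Sign Flag.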
EIP:
EIP is the instruction pointer; it points to the next instruction to be executed. In memory there are basically two classes of things stored:
- Data
- Program
When we start a program, it is copied into main memory and EIP is set to point to the start of the program in memory. The processor then executes the instructions sequentially, advancing EIP after each one. Branch instructions like JMP, RET, CALL and JNZ (which we will see shortly) alter the value of EIP.
Segment Registers

In x86 processors, memory is accessed using two kinds of registers: a segment register and an offset. The segment register contains the base address of a particular section of memory, and the offset specifies how many bytes to move from that base to reach the required data.

CS contains the base address of the Code Segment and EIP is the offset into it. EIP keeps updating as each instruction is executed.

SS, the Stack Segment register, contains the base address of the system stack. ESP and EBP are the offsets used with it. A stack is a data structure that follows LIFO, i.e. Last-In-First-Out. There are two main operations associated with a stack: push and pop. To insert an element into a stack we push it, and when we give the pop instruction we get back the last value that was pushed. The stack grows downward, towards lower addresses. ESP (the Stack Pointer) always points to the top of the stack: when we push an element, ESP is reduced by the required number of bytes and the data is stored at the new location.

DS, ES, FS and GS act as base registers for many data operations such as array addressing and string operations. ESI, EDI and EBX can act as offsets for them. Unlike other registers, segment registers are still 16 bits wide in 32-bit processors.

In modern 32-bit processors, the value in a segment register is just an index into a descriptor table in memory; the processor combines the base address found there with the offset to obtain the exact memory location. This is called segmentation.

In the example considered here, the Stack Segment starts from memory location 122 and grows downwards, and the Stack Pointer ESP points to location 119. If we now pop 1 byte of data from the stack, we get (01010101)2 and ESP is increased by one. Now suppose we have (01101100 11001111)2 in the register ax and we execute the push instruction:
push ax

Then ESP is reduced by two (because we need to store two bytes of data) and ax is copied there. ESP now points to the lower byte of the two bytes that were pushed. In the x86 architecture, when we push or save some data in memory, its lower byte is stored at the lowest address, and thus x86 is said to follow the Little Endian form. The MIPS architecture, by contrast, can operate in Big Endian form.
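
The push described above can be traced with a small model. The following Python sketch is only an illustration under stated assumptions: memory is a tiny byte array, the class name StackModel and its methods are hypothetical, the push starts from ESP = 119, and the one-byte pop mirrors the simplified example in the text rather than a real x86 instruction.

```python
class StackModel:
    """Toy model of a downward-growing stack in byte-addressed memory."""

    def __init__(self, size, esp):
        self.memory = bytearray(size)   # all cells start as zero
        self.esp = esp                  # address of the current top of stack

    def push16(self, value):
        """Model `push ax`: lower ESP by 2, then store the value little-endian."""
        self.esp -= 2
        self.memory[self.esp] = value & 0xFF             # low byte, lower address
        self.memory[self.esp + 1] = (value >> 8) & 0xFF  # high byte, next address

    def pop8(self):
        """Read the byte at ESP, then raise ESP by 1."""
        value = self.memory[self.esp]
        self.esp += 1
        return value


stack = StackModel(size=128, esp=119)
stack.memory[119] = 0b01010101          # byte at the top of the stack in the example
ax = 0b0110110011001111                 # value held in ax in the example
stack.push16(ax)
print(stack.esp, hex(stack.memory[stack.esp]), hex(stack.memory[stack.esp + 1]))
```

After the push, ESP has moved from 119 to 117, the low byte 0xCF sits at address 117 and the high byte 0x6C at address 118.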

Little Endian and Big Endian

Endianness refers to the sequential order in which bytes are arranged into larger numerical values when stored in memory or when transmitted over digital links. Endianness is of interest in computer science because two conflicting and incompatible formats are in common use: words may be represented in big-endian or little-endian format, depending on whether bits or bytes or other components are ordered from the big end (most significant bit) or the little end (least significant bit).

Big Endian Byte Order: The most significant byte (the "big end") of the data is placed at the byte with the lowest address. For a 4-byte value, the rest of the data is placed in order in the next three bytes in memory.

Little Endian Byte Order: The least significant byte (the "little end") of the data is placed at the byte with the lowest address. For a 4-byte value, the rest of the data is placed in order in the next three bytes in memory.

Suppose an integer is stored as 4 bytes (32 bits). Then a variable y with value 0x01234567 (hexadecimal representation) is stored as the four bytes 0x01, 0x23, 0x45, 0x67 on a Big Endian machine, while on a Little Endian machine (Intel x86) it is stored in the reverse order: 0x67, 0x45, 0x23, 0x01.
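
The two byte orders for y = 0x01234567 can be checked with Python's built-in int.to_bytes. This is a small illustrative sketch; the variable names big and little are chosen here for the example.

```python
y = 0x01234567

# Lay the same 32-bit value out in memory order for each convention.
big = y.to_bytes(4, byteorder="big")
little = y.to_bytes(4, byteorder="little")

print([f"{b:#04x}" for b in big])      # ['0x01', '0x23', '0x45', '0x67']
print([f"{b:#04x}" for b in little])   # ['0x67', '0x45', '0x23', '0x01']

# Reading the bytes back with the matching byte order recovers the value.
recovered = int.from_bytes(little, byteorder="little")
print(hex(recovered))
```

The first element of each list is the byte at the lowest address, so the little-endian list is what an x86 machine would hold in memory.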

Bus

A bus is a communication medium that transfers data between two components; it is a subsystem used to connect computer components and transfer data between them. A bus may be parallel or serial. Parallel buses transmit data across multiple wires. Serial buses transmit data in bit-serial format. We can classify the buses associated with the processor into three types.

Data Bus :

It is the bus used to transfer data between the processor and memory or any other I/O devices (for both reading and writing). As the size of the data bus increases, it can transfer more data in a single transfer. The sizes of the data bus in common Intel processors are given below.

Processor	Data bus size
8088, 80188	8 bit
8086, 80186, 80286, 80386SX	16 bit
80386DX, 80486	32 bit
80586 (Pentium), Pentium Pro and later processors	64 bit

Address Bus :

The address bus is used by the CPU or a direct memory access (DMA) enabled device to specify the physical address that a read or write command refers to. The Memory Management Unit (MMU), also called Memory Control Circuitry (MCC) or Memory Control Unit (MCU), is the set of electronic circuits present in the motherboard which helps the processor in reading or writing the data to or from a location in the RAM. Addresses are placed on the address bus in the form of bits. The width of an address bus determines the amount of memory a system can address: a system with a 32-bit address bus can address 4 gigabytes of memory space.

The maximum size of RAM which can be used in a PC is determined by the size of the address bus. If the size of the address bus is n bits, it can address a maximum of 2^n bytes of RAM. This is the reason why, even if we add more than 4 GB of RAM to a 32-bit PC, we cannot see more than 4 GB of available memory.

Processor	Address Bus Width (bits)	Maximum Addressable RAM size
8088, 8086, 80186, 80188	20	1 MB
80286, 80386SX	24	16 MB
80386DX, 80486, Pentium, Pentium OverDrive	32	4 GB
Pentium Pro, Pentium II	36	64 GB
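
The rule that an n-bit address bus can address 2 to the power n bytes is easy to verify numerically. The sketch below is a minimal Python illustration; the function names max_addressable_bytes and human_readable are assumptions introduced for this example, and the units are binary multiples (1 KB = 1024 bytes).

```python
def max_addressable_bytes(address_bits):
    """Number of distinct byte addresses an address bus of this width can form."""
    return 2 ** address_bits


def human_readable(num_bytes):
    """Express a byte count in the largest unit that divides it evenly here."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    index = 0
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


for width in (20, 24, 32, 36):
    print(f"{width}-bit address bus: {human_readable(max_addressable_bytes(width))}")
```

Each extra address line doubles the addressable space, which is why going from 32 to 36 bits raises the limit from 4 GB to 64 GB.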

Control Bus :
The control bus carries signals which control the operations of the processor, memory and I/O devices.

For example, the data bus is used for both reading and writing, so how does the memory or MMU know whether data has to be written to a location or read from it when it encounters an address on the address bus? This ambiguity is resolved using the read and write bits in the control bus. When the read bit is enabled, the MMU places the data stored at the address given on the address bus onto the EDB (external data bus, or simply data bus). When the write bit is enabled, the MMU writes the data on the EDB into the location given on the address bus.
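
The role of the read and write control bits can be pictured with a toy model in which a single routine serves both directions, and the control lines alone decide what happens at the given address. This Python sketch is an assumption-laden illustration: the class Memory and its cycle method are hypothetical names, and real buses involve timing and signalling details that are ignored here. In this model, read moves data from memory to the data bus and write moves data from the data bus into memory.

```python
class Memory:
    """Toy memory that reacts to an address plus read/write control bits."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def cycle(self, address, read=False, write=False, data=None):
        """One bus cycle: the control bits decide what the address is used for."""
        if read == write:
            raise ValueError("exactly one of read or write must be enabled")
        if write:
            self.cells[address] = data & 0xFF   # data bus -> memory location
            return None
        return self.cells[address]              # memory location -> data bus


ram = Memory(16)
ram.cycle(3, write=True, data=0xAB)     # write bit enabled
print(hex(ram.cycle(3, read=True)))     # read bit enabled
```

The same address (3) is used in both cycles; only the control bits distinguish the store from the load.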

Interrupts

Interrupts are the most critical routines executed by a processor. Interrupts may be triggered by external sources or due to internal operations.

In Linux-based systems, 80h is the interrupt number used to request operating system services, and in MS-DOS based systems it is 21h. This operating system interrupt is used for implementing system calls.

Whenever an interrupt occurs, the processor stops its present work, preserves the values of its registers in memory and then executes the ISR (Interrupt Service Routine) by referring to an interrupt vector table. The ISR is the set of instructions to be executed when an interrupt occurs. By referring to the interrupt vector table, the processor finds which ISR it should execute for the given interrupt. After executing the ISR, the processor restores the registers to their previous state and continues the process that it was executing before. Almost all I/O devices work by utilizing interrupt requests.

An interrupt is a signal from a device, such as the keyboard, to the CPU, telling the processor to immediately stop whatever it is currently doing and do something else. For example, the keyboard controller sends an interrupt when a key is pressed. To know how to call the kernel when a specific interrupt arises, the CPU has a vector table set up by the OS and stored in memory. There are 256 interrupt vectors on x86 CPUs, numbered from 0 to 255, which act as entry points into the kernel. The number of interrupt vectors or entry points supported by a CPU differs based on the CPU architecture.
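
The sequence of saving registers, looking up the ISR in the vector table, running it and restoring the registers can be sketched as a small simulation. This is a simplified Python model, not how a real CPU or kernel is implemented; the class CPU, the handler keyboard_isr and the vector number 33 are all hypothetical choices made for illustration.

```python
class CPU:
    """Toy CPU with a 256-entry interrupt vector table."""

    def __init__(self):
        self.registers = {"eax": 0, "ebx": 0, "ecx": 0, "edx": 0}
        self.vector_table = [None] * 256     # vectors numbered 0 to 255
        self.log = []

    def install(self, vector, isr):
        """Record which routine serves the given interrupt vector."""
        self.vector_table[vector] = isr

    def interrupt(self, vector):
        saved = dict(self.registers)         # preserve the register values
        isr = self.vector_table[vector]      # look up the ISR for this vector
        if isr is not None:
            isr(self)                        # run the service routine
        self.registers = saved               # restore the previous state


def keyboard_isr(cpu):
    """Hypothetical handler: uses eax as scratch space and records the event."""
    cpu.registers["eax"] = 0xFFFF
    cpu.log.append("key pressed")


cpu = CPU()
cpu.registers["eax"] = 7
cpu.install(33, keyboard_isr)                # 33 is an arbitrary example vector
cpu.interrupt(33)
print(cpu.registers["eax"], cpu.log)
```

Although the handler overwrites eax, the interrupted program still sees its original value afterwards, which is the point of saving and restoring the registers.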

System call

System calls are the application programmer's interface to the kernel space. In a NASM program, input has to be taken from the standard input device (keyboard) and output has to be given to the standard output device (monitor). This is implemented using the operating system's read and write system calls respectively. Interrupt number 80h (in hexadecimal) is assigned to the software-generated interrupt in Linux systems. Applications invoke system calls using this interrupt. When an application triggers int 80h, the OS understands that it is a request for a system call and refers to the general purpose registers to find and execute the exact Interrupt Service Routine (i.e. the system call here).

The standard convention for using a system call is:

- The system call number is stored in the eax register.
- The other parameters needed by the system call are stored in other general purpose registers.
- The 80h interrupt is triggered using the instruction INT 80h.

The OS will then carry out the system call.
The important system calls read, write and exit are explained in detail in the upcoming sections.
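
The register convention above can be mimicked in a few lines of Python. This is only a toy model of the dispatch idea, not the kernel's mechanism and not NASM code: the function int_80h is hypothetical, registers are modelled as a dictionary, and ecx holds a bytes object where a real program would place the address of a buffer. The value 4 is the write system call number on 32-bit Linux, with ebx, ecx and edx carrying the file descriptor, buffer and length.

```python
import os

SYS_WRITE = 4    # system call number for write on 32-bit Linux


def int_80h(regs):
    """Toy dispatcher: inspect eax to decide which system call was requested."""
    if regs["eax"] == SYS_WRITE:
        # ebx = file descriptor, ecx = buffer, edx = number of bytes to write
        return os.write(regs["ebx"], regs["ecx"][:regs["edx"]])
    raise NotImplementedError(f"system call {regs['eax']} is not modelled")


message = b"Hello from a modelled system call\n"
regs = {"eax": SYS_WRITE, "ebx": 1, "ecx": message, "edx": len(message)}
written = int_80h(regs)      # file descriptor 1 is standard output
```

The return value plays the role of the result the kernel leaves in eax after the call: the number of bytes actually written.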

System memory in Linux can be divided into two distinct regions: kernel space and user space. Kernel space is where the kernel (i.e., the core of the operating system) executes (i.e., runs) and provides its services.