Chapter 2 - Introduction to NASM
Sections in NASM
A NASM program is divided into three sections.
- section .text :
This section contains the executable code from where execution starts.
It is analogous to the main() function in C.
- section .bss :
Here, variables are declared without initialisation.
- section .data :
Variables are declared and initialised in this section.
To declare space in memory, the following directives are used:
- RESx:
Reserves space in memory for a variable without giving it an
initial value.
- Dx:
Declares space in memory for a variable and gives it an initial
value at the same time.
In both directives, x can be replaced with:
| x | Meaning | Bytes |
|---|---|---|
| b | BYTE | 1 |
| w | WORD | 2 |
| d | DOUBLE WORD | 4 |
| q | QUAD WORD | 8 |
| t | TEN BYTES | 10 |
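Examples:
section .data
var1: db 10 ; One byte in memory named var1, initialised to 10
str1: db "Hello World!" ; 12 bytes, one per character
section .bss
var2: resb 1 ; Reserve 1 byte, uninitialised
var3: resq 1 ; Reserve 1 quad word (8 bytes), uninitialised
section .data
var4: db 1, 2, 3, 4 ; Four consecutive bytes
string: db "Hello"
string2: db 'H', 'e', 'l', 'l', 'o'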
Here both string and string2 are identical. They are 5 bytes long and store
the string Hello. Each character in the string is first converted to its ASCII
code, and that numeric value is stored in one byte location.
TIMES
It is used to create and initialize large arrays with a common
initial value for all its elements.
Eg: var: times 100 db 1
Creates an array of 100 bytes and each element
will be initialized with the value 1.
Dereferencing in NASM
To access the data stored at an address, enclose the address in the
dereferencing operator '[' ']'.
Examples:
mov eax, [var] ; The value stored at address var is copied to eax
mov eax, var ; The address var itself is copied to eax
Type casting
A size specifier is required whenever the assembler cannot work out from
the operands how many bytes to access at a dereferenced address, for
example INC on a memory operand or MOV of an immediate value into
memory. When the other operand is a register (as in ADD eax, [label]) the
size follows from the register and the specifier is optional. The
directives used for specifying the size are: BYTE, WORD, DWORD,
QWORD, TWORD.
Eg:
MOV dword[ebx], 1
INC BYTE[label]
ADD eax, dword[label]
X86 Instruction Set
- MOV Move/Copy
Copies the content of one register / memory location to another, or sets
a register / memory variable to an immediate value.
Syntax: mov dest, src
- dest should be a register / memory operand; src may be a register,
a memory operand or an immediate value.
- Both src and dest cannot together be memory operands.
Eg:
mov eax, ebx ;Copy the content of ebx to eax
mov ecx, 109 ;Changes the value of ecx to 109
mov al, bl
mov byte[var1], al ;Copy the content of al to the variable var1 in memory
mov word[var2], 200
mov eax, dword[var3]
- MOVZX Move with Zero Extension
Copies a value from a smaller register / memory location into a larger
register and fills the upper bits with zeros.
Syntax : movzx dest, src
- dest should be a register, and its size should be greater than the size of src.
- src should be a register / memory operand.
- Works only with unsigned numbers, since the upper bits are always
filled with zeros.
Eg:
movzx eax, ah
movzx cx, al
- For extending signed numbers we use instructions like CBW (Convert
Byte to Word) and CWD (Convert Word to Double word).
- CBW sign-extends the AL register to AX
- CWD sign-extends the AX register to the DX:AX register pair
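The difference between zero extension and sign extension can be seen in a short sketch. It assumes 32-bit Linux, where placing 1 in eax and executing int 80h calls exit with ebx as the exit status; the value 0F0h is an arbitrary choice for illustration.

```nasm
section .text
global _start
_start:
    mov   al, 0F0h      ; al = 11110000b (240 unsigned, -16 signed)
    movzx ebx, al       ; zero extension: ebx = 000000F0h = 240
    cbw                 ; sign extension of al into ax: ax = 0FFF0h = -16
    cwd                 ; sign extension of ax into dx:ax: dx = 0FFFFh
    mov   eax, 1        ; sys_exit (assumed 32-bit Linux convention)
    int   80h           ; exit status is the low byte of ebx = 240
```

After running the program, the exit status can be inspected in the shell with echo $? .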
- ADD Addition
Syntax :
add dest, src ; dest = dest + src;
Used to add two values and store the result in the first operand.
- dest should be a register / memory operand; src may be a register,
a memory operand or an immediate value.
- Both src and dest cannot together be memory operands.
- Both the operands should have the same size.
Eg:
add eax, ecx ; eax = eax + ecx
add al, ah ; al = al + ah
add ax, 5
add edx, 31h
- SUB Subtraction
Syntax :
sub dest, src ; dest = dest - src;
Used to subtract the second operand from the first and store the
result in the first operand.
- dest should be a register / memory operand; src may be a register,
a memory operand or an immediate value.
- Both src and dest cannot together be memory operands.
- Both the operands should have the same size.
Eg:
sub eax, ecx ; eax = eax - ecx
sub al, ah ; al = al - ah
sub ax, 5
sub edx, 31h
- INC Increment operation
Used to increment the value of a register/ memory variable by 1.
Eg:
INC eax ; eax++
INC byte[var]
INC al
- DEC Decrement operation
Used to decrement the value of a register/ memory variable by 1.
Eg:
DEC eax ; eax--
DEC byte[var]
DEC al
- MUL Multiplication
Syntax : mul src
Used to multiply the value of a register / memory variable with the
EAX / AX / AL register, treating both as unsigned numbers. src cannot
be an immediate value. MUL works according to the following rules.
- If src is 1 byte then AX = AL * src.
- If src is 1 word (2 bytes) then DX:AX = AX * src (i.e. the upper 16
bits of the result (AX*src) will go to DX and the lower 16 bits
will go to AX).
- If src is a double word (32 bits) then EDX:EAX = EAX * src (i.e. the
upper 32 bits of the result will go to EDX and the lower 32 bits
will go to EAX).
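As a small worked example of the one-byte rule, the sketch below multiplies 200 by 3. The operand values are arbitrary, and the exit through int 80h assumes 32-bit Linux.

```nasm
section .text
global _start
_start:
    mov   al, 200       ; first factor in al
    mov   bl, 3         ; second factor in a register
    mul   bl            ; ax = al * bl = 600 = 0258h, so ah = 02h and al = 58h
    movzx ebx, ah       ; ebx = 2, the upper byte of the product
    mov   eax, 1        ; sys_exit (assumed 32-bit Linux convention)
    int   80h           ; exit status 2
```

The product does not fit in one byte, which is why the result is spread over AH and AL.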
- IMUL Multiplication of signed numbers
The IMUL instruction multiplies signed numbers. It can be used in
three different forms.
Syntax :
- (1) imul src
- (2) imul dest, src
- (3) imul dest, src1, src2
- In form (1) it follows the same rules as MUL, but treats the operands as signed.
- In form (2), dest = dest * src.
- In form (3), dest = src1 * src2, where src2 is an immediate value.
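The three forms can be tried side by side in a minimal sketch. The register choices and values are illustrative assumptions, and the exit through int 80h assumes 32-bit Linux.

```nasm
section .text
global _start
_start:
    mov   eax, -4
    mov   ecx, 6
    imul  ecx            ; form (1): edx:eax = eax * ecx = -24
    mov   ebx, 5
    imul  ebx, ecx       ; form (2): ebx = ebx * ecx = 30
    imul  ebx, ecx, 7    ; form (3): ebx = ecx * 7 = 42
    mov   eax, 1         ; sys_exit (assumed 32-bit Linux convention)
    int   80h            ; exit status 42
```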
- DIV Division
Syntax : div src
Used to divide the unsigned value in EDX:EAX, DX:AX or AX by the
register / memory variable in src. DIV works according to the following
rules.
- If src is 1 byte then AX will be divided by src, remainder will go to
AH and quotient will go to AL.
- If src is 1 word (2 bytes) then DX:AX will be divided by src,
remainder will go to DX and quotient will go to AX.
- If src is a double word (32 bits) then EDX:EAX will be divided by src,
remainder will go to EDX and quotient will go to EAX.
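A common pitfall with the 32-bit rule is forgetting that EDX is part of the dividend. The sketch below divides 47 by 5 and clears EDX first; the values are arbitrary and the exit through int 80h assumes 32-bit Linux.

```nasm
section .text
global _start
_start:
    mov   eax, 47        ; low 32 bits of the dividend
    mov   edx, 0         ; high 32 bits, cleared because the dividend fits in eax
    mov   ecx, 5         ; divisor in a register
    div   ecx            ; eax = 9 (quotient), edx = 2 (remainder)
    mov   ebx, edx       ; return the remainder
    mov   eax, 1         ; sys_exit (assumed 32-bit Linux convention)
    int   80h            ; exit status 2
```

If EDX holds a leftover value, the quotient may be wrong or may not fit in EAX, in which case the processor raises a divide error.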
- NEG Negation of Signed numbers.
Syntax : NEG op1
The NEG instruction negates a given register / memory variable, i.e.
op1 = -op1 in two's complement.
- CLC - Clear Carry
This instruction clears the carry flag bit in CPU FLAGS.
- ADC Add with Carry
Syntax : ADC dest, src
ADC adds src and the carry flag to dest. It is used for the addition of
large numbers. Suppose we want to add two 64 bit numbers. We keep the
first number in EDX:EAX (i.e. most significant 32 bits in EDX and the
others in EAX) and the second number in EBX:ECX.
Then we perform the addition as follows
Eg:
clc ; Clearing the carry flag (optional, since ADD ignores the incoming carry)
add eax, ecx ; Normal addition of the lower 32 bits, sets the carry flag on a carry out
adc edx, ebx ; Adding the higher 32 bits together with the carry
- SBB Subtract with Borrow
Syntax : SBB dest, src
SBB is analogous to ADC and is used for the subtraction of large
numbers. Suppose we want to subtract two 64 bit numbers. We keep
the first number in EDX:EAX and the second number in EBX:ECX.
Then we perform the subtraction as follows
Eg:
clc ; Clearing the carry flag (optional, since SUB ignores the incoming carry)
sub eax, ecx ; Normal subtraction of ecx from eax, sets the carry flag on a borrow
sbb edx, ebx ; Subtracting the higher 32 bits together with the borrow
Branching In NASM
- JMP Unconditionally Jump to label
JMP is similar to the goto label statement in C / C++. It transfers
control to any part of the program without checking any condition.
- CMP Compares the Operands
Syntax : CMP op1, op2
The CMP instruction performs the operation op1 - op2 on its two
operands but does not store the result. Instead it only affects the
CPU FLAGS. It is similar to the SUB operation without saving the
result. For example, if op1 = op2 then the Zero Flag (ZF) will be set to 1.
NB: To generate conditional jumps in NASM we first perform
the CMP operation between two register / memory operands and then
use one of the following jump instructions, which check the CPU FLAGS.
Conditional Jump Instructions:
| Instruction | Working |
|---|---|
| JZ | Jump If Zero Flag is Set |
| JNZ | Jump If Zero Flag is Unset |
| JC | Jump If Carry Flag is Set |
| JNC | Jump If Carry Flag is Unset |
| JP | Jump If Parity Flag is Set |
| JNP | Jump If Parity Flag is Unset |
| JO | Jump If Overflow Flag is Set |
| JNO | Jump If Overflow Flag is Unset |
Advanced Conditional Jump Instructions:
The 80x86 processors also provide conditional jumps that are named
after the outcome of a comparison rather than after a flag. They are
much easier to use than the flag-based jump instructions when
comparing two variables.
We must execute CMP op1, op2 before using this set of jump
instructions. There are separate classes for comparing signed and
unsigned numbers.
- For Unsigned numbers:
| Instruction | Working |
|---|---|
| JE | Jump if op1 = op2 |
| JNE | Jump if op1 ≠ op2 |
| JA (jump if above) | Jump if op1 > op2 |
| JNA | Jump if op1 ≤ op2 |
| JB (jump if below) | Jump if op1 < op2 |
| JNB | Jump if op1 ≥ op2 |
- For Signed numbers:
| Instruction | Working |
|---|---|
| JE | Jump if op1 = op2 |
| JNE | Jump if op1 ≠ op2 |
| JG (jump if greater) | Jump if op1 > op2 |
| JNG | Jump if op1 ≤ op2 |
| JL (jump if less) | Jump if op1 < op2 |
| JNL | Jump if op1 ≥ op2 |
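The choice between the signed and unsigned jumps matters because the same bit pattern compares differently in the two views. The following sketch compares -1 with 1 both ways; the label names are illustrative and the exit through int 80h assumes 32-bit Linux.

```nasm
section .text
global _start
_start:
    mov   eax, -1         ; 0FFFFFFFFh: -1 signed, 4294967295 unsigned
    mov   ebx, 0          ; result, built up by the branches below
    cmp   eax, 1
    jl    signed_less     ; signed view: -1 < 1, so this jump is taken
    jmp   done
signed_less:
    add   ebx, 1          ; ebx = 1
    cmp   eax, 1
    jb    done            ; unsigned view: 4294967295 < 1 is false, not taken
    add   ebx, 2          ; ebx = 3
done:
    mov   eax, 1          ; sys_exit (assumed 32-bit Linux convention)
    int   80h             ; exit status 3
```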
- LOOP instruction
Syntax: loop label
When we use the LOOP instruction, the ecx register acts as the loop variable.
LOOP first decrements the value of ecx and then checks whether
the new value of ecx ≠ 0. If so, it jumps to the label given in the
instruction. Otherwise control falls through to the very next statement.
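A minimal sketch of LOOP in use, summing the numbers 5 down to 1. The label name next is an illustrative choice and the exit through int 80h assumes 32-bit Linux.

```nasm
section .text
global _start
_start:
    mov   ecx, 5         ; loop counter: the body runs 5 times
    mov   ebx, 0         ; running sum
next:
    add   ebx, ecx       ; adds 5, 4, 3, 2, 1
    loop  next           ; ecx = ecx - 1, jump to next while ecx != 0
    mov   eax, 1         ; sys_exit (assumed 32-bit Linux convention)
    int   80h            ; exit status 15
```

Because the decrement happens before the test, starting with ecx = 0 does not skip the loop; the counter wraps around and the body runs 2^32 times.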
Converting Standard C/C++ Control Structures to NASM:
- If-else
C Code:
if( eax <= 5 )
eax = eax + ebx;
else
ecx = ecx + ebx;
Equivalent NASM Code:
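One possible translation, given as a sketch: it treats eax as a signed number, uses illustrative label names and sample starting values, and exits through int 80h as on 32-bit Linux.

```nasm
section .text
global _start
_start:
    mov   eax, 3         ; sample values (assumptions for the demo)
    mov   ebx, 4
    mov   ecx, 10
    cmp   eax, 5
    jg    else_part      ; eax > 5 (signed): go to the else branch
    add   eax, ebx       ; if branch: eax = eax + ebx
    jmp   end_if         ; skip the else branch
else_part:
    add   ecx, ebx       ; else branch: ecx = ecx + ebx
end_if:
    mov   ebx, eax       ; eax = 7 here, because 3 <= 5
    mov   eax, 1         ; sys_exit (assumed 32-bit Linux convention)
    int   80h            ; exit status 7
```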
- For loop
C Code:
eax = 0;
for( ebx = 1; ebx <= 10; ebx++ )
eax = eax + ebx;
Equivalent NASM Code:
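One possible translation, given as a sketch: the loop runs ebx from 1 to 10 inclusive, the label names are illustrative, and the exit through int 80h assumes 32-bit Linux.

```nasm
section .text
global _start
_start:
    mov   eax, 0         ; eax = 0
    mov   ebx, 1         ; ebx = 1
for_loop:
    cmp   ebx, 10
    jg    end_for        ; leave the loop once ebx > 10
    add   eax, ebx       ; eax = eax + ebx
    inc   ebx            ; ebx++
    jmp   for_loop
end_for:
    mov   ebx, eax       ; eax = 1 + 2 + ... + 10 = 55
    mov   eax, 1         ; sys_exit (assumed 32-bit Linux convention)
    int   80h            ; exit status 55
```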
- While loop
C Code:
sum = 0;
ecx = n;
while( ecx >= 0 )
{
sum = sum + ecx;
ecx--;
}
Equivalent NASM Code:
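One possible translation, given as a sketch: sum and n are kept as double word variables in memory, n = 4 is a sample value, ecx is treated as signed, the label names are illustrative, and the exit through int 80h assumes 32-bit Linux.

```nasm
section .data
n:   dd 4                 ; sample value for n (an assumption for the demo)
sum: dd 0                 ; sum = 0

section .text
global _start
_start:
    mov   ecx, dword[n]   ; ecx = n
while_loop:
    cmp   ecx, 0
    jl    end_while       ; leave the loop once ecx < 0 (signed)
    add   dword[sum], ecx ; sum = sum + ecx
    dec   ecx             ; ecx--
    jmp   while_loop
end_while:
    mov   ebx, dword[sum] ; sum = 4 + 3 + 2 + 1 + 0 = 10
    mov   eax, 1          ; sys_exit (assumed 32-bit Linux convention)
    int   80h             ; exit status 10
```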
Boolean Operators
- AND (Bitwise Logical AND)
Syntax : AND op1, op2
Performs the bitwise logical AND of op1 and op2 and assigns the
result to op1.
op1 = op1 & op2; //Equivalent C Statement
Let x = 10101001b and y = 10110010b be two 8-bit binary numbers.
Then x & y is:
| x | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| y | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| x AND y | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
- OR (Bitwise Logical OR)
Syntax: OR op1, op2
Performs the bitwise logical OR of op1 and op2 and assigns the result
to op1.
op1 = op1 | op2; //Equivalent C Statement
Let x = 10101001b and y = 10110010b be two 8-bit binary numbers.
Then x | y is:
| x | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| y | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| x OR y | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
- XOR (Bitwise Logical Exclusive OR)
Syntax: XOR op1, op2
Performs the bitwise logical XOR of op1 and op2 and assigns the
result to op1.
op1 = op1 ^ op2; //Equivalent C Statement
Let x = 10101001b and y = 10110010b be two 8-bit binary numbers.
Then x ^ y is:
| x | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| y | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
| x XOR y | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
- NOT (Bitwise Logical Negation)
Syntax: NOT op1
Performs the bitwise logical NOT of op1 and assigns the result to op1.
op1 = ~op1; //Equivalent C Statement
| x | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| NOT x | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
- TEST (Logical AND, affects only CPU FLAGS)
Syntax: TEST op1, op2
- It performs the bitwise logical AND of op1 and op2 but does not
save the result to any register. Instead the result of the operation
only affects the CPU FLAGS.
- It is similar to the CMP instruction in usage.
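A typical use of TEST is to check a single bit without disturbing the operand. The sketch below checks whether a value is odd by testing bit 0; the value, the label name and the exit through int 80h (32-bit Linux) are assumptions for illustration.

```nasm
section .text
global _start
_start:
    mov   al, 10101001b  ; value to inspect
    mov   ebx, 0         ; result: 0 = even, 1 = odd
    test  al, 1          ; al AND 00000001b, only the flags change
    jz    done           ; ZF = 1 means bit 0 is clear (even)
    mov   ebx, 1         ; bit 0 is set, so the value is odd
done:
    mov   eax, 1         ; sys_exit (assumed 32-bit Linux convention)
    int   80h            ; exit status 1
```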
- SHL Shift Left
Syntax : SHL op1, op2
op1 = op1 << op2; //Equivalent C Statement
- SHL performs the bitwise left shift. op1 should be a register /
memory variable, and op2 must be an immediate (constant) value or
the CL register.
- It shifts the bits of op1 towards the left op2 times and sets the
rightmost op2 bits to 0.
Example:
shl al, 3
| al | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| al << 3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
- SHR Shift Right
Syntax : SHR op1, op2
op1 = op1 >> op2; //Equivalent C Statement
- SHR performs the bitwise right shift. op1 should be a register /
memory variable, and op2 must be an immediate (constant) value or
the CL register.
- It shifts the bits of op1 towards the right op2 times and sets the
leftmost op2 bits to 0.
Example:
shr al, 3
| al | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| al >> 3 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
- ROL Rotate Left
Syntax : ROL op1, op2
ROL performs the bitwise cyclic left shift: the bits that leave the left
end re-enter at the right end. op1 should be a register / memory
variable, and op2 must be an immediate (constant) value or the CL register.
Example:
rol al, 3
| al | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| rol al, 3 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 |
- ROR Rotate Right
Syntax : ROR op1, op2
ROR performs the bitwise cyclic right shift: the bits that leave the right
end re-enter at the left end. op1 should be a register / memory
variable, and op2 must be an immediate (constant) value or the CL register.
Example:
ror al, 3
| al | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| ror al, 3 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
- RCL Rotate Left with Carry
Syntax : RCL op1, op2
It works like rotate left, except that the carry flag takes part in the
rotation as an extra bit: the bit leaving the left end goes into the
carry flag and the old carry enters at the right end.
- RCR Rotate Right with Carry
Syntax : RCR op1, op2
It works like rotate right, except that the carry flag takes part in the
rotation as an extra bit: the bit leaving the right end goes into the
carry flag and the old carry enters at the left end.
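One use of rotating through the carry is shifting a number that is wider than a register. The sketch below shifts a 64 bit value held in EDX:EAX left by one bit; the values are arbitrary and the exit through int 80h assumes 32-bit Linux.

```nasm
section .text
global _start
_start:
    mov   eax, 80000001h ; low 32 bits
    mov   edx, 00000002h ; high 32 bits, so edx:eax = 0000000280000001h
    shl   eax, 1         ; eax = 00000002h, the old top bit goes to CF (CF = 1)
    rcl   edx, 1         ; edx = 00000005h: shifted left, CF enters at bit 0
    mov   ebx, edx       ; return the high half
    mov   eax, 1         ; sys_exit (assumed 32-bit Linux convention)
    int   80h            ; exit status 5
```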
Stack Operations
- PUSH
Syntax: PUSH register/constant
Pushes a value onto the system stack. It decreases the value of ESP and
copies the value of a register / constant to the new top of the stack.
Examples:
PUSH ax ; ESP = ESP - 2 and copies value of ax to [ESP]
PUSH eax ; ESP = ESP - 4 and copies value of eax to [ESP]
PUSH ebx
PUSH dword 5
PUSH word 258
- POP
Pops a value off the system stack. The POP instruction copies the value
stored at the top of the system stack to a register and then increases the
value of ESP.
Examples:
POP bx ; ESP = ESP + 2
POP ebx ; ESP = ESP + 4
POP eax
- PUSHA
Pushes the value of all general purpose registers. PUSHA is used to
save the value of general purpose registers especially when calling some
subprograms which will modify their values.
- POPA
Pops off the value of all general purpose registers which we have pushed
before using PUSHA instruction.
- PUSHF
Pushes all the CPU FLAGS.
- POPF
POP off and restore the values of all CPU Flags which have been pushed
before using PUSHF instructions.
NB: It is important to pop off the values pushed onto the stack properly.
Even a small mistake in any of the PUSH / POP instructions can
stop the program from working.
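The last-in, first-out order of the stack can be seen in a short sketch that swaps two registers by popping them in the same order in which they were pushed. The values are arbitrary and the exit through int 80h assumes 32-bit Linux.

```nasm
section .text
global _start
_start:
    mov   eax, 7
    mov   ebx, 9
    push  eax            ; stack (top first): 7
    push  ebx            ; stack (top first): 9, 7
    pop   eax            ; eax = 9 (last pushed, first popped)
    pop   ebx            ; ebx = 7
    mov   eax, 1         ; sys_exit (assumed 32-bit Linux convention)
    int   80h            ; exit status 7
```

To restore registers to their original values, pop them in the reverse of the order in which they were pushed.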
Pre-processor Directives in NASM
In NASM, %define acts like the #define preprocessor directive in C.
It can be used to declare constants.
Eg:
%define SIZE 100
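A minimal sketch of such a constant in use: the name buffer is a hypothetical variable introduced for illustration, and the exit through int 80h assumes 32-bit Linux.

```nasm
%define SIZE 100

section .bss
buffer: resb SIZE        ; the preprocessor substitutes 100 here

section .text
global _start
_start:
    mov   ebx, SIZE      ; same as mov ebx, 100
    mov   eax, 1         ; sys_exit (assumed 32-bit Linux convention)
    int   80h            ; exit status 100
```

Changing the single %define line then updates every place where SIZE is used.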
Comments in NASM
Only single line comments are available in NASM. A semicolon (;) is inserted
in front of the line to be commented.
Ex: ; A program to find the largest of 2 numbers
NASM Installation
NASM is freely available on the internet. You can visit www.nasm.us . Its
documentation is also available there. To install NASM on Windows,
you can download it as an installation package from the site and install it
easily.
On Ubuntu Linux you can give the command:
sudo apt-get install nasm
and on Fedora you can use the command:
su -c 'yum install nasm'
in a terminal and easily install NASM.
Compilation
- Assembling the source file
nasm -f elf filename.asm
This creates an object file, filename.o in the current working directory.
- Creating an executable file
On a 32 bit machine:
ld filename.o -o output_filename
On a 64 bit machine:
ld -melf_i386 filename.o -o output_filename
This creates an executable file named output_filename.
- Program execution
./output_filename
For example, if the program to be run is first.asm
nasm -f elf first.asm
ld first.o -o output
./output
Some errors that we often make
- Forgetting the exit system call
We may sometimes find that we have written the program correctly and
it gives the right output, except that it shows a "Segmentation fault"
at the end.
This can happen because we have forgotten the exit system call at
the end. It is important to tell the operating system exactly where
execution should begin and where it should stop. Without an exit call,
the processor continues sequentially with whatever bytes follow our
code in memory, and these could be anything. Sooner or later it hits
something it cannot execute, and the kernel terminates the process for
us, leaving the error message 'Segmentation fault'. Calling the
exit system call at the end of every program tells the kernel
exactly when to terminate the process and return its memory
to the general pool, thus avoiding the error.
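A minimal sketch of a program skeleton that ends with the exit call, assuming 32-bit Linux, where system call number 1 is exit and ebx carries the exit status:

```nasm
section .text
global _start            ; entry point expected by ld
_start:
    ; ... program body goes here ...
    mov   eax, 1         ; system call number 1 = sys_exit
    mov   ebx, 0         ; exit status 0 (success)
    int   80h            ; hand control to the kernel, the process ends here
```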
- Adding and subtracting 30h
We often find that certain symbols are displayed instead of the numbers
that should have been printed. It is also possible that the program is
working with wrong input values.
This can be a problem of ASCII conversion. The digits we type
through the keyboard are stored as ASCII values. 30h (48 in decimal)
is the ASCII code for the digit '0'. Then 31h, 32h, ... 39h
correspond to '1', '2', ... '9'. So to get the actual number for calculation
we must subtract 30h from the input. When we want to display a digit on the
screen we add 30h so that the number is converted to its corresponding
ASCII value.
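The two conversions can be sketched for a single digit as follows. The variable name digit is a hypothetical choice, the character '3' stands in for keyboard input, and the write and exit calls assume 32-bit Linux (system call 4 = write, 1 = exit).

```nasm
section .data
digit: db 0              ; one byte to hold the character to print

section .text
global _start
_start:
    mov   al, '3'        ; ASCII 33h, as it would arrive from the keyboard
    sub   al, 30h        ; al = 3, the numeric value
    add   al, 4          ; al = 7, an ordinary calculation
    add   al, 30h        ; al = 37h, the ASCII code of '7'
    mov   byte[digit], al
    mov   eax, 4         ; sys_write
    mov   ebx, 1         ; file descriptor 1 = standard output
    mov   ecx, digit     ; address of the character
    mov   edx, 1         ; number of bytes to write
    int   80h            ; prints 7
    mov   eax, 1         ; sys_exit
    mov   ebx, 0
    int   80h
```

This works only while the result stays a single digit (0 to 9); larger numbers have to be split into digits first.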
- Declaration and initialisation of a string and its length one after the other
It is always better to declare the length of a string immediately
beneath the string itself. Otherwise, when more than one string is
declared and initialised, we may find that some of the
strings are displayed more than once.
The length of a string is usually computed as the distance from the
string's label to the point where the length is declared. When a string
and its length are not given one after the other in section .data,
everything declared between them is counted as part of the string. So
even strings that were never asked to be displayed are displayed if they
are initialised between another string and its length. Therefore make
sure nothing is declared between a string and its length declaration. Try
the following code and correct it accordingly.