Professional Assembly Language

Richard Blum


Every high-level language program (such as C and C++) is converted by a compiler into assembly language before it is linked into an executable program. This book shows you how to view the assembly language code generated by the compiler and understand how it is created. With that knowledge you can tweak the assembly language code generated by the compiler or create your own assembly language routines. This code-intensive guide is divided into three sections: basics of the assembly language program development environment, assembly language programming, and advanced assembly language techniques. It shows how to decipher compiler-generated assembly language code, and how to make the functions in your programs faster and more efficient to increase the performance of an application.

What you will learn from this book:

  • The benefits of examining the assembly language code generated from your high-level language program
  • How to create stand-alone assembly language programs for the Linux Pentium environment
  • Ways to incorporate advanced functions and libraries in assembly language programs
  • How to incorporate assembly language routines in your C and C++ applications
  • Ways to use Linux system calls in your assembly language programs
  • How to utilize Pentium MMX and SSE functions in your applications


Mentioned in questions and answers.

I want to learn some practical assembly language, having just learned the basic concepts in class. Are there any decent books or tutorials (NASM, etc.) that you would recommend?

I agree that PC Assembly Language is very good. Other good ones using GAS are:

I have no prior knowledge of assembly programming, and would like to learn how to code x86 assembly on a Linux platform. However, I'm having a hard time finding a good resource to teach myself with.

The Art of Assembly book looks good, but it teaches HLA. I'm not interested in having to learn one way, then relearning it all over again. It also seems like RISC architectures have better resources for assembly, but unfortunately I do not have a RISC processor to learn with. Does anyone have any suggestions?

Don't forget to grab a copy of the book Guide to Assembly Language Programming in Linux.

Even though many people I know at school hated this book, I will link it anyway:

http://www.amazon.com/Professional-Assembly-Language-Programmer/dp/0764579010

The main reason I used this book is that it covers x86 on Linux with the GNU assembler. That last point helped, since I had to use that assembler in our school's lab, and in case you aren't aware, its syntax (AT&T syntax by default) is different from Intel syntax.
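To make the syntax difference concrete, here is a minimal sketch using GCC-style inline assembly in C. It assumes an x86 or x86-64 target and GCC or Clang, which both treat inline asm as AT&T syntax by default; the function name `att_sub` is made up for illustration. In AT&T syntax the source operand comes first and the destination last, registers carry a `%` prefix, immediates a `$` prefix, and the operand size is a suffix on the mnemonic (`l` for 32 bits).

```c
/* Hypothetical helper: returns a - b using one AT&T-syntax instruction.
   Assumes an x86/x86-64 target and GCC/Clang extended inline asm. */
static int att_sub(int a, int b)
{
    /* AT&T order is "op source, destination", so this computes a = a - b.
       In Intel syntax the destination comes first: sub dst, src
       (for example: sub eax, ebx). */
    __asm__("subl %1, %0"
            : "+r"(a)   /* %0: read and written, kept in a register */
            : "r"(b)    /* %1: input, kept in a register */
            : "cc");    /* SUB modifies the flags */
    return a;
}
```

With this helper, `att_sub(10, 3)` returns 7.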

Also, I would just add that learning how high level languages are compiled into assembly language really helped me move along.
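A cheap way to start is to let the compiler show you its output. As a sketch, assuming GCC on x86-64 Linux, compile a small C file with `-S` to get the generated assembly (AT&T syntax by default; `-masm=intel` switches to Intel syntax). The file name `square.c` is just an example, and the listing in the comment is only what GCC typically produces, not a guaranteed result.

```c
/* square.c: view the generated assembly with
 *     gcc -O2 -S square.c               (writes square.s, AT&T syntax)
 *     gcc -O2 -S -masm=intel square.c   (same, Intel syntax)
 *
 * On x86-64, GCC at -O2 typically emits something close to:
 *     imull  %edi, %edi     # x arrives in %edi (first integer argument)
 *     movl   %edi, %eax     # the result is returned in %eax
 *     ret
 * The exact output varies with compiler version and flags.
 */
int square(int x)
{
    return x * x;
}
```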

I decided to learn assembly language because I learned that it has many benefits: we can interact directly with the hardware, we can better understand how computers work, and much more. When I first started, I found it a little weird and unlike other programming languages, so I thought I might find it hard to learn. So I'm asking: what are the basic prerequisites for learning assembly language? For reference, I have already learned C, C++, C#, and PHP.

You don't really need any prerequisites if you pick the right book.

I taught myself assembly language (just the basics; I didn't need more) as my first programming language, without a tutor, using Assembly Language Step-by-Step: Programming with Linux, 3rd Edition. It teaches only the basics, but after reading it you can read more advanced assembly books without any problem.

You have to tell us which machine's assembly language you want to learn. ARM, x86 (and x86-64), SPARC, etc. are all different instruction set architectures (ISAs).

If you just want an introduction to the world of assembly programming in general, Randall Hyde's The Art of Assembly Language is a good one (although what you write there isn't exactly assembly but rather a mix of high- and low-level language, it introduces the concepts nicely).

If you have set your sights on x86, I can recommend this book: Professional Assembly Language. Apart from that book, sandpile.org is a great resource.

For x86, the choice of environment also matters. Here is a great tutorial on Windows assembly programming by SIGWINDOWS, the ACM student chapter at the University of Illinois Urbana-Champaign. For Unix, a great tutorial I have come across is this one. A great, more general resource is Reverse Engineering for Beginners by Dennis Yurichev. This book targets both Windows and Unix environments, and although it concerns reverse engineering, it can teach you a great deal about the inner workings of programs running on your computer.

For ARM, this article serves as a great introduction, and this article is another good introduction to the subject.

The code:

extern inline int strncmp(const char * cs, const char * ct, int count)
{
register int __res;
__asm__("cld\n"
"1:\tdecl %3\n\t"
"js 2f\n\t"
"lodsb\n\t"
"scasb\n\t"
"jne 3f\n\t"
"testb %%al, %%al\n\t"
"jne 1b\n"
"2:\txorl %%eax,%%eax\n\t"
"jmp 4f\n"
"3:\tmovl $1,%%eax\n\t"
"jl 4f\n\t"
"negl %%eax\n"
"4:"
:"=a" (__res):"D" (cs), "S" (ct), "c" (count):"si","di","cx");
return __res;
}

I don't understand the f in "js 2f\n\t" and the b in "jne 1b\n". How should I understand these? Which book should I look at? Thank you.

In this context, f means forward and b means backward. So js 2f means "jump forward to the next label 2 if the sign flag is set", and jne 1b means "jump back to the previous label 1 if not equal".
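For reference, here is a sketch of the same loop adapted to build with a current GCC or Clang on x86-64; the name `asm_strncmp` is hypothetical. Two things differ from the code in the question: the pointer and counter registers are declared as read-write operands ("+D", "+S", "+c") instead of being listed as clobbers (the GCC manual does not allow clobbers to overlap input or output operands), and the counter is 64-bit. The numeric labels behave as described: `2f`, `3f` and `4f` jump forward to the next label with that number, and `1b` jumps back to the previous `1:`. Like the original, it compares bytes as signed values (`jl`), so for bytes at or above 0x80 the sign of the result can differ from the C library's `strncmp`, which compares as unsigned char.

```c
/* Hypothetical x86-64 rewrite of the strncmp loop from the question.
   Assumes GCC or Clang extended asm in AT&T syntax. */
static int asm_strncmp(const char *cs, const char *ct, long count)
{
    int res;
    __asm__("cld\n"                       /* string instructions move upward */
            "1:\tdecq %3\n\t"             /* count-- */
            "js 2f\n\t"                   /* count < 0: forward to 2, equal so far */
            "lodsb\n\t"                   /* al = *ct++ (via %rsi) */
            "scasb\n\t"                   /* compare al with *cs++ (via %rdi) */
            "jne 3f\n\t"                  /* bytes differ: forward to 3 */
            "testb %%al, %%al\n\t"        /* reached the terminating NUL? */
            "jne 1b\n"                    /* no: back to the previous 1 */
            "2:\txorl %%eax, %%eax\n\t"   /* equal: result 0 */
            "jmp 4f\n"                    /* forward to 4 */
            "3:\tmovl $1, %%eax\n\t"      /* mov leaves the flags from scasb intact */
            "jl 4f\n\t"                   /* ct byte < cs byte (signed): result 1 */
            "negl %%eax\n"                /* otherwise: result -1 */
            "4:"
            : "=a"(res), "+D"(cs), "+S"(ct), "+c"(count)
            :
            : "cc", "memory");            /* flags changed, memory read */
    return res;
}
```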

You'll want to look into GCC inline assembly. I can't find an online reference that covers this detail, but I know it is covered in Professional Assembly Language.

Why can't we use named labels? To quote from the book:

If you have another asm section in your C code, you cannot use the same labels again, or an error message will result due to duplicate use of labels.

So what can we do ?

The solution is to use local labels. Both conditional and unconditional branches allow you to specify a number as a label, along with a directional flag to indicate which way the processor should look for the numerical label. The first occurrence of the label found will be taken.

About modifiers:

Use the f modifier to indicate the label is forward from the jump instruction. To move backward, you must use the b modifier.
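The duplicate-label problem appears whenever the compiler emits the same asm block more than once, for example when an inline function is expanded at several call sites. Here is a minimal sketch, assuming x86-64 and GCC or Clang; the function names are hypothetical. Because `abs_val` is forced inline, its asm appears twice in `abs_sum`, which is fine with the numeric label `1:`. A named label such as `skip:` in the same place would be defined twice in the generated assembly, and the assembler would reject it.

```c
/* Hypothetical example: absolute value using a forward jump to a local label.
   always_inline makes GCC/Clang copy this asm into every caller. */
static inline __attribute__((always_inline)) long abs_val(long x)
{
    __asm__("testq %0, %0\n\t"  /* set SF from x */
            "jns 1f\n\t"        /* x >= 0: jump forward to the next 1 */
            "negq %0\n"         /* x < 0: x = -x */
            "1:"                /* numeric local label, safe to repeat */
            : "+r"(x)
            :
            : "cc");
    return x;
}

/* Two expansions of abs_val, so "1:" appears twice in the generated
   assembly; each "1f" resolves to the nearest following "1:". */
static long abs_sum(long a, long b)
{
    return abs_val(a) + abs_val(b);
}
```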

Assembly is a family of very low-level programming languages, just above machine code. In assembly, each instruction statement typically corresponds to a single machine code instruction. These instructions are represented as mnemonics in the given assembly language and are converted into executable machine code by a utility program referred to as an assembler; the conversion process is referred to as assembly, or assembling the code.

Language design

Basic elements

There is a large degree of diversity in the way that assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything other than a machine mnemonic or extended mnemonic as a pseudo-operation (pseudo-op). A typical assembly language consists of three types of instruction statements that are used to define program operations:

  • Opcode mnemonics
  • Data sections
  • Assembly directives
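As a rough illustration of the three kinds of statements in GNU as syntax, here is a sketch that embeds a small assembly source in a C file through a top-level asm block. It assumes x86-64 Linux (ELF, System V calling convention) with GCC or Clang; the symbols `demo_answer` and `demo_add` are hypothetical. Lines starting with a dot are assembler directives, `.long` places a 4-byte data value in a read-only data section, and `leal` and `ret` are opcode mnemonics.

```c
/* Top-level (basic) asm is copied verbatim into the compiler's assembly
   output, so registers are written with a single '%'.
   Assumes x86-64 Linux (ELF) and the System V AMD64 calling convention. */
__asm__(
    /* Data section: directives that define a labelled 4-byte constant */
    ".pushsection .rodata\n\t"
    ".balign 4\n\t"
    ".globl demo_answer\n"
    "demo_answer:\n\t"
    ".long 42\n\t"
    ".popsection\n\t"
    /* Text section: a function made of opcode mnemonics */
    ".pushsection .text\n\t"
    ".globl demo_add\n\t"
    ".type demo_add, @function\n"
    "demo_add:\n\t"
    "leal (%rdi,%rsi), %eax\n\t"   /* eax = first arg + second arg */
    "ret\n\t"
    ".size demo_add, .-demo_add\n\t"
    ".popsection\n");

/* C declarations so the rest of the program can use the asm symbols. */
extern const int demo_answer;
int demo_add(int a, int b);
```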

Opcode mnemonics and extended mnemonics

Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value or a pair of values. Operands can be immediate (value coded in the instruction itself), registers specified in the instruction or implied, or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works. Extended mnemonics are often used to specify a combination of an opcode with a specific operand. For example, the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15 and NOP for BC with a mask of 0.
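The three operand kinds can be seen side by side in one short inline-asm sketch. It assumes an x86 or x86-64 target and GCC/Clang AT&T syntax; `operand_kinds` is a hypothetical name.

```c
/* Hypothetical demo of immediate, memory and register operands.
   Computes (5 + *mem) * 2. */
static int operand_kinds(const int *mem)
{
    int r;
    __asm__("movl $5, %0\n\t"   /* immediate: the value 5 is encoded in the instruction */
            "addl %1, %0\n\t"   /* memory: the addend is loaded from *mem */
            "addl %0, %0"       /* registers: source and destination are both registers */
            : "=&r"(r)          /* early clobber: r is written before %1 is read */
            : "m"(*mem)
            : "cc");
    return r;
}
```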

Extended mnemonics are often used to support specialized uses of instructions, often for purposes not obvious from the instruction name. For example, many CPUs do not have an explicit NOP instruction, but do have instructions that can be used for the purpose. In 8086 CPUs the instruction xchg ax,ax is used for nop, with nop being a pseudo-opcode to encode the instruction xchg ax,ax. Some disassemblers recognize this and will decode the xchg ax,ax instruction as nop. Similarly, IBM assemblers for System/360 and System/370 use the extended mnemonics NOP and NOPR for BC and BCR with zero masks. For the SPARC architecture, these are known as synthetic instructions.

Some assemblers also support simple built-in macro-instructions that generate two or more machine instructions. For instance, with some Z80 assemblers the instruction ld hl,bc is recognized to generate ld l,c followed by ld h,b. These are sometimes known as pseudo-opcodes.

Tag Use

Use this tag for assembly language programming questions on any processor. You should also use a tag for your processor or instruction set architecture, and consider a tag for your assembler as well.

If your question is about inline assembly in C or other programming languages, see the inline assembly tag. For questions about .NET assemblies or the Java ASM library, use their dedicated tags instead.

Resources

Beginner's Resources

Assembly Language tutorials, guides, and reference material

I'm reading in Professional Assembly Language by Richard Blum that when you enter a call you should copy the value of the ESP register to EBP, and he also provided the following template:

function_label:
    pushl %ebp
    movl %esp, %ebp
    # normal function code goes here
    movl %ebp, %esp
    popl %ebp
    ret

I don't understand why this is necessary. When you push something inside the function, you obviously intend to pop it back, thus restoring ESP to its original value.

So why have this template?
And what's the use of the EBP register anyway?

I'm obviously missing something, but what is it?

When you push something inside the function, you obviously intend to pop it back

That's only part of the reason for using the stack. The far more common use is the one missing from your snippet: storing local variables. The next piece of code you typically see after setting up EBP is a subtraction from ESP, equal to the amount of space required for local variable storage. That is of course easy to balance as well: just add the same amount back in the function epilogue. It gets more difficult when the code also uses things like C99 variable-length arrays or the non-standard but commonly available _alloca() function. Being able to restore ESP from EBP makes this simple.
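Here is a minimal sketch of that prologue/epilogue pattern with room for locals, written for x86-64 (so RBP/RSP instead of EBP/ESP) as a top-level asm function in a C file. It assumes Linux/ELF and the System V calling convention, where the first two integer arguments arrive in RDI and RSI; `frame_demo` is a hypothetical name. Note that `movq %rbp, %rsp` releases all the local storage in one instruction, however much was reserved.

```c
/* Hypothetical function: long frame_demo(long a, long b) returns a * b,
   keeping a and b in stack-allocated locals addressed relative to %rbp. */
__asm__(
    ".pushsection .text\n\t"
    ".globl frame_demo\n\t"
    ".type frame_demo, @function\n"
    "frame_demo:\n\t"
    "pushq %rbp\n\t"              /* save the caller's frame pointer */
    "movq %rsp, %rbp\n\t"         /* %rbp marks the base of this frame */
    "subq $16, %rsp\n\t"          /* reserve 16 bytes for two 8-byte locals */
    "movq %rdi, -8(%rbp)\n\t"     /* local a */
    "movq %rsi, -16(%rbp)\n\t"    /* local b */
    "movq -8(%rbp), %rax\n\t"
    "imulq -16(%rbp), %rax\n\t"   /* rax = a * b (return value) */
    "movq %rbp, %rsp\n\t"         /* discard all locals at once */
    "popq %rbp\n\t"               /* restore the caller's frame pointer */
    "ret\n\t"
    ".size frame_demo, .-frame_demo\n\t"
    ".popsection\n");

long frame_demo(long a, long b);
```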

More to the point, perhaps, it is not necessary to set up the stack frame like this. Almost every x86 compiler supports an optimization called "frame pointer omission", turned on with -fomit-frame-pointer in GCC and /Oy in MSVC. It makes the EBP register available for general use, which can be very helpful on x86 given its shortage of CPU registers.

That optimization has a serious disadvantage, though. Without the EBP register pointing at the start of a stack frame, it becomes very difficult to perform stack walks. That matters when you need to debug your code: a stack trace can be crucial for finding out how your code ended up crashing, and it is invaluable when a customer sends you a "core dump" of a crash. It is so valuable that Microsoft agreed to turn off the optimization on Windows binaries to give their customers a shot at diagnosing crashes.

I've recently been learning assembly from Professional Assembly Language, and now I'm confused about something.

My system's architecture:

#uname -m
x86_64

This is my code:

.section .data
output:
   .asciz "This is section %d\n"
.section .text
.globl _start
_start:
    pushq $1
    pushq $output
    call printf
    addq $8, %rsp
    call overhere
    pushq $3
    pushq $output
    call printf
    addq $8, %rsp
    pushq $0
    call exit
overhere:
    pushq %rbp
    movq %rsp, %rbp
    pushq $2
    pushq $output
    call printf
    addq $8, %rsp
    movq %rbp, %rsp
    popq %rbp
    ret 

I assemble, link and run it like this, getting the error message shown:

#as -o calltest.o calltest.s
#ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -o calltest calltest.o
#./calltest 
Segmentation fault

How do I make it work?

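x86-64 passes arguments differently. Under the System V AMD64 ABI used on Linux, the first six integer or pointer arguments go in registers (RDI, RSI, RDX, RCX, R8, R9) rather than on the stack, and for variadic functions such as printf, AL must hold the number of vector registers used for arguments (hence the xorl %eax, %eax). See: http://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI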

This is how your example would work:

.section .data
output:
   .asciz "This is section %d\n"
.section .text
.globl _start
_start:
    movq $output, %rdi      # 1st argument
    movq $1, %rsi           # 2nd argument
    xorl %eax, %eax         # no floating point arguments
    call printf
    call overhere
    movq $output, %rdi      # 1st argument
    movq $3, %rsi           # 2nd argument
    xorl %eax, %eax         # no floating point arguments
    call printf
    xor %edi, %edi          # 1st argument: exit status 0
    call exit
overhere:
    pushq %rbp
    movq %rsp, %rbp
    movq $output, %rdi      # 1st argument
    movq $2, %rsi           # 2nd argument
    xorl %eax, %eax         # no floating point arguments
    call printf
    movq %rbp, %rsp
    popq %rbp
    ret
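To see that register order in action, here is a sketch of a function written in assembly and called from C. It assumes x86-64 Linux with the System V AMD64 ABI, where the first six integer or pointer arguments go in RDI, RSI, RDX, RCX, R8 and R9 (further ones are passed on the stack) and the result comes back in RAX; `digits6` is a hypothetical name. Passing 1 through 6 and getting 123456 back confirms which register holds which argument.

```c
/* Hypothetical: returns the decimal number whose digits are a..f,
   i.e. ((((a*10 + b)*10 + c)*10 + d)*10 + e)*10 + f. */
__asm__(
    ".pushsection .text\n\t"
    ".globl digits6\n\t"
    ".type digits6, @function\n"
    "digits6:\n\t"
    "movq %rdi, %rax\n\t"          /* 1st argument */
    "imulq $10, %rax, %rax\n\t"
    "addq %rsi, %rax\n\t"          /* 2nd argument */
    "imulq $10, %rax, %rax\n\t"
    "addq %rdx, %rax\n\t"          /* 3rd argument */
    "imulq $10, %rax, %rax\n\t"
    "addq %rcx, %rax\n\t"          /* 4th argument */
    "imulq $10, %rax, %rax\n\t"
    "addq %r8, %rax\n\t"           /* 5th argument */
    "imulq $10, %rax, %rax\n\t"
    "addq %r9, %rax\n\t"           /* 6th argument */
    "ret\n\t"                      /* result returned in %rax */
    ".size digits6, .-digits6\n\t"
    ".popsection\n");

long digits6(long a, long b, long c, long d, long e, long f);
```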