Reversing

Eldad Eilam, Elliot J. Chikofsky

Mentioned 19

Beginning with a basic primer on reverse engineering (including computer internals, operating systems, and assembly language) and then discussing the various applications of reverse engineering, this book provides readers with practical, in-depth techniques for software reverse engineering. The book is broken into two parts: the first deals with security-related reverse engineering, and the second explores the more practical aspects of reverse engineering. In addition, the author explains how to reverse engineer a third-party software library to improve interfacing and how to reverse engineer a competitor's software to build a better product.

  • The first popular book to show how software reverse engineering can help defend against security threats, speed up development, and unlock the secrets of competitive products
  • Helps developers plug security holes by demonstrating how hackers exploit reverse engineering techniques to crack copy-protection schemes and identify software targets for viruses and other malware
  • Offers a primer on advanced reverse engineering, delving into "disassembly" (code-level reverse engineering) and explaining how to decipher assembly language


Mentioned in questions and answers.

I decided to learn assembly language. My main reasons are to be able to understand disassembled code and maybe to write more efficient parts of code (for example, through C++), doing things like code caves, etc. I've seen that there are a zillion different flavors of assembly, so, for the purposes I mention, how should I start? What kind of assembly should I learn? I want to learn by first writing some easy programs (e.g. a calculator), but the real goal is to get accustomed to it so I can understand the code shown, for example, by IDA Pro.

I'm using Windows (if that makes any difference).

edit: So, it seems everyone is pointing towards MASM. Although I get the point that it has high-level capabilities, which are all good for the assembly programmer, that's not what I'm looking for. It seems to have if, invoke, etc. directives that are not shown in popular disassemblers (like IDA). So what I'd like to hear, if possible, is the opinion of anyone who uses ASM for the purposes I am asking about (reading disassembled EXE code in IDA), not just "general" assembly programmers.

edit: OK. I am already learning assembly. I am learning MASM, without using the high-level stuff that doesn't matter to me. What I'm doing right now is trying out my code in __asm blocks in C++, so I can try things out much faster than if I had to do everything from scratch with MASM.

Are you doing other dev work on Windows? In which IDE? If it's VS, then there's no need for an additional IDE just to read disassembled code: debug your app (or attach to an external app), then open the Disassembly window (in the default settings, that's Alt+8). Step and watch memory/registers as you would through normal code. You might also want to keep a Registers window open (Alt+5 by default).

Intel provides free manuals that give both a survey of the basic architecture (registers, processor units, etc.) and a full instruction reference. As the architecture matures and grows more complex, the 'basic architecture' manuals become less and less readable. If you can get your hands on an older version, you'd probably have a better place to start (even the P3 manuals; they explain the same basic execution environment better).

If you care to invest in a book, here is a nice introductory text. Search Amazon for 'x86' and you'll get many others. You can find several other directions in another question here.

Finally, you can benefit quite a bit from reading some low-level blogs. These byte-size info bits work best for me, personally.

To do what you want to do, I just took the Intel Instruction Set Reference (might not be the exact one I used, but it looks sufficient) and some simple programs I wrote in Visual Studio and started throwing them into IDA Pro/WinDbg. When I outgrew my own programs, the software at crackmes was helpful.

I'm assuming that you have some basic understanding of how programs execute on Windows. But really, for reading assembly, there are only a few instructions to learn and a few flavors of those instructions (e.g., there's a jump instruction, and jump has a few flavors like jump-if-equal, jump-if-ecx-is-zero, etc.). Once you learn the basic instructions, it's pretty simple to get the gist of the program execution. IDA's graph view helps, and if you're tracing the program with WinDbg, it's pretty simple to figure out what the instructions are doing if you're not sure.
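To make the "flavors" concrete, here is a small C model of what cmp a, b does to the flags and how a few conditional jumps read them. It is a sketch for intuition, not an emulator: it only tracks ZF, CF, SF and OF, and the function names (cmp32, je, jb, ...) are made up for illustration. The flag conditions themselves follow the Intel instruction reference.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four flags that the common Jcc forms read after a CMP. */
typedef struct {
    bool zf; /* zero flag: a - b == 0 */
    bool cf; /* carry flag: unsigned borrow, i.e. a < b as unsigned */
    bool sf; /* sign flag: top bit of a - b */
    bool of; /* overflow flag: a - b overflowed as a signed subtraction */
} Flags;

/* Model of "cmp a, b": compute a - b, keep only the flags. */
Flags cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.zf = (r == 0);
    f.cf = (a < b);
    f.sf = (r >> 31) & 1;
    /* Signed overflow: operands differ in sign and the result's sign differs from a. */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1;
    return f;
}

bool je(Flags f)  { return f.zf; }                    /* jump if equal */
bool jne(Flags f) { return !f.zf; }                   /* jump if not equal */
bool jb(Flags f)  { return f.cf; }                    /* unsigned below */
bool ja(Flags f)  { return !f.cf && !f.zf; }          /* unsigned above */
bool jl(Flags f)  { return f.sf != f.of; }            /* signed less */
bool jg(Flags f)  { return !f.zf && f.sf == f.of; }   /* signed greater */

/* jecxz ignores the flags entirely and only tests ECX. */
bool jecxz(uint32_t ecx) { return ecx == 0; }
```

For example, comparing 1 with 0xFFFFFFFF takes jb (as unsigned, 1 is below 0xFFFFFFFF) and also jg (as signed, 1 is greater than -1), which is why the same cmp is followed by different jumps depending on whether the source used signed or unsigned types.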

After a bit of playing like that, I bought Hacker Disassembling Uncovered. Generally, I stay away from books with the word "Hacker" in the title, but I really liked how this one went into real depth about what compiled code looks like when disassembled. He also covers compiler optimizations and some efficiency topics that were interesting.

It all really depends on how deeply you want to be able to understand the program, too. If you're reverse engineering a target looking for vulnerabilities, if you're writing exploit code, or analyzing packed malware for capabilities, you'll need more of a ramp-up time to really get things going (especially for the more advanced malware). On the other hand, if you just want to be able to change your character's level on your favorite video game, you should be doing fine in a relatively short amount of time.

I found Hacking: The Art of Exploitation to be an interesting and useful way into this topic... can't say that I have ever used the knowledge directly, but that's really not why I read it. It gives you a much richer appreciation of the instructions that your code compiles to, which has occasionally been useful in understanding subtler bugs.

Don't be put off by the title. Most of the first part of the book is "hacking" in the Eric Raymond sense of the word: creative, surprising, almost sneaky ways to solve tough problems. I was (and maybe you are) a lot less interested in the security aspects.

(I don't know about you, but I was excited about assembly.)

A simple tool for experimenting with assembly is already installed on your PC (on 32-bit versions of Windows).

Go to Start menu->Run, and type debug

debug (command)

debug is a command in DOS, MS-DOS, OS/2 and Microsoft Windows (only x86 versions, not x64) which runs the program debug.exe (or DEBUG.COM in older versions of DOS). Debug can act as an assembler, disassembler, or hex dump program allowing users to interactively examine memory contents (in assembly language, hexadecimal or ASCII), make changes, and selectively execute COM, EXE and other file types. It also has several subcommands which are used to access specific disk sectors, I/O ports and memory addresses. MS-DOS Debug runs at a 16-bit process level and therefore it is limited to 16-bit computer programs. FreeDOS Debug has a "DEBUGX" version supporting 32-bit DPMI programs as well.



If you want to understand the code you see in IDA Pro (or OllyDbg), you'll need to learn how compiled code is structured. I recommend the book Reversing: Secrets of Reverse Engineering.

I experimented with debug for a couple of weeks when I started learning assembly (15 years ago).
Note that debug works at the bare machine level; there are no high-level assembly commands.

And now a simple example:

Enter a to start writing assembly code, type the program below, and finally enter g to run it.



(INT 21 displays on screen the ASCII character stored in the DL register if the AH register is set to 2; INT 20 terminates the program. debug reads all numbers as hexadecimal, so these are interrupts 21h and 20h.)

I'm trying to learn about reverse engineering, using Minesweeper as a sample application. I've found this MSDN article on a simple WinDbg command that reveals all the mines, but it is old, is not explained in any detail, and really isn't what I'm looking for.

I have IDA Pro disassembler and the WinDbg debugger and I've loaded winmine.exe into both of them. Can someone provide some practical tips for either of these programs in terms of finding the location of the data structure that represents the mine field?

In WinDbg I can set breakpoints, but it is difficult for me to imagine at what point to set a breakpoint and at what memory location. Similarly, when I view the static code in IDA Pro, I'm not sure where to even begin to find the function or data structure that represents the mine field.

Are there any reverse engineers on Stack Overflow who can point me in the right direction?

"In WinDbg I can set breakpoints, but it is difficult for me to imagine at what point to set a breakpoint and at what memory location. Similarly, when I view the static code in IDA Pro, I'm not sure where to even begin to find the function or datastructure that represents the mine field."

Exactly!

Well, you can look for routines like random() that will be called during the construction of the mines table. This book helped me a lot when I was experimenting with reverse engineering. :)
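For intuition about why a breakpoint on rand() pays off, here is a hypothetical board-construction routine of the general shape such code tends to have. It is not Minesweeper's actual code: the board size, the MINE marker and place_mines are all assumptions. The point is that each rand() result is used almost immediately to index the board, so stepping out of rand() leads you straight to the write into the board array.

```c
#include <stdlib.h>
#include <string.h>

#define ROWS 9
#define COLS 9
#define MINE 1  /* hypothetical cell marker, not winmine's real encoding */

void place_mines(unsigned char board[ROWS][COLS], int mines) {
    if (mines > ROWS * COLS)
        mines = ROWS * COLS;            /* avoid looping forever on a full board */
    memset(board, 0, ROWS * COLS);
    while (mines > 0) {
        int x = rand() % COLS;          /* a breakpoint on rand() stops here... */
        int y = rand() % ROWS;
        if (board[y][x] != MINE) {      /* ...and this write reveals the board array */
            board[y][x] = MINE;
            mines--;
        }
    }
}
```

In a disassembly the same pattern shows up as a call to rand, a division or modulo by the width or height, and a store into a fixed base address plus a computed offset; that base address is what you are hunting for.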

In general, good places for setting breakpoints are calls to message boxes, calls to play a sound, timers, and other Win32 API routines.

BTW, I am scanning Minesweeper right now with OllyDbg.

Update: nemo reminded me of a great tool, Cheat Engine by Eric "Dark Byte" Heijnen.

Cheat Engine (CE) is a great tool for watching and modifying other processes' memory space. Beyond that basic facility, CE has more special features, like viewing the disassembled memory of a process and injecting code into other processes.

(The real value of that project is that you can download the Delphi source code and see how those mechanisms were implemented. I did that many years ago :o)

There's always skepticism from non-programmers when honest developers learn the techniques of black hat hackers. Obviously though, we need to learn many of their tricks so we can keep our own security up to par.

To what extent do you think an honest programmer needs to know the methods of malicious programmers?

Definitely learn the dark side. Even if you don't learn the actual techniques, at least make the effort to learn what's possible.


Good resources for learning the tricks of the trade are Reversing: Secrets of Reverse Engineering and Hacking: The Art of Exploitation. They're written for both sides: they could be used to LEARN how to hack, but they also give ways to prevent these kinds of attacks.

A different question, i.e. Best .NET obfuscation tools/strategy, asks whether obfuscation is easy to implement using tools.

My question though is, is obfuscation effective? In a comment replying to this answer, someone said that "if you're worried about source theft ... obfuscation is almost trivial to a real cracker".

I've looked at the output from the Community Edition of Dotfuscator, and it looks obfuscated to me! I wouldn't want to maintain that!

I understand that simply 'cracking' obfuscated software might be relatively easy, because you only need to find whichever location in the software implements whatever it is you want to crack (typically the license protection) and add a jump to skip it.
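As a sketch of that "add a jump" patch, here is a C function that turns a short je (opcode 0x74) into an unconditional short jmp (0xEB) in a byte buffer, leaving the displacement byte alone so the jump target is unchanged. The sample bytes are the testl/je pair from the CGContextDrawImages listing further down this page. patch_je_to_jmp is a hypothetical helper; in practice the buffer would be the file on disk or the process memory, and the offset would come from a disassembler.

```c
#include <stddef.h>

enum { OP_JE_SHORT = 0x74, OP_JMP_SHORT = 0xEB };  /* x86 one-byte rel8 jump opcodes */

/* Replace a short "je rel8" at offset with "jmp rel8". The rel8 byte that
   follows is kept, so the jump always goes where it used to go only
   conditionally. Returns 1 on success, 0 if there is no short je there. */
int patch_je_to_jmp(unsigned char *code, size_t len, size_t offset) {
    if (offset + 1 >= len || code[offset] != OP_JE_SHORT)
        return 0;
    code[offset] = OP_JMP_SHORT;
    return 1;
}
```

For example, the bytes 85 c0 74 0c (testl %eax,%eax; je +0x0c) become 85 c0 eb 0c, so the branch is taken regardless of what the test found. Obfuscating the surrounding code does not change the fact that this one byte decides the outcome.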

If the worry is more than just cracking by an end-user or a 'pirate' though: if the worry is "source theft" i.e. if you're a software vendor, and your worry is another vendor (a potential competitor) reverse-engineering your source, which they could then use in or add to their own product ... to what extent is simple obfuscation an adequate or inadequate protection against that risk?


1st edit:

The code in question is about 20 KLOC which runs on end-user machines (a user control, not a remote service).

If obfuscation really is "almost trivial to a real cracker", I'd like some insight into why it's ineffective (and not just "how much" it's not effective).


2nd edit:

I'm not worried about someone's reversing the algorithm: more worried about their repurposing the actual implementation of the algorithm (i.e. the source code) into their own product.

Figuring that 20 KLOC is several months' work to develop, would it take more or less than this (several months) to deobfuscate it all?

Is it even necessary to deobfuscate something in order to 'steal' it: or might a sane competitor simply incorporate it wholesale into their product while still obfuscated, accept that as-is it's a maintenance nightmare, and hope that it needs little maintenance? If this scenario is a possibility then is obfuscated .Net code any more vulnerable to this than compiled machine code is?

Is the obfuscation "arms race" aimed mostly at preventing people from even 'cracking' something (e.g. finding and deleting the code fragment which implements licensing protection/enforcement), rather than at preventing 'source theft'?

I've discussed why I don't think Obfuscation is an effective means of protection against cracking here:
Protect .NET Code from reverse engineering

However, your question is specifically about source theft, which is an interesting topic. In Eldad Eilam's book, "Reversing: Secrets of Reverse Engineering", the author discusses source theft as one reason behind reverse engineering in the first two chapters.

Basically, what it comes down to is the only chance you have of being targeted for source theft is if you have some very specific, hard to engineer, algorithm related to your domain that gives you a leg up on your competition. This is just about the only time it would be cost-effective to attempt to reverse engineer a small portion of your application.

So, unless you have some top-secret algorithm you don't want your competition to have, you don't need to worry about source theft. The cost involved with reversing any significant amount of source-code out of your application quickly exceeds the cost of re-writing it from scratch.

Even if you do have some algorithm you don't want them to have, there isn't much you can do to stop determined and skilled individuals from getting it anyway (if the application is executing on their machine).

Some common anti-reversing measures are:

  • Obfuscating - Doesn't do much in terms of protecting your source or preventing it from being cracked. But we might as well not make it totally easy, right?
  • 3rd Party Packers - Themida is one of the better ones. Packs an executable into an encrypted win32 application. Prevents reflection if the application is a .NET app as well.
  • Custom Packers - Sometimes writing your own packer, if you have the skill to do so, is effective because there is very little information in the cracking scene about how to unpack your application. This can stop inexperienced REs. This tutorial gives some good information on writing your own packer.
  • Keep industry-secret algorithms off the user's machine. Execute them as a remote service so the instructions are never executed locally. This is the only "fool-proof" method of protection.
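To show why a packed binary looks like noise to a disassembler, here is a toy packer in C that XORs a payload with a repeating key. It is only an illustration under obvious assumptions (xor_pack and the key are made up, and XOR with a short key is trivially broken); real packers are far more involved. Because XOR is its own inverse, the same routine also plays the role of the unpacking stub that would run at startup.

```c
#include <stddef.h>

/* Toy packer/unpacker: XOR each byte with a repeating key.
   Running it twice with the same key restores the original bytes. */
void xor_pack(unsigned char *buf, size_t len,
              const unsigned char *key, size_t keylen) {
    if (keylen == 0)
        return;                      /* nothing to do without a key */
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key[i % keylen];
}
```

Packing the function prologue 55 89 e5 57 (pushl %ebp; movl %esp,%ebp; pushl %edi) turns it into bytes that no longer disassemble to anything meaningful, but a reverser only needs to let the stub run and dump memory afterwards, which is the weakness pointed out below.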

However, packers can be unpacked, and obfuscation doesn't really hinder those who want to see what your application is doing. If the program runs on the user's machine, then it is vulnerable.

Eventually its code must be executed as machine code, and it is normally a matter of firing up a debugger, setting a few breakpoints, monitoring the instructions executed during the relevant action, and spending some time poring over this data.
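Here is a sketch of what "setting a few breakpoints" means at the byte level for x86 software breakpoints: the debugger saves the original byte, writes INT3 (0xCC) in its place, and restores the byte to continue. A real debugger does this in the target process (e.g. with WriteProcessMemory on Windows); the local buffer, the Breakpoint struct and bp_set/bp_clear are assumptions for illustration.

```c
#include <stddef.h>

#define INT3 0xCC  /* x86 one-byte breakpoint instruction */

typedef struct {
    size_t offset;        /* where the breakpoint was placed */
    unsigned char saved;  /* original byte that INT3 replaced */
    int active;
} Breakpoint;

/* Save the original byte and overwrite it with INT3. */
int bp_set(unsigned char *code, size_t len, size_t offset, Breakpoint *bp) {
    if (offset >= len)
        return 0;
    bp->offset = offset;
    bp->saved = code[offset];
    bp->active = 1;
    code[offset] = INT3;
    return 1;
}

/* Put the original byte back so execution can continue normally. */
void bp_clear(unsigned char *code, Breakpoint *bp) {
    if (bp->active) {
        code[bp->offset] = bp->saved;
        bp->active = 0;
    }
}
```

When the INT3 executes, the reported instruction pointer is one byte past it, so a real debugger also moves it back by one before resuming at the restored instruction. This is also why naive self-checksumming code notices breakpoints: the 0xCC changes the bytes being summed.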


You mentioned that it took you several months to write ~20kLOC for your application. It would take almost an order of magnitude longer to reverse those equivalent 20kLOC from your application into workable source if you took the bare minimum precautions.

This is why it is only cost-effective to reverse small, industry specific algorithms from your application. Anything else and it isn't worth it.

Take the following fictionalized example: let's say I just developed a brand-new application competing with iTunes that had a ton of bells and whistles. Let's say it took several 100k LOC and 2 years to develop. One key feature I have is a new way of serving up music to you based on your music-listening taste.

Apple (being the pirates they are) gets wind of this and decides they really like your music suggest feature so they decide to reverse it. They will then hone-in on only that algorithm and the reverse engineers will eventually come up with a workable algorithm that serves up the equivalent suggestions given the same data. Then they implement said algorithm in their own application, call it "Genius" and make their next 10 trillion dollars.

That is how source theft goes down.

No one would sit there and reverse all 100k LOC to steal significant chunks of your compiled application. It would simply be too costly and too time-consuming. About 90% of the time they would be reversing boring, non-industry-secret code that simply handled button presses or user input. Instead, they could hire developers of their own to rewrite most of it from scratch for less money and simply reverse the important algorithms that are difficult to engineer and that give you an edge (i.e., the music-suggestion feature).

I'd like to learn assembler. However, there are very few resources for doing assembler with OS X.

Is there anyone out there who has programmed in assembly on a Mac? Where did you learn?

And, is there any reason I shouldn't be doing assembly? Do I risk (significantly) crashing my computer irreparably?

If you're using a PowerPC Mac, look into gcc inline assembler. Otherwise, look into nasm. I can't give any decent references to PPC ASM (they're few and far between), but I suggest the following things to learn x86 asm:

Also, if you're not in kernel mode then there's no chance of screwing anything up, really, and even if you are in kernel mode it's hard to really destroy anything.

Edit: Also, get gcc and such from Xcode, not MacPorts or some such. You're in for a world of malformed Mach-O files if you don't. It's not fun to diagnose file format issues when you're just starting asm hacking.

I just finished learning assembly language, but I can't figure out what I could implement for practice (like a small project). It would be great if it's something useful for someone.

One of my favorite hobbies is Reverse Engineering.

It requires a solid knowledge of assembly and the use of disassemblers/debuggers to walk through compiled code. This allows you to alter, understand and reverse compiled programs. Each new program is like a puzzle waiting to be solved!

For example, a lot of people reverse games like Minesweeper when they are first starting out.

Here is a screenshot of a key section of code in Minesweeper I reversed a while back (comments on the right-hand side):

This was located by placing a breakpoint on calls to the rand() function and stepping backwards in the call stack. After some digging, it becomes obvious that:

  1. Minefield Height is located in 0x1005338
  2. Minefield Width is located in 0x1005334
  3. Minefield Baseaddress is located at 0x1005340

With this knowledge it becomes easy to determine the location of any given mine in the minefield by:

cellAddress = mapBaseAddress + (32 * (y+1)) + (x+1);

Then, with a simple loop and some calls to ReadProcessMemory() you've got the ultimate Minesweeper hack!
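The formula above translates directly into C. This sketch uses the addresses from the answer (they belong to the winmine.exe build that was reversed and may differ in other versions) and works on a local copy of the minefield, which a real tool would fill with ReadProcessMemory starting at the base address. The 32-byte row stride comes from the formula; the mine_mask parameter is an assumption, since the answer does not say how a mine is encoded in a cell byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Addresses from the answer; version-specific. */
#define MINEFIELD_WIDTH_ADDR  0x1005334u
#define MINEFIELD_HEIGHT_ADDR 0x1005338u
#define MINEFIELD_BASE        0x1005340u
#define ROW_STRIDE            32u   /* bytes per row, as in the formula */

/* cellAddress = mapBaseAddress + (32 * (y+1)) + (x+1); the +1 terms skip
   the border row and column around the playable area. */
uint32_t cell_address(uint32_t base, uint32_t x, uint32_t y) {
    return base + ROW_STRIDE * (y + 1) + (x + 1);
}

/* Count cells whose byte has all bits of mine_mask set. field is a local
   copy of memory starting at the base address. mine_mask is an assumption:
   inspect a dump from your own game to find the real mine encoding. */
int count_mines(const unsigned char *field, uint32_t width, uint32_t height,
                unsigned char mine_mask) {
    int count = 0;
    for (uint32_t y = 0; y < height; y++) {
        for (uint32_t x = 0; x < width; x++) {
            size_t offset = cell_address(0, x, y);  /* offset relative to base */
            if ((field[offset] & mine_mask) == mine_mask)
                count++;
        }
    }
    return count;
}
```

For example, cell (0, 0) lives at 0x1005361 and cell (2, 1) at 0x1005383. Width and height would be read the same way from their two addresses, and the copied block needs at least ROW_STRIDE * (height + 2) bytes to include the bottom border.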

Reading hand-written assembly is far easier than reading machine generated assembly. Modern compilers do some magical and crazy things to the code for optimization that can sometimes be difficult to follow. So, this will definitely push your assembly knowledge!

There are tons of activities that can branch off from this:

  1. Reverse hidden APIs in libraries
  2. Write advanced game hacks using DLL Injection, Code Caves, Function Hooking and more!
  3. Understand the limitations of various protection schemes employed by software
  4. Reverse a file format that isn't published or known and write code to read this format for interoperability purposes.
  5. Write emulators for various systems (including older game systems!)
  6. Understand how a well-known program does a particular task.
  7. Reverse malware and viruses to see how and what they do.

And more!

If you are interested, I highly suggest the book: Reversing: Secrets of Reverse Engineering

I've read and finished both Reversing: Secrets of Reverse Engineering and Hacking: The Art of Exploitation. They both were illuminating in their own way but I still feel like a lot of the techniques and information presented within them is outdated to some degree.

When the infamous Phrack article Smashing the Stack for Fun and Profit was written in 1996, it was just before what I sort of consider the computer security "golden age".

Writing exploits in the years that followed was relatively easy. Some basic knowledge of C and assembly was all that was required to perform buffer overflows and execute some arbitrary shellcode on a victim's machine.

To put it lightly, things have gotten a lot more complicated. Now security engineers have to contend with things like Address Space Layout Randomization (ASLR), Data Execution Prevention (DEP), Stack Cookies, Heap Cookies, and much more. The complexity of writing exploits went up at least an order of magnitude.
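To make stack cookies concrete, here is a simulation in C: a "frame" holding an 8-byte buffer followed by a 4-byte cookie, an unchecked copy into the buffer, and the epilogue check that detects the overwrite. The layout, the COOKIE value and the function names are assumptions; real compilers (e.g. GCC with -fstack-protector, MSVC with /GS) choose their own layout and a random per-process cookie. Keeping everything in one byte array keeps the "overflow" well-defined C.

```c
#include <stdint.h>
#include <string.h>

#define BUF_SIZE   8
#define FRAME_SIZE (BUF_SIZE + 4)   /* buffer followed by the cookie */

static const uint32_t COOKIE = 0xA5C3F00Du;  /* hypothetical fixed value */

/* Prologue: clear the buffer and plant the cookie right after it. */
void frame_init(unsigned char frame[FRAME_SIZE]) {
    memset(frame, 0, BUF_SIZE);
    memcpy(frame + BUF_SIZE, &COOKIE, sizeof COOKIE);
}

/* Unchecked copy into the buffer, like strcpy into a local array.
   n is clamped to the frame only so the simulation stays in bounds. */
void frame_copy(unsigned char frame[FRAME_SIZE], const unsigned char *src, size_t n) {
    if (n > FRAME_SIZE)
        n = FRAME_SIZE;
    memcpy(frame, src, n);
}

/* Epilogue: 1 if the cookie survived, 0 if the copy ran over it. */
int frame_cookie_ok(const unsigned char frame[FRAME_SIZE]) {
    uint32_t c;
    memcpy(&c, frame + BUF_SIZE, sizeof c);
    return c == COOKIE;
}
```

A copy of 8 bytes leaves the cookie intact; a copy of 12 bytes overwrites it, and the check fails before the (real) function would return through a corrupted return address. That is why a classic linear stack smash now aborts instead of running shellcode.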

You can't even run most of the buffer overrun exploits in the tutorials you'll find today without compiling with a bunch of flags to turn off modern protections.

Now if you want to write an exploit, you have to devise ways to turn off DEP, spray the heap with your shellcode hundreds of times, and attempt to guess a random memory location near your shellcode. Not to mention the pervasiveness of managed languages in use today, which are much more secure when it comes to these vulnerabilities.

I'm looking to extend my security knowledge beyond writing toy exploits for decade-old systems. I'm having trouble locating resources that address the issues of writing exploits in the face of all the protections I outlined above.

What are the more advanced and prevalent papers, books or other resources devoted to contending with the challenges of writing exploits for modern systems?

Every C program is converted to machine code when it is distributed as a binary. Since the instruction set of a computer is well known, is it possible to get back the original C program?

You can never get back to the exact same source, since no metadata about it is saved with the compiled code.

But you can re-create equivalent code from the assembly code.

Check out this book if you are interested in these things: Reversing: Secrets of Reverse Engineering.

Edit

Some Compilers 101 here: if you were to describe a compiler with another, less technical word than "compiler", what would it be?

Answer: Translator

A compiler translates the syntax/phrases you have written into another language: a C compiler translates to assembly or even machine code, C# code is translated to IL, and so forth.

The executable you have is just a translation of your original text/syntax, and if you want to "reverse it", that is, "translate it back", you will most likely not get the same structure you had at the start.

A more real-life example: if you translate from English to German and then from German back to English, the sentence structure will most likely be different and other words might be used, but the meaning, the context, will most likely not have changed.

The same goes for a compiler/translator: if you go from C to ASM, the logic is the same; it's just a different way of reading it (and of course it's optimized).
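Here is a small illustration of why the original structure cannot be recovered: two C functions written in different styles that do exactly the same thing. An optimizing compiler is free to emit identical machine code for both, in which case a decompiler can only ever show you one form (or a third one altogether), never "the" original. The function names are made up for the example.

```c
#include <stddef.h>

/* Index-based loop. */
int sum_for(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pointer-walking loop with the same behavior. */
int sum_while(const int *a, size_t n) {
    int s = 0;
    const int *end = a + n;
    while (a < end)
        s += *a++;
    return s;
}
```

Both return 10 for {1, 2, 3, 4}. Variable names, the choice of for versus while, and comments are exactly the kind of information the translation throws away.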

I have a small utility that was originally written in VS2005.

I need to make a small change, but the source code for one of the DLLs has been lost somewhere.

Is there a free or reasonably priced tool to reverse engineer the DLL back to C++ code?

You might also want to have a look at OllyDbg, which is a 32-bit assembler-level analysing debugger. It is meant for analyzing binary code in scenarios where you do not have the source code. It is a lightweight debugger. OllyDbg is shareware, so you can download and use it for free!

Visit OllyDbg's home page here.

PS: Back in the day, crackers used SoftICE from NuMega for debugging into an executable and grabbing a snapshot of the register values. SoftICE was an advanced debugger, and it was definitely the favorite tool of crackers. I don't know the present status of the product. NuMega's site had no information about it; I may have overlooked it, but I could not find it. I recommend that you get your hands on a legacy version (4.0x) of SoftICE and apply the Windows XP patch for SoftICE. Working with SoftICE is something of an "experience".

Further Read: Reversing: Secrets of Reverse Engineering by Eldad Eilam

My question is pretty straightforward: You are an executable file that outputs "Access granted" or "Access denied" and evil persons try to understand your algorithm or patch your innards in order to make you say "Access granted" all the time.

After this introduction, you might well be wondering what I am doing. Is he going to crack Diablo3 once it is out? I can pacify your worries: I am not one of those crackers. My goal is crackmes.

Crackmes can be found on, for example, www.crackmes.de. A crackme is a little executable that (most of the time) contains a little algorithm to verify a serial and outputs "Access granted" or "Access denied" depending on the serial. The goal is to make this executable output "Access granted" all the time. The methods you are allowed to use might be restricted by the author (no patching, no disassembling) or might include anything you can do with a binary, objdump and a hex editor. Cracking crackmes is definitely one part of the fun; however, as a programmer, I am wondering how you can create crackmes that are difficult.

Basically, I think the crackme consists of two major parts: a certain serial verification and the surrounding code.

Making the serial verification hard to track using just assembly is very possible. For example, I have the idea of taking the serial as input for a simulated microprocessor that must end up in a certain state in order to get the serial accepted. On the other hand, one might grow cheap and learn more about cryptographically strong ways to secure this part. Thus, making this part hard enough that the attacker tries to patch the executable instead should not be that hard.
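Here is a sketch of the "simulated microprocessor" idea in C, under made-up rules: each serial character is an opcode for a toy CPU with one 8-bit register, and the serial is accepted only if the run ends with the register equal to TARGET_STATE. The opcodes, the target value and run_serial are hypothetical. A machine this small is easy to solve by hand; the real difficulty would have to come from making the instruction semantics opaque to the reverser.

```c
#include <stdint.h>

#define TARGET_STATE 0x2Au  /* hypothetical accepting state (42) */

/* Run the serial as a program for a one-register toy CPU.
   Returns 1 if it halts in TARGET_STATE, 0 otherwise. */
int run_serial(const char *serial) {
    uint8_t acc = 0;
    for (const char *p = serial; *p; p++) {
        switch (*p) {
        case 'I': acc += 1;    break;   /* increment */
        case 'D': acc <<= 1;   break;   /* double (wraps at 8 bits) */
        case 'X': acc ^= 0x55; break;   /* xor with a constant */
        default:  return 0;             /* unknown opcode: reject */
        }
    }
    return acc == TARGET_STATE;
}
```

For example, run_serial("IDDIDDID") returns 1: the register goes 1, 2, 4, 5, 10, 20, 21, 42. Note that this only hardens the serial check; the final comparison is still a single branch that can be patched, which is the problem raised next.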

However, the more difficult part is securing the binary. Let us assume a perfectly secure serial verification that cannot be reversed somehow (of course I know it can be reversed: if in doubt, you rip parts out of the binary you are trying to crack and throw random serials at it until it accepts one). How can we prevent an attacker from just overriding jumps in the binary in order to make our binary accept anything?

I have been searching on this topic a bit, but most results on binary security, self-verifying binaries and similar things end up in articles about preventing attacks on an operating system via compromised binaries, by signing certain binaries and validating those signatures in the kernel.

My thoughts currently consist of:

  • checking explicit locations in the binary to be jumps.
  • checksumming parts of the binary and compare checksums computed at runtime with those.
  • have positive and negative runtime-checks for your functions in the code. With side-effects on the serial verification. :)
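The checksumming idea from the second bullet could look roughly like this in C. FNV-1a is used only as a simple stand-in checksum, and the region bytes and region_intact are assumptions. In a real crackme the region would be the verification routine's bytes in the loaded image and the expected value would be written into the binary after linking, which is exactly why that stored value and the comparison itself become the next patch targets.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: a simple non-cryptographic hash used as a checksum. */
uint32_t fnv1a32(const unsigned char *data, size_t len) {
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* 1 if the protected region still hashes to the value recorded at build time. */
int region_intact(const unsigned char *region, size_t len, uint32_t expected) {
    return fnv1a32(region, len) == expected;
}
```

With the region 85 c0 74 0c, patching the je (0x74) into a jmp (0xEB) changes the hash, so region_intact reports the tampering; it would equally flag a debugger's 0xCC breakpoint placed inside the region.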

Are you able to think of more ways to annoy a possible attacker for longer? (Of course, you cannot keep him away forever; at some point all checks will be broken, unless you manage to break a checksum generator by embedding the correct checksum for a program in the program itself, hehe.)

You're getting into "anti-reversing techniques", and it's basically an art. Worse, even if you stomp newbies, there are "anti-anti-reversing" plugins for OllyDbg and IDA Pro that they can download to bypass many of your countermeasures.

Countermeasures include debugger detection by trapping debugger APIs, or detecting 'single stepping'. You can insert code that, after detecting a debugger break-in, continues to function but starts acting up at random times much later in the program. It's really a cat-and-mouse game, and the crackers have a significant upper hand.

Check out http://www.openrce.org/reference_library/anti_reversing for some of what's out there.

http://www.amazon.com/Reversing-Secrets-Engineering-Eldad-Eilam/dp/0764574817/ - This book has really good anti-reversing info and steps through the techniques. It's a great place to start if you're getting into reversing in general.

How would you maintain the legacy applications that:

  1. have no unit tests
  2. have big methods with a lot of duplicated logic
  3. have no separation of concerns
  4. have a lot of quick hacks and hard-coded strings
  5. have outdated and wrong documentation
  6. have requirements that are not properly documented! This has actually resulted in disputes between the testers, developers and the clients in the past. Of course there are some non-functional requirements, such as shouldn't be slow, don't clash, and other business logic that is known to the application users. But beyond the most common-sense scenarios and the most common-sense business workflows, there is little guidance on what should (or should not) be done.

???

You need this book.


I basically agree with everything Paul C said. I'm not a TDD priest, but anytime you're touching a legacy codebase -- especially one with which you're not intimately familiar -- you need to have a solid way to retest and make sure you've followed Hippocrates: First, do no harm. Testing, good unit and regression tests in particular, are about the only way to make that play.
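One concrete way to get that safety net on an untested legacy codebase is a characterization test: record what the code does today, right or wrong, before touching it. In this C sketch legacy_status_label is a hypothetical stand-in for a real legacy function, and the expected strings were taken from its current behavior rather than from any specification.

```c
#include <string.h>

/* Hypothetical legacy function: hard-coded strings, tangled conditions. */
const char *legacy_status_label(int code) {
    if (code == 0) return "OK";
    if (code > 0 && code < 100) return "WARN";
    if (code == 404) return "NOT FOUND";
    return "ERR";
}

/* Characterization test: pins down today's behavior, including the edges,
   so a refactoring that changes any answer shows up immediately.
   Returns the number of mismatches. */
int characterize_legacy_status_label(void) {
    static const struct { int in; const char *out; } cases[] = {
        {0, "OK"}, {1, "WARN"}, {99, "WARN"}, {100, "ERR"},
        {404, "NOT FOUND"}, {-1, "ERR"},
    };
    int failures = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++)
        if (strcmp(legacy_status_label(cases[i].in), cases[i].out) != 0)
            failures++;
    return failures;
}
```

If a case later turns out to encode a bug, you fix the code and update that one expectation deliberately, instead of discovering the behavior change from a tester or client.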

I highly recommend picking up a copy of Reversing: Secrets of Reverse Engineering Software if it's a codebase with which you're unfamiliar. Although this book goes to great depths that are outside your current needs (and mine, for that matter), it taught me a great deal about how to safely and sanely work with someone else's code.

I am trying to identify private APIs of the Quartz framework. I have a list of a bunch of private APIs but don't have their signatures.

I would like to put one of them on the table for discussion, so that we can get an idea of how to reverse engineer it to find the correct signature.

void CGContextDrawImages(????);

I got it from the following assembly:

 _CGContextDrawImages:
    +0  0008986a  55                      pushl       %ebp
    +1  0008986b  89e5                    movl        %esp,%ebp
    +3  0008986d  57                      pushl       %edi
    +4  0008986e  56                      pushl       %esi
    +5  0008986f  53                      pushl       %ebx
    +6  00089870  81ecbc010000            subl        $0x000001bc,%esp
   +12  00089876  e800000000              calll       0x0008987b
   +17  0008987b  5b                      popl        %ebx
   +18  0008987c  8b4508                  movl        0x08(%ebp),%eax
   +21  0008987f  85c0                    testl       %eax,%eax
   +23  00089881  740c                    je          0x0008988f
   +25  00089883  8b4508                  movl        0x08(%ebp),%eax
   +28  00089886  81780854585443          cmpl        $0x43545854,0x08(%eax)
   +35  0008988d  7424                    je          0x000898b3
   +37  0008988f  8b5508                  movl        0x08(%ebp),%edx
   +40  00089892  89542408                movl        %edx,0x08(%esp)
   +44  00089896  8d836ce16f00            leal        0x006fe16c(%ebx),%eax
   +50  0008989c  89442404                movl        %eax,0x04(%esp)
   +54  000898a0  8d8341ae6f00            leal        0x006fae41(%ebx),%eax
   +60  000898a6  890424                  movl        %eax,(%esp)
   +63  000898a9  e8816f0f00              calll       _CGPostError
   +68  000898ae  e9c4120000              jmp         0x0008ab77
   +73  000898b3  8b4510                  movl        0x10(%ebp),%eax
   +76  000898b6  85c0                    testl       %eax,%eax
   +78  000898b8  0f84b9120000            je          0x0008ab77
   +84  000898be  8b450c                  movl        0x0c(%ebp),%eax
   +87  000898c1  85c0                    testl       %eax,%eax
   +89  000898c3  0f84ae120000            je          0x0008ab77
   +95  000898c9  8b7d18                  movl        0x18(%ebp),%edi
   +98  000898cc  85ff                    testl       %edi,%edi
  +100  000898ce  0f84a3120000            je          0x0008ab77
  +106  000898d4  31f6                    xorl        %esi,%esi
  +108  000898d6  31ff                    xorl        %edi,%edi
  +110  000898d8  8b4d10                  movl        0x10(%ebp),%ecx
  +113  000898db  8b04b1                  movl        (%ecx,%esi,4),%eax
  +116  000898de  85c0                    testl       %eax,%eax
  +118  000898e0  740d                    je          0x000898ef
  +120  000898e2  47                      incl        %edi
  +121  000898e3  890424                  movl        %eax,(%esp)
  +124  000898e6  e82f61fbff              calll       0x0003fa1a
  +129  000898eb  85c0                    testl       %eax,%eax
  +131  000898ed  7506                    jne         0x000898f5
  +133  000898ef  46                      incl        %esi
  +134  000898f0  397518                  cmpl        %esi,0x18(%ebp)
  +137  000898f3  75e3                    jne         0x000898d8
  +139  000898f5  85ff                    testl       %edi,%edi
  +141  000898f7  0f847a120000            je          0x0008ab77
  +147  000898fd  397518                  cmpl        %esi,0x18(%ebp)
  +150  00089900  7743                    ja          0x00089945
  +152  00089902  8b4518                  movl        0x18(%ebp),%eax
  +155  00089905  89442418                movl        %eax,0x18(%esp)
  +159  00089909  8b5514                  movl        0x14(%ebp),%edx
  +162  0008990c  89542414                movl        %edx,0x14(%esp)
  +166  00089910  8b4d10                  movl        0x10(%ebp),%ecx
  +169  00089913  894c2410                movl        %ecx,0x10(%esp)
  +173  00089917  8b450c                  movl        0x0c(%ebp),%eax
  +176  0008991a  8944240c                movl        %eax,0x0c(%esp)
  +180  0008991e  8b5508                  movl        0x08(%ebp),%edx
  +183  00089921  8b4234                  movl        0x34(%edx),%eax
  +186  00089924  89442408                movl        %eax,0x08(%esp)
  +190  00089928  8b423c                  movl        0x3c(%edx),%eax
  +193  0008992b  89442404                movl        %eax,0x04(%esp)
  +197  0008992f  8b4218                  movl        0x18(%edx),%eax
  +200  00089932  890424                  movl        %eax,(%esp)
  +203  00089935  e8c264ffff              calll       0x0007fdfc
  +208  0008993a  3dee030000              cmpl        $0x000003ee,%eax
  +213  0008993f  0f85b6110000            jne         0x0008aafb
  +219  00089945  8b7514                  movl        0x14(%ebp),%esi
  +222  00089948  85f6                    testl       %esi,%esi
  +224  0008994a  0f84c4110000            je          0x0008ab14
  +230  00089950  8b5508                  movl        0x08(%ebp),%edx
  +233  00089953  8b4234                  movl        0x34(%edx),%eax
  +236  00089956  890424                  movl        %eax,(%esp)
  +239  00089959  e883d7f9ff              calll       0x000270e1
  +244  0008995e  85c0                    testl       %eax,%eax
  +246  00089960  0f9585bbfeffff          setne       0xfffffebb(%ebp)
  +253  00089967  8b83f1677600            movl        0x007667f1(%ebx),%eax
  +259  0008996d  f30f104004              movss       0x04(%eax),%xmm0
  +264  00089972  f30f118504ffffff        movss       %xmm0,0xffffff04(%ebp)
  +272  0008997a  f30f1008                movss       (%eax),%xmm1
  +276  0008997e  f30f118d00ffffff        movss       %xmm1,0xffffff00(%ebp)
  +284  00089986  f30f104008              movss       0x08(%eax),%xmm0
  +289  0008998b  f30f1185f8feffff        movss       %xmm0,0xfffffef8(%ebp)
  +297  00089993  f30f10480c              movss       0x0c(%eax),%xmm1
  +302  00089998  f30f118dfcfeffff        movss       %xmm1,0xfffffefc(%ebp)
  +310  000899a0  eb1a                    jmp         0x000899bc
  +312  000899a2  f30f108dfcfeffff        movss       0xfffffefc(%ebp),%xmm1
  +320  000899aa  0f2ec8                  ucomiss     %xmm0,%xmm1
  +323  000899ad  7a06                    jp          0x000899b5
  +325  000899af  0f84c2110000            je          0x0008ab77
  +331  000899b5  c685bbfeffff00          movb        $0x00,0xfffffebb(%ebp)
  +338  000899bc  8b4510                  movl        0x10(%ebp),%eax
  +341  000899bf  898524ffffff            movl        %eax,0xffffff24(%ebp)
  +347  000899c5  8b7d0c                  movl        0x0c(%ebp),%edi
  +350  000899c8  8b5514                  movl        0x14(%ebp),%edx
  +353  000899cb  899528ffffff            movl        %edx,0xffffff28(%ebp)
  +359  000899d1  c7852cffffff00000000    movl        $0x00000000,0xffffff2c(%ebp)
  +369  000899db  8d4dc4                  leal        0xc4(%ebp),%ecx
  +372  000899de  898d94feffff            movl        %ecx,0xfffffe94(%ebp)
  +378  000899e4  8b8524ffffff            movl        0xffffff24(%ebp),%eax
  +384  000899ea  8b08                    movl        (%eax),%ecx
  +386  000899ec  85c9                    testl       %ecx,%ecx
  +388  000899ee  0f84e1100000            je          0x0008aad5
  +394  000899f4  0f57c0                  xorps       %xmm0,%xmm0
  +397  000899f7  0f2e4708                ucomiss     0x08(%edi),%xmm0
  +401  000899fb  7a06                    jp          0x00089a03
  +403  000899fd  0f84d2100000            je          0x0008aad5
  +409  00089a03  0f2e470c                ucomiss     0x0c(%edi),%xmm0
  +413  00089a07  7a06                    jp          0x00089a0f
  +415  00089a09  0f84c6100000            je          0x0008aad5
  +421  00089a0f  8b5514                  movl        0x14(%ebp),%edx
  +424  00089a12  85d2                    testl       %edx,%edx
  +426  00089a14  754e                    jne         0x00089a64
  +428  00089a16  f30f108504ffffff        movss       0xffffff04(%ebp),%xmm0
  +436  00089a1e  f30f1145d8              movss       %xmm0,0xd8(%ebp)
  +441  00089a23  f30f108d00ffffff        movss       0xffffff00(%ebp),%xmm1
  +449  00089a2b  f30f114dd4              movss       %xmm1,0xd4(%ebp)
  +454  00089a30  f30f1085f8feffff        movss       0xfffffef8(%ebp),%xmm0
  +462  00089a38  f30f1145dc              movss       %xmm0,0xdc(%ebp)
  +467  00089a3d  f30f108dfcfeffff        movss       0xfffffefc(%ebp),%xmm1
  +475  00089a45  f30f114de0              movss       %xmm1,0xe0(%ebp)
  +480  00089a4a  8b45d4                  movl        0xd4(%ebp),%eax
  +483  00089a4d  8945b4                  movl        %eax,0xb4(%ebp)
  +486  00089a50  8b45d8                  movl        0xd8(%ebp),%eax
  +489  00089a53  8945b8                  movl        %eax,0xb8(%ebp)
  +492  00089a56  8b45dc                  movl        0xdc(%ebp),%eax
  +495  00089a59  8945bc                  movl        %eax,0xbc(%ebp)
  +498  00089a5c  8b45e0                  movl        0xe0(%ebp),%eax
  +501  00089a5f  8945c0                  movl        %eax,0xc0(%ebp)
  +504  00089a62  eb49                    jmp         0x00089aad
  +506  00089a64  8b8528ffffff            movl        0xffffff28(%ebp),%eax
  +512  00089a6a  0f2e4008                ucomiss     0x08(%eax),%xmm0
  +516  00089a6e  7a06                    jp          0x00089a76
  +518  00089a70  0f845f100000            je          0x0008aad5
  +524  00089a76  0f2e400c                ucomiss     0x0c(%eax),%xmm0
  +528  00089a7a  7a06                    jp          0x00089a82
  +530  00089a7c  0f8453100000            je          0x0008aad5
  +536  00089a82  8d55b4                  leal        0xb4(%ebp),%edx
  +539  00089a85  89c1                    movl        %eax,%ecx
  +541  00089a87  8b00                    movl        (%eax),%eax
  +543  00089a89  89442404                movl        %eax,0x04(%esp)
  +547  00089a8d  8b4104                  movl        0x04(%ecx),%eax
  +550  00089a90  89442408                movl        %eax,0x08(%esp)
  +554  00089a94  8b4108                  movl        0x08(%ecx),%eax
  +557  00089a97  8944240c                movl        %eax,0x0c(%esp)
  +561  00089a9b  8b410c                  movl        0x0c(%ecx),%eax
  +564  00089a9e  89442410                movl        %eax,0x10(%esp)
  +568  00089aa2  891424                  movl        %edx,(%esp)
  +571  00089aa5  e8f751fbff              calll       0x0003eca1
  +576  00089aaa  83ec04                  subl        $0x04,%esp
  +579  00089aad  f30f1045c0              movss       0xc0(%ebp),%xmm0
  +584  00089ab2  f30f118518ffffff        movss       %xmm0,0xffffff18(%ebp)
  +592  00089aba  f30f104dbc              movss       0xbc(%ebp),%xmm1
  +597  00089abf  f30f118d14ffffff        movss       %xmm1,0xffffff14(%ebp)
  +605  00089ac7  8b07                    movl        (%edi),%eax
  +607  00089ac9  89442404                movl        %eax,0x04(%esp)
  +611  00089acd  8b4704                  movl        0x04(%edi),%eax
  +614  00089ad0  89442408                movl        %eax,0x08(%esp)
  +618  00089ad4  8b4708                  movl        0x08(%edi),%eax
  +621  00089ad7  8944240c                movl        %eax,0x0c(%esp)
  +625  00089adb  8b470c                  movl        0x0c(%edi),%eax
  +628  00089ade  89442410                movl        %eax,0x10(%esp)
  +632  00089ae2  8b8594feffff            movl        0xfffffe94(%ebp),%eax
  +638  00089ae8  890424                  movl        %eax,(%esp)
  +641  00089aeb  e8b151fbff              calll       0x0003eca1
  +646  00089af0  83ec04                  subl        $0x04,%esp
  +649  00089af3  f30f1045c4              movss       0xc4(%ebp),%xmm0
  +654  00089af8  f30f118508ffffff        movss       %xmm0,0xffffff08(%ebp)
  +662  00089b00  8b75c8                  movl        0xc8(%ebp),%esi
  +665  00089b03  f30f104dd0              movss       0xd0(%ebp),%xmm1
  +670  00089b08  f30f118d10ffffff        movss       %xmm1,0xffffff10(%ebp)
  +678  00089b10  f30f1045cc              movss       0xcc(%ebp),%xmm0
  +683  00089b15  f30f11850cffffff        movss       %xmm0,0xffffff0c(%ebp)
  +691  00089b1d  f30f108518ffffff        movss       0xffffff18(%ebp),%xmm0
  +699  00089b25  f30f1145c0              movss       %xmm0,0xc0(%ebp)
  +704  00089b2a  f30f108d14ffffff        movss       0xffffff14(%ebp),%xmm1
  +712  00089b32  f30f114dbc              movss       %xmm1,0xbc(%ebp)
  +717  00089b37  8b45b4                  movl        0xb4(%ebp),%eax
  +720  00089b3a  89442410                movl        %eax,0x10(%esp)
  +724  00089b3e  8b45b8                  movl        0xb8(%ebp),%eax
  +727  00089b41  89442414                movl        %eax,0x14(%esp)
  +731  00089b45  8b45bc                  movl        0xbc(%ebp),%eax
  +734  00089b48  89442418                movl        %eax,0x18(%esp)
  +738  00089b4c  8b45c0                  movl        0xc0(%ebp),%eax
  +741  00089b4f  8944241c                movl        %eax,0x1c(%esp)
  +745  00089b53  8b45c4                  movl        0xc4(%ebp),%eax
  +748  00089b56  890424                  movl        %eax,(%esp)
  +751  00089b59  8b45c8                  movl        0xc8(%ebp),%eax
  +754  00089b5c  89442404                movl        %eax,0x04(%esp)
  +758  00089b60  8b45cc                  movl        0xcc(%ebp),%eax
  +761  00089b63  89442408                movl        %eax,0x08(%esp)
  +765  00089b67  8b45d0                  movl        0xd0(%ebp),%eax
  +768  00089b6a  8944240c                movl        %eax,0x0c(%esp)
  +772  00089b6e  e823faf7ff              calll       0x00009596
  +777  00089b73  84c0                    testb       %al,%al
  +779  00089b75  7437                    je          0x00089bae
  +781  00089b77  8b9524ffffff            movl        0xffffff24(%ebp),%edx
  +787  00089b7d  8b02                    movl        (%edx),%eax
  +789  00089b7f  f30f108508ffffff        movss       0xffffff08(%ebp),%xmm0
  +797  00089b87  f30f1145c4              movss       %xmm0,0xc4(%ebp)
  +802  00089b8c  8975c8                  movl        %esi,0xc8(%ebp)
  +805  00089b8f  f30f108d10ffffff        movss       0xffffff10(%ebp),%xmm1
  +813  00089b97  f30f114dd0              movss       %xmm1,0xd0(%ebp)
  +818  00089b9c  f30f10850cffffff        movss       0xffffff0c(%ebp),%xmm0
  +826  00089ba4  f30f1145cc              movss       %xmm0,0xcc(%ebp)
  +831  00089ba9  e9fc0e0000              jmp         0x0008aaaa
  +836  00089bae  f30f108508ffffff        movss       0xffffff08(%ebp),%xmm0
  +844  00089bb6  f30f1145c4              movss       %xmm0,0xc4(%ebp)
  +849  00089bbb  8975c8                  movl        %esi,0xc8(%ebp)
  +852  00089bbe  f30f108d10ffffff        movss       0xffffff10(%ebp),%xmm1
  +860  00089bc6  f30f114dd0              movss       %xmm1,0xd0(%ebp)
  +865  00089bcb  f30f10850cffffff        movss       0xffffff0c(%ebp),%xmm0
  +873  00089bd3  f30f1145cc              movss       %xmm0,0xcc(%ebp)
  +878  00089bd8  f30f108d18ffffff        movss       0xffffff18(%ebp),%xmm1
  +886  00089be0  f30f114dc0              movss       %xmm1,0xc0(%ebp)
  +891  00089be5  f30f108514ffffff        movss       0xffffff14(%ebp),%xmm0
  +899  00089bed  f30f1145bc              movss       %xmm0,0xbc(%ebp)
  +904  00089bf2  8b45b4                  movl        0xb4(%ebp),%eax
  +907  00089bf5  89442410                movl        %eax,0x10(%esp)
  +911  00089bf9  8b45b8                  movl        0xb8(%ebp),%eax
  +914  00089bfc  89442414                movl        %eax,0x14(%esp)
  +918  00089c00  8b45bc                  movl        0xbc(%ebp),%eax
  +921  00089c03  89442418                movl        %eax,0x18(%esp)
  +925  00089c07  8b45c0                  movl        0xc0(%ebp),%eax
  +928  00089c0a  8944241c                movl        %eax,0x1c(%esp)
  +932  00089c0e  8b45c4                  movl        0xc4(%ebp),%eax
  +935  00089c11  890424                  movl        %eax,(%esp)
  +938  00089c14  8b45c8                  movl        0xc8(%ebp),%eax
  +941  00089c17  89442404                movl        %eax,0x04(%esp)
  +945  00089c1b  8b45cc                  movl        0xcc(%ebp),%eax
  +948  00089c1e  89442408                movl        %eax,0x08(%esp)
  +952  00089c22  8b45d0                  movl        0xd0(%ebp),%eax
  +955  00089c25  8944240c                movl        %eax,0x0c(%esp)
  +959  00089c29  e81acaf9ff              calll       0x00026648
  +964  00089c2e  84c0                    testb       %al,%al
  +966  00089c30  0f846d010000            je          0x00089da3
  +972  00089c36  80bdbbfeffff00          cmpb        $0x00,0xfffffebb(%ebp)
  +979  00089c3d  0f8583000000            jne         0x00089cc6
  +985  00089c43  8b4508                  movl        0x08(%ebp),%eax
  +988  00089c46  890424                  movl        %eax,(%esp)
  +991  00089c49  e81aaff9ff              calll       0x00024b68
  +996  00089c4e  f30f108518ffffff        movss       0xffffff18(%ebp),%xmm0
 +1004  00089c56  f30f1145c0              movss       %xmm0,0xc0(%ebp)
 +1009  00089c5b  f30f108d14ffffff        movss       0xffffff14(%ebp),%xmm1
 +1017  00089c63  f30f114dbc              movss       %xmm1,0xbc(%ebp)
 +1022  00089c68  8b45b4                  movl        0xb4(%ebp),%eax
 +1025  00089c6b  89442404                movl        %eax,0x04(%esp)
 +1029  00089c6f  8b45b8                  movl        0xb8(%ebp),%eax
 +1032  00089c72  89442408                movl        %eax,0x08(%esp)
 +1036  00089c76  8b45bc                  movl        0xbc(%ebp),%eax
 +1039  00089c79  8944240c                movl        %eax,0x0c(%esp)
 +1043  00089c7d  8b45c0                  movl        0xc0(%ebp),%eax
 +1046  00089c80  89442410                movl        %eax,0x10(%esp)
 +1050  00089c84  8b4508                  movl        0x08(%ebp),%eax
 +1053  00089c87  890424                  movl        %eax,(%esp)
 +1056  00089c8a  e8a75efaff              calll       0x0002fb36
 +1061  00089c8f  8b9524ffffff            movl        0xffffff24(%ebp),%edx
 +1067  00089c95  8b02                    movl        (%edx),%eax
 +1069  00089c97  f30f108508ffffff        movss       0xffffff08(%ebp),%xmm0
 +1077  00089c9f  f30f1145c4              movss       %xmm0,0xc4(%ebp)
 +1082  00089ca4  8975c8                  movl        %esi,0xc8(%ebp)
 +1085  00089ca7  f30f108d10ffffff        movss       0xffffff10(%ebp),%xmm1

08a486 8b4590 movl 0x90(%ebp),%eax ...

I don't expect an exact answer, but rather a way to decode the signature. Any idea or direction would also be useful for continuing the research.

Thank you for your time. I really appreciate it.

This is not an easy task and might require tools other than gdb. I read a couple of interesting RE tutorials, and even though they are not specific to OS X, they still provide useful insight and examples on deciphering function parameters:

Reversing (Undocumented) Windows API Functions

Matt's Cracking Guide

Secrets of Reverse Engineering: Appendix C - Deciphering Program Data

Reverse Engineering and Function Calling by Address

I have disassembled a binary executable using a disassembler such as IDA Pro. Now I want to recover as much type and data structure information as possible. Are there any resources or ideas that could help me with this task?

Thank you!

EDIT:

Thanks very much for the tips below. Besides type and data structure information, are there any ideas about recognizing class objects?

The already mentioned Reversing: Secrets of Reverse Engineering by Eldad Eilam has some nice descriptions of how various control flow and data structures look in assembly. However, since you specifically mention classes, I would like to plug my article on the Visual C++ implementation. A lot of it applies to other compilers as well.

BTW, I would recommend starting with small functions/classes and identifying them in the binary. If you are using Visual C++ and compile your code with debug info (a Debug build, or /Zi on the command line), IDA (at least recent versions) will detect and offer to load the PDB symbols. That will make identification of your code easier.

I'm into hacking challenges (like rankk.com), and some of the challenges require disassembly and small modifications of PE files.

I'm looking for a disassembler/debugger that is able to dump the strings, walk the assembler code and allow modifications.

My knowledge in this field is very limited so I'm looking for something relatively easy to use and preferably free.

Any suggestions?

I like OllyDbg (with a good companion :)

In my first programming course at college we learned that the current time is used as the seed value for the rand function so that it gives different random values every time the code runs. If I can fix the time and play a game that generates random levels each time you hit play, will I always get the same level? And if so, is there any way to do this?

If the game uses a pseudo-random number generator that is seeded from the current timestamp, then yes: if you manage to set the time to the same value each time the game is started, you should get the same levels.

Probably the best way to do it, though, would be to intercept the game's calls to the system's get-time function, make them return a specific fixed value, and leave everything else unaltered.

You could give it a go with IDA Pro (https://www.hex-rays.com/products/ida/) or another disassembler/debugger. I also found this book an interesting read with respect to hacking with IDA Pro (http://www.amazon.com/Reversing-Secrets-Engineering-Eldad-Eilam/dp/0764574817)

I am trying to reverse engineer a disassembled binary. I don't understand what it is doing when it executes an instruction such as:

push $0x804a254

What makes it even more confusing is that this address is neither the address of an instruction nor in the symbol table. What is it doing?

That instruction simply pushes the 32-bit constant 0x804a254 onto the stack.

That instruction alone is not enough for us to tell how the value is used later. Could you provide more disassembly of the code? In particular, I would like to see where this value is popped off the stack and how it is used afterwards.

Before starting any reverse engineering, I would recommend reading this book (Reversing: Secrets of Reverse Engineering) and then the x86 instruction set manual (Intel or AMD). I am assuming that you are reverse engineering for an x86 CPU.

I need an opinion from somebody who has some experience with ensuring file integrity. I am trying to protect the integrity of my file with a CRC checksum. My primary goal is to make it harder to bypass a license file check (which consists of disassembling the executable and removing a conditional jump).

I came up with the following idea:

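/* Placeholders, overwritten in the ELF .data section after linking. */
unsigned long crc_stored = 4294967295UL;
char* text_begin = (char*)0xffffffffffffffff;
char* text_end = (char*)0xffffffffffffffff;

unsigned long calc_checksum(const char* begin, const char* end);

int main(){
    unsigned long crc = calc_checksum(text_begin, text_end);
    if (crc == crc_stored) {
        //file is ok
    }
    return 0;
}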

I edit the .data section of the ELF binary as follows: text_begin and text_end receive the start and end addresses of the .text section, and crc_stored receives the CRC checksum of the .text section.

I would like to know whether this is a proper way of doing this, or whether there are better methods.

Edit: Karoly Horvath is right. Let's say I use the CRC check to decrypt some code. I would like to know the best way to protect the executable with a checksum. Olaf is also right: I can use a SHA algorithm. The question remains the same.

Edit2: Please stop saying that any protection can be bypassed. I know, and I just want to make it harder. Please answer the question if you can.

Let me see. You have code that does this:

#include <stdlib.h>
int license_ok(void);

int main() {
  if (!license_ok()) { exit(1); }
  // do something useful
}

You are worried that someone will disassemble your code, and patch out the conditional jump, so you are proposing to change the code this way instead:

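#include <stdlib.h>
unsigned long calc_checksum(void);
extern unsigned long stored_crc;
int license_ok(void);

int main() {
  if (calc_checksum() != stored_crc) { exit(1); }
  if (!license_ok()) { exit(1); }
  // do something useful
}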

I hope you see that this "solution" is not really a solution at all (if someone is capable of patching out one conditional jump, surely he is just as capable of patching out two such jumps).

You can find ideas for a more plausible / robust solution in one of the many books on the subject.

I want to know what I should learn to become a reverse engineer. Like Geohot, I want to be able to reverse engineer the PS3 and the iPhone to make jailbreaks. :D

I highly recommend this book: Reversing - Secrets of Reverse Engineering

This image gives a good picture of the virtual address space, but it only tells half of the story: it gives a complete picture of the user address space only, i.e. the lower 50% (or 75% in some cases).

What about the remaining 50% (or 25%), which is occupied by the kernel? I know the kernel also contains many different things, such as kernel modules, device drivers, and the core kernel itself. There must be some kind of layout, right?

What is its layout? If you say it's operating-system dependent, I would point out that there are two major operating systems, Windows and Linux. Please give an answer for either one of them.


Memory Layout of Windows Kernel. Picture taken from Reversing: Secrets of Reverse Engineering
