Advanced Windows Debugging

Mario Hewardt, Daniel Pravat

Mentioned 26

An eminently practical guide to debugging, one of the most vexing problems facing every Windows developer.

Mentioned in questions and answers.

What are the major reasons for using WinDbg vs the Visual Studio debugger?

And is it commonly used as a complete replacement for the Visual Studio debugger, or more for when the need arises?

If you are wondering why you should use WinDbg over Visual Studio, then you need to read Advanced Windows Debugging. Any time you need to debug a truly ugly problem, WinDbg has better technology to do it with than Visual Studio. WinDbg has a more powerful scripting language and allows you to write DLLs to automate difficult problems. It will install gflags.exe, which gives you better control over the heap for debugging memory overwrites.

You don't actually need to run the install; you can just copy the files over and be ready to go. It also installs adplus.vbs, so you can take mini-dumps of running processes. It is also very easy to set up for remote debugging. There is nothing better than being able to debug a problem from your own desk instead of fighting the 15" monitor that flickers on a test PC.

For day to day code writing I use Visual Studio, but once you need to start debugging problems from other computers or find yourself in a very ugly situation, WinDbg is the only way to go. Spending some time learning WinDbg is a great investment. Also, if you look at crash dumps, there are two great resources, http://www.dumpanalysis.org/blog and http://blogs.msdn.com/ntdebugging/default.aspx, that do all their debugging using WinDbg.

I'm working on a multithreaded C++ application that is corrupting the heap. The usual tools to locate this corruption seem to be inapplicable. Old builds (18 months old) of the source code exhibit the same behaviour as the most recent release, so this has been around for a long time and just wasn't noticed; on the downside, source deltas can't be used to identify when the bug was introduced - there are a lot of code changes in the repository.

The prompt for crashing behaviour is to generate throughput in this system - socket transfer of data which is munged into an internal representation. I have a set of test data that will periodically cause the app to throw an exception (various places, various causes - including heap alloc failing, thus: heap corruption).

The behaviour seems related to CPU power or memory bandwidth; the more of each the machine has, the easier it is to crash. Disabling a hyper-threading core or a dual-core core reduces the rate of (but does not eliminate) corruption. This suggests a timing related issue.

Now here's the rub:
When it's run under a lightweight debug environment (say Visual Studio 98 / AKA MSVC6) the heap corruption is reasonably easy to reproduce - ten or fifteen minutes pass before something fails horrendously and throws an exception, such as a failed alloc; when running under a sophisticated debug environment (Rational Purify, VS2008/MSVC9 or even Microsoft Application Verifier) the system becomes memory-speed bound and doesn't crash (Memory-bound: CPU is not getting above 50%, disk light is not on, the program's going as fast as it can, box consuming 1.3G of 2G of RAM). So, I've got a choice between being able to reproduce the problem (but not identify the cause) or being able to identify the cause of a problem I can't reproduce.

My current best guesses as to where to go next are:

  1. Get an insanely grunty box (to replace the current dev box: 2Gb RAM in an E6550 Core2 Duo); this will make it possible to repro the crash causing mis-behaviour when running under a powerful debug environment; or
  2. Rewrite operators new and delete to use VirtualAlloc and VirtualProtect to mark memory as read-only as soon as it's done with. Run under MSVC6 and have the OS catch the bad-guy who's writing to freed memory. Yes, this is a sign of desperation: who the hell rewrites new and delete?! I wonder if this is going to make it as slow as under Purify et al.
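
Option 2 can be sketched roughly as below. This is a minimal illustration, not the poster's code: `guarded_alloc` and `guarded_retire` are hypothetical helper names, each allocation gets its own page range, and "freed" memory is switched to read-only instead of being returned, so a late write faults at the offending instruction. The non-Windows branch (mmap/mprotect) is an assumption added only so the sketch also builds off Windows.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#  include <windows.h>
#else
#  include <sys/mman.h>
#  include <unistd.h>
#endif

// Size of one virtual-memory page; protection can only be changed per page.
inline std::size_t page_size() {
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return si.dwPageSize;
#else
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
#endif
}

// Round a byte count up to a whole number of pages.
inline std::size_t round_up_to_pages(std::size_t bytes) {
    const std::size_t page = page_size();
    return (bytes + page - 1) / page * page;
}

// Hypothetical helper: give every allocation its own read/write page range.
// Returns nullptr on failure.
inline void* guarded_alloc(std::size_t bytes) {
    const std::size_t len = round_up_to_pages(bytes ? bytes : 1);
#ifdef _WIN32
    return VirtualAlloc(nullptr, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
#else
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Hypothetical helper: instead of freeing, make the pages read-only so that
// any later write through a stale pointer raises an access violation.
// Returns true if the protection change succeeded.
inline bool guarded_retire(void* p, std::size_t bytes) {
    assert(p != nullptr);
    const std::size_t len = round_up_to_pages(bytes ? bytes : 1);
#ifdef _WIN32
    DWORD old_protection = 0;
    return VirtualProtect(p, len, PAGE_READONLY, &old_protection) != 0;
#else
    return mprotect(p, len, PROT_READ) == 0;
#endif
}
```

Replacement operator new and operator delete could forward to these helpers, but plain delete is not told the size, so a real version would have to record it somewhere (for example in a small header in front of the block). Expect heavy memory use: every allocation costs at least one page, and retired pages are never reused.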

And, no: Shipping with Purify instrumentation built in is not an option.

A colleague just walked past and asked "Stack Overflow? Are we getting stack overflows now?!?"

And now, the question: How do I locate the heap corruptor?


Update: balancing new[] and delete[] seems to have gotten a long way towards solving the problem. Instead of 15mins, the app now goes about two hours before crashing. Not there yet. Any further suggestions? The heap corruption persists.
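
The new[]/delete[] mismatch from this update is a class of bug that can be designed out. A minimal sketch, assuming a C++11 compiler (which MSVC6 is not; there only the std::vector form applies); the function names are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// The bug class: memory from new[] must be released with delete[], e.g.
//     int* p = new int[n];  ...  delete p;    // undefined behaviour
// Letting an owning type pick the matching release removes the choice.

// Fill [first, first + n) with 1, 2, 3, ...
inline void fill_sequence(int* first, std::size_t n) {
    assert(first != nullptr || n == 0);
    std::iota(first, first + n, 1);
}

// Array owned by unique_ptr<int[]>, which always releases with delete[].
inline long sum_with_unique_ptr(std::size_t n) {
    std::unique_ptr<int[]> buf(new int[n]);
    fill_sequence(buf.get(), n);
    return std::accumulate(buf.get(), buf.get() + n, 0L);
}

// Array owned by std::vector, which allocates and frees its own storage.
inline long sum_with_vector(std::size_t n) {
    std::vector<int> buf(n);
    fill_sequence(buf.data(), n);
    return std::accumulate(buf.begin(), buf.end(), 0L);
}
```

Neither function contains a delete at all, so there is nothing left to get out of balance.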

Update: a release build under Visual Studio 2008 seems dramatically better; current suspicion rests on the STL implementation that ships with VS98.


  1. Reproduce the problem. Dr Watson will produce a dump that might be helpful in further analysis.

I'll take a note of that, but I'm concerned that Dr Watson will only be tripped up after the fact, not when the heap is getting stomped on.

Another try might be using WinDbg as a debugging tool, which is quite powerful while also being lightweight.

Got that going at the moment, again: not much help until something goes wrong. I want to catch the vandal in the act.

Maybe these tools will allow you at least to narrow the problem to certain component.

I don't hold much hope, but desperate times call for...

And are you sure that all the components of the project have correct runtime library settings (C/C++ tab, Code Generation category in VS 6.0 project settings)?

No I'm not, and I'll spend a couple of hours tomorrow going through the workspace (58 projects in it) and checking they're all compiling and linking with the appropriate flags.


Update: This took 30 seconds. Select all projects in the Settings dialog, unselect until you find the project(s) that don't have the right settings (they all had the right settings).

I have the same problems in my work (we also use VC6 sometimes), and there is no easy solution for them. I have only some hints:

  • Try automatic crash dumps on the production machine (see Process Dumper). My experience says Dr. Watson is not perfect for dumping.
  • Remove all catch(...) from your code. They often hide serious memory exceptions.
  • Check Advanced Windows Debugging - there are lots of great tips for problems like yours. I recommend this with all my heart.
  • If you use STL, try STLPort and checked builds. Invalid iterators are hell.

Good luck. Problems like yours take us months to solve. Be ready for this...

After being troubled by an issue that I simply did not have the knowledge to debug, I've just decided that I have to learn how to use WinDbg. My only problem: I have no clue where to start :-( I'm not really a WinAPI guy, having usually used languages that abstract the Windows API away from me.

So I just wonder: What is the best source (book, website) to learn WinDbg for someone who knows programming but not much about the inner depths of Windows? (And yes, I do read oldnewthing every day :))

Debugging .NET Applications has a chapter on how to use WinDbg.

How do I use WinDbg for analyzing a dump file?

This is a really broad question.

  1. The first step is to load the dump file into a WinDbg instance.
  2. Next, you need to make sure you have symbols set up.
  3. Finally, you can run the command !analyze -v to get a basic analysis performed on it. You need to have symbol information available for your code to make dump files worthwhile.

The website Memory Dump, Software Trace, Debugging, Malware, Victimware and Intelligence Analysis Portal has been very informative for me. I also really enjoyed the book, Advanced Windows Debugging by Mario Hewardt and Daniel Pravat.

I've been doing some work on high memory issues, and I've been doing a lot of heap analysis in windbg, and I was curious what the different columns really mean in "!heap -flt -s xxxx" command.

I read What do the 'size' numbers mean in the windbg !heap output?, and I looked in my "Windows Internals" book, but I still had a bunch of questions. So the columns and my questions are below.

**HEAP_ENTRY** - What does this pointer really point to? How is it different than UserPtr?
**Size** - What does this size mean? How is it different than UserSize?
**Prev** - This just appears to be the negative offset to get to the previous heap entry. Still not sure exactly how it's used.
**Flags** - Is there any documentation on these flags?
**UserPtr** - What is the user pointer? In all cases I've seen it's always 8 bytes higher than the HEAP_ENTRY, but I don't really know what it points to.
**UserSize** - This appears to be the size of the actual allocation.
**state** - This just tells you what state this heap entry is in (free, busy, etc.)

Example:
HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
  0015eeb0 0044 0000  [07]   0015eeb8    00204 - (busy)

From looking at the !heap documentation in the Debugging Tools for Windows help file and the heap docs on MSDN and a great excerpt from Advanced Windows Debugging, here's what I've been able to put together:

  • HEAP_ENTRY: pointer to entry within the heap. As you found, there is an 8 byte header which contains the data for the HEAP_ENTRY structure. The size of the HEAP_ENTRY structure is 8 bytes which defines the "heap granularity" size. This is used for determining the...
  • SIZE: size of the entry in terms of the granularity (i.e. the allocation size / 8)
  • FLAGS: these are defined in winbase.h, with explanations found in the MSDN link.
  • USERPTR: the actual pointer to the allocated (or freed) object
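
Applying those definitions to the example row in the question gives the arithmetic below. A small sketch, assuming the 8-byte granularity and 8-byte header described above (in other words, a 32-bit process); the function names are made up for illustration:

```cpp
#include <cstdint>

// Assumptions taken from the answer above: heap granularity of 8 bytes and
// an 8-byte HEAP_ENTRY header directly in front of the user block.
constexpr std::uint32_t kGranularity = 8;
constexpr std::uint32_t kHeaderBytes = 8;

// Total bytes covered by an entry, from the Size column.
constexpr std::uint32_t entry_bytes(std::uint32_t size_field) {
    return size_field * kGranularity;
}

// Address handed to the caller, from the HEAP_ENTRY column.
constexpr std::uint32_t user_ptr(std::uint32_t heap_entry) {
    return heap_entry + kHeaderBytes;
}

// Bytes in the entry that are not part of the caller's allocation.
constexpr std::uint32_t overhead_bytes(std::uint32_t size_field,
                                       std::uint32_t user_size) {
    return entry_bytes(size_field) - user_size;
}

// The example row: 0015eeb0 0044 0000 [07] 0015eeb8 00204 - (busy)
static_assert(entry_bytes(0x0044) == 0x220, "0x44 units of 8 bytes");
static_assert(user_ptr(0x0015eeb0) == 0x0015eeb8, "UserPtr follows the header");
static_assert(overhead_bytes(0x0044, 0x204) == 0x1c, "header plus padding");
```

So the entry spans 0x220 bytes, of which 0x204 belong to the caller; the remaining 0x1c bytes are the header plus whatever trailing padding the heap added.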

I find WinDbg very useful during development and debugging, but I mostly use it for user-mode debugging.

  1. What can kernel debugging in WinDbg do? Or, when should I use WinDbg's kernel debugging?

  2. Is there a tutorial about kernel debugging in WinDbg?

Thanks in advance.

You usually use kernel debugging when you need to debug low-level device drivers interacting directly with the hardware.
It's more complicated to debug in kernel mode; among other things, for a live kernel debug session you have to run the debugger on a different system than the one being debugged. For the majority of developers, user mode is enough to do most of the work.
Advanced Windows Debugging is a very good book about debugging with WinDbg (it includes discussions about kernel debugging).

The dump analysis site has many tutorials, including kernel debugging scenarios.

I have a bug I am chasing (I think it's a deadlock). When I run the code it hangs without the debugger flagging an error, so after a while I try pressing the pause (break all) button. The debugger then reports "The process appears to be deadlocked...". I then can see that all the threads are held up at lines saying EnterCriticalSection except for one which is already inside a critical section. When I look at the thread that is inside the C.S. with the debugger I see a green arrow, accompanied by a tiny blue circle pointing at a line with GetWindowText... as below:

// stuff A
{
    GetWindowText(editwin[a].child_window_handle,existing_text,MAX_TEXT_SIZE-1);
}
// stuff B

If I hover the mouse over the green arrow I see the text "this is the next statement to execute when this thread returns from the current function". Now this has stumped me because I don't know if it means that it is stuck inside "stuff A" and is waiting to come back or it's inside GetWindowText and has somehow got stuck there. The arguments to GetWindowText all look sensible to me. If I click on "step into" I get the message "Unable to step. The process has been soft broken".

EDIT: stuff A is in fact the statement:

if (buf_ptr != NULL)

As the previous responses suggest, your code is stuck inside "Stuff A".

Can I suggest another tool for your tool-belt?

I usually find it much easier to debug native synchronization problems using WinDbg. Just launch your program in WinDbg, point it to the correct symbols, and all the info will be right there for your investigation using the !locks, !cs and k commands.

If you're new to WinDbg, you'll find that the internet is full of information about it. I recommend reading Advanced Windows Debugging as well.

It's a little bit difficult to start with, compared to the user-friendly VS debugger, but every minute you invest in learning how to use it will save you hours of debugging further down the road.

What reading would you recommend on general debugging techniques? I am more interested in principles and best practices than in specific platform solutions. For the record I mainly work with .NET (F#, C#), and dabble in Haskell and Ocaml.

One of these Friday evenings we talked about debugging with my colleague on our walk home. I was surprised to learn that one can view and modify the state of live objects from the VisualStudio debugger. He also mentioned that another developer he knew, a "Java guru," had once shown him some debugging magic and given an article or booklet on debugging, which challenged my colleague's initial "there's nothing to it" attitude. Having spent more time than I wished hunting bugs, I am ready to be challenged as well. Are there any links you would recommend?

It takes a native approach (win32) but Advanced Windows Debugging is a great book.

I highly recommend the excellent book Debugging by David Agans.

While not specifically about programming, the principles are universal. One of the techniques in here provided the biggest quantum leap in my diagnostic capabilities, namely, backing out your fix to prove that just your fix has corrected the problem.

I am working with a legacy VB6/COM application which sometimes causes Windows 7 to crash. I have now generated a .dmp file of one of these crashes using the ProcDump tool from Sysinternals. However, I have never worked with dump files before. Which resources would you recommend for getting started with dump file analysis?

There are some books such as "Advanced Windows Debugging" or the books on this guy's blog that can help.

You will also need to know assembler.

Use WinDbg or any other debugger. The above mentioned book is for WinDbg specifically though.

What is a good book for industry-level C++ programming? I am not looking for a beginner's C++ book that talks about datatypes and control structures. I am looking for a more advanced book. For example, how to build system applications using C++. Any kind of guidance will be very helpful.

If you're looking for books on refining your craft in C++ as a language, you don't get much better than Scott Meyers' Effective C++ and More Effective C++ and Herb Sutter's Exceptional C++, More Exceptional C++ and Exceptional C++ Style. All are packed with invaluable information on bringing your facility with the language from the intermediate to the advanced level.

System-level programming is specific to operating system, so the books diverge based on your platform. Ones I've found very helpful (albeit not C++ specific) are: Windows System Programming, by Johnson M. Hart, Advanced Windows Debugging, by Mario Hewardt and Daniel Pravat, and Linux System Programming, by Robert Love.

All of these books (as well as Peter Alexander's excellent suggestion of Modern C++ Design) are available on O'Reilly's Safari service, which is a pretty cost-effective way of doing a lot of technical reading on the cheap and well worth checking out if you're considering going on a studying binge.

Lakos' Large Scale C++ Software Design is quite a good intermediate-advanced level book about C++ software architecture. It's a little out of date - predating widespread use of templates for example - but it is quite a good book on the subject.

Lakos worked for Mentor Graphics in the 1980s when first generation workstations were the technology du jour. This was an era when the difference in performance and memory footprint between C and C++ apps was regarded as significant. This 'old school' approach discusses efficient C++ systems architecture in some depth, which is a bit of a unique selling point for this book.

These are the best two books I have seen and read

Advanced C++ Programming Styles and Idioms

C++ Common Knowledge

Modern C++ Design by Andrei Alexandrescu is probably the most advanced C++ book out there. It's more about very advanced design patterns rather than building software.

I ran the Debug Diagnostic tool on my Windows Server 2008 machine, and it created dump files, because my ASP.NET application throws an exception from time to time.

I need step-by-step instructions on how to debug a dump file. Do I need to debug on the production server or locally? What do I need to have? How do I configure it?

How do I determine where in the application the problem is?

Are there any other ways to read dump files?

Here is a post I found. It has a pretty good explanation of dump files, but it doesn't explain the debugging part:

http://blogs.msdn.com/b/tess/archive/2009/03/20/debugging-a-net-crash-with-rules-in-debug-diag.aspx

I think you might want to buy Advanced Windows Debugging. There isn't anything nearly good enough on the web for you to learn this well.

http://www.amazon.com/Advanced-Windows-Debugging-Mario-Hewardt/dp/0321374460/ref=sr_1_1?ie=UTF8&qid=1305567440&sr=8-1

You can check out the book site here:

http://www.advancedwindowsdebugging.com/

For some other resources

Possible Duplicate:
Memory leak tool for C++ under Windows

I used to work on a Mac project and one thing I really enjoyed about XCode was its profiler. I found many a bug by just running my program with various settings of that profiler. Most notably, it would show me which parts of my program consumed memory, it would show me if it leaked memory and it would show me when it would do that. If I was working with a GUI application, it would even show me screenshots of what I was doing when those allocations/leaks/deallocations occurred.

Nowadays, I am working on a Windows/C++ project using Visual Studio and I suspect the project to consume too much memory and possibly leak some memory, too. Using XCode, I would just fire up that profiler and immediately know what's happening. In Visual Studio however, I can find no such thing (there is a somewhat awkward performance profiler, but CPU time is not my concern here).

So, how would you go about searching for leaks and code with too much memory consumption?

See Application Verifier, LeakDiag, UMDH, and Debugging Tools for Windows in general.

All of which are free.

For a guide on how to use them, see Advanced Windows Debugging.

In this answer the user suggests using Symbol Servers.

Can anyone explain how they work and how to set it up (if possible) with TFS 2008?

Thanks

Check out Setting up Source Server for TFS Builds. You can also point to a symbol server in Visual Studio by going to Tools>Options>Debugging>Symbols.

Also check out the Advanced Windows Debugging book. It talks about setting up a symbol server.

We have a front end written in Visual Basic 6.0 that calls several back end DLLs written in mixed C/C++. The problem is that each DLL appears to have its own heap and one of them isn’t big enough. The heap collides with the program stack when we’ve allocated enough memory. Each DLL is written entirely in C, except for the basic DLL wrapper, which is written in C++. Each DLL has a handful of entry points. Each entry point immediately calls a C routine. We would like to increase the size of the heap in the DLL, but haven’t been able to figure out how to do that. I searched for guidance and found these MSDN articles:

http://msdn.microsoft.com/en-us/library/hh405351(v=VS.85).aspx

These articles are interesting but provide conflicting information. In our problem it appears that each DLL has its own heap. This matches the “Heaps: Pleasures and Pains” article that says that the C Run-Time (C RT) library creates its own heap on startup. The “Managing Heap Memory” article says that the C RT library allocated out of the default process heap. The “Memory management options in Win32” article says the behavior depends on the version of the C RT library being used.

We’ve temporarily solved the problem by allocating memory from a private heap. However, in order to improve the structure of this very large complex program, we want to switch from C with a thin C++ wrapper to real C++ with classes. We’re pretty certain that the new and delete operators won’t allocate memory from our private heap and we’re wondering how to control the size of the heap C++ uses to allocate objects in each DLL. The application needs to run in all versions of desktop Windows-NT, from 2000 through 7.

The Question

Can anyone point us to definitive and correct documentation that explains how to control the size of the heap C++ uses to allocate objects?

Several people have asserted that stack corruption due to heap allocations overwriting the stack is impossible. Here is what we observed. The VB front end uses four DLLs that it dynamically loads. Each DLL is independent of the others and provides a handful of methods called by the front end. All the DLLs communicate via data structures written to files on disk. These data structures are all statically structured. They contain no pointers, just value types and fixed-size arrays of value types. The problem DLL is invoked by a single call where a file name is passed. It is designed to allocate about 20MB of data structures required to complete its processing. It does a lot of calculation, writes the results to disk, releases the 20MB of data structures, and returns an error code. The front end then unloads the DLL. While debugging the problem under discussion, we set a breakpoint at the beginning of the data structure allocation code and watched the memory values returned from the calloc calls and compared them with the current stack pointer. We watched as the allocated blocks approached the stack. After the allocation was complete the stack began to grow until it overlapped the heap. Eventually the calculations wrote into the heap and corrupted the stack. As the stack unwound it tried to return to an invalid address and crashed with a segmentation fault.

Each of our DLLs is statically linked to the CRT, so that each DLL has its own CRT heap and heap manager. Microsoft says in http://msdn.microsoft.com/en-us/library/ms235460(v=vs.80).aspx:

Each copy of the CRT library has a separate and distinct state. As such, CRT objects such as file handles, environment variables, and locales are only valid for the copy of the CRT where these objects are allocated or set. When a DLL and its users use different copies of the CRT library, you cannot pass these CRT objects across the DLL boundary and expect them to be picked up correctly on the other side.
Also, because each copy of the CRT library has its own heap manager, allocating memory in one CRT library and passing the pointer across a DLL boundary to be freed by a different copy of the CRT library is a potential cause for heap corruption.

We don't pass pointers between DLLs. We aren't experiencing heap corruption, we are experiencing stack corruption.

OK, the question is:

Can anyone point us to definitive and correct documentation that explains how to control the size of the heap C++ uses to allocate objects?

I am going to answer my own question. I got the answer from reading Raymond Chen's blog The Old New Thing, specifically There's also a large object heap for unmanaged code, but it's inside the regular heap. In that article Raymond recommends Advanced Windows Debugging by Mario Hewardt and Daniel Pravat. This book has very specific information on both stack and heap corruption, which is what I wanted to know. As a plus it provides all sorts of information about how to debug these problems.

I am currently in a university operating systems class and we are working on the Windows kernel, more precisely WRK, the Windows Research Kernel, for our projects. WRK is based on Windows Server 2003.

I am however having a real hard time dredging up resources to help learn the basics of OS development, Windows kernel development and just generally getting around the Windows API.

We are using the book Windows Internals by Russinovich, but I was wondering if any of you had some great resources to recommend to me, whether books, online guides or some old class notes. Thanks!

The third edition of Tanenbaum's Modern Operating Systems has a chapter devoted to the Vista kernel. I haven't looked into that chapter (I only read the Linux one), but as far as big-picture stuff, it's fantastic. I'm not sure what level of detail you're looking for, but that might be a good resource to check out.

What specifically are you looking for? Online resources? For that, OSROnline is one of the better websites. A lot of kernel development knowledge is found in the MS and the OSR mailing lists; that's another place to check that might be better than Stack Overflow.

As for books, there are Programming WDM, Developing Drivers with KMDF and Advanced Windows Debugging. The last will not teach you so much about the kernel as how to navigate inside it, something you will do quite often if you are writing drivers or researching parts of it.

In order to write drivers, the easiest way is probably to take the Windows driver samples and hack at them, stare at the results in WinDbg and learn more.

Very general: Is there an easy way to tell which line of code last freed a block of memory when an access violation occurs?

Less general: My understanding of profilers is that they override the allocation and deallocation processes. If this is true, might they happen to store the line of code that last freed a section of memory so that when it later crashes because of an access violation, you know what freed it last?

Specifics: Windows, ANSI C, using Visual Studio

Yes!

Install the Windows Debugging Tools and use Application Verifier.

  1. File -> Add Application, select your .exe
  2. Under Basics, select Memory and Heaps.
  3. Run the debug build of your program under ntsd (ntsd yourprogram.exe).
  4. Reproduce the bug.

Now when you make the crash happen, you will get additional information in the debugger from AppVerifier. Use !avrf (may take a long time to run (minutes)) and it will try to give you as much useful information as possible.

You can also use the dps command on the memory address to get all the stored stack info (allocation, deallocation, etc).

You can also use the !heap command on the memory address:

0:004> !heap -p -a 0x0C46CFE0

Which will dump information as well.

Further Reading:

Are there any?

Mostly the easier to use GUI -- it has many more debugging features than VS.

BTW, I highly recommend Advanced Windows Debugging to learn about it and other advanced debugging tools and techniques.

I'm having a blast tracking down some heap corruption. I've enabled standard page heap verification with

gflags /p /enable myprogram.exe

and this succeeds in confirming the corruption:

===========================================================
VERIFIER STOP 00000008: pid 0x1040: corrupted suffix pattern 

    10C61000 : Heap handle
    19BE0CF8 : Heap block
    00000010 : Block size
    00000000 : 
===========================================================

When I turn on full page heap verification (gflags /p /enable myprogram.exe /full) in anticipation that this will cause an error to occur at the time the corruption is introduced, I get nothing more.

I started to get my hopes up while reading Advanced Windows Debugging: Memory Corruption Part II—Heaps, which is a chapter from Advanced Windows Debugging. I installed WinDbg, and downloaded debug symbols for user32.dll, kernel32.dll, ntdll.dll according to http://support.microsoft.com/kb/311503. Now when the program halts in the debugger I can issue this command to see information about the heap page:

0:000> dt _DPH_BLOCK_INFORMATION 19BE0CF8-0x20
ntdll!_DPH_BLOCK_INFORMATION
   +0x000 StartStamp       : 0xabcdaaaa
   +0x004 Heap             : 0x90c61000 
   +0x008 RequestedSize    : 0x10
   +0x00c ActualSize       : 0x38
   +0x010 FreeQueue        : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x010 TraceIndex       : 0
   +0x018 StackTrace       : (null) 
   +0x01c EndStamp         : 0xdcbaaaaa
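
The -0x20 in that command is the size of the header that page heap places directly in front of the user block. A sketch that mirrors the 32-bit layout printed above; the struct is an illustrative stand-in rather than the real ntdll definition, and the width of TraceIndex is an assumption:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative mirror of the 32-bit dt output; pointers are modelled as
// 32-bit integers so the layout is the same on any build machine.
struct DphBlockInfo32 {
    std::uint32_t StartStamp;        // +0x000
    std::uint32_t Heap;              // +0x004
    std::uint32_t RequestedSize;     // +0x008
    std::uint32_t ActualSize;        // +0x00c
    union {                          // +0x010: both fields share this storage
        std::uint32_t FreeQueue[2];  // stands in for a 32-bit _LIST_ENTRY
        std::uint16_t TraceIndex;    // width assumed, not shown in the output
    };
    std::uint32_t StackTrace;        // +0x018
    std::uint32_t EndStamp;          // +0x01c
};

static_assert(offsetof(DphBlockInfo32, StackTrace) == 0x18, "matches +0x018");
static_assert(offsetof(DphBlockInfo32, EndStamp) == 0x1c, "matches +0x01c");
static_assert(sizeof(DphBlockInfo32) == 0x20, "header is 0x20 bytes");

// The header sits immediately before the address reported as "Heap block".
constexpr std::uint32_t header_address(std::uint32_t heap_block) {
    return heap_block - static_cast<std::uint32_t>(sizeof(DphBlockInfo32));
}

static_assert(header_address(0x19BE0CF8u) == 0x19BE0CD8u, "19BE0CF8-0x20");
```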

I am dismayed by the (null) stack trace. Now, http://msdn.microsoft.com/en-us/library/ms220938%28VS.80%29.aspx says:

The StackTrace field will not always contain a non-null value for various reasons. First of all stack trace detection is supported only on x86 platforms and second, even on x86 machines the stack trace detection algorithms are not completely reliable. If the block is an allocated block the stack trace is for the allocation moment. If the block was freed, the stack trace is for the free moment.

But I wonder if anyone has any thoughts on increasing the chances of seeing the stack trace from the allocation moment.

Thanks for reading!

Ah ha! Turns out I needed to enable more gflags options:

gflags /i myprogram.exe +ust

Which has this effect:

ust - Create user mode stack trace database

Seems straightforward when I see the parameter description. Silly me. But I also seem to need to set the size of the trace database before it will take effect:

gflags /i myprogram.exe /tracedb 512

...or whatever (in MB).

I've been assigned the job of testing a small Windows application for the company I work for. I'm a little experienced with testing web applications using the Google Chrome Developer Tools. Apart from that, I don't know much.

For the moment, I test manually, keeping an eye on the Windows Task Manager for memory and CPU usage.

What other basic tools should I be using to do manual (as opposed to unit testing) Windows application testing?

There're a number of tools that can be handy:

Process Explorer from SysInternals is much more useful than the task manager.

Off the top of my head, here are a few things that you can do without modifying the code or writing test code:

  • see if there're memory leaks or corruptions (use Application Verifier + WinDbg)
  • inject failures (that is, at some point modify a status/error code/pointer/some other variable in the debugger as if a piece of code failed to open a file or allocate memory or do something else) and see if the app gracefully handles that

Play with SysInternals tools.

Also, it may be a good idea to buy this book to familiarize yourself with Windows: http://www.amazon.com/Windows%C2%AE-Internals-Including-Windows-Developer/dp/0735625301/

There're also a few good ones on debugging Windows applications, like this one: http://www.amazon.com/Advanced-Windows-Debugging-Mario-Hewardt/dp/0321374460/
Among other things, it explains how to automatically collect crash dumps from your applications (using Windows Error Reporting AKA WER) and then inspect them in the debugger. I found that useful.

I've yet to find a good resource for debugging RELEASE mode binaries or dumps in windbg.

I understand that debugging becomes more limited with compiler optimization enabled. But sometimes I don't have a choice--for example, crash dump analysis on a non-reproducible issue.

It'd be really nice if there were some write-up that describes what IS possible (or what to watch out for) with release binaries. Does anyone know of such a resource?

I'm looking for something like this, but with much more detail. I was hoping Advanced Windows Debugging would have something on it, but no such luck.

If you have PDBs, most things are possible (I debugged Windows OS DLLs solely in release mode for years!).

The thing to realize is that WinDbg will now lie to you far more often - that is, it will display what it sees, which is not always what the actual value is. For example, if you try to run dv on frame 15 on amd64, there is no way that the values displayed will be accurate, since the compiler stored the info in a register.

The other difference is that functions will now be inlined, so the last frame on the stack may not be the actual last frame: the code may really be in a small function that has been copy-pasted into the bigger function.

I wonder if there is any tool to investigate peak heap contents?

For example, I have an application written in C++ (MSVS2005) and I want to know its peak heap consumption and its contents.

Regards, Maksim

You can explore a process's heap allocation and usage using WinDBG (see !heap command), part of the free collection of Microsoft's Debugging Tools for Windows. Google around for help on the usage, although the best reference I found was the standard reference book Advanced Windows Debugging.

Debugging is a methodical process of finding and fixing bugs in a computer program or a piece of electronic hardware, thus making it behave as expected.

Debugging tends to be harder when various subsystems are tightly coupled, as changes in one may cause bugs to emerge in another. Many books have been written about debugging, as it involves numerous aspects, including interactive debugging, control flow, integration testing, log files, monitoring (application, system), memory dumps, profiling, Statistical Process Control, and special design tactics to improve detection while simplifying changes.

Four key techniques for debugging are syntax checking, adding comments, stepping and using breakpoints.

Syntax checking

Many good tools exist, including online-only tools, to check the syntax of your code. Checking the syntax means checking that your code obeys the basic rules of the programming language or tool being used, catching problems such as missing closing brackets or an unterminated if statement. Syntax checking is done automatically by the compiler in compiled languages (e.g., C, C++, Pascal), whereas interpreted or scripting languages (e.g., JavaScript, Perl) and markup such as HTML typically report syntax problems only when the code is run, if at all. Some code editors include syntax highlighting or validation. Syntax checks can also be carried out for some data files or stylesheets, for example the JSON or CSS that your code uses.

Syntax checks will quickly help find spelling mistakes, missing or repeated statements, invalid expressions, and may also give warnings or suggested improvements. Syntax checkers are also known as linters, or code validators. Checking for valid syntax before running can identify errors quickly.
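As an illustration of checking syntax before running, here is a minimal sketch in Python that uses the built-in `compile()` function. The helper name `check_syntax` and the `<snippet>` label are assumptions made for this example; a real linter does considerably more.

```python
def check_syntax(source, filename="<snippet>"):
    """Return None if the source parses, else a short description of the first error."""
    try:
        # compile() parses and compiles the text; nothing is executed.
        compile(source, filename, "exec")
    except SyntaxError as error:
        return f"{filename}:{error.lineno}: {error.msg}"
    return None


if __name__ == "__main__":
    print(check_syntax("x = (1 + 2)\n"))  # valid code
    print(check_syntax("x = (1 + 2\n"))   # missing closing bracket
```

The exact wording of the reported error depends on the Python version, so the sketch only passes the message through.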

Stepping

Program stepping refers to using a tool to run your code line by line or a section at a time, examining the results, including variables, the values of expressions, and the order in which the program's steps are executed. This is particularly useful for programs that do not give an error or that contain infinite loops.
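To show what a stepping tool observes, here is a rough sketch built on Python's `sys.settrace` hook, which is the mechanism Python debuggers use. The helper `trace_lines` and the sample function `running_total` are hypothetical names introduced for this example only.

```python
import sys


def trace_lines(func, *args):
    """Run func(*args), recording (line offset, locals) just before each line runs."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not step into other functions
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, steps


def running_total(values):
    total = 0
    for value in values:
        total += value
    return total


if __name__ == "__main__":
    result, steps = trace_lines(running_total, [1, 2, 3])
    for offset, local_vars in steps:
        print(offset, local_vars)
```

Each recorded step shows which line is about to run and the variable values at that moment, which is the same information an interactive debugger displays while single-stepping.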

Breakpoints

Breakpoints are particular places in your code at which you want to temporarily stop execution in order to check whether it is running correctly so far. For example, to check whether a value typed in was correctly stored in a variable, you would add a breakpoint immediately after that line, then check the result. Using several different breakpoints allows you to quickly narrow down the area of the code that is causing the problem.

Breakpoints can be created using a debugging tool or added manually. A very simple manual breakpoint is a pop-up message that waits for you to respond with OK and displays program information (e.g., line number, function name, values of variables).
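A manual breakpoint of this kind might look like the following sketch. The `checkpoint` helper and the `parse_age` function are hypothetical and exist only for this example; the optional pause uses a console prompt in place of a pop-up.

```python
import inspect


def checkpoint(pause=False, **values):
    """Report the caller's function name, line number, and the given variables."""
    caller = inspect.currentframe().f_back
    shown = ", ".join(f"{name}={value!r}" for name, value in values.items())
    message = f"{caller.f_code.co_name}:{caller.f_lineno} {shown}"
    print(message)
    if pause:
        input("Press Enter to continue")  # console stand-in for an OK button
    return message


def parse_age(text):
    age = int(text)
    checkpoint(text=text, age=age)  # check the value right after it is stored
    return age


if __name__ == "__main__":
    parse_age("42")
```

Calling `checkpoint(pause=True, ...)` stops the program until you press Enter, which mimics the wait-for-OK behavior described above.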

Using comments

Adding comments to your code is good practice and allows you to describe the purpose of a short piece of code in human-readable form. Compilers and interpreters ignore comments, but they can help you later to update your code or resolve problems, especially if you add them as you first begin coding.

Applications and Tools for Debugging:

Learning Sources:

Books:

Tutorials:

I am studying how to use the MiniDumpWriteDump() function to create minidumps. After reading some articles, I got the feeling that all I can do is provide a callback function and various flags to tell the OS what I want to dump. The OS will then collect various information, such as call stacks, into a dump file.

But is this all I can do? I don't want to use the so-called APIs; it makes me feel like I'm swimming in the bathtub, not the ocean. Is there any other way to examine the computer's memory freely? Could anyone provide some references for achieving that?

Many thanks.

You can; however, to see another process's memory directly you need to be in kernel mode. The API makes it easy to do from user mode. Your choice.

Kernel mode stuff and useful links I've grabbed quickly:

I know I'm reaching for straws here, but this one is a mystery... any pointers or help would be most welcome, so I'm appealing to those more intelligent than I:

We have a crash exhibited in our release binaries only. The crash takes place as the binary is bringing itself down and terminating sub-libraries upon which it depends. Whether it can be reproduced depends on the machine: some are 100% reliable in reproducing the crash, some don't exhibit the issue at all, and some are in between. The crash is deep within one of the sub-libraries, and there is a good likelihood the stack is corrupt by the time the rubble can be brought into a debugger (MSVC 2008 SP1) to be examined. Running the binary under the debugger prevents the bug from happening, as does remote debugging, as does (of all things) connecting to the machine via VNC. We have tried installing the Microsoft Driver Development Kit, and doing so also squelches the bug.

What would be the next best place to look? What tools would be best in this circumstance? Does it sound like a race condition, or something else?

Try AppVerifier and GFlags together to find Page Heap corruption.

You'll likely need WinDbg as your debugger instead of Visual Studio to debug.

I also recommend this book on advanced Windows debugging for tracking down crashes such as the one you are hitting.

One of my Windows applications (.NET 3.5), which is used to upload images, is installed on Windows 8.1.

I open the application, and I am using a dll to browse the images present in local disk to select and upload them.

Once the images have been browsed and selected, if I create a "New Folder" on my system, the application crashes with the following message:

a problem caused the program to stop working correctly. windows will close the program and notify...

I cross-checked the event log, and here is the corresponding entry:

Faulting application name: DesktopPhotoUploader.exe, version: 1.0.0.0, time stamp: 0x529f6471
Faulting module name: ntdll.dll, version: 6.3.9600.16408, time stamp: 0x523d5305
Exception code: 0xc0000374
Fault offset: 0x00000000000f387c
Faulting process id: 0x8d0
Faulting application start time: 0x01cf2c7f30046a99
Faulting application path: C:\Users\AppData\Local\Apps\2.0\7HWTE4KV.OXA\9K6HG17J.XZB\desk..tion_5f682daadb7f3a73_0002.0000_11d13f4927f45bcc\DesktopPhotoUploader.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report Id: 8ca29b6c-9872-11e3-8255-00219b71cec5
Faulting package full name: 
Faulting package-relative application ID: 

Please let me know what could be the reason for this.

Exception 0xc0000374 is STATUS_HEAP_CORRUPTION. It indicates that your application manipulates the heap in an incorrect way and corrupts it. It is a bug in your code. You can analyze the dump to understand the problem. I recommend you get a copy of Advanced Windows Debugging; it has ample chapters dedicated to heap corruption. A common technique is to use GFlags; see Detecting Heap Corruption Using GFlags and Dumps.

Please advise on the following:

1) What is the lightest way to attach to a running native Windows application process, get the list of threads, and see what DLLs are used?

2) What is the lightest way to attach to a running .NET application process, get the list of threads, and see what DLLs are used?

Regards, Ron

Do you use Visual Studio? If so, you can attach VS to any running process using the Debug | Attach To Process menu items. You can then break into the process and start examining stacks, threads, modules, etc.

If you want to delve deeper, you could download the Windows SDK and install the Debugging Tools. This will give you KD and WinDBG: a console debugger and a slightly friendlier multi-pane MDI-style debugging app, respectively. Using these tools you can access most of the core debugging infrastructure built into Windows.

However, note that this is not for the faint of heart and will require considerable time and effort to master. To really become a debugging guru, you'll also need to deeply understand the architecture of the kernel and OS and many core OS data structures.

Thus you might find the following books useful:

For .NET:

For Windows and/or .NET:

For Advanced Windows internals debugging

Enjoy! :)