Large-scale C++ Software Design

John Lakos

Mentioned 72 times

In designing large-scale C++ applications, you are entering a dimension barely skimmed by most C++ books, particularly because experience with small programming projects does not scale up to larger ones. This book unites high-level design concepts with specific C++ programming details to reveal practical methods for planning and implementing high-quality large C++ systems. You will learn the importance of physical design in large systems, how to structure your software as an acyclic hierarchy of components, and techniques for reducing link-time and compile-time dependencies. Then the book turns to logical design issues--architecting a component, designing a function, and implementing an object--all in the context of a large-project environment.


Mentioned in questions and answers.

This question attempts to collect the few pearls among the dozens of bad C++ books that are published every year.

Unlike many other programming languages, which are often picked up on the go from tutorials found on the Internet, few people are able to quickly pick up C++ without studying a well-written C++ book. It is far too big and complex for that. In fact, it is so big and complex that there are very many very bad C++ books out there. And we are not talking about bad style, but things like sporting glaringly obvious factual errors and promoting abysmally bad programming styles.

Please edit the accepted answer to provide quality books and an approximate skill level — preferably after discussing your addition in the C++ chat room. (The regulars might mercilessly undo your work if they disagree with a recommendation.) Add a short blurb/description about each book that you have personally read/benefited from. Feel free to debate quality, headings, etc. Books that meet the criteria will be added to the list. Books that have reviews by the Association of C and C++ Users (ACCU) have links to the review.

Note: FAQs and other resources can be found in the C++ tag info and under c++-faq. There is also a similar post for C: The Definitive C Book Guide and List

Beginner

Introductory, no previous programming experience

  • Programming: Principles and Practice Using C++ (Bjarne Stroustrup) (updated for C++11/C++14) An introduction to programming using C++ by the creator of the language. A good read that assumes no previous programming experience, but it is not only for beginners.

Introductory, with previous programming experience

  • C++ Primer * (Stanley Lippman, Josée Lajoie, and Barbara E. Moo) (updated for C++11) Coming in at about 1,000 pages, this is a very thorough introduction to C++ that covers just about everything in the language in a very accessible format and in great detail. The fifth edition (released August 16, 2012) covers C++11. [Review]

  • A Tour of C++ (Bjarne Stroustrup) (EBOOK) The “tour” is a quick (about 180 pages and 14 chapters) tutorial overview of all of standard C++ (language and standard library, and using C++11) at a moderately high level for people who already know C++ or at least are experienced programmers. This book is an extended version of the material that constitutes Chapters 2-5 of The C++ Programming Language, 4th edition.

  • Accelerated C++ (Andrew Koenig and Barbara Moo) This basically covers the same ground as the C++ Primer, but does so in a quarter of the space. This is largely because it does not attempt to be an introduction to programming, but an introduction to C++ for people who've previously programmed in some other language. It has a steeper learning curve, but, for those who can cope with this, it is a very compact introduction to the language. (Historically, it broke new ground by being the first beginner's book to use a modern approach to teaching the language.) [Review]

  • Thinking in C++ (Bruce Eckel) Two volumes; a tutorial-style set of free intro-level books. Downloads: vol 1, vol 2. Unfortunately they're marred by a number of trivial errors (e.g. maintaining that temporaries are automatically const), with no official errata list. A partial 3rd-party errata list is available at http://www.computersciencelab.com/Eckel.htm, but it's apparently not maintained.

* Not to be confused with C++ Primer Plus (Stephen Prata), with a significantly less favorable review.

Best practices

  • Effective C++ (Scott Meyers) This was written with the aim of being the best second book C++ programmers should read, and it succeeded. Earlier editions were aimed at programmers coming from C, the third edition changes this and targets programmers coming from languages like Java. It presents ~50 easy-to-remember rules of thumb along with their rationale in a very accessible (and enjoyable) style. For C++11 and C++14 the examples and a few issues are outdated and Effective Modern C++ should be preferred. [Review]

  • Effective Modern C++ (Scott Meyers) This is basically the new version of Effective C++, aimed at C++ programmers making the transition from C++03 to C++11 and C++14.

  • Effective STL (Scott Meyers) This aims to do for the part of the standard library that came from the STL what Effective C++ did for the language as a whole: it presents rules of thumb along with their rationale. [Review]

Intermediate

  • More Effective C++ (Scott Meyers) Even more rules of thumb than Effective C++. Not as important as the ones in the first book, but still good to know.

  • Exceptional C++ (Herb Sutter) Presented as a set of puzzles, this has one of the best and most thorough discussions of proper resource management and exception safety in C++ through Resource Acquisition Is Initialization (RAII), in addition to in-depth coverage of a variety of other topics including the pimpl idiom, name lookup, good class design, and the C++ memory model. [Review]

  • More Exceptional C++ (Herb Sutter) Covers additional exception safety topics not covered in Exceptional C++, in addition to discussion of effective object oriented programming in C++ and correct use of the STL. [Review]

  • Exceptional C++ Style (Herb Sutter) Discusses generic programming, optimization, and resource management; this book also has an excellent exposition of how to write modular code in C++ by using nonmember functions and the single responsibility principle. [Review]

  • C++ Coding Standards (Herb Sutter and Andrei Alexandrescu) “Coding standards” here doesn't mean “how many spaces should I indent my code?” This book contains 101 best practices, idioms, and common pitfalls that can help you to write correct, understandable, and efficient C++ code. [Review]

  • C++ Templates: The Complete Guide (David Vandevoorde and Nicolai M. Josuttis) This is the book about templates as they existed before C++11. It covers everything from the very basics to some of the most advanced template metaprogramming and explains every detail of how templates work (both conceptually and in terms of how they are implemented) and discusses many common pitfalls. Has excellent summaries of the One Definition Rule (ODR) and overload resolution in the appendices. A second edition is scheduled for 2017. [Review]


Advanced

  • Modern C++ Design (Andrei Alexandrescu) A groundbreaking book on advanced generic programming techniques. Introduces policy-based design, type lists, and fundamental generic programming idioms then explains how many useful design patterns (including small object allocators, functors, factories, visitors, and multimethods) can be implemented efficiently, modularly, and cleanly using generic programming. [Review]

  • C++ Template Metaprogramming (David Abrahams and Aleksey Gurtovoy)

  • C++ Concurrency In Action (Anthony Williams) A book covering C++11 concurrency support including the thread library, the atomics library, the C++ memory model, locks and mutexes, as well as issues of designing and debugging multithreaded applications.

  • Advanced C++ Metaprogramming (Davide Di Gennaro) A pre-C++11 manual of TMP techniques, focused more on practice than theory. There are a ton of snippets in this book, some of which are made obsolete by type traits, but the techniques are nonetheless useful to know. If you can put up with the quirky formatting/editing, it is easier to read than Alexandrescu, and arguably more rewarding. For more experienced developers, there is a good chance that you may pick up something about a dark corner of C++ (a quirk) that usually only comes about through extensive experience.


Reference Style - All Levels

  • The C++ Programming Language (Bjarne Stroustrup) (updated for C++11) The classic introduction to C++ by its creator. Written to parallel the classic K&R, it indeed reads very much like it and covers just about everything from the core language to the standard library, to programming paradigms, to the language's philosophy. [Review]

  • C++ Standard Library Tutorial and Reference (Nicolai Josuttis) (updated for C++11) The introduction and reference for the C++ Standard Library. The second edition (released on April 9, 2012) covers C++11. [Review]

  • Standard C++ IOStreams and Locales (Angelika Langer and Klaus Kreft) There's very little to say about this book except that, if you want to know anything about streams and locales, then this is the one place to find definitive answers. [Review]

C++11/14 References:

  • The C++ Standard (INCITS/ISO/IEC 14882-2011) This, of course, is the final arbiter of all that is or isn't C++. Be aware, however, that it is intended purely as a reference for experienced users willing to devote considerable time and effort to its understanding. As usual, the first release was quite expensive ($300+ US), but it has now been released in electronic form for $60 US.

  • The C++14 standard is available, but seemingly not in an economical form – directly from the ISO it costs 198 Swiss Francs (about $200 US). For most people, the final draft before standardization is more than adequate (and free). Many will prefer an even newer draft, documenting new features that are likely to be included in C++17.

  • Overview of the New C++ (C++11/14) (PDF only) (Scott Meyers) (updated for C++1y/C++14) These are the presentation materials (slides and some lecture notes) of a three-day training course offered by Scott Meyers, who's a highly respected author on C++. Even though the list of items is short, the quality is high.

  • The C++ Core Guidelines (C++11/14/17/…) (edited by Bjarne Stroustrup and Herb Sutter) is an evolving online document consisting of a set of guidelines for using modern C++ well. The guidelines are focused on relatively higher-level issues, such as interfaces, resource management, memory management and concurrency affecting application architecture and library design. The project was announced at CppCon'15 by Bjarne Stroustrup and others and welcomes contributions from the community. Most guidelines are supplemented with a rationale and examples as well as discussions of possible tool support. Many rules are designed specifically to be automatically checkable by static analysis tools.

  • The C++ Super-FAQ (Marshall Cline, Bjarne Stroustrup and others) is an effort by the Standard C++ Foundation to unify the C++ FAQs previously maintained individually by Marshall Cline and Bjarne Stroustrup and also incorporating new contributions. The items mostly address issues at an intermediate level and are often written with a humorous tone. Not all items might be fully up to date with the latest edition of the C++ standard yet.

  • cppreference.com (C++03/11/14/17/…) (initiated by Nate Kohl) is a wiki that summarizes the basic core-language features and has extensive documentation of the C++ standard library. The documentation is very precise but is easier to read than the official standard document and provides better navigation due to its wiki nature. The project documents all versions of the C++ standard and the site allows filtering the display for a specific version. The project was presented by Nate Kohl at CppCon'14.


Classics / Older

Note: Some information contained within these books may not be up-to-date or no longer considered best practice.

  • The Design and Evolution of C++ (Bjarne Stroustrup) If you want to know why the language is the way it is, this book is where you find answers. This covers everything before the standardization of C++.

  • Ruminations on C++ (Andrew Koenig and Barbara Moo) [Review]

  • Advanced C++ Programming Styles and Idioms (James Coplien) A predecessor of the pattern movement, it describes many C++-specific “idioms”. It's certainly a very good book and might still be worth a read if you can spare the time, but quite old and not up-to-date with current C++.

  • Large Scale C++ Software Design (John Lakos) Lakos explains techniques to manage very big C++ software projects. Certainly a good read, if only it were up to date. It was written long before C++98 and misses many features (e.g. namespaces) important for large-scale projects. If you need to work on a big C++ software project, you might want to read it, although you need to take more than a grain of salt with it. The first volume of a new edition is expected in 2015.

  • Inside the C++ Object Model (Stanley Lippman) If you want to know how virtual member functions are commonly implemented and how base objects are commonly laid out in memory in a multi-inheritance scenario, and how all this affects performance, this is where you will find thorough discussions of such topics.

  • The Annotated C++ Reference Manual (Bjarne Stroustrup, Margaret A. Ellis) This book is quite outdated in that it explores the 1989 C++ 2.0 version: templates, exceptions, namespaces and new casts were not yet introduced. That said, this book goes through the entire C++ standard of the time, explaining the rationale, the possible implementations, and the features of the language. This is not a book to learn programming principles and patterns in C++, but to understand every aspect of the C++ language.

I am looking for the definition of when I am allowed to do forward declaration of a class in another class's header file:

Am I allowed to do it for a base class, for a class held as a member, for a class passed to a member function by reference, etc.?

Lakos distinguishes between class usage

  1. in-name-only (for which a forward declaration is sufficient) and
  2. in-size (for which the class definition is needed).

I've never seen it pronounced more succinctly :)
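
A minimal sketch of the distinction, with hypothetical names (Widget and its draw() member are made up):

// "In-name-only" use: a forward declaration is sufficient.
class Widget;                       // name introduced, definition elsewhere

class NameOnlyUser {
public:
    Widget& process(Widget& w);     // parameters and returns by reference/pointer
private:
    Widget* owned_;                 // pointer member: sizeof(Widget) not needed
};

// "In-size" use: the full definition is required.
#include "widget.h"

class InSizeUser : public Widget {  // base classes are always used in-size
public:
    void run() { member_.draw(); }  // calling a member function needs the definition
private:
    Widget member_;                 // member by value: the compiler must know its size
};

So, roughly: base classes and by-value members need the definition; pointers, references, and by-reference parameters need only the name.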

In a C++ project I'm working on, I have a flag kind of value which can have four values. Those four flags can be combined. Flags describe the records in database and can be:

  • new record
  • deleted record
  • modified record
  • existing record

Now, for each record I wish to keep this attribute, so I could use an enum:

enum { xNew, xDeleted, xModified, xExisting }

However, in other places in code, I need to select which records are to be visible to the user, so I'd like to be able to pass that as a single parameter, like:

showRecords(xNew | xDeleted);

So, it seems I have three possible approaches:

#define X_NEW      0x01
#define X_DELETED  0x02
#define X_MODIFIED 0x04
#define X_EXISTING 0x08

or

typedef enum { xNew = 1, xDeleted, xModified = 4, xExisting = 8 } RecordType;

or

namespace RecordType {
    static const uint8 xNew = 1;
    static const uint8 xDeleted = 2;
    static const uint8 xModified = 4;
    static const uint8 xExisting = 8;
}

Space requirements are important (byte vs int) but not crucial. With defines I lose type safety, and with enum I lose some space (integers) and probably have to cast when I want to do a bitwise operation. With const I think I also lose type safety since a random uint8 could get in by mistake.

Is there some other cleaner way?

If not, what would you use and why?

P.S. The rest of the code is rather clean modern C++ without #defines, and I have used namespaces and templates in a few places, so those aren't out of the question either.

Based on KISS, high cohesion and low coupling, ask these questions -

  • Who needs to know? my class, my library, other classes, other libraries, 3rd parties
  • What level of abstraction do I need to provide? Does the consumer understand bit operations?
  • Will I have to interface from VB/C# etc.?

There is a great book, "Large-Scale C++ Software Design", which promotes exposing basic types externally; if you can avoid another header file/interface dependency, you should try to.
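
For what it's worth, one hedged sketch of a cleaner approach, staying close to the names in the question: keep the enum, but overload the bitwise operators so that combinations stay typed and a stray integer can't slip in.

enum RecordType {
    xNew      = 1,
    xDeleted  = 2,
    xModified = 4,
    xExisting = 8
};

// The overloads keep the result typed as RecordType, so
// showRecords(xNew | xDeleted) compiles without casts.
inline RecordType operator|(RecordType a, RecordType b)
{
    return static_cast<RecordType>(static_cast<int>(a) | static_cast<int>(b));
}

inline RecordType operator&(RecordType a, RecordType b)
{
    return static_cast<RecordType>(static_cast<int>(a) & static_cast<int>(b));
}

void showRecords(RecordType mask);   // as in the question

The cast back to RecordType is well-defined here, because any OR of the flags stays within the enum's value range.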

I am working on a large C++ project in Visual Studio 2008, and there are a lot of files with unnecessary #include directives. Sometimes the #includes are just artifacts and everything will compile fine with them removed, and in other cases classes could be forward declared and the #include could be moved to the .cpp file. Are there any good tools for detecting both of these cases?

If you're interested in this topic in general, you might want to check out Lakos' Large Scale C++ Software Design. It's a bit dated, but goes into lots of "physical design" issues like finding the absolute minimum of headers that need to be included. I haven't really seen this sort of thing discussed anywhere else.

So I finished my first C++ programming assignment and received my grade. But according to the grading, I lost marks for including cpp files instead of compiling and linking them. I'm not too clear on what that means.

Taking a look back at my code, I chose not to create header files for my classes, but did everything in the cpp files (it seemed to work fine without header files...). I'm guessing that the grader meant that I wrote '#include "mycppfile.cpp";' in some of my files.

My reasoning for #include'ing the cpp files was:

  • Everything that was supposed to go into the header file was in my cpp file, so I pretended it was like a header file
  • In monkey-see-monkey-do fashion, I saw that other header files were #include'd in the files, so I did the same for my cpp file

So what exactly did I do wrong, and why is it bad?

I suggest you go through Large Scale C++ Software Design by John Lakos. In college, we usually write small projects where we do not come across such problems. The book highlights the importance of separating interfaces and implementations.

Header files usually contain interfaces, which are not supposed to change frequently. Similarly, a look into patterns like the Virtual Constructor idiom will help you grasp the concept further.
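
To make the separation concrete, here is a minimal sketch with made-up names. Each .cpp is compiled separately into an object file and the linker joins them; #include-ing one .cpp into several others instead duplicates its definitions and typically fails at link time.

// point.h -- the interface: what callers need to know
#ifndef POINT_H
#define POINT_H
class Point {
public:
    Point(int x, int y);
    int x() const;
private:
    int x_, y_;
};
#endif

// point.cpp -- the implementation: compiled once, linked everywhere
#include "point.h"
Point::Point(int x, int y) : x_(x), y_(y) {}
int Point::x() const { return x_; }

// main.cpp -- includes only the header
#include "point.h"
int main() { Point p(1, 2); return p.x(); }

// build: g++ -c point.cpp && g++ -c main.cpp && g++ point.o main.o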

I am still learning like you :)

We have a large, multi-platform application written in C. (with a small but growing amount of C++) It has evolved over the years with many features you would expect in a large C/C++ application:

  • #ifdef hell
  • Large files that make it hard to isolate testable code
  • Functions that are too complex to be easily testable

Since this code is targeted for embedded devices, it's a lot of overhead to run it on the actual target. So we would like to do more of our development and testing in quick cycles, on a local system. But we would like to avoid the classic strategy of "copy/paste into a .c file on your system, fix bugs, copy/paste back". If developers are going to go to the trouble to do that, we'd like to be able to recreate the same tests later, and run them in an automated fashion.

Here's our problem: in order to refactor the code to be more modular, we need it to be more testable. But in order to introduce automated unit tests, we need it to be more modular.

One problem is that since our files are so large, we might have a function inside a file that calls a function in the same file that we need to stub out to make a good unit test. It seems like this would be less of a problem as our code gets more modular, but that is a long way off.

One thing we thought about doing was tagging "known to be testable" source code with comments. Then we could write a script to scan source files for testable code, compile it in a separate file, and link it with the unit tests. We could slowly introduce the unit tests as we fix defects and add more functionality.

However, there is concern that maintaining this scheme (along with all the required stub functions) will become too much of a hassle, and developers will stop maintaining the unit tests. So another approach is to use a tool that automatically generates stubs for all the code, and link the file with that. (the only tool we have found that will do this is an expensive commercial product) But this approach seems to require that all our code be more modular before we can even begin, since only the external calls can be stubbed out.

Personally, I would rather have developers think about their external dependencies and intelligently write their own stubs. But this could be overwhelming to stub out all the dependencies for a horribly overgrown, 10,000 line file. It might be difficult to convince developers that they need to maintain stubs for all their external dependencies, but is that the right way to do it? (One other argument I've heard is that the maintainer of a subsystem should maintain the stubs for their subsystem. But I wonder if "forcing" developers to write their own stubs would lead to better unit testing?)

The #ifdefs, of course, add another whole dimension to the problem.

We have looked at several C/C++ based unit test frameworks, and there are a lot of options that look fine. But we have not found anything to ease the transition from "hairball of code with no unit tests" to "unit-testable code".

So here are my questions to anyone else who has been through this:

  • What is a good starting point? Are we going in the right direction, or are we missing something obvious?
  • What tools might be useful to help with the transition? (preferably free/open source, since our budget right now is roughly "zero")

Note, our build environment is Linux/UNIX based, so we can't use any Windows-only tools.

G'day,

I'd start by having a look at any obvious points, e.g. using declarations in header files, for one.

Then start looking at how the code has been laid out. Is it logical? Maybe start breaking large files down into smaller ones.

Maybe grab a copy of John Lakos's excellent book "Large-Scale C++ Software Design" to get some ideas on how it should be laid out.

Once you start getting a bit more faith in the code base itself, i.e. code layout as in file layout, and have cleared up some of the bad smells, e.g. using declarations in header files, then you can start picking out some functionality that you can use to start writing your unit tests.

Pick a good platform, I like CUnit and CPPUnit, and go from there.
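
For a concrete picture, here is a hedged sketch of the link-seam stubbing idea raised in the question; all file and function names (read_sensor, average_reading) are hypothetical, and the code is written so it compiles as C or C++:

/* read_sensor.h -- the module under test calls out to hardware */
int read_sensor(int channel);                   /* real version talks to the device */
int average_reading(int channel, int samples);  /* code under test; calls read_sensor() */

/* sensor_stub.c -- linked into the test build INSTEAD of the real driver .o */
static int fake_values[] = { 10, 20, 30 };
static int next_value = 0;
int read_sensor(int channel)
{
    (void)channel;                              /* stub ignores the channel */
    return fake_values[next_value++ % 3];
}

/* average_test.c -- an ordinary test translation unit */
#include <assert.h>
#include "read_sensor.h"
int main(void)
{
    assert(average_reading(0, 3) == 20);        /* (10 + 20 + 30) / 3 with the stub */
    return 0;
}

/* test link: cc average_reading.o sensor_stub.o average_test.o */

No production source is modified; the seam is purely in which object files you hand to the linker.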

It's going to be a long, slow journey though.

HTH

cheers,

Michael Feathers wrote the bible on this, Working Effectively with Legacy Code

I have a phone interview coming up with a company which works in the financial software industry. The interview is mainly going to be about C++, problem solving, and logic. Please tell me a method of preparation for this interview. I have started skimming through Thinking in C++ and brushing up on the concepts. Is there any other way I can prepare? Please help.

Edit:

Thank you everyone for the advice. I just want to add that I am currently fresh out of grad school and have no previous experience. So can you suggest the types of questions that will be asked of new grads?

Read (or skim, depending on how much time you have to prepare) "Large-Scale C++ Software Design" by John Lakos. Chances are, you will need it.

Make sure you know your basic data structures and algorithms. You're more likely to be asked about that stuff than something higher up the food chain. Those are usually saved for the in-person interview.

Put another way: be solid with the fundamentals and solid with your C++ syntax. Also, knowledge of common libraries like STL and Boost couldn't hurt...but be sure you know what those libraries give you! In the end phone screens are there to cull out people who can't do the basics. Prove you can and you should move on to the next step. Good luck!

Here are some links to interview questions to check out:

Now, for completion's sake, some books:

Reading the reviews at Amazon and ACCU suggests that John Lakos' book, Large-Scale C++ Software Design may be the Rosetta Stone for modularization.

At the same time, the book seems to be really rare: not many have ever read it, and no pirate electronic copies are floating around.

So, what do you think?

[Since this is Number 3 at Google search for the book title, I left my vote for reopening; it would be a pity to lose all the helpful discussion here (which I always thought was the right place for it).]

I've read it, and consider it a very useful book on some practical issues with large C++ projects. If you have already read a lot about C++, and know a bit about physical design and its implications, you may not find that much which is terribly "new" in this book.

On the other hand, if your build takes 4 hours, and you don't know how to whittle it down, get a copy, read it, and take it all in.

You'll start writing physically better code quite quickly.

[Edit] If you want to start somewhere, and can't immediately get a hold of the book, I found the Games From Within series on physical structure useful even after reading Large-Scale C++ Design.

Interestingly, "More C++ Gems" contains a shortened (to 88(!) pages) version of Lakos' book, which can also be browsed (fully, I believe, as it belongs to the first half of the book) online at Google books.

So, enjoy, everyone interested :)

What order should headers be declared in a header / cpp file? Obviously those that are required by subsequent headers should be earlier and class specific headers should be in cpp scope not header scope, but is there a set order convention / best practice?

Good practice: every .h file should have a .cpp that includes that .h first before anything else. This proves that any .h file can be put first.

Even if the header requires no implementation, you make a .cpp that just includes that .h file and nothing else.

This then means that you can answer your question any way you like. It doesn't matter what order you include them in.
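
A minimal sketch of that self-check, with hypothetical names:

// widget.h -- declares or includes everything it needs itself
#ifndef WIDGET_H
#define WIDGET_H
class Registry;                 // name-only use: forward declaration suffices
class Widget {
public:
    void attach(Registry& r);
};
#endif

// widget.cpp -- its own header comes first, before any other include.
// If widget.h silently relied on something being included before it,
// this translation unit would fail to compile and expose the problem.
#include "widget.h"
#include "registry.h"

void Widget::attach(Registry& r) { /* ... */ }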

For further great tips, try this book: Large-Scale C++ Software Design - it's a shame it's so expensive, but it is practically a survival guide for C++ source code layout.

I just noticed a new term, pimpl idiom; what's the difference between this idiom and the Bridge design pattern? I am confused about that.

I also noticed the pimpl idiom is always used with the swap function; what's that? Could anybody give me an example?

PIMPL is a way of hiding the implementation, primarily to break compilation dependencies.

The Bridge pattern, on the other hand, is a way of supporting multiple implementations.

swap is a standard C++ function for exchanging the values of two objects. If you swap the pointer to the implementation for a different implementation, you are essentially changing the mechanism of the class at runtime.

But in its basic and common form, a class using PIMPL points to a single implementation, so there is no abstract class with distinct subclasses — just one class, forward declared, and compiled elsewhere. Changing the implementation class does not require any recompilation of sources that include the main header.

For example, say you have a lot of private member functions, private enums, and private data. And these private "bits" change fairly frequently as the class is developed and maintained. If the #include dependencies are such that touching this header file causes a large number of sources to be recompiled, you have a good candidate for PIMPL.

So the Bridge pattern is about object-oriented design, while the PIMPL idiom is about physical design of files.
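
A minimal sketch of the idiom, with hypothetical names; note that all the private "bits" live in the .cpp, so editing them never touches the header that clients include:

// parser.h -- stable public interface, no private details visible
class Parser {
public:
    Parser();
    ~Parser();
    void parse(const char* text);
private:
    struct Impl;    // forward-declared implementation ("pimpl")
    Impl* impl_;    // the single pointer clients ever see
};

// parser.cpp -- private members can change freely without client rebuilds
#include "parser.h"
struct Parser::Impl {
    int state;      // private data, enums and helpers all live here
};
Parser::Parser() : impl_(new Impl) { impl_->state = 0; }
Parser::~Parser() { delete impl_; }
void Parser::parse(const char* text) { /* work with impl_->state ... */ }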

(For more on physical design, I recommend the book Large-Scale C++ Software Design by John Lakos.)

I once worked on a C++ project that took about an hour and a half for a full rebuild. Small edit, build, test cycles took about 5 to 10 minutes. It was an unproductive nightmare.

What are the worst build times you ever had to handle?

What strategies have you used to improve build times on large projects?

Update:

How much do you think the language used is to blame for the problem? I think C++ is prone to massive dependencies on large projects, which often means even simple changes to the source code can result in a massive rebuild. Which language do you think copes with large project dependency issues best?

The best suggestion is to build makefiles that actually understand dependencies and do not automatically rebuild the world for a small change. But, if a full rebuild takes 90 minutes, and a small rebuild takes 5-10 minutes, odds are good that your build system already does that.

Can the build be done in parallel? Either with multiple cores, or with multiple servers?

Check in pre-compiled bits for pieces that really are static and do not need to be rebuilt every time. 3rd-party tools/libraries that are used but not altered are a good candidate for this treatment.

Limit the build to a single 'stream' if applicable. The 'full product' might include things like a debug version, or both 32 and 64 bit versions, or may include help files or man pages that are derived/built every time. Removing components that are not necessary for development can dramatically reduce the build time.

Does the build also package the product? Is that really required for development and testing? Does the build incorporate some basic sanity tests that can be skipped?

Finally, you can refactor the code base to be more modular and to have fewer dependencies. Large Scale C++ Software Design is an excellent reference for learning to decouple large software products into something that is easier to maintain and faster to build.

EDIT: Building on a local filesystem as opposed to a NFS mounted filesystem can also dramatically speed up build times.

This book Large-Scale C++ Software Design has very good advice I've used in past projects.

For OOP languages, there are many books describing how to design software, and design patterns are mainly for OOP languages.

I am wondering whether there are any books/good articles teaching how to use C in a big project; for example, it is good practice to use static functions when a function is only used in a single file.

You must read Expert C Programming by Peter van der Linden.


Code Complete 1st Ed by Steve McConnell is more oriented towards C, so that may be worth a look as well. At any rate his books are great reading for any professional programmer.

G'day,

While heavily focused on C++, John Lakos's excellent book "Large-Scale C++ Software Design" has a lot of information that is very relevant to the design of software written in C.

Edit: Oooh. After seeing @Jackson's suggestion for the excellent "The Practice of Programming", I'd also highly recommend Eric Raymond's excellent book "The Art of UNIX Programming". Thanks for the reminder @Jackson.

HTH

cheers,

  1. C FAQ
  2. K & R
  3. Linux kernel source code

I think I've become quite good at the basics of programming (for a variety of languages). I can write a good* line of code. I can write a good method. I can write a good class. I can write a good group of classes. I can write a good small or medium application.

I do not however know how to build a good large application. Particularly in the case where multiple technologies are involved and more are likely to become involved with time. Say a project with a large web front-end, a large server back-end that connects to some other integration back-end and finally a large and complex database. Oh, I've been involved in a few of these applications and I could build one I'm sure. I'm not so sure however that it could qualify as "good".

My question is thus for a reference to a book or other good source of reading where I could learn how to distribute and organize code and data for general large projects. For example, would I want to layer things very strictly, or would I want to encapsulate it in independent units instead? Would I want to try to keep most of the logic in the same pool, or should it just be distributed as seems most logical when adding whatever feature I'm adding?

I've seen lots of general principles on these issues (e.g. no spaghetti code, meatball code...) and read a few excellent articles that discuss the matter, but I've never encountered a source which would lead me to concrete practical knowledge. I realize the difficulty of the question, and so I'd be happy to just hear about the readings that others have found to help them in their quest for such knowledge.

As always, thank you for your replies.

* Given the debated nature of the definition of "good" code, the term "good" in this context won't be defined (it means whatever you think it ought to mean).

Here's a book that we have used to guide our coding standards and methods:

Large-Scale C++ Software Design

The program I'm working on has been in development for almost 10 years since it was first drawn up on the back of the proverbial napkin. And the project is still going strong today. It hasn't been perfect, and there are still problems with circular dependencies and some class interfaces not being very clean, but most classes aren't like that, the program works and our users are happy.

I would also recommend, as has been done before by many others, Code Complete and Software Estimation by Steve McConnell. I particularly like his metaphor of "growing" software rather than constructing or building. This way of viewing software lends itself better to something that will have a long life-cycle.

I'm currently reviewing a very old C++ project and see lots of code duplication there.

For example, there is a class with 5 MFC message handlers each holding 10 identical lines of code. Or there is a 5-line snippet for a very specific string transformation here and there. Reducing code duplication is not a problem in these cases at all.

But I have a strange feeling that I might be misunderstanding something and that there was originally a reason for this duplication.

What could be a valid reason for duplicating code?

A good read about this is Large Scale C++ Software Design by John Lakos.

He has many good points about code duplication, where it might help or hinder a project.

The most important point is to ask, when deciding whether to remove duplication or to duplicate code:

If this method changes in the future, do I want to change the behaviour in the duplicated method, or does it need to stay the way it is?

After all, methods contain (business) logic, and sometimes you'll want to change the logic for every caller, sometimes not. Depends on the circumstances.

In the end, it's all about maintenance, not about pretty source.

I am very interested in some studies or empirical data that shows a comparison of compilation times between two c++ projects that are the same except one uses forward declarations where possible and the other uses none.

How drastically can forward declarations change compilation time as compared to full includes?

#include "myClass.h"

vs.

class myClass;

Are there any studies that examine this?

I realize that this is a vague question that greatly depends on the project. I don't expect a hard number for an answer. Rather, I'm hoping someone may be able to direct me to a study about this.

The project I'm specifically worried about has about 1200 files. Each cpp on average has 5 headers included. Each header has on average 5 headers included. This nesting goes about 4 levels deep. It would seem that for each cpp compiled, around 300 headers must be opened and parsed, some many times. (There are many duplicates in the include tree.) There are guards, but the files are still opened. Each cpp is separately compiled with gcc, so there's no header caching.

To be sure no one misunderstands, I certainly advocate using forward declarations where possible. My employer, however, has banned them. I'm trying to argue against that position.

Thank you for any information.

#include "myClass.h"

is 1..n lines

class myClass;

is 1 line.

You will save time unless all your headers are one-liners. As there is no impact on the compilation itself (a forward declaration is just a way to tell the compiler that a specific symbol will be defined elsewhere, and is possible only if the compiler doesn't need details of that symbol, such as its size), the reading time of the included files is saved every time you replace one with a forward declaration. There's no standard measure for this, as it is a per-project value, but it is a recommended practice for large C++ projects (see Large-Scale C++ Software Design by John Lakos for more tricks to manage large projects in C++, even if some of them are dated).

Another way to limit the time the compiler spends on headers is pre-compiled headers.

Have a look in John Lakos's excellent Large-Scale C++ Software Design book -- I think he has some figures for forward declarations, obtained by looking at what happens if you include N headers M levels deep.

If you don't use forward declarations, then aside from increasing the total build time from a clean source tree, it also vastly increases the incremental build time, because header files are being included unnecessarily. Say you have 4 classes, A, B, C and D. C uses A and B in its implementation (i.e. in C.cpp) and D uses C in its implementation. The interface of D is forced to include C.h because of this 'no forward declaration' rule. Similarly C.h is forced to include A.h and B.h, so whenever A or B is changed, D.cpp has to be rebuilt even though it has no direct dependency. As the project scales up, this means that touching any header causes huge amounts of code to be rebuilt that just doesn't need to be.
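
A sketch of that chain, using the hypothetical A/B/C/D files from the example:

// C.h under the 'no forward declaration' rule:
#include "A.h"          // dragged into every file that includes C.h
#include "B.h"
class C { public: void step(A* a, B* b); };

// C.h with forward declarations instead:
class A;                // C only touches A and B inside C.cpp,
class B;                // so their names are all this header needs
class C { public: void step(A* a, B* b); };

// D.cpp uses C, so it includes C.h. With the first version, editing
// A.h or B.h forces D.cpp to recompile; with the second, it does not.
#include "C.h"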

To have a rule that disallows forward declaration is (in my book) very bad practice indeed. It's going to waste huge amounts of time for the developers for no gain. The general rule of thumb should be that if the interface of class B depends on class A then it should include A.h, otherwise forward declare it. In practice 'depends on' means inherits from, uses as a member variable or 'uses any methods of'. The Pimpl idiom is a widespread and well understood method for hiding the implementation from the interface and allows you to vastly reduce the amount of rebuilding needed in your codebase.

If you can't find the figures from Lakos then I would suggest creating your own experiments and taking timings to prove to your management that this rule is absolutely wrong-headed.

I have seen many explanations on when to use forward declarations over including header files, but few of them go into why it is important to do so. Some of the reasons I have seen include the following:

  • compilation speed
  • reducing complexity of header file management
  • removing cyclic dependencies

Coming from a .NET background I find header management frustrating. I have this feeling I need to master forward declarations, but I have been scraping by on includes so far.

Why cannot the compiler work for me and figure out my dependencies using one mechanism (includes)?

How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?

I can buy the argument for reduced complexity, but what would a practical example of this be?

How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?

1) Reduced disk I/O (fewer files to open, fewer times).

2) Reduced memory/CPU usage: most translation units need only a name. If you use/allocate the object, you'll need its definition.

This is probably where it will click for you: each file you compile compiles what is visible in its translation unit.

A poorly maintained system will end up including a ton of stuff it does not need, and this then gets compiled for every file that sees it. By using forward declarations where possible, you can bypass that and significantly reduce the number of times a public interface (and all of its included dependencies) must be compiled.

That is to say: the content of the header won't be compiled once. It will be compiled over and over. Everything in each such translation unit must be parsed, checked to be a valid program, checked for warnings, optimized, etc., many, many times.

Including lazily adds significant disk/CPU/memory overhead, which turns into intolerable build times for you, while introducing significant dependencies (in non-trivial projects).

I can buy the argument for reduced complexity, but what would a practical example of this be?

Unnecessary includes introduce dependencies as side effects. When you edit an include (necessary or not), then every file which includes it must be recompiled (not trivial when hundreds of thousands of files must be unnecessarily opened and compiled).

Lakos wrote a good book which covers this in detail:

http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620/ref=sr_1_1?ie=UTF8&s=books&qid=1304529571&sr=8-1

I often hear people praise the compilation speed of C#. So far I have only made a few tiny applications, and indeed I noticed that compilation was very fast. However, I was wondering if this still holds for large applications. Do big C# projects compile faster than C++ projects of a similar size?

C++ is so slow to compile because the header files have to be reread and reparsed every time they are included. Due to the way "#defines" work, it is very hard for a compiler to automatically pre-compile all header files. (Modula-2 did a much better job of this.) Having hundreds of headers read for each C++ file that is compiled is normal on a lot of C++ projects.

Sometimes incremental C++ compiles can be a lot faster than C#. If you have all your C++ header files (and design) in a very good state (see books like Large-Scale C++ Software Design and Effective C++), you can make a change to the implementation of a class that is used by most of the system and only have one DLL recompile.

As C# does not have separate header files, whenever you change the implementation of a class, all uses of the class get recompiled even if the public interface of the class has not changed. This can be reduced in C# by using "interface based programming" and "dependency injection" etc. But it is still a pain.

However, on the whole I find that C# compiles fast enough, but large C++ projects are so slow to compile that I find myself not wanting to add a method to a "base class" due to the time of the rebuild.

Having lots of Visual Studio projects with a handful of classes in each can slow down C# builds a lot. Combining related projects together and then "trusting" the developers not to use classes that are private to a namespace can at times have a great benefit. (NDepend can be used to check for people breaking the rules.)

(When trying to speed up C++ compiles I have found FileMon very useful. On one project I worked on, the STL was added to a header file and the build got a lot slower. Just adding the STL to the precompiled header file made a big difference! Therefore track your build time and investigate when it gets slower.)

First let me say, I am not a coder but I help manage a coding team. No one on the team has more than about 5 years experience, and most of them have only worked for this company.. So we are flying a bit blind, hence the question.

We are trying to make our software more stable and are looking to implement some "best practices" and coding standards. Recently we started taking this very seriously, as we determined that much of the instability in our product could be linked back to the fact that we allowed warnings to go through without fixing them when compiling. We also never took memory leaks seriously enough.

In reading through this site we are now quickly fixing this problem with our team but it begs the question, what other practices can we implement team wide that will help us?

Edit: We do fairly complex 2D/3D Graphics Software that is cross-platform Mac/Windows in C++.

The first thing you need to consider when adding coding standards/best practices is the effect it will have on your team's morale and cohesiveness. Developers usually resent any practices that are imposed on them even if they are good ideas. The people issues have to be addressed for a big change to be successful.

You will need to involve your group in developing the standards and try to achieve consensus. That said, you will never get universal agreement on anything, so you will have to balance consensus and getting to standards. I've seen major fights over something as simple as tabs versus spaces in source.

The best book I've seen for C/C++ guidelines in complicated projects is Large Scale C++ Software Design. That book along with Code Complete (which is a must-read classic) are good starting points.

I am working on a large project that uses the STL and have a question about your preferred way to organise your STL #includes.

  • Do you prefer to #include each header in the source file it is used. For example, if both foo.cpp and bar.cpp require std::string, then both will #include <string>.
  • Do you prefer to have a single header file that includes all the STL headers your project uses (i.e. add them to the MS 'stdafx.h' pre-compiled header).

The advantage of the first method is that the .cpp file is an independent unit and can be used in a different project without having to worry that you're missing a #include. The advantage of the second method is that you can make use of your compiler's pre-compiled header support, plus you can wrap STL #includes in pragmas that disable some warnings (for example, some Boost headers will cause warnings when compiling at level 4).

Which do you prefer to use?

I only include the header files that are really needed in every source, and not 'catch all' headers, to keep dependencies (and hence compile times) as low as possible.

Precompiled headers can work irrespective of this (i.e. I rely on precompiled headers to speed up the compiling process, not to get declarations). So even if something gets declared via the included precompiled headers, I still include the 'regular' header, which will get skipped by the include guard mechanism and won't add anything significant to the compile times.
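
For instance, a sketch of that practice with the MSVC stdafx.h setup from the question (file names illustrative):

// stdafx.h -- the precompiled header
#include <string>
#include <vector>

// foo.cpp
#include "stdafx.h"   // first, to get the precompilation speed-up
#include <string>     // still stated explicitly: foo.cpp really uses std::string;
                      // the include guard makes this re-include essentially free
#include "foo.h"

std::string make_name() { return "foo"; }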

As precompiled headers are a compiler-specific thing, optimizing or changing them should have no effect on the correct functioning of the code, in my opinion.

The main advantage of having dependencies as low as possible is that refactoring gets easier (or rather: feasible).

A great book on all this is Large-Scale C++ Software Design by Lakos.

What is the most advanced C or C++ book you ever read? I am asking this because I have already read lots and lots of books on C and C++, on many topics including object-oriented programming, data structures and algorithms, network programming, parallel programming (MPI, PThreads, OpenMP, Cilk, CUDA), the Boost library, and so on. So what's next? I still want to advance, especially in C.

Hey, nobody mentioned Bruce Eckel's Thinking in C++ Volume 1 and Volume 2. When I read it as my first book it went straight over my head. However, now that I have good experience and have read books like Effective/Exceptional C++, Eckel's book is ordinary stuff to me. No doubt it is still a very popular book, though (4.5 stars on Amazon - 84 customer reviews).

Large Scale C++ Design by John Lakos.

Practical advice on managing the complexity of compiling/linking and executing large C++ programs. Talks a lot about decoupling and how to avoid the many kinds of dependencies that arise in C++.

(This is something most C#/Java developers, and sadly some C++ devs too, rarely understand. IMO, it's a pain, but they need to. I wish we had modules in C++ already.)

My favourite "difficult" C++ book is this Template Metaprogramming one: C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond.

You really want to test your mental limits? Then try these:

Alexandrescu: Modern C++ Design

Abrahams&Gurtovoy: C++ Template Metaprogramming

These books look deceptively thin, but they stretch the limits of template programming, your C++ compiler, and your brain.

It seems to me there aren't half as many books about C programming as there are about C++. The language just isn't that complex.

One interesting read might be P. J. Plauger's The Standard C Library. It is supposed to contain some masterful code. It's on my to-read list.

I am not sure if you would consider these advanced, but I would surely put them in the category of must-have references:

The C++ Programming Language Special Edition (3rd) by Bjarne Stroustrup

The C++ Standard Library: A Tutorial and Reference by Nicolai M. Josuttis

The other books I would recommend have already been listed by others.

One problem in large C++ projects can be build times. There is some class high up in your dependency tree which you would need to work on, but usually you avoid doing so because every build takes a very long time. You don't necessarily want to change its public interface, but maybe you want to change its private members (add a cache-variable, extract a private method, ...). The problem you are facing is that in C++, even private members are declared in the public header file, so your build system needs to recompile everything.

What do you do in this situation?

I have sketched two solutions which I know of, but they both have their downsides, and maybe there is a better one I have not yet thought of.

John Lakos' Large Scale C++ Software Design is an excellent book that addresses the challenges involved in building large C++ projects. The problems and solutions are all grounded in reality, and certainly the above problem is discussed at length. Highly recommended.

Are function declarations/prototypes necessary in C99?

I am currently defining my functions in a header file and #include-ing it in the main file. Is this OK in C99?

Why do most programmers declare/prototype the function before main() and define it after main()? Isn't it just easier to define them before main and avoid all the declarations/prototypes?

Contents of header.h file:

int foo(int foo)
{
// code
return 1;
}

Contents of main file:

#include <stdio.h>

#include "header.h"

int main(void)
{
foo(1);
return 0;
}

How and where to prototype and define a function in C:

  1. Your function is used only in a specific .c file: define it static in the .c file. The function will only be visible in, and compiled for, this file.

  2. Your function is used in multiple .c files: choose an appropriate .c file to host your definition (all foo-related functions in a foo.c file, for example), and have a related header file with all non-static (think public) functions prototyped. The function will be compiled only once, but visible to any file that includes the header file. Everything will be put together at link time. Possible improvement: always make the related header file the first one included in its .c file; this way, you will be sure that any file can include it safely without needing other includes to make it work (reference: Large-Scale C++ Software Design; most of the rules apply to C too).

  3. Your function is inlinable (are you sure it is?): define the function static inline in an appropriate header file. The compiler should replace any call to your function with its definition where possible (think macro-like); a minimal sketch follows this list.
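
A minimal sketch of case 3, with a hypothetical header:

/* min.h -- small, hot function defined directly in the header */
#ifndef MIN_H
#define MIN_H

static inline int min_int(int a, int b)
{
    return a < b ? a : b;   /* each .c file that includes this gets its own
                               copy, and the compiler can expand calls in place */
}

#endif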

The notion of defining one function before or after another (your main function) in C is only a matter of style. Either you do:

static int foo(int foo) 
{ 
// code 
return 1; 
} 

int main(void) 
{ 
foo(1); 
return 0; 
} 

Or

static int foo(int foo);

int main(void) 
{ 
foo(1); 
return 0; 
} 

static int foo(int foo)
{ 
// code 
return 1; 
} 

will result in the same program. The second way is preferred by programmers because you don't have to reorganize or declare new prototypes every time you add a new function that uses the other ones. Plus, you get a nice list of every function declared in your file. It makes life easier in the long run for you and your team.

In a C++ project, compilation dependencies can make a software project difficult to maintain. What are some of the best practices for limiting dependencies, both within a module and across modules?

Recently I've been writing code similar to this:

messagehandler.h:

#include "message.h"
class MessageHandler {
public:
   virtual ~MessageHandler() {}
   virtual void HandleMessage(Message *msg) = 0;
};

persistmessagehandler.h:

MessageHandler *CreatePersistMessageHandler(int someParam);

persistmessagehandler.cpp:

#include "messagehandler.h"
#include "persist.h"

class PersistMessageHandler : public MessageHandler {
private:
   PersistHandle ph;
   size_t count;
   void InternalCheck();
public:
   PersistMessageHandler(int someParam);
   virtual ~PersistMessageHandler();
   virtual void HandleMessage(Message *msg);
};
PersistMessageHandler::PersistMessageHandler(int someParam)
{
  ph.Initialize();
}
... rest of implementation.

MessageHandler *CreatePersistMessageHandler(int someParam)
{
  return new PersistMessageHandler(someParam);
}

The reasoning here is to hide PersistMessageHandler. Clients don't need to include a header for the PersistMessageHandler class, with all the includes and types the implementation might need, and the interface is more cleanly separated from the implementation. It'll always be dynamically allocated anyway.

All PersistMessageHandler users will just call CreatePersistMessageHandler(...) directly, or indirectly get one from a factory.

But. I've not seen this approach used much elsewhere. Is the above good practice? Are there other/better alternatives for simple cases?

The process of hiding the implementation details is called Encapsulation. The process of minimizing build dependencies for your users is called Insulation. There is a great (but aging) book by John Lakos devoted to both topics:

http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620
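
As a sketch of what clients gain from this insulation, using the names from the question:

// client.cpp -- depends only on the two small headers shown above.
// persist.h, PersistHandle and the class layout stay invisible here,
// so they can change without triggering a rebuild of this file.
#include "messagehandler.h"
#include "persistmessagehandler.h"

void process(Message* msg)
{
    MessageHandler* handler = CreatePersistMessageHandler(42);
    handler->HandleMessage(msg);
    delete handler;   // safe: MessageHandler declares a virtual destructor
}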

What is a good book for industry-level C++ programming? I am not looking for a beginner's C++ book that talks about data types and control structures. I am looking for a more advanced book. For example, how to build system applications using C++. Any kind of guidance will be very helpful.

If you're looking for books on refining your craft in C++ as a language, you don't get much better than Scott Meyers' Effective C++ and More Effective C++ and Herb Sutter's Exceptional C++, More Exceptional C++ and Exceptional C++ Style. All are packed with invaluable information on bringing your facility with the language from the intermediate to the advanced level.

System-level programming is specific to operating system, so the books diverge based on your platform. Ones I've found very helpful (albeit not C++ specific) are: Windows System Programming, by Johnson M. Hart, Advanced Windows Debugging, by Mario Hewardt and Daniel Pravat, and Linux System Programming, by Robert Love.

All of these books (as well as Peter Alexander's excellent suggestion of Modern C++ Design) are available on O'Reilly's Safari service, which is a pretty cost-effective way of doing a lot of technical reading on the cheap and well worth checking out if you're considering going on a studying binge.

Lakos' Large Scale C++ Software Design is quite a good intermediate-advanced level book about C++ software architecture. It's a little out of date - predating widespread use of templates for example - but it is quite a good book on the subject.

Lakos worked for Mentor Graphics in the 1980s when first generation workstations were the technology du jour. This was an era when the difference in performance and memory footprint between C and C++ apps was regarded as significant. This 'old school' approach discusses efficient C++ systems architecture in some depth, which is a bit of a unique selling point for this book.

These are the best two books I have seen and read

Advanced C++ Programming Styles and Idioms

C++ Common Knowledge

Modern C++ Design by Andrei Alexandrescu is probably the most advanced C++ book out there. It's more about very advanced design patterns rather than building software.

I can deal with only the easiest case, when there are only 2 modules A and B

A is dependent on B, so I build B as a library, include B's header file in A, and link against the B library when building A.

This won't work when A and B are inter-dependent, and it gets even worse as the number of modules grows.

So what's the general way to carry out modularized development in C/C++?

UPDATE

Sorry, it seems my title is inaccurate; the rephrased version is: how can I divide a module into many .h and .cpp files (not a single one)?

The solution is to make sure your modules form a directed acyclic graph... I.e. if A depends on B, make sure B doesn't depend on A. It takes a lot of discipline but is worth it in the long run.

If you are interested in this stuff, Large Scale C++ Software Design is a good read.
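To make that concrete, here is a minimal sketch (with hypothetical names) of breaking a two-module cycle by inverting one dependency onto an abstract interface, so the graph becomes acyclic:

// listener.h -- depends on nothing
struct Listener {
    virtual void onEvent(int code) = 0;
    virtual ~Listener() {}
};

// b.h -- B now depends only on the interface, not on A
#include "listener.h"
class B {
    Listener* listener_;
public:
    explicit B(Listener* l) : listener_(l) {}
    void work() { listener_->onEvent(42); }   // notifies A without knowing A
};

// a.h -- A depends on B and implements Listener; all edges point one way
#include "b.h"
class A : public Listener {
    B b_;
public:
    A() : b_(this) {}
    void onEvent(int /*code*/) { /* react to B */ }
};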

I'm thinking specifically of the Strategy pattern (Design Patterns, GoF94), where it is suggested that the context passed to the strategy constructor can be the object which contains the strategy (as a member) itself. But the following won't work:

//analysis.h

class StrategyBase;
class Strategy1;
class Strategy2;
class Analysis
{
   ...
      void ChooseStrategy();
   private:
      StrategyBase* _s;
      ...
};

//analysis.cpp

void Analysis::ChooseStrategy()
{
   if (...) _s = new Strategy1(this);
   else if (...) _s = new Strategy2(this);
   ...
}

//strategy.h

#include "analysis.h"
...

and then StrategyBase and its subclasses then access the data members of Analysis.

This won't work because you can't instantiate the Strategy* classes before they've been defined. But their definitions depend on that of Analysis. So how are you supposed to do this? Replace ChooseStrategy with

void SetStrategy(StrategyBase* s) { _s = s; }

and do the instantiation in files which #include both analysis.h and strategy.h? What's best practice here?

You will always have circular dependencies in the State/Strategy pattern, except for very general States/Strategies. But you can limit the in-size (Lakos) use of the respective other class such that it compiles, at least (see the sketch after this list):

  1. Forward-declare Analysis (analysis.h or strategies.h)
  2. Define StrategyBase and subclasses (don't inline methods that use Analysis) (strategies.h)
  3. Define Analysis (may already use inline methods that use strategies) (analysis.h)
  4. Implement Analysis and the strategy classes' non-inline methods (analysis.cpp)
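A minimal sketch of those four steps (class names from the question, method names hypothetical):

// strategies.h -- steps 1 and 2
class Analysis;                       // forward declaration only

class StrategyBase {
protected:
    Analysis* a_;                     // a pointer to an incomplete type is fine
public:
    explicit StrategyBase(Analysis* a) : a_(a) {}
    virtual ~StrategyBase() {}
    virtual void run() = 0;           // anything that uses Analysis stays non-inline
};

class Strategy1 : public StrategyBase {
public:
    explicit Strategy1(Analysis* a) : StrategyBase(a) {}
    void run();                       // defined in analysis.cpp (step 4)
};

// analysis.h -- step 3: defines Analysis as in the question

// analysis.cpp -- step 4: includes both headers, so full definitions exist:
// void Strategy1::run() { /* may freely access Analysis members here */ }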

I need a mock implementation of a class - for testing purposes - and I'm wondering how I should best go about doing that. I can think of two general ways:

  1. Create an interface that contains all public functions of the class as pure virtual functions, then create a mock class by deriving from it.
  2. Mark all functions (well, at least all that are to be mocked) as virtual.

I'm used to doing it the first way in Java, and it's quite common too (probably since they have a dedicated interface type). But I've hardly ever seen such interface-heavy designs in C++, thus I'm wondering.

The second way will probably work, but I can't help but think of it as kind of ugly. Is anybody doing that?

If I follow the first way, I need some naming assistance. I have an audio system that is responsible for loading sound files and playing the loaded tracks. I'm using OpenAL for that, thus I've called the interface "Audio" and the implementation "OpenALAudio". However, this implies that all OpenAL-specific code has to go into that class, which feels kind of limiting. An alternative would be to leave the class' name "Audio" and find a different one for the interface, e.g. "AudioInterface" or "IAudio". Which would you suggest, and why?

Just as I would not hand-author mock objects in Java, I would also not hand-author them in C++. Mock objects are not just stubbed out classes, but are test tools that perform automated checks like making sure certain methods are called, or that they are called in order, etc. I would take a look at the various mock object frameworks for C++ out there. googlemock looks interesting, but there are others.

Regarding how to abstract out the concept of controlling Audio resources from the implementation, I definitely favor using a C++ "interface" (pure virtual base class) with a generic name (e.g. Audio) and an implementation class named for what makes it special (e.g. OpenALAudio). I suggest you not embed the word "interface" or "I" into your class names. Embedding type or programmatic concepts into names has been falling out of vogue for many years (and can force widespread renaming when you, for example, elevate an "interface" to a full-fledged "class").

Developing to interfaces is an object-oriented concept and thus appropriate for C++. Some of the most important books on design specifically targeting C++ are all about programming to interfaces (which in C++ terms means programming using pure virtual base classes). For example, Design Patterns and Large Scale C++ Software Design.
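For instance, the Audio example from the question could be split like this (a sketch; the method names are made up):

#include <string>

// audio.h -- the generic interface
class Audio {
public:
    virtual ~Audio() {}
    virtual bool load(const std::string& file) = 0;
    virtual void play() = 0;
};

// openal_audio.h -- the OpenAL-specific implementation
class OpenALAudio : public Audio {
public:
    bool load(const std::string& file);   // all OpenAL calls live in the .cpp
    void play();
};

// a mock (hand-rolled or framework-generated) derives from the same interface
class MockAudio : public Audio {
public:
    MockAudio() : loadCalls(0), playCalls(0) {}
    bool load(const std::string&) { ++loadCalls; return true; }
    void play() { ++playCalls; }
    int loadCalls, playCalls;    // a test can assert on these
};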

What books should you read to improve your code and get used to good programming practices after getting a taste of the language?

C++ Coding Standards: 101 Rules, Guidelines, and Best Practices

by Herb Sutter and Andrei Alexandrescu


Meyers' Effective C++, More Effective C++ and Effective STL.

Design Patterns by the four guys affectionately nicknamed "the Gang of Four".

Lakos' Large Scale C++ Software Design.

There are of course many others, including many truly good ones, but if I had to pick 3 on C++ (counting Meyers' three thin, information-packed volumes as one;-) these would be it...

When I do a fresh compilation of my project, which includes 10+ open-source libs, it takes about 40 minutes (on normal hardware).

Question: where are my bottlenecks, really? Hard-drive seeking or CPU GHz? I don't think multi-core would help much, correct?

--Edit 1--
my normal hardware = i3 OC'd to 4.0GHz, 8GB 1600MHz DDR3 and a 2TB Western Digital

--Edit 2--
my code = 10%, libs = 90%. I know I don't have to build everything every time, but I would like to find out how to improve compile performance, so when buying a new PC for a developer, I can make a smarter choice.

--Edit 3--
cc = Visual Studio (damn)

Multicore compilation will help, tremendously in most cases.

You'll have to analyze your projects, and the time spent in each phase, in order to determine where the bottlenecks are.

In typical large C++ projects, the process is typically CPU bound, then disk bound. If it's the other way around, you're probably in header dependency hell.

There's actually a ton of ways to reduce compile times and dependencies in your projects. The best singular reference I know of is by Lakos:

http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620

It's one of the most important/practical C++ books I've read.

You can typically reduce compile times dramatically (e.g., over 40x faster if you take it very seriously), but it may take a lot of work/time to correct existing codebases.

An aspect of C++ that periodically frustrates me is deciding where templates fit between header files (traditionally describing the interface) and implementation (.cpp) files. Templates often need to go in the header, exposing the implementation and sometimes pulling in extra headers which previously only needed to be included in the .cpp file. I encountered this problem yet again recently, and a simplified example of it is shown below.

#include <iostream> // for ~Counter() and countAndPrint()

class Counter
{
  unsigned int count_;
public:
  Counter() : count_(0) {}
  virtual ~Counter();

  template<class T>
  void
  countAndPrint(const T&a);
};

Counter::~Counter() {
    std::cout << "total count=" << count_ << "\n";
}

template<class T>
void
Counter::countAndPrint(const T&a) {
  ++count_;
  std::cout << "counted: "<< a << "\n";
}

// Simple example class to use with Counter::countAndPrint
class IntPair {
  int a_;
  int b_;
public:
  IntPair(int a, int b) : a_(a), b_(b) {}
  friend std::ostream &
  operator<<(std::ostream &o, const IntPair &ip) {
    return o << "(" << ip.a_ << "," << ip.b_ << ")";
  }
};

int main() {
  Counter ex;
  int i = 5;
  ex.countAndPrint(i);
  double d=3.2;
  ex.countAndPrint(d);
  IntPair ip(2,4);
  ex.countAndPrint(ip);
}

Note that I intend to use my actual class as a base class, hence the virtual destructor; I doubt it matters, but I've left it in Counter just in case. The resulting output from the above is

counted: 5
counted: 3.2
counted: (2,4)
total count=3

Now Counter's class declaration could all go in a header file (e.g., counter.h). I can put the implementation of the dtor, which requires iostream, into counter.cpp. But what to do with the member function template countAndPrint(), which also uses iostream? It's no use putting it in counter.cpp, since it needs to be instantiated outside of the compiled counter.o. But putting it in counter.h means that anything including counter.h also in turn includes iostream, which just seems wrong (and I accept that I may just have to get over this aversion). I could also put the template code into a separate file (counter.t?), but that would be a bit surprising to other users of the code. Lakos doesn't really go into this as much as I'd like, and the C++ FAQ doesn't go into best practice. So what I'm after is:

  1. are there any alternatives for dividing the code to those I've suggested?
  2. in practice, what works best?

A rule of thumb (the reasons for which should be clear):

  • Private member templates should be defined in the .cpp file (unless they need to be callable by friends of your class template).
  • Non-private member templates should be defined in headers, unless they are explicitly instantiated.

You can often avoid having to include lots of headers by making names be dependent, thus delaying lookup and/or determination of their meaning. This way, you need the complete set of headers only at the point of instantiation. As an example

#include <iosfwd> // suffices

class Counter
{
  unsigned int count_;
public:
  Counter() : count_(0) {}
  virtual ~Counter();

  // in the .cpp file, this returns std::cout
  std::ostream &getcout();

  // makes a type artificially dependent
  template<typename T, typename> struct ignore { typedef T type; };

  template<class T>
  void countAndPrint(const T&a) {
    typename ignore<std::ostream, T>::type &cout = getcout();
    cout << count_;
  }
};

This is what I used for implementing a visitor pattern that uses CRTP. It looked like this initially

template<typename Derived>
struct Visitor {
  Derived *getd() { return static_cast<Derived*>(this); }
  void visit(Stmt *s) {
    switch(s->getKind()) {
      case IfStmtKind: {
        getd()->visitStmt(static_cast<IfStmt*>(s));
        break;
      }
      case WhileStmtKind: {
        getd()->visitStmt(static_cast<WhileStmt*>(s));
        break;
      }
      // ...
    }
  }
};

This will need the headers of all statement classes because of those static casts. So I have made the types be dependent, and then I only need forward declarations

template<typename T, typename> struct ignore { typedef T type; };

template<typename Derived>
struct Visitor {
  Derived *getd() { return static_cast<Derived*>(this); }
  void visit(Stmt *s) {
    typename ignore<Stmt, Derived>::type *sd = s;
    switch(s->getKind()) {
      case IfStmtKind: {
        getd()->visitStmt(static_cast<IfStmt*>(sd));
        break;
      }
      case WhileStmtKind: {
        getd()->visitStmt(static_cast<WhileStmt*>(sd));
        break;
      }
      // ...
    }
  }
};


I am using SUSE10 (64 bit)/AIX (5.1) and HP I64 (11.3) to compile my application. Just to give some background, my application has around 200 KLOC (2 lakh) lines of code (without templates). It is purely C++ code. From measurements, I see that compile time ranges from 45 minutes (SUSE) to around 75 minutes (AIX).

Question 1 : Is this time normal (acceptable)?

Question 2 : I want to re-engineer the code arrangement and reduce the compile time. Is there any GNU tool which can help me to do this?

PS:
a. Most of the questions on Stack Overflow were related to Visual Studio, so I had to post a separate question.
b. I use gcc version 4.1.2.
c. Another piece of info (which might be useful) is that the code is spread across around 130 .cpp files, but the code distribution varies from 1 KLOC to 8 KLOC per file.

Thanks in advance for your help!

Edit 1 (after comments)
@PaulR "Are you using makefiles for this ? Do you always do a full (clean) build or just build incrementally ?"
Yes we are using make files for project building.
Sometimes we are forced to do the full build (like over-night build/run or automated run or refresh complete code since many members have changed many files). So I have posted in general sense.

Read John Lakos's Large-Scale C++ Design for some very good methods of analysing and re-organising the structure of the project in order to minimise dependencies. Ultimately the time taken to build a large project increases as the amount of code increases, but also as the dependencies increase (or at least the impact of changes to header files increases as the dependencies increase). So minimising those dependencies is one thing to aim for. Lakos's concept of levelization is very helpful in working out how to split several large monolithic inter-dependent libraries into something with a much better structure.

From Large-Scale C++ Software Design (Lakos), page 652:

The question is, "In which unique translation unit will the compiler deposit the virtual table definition(s) for a given class?". The trick employed by CFRONT (and many other C++ implementations) is to place the external virtual tables in the translation unit that defines the lexically first non-inline function that appears in the class (if one exists).

Is this still the case with the most used compilers (GCC and Visual C++)? Or was it ever?

GCC happens to document that it behaves as described in the question (http://gcc.gnu.org/onlinedocs/gcc/Vague-Linkage.html):

VTables

C++ virtual functions are implemented in most compilers using a lookup table, known as a vtable. The vtable contains pointers to the virtual functions provided by a class, and each object of the class contains a pointer to its vtable (or vtables, in some multiple-inheritance situations). If the class declares any non-inline, non-pure virtual functions, the first one is chosen as the “key method” for the class, and the vtable is only emitted in the translation unit where the key method is defined.

Note: If the chosen key method is later defined as inline, the vtable will still be emitted in every translation unit which defines it. Make sure that any inline virtuals are declared inline in the class body, even if they are not defined there.

However, even in situations where there might be several vtables across several object files (as can happen if the 'key method' turns out to be inline), the compiler arranges for the duplicates to be ignored if possible, but the duplicates may end up using space in the final binary if the target doesn't support COMDAT:

When used with GNU ld version 2.8 or later on an ELF system such as GNU/Linux or Solaris 2, or on Microsoft Windows, duplicate copies of these constructs will be discarded at link time. This is known as COMDAT support.

On targets that don't support COMDAT, but do support weak symbols, GCC will use them. This way one copy will override all the others, but the unused copies will still take up space in the executable.

For targets which do not support either COMDAT or weak symbols, most entities with vague linkage will be emitted as local symbols to avoid duplicate definition errors from the linker. This will not happen for local statics in inlines, however, as having multiple copies will almost certainly break things.

FWIW, GCC seems to use a symbol that starts with __ZTV for the vtable.

As far as MSVC is concerned, some empirical testing with VC++10 (I don't think MS documents the behavior) shows that it seems that VC doesn't attempt to limit the vtable to a single object file. Since Microsoft knows that it can rely on the linker supporting COMDAT sections and since constructors are the only functions that use a vtable directly (all other vtable uses are indirect through the object pointer, I believe), it looks like VC just places a copy of the vtable in any object file where a constructor is instantiated. For classes that use the compiler generated ctor, that would be anywhere an object of that type is constructed.
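A small hypothetical example of the "key method" rule described above:

// shape.h
class Shape {
public:
    virtual void draw();       // first non-inline virtual => the "key method"
    virtual void resize() { }  // defined inline, so it cannot be the key method
};

// shape.cpp -- because draw() is defined here, GCC emits Shape's
// vtable (and typeinfo) in this translation unit only
void Shape::draw() { /* ... */ }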

How do I turn one file with lots of classes into many files with one class per file? (C/C++)

So I have a file with the following structure: some includes, and then lots of classes that sometimes reference each other:

#include <wchar.h>
#include <stdlib.h>
//...
class PG_1 {
  //...
};
class PG_2 {
  //...
};
//......
class PG_N {
  //...
};

There might not be an easy way to do this. The problem is you have to get the #includes right, split the code correctly to different header and cpp files, and if your classes have cyclic dependencies among themselves, you have to deal with them correctly, or better, try to resolve those dependencies to make them non-cyclic.

Best suggestion I can give you: first try to do this manually for two or three classes. Then decide what kind of physical class layout you need. Afterwards, try to write a program. Don't try to write a program unless you fully understand what to do.

By the way, how many classes/files do you have?

EDIT: To get a better notion of what a good physical class-to-file layout may be, I suggest reading Large Scale C++ Design by John Lakos. It is a little bit outdated, since it contains nothing about precompiled headers, but it is still useful.

I'm currently in the process of trying to organize my code in better way.

To do that I used namespaces, grouping classes by components, each having a defined role and a few interfaces (actually Abstract classes).

I found it to be pretty good, especially when I had to rewrite an entire component and I did with almost no impact on the others. (I believe it would have been a lot more difficult with a bunch of mixed-up classes and methods)

Yet I'm not 100% happy with it. In particular, I'd like a better separation between the interfaces, the public face of the components, and their implementations behind them. I think the 'interface' of the component itself should be clearer: a newcomer should understand easily which interfaces he must implement, which interfaces he can use, and what's part of the implementation.

Soon I'll start a bigger project involving up to 5 devs, and I'd like to be clear in my mind on that point.

So what about you? how do you do it? how do you organize your code?

You might find some of the suggestions in Large Scale C++ Software Design useful. It's a bit dated (published in 1996) but still valuable, with pointers on structuring code to minimize the "recompiling the world when a single header file changes" problem.

I would like to know about books that talk about design issues, like when to use namespaces, and other coding standards for writing good-quality, efficient C++ code. One that talks about code testing would also be appreciated.

"Large-Scale C++ Software Design" by John Lakos worked great for me years ago on how to organise code in large projects.

On testing, this is not my area, and I cannot recommend a great book. What I can do is discourage you from getting "Testing Computer Software", 2nd edition by Cem Kaner, Jack Falk and Hung Q. Nguyen. I found it severely dated and extremely clumsy. But please take this with a grain of salt.

For big projects, it is essential to follow a common design and coding style. Consistently.

I found the following book useful to have a common ground in a big project.

C++ Coding Standards: 101 Rules, Guidelines, and Best Practices by Andrei Alexandrescu, Herb Sutter

My C++ application depends on Boost. I'd like someone to just be able to check out my repository and build the whole thing in one step. But the boost distribution is some 100MB and thousands of files, and it seems to bog down source control -- plus I really don't need it to be versioned.

What's the best way to handle this kind of problem?

I've found the book "Large-Scale C++ Software Design" by John Lakos very useful as far as organising large C++ projects is concerned. Recommended.

What does it mean to say - Engineering scalability into applications. Are there design patterns that would make an application more scalable? This question is mainly in the context of web applications or SOA middleware based applications.

Here are some great resources on web application scalability to get you started: Todd Hoff's highscalability.com, Scalable Internet Architectures by Theo Schlossnagle, and Building Scalable Web Sites by Cal Henderson. Highscalability.com will point you to a lot of presentations and articles well worth reading, including this one from Danga about how they scaled LiveJournal as it grew.

When I think about "large scale applications" I think of three very different things:

  1. Applications that will run across a large scale-out cluster (much larger than 1024 cores).

  2. Applications that will deal with data sets that are much larger than physical memory.

  3. Applications that have a very large source base for the code.

Each kind of "scalability" introduces a different kind of complexity, and requires a different set of compromises.

Scale-out applications typically rely on libraries that use MPI to coordinate the various processes. Some applications are "embarrassingly parallel" and require very little (or even no) communication between the different processes in order to complete the task (e.g. rendering different frames of an animated movie). This style of application tends to be performance bound based on CPU clock rates or memory bandwidth. In most cases, adding more cores will almost always increase the "scalability" of the application. Other applications require a great deal of message traffic between the different processes in order to ensure progress toward a solution. This style of application will tend to be bound on the overall performance of the interconnect between nodes. These message-intensive applications may benefit from a very high bandwidth, low latency interconnect (e.g. InfiniBand). Engineering scalability into this style of application begins with minimizing the use of shared files or resources by all the processes.

The second style of scalability covers applications that run on a small number of servers (including a single SMP-style server), but that deal with a very large dataset, or a very large number of transactions. Adding physical memory to the system can often increase the scalability of the application. However, at some point physical memory will be exhausted. In most cases, the performance bottleneck will be related to the performance of the disk I/O of the system. In these cases, adding high performance persistent storage (e.g. striped hard drive arrays), or even adding a high performance interconnect to some kind of SAN, can help to increase the scalability of the application. Engineering scalability into this style of application begins with algorithmic decisions that will minimize the need to repeatedly touch the same data (or set up the same infrastructure) more than is necessary to complete the task (e.g. open a persistent connection to a database, instead of opening a new connection for each transaction).

Finally, there is the case of scalability related to the overall size of the source code base. In these instances, good software engineering practices can help to minimize conflicts, and to keep the code base clean. The book Large Scale C++ Software Design was the first one that I encountered that really took on the challenge of providing best practices for large source base software development. The book focuses on C++ as the implementation language, but the guidelines and practices can be applied to any project or language. Engineering scalability into this style of application involves making high level decisions about the structure of the code to minimize dependencies within the code base (e.g. do not have a single .h that when changed will force a rebuild of the entire code base, use a build system that will reuse .o's whenever possible).

Does including the same header files multiple times increase the compilation time?

For example, suppose every file in my project uses <iostream> <string> <vector> and <algorithm>. And if I include a lot of files in my source code, then does that increase the compile time?

I always thought that header guards served the important purpose of avoiding double definitions, but as a by-product they also eliminate duplicate code.

Actually, someone I know proposed some ideas to remove such multiple inclusions. However, I consider them to be completely against good design practice in C++. But I was still wondering what his reasons for suggesting the changes might be?

If compile times were an issue, people used to use the optimisation recommended by Praetorian, originally recommended in Large Scale Software Design. However, most modern compilers automatically optimise for this case. For example, see the help from GCC.

I am looking for instructional materials on object-oriented software design that are framed as extended examples. In other words, over the course of several lessons or chapters, the author would develop a moderately large piece of software and explain the design approach step by step. Ideally, the material would address not only the design of the primary software being built but also offer useful advice on the rest of the development process -- testing, deployment, etc.

This is indispensable for understanding large-scale OO design. Even though it's implemented in C++, the concepts are completely general and can be used effectively on any platform: Large Scale OO Design

Truly a classic!!

I was wondering whether anyone had any good resources (papers/articles/book references) on compile/linking optimizations.

I had worked in two companies that performed their linking operations differently.

  1. The first company enforced a strict DAG structure for the code, explaining to me that with a forced tree structure, linking times are crazy fast
  2. The second company employed "master cpps", where they had a few cpps that actually included all the other ones (the others were then excluded from compilation in the project).

Both have their advantages/disadvantages and I was hoping on writing a paper for my school report on this subject and just looking for material.

Thanks!

Large Scale C++ Software Design is a good reference for this kind of stuff.

My C++ program is using a separate header file (let's call it myHeader.h) and therefore includes it (#include "myHeader.h"). In my program I need to use another header file (let's call it another.h). Does it make a difference whether I put the #include "another.h" directive in the .cpp file or in myHeader.h?

There is a difference - every time your h file is included, any files included in that h file are included as well - I haven't kept up-to-date with modern C++ compilers, but this used to really increase compile time.

It also increases the physical dependencies of the source - John Lakos' Large Scale C++ Software Design addresses this, and is well worth a read on structuring C++ programs. It was published in 1996, so it's not based around current practice, but the advice on structure is worth knowing.
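A small sketch of the difference (with hypothetical names): if myHeader.h only mentions the other type by pointer or reference, a forward declaration keeps another.h out of the header entirely.

// myHeader.h -- forward declaration instead of #include "another.h"
class Another;                 // enough, since we only use a pointer below

class Mine {
    Another* other_;           // fine with an incomplete type
public:
    void use();
};

// myFile.cpp -- the full definition is needed only here
#include "myHeader.h"
#include "another.h"

void Mine::use() { /* can now call methods through other_ */ }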

Should all C++ code in a project be encapsulated into a single class, with main simply calling that class? Or should the main function declare variables and classes?

If you are going to build a large project in C++, you should at the very least read Large Scale C++ Software Design by John Lakos about it. It's a little old but it sounds like you could benefit from the fundamentals in it.

Keep in mind that building a large scale system in any language is a challenge and requires skill and discipline to prevent it falling to pieces very quickly. Don't take it lightly.

That said, if your definition of "large" is different than mine than I may have alternative advice to give you. I'm assuming you're talking about a project where the word "million" will be mentioned in sentences that also contain the words "lines of code".

We're creating a very complex embedded system, and the "sources" contain several projects: Visual C++, IAR, Code Composer Studio, and Altium Designer schematics and PCBs. All of these could possibly exist in several versions. So, what practices could you advise for arranging all that stuff? Thank you

If your C++ source files are numerous and span multiple directories then the effort put into grokking Large Scale C++ Software Design by John Lakos may be very worth it. The main theme of the book is how your physical layout of the software, that is, the arrangement of source code files in directories, limit or extend your ability to modify the software.

For many years, I have been re-reading John Lakos's classic Large-Scale C++ Software Design. Not only it was the first guidebook of this kind, but it also revolutionized how to develop a project in C++, in an efficient fashion to this day!

Do you feel his ideas are outdated now? Some C++ techniques in the book are in fact old (don't forget that the book was written before the latest standard was published).

What's a good authority to guide the building of a big system in C++ nowadays?

Don't get me wrong, I am not giving up Lakos at all. It will always be referenced for me, and occupy a prime location on the bookshelf.

Thanks

Interestingly, his next book, Scalable C++: Component-Based Development, was anticipated in 2006.

I don't think it has ever come to fruition... one day it may!

Also, Agile Principles and patterns are widespread and effective software developing paradigm. I am shifting my gears in that directions.

Check out this book: Agile Software Development, Principles, Patterns, and Practices

How to effectively design a C++ modular program? How to learn?

I am a VC++ developer, but I spend most of my time learning C++. What are all the things I should know as a VC developer?

I don't understand why people here post things about WinAPI, .NET, MFC and ATL.

You really must know the language. Another benefit would be the cross platform libraries. C++ is not about GUI or Win32 programming. You can write Multi-Platform application with libraries like boost, QT, wxWidgets (may be some XML parser libs).

Visual C++ is a great IDE for developing C++ applications, and Microsoft is trying hard to make Visual C++ more standard-conformant. Learning the standard language without dialects (the MS dialect as well) will give you the advantage of a rapid development environment combined with multi-platform portability. There are many abstraction libraries out there which work equally well on Windows, Linux, Unix or Mac OS. The debugger is a great tool in VC++, but not the first thing to start with. Try to write unit tests for your application. They will ensure, on later modifications, that you did not break other parts of the tested (or maybe debugged :) code.

Do not try to learn MFC or ATL from scratch; try to understand the STL. MFC is old, and the new versions are more or less wrappers around ATL. ATL is a strange lib which tries to marry STL idioms (and sometimes the STL itself) with WinAPI. But using ATL concepts without knowing what is behind them will make you unproductive as well. Some ATL idioms are very questionable and might be replaced by something better from boost or similar libs.

The most important things to learn are the language philosophy and concepts. I suggest you to dive into the language and read some serious books:

Once you are through these, you will be a very advanced C++ developer. The next books will make a guru out of you:

Remember one important rule: If you have a question, try to find an answer to it in ISO C++ Standard (i.e. Standard document) first. Doing so you will come along many other similar things, which will make you think about the language design.

Hope that book list helps you. Concepts from these books you will see in all well designed modern C++ frameworks.

With Kind Regards,
Ovanes

I am going to detail out a common hypothetical problem.

Problem:

I am provided with a static library say libX.a and the header files a.h and b.h. The header files a.h and b.h contain the APIs exported by the library. a.h includes a1.h and b.h includes b1.h. But a1.h and b1.h are not shipped by the owner of the library because a1.h and b1.h contain the data structures which are used privately by the library and the owner does not want to expose these data structures.

I have to write an application invoking the APIs exported by the library. So I have to include a.h and b.h which contains the declaration for the APIs.

Ok, fine. I write my application and include the header files a.h and b.h and invoke the APIs. But I will get a compiler error, because the compiler cannot find a1.h and b1.h which are internally included by a.h and b.h.

Questions:

  1. Is there a solution to this problem? If yes, I am earnestly seeking to know the solution :)

  2. Is it necessary that the library owner expose all the private header files he internally uses in his library?

1) You could look at the nm tool; see SO: how to list symbols in a .so file. Perhaps it works for static libraries as well, I'm not sure.

2) The library owner could have used the techniques in Large scale C++ software design by John Lakos to prevent exposing the internal structure. Perhaps you can use the techniques in there to create the required parts of a1.h and b1.h without relying on information you don't have. Especially declaring structures/classes without defining their contents.
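For point 2, a sketch of what such a name-only stand-in for a missing private header could look like (hypothetical names); this only works if a.h uses the hidden types exclusively through pointers or references:

// a1.h -- your stand-in: declare the private types, never define them
class PrivateState;            // opaque type, contents unknown to you

// a.h (shipped by the owner) will then still compile, as long as it only
// mentions PrivateState like this:
// PrivateState* CreateState();
// void Process(PrivateState* state);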

I keep running into problems the larger my program gets. For instance, I get the following error:

In file included from WidgetText.h:8,
                 from LCDText.h:17,
                 from WidgetText.cpp:13:
Generic.h:21: error: expected class-name before ',' token

Here are those lines:

#include "Generic.h" // WidgetText.h:8

#include "WidgetText.h" // LCDText.h:17

#include "LCDText.h" // WidgetText.cpp:13

class Generic: public virtual LCDText, public CFG, public virtual Evaluator { // Generic.h:21

Here are the contents of the various header files:

//Generic.h
#include "CFG.h"
#include "Evaluator.h"
#include "LCDText.h"
#include "Widget.h"

//WidgetText.h
#include "Generic.h"
#include "Property.h"
#include "Widget.h"

//LCDText.h
class Generic;
#include "LCDBase.h"
#include "WidgetText.h"

This isn't providing much; I know. I'm not sure what else to include. Each header defines a class named after its header, so LCDText.h has a class named LCDText.

The one line declaring class 'Generic' in LCDText.h had to be placed there due to an earlier problem similar to this one. I'm assuming this current issue has a similar solution, but I've failed to find it thus far.

Part of the solution is to add some forward declarations to get rid of these compiler errors (just like you did with your class Generic line). Google will turn up lots of suggestions on how exactly to do this.

Using forward declarations will let you eliminate the cyclic / circular #includes described in this answer.

A forward declaration lets you include references to and pointers to the forward-declared class, and it lets you pass the forward-declared class as a parameter, but it does not let you derive from or include an instance member of the forward-declared class. So your Generic class needs a way to #include (and not just forward-declare) the header files for LCDText, CFG, and Evaluator. If it can't do that because LCDText, CFG, or Evaluator need to #include (and not just forward-declare) Generic, then you need to rearrange your hierarchy to fix this (for example, by making a member variable a pointer or reference to a class instead of making it an instance of a class).
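In code, those rules look like this (a minimal illustration with hypothetical names):

class D;                  // forward declaration: D is incomplete here

class User {
    D* ptr;               // OK: pointer to an incomplete type
    D& ref;               // OK: reference
    // D value;           // error: an instance member needs the full definition
public:
    User(D& d) : ptr(&d), ref(d) {}
    void take(D* d);      // OK: parameter types may be incomplete
};

// class Derived : public D { };   // error: a base class must be complete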

However, using multiple inheritance like this (and especially using the diamond inheritance implied by two virtual inheritances) is a definite code smell. It suggests that you should be designing your class hierarchy differently. For example, maybe you need to be favoring composition over inheritance. That would make cleaning up your forward declarations and cyclic dependencies a lot easier.

Edit: You mentioned that you've been running into this problem more as your code base gets larger. I'm told that John Lakos's Large-Scale C++ Software Design is a good reference for managing issues such as header file dependencies in large projects, although it may be overkill for where your project is right now.

I've been a programmer for several years.

I was always told (and told others) that you should include in your .c files only the .h files that you need. Nothing more, nothing less.

But let me ask - WHY?

Using today's compilers, I can include all the .h files of the project, and it won't have a huge effect on compilation times.

I'm not talking about including OS .h files, which include many definitions, macros, and preprocessing commands.

Just including one "MyProjectIncludes.h". That will only say:

#pragma once
#include "module1.h"
#include "module2.h"
// and so on for all of the modules in the project

What do you say?

In general you don't want to have to re-compile modules unless headers that they actually depend on are changed. For smaller projects this may not matter and a global "include_everything.h" file might make your project simple. But in large projects, compile times can be very significant and it is preferable to minimize inter-module dependencies as much as possible. Minimizing includes of unnecessary headers is only one approach. Using forward declarations of types that are only referenced by pointers or references, using Pimpl patterns, interfaces and factories, etc., are all approaches aimed at reducing dependencies amongst modules. Not only do these steps decrease compile time, they can also make your system easier to test and easier to modify in general.
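As one concrete example, the Pimpl pattern mentioned above pushes every implementation dependency out of the header (a sketch, names hypothetical):

// widget.h -- clients see no implementation headers at all
class Widget {
public:
    Widget();
    ~Widget();
    void draw();
private:
    class Impl;           // defined only in widget.cpp
    Impl* impl_;
};

// widget.cpp
#include "widget.h"
#include <vector>         // implementation-only dependency, hidden from clients

class Widget::Impl {
public:
    std::vector<int> points;
};

Widget::Widget() : impl_(new Impl) { }
Widget::~Widget() { delete impl_; }
void Widget::draw() { /* uses impl_->points */ }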

An excellent, though somewhat dated, reference on this subject is John Lakos' "Large-Scale C++ Software Design".

In my spare time, I've been taking code I've written for various purposes and porting it to other languages, just to have a look at what's out there. Currently I'm taking a genetic programming graph-colouring algorithm, originally written in Java, and trying to coerce it into C++.

The arbitrary data structure I'm using for the task has a few classes. In Java, it wasn't so much of an issue for me because I had been exposed to it for a while. The graph structure was only created once, and a Colouring was assigned to that. The Colouring (specifically finding a mostly optimal one) was the real point of the code. I could have a Graph class with inner classes like Node and Edge, for instance, or I could have a package graph with classes Graph, Node, Edge, etc.

The first case above might lend itself well to my idea of C++. A main *.cpp file might have some classes Node, Graph, Edge, defined in it. But this seems to really be missing the point of C++, from what I can tell. I'm just taking what I wrote in Java and forcing it into C++, adding destructors where appropriate and turning object references to pointers. I'm not yet thinking in C++. Do these classes bear separating into separate *.cpp files? Should they be separated, and then compiled as a library to use in the main program? What I really need are some good resources or contrived examples (or even rules of thumb) to say, in C++ programming, what are the different options that exist and when is it a good idea to thinking about one over the other?


EDIT: I've been asked by @Pawel Zubrycki to provide some example code. I'm not going to do this, because each component is fairly trivial - It generally has a reference to the next thing, and some get/set methods. I will, however, describe it.

It's essentially an incidence list. There is some unnecessary use of classes termed ...Pointer - they were a product of a literal translation of a diagram first used to explain incidence lists to me.

There is a container class, VertexList, which contains a head element VertexPointer, and methods to add new VertexPointer objects (Adding it to the graph, but not connecting it to any other nodes, allowing searches to search non-connected graphs), naive search for indices on Vertex objects, etc. Every VertexPointer has a Vertex object, as well as a VertexPointer next;, and all those handy hasNext() methods that you might expect. A Vertex also has an associated ConnectionList

The same is duplicated for EdgeList, EdgePointer, and Edge, except that an Edge is associated with two Connection objects.

ConnectionList and Connection: ConnectionList mimicking VertexList or EdgeList, having a Connection head; and all those handy methods you might expect, like addConnection(). A Connection has an Edge associated with it, as well as some Connection next;

This allows us to easily get the connected components of any one point in the graph, and have an arbitrary number of connections.


It seems pretty over-the-top complicated, but the same functionality could be duplicated with some LinkedList of Vertex objects, a LinkedList of Edge objects, and a number of LinkedLists of Connection objects. The LinkedList of Vertex objects allows us to iterate over all Vertices for exhaustive searches on Vertices, and the same applies for Edges. The LinkedList objects of Connection allow us to quickly traverse to any connected Vertices and to arbitrarily add or remove connections in the graph. This step up in complexity was added to deal with the complexity of evaluating a certain colouring of a graph (weighted edges, quick traversal of local subgraphs, etc.)

If you have classes like Node, Graph and Edge, and their implementation is not too large, it makes perfectly good sense to define them in one and the same .cpp file. After all, they are meant to be used together.

In C++, a package like this is called a component. Usually it makes more sense to think in components than classes, since C++ is not only an OOP language and classes are not always the preferred way do things.

If you want to learn more about the preferred way to organize code in C++, I recommend Large Scale C++ Software Design.

BTW: Making a library out of these classes really seems overkill.

I think this is a really newb question, but I never found out the answer. I don't know how exactly to phrase this question, but I often find that I have to access objects that are "far away" from the current object in terms of the current hierarchy. I just want to make sure that this is the right (only) way to do this.

This goes along with passing parameters in from main also. I find that some objects far away from main need to be passed in with a parameter multiple times. How does an object far away from main get information from the command line?

For example for the first case, for 4 classes...

class D {
public:
   bool check_status() const;
};

class C {
public:
   D d;
   const D& get_d() const { return d; }
};

class B {
public:
   C c;
   const C& get_c() const { return c; }
};

class A {
   B b;
   // need to check the status of D
   void check() {
      // choice 1
      b.get_c().get_d().check_status();

      // choice 2
      const C& c = b.get_c();
      const D& d = c.get_d();
      d.check_status();
   }
};

Say something like, A is car, B is door assembly, C is door, D is lock. Then A has to check say, is lock on, otherwise prevent starting.

Choice 3 would be to call D's method from A directly; for that I'd have to add a forwarding layer of check_status() in C and B, so that A calls B's check_status(), which calls C's, which calls D's.

Don't all these calls to subobjects (if the code were a bit more complicated) incur a lot of overhead?

Thanks.

In addition to the answers above, getting hold of a copy of Large Scale C++ Software Design may help in this regard. Don't worry that the first chapter is a bit irrelevant these days; the majority of the ideas presented are still applicable.

I used to use the following code to make sure that the include file is not loaded more than once.

#ifndef _STRING_
#include <string>
#endif

// use std::string here
std::string str;
...

This trick is illustrated in the book "API Design for C++".

Now my co-worker told me that this is not necessary in Visual Studio, because if the implementation header file of string contains #pragma once, the include guard is not required to improve the compilation speed.

Is that correct?

Quote from original book:

7.2.3 Redundant #include Guards
Another way to reduce the overhead of parsing too many include files is to add redundant preprocessor
guards at the point of inclusion. For example, if you have an include file, bigfile.h, that looks
like this
#ifndef BIGFILE_H
#define BIGFILE_H
// lots and lots of code
#endif
then you might include this file from another header by doing the following:
#ifndef BIGFILE_H
#include "bigfile.h"
#endif
This saves the cost of pointlessly opening and parsing the entire include file if you’ve already
included it.

Redundant include guards are, by definition "redundant". They do not affect the binaries created through compilation. However, they do have a benefit. Redundant include guards can reduce compile times.

Who cares about compile times? I do. I am just one developer in a project of hundreds of developers, with millions of lines of source code in thousands of source files. A complete rebuild of the project takes me 45 minutes. Incremental builds from revision control pulls take me 20+ minutes. As my work depends on this big project, I cannot perform any testing while waiting on this prolonged build. If that build time were cut to under 5 minutes, our company would benefit greatly. Suppose the build time saving were 20 minutes: 1 year * 100 developers * 1 build/day * 1/3 hour/build * 250 days/year * $50/hr = $416,667 savings per year. Someone should care about that.

For Ed S: I have been using redundant include guards for 10 years. Occasionally you will find someone who uses the technique, but most shy away from it because it can make for ugly-looking code. "#pragma once" surely looks a lot cleaner. Percentage-wise, very few developers continually try to improve their talent by continuing their education and techniques. The redundant #include guards technique is a bit obscure, and its benefits are only realized when someone bothers to do an analysis on large-scale projects. How many developers do you know who go out of their way to buy C++ books on advanced techniques?

Back to the original question about redundant include guards vs. #pragma once in Visual Studio... According to the Wiki on #pragma once, compilers which support "#pragma once" can potentially be more efficient than #include guards, as they can analyze file names and paths to prevent loading of files which were already loaded. Three compilers were mentioned by name as having this optimization. Conspicuously absent from this list is Visual Studio. So, we are still left wondering whether, in Visual Studio, redundant #include guards should be used, or #pragma once.

For small to medium sized projects, #pragma once is certainly convenient. For large sized projects where compile time become a factor during development, redundant #include guards give a developer greater control over the compilation process. Anyone who is managing or architecting large-scale projects should have Large Scale C++ Design in their library--it talks about and recommends redundant #include guards.

Possibly of greater benefit than redundant include guards is smart usage of #includes. With C++ templates and the STL becoming more popular, method implementations are migrating from .cpp files to .h files. Any header dependencies the .cpp implementation would have had now necessarily have to migrate to the .h file. This increases compilation time. I have often seen developers stack lots of unnecessary #includes into their header files so they won't have to bother identifying the headers they actually need. This also increases compile time.

OK, maybe this question has an answer already, but I don't know what keywords to search for (most of my search results are about include guards in .h files only, not in .cpp files).

Sometimes I see that in a .cpp file each #include line has an extra include guard (sometimes even when the included .h already has its own include guard), like this: SomeClass.cpp

#ifndef __A__
#include "A.h"
#endif
#ifndef __B__
#include "B.h"
#endif
#ifndef __C__
#include "C.h"
#endif

instead of

SomeClass.cpp

#include "A.h"
#include "B.h"
#include "C.h"

What is the function of these include guards?

The practice of using include guards in .cpp files was recommended by John Lakos in his book Large-Scale C++ Software Design. I don't know whether any one before him had recommended the practice.

Say you have

A.h:

#ifndef __A__
#define __A__

#include "B.h"
#include "C.h"

// ...
// ...
// ...

#endif

B.h:

#ifndef __B__
#define __B__

// ...
// ...
// ...

#endif

C.h:

#ifndef __C__
#define __C__

// ...
// ...
// ...

#endif

SomeClass.cpp:

#ifndef __A__
#include "A.h"
#endif

#ifndef __B__
#include "B.h"
#endif

#ifndef __C__
#include "C.h"
#endif

When SomeClass.cpp is compiled, the contents of A.h is included. As a by-product of including the contents of A.h, the contents of B.h and C.h are also included. Also, the pre-processor macros __A__, __B__ and __C__ are defined. When the line

#ifndef __B__

is processed, since __B__ is already defined, the next line is skipped.

If SomeClass.cpp had just:

#include "A.h"
#include "B.h"
#include "C.h"

the file B.h has to be opened and processed. The contents of the file will not be included again due to the include guards but the file has to be opened and closed.

By using the first strategy, you avoid the cost of opening and closing B.h and C.h. For a large-scale C++ project, John Lakos asserts, the cost is too much. Hence the recommendation to use include guards even in .cpp files.

Can I make the above statement? Is it right or not? Are modularity and dependencies different things, or are they inter-related? Help...

They're different things, but clearly they are related. For example, if you have two (alleged;-) components A and B, but A depends on B and B depends on A, then they're not really distinct components -- they're a weird split of what clearly remains a single component. To achieve real modularity, dependencies must indeed be kept in mind -- and Dependency Inversion is one of the crucial techniques to achieve clean, correct dependencies. I'd also strongly recommend this classic book -- while most relevant if your chosen language is C++, it does contain a wealth of advice that's also applicable to many other languages.

I have been coding in Java most of the time, and I also studied C and C++ at university. But I have never written a large C++ codebase from scratch made of many files, as I have done in Java with a file for each class.

I'd like to know some book or reference with exercises and examples made of many files and classes in C++, so I can face big C++ projects in the future.

Sorry if you feel this question is eternally repeated.

I think it's the spirit of C++ - you don't pay for what you don't want (you explicitly pay for what you need):

// a.h
#include <iosfwd>

template< class T >
class QVector;

struct A
{
void process( QVector<int> );

void print( std::ostream& );
};

// some.cpp

#include "a.h"

#include <iostream> // I need only A::print() in this module, not full interface

...
A().print( std::cout );
...

That's why I think it's not fair to prohibit developers from working this way with the STL (Will C++11 STL have forward declaration's files?).

But I also see one bad thing: the dependencies of module A will spread out into the external context (duplication of #include directives), and this can lead to hard refactoring when the interface changes (e.g. replace QVector with QList - and now you need to replace all occurrences of <QVector> with <QList>).

Solution of this problem is:

#include <iostream>
#include <QVector>

struct A
{
void process( QVector<int> );

void print( std::ostream& );
};

Should we call this an idiom, "fundamental types of the interface" - a module interface's types should be like fundamental types (always defined and available)? It also makes sense, but it still isn't clear which way is better (e.g. Qt mixes both approaches).

My personal decision - always provide both ways for better modularity (when we have enough dependencies):

// a_decl.h
#include <iosfwd>

template< class T >
class QVector;

struct A
{
void process( QVector<int> );

void print( std::ostream& );
};

// a.h
// Include this file if you want to use most interface methods
// and don't want to write a lot of `#include`
#include <iostream>
#include <QVector>

#include "a_decl.h"

and let developer chooses what to include.

What can you say about these approaches? Which way is better for you, and why? Do we have one clear winner for all cases, or will it always depend on context?

From my correspondence with the language creator (I didn't receive a final answer)

UPDATE:

With boost 1.48.0 comes the Container library, which allows defining containers of incomplete user types (read more).

C++ is a language that leaves many degrees of freedom to the programmer, so it is somehow unavoidable that there are different ways to do the same thing.

IMO, what you define as "the solution", i.e., including in any .h file all the necessary includes or forward declarations, is the way to go in order to avoid "incomplete header files", and I have always followed this rule.

There is an interesting book with a thorough discussion of all the pros and cons of doing or not doing so: "Large-Scale C++ Software Design" by John Lakos, where the rule above comes from.

Speaking specifically about forward declarations, Lakos distinguishes between "in-name-only" and "in-size" class usages; only in the in-name-only case is the use of a forward declaration legitimate (according to his opinion):

Definition: A function f uses a type T in size if compiling the body of f requires having first seen the definition of T.

Definition: A function f uses a type T in name only if compiling f and any of the components on which f may depend does not require having first seen the definition of T.

(source)

Specifically, Lakos' reasoning revolves around the implications of certain styles of programming C++ for large scale systems, i.e. system of certain complexity, but I think that his suggestions are very well suited for any-scale systems also.

Hope this helps.

I've read that it is better to have internal linkage (for variables, free functions, etc.) because this will reduce the number of symbols being "exported" from a particular compilation unit. That way build times could be better.

Is this true?

Another advantage of using internal linkage is that there will not be any problems with name collisions.

reference: Large-Scale C++ Software Design

In theory yes, but ... C++ evolved towards generic programming. And the introduction of namespaces limits the name collision problems.

It is increasingly common to write programs as "header-only libraries", included hierarchically from a single .cpp file containing just a main whose purpose is to instantiate a "manager object" that takes care of all the orchestration, and to supply a last-resort "catch" for any exceptions that escape. Long symbol tables can be made faster to process by means of precompiled headers.

In this sense, all linkage is "internal", since there is nothing to "export".

More generally, less external linkage results in faster link times, and less internal linkage results in faster compile times. The best minimum is most likely where the internal and external tables balance each other. But there are many other important factors to take care of.
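For reference, a small sketch of giving helpers internal linkage:

// helpers.cpp
namespace {                        // unnamed namespace => internal linkage
    int counter = 0;
    int helper(int x) { return x * 2; }
}

static int legacyHelper(int x) { return x + 1; }   // the C-style equivalent

// neither helper nor counter is visible to the linker outside this
// translation unit, so another .cpp may reuse the same names freely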

I wonder if a book like that can still be considered "good" by today's standards: did you notice that what it suggests about iterators, for example, is quite different from what the standard library does today?

OK, so I was just thinking to myself: why do programmers stress so much when it comes to access modifiers within OOP?

Let's take this code for example (PHP!):

class StackOverflow
{
    private $web_address;

    public function setWebAddress(){/*...*/}
}

Because web_address is private, it cannot be changed via $object->web_address = 'w.e.'; but the only way that variable will ever change is if your program contains code like $object->web_address = 'w.e.' in the first place.

If within my application I wanted a variable not to be changed, then I would write my application so that my code does not contain anything to change it; therefore it would never be changed?

So my question is: What are the major rules and reasons in using private / protected / non-public entities

So my question is: What are the major rules and reasons in using private / protected / non-public entities

In Python, there are no access modifiers.

So the reasons are actually language-specific. You might want to update your question slightly to reflect this.

It's a fairly common question about Python. Many programmers from Java or C++ (or other) backgrounds like to think deeply about this. When they learn Python, there's really no deep thinking. The operating principle is

We're all adults here

It's not clear who -- precisely -- the access modifiers help. In Lakos's book, Large-Scale C++ Software Design, there's a long discussion of "protected", since the semantics of protected make subclass and client interfaces a bit murky.

http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620
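To see in C++ terms why protected is murky, consider a minimal sketch (the names are hypothetical): a protected member is invisible to clients, yet writable by every subclass anyone ever derives, so it effectively forms a second, open-ended interface.

class Account {
public:
    double balance() const { return balance_; }
protected:
    double balance_;  // hidden from clients, but exposed to every subclass
};

class SavingsAccount : public Account {
public:
    void applyInterest(double rate) {
        balance_ *= 1.0 + rate;  // subclasses can bypass whatever invariant
                                 // the base class hoped to maintain
    }
};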

How can I "hide" parts of a class so that whoever is using the libary does not have to include headers for all the types used in my class. Ie take the MainWindow class below, ho can I have it so when compiled in a static/dynamic libary, whoever is useing the libary does NOT have to include windows.h, ie HWND, CRITICAL_SECTION, LRESULT, etc do not have to be defined.

I know I could split it into two classes: an abstract class with just the public interface, and a hidden implementation class containing the members that require windows.h.

The problem here is that the visible class can no longer be created by itself, and an additional create function (e.g. CreateMainWindow) is required. That is fine in this case, since most likely just a single instance created on the heap is wanted, but for other classes it is not.

class MainWindow
{
    HWND hwnd;
    int width, height;
    std::string caption;
    bool started,exited;
    bool closeRequest;

    unsigned loopThread;
    CRITICAL_SECTION inputLock;

    Input *input;
public:
    static void init_type();
    Py::Object getattr(const char *name);

    MainWindow(int width, int height, std::string caption);
    ~MainWindow();

    bool CloseRequest(const Py::Tuple &args);
    bool CloseRequestReset(const Py::Tuple &args);

    HWND GetHwnd();

    int GetWidth();
    int GetHeight();

    Input* GetInput();
protected:
    unsigned static __stdcall loopThreadWrap(void *arg);
    unsigned LoopThreadMain();

    LRESULT WndProc(UINT msg, WPARAM wParam, LPARAM lParam);
    LRESULT static CALLBACK WndProcWrapper(HWND hwnd, UINT message, WPARAM wParam, LPARAM lParam);
};

This book may give you some ideas:

http://www.amazon.com/Large-Scale-Software-Addison-Wesley-Professional-Computing/dp/0201633620

Large-Scale C++ Software Design

by John Lakos
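In particular, the book's insulation chapters cover the opaque-pointer ("pimpl") idiom, which fits this exact situation. A minimal sketch of how it might apply; the split below is an assumption based on the class in the question:

// MainWindow.h -- clients no longer need windows.h
#include <string>

class MainWindowImpl;  // defined only inside the library

class MainWindow {
    MainWindowImpl* impl;  // opaque pointer hiding HWND, CRITICAL_SECTION, ...
public:
    MainWindow(int width, int height, std::string caption);
    ~MainWindow();          // deletes impl in MainWindow.cpp
    int GetWidth();
    int GetHeight();
};

// MainWindow.cpp -- the only file that includes windows.h;
// MainWindowImpl holds hwnd, inputLock, loopThread and the WndProc glue.

Note that an accessor like GetHwnd() would still force windows.h onto clients, so it either moves into the impl or has to return an opaque type such as void*.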

I am having all sorts of problems with include-overload in my newbie C++ project, but I'm not sure how to avoid it.

How do I avoid the problem of having to include dozens of classes, for example in a map-loading scenario:

Here's a trivial example Map class, which will load a game-map from a file:

// CMap.h
#ifndef CMAP_H_   // avoid leading underscores: such names are reserved
#define CMAP_H_
class CMap {
    public:
        CMap();
        void OnLoad();
};
#endif

// CMap.cpp
#include "CMap.h"
CMap::CMap() {
}

void CMap::OnLoad() {
    // read a big file with all the map definitions in it here
}

Now let's say I have a whole plethora of monsters to load into my map, so I might have a list or some other structure to hold all my monster definitions in the map

std::list<CMonster*> MonsterList;

Then I could simply forward-declare "CMonster" in my CMap.h, and add as many monsters as I like to that list:

// CMap.h
class CMonster;

// CMap.cpp
void CMap::OnLoad() {
    // read a big file with all the map definitions in it here
    // ...
    // read in a bunch of mobs
    CMonster* monster;
    MonsterList.push_back(monster);
}

But what if I have lots of different types of monster? How do I create all of them without including every CMonster_XXX.h? And how do I call methods on them?

// CMap.cpp
void CMap::OnLoad() {
    // read a big file with all the map definitions in it here
    // ...
    // read in a bunch of mobs
    CMonster_Kitten* kitty;
    kitty->OnLoad();
    MonsterList.push_back(kitty);

    CMonster_Puppy *puppy;
    puppy->OnLoad();
    puppy->SetPrey(kitty);
    MonsterList.push_back(puppy);

    CMonster_TRex *awesome;
    awesome->OnLoad();
    awesome->SetPrey(puppy);
    MonsterList.push_back(awesome);
}

Here's the rule I use for including things.

  • Forward declare as much as you can in your header files.
  • Include any .h you need in your .cpp.
  • Don't include a .h in another .h unless you have to.
  • If your project builds without needing to include a .h, you are fine (mostly, provided your compiler is compliant enough).

Edit: Additionally, you may want to read Large-Scale C++ Software Design. It talks about managing physical file dependencies.
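Applied to the question's example: combining forward declarations with a factory function keeps every CMonster_XXX.h out of CMap.cpp. A minimal sketch, assuming CMonster declares OnLoad(), SetPrey() etc. as virtual functions:

// MonsterFactory.h -- the only monster-creation header CMap.cpp needs
#include <string>

class CMonster;  // forward declaration is enough for this interface

CMonster* CreateMonster(const std::string& type);

// MonsterFactory.cpp is then the single file that includes every
// CMonster_XXX.h, mapping "kitten" -> new CMonster_Kitten, and so on.

// CMap.cpp usage:
// CMonster* kitty = CreateMonster("kitten");
// kitty->OnLoad();                  // virtual call through the base class
// MonsterList.push_back(kitty);

CMap.cpp then includes only MonsterFactory.h and CMonster.h, no matter how many concrete monster types the game grows.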

I have automatically generated code (around 18,000 lines, basically a wrap of data) and about 2,000 lines of other code in a C++ project. The project has link-time code generation turned on, along with /O2 and fast-code optimization. Compiling the code takes VC++ 2008 Express an incredibly long time (around 1.5 hours). After all, it is only 18,000 lines; why does the compiler take so much time?

A little explanation of the 18,000 lines: it is plain C, not even C++, and consists of many unrolled for-loops; a sample would be:

a[0].a1 = 0.1284;
a[0].a2 = 0.32186;
a[0].a3 = 0.48305;
a[1].a1 = 0.543;
..................

It basically fills a complex struct, but nothing that should be too complex for the compiler, I would guess.

Debug mode is fast; only Release mode has this issue. Before I added the 18,000 lines of code, everything was fine (at that time the data was in an external location). The Release build does a lot of work, though, reducing the size of the exe from 1,800 KB to 700 KB.

The issue happens in the link stage, because all the .obj files have been generated by then. I suspect link-time code generation too, but cannot figure out what is wrong.

Historically, a common cause of slow C++ compilation is excessive header file inclusion, usually a result of poor modularization. You can get a lot of redundant compilation by including the same big headers in lots of small source files. The usual reference in these cases is Lakos.

You don't state whether you are using precompiled headers, which are the quick-and-dirty substitute for a header-file refactoring.

I have the following dilemma: I have a few classes, say A, B, C and D. A has a public interface and a has-a relationship with B (i.e., A has a member variable of type B), and one of A's methods returns this B object. B is just a class that exposes some methods, C is another class that exposes other methods, and D is a singleton. The public interface of D holds references (pointers, if you prefer) to objects of class C.

So, obviously, when I draw a relationship diagram at this step, there would be a relationship between A and B, and C would be put on the diagram without any visible relationship to the other two. This is based on the header (.h) files, which contain the declarations of classes A, B and C. I'm a little confused about D right now.

On the other end:

  1. both the implementations (in the .cpp files) of A and B are heavily dependent on objects created from class C (no, C is not something standard, such as list, string, queue, but another meaningful class in my application).
  2. both the implementations of A and B use the D singleton with the local C objects.

And here are my questions:

  1. What relationships should I put on the class diagram between A, B, C and D, not counting the one I have identified (A has-a B)? I'm particularly interested in the singleton D's relationship to class C.
  2. What is the generally accepted methodology for this kind of situations (when the interface is not having relationships between objects, because there are none, but in the implementation they are heavily used)?
  3. Would there be a difference if I asked the same question about Java rather than C++? (In Java everything related to a class is in one file, so it's easier to see what a class's methods actually use, while in C++ you usually see just the header.)

Thanks a lot for your guidance.

You should definitely read the book Large-Scale C++ Software Design.

It deals particularly with the modeling of dependencies between interface and implementation, introducing two new relationships, uses-in-the-interface and uses-in-the-implementation, alongside the traditional "has-a".

Then, it goes on with design principles applied to such modeling (such as isolation, insulation, encapsulation, etc.). It is really a highly technical book, though. So be prepared!
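As a quick illustration of those two relationships (the classes here are hypothetical):

class C;

// A uses C in the interface: C appears in A's public signatures,
// so every client of A inherits a dependency on C
class A {
public:
    void process(const C& input);
};

// B uses C in the implementation only: B.h never mentions C;
// B.cpp creates local C objects internally, so clients of B
// have no dependency on C at all
class B {
public:
    void run();  // implemented in terms of C inside B.cpp
};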

I'm learning OOP and have a question. Suppose I have a file ClassA.h that includes ClassB.h, and at some point ClassB.h needs to include ClassA.h.

This yields an error, and I think I understand why, since I get an infinite include loop. But what should I do in this case? Is there a way around this error, or should I rethink my classes to avoid it? Does it mean my class organization is poorly designed? If so, how should I arrange my "class diagram" to avoid it?

I just want to know what the best practice in this scenario would be. Also, why doesn't the "#pragma once" directive solve this problem? Thanks in advance.

There is a way to fix it, but it also means your class organization is broken.

The way to fix it is called an "include guard", though many compilers also support the #pragma once directive. I suspect #pragma once isn't working because it probably doesn't consider a header file "included" until the entire file has been parsed; since the recursive inclusion happens in the middle of the header, parsing hasn't finished yet.

An include guard is something like this:

In ClassA.h:

#pragma once // Just because. It really should help.
#ifndef INCLUDED_CLASSA_H
#define INCLUDED_CLASSA_H

#include "ClassB.h"

//... rest of header file

#endif

In ClassB.h:

#pragma once // Just because. It really should help.
#ifndef INCLUDED_CLASSB_H
#define INCLUDED_CLASSB_H

#include "ClassA.h"

//... rest of header file

#endif

The organization problem is called a circular dependency, and circular dependencies are generally a bad idea. There are a number of different ways of breaking them, but which to use depends on the exact nature of and original reason for the dependency.

Depending on the problem you can use one of a variety of techniques:

  • Inheritance from a common base class
  • Turning one of the two classes into a base class for the other - This is a variant of the previous one.
  • Forward declarations - This is less desirable because it doesn't really break the circular dependency; it just arranges things so you don't also have a problematic circular include dependency (see the sketch after this list).
  • Turning some part of both classes into a class that they both can use - This is another variant of common base class that uses composition instead of inheritance.
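A minimal sketch of the forward-declaration technique, assuming ClassA only ever refers to ClassB through a pointer or reference:

// ClassA.h -- no longer includes ClassB.h
class ClassB;  // the name alone satisfies pointer/reference members

class ClassA {
    ClassB* partner;  // fine: pointers need only the declaration
public:
    void collaborate();
};

// ClassA.cpp includes ClassB.h and may use ClassB's full definition.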

There are other techniques. There is, in fact, a book that has a really wide variety of techniques to use in various situations because removing circular dependencies is a big theme of the book. That book is "Large-Scale C++ Software Design" by John Lakos.

I need to finish another developer's work, but the problem is that he started in a different way... So now I find myself in the situation of either using existing code, where he chose to inherit from a non-abstract class (a very big class, without any virtual functions) that already implements a bunch of interfaces, or dismissing that code (which shouldn't be too much work) and writing another class that implements the interfaces I need.

What are the pros and cons that would help me to choose the better approach.

P.S. Please note that I don't have too much experience.

Many Thanks

What are the pros and cons that would help me to choose the better approach.

It's legal to derive from a class with no virtual functions, but that doesn't make it a good idea. When you derive from a class with virtual functions, you often use that class through pointers (e.g., a class Derived that inherits from Base is often manipulated through Base*). That doesn't work when you don't use virtual functions. Also, delete-ing a pointer to the base class invokes undefined behavior when the destructor isn't virtual; in practice this often shows up as a memory leak.

However, it sounds more like these classes aren't being used through pointers-to-base. Instead, the base class is simply used to get a lot of built-in functionality, although the classes aren't related in the normal sense. Inversion of control (and has-a relationships) is a more common way to do that nowadays: split the functionality of the base class into a number of interfaces (pure virtual base classes), then have the objects that currently derive from the base class hold member variables of those interfaces instead.
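A minimal sketch of that refactoring, with hypothetical names standing in for pieces of the big base class:

// carve the blob's features into small pure-virtual interfaces
class Logger {
public:
    virtual ~Logger() {}
    virtual void log(const char* msg) = 0;
};

class Storage {
public:
    virtual ~Storage() {}
    virtual void save(const char* key, const char* value) = 0;
};

// instead of inheriting the blob, hold references to just what you need
class OrderProcessor {
    Logger& logger;    // injected from outside: inversion of control
    Storage& storage;
public:
    OrderProcessor(Logger& l, Storage& s) : logger(l), storage(s) {}
    void process() { logger.log("processing"); /* use storage ... */ }
};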

At the very least, you'll want to split the big base class into well-defined smaller classes and use those (like mixins), which sounds like your second option.

However, that doesn't mean rewrite all the other code that uses the blob base class all in one go. That's a big undertaking and you're likely to make small typos and similar mistakes. Instead, buy yourself copies of Working Effectively With Legacy Code and Large-Scale C++ Software Design, and do the work piecemeal.

I was wondering what methods of code organization Stack Overflow users use. I have a sporadic thought process, and as a result my code can start to look messy and overwhelming. Any tips?

I would suggest looking at the principles of Large Scale C++ Software Design by John Lakos (ISBN-13: 978-0201633627) if not the book itself. They are summed up in these lecture notes. Another summary of ideas.

Here's a brief outline of the headings of the principles, which, while written in the C++ context, have a gist that is language-agnostic.

  • Internal and External Linkage
  • Components and Dependency Relations
  • Physical Hierarchy
  • Reducing Link-Time Dependencies: Levelization
  • Reducing Compile-Time Dependencies: Insulation

I would like to reduce the link time of my project, and to do that I want to understand exactly why it takes so long: is it a specific library? Is it something else? How can I know what to change in order to improve the link time?

Update

There are many "generic" advices such as "reduce library dependencies" but they seem impractical in our case. Our code-base is large, there are many library dependencies, and finding out, by experimenting, which dependency affects the link time the most will take an enormous amount of time. A large portion of the code base was developed years ago without thinking that much about dependencies. We are looking for a way to find a concrete direction, such as "dependency of X on Y will benefit the link time", without exhaustively trying all possible directions..

Note that we are not using LTCG at all.

For Visual C++, I think the first step in link-time optimization is to turn off the 'Whole Program Optimization (/GL)' option.

I would like to recommend a book on this subject: Large-Scale C++ Software Design, by John Lakos. This book gives many good pointers on large-scale C++ development, but I think its main theme is 'how to design package relationships to minimize linking time'.

It is about module (lib, DLL) dependency minimization techniques, because linking a project that consists of many small modules tends to be faster than linking one big project with many files.

Also, check out this blog post: The “Large-Scale C++ Software Design” rules in practice

Since I am not a C++ ninja, dependencies always creep into my programs. Someone may have asked a similar question before, but I want more direct responses. I'm asking the C++ ninjas out there to suggest good references for idioms supported in C++ that minimize interdependencies in code.

Large-Scale C++ Software Design is a good resource.

Some quick tips to reduce dependencies.

  1. Forward declaration when possible.

  2. Use PIMPL

Program 1:

#include <iostream>
#include <string>

std::string Hello(void){return "Hello";}
std::string World(void){return "world!";}

int main(){

    std::cout << Hello() << " " << World() << std::endl;
}

The functions Hello() and World() exist in the global namespace.
This means they can be called by any other function that comes after their declaration (or in this case, I provided the definition and did not need the declaration first).

I have a problem with this, because as my project gets bigger, I include more header files that fill the global namespace with lots of functions, and I risk having function signature collisions, or worse, accidentally calling the wrong function that should only be called as a sub-task of another function.

I am trying to follow the paradigm of functionally decomposing a task into sub-tasks, and thus it would not make sense for particular functions ever to be called outside the scope of another particular function.

Here is a work-around; apart from the code becoming unreadable due to the indentation depth, I want to know whether there are any performance or implementation gotchas. Lambda functions are a bit of magic for me at the moment, so I'm curious about unforeseen dangers.

Program 2:

#include <iostream>
#include <string>

int main(){

    auto Hello = [](void) -> std::string{return "Hello";};
    auto World = [](void) -> std::string{return "world!";};

    std::cout << Hello() << " " << World() << std::endl;
}

Hello() and World() are encapsulated inside main() and cannot be called from outside main()'s scope.
Is that all that's different?

I would not do it. In your case you will end up creating huge functions just so the lambdas can be defined inside them, and the function bodies will become much harder to maintain.

There were very large projects before C++11 and lambdas, and the risk of collisions has been managed by different means: namespaces, as DeadMG already mentioned, but also classes, object-oriented design, and other forms of encapsulation (for example, defining local functions as static at namespace level in the implementation file, forcing internal linkage and avoiding conflicts with other translation units). Above that, if you carefully choose meaningful names for your identifiers, you should avoid 99% of the collisions.
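A minimal sketch of that last technique, reusing the question's functions:

#include <iostream>
#include <string>

// 'static' at namespace scope gives these internal linkage: they are
// invisible to other translation units, so they cannot collide with
// same-named functions elsewhere. (An unnamed namespace achieves the
// same effect and is the more modern style.)
static std::string Hello() { return "Hello"; }
static std::string World() { return "world!"; }

int main() {
    std::cout << Hello() << " " << World() << std::endl;
}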

If you really are going to work on a large-scale project, consider reading John Lakos's Large-Scale C++ Software Design.

To speed up the compilation of a large source file, does it make more sense to prune back the sheer number of headers used in a translation unit, or does the cost of compiling code far outweigh the time it takes to process a header whose contents are skipped by its include guard?

If the latter is true, engineering effort would be better spent creating more, lighter-weight headers rather than fewer.

So how long does it take for a modern compiler to handle a header that is effectively include-guarded out? At what point would the inclusion of such headers become a hit on compilation performance?

(related to this question)

Assuming C/C++, recompilation of header files scales non-linearly for a large system (hundreds of files), so if compilation performance is an issue, it is very likely down to that; at least unless you are trying to compile a million-line source file on a 1980s-era PC...

Precompiled headers are available for most compilers, but they generally require specific configuration and management to work on non-system headers, which not every project undertakes.

See for example:

http://www.cygnus-software.com/papers/precompiledheaders.html

'Build time on my project is now 15% of what it was before!'

Beyond that, you need to look at the techniques in:

http://www.amazon.com/exec/obidos/ASIN/0201633620/qid%3D990165010/002-0139320-7720029

Or split the system into multiple parts with clean, non-header-based interfaces between them, say .NET components.
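One technique from that book that speaks directly to this question is the redundant (external) include guard: wrapping each #include in the same guard the header uses internally, so the preprocessor does not even have to open and rescan an already-included file. A minimal sketch with a hypothetical header:

// in any file that needs Widget.h, assuming Widget.h guards itself
// with INCLUDED_WIDGET_H
#ifndef INCLUDED_WIDGET_H
#include "Widget.h"
#endif

Most modern compilers detect the classic internal-guard pattern and skip reopening such headers on their own, so this mainly pays off on older toolchains; measuring first is advisable.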