C++: multiple source files

From Wikiid
Revision as of 20:02, 3 November 2010 by SteveBaker (Talk | contribs) (Makefiles)

This HOW-TO guide explains how (and why) you separate out your C++ code into multiple source files.

The Single File Approach

You can put a small C++ program into a single source file. For example:

 #include <stdio.h>
 #include <stdlib.h>
 #include <iostream>
 #include <iomanip>
 #include <fstream>
  
 class MyClass
 {
   int x ;
   int y ;
   int z ;
 public:
    MyClass () { x = y = z = 123 ; }  // Constructor
   ~MyClass () { /* do nothing */ }   // Destructor
    int GetSum () { return x + y + z ; }
 } ;
  
 int main ( int argc, char **argv )
 {
   MyClass a ;
   std::cout << a.GetSum () ;
 }

You can compile it either with a Makefile - or just by typing in the g++ command directly.

This will work fine - but once your program starts to get large, it quickly becomes unwieldy.

Multiple Source Files

It is more common to adopt the Java convention of splitting your program up with each class in a separate file. In C++, the more complex classes typically need two files each. If your class has complicated member functions that shouldn't be 'inline' functions - and especially if it has static members - then you need a 'header' file (with the ".h" extension) for the declaration of the class - and a separate source file (with the ".cpp" extension, or ".cxx" as some people prefer) for the implementation of the class.
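As a minimal sketch of why static members push you towards a separate source file: a static data member is only declared inside the class, and it has to be defined exactly once in some ".cpp" file. The class name Counter and its members are hypothetical, and the two 'files' are shown in one listing (separated by comments) so that the sketch compiles as a single unit.

```cpp
// ---- Counter.h (hypothetical): the declaration of the class ----
class Counter
{
  static int count ;              // Declaration only - no storage is allocated here
public:
  Counter () { ++count ; }        // Short enough to leave in the header
  static int HowMany () ;         // Implemented in the .cpp file
} ;

// ---- Counter.cpp (hypothetical): would begin with #include "Counter.h" ----
int Counter::count = 0 ;          // The one and only definition of the static member

int Counter::HowMany ()
{
  return count ;
}
```

If the definition of Counter::count were placed in the header instead, every ".cpp" file that included it would produce its own copy and the link step would typically fail with a 'multiple definition' error.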

You could (in principle) give these files any names you liked - but that way lies madness! It is better to adopt the Java rule where every class is placed into a file with the same name as the class. For example:

In MyClass.h

 class MyClass
 {
   int x ; 
   int y ;
   int z ;
 public:
   MyClass () { x = y = z = 123 ; }  // A really simple constructor
  ~MyClass () { /* Do nothing */ }   // A really simple destructor
   int GetSum () { return x + y + z ; }  // A simple function
   void AReallyReallyComplicatedFunction ( int a, int b ) ;  // Too complicated to put here.
 } ;

In MyClass.cpp

 #include "main.h"
  
 void MyClass::AReallyReallyComplicatedFunction ( int a, int b )
 {
   // ...lots and lots of C++ code...
 }

In main.h

 #include <stdio.h>
 #include <stdlib.h>
 #include <iostream>
 #include <iomanip>
 #include <fstream>
  
 #include "MyClass.h"

In main.cpp

 #include "main.h"
  
 int main ( int argc, char **argv )
 {
   MyClass a ;
   a.AReallyReallyComplicatedFunction ( 1, 2 ) ;
   std::cout << a.GetSum () ;
 }

So here we have the "header file" called MyClass.h - which contains the declaration of MyClass - and all of the really simple functions (maybe everything under 5 or so lines long is a good rule of thumb). Every ".cpp" file that implements or uses MyClass has to #include this header file. For the very simplest classes, you may not need a ".cpp" file - in which case you can just leave it out.

Each "MyClass.cpp" file must #include (at a minimum) the MyClass.h file so that it can 'see' the class definition at compile time...but it also has to #include the header files for all of the classes that it references (including system headers for classes such as I/O, math, etc). However, in a complicated program with dozens to hundreds of classes, it can get really hard to remember all of the header files you need - and there are always half a dozen system header files to include (things like 'iostream', which declares std::cout, for example). Hence, a common thing is to make a 'main.h' header that includes all of the other headers - so you only have to remember to stick a '#include "main.h"' at the top of every .cpp file - and you're good to go.

Then we have MyClass.cpp which contains the implementation of the function "AReallyReallyComplicatedFunction" - and we have "main.cpp" that contains our main program. Usually, this file is named after the program itself rather than "main" - so if this is MyFirstVideoGame then we'd probably call the files "MyFirstVideoGame.h" and "MyFirstVideoGame.cpp".
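Because the body of AReallyReallyComplicatedFunction is elided in the example, the sketch below fills it with a made-up body purely to show the mechanics of declaring a function in the header and defining it in the ".cpp" file. The arithmetic in the body is an assumption for illustration, not the original code, and the two files are shown in one listing so that the sketch compiles as a single unit.

```cpp
// ---- MyClass.h: declaration plus the short inline functions ----
class MyClass
{
  int x ;
  int y ;
  int z ;
public:
  MyClass () { x = y = z = 123 ; }
  int GetSum () { return x + y + z ; }
  void AReallyReallyComplicatedFunction ( int a, int b ) ;  // Declared only
} ;

// ---- MyClass.cpp: would begin with #include "main.h" ----
// Hypothetical body: stands in for the "lots and lots of C++ code".
void MyClass::AReallyReallyComplicatedFunction ( int a, int b )
{
  x = a ;
  y = b ;
  z = a + b ;
}
```

With this hypothetical body, a freshly constructed MyClass has a sum of 369, and after calling AReallyReallyComplicatedFunction ( 1, 2 ) the sum becomes 6.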

Why separate declaration from implementation?

The reason we separate out the big functions of a class into a separate ".cpp" file rather than just sticking the whole thing into one gigantic file (Java-style) is two-fold:

  1. The header file "MyClass.h" is included into every ".cpp" file that needs it - so if the file is huge, it'll take much longer to compile your program since the compiler has to compile the entire thing many, many times over. When you build programs that take 20 minutes to compile, this is no joke!
  2. When you put the source code for a function inside the class definition (ie, in the header file) - the code is typically "inlined" - meaning that the compiler puts a complete copy of that code every place it's called. This produces faster code because it avoids the need to actually "call" the function - the code is inserted right where it's needed. But for long functions (and 5 lines or so is a good rule of thumb) this can make for a VERY large executable program...and that can actually slow things down for arcane reasons relating to memory caching. Also, in a large function, the overhead of putting just one copy someplace and calling it each time it's needed is typically negligible compared to the time the function takes to execute. So the speedup to be gained from "inline" is small.

When to #include header files

I have suggested (above) that you collect together all of the header files you need and stick them into "main.h". This is a considerable convenience for the programmer in that you'll never forget which header files to include - or in what order they need to be. (In C++ you must declare something before you use it - so if MyOtherClass.h uses MyClass then you must #include the MyClass.h file BEFORE MyOtherClass.h.)
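The ordering rule can be sketched as follows. MyOtherClass here is a hypothetical class that holds a MyClass by value, so the compiler must already have seen the full definition of MyClass by the time it reaches MyOtherClass. The contents of the two headers are shown in the order in which "main.h" would have to include them, with MyClass cut down to a single member to keep the sketch short.

```cpp
// ---- MyClass.h: must be included first ----
class MyClass
{
  int x ;
public:
  MyClass () { x = 123 ; }
  int GetX () { return x ; }
} ;

// ---- MyOtherClass.h (hypothetical): must be included second ----
class MyOtherClass
{
  MyClass inner ;   // Needs the complete definition of MyClass above
public:
  int GetInnerX () { return inner.GetX () ; }
} ;
```

Swapping the two halves produces a compile error, because MyClass is an unknown type at the point where 'inner' is declared.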

However, in very large projects, you can end up with a LOT of header files. In my current project there are 2,415 of them! If every ".cpp" file includes "main.h" and if that header includes 2,415 other header files then the compiler has to open, read, compile and close 2,415 files for each ".cpp" file I compile. Since I have 1,257 ".cpp" files, that would mean that to compile my project from scratch, it would have to open, read and compile 2,415 x 1,257 = 3,035,655 files - something over 3 million! This is S-L-O-W!

Hence, one eventually has to split the project into chunks - and recognize that each chunk shares a lot of declarations internally, most of which are not needed outside of that chunk. For example, the graphics chunk of my project might provide two header files, "graphics.h" and "graphicsPrivate.h". The "graphicsPrivate.h" file contains #include's for all of the graphics header files (a couple of dozen of them) - but "graphics.h" includes only the few 'interface' class headers that things outside of the graphics system care about. So all of the graphics ".cpp" files start with #include "graphicsPrivate.h" and perhaps #include "physics.h" (etc) - but all of the physics ".cpp" files start with #include "physicsPrivate.h" and #include "graphics.h".

This is messy and causes no end of grief - but it keeps compilation times within more reasonable bounds...so it's a practical compromise between computer time wastage and programmer brain time wastage.

But for projects with only a dozen or two classes - the practice of keeping it simple and stuffing all of the #include's into a single "main.h" makes a lot of sense.

Multiple inclusions

When the system gets more complicated, there is a near certainty that this complex layering of include files will eventually result in the same header file being included more than once. In the above case, we might have "MatrixMath.h" which is needed by both the physics and the graphics system...hence it's #included in "graphics.h", "graphicsPrivate.h", "physics.h" and "physicsPrivate.h" - so when a ".cpp" file #includes both "graphics.h" and "physics.h", the "MatrixMath.h" file gets pulled in twice. This typically results in compiler errors as the same class is defined two or more times.

The pragmatic solution to this is to always start your header files with:

 #ifndef MYCLASS_H_
 #define MYCLASS_H_  1
  
 class MyClass
 {
   ...whatever...
 } ;
  
 #endif

...the "#ifndef...#endif" part says "only compile this stuff if the symbol MYCLASS_H_ is NOT defined". The second line goes and defines that symbol. Hence, the first time the file is #include'd into a particular ".cpp", the symbol will be undefined and the compiler will compile the declaration of MyClass. In so doing, it'll also #define the symbol MYCLASS_H_ to "1". The second and subsequent times you #include this header, the symbol will be defined and the compiler will rapidly skip the entire contents of the file without recompiling it. The all-capitals name with its trailing underscore is an effort to ensure that this symbol is kinda unique looking - and isn't going to get inadvertently used for something else. (Don't start the symbol with an underscore followed by a capital letter: C++ reserves names of that form for the compiler and its libraries.)

It's a bit of a kludge - but it's an almost universal one in complex systems.
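To see the guard at work without creating several files, the sketch below pastes the contents of a hypothetical guarded header into one listing twice - which is effectively what the preprocessor does when "MatrixMath.h" is reached through two different include paths. The Vec2 type, the Dot function and the guard name MATRIXMATH_H_INCLUDED are assumptions made for illustration.

```cpp
// First inclusion: the guard symbol is not yet defined, so this is compiled.
#ifndef MATRIXMATH_H_INCLUDED
#define MATRIXMATH_H_INCLUDED

struct Vec2
{
  float x ;
  float y ;
} ;

inline float Dot ( const Vec2 &a, const Vec2 &b )
{
  return a.x * b.x + a.y * b.y ;
}

#endif

// Second inclusion: the guard symbol is now defined, so everything
// up to the matching #endif is skipped and Vec2 is not redefined.
#ifndef MATRIXMATH_H_INCLUDED
#define MATRIXMATH_H_INCLUDED

struct Vec2
{
  float x ;
  float y ;
} ;

inline float Dot ( const Vec2 &a, const Vec2 &b )
{
  return a.x * b.x + a.y * b.y ;
}

#endif
```

Removing both pairs of guard lines makes the compiler reject the listing with a redefinition error for Vec2, which is exactly the failure described above.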

Makefiles

The Makefile for a multi-file C++ program is a little more complex. To make the program "myprog" which has the "main" function in "myprog.h/.cpp", and classes "trig.h/.cpp", "geom.h/.cpp" and "utils.h/.cpp", you'd need:

 # Object files:
 OBJ = myprog.o trig.o geom.o utils.o
 # Header files:
 HDR = myprog.h trig.h geom.h utils.h
 # Libraries that we need to link to:
 LIBS = -lGL -lX11 -lm
 # One rule to invoke what we need:
 all : myprog
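 # C++ 'compile' and 'link' commands
 # ($@ is the target, $< the first prerequisite, $^ all prerequisites):
 CPP = g++ -c -o $@ $<
 LINK= g++ -o $@ $^ ${LIBS}
 # Generic rules (the indented command lines must begin with a TAB character):
 %.o : %.cpp ${HDR}
       ${CPP}
 # The final step:
 myprog : ${OBJ}
       ${LINK}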

This Makefile says that each ".o" file depends on its own ".cpp" file and on all of the files listed in the HDR line. That's true when we do the simple thing of including all of the class headers into myprog.h and then including myprog.h into every .cpp file. However, in more complex examples (like the "graphics.h" and "graphicsPrivate.h" example above), the Makefile can get crazily complicated.

When the Makefile becomes too complex to maintain, it's best to look into more sophisticated tools such as "Automake", which generates the Makefile for you and works out which source files depend on which header files as a side effect of compiling your code. However, that's a much more complex matter and is well beyond what can be explained here.

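Here is a similar Makefile for a smaller program, "project3", which has a single class "Tokenizer":

 # Object files:
 OBJ = project3.o Tokenizer.o
 # Header files:
 HDR = Tokenizer.h
 # Libraries that we need to link to:
 LIBS = -lGL -lX11 -lm
 # One rule to invoke what we need:
 all : project3
 # C++ 'compile' and 'link' commands:
 CPP = g++ -c -o $@ $<
 LINK= g++ -o $@
 # Generic rules (the indented command lines must begin with a TAB character):
 %.o : %.cpp ${HDR}
        ${CPP}
 # The final step (object files are listed before the libraries they use):
 project3 : ${OBJ}
        ${LINK} ${OBJ} ${LIBS}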