Understanding the C Tool Chain

I find that in today’s age of IDEs like Visual Studio, Eclipse, Code::Blocks, etc., many developers get hung up on linker issues. This is completely understandable, as many languages have moved away from the compiler/linker paradigm toward single compilation units or virtual machine code. Also, these modern development environments hide many of the steps that it takes to get a working executable out of working code. The result is that most of the C developers I run into are very familiar with the C language itself, and are often experts in it, but unexpectedly struggle when presented with a build configuration issue. For that reason, I would like to explain the C tool chain from my perspective. There are many other resources out there on this topic, but my hope is that some people will benefit from my explanation.

Let’s start with the compiler. Compilers come under many different names and in many forms. For example, “cl” is Microsoft’s C/C++ compiler for Windows. You may never have known what it was called because Visual Studio calls it for you. Don’t be insulted, you are in good company. On Linux systems the usual compiler is “gcc”. The two do about the same job, and on a PC both generate x86 or x86_64 machine code, although they wrap it in different object file formats. There is also the up-and-coming Clang compiler, called “clang”, which runs on both Windows and Linux. You may have also heard of MinGW, which is a port of “gcc” to Windows (with some extra GNU tools). If you are developing for an embedded system, you have probably used the Tasking compiler or have had to cross-compile code for Arm.

The high-level idea behind compilers is pretty simple. You put a C file in and an object file pops out. There are some things you probably have to configure first, but I’ll leave that for a different blog post. But what is the object file? Can you run it? The answer is no, you can’t run it. The object file is full of machine code, but it is not a complete executable.

Let’s look at the Hello World program. In the “main” function you call the “printf” function to write text to “stdout” (standard output). Note that you are not writing the “printf” function here. The “printf” function lives someplace else. Most programmers would tell you that you need to add “#include <stdio.h>” to the top of your file to get access to the “printf” function. That is not quite true, although it is good practice. The header only declares “printf” so the compiler knows how to call it; it does not contain the function itself. Try leaving it out some time: older compilers build the program anyway and just warn about an implicit declaration, while recent versions of GCC and Clang treat it as an error unless you tell them otherwise. So “printf” does not live in the “stdio.h” include file. Therefore the compiler cannot build you a runnable file with the information you have given it. Instead it does the next best thing. It leaves a note for whoever gets the file next saying “Hey, they wanted to call this function named ‘printf’ but I have no idea what address ‘printf’ is at. If you find out, here is where the code needs to be updated. Good luck.” Every object file has a list of functions and global variables that it can’t find the address for. Every object file also has a list of functions and global variables that it has the address for. Together, these lists make up the object file’s symbol table.

Now, let’s look at the linker. The input to the linker is a list of object files, and the linker outputs an executable. At this point you can probably figure out what the linker is doing. It builds one huge list of symbols from all the object files. Then it pieces all the machine code together and fills in the addresses of all the previously unknown symbols. Finally, it puts all of that into a container that your operating system can run.

“But wait,” I can hear you ask, “what about the ‘printf’ function in the Hello World example? I didn’t provide an object file with that definition to the linker.” You are absolutely right! You did not. This is where things get a bit tricky. The linker knows some things automatically based on the system it is installed on. By default, it always links against a set of objects and libraries that come with your system. This is why your object file will be small, but for some reason the final executable is strangely large. The linker is quietly throwing in a bunch of code from your operating system and the standard library (maybe I will talk about this in another blog post).

Hopefully you found this information useful. I suggest looking for some diagrams of this process on the internet. I also suggest building a two-file C project on your own, without the aid of Visual Studio or Make. You should be able to get it done in no more than three commands.