Background Information
Compilation
Consider the compilation process for a C file. Your compiler of choice parses the file and compiles it down to assembly. The assembler then takes over, creating an object file. Finally, a linker steps in to link the object file against other binaries to create your program! But how does the linker actually work?
Linker Basics
Let's say we have two files, print.c and call_print.c:
print.c
#include <stdio.h>

void print_me() {
    printf("meow!\n");
}
call_print.c
void print_me();

int main() {
    print_me();
}
If we were to compile call_print.c by itself, we end up with a linker error:
$ gcc call_print.c
/usr/bin/ld: /tmp/cc5dtrRn.o: in function `main':
call_print.c:(.text+0x5): undefined reference to `print_me'
collect2: error: ld returned 1 exit status
This makes sense: call_print.c declares that print_me exists, but it contains no definition for it. The definition only exists in print.c. As a result, we are trying to call code that doesn't exist! Let's try to compile again, this time with print.c as an additional argument for gcc.
~/temp/linking ❯ gcc print.c call_print.c
~/temp/linking ❯ ./a.out
meow!
It compiles and links properly! So what actually happened in between? We can take a closer look by telling gcc to compile both print.c and call_print.c into object files using the -c flag.
~/temp/linking ❯ gcc -c print.c call_print.c
~/temp/linking ❯ ls
call_print.c call_print.o print.c print.o
We can see that we have two new files: call_print.o and print.o. These are the object files for their respective source files. Let's take a look inside them to see what they hold.
What's inside an object file?
We will be using objdump to view the symbol table of each object file! We need to tell objdump what to read from the file, which in our case is the symbol table. This can be achieved with the -t flag:
-t, --syms: Display the contents of the symbol table(s)
~/temp/linking ❯ objdump -t print.o
print.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 print.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .rodata 0000000000000000 .rodata
0000000000000000 g F .text 0000000000000016 print_me
0000000000000000 *UND* 0000000000000000 puts
~/temp/linking ❯ objdump -t call_print.o
call_print.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 call_print.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 g F .text 0000000000000010 main
0000000000000000 *UND* 0000000000000000 print_me
Let's unpack this. Let's first look at the objdump spec for symbol entries:
The first column is the symbol's value (or address offset!).
The second column (l, g) says whether the symbol is (l)ocal or (g)lobal. A blank here means it is neither.
The third column (d) marks the symbol as a (d)ebugging symbol.
The fourth column (F, f) says whether the symbol is the name of a (F)unction or a (f)ile.
The fifth column (*ABS*, *UND*, .text) is the section the symbol is associated with. *ABS* means the symbol is absolute (i.e. not connected with any section), and *UND* means the symbol is referenced in the file being dumped, but not defined there.
The remaining two columns are the symbol's size (or alignment, for common symbols) and its name.
Using this, let's then look at call_print.o's symbols: We can see that there is a symbol called print_me in the *UND* section! We can then determine that it is an undefined symbol that is referenced in call_print.o but has no definition there. If we then look at print.o, we find that there is a (g)lobal (F)unction called print_me. We can thus infer that the declaration in call_print.c leaves an entry in call_print.o that “promises” the existence of a symbol called print_me defined somewhere else. The linker's job is then to find a matching defined symbol in another object file and resolve the reference to it, linking the two object files together!! Isn't that cool??
References
https://stackoverflow.com/questions/6666805/what-does-each-column-of-objdumps-symbol-table-mean
https://sourceware.org/binutils/docs/binutils/objdump.html#index-symbol-table-entries_002c-printing