C is an old-school programming language that almost every professional programmer has learned. Still, it never fails to surprise me each time I dig a little deeper, as it’s full of small details, some hardly noticed, such as this one I recently discovered by accident.
Consider the following two C files:
foo.c:
void foo(int c, int d, int e)
main.c:
int foo(char c);
At first sight, you’d probably laugh and think: “What the heck is this? These are elementary mistakes that even a CS101 student wouldn’t make. This definitely won’t compile.”
Is it really so?
Try the following command to compile them:
$ gcc main.c foo.c -o hello
Or if you’re an LLVM fan:
$ clang main.c foo.c -o hello
How do they complain?
They don’t. I’ve tested this on my Ubuntu Linux machine with gcc-4.8, gcc-4.9, and clang-3.5. None of them complained about anything. The files compiled successfully!
Surprised? If you’re an expert in C and in how compilers work, you’d think it’s quite normal. Well, I’m not, so I was quite astonished when I first saw this.
Why would this happen?
Simply put, it’s because C linkers don’t do type checking for functions. Each C file is first compiled on its own into an object file, which lists the symbols it defines and the external symbols it still needs. The compiler never sees the two files together, so it has no chance to compare the declaration with the definition. In this particular case, the object file for main.c defines main and refers to foo, which main.c only declares, while the object file for foo.c defines foo. When the linker notices that foo is left undefined in the object file for main.c, it searches the symbols defined by all the other object files and finds a hit in the one that foo.c compiles to. As a function symbol in an object file records only the function’s name, no return type or parameter type checking is done. The linker happily accepts the mismatched foo as a match and uses it in the main function.
C++, on the other hand, uses name mangling to encode a function’s parameter types into its symbol name, so this kind of mismatch is caught. The same code won’t build with a C++ compiler: try g++ or clang++, and the link step fails because no symbol matches the declared foo. Some other compilers or IDEs with static checking may also notice this error.
So, what would happen if you actually run it?
I got the following results in one run:
The param is 100, -1549285384, -1549285368
The 100 is easy to understand. The two values that follow are d and e. As main doesn’t pass these two parameters, the foo function happily reads whatever happens to be in the registers or stack slots where they would have been passed. In this case, that’s garbage.
And what about that 43 returned from the foo function? That doesn’t look like garbage. Actually, if you run this broken program enough times, you’ll notice this value is always somewhere around 30~50. So this mysterious number could be something more than garbage. Is it the meaning of life? No, that’s 42. Is it something on the wall of Sheldon’s secret room? Probably.
So what is it exactly?
After poking around in gdb for a while, I confirmed my guess that this is actually the return value of the printf call inside foo. On x86 machines, C programs normally use the eax register to carry integer return values, so main loads the value of a from eax when it tries to read the return value of foo. As foo has a void return type, it never sets this register itself. The register is left untouched after printf returns inside foo, and its value is saved directly to the integer a in main.
The following is the dump inside gdb. This time I have 31 as foo’s return value.
Run till exit from #0 __printf (format=0x400637 "The param is %d, %d, %d\n") at printf.c:32
We all know the return value of printf is the number of characters written to stdout. So the mysterious return value is actually the number of characters printed in the first line of output. You can count to confirm, and don’t forget the newline character.
Afterthoughts
I remember someone joked that C is but high-level syntactic sugar for assembly. Now it looks to me that it’s also low-level enough to expose lots of assembly-level features. It’s never an easy task to understand all these little details of C, or of any other language, but it’s probably a must if one wishes to become a qualified programmer in it.