# Thoughts on separate compilation (part 1)

The C programming language has the concept of separate compilation, which means that it is possible to keep program source in multiple files, compile them separately and then link them together into the final executable file. This way each individual source file has a manageable size and only the files that changed need to be recompiled. But it comes at the price of losing opportunities for code size and performance optimizations.

## Opportunity missed

Here is a small example that was inspired by the GNU ed editor, version 1.4. Let us start with the following "foo.c" file, which defines a static variable and two functions:

```c
#ifndef MY_STATIC
#define MY_STATIC
#endif

static volatile int foo;

MY_STATIC int get_foo() { return foo; }
MY_STATIC void set_foo(int value) { foo = value; }
```

In the GNU ed sources such functions are called in response to user input, but in our case we have to define the variable as volatile to prevent the compiler from realizing that all the manipulations are completely pointless. The macro "MY_STATIC" will allow us to compare static and non-static definitions.

Now, we create the file "main.c" with the following content:

```c
int main()
{
    int x = get_foo();
    set_foo(x + 12);
    return get_foo();
}
```

Clearly this is equivalent to a plain "return 12;", but since we marked the "foo" variable as volatile, the compiler is obliged to actually perform two reads and one write, no shortcuts.

And now three versions of putting these two parts together:

```c
// split.c (link with foo.o)
extern int get_foo();
extern void set_foo(int);
#include "main.c"

// combined-nonstatic.c
#include "foo.c"
#include "main.c"

// combined-static.c
#define MY_STATIC static
#include "foo.c"
#include "main.c"
```

In the case of "split.c", where "foo" and "main" are in different translation units, the compiler cannot see the definitions while compiling "main", and the linker is not capable of inlining access to "foo" (link-time optimization aside), so both "get_foo" and "set_foo" are included in the resulting binary. Here is the output of "objdump -d".

```objdump
000000000040101f <main>:
  40101f:  50                    push   %rax
  401020:  31 c0                 xor    %eax,%eax
  401022:  e8 10 00 00 00        callq  401037 <get_foo>
  401027:  8d 78 0c              lea    0xc(%rax),%edi
  40102a:  e8 0f 00 00 00        callq  40103e <set_foo>
  40102f:  31 c0                 xor    %eax,%eax
  401031:  59                    pop    %rcx
  401032:  e9 00 00 00 00        jmpq   401037 <get_foo>

0000000000401037 <get_foo>:
  401037:  8b 05 c3 2f 00 00     mov    0x2fc3(%rip),%eax        # 404000 <__bss_start>
  40103d:  c3                    retq

000000000040103e <set_foo>:
  40103e:  89 3d bc 2f 00 00     mov    %edi,0x2fbc(%rip)        # 404000 <__bss_start>
  401044:  c3                    retq
```

In both the static and the non-static combined approaches, the compiler inlined access to "foo", as can be seen in the output of "objdump -d":

```
000000000040101f <main>:
  40101f:  83 05 da 2f 00 00 0c  addl   $0xc,0x2fda(%rip)        # 404000 <__bss_start>
  401026:  8b 05 d4 2f 00 00     mov    0x2fd4(%rip),%eax        # 404000 <__bss_start>
  40102c:  c3                    retq
```

When "get_foo" and "set_foo" are static, they are eliminated by the compiler; when they are non-static, they can be eliminated by the linker if you provide the necessary flags, but it does not happen by default.

=> ./2021-01-17.1.gmi

So, having everything in one translation unit reduces the size of function "main" from 8 instructions to 3, from 24 bytes to 14, and eliminates two functions of 7 bytes each, for a total win of 24 bytes. For some reason, size(1) has a different idea and reports a difference of 80 bytes.

```
$ size -G combined-nonstatic combined-static split
   text    data     bss     dec     hex filename
    855       0     100     955     3bb combined-static
    855       0     100     955     3bb combined-nonstatic
    935       0     100    1035     40b split
```

I counted bytes in the disassembly of the whole .text section, and the difference is exactly 24 bytes. On the other hand, the total size of the binaries differs by 80 bytes, so size(1) definitely has a point:

```
$ stat -c '%s %n' split combined-static combined-nonstatic
9208 split
9128 combined-static
9128 combined-nonstatic
```

These are the sizes of static stripped binaries, compiled with dietlibc=0.34 and clang=11.1.0; your mileage may vary.

## Opportunity recovered

Now that we know that putting all code into a single translation unit can win us a few dozen bytes, let's think about how we can achieve it. These are just ideas, I haven't implemented any of this yet.

One doesn't just concatenate all source files together and call it a day, because of the following scenarios:

* Definitions of enumerations that are specific to a file. This is easy. We can safely rename every enumeration and every name within it, and things will keep working, since enumerations are essentially integers.
* Static functions and static variables having the same name in different files. This can be handled by automatically renaming static definitions into something unique. Building the AST of a preprocessed C file, finding static top-level definitions and renaming all references to them within the same file should be reasonably easy.
* Definitions of struct and union that are specific to a file. This is hard to do cleanly. If we decide to work with the preprocessed file, then we won't know whether a definition is local to the file and should be renamed, or whether it came from a header file and should not be renamed.

I see two mutually exclusive approaches to the struct and union problem. One is to rename every struct and union definition and rely on the fact that C compilers have traditionally accepted implicit conversion between incompatible pointer types. For example, the following snippet compiles (with a warning; some newer compilers treat it as an error by default) and works.

```
struct foo { int x; int y; };
struct bar { int x; int y; };

void accept_foo(struct foo*); /* defined elsewhere */

int main()
{
    struct bar o = { 0 };
    accept_foo(&o);
    return 0;
}
```

Functions that accept structures by value won't work, though. That is uncommon, but it happens. For example, GNU dbm does it.

The other approach is to build the AST of the concatenated source file and drop repeated identical struct/union definitions. If multiple definitions with the same name are not identical, human intervention would be required. It is quite uncommon to have multiple different struct or union definitions with the same name, so this approach may be viable.

Working with the non-preprocessed source files would require re-implementing the C preprocessor, and it still wouldn't handle the situation where one definition of structure "foo" is shared between two source files and another definition of structure "foo" is shared between two other source files, so I consider this approach strictly inferior to the ones described above.

## Further research

The fully stripped static executable discussed above has around 1 KB of code in the text section but takes around 8 KB on disk. The text section contains not only the "main" function but also the C runtime (the code that runs between "_start" and "main"), so that number is understandable. A file 8 times bigger than its code is quite an overhead, though.
=> https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
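
One way to start looking at where the remaining bytes go is to list the program headers of the binary. What follows is only a minimal sketch and was not part of the experiment above: it assumes a Linux system that provides <elf.h>, a 64-bit ELF file with the same byte order as the host, and the helper name "dump_segments" is hypothetical. It prints the file offset, in-file size and alignment of every segment, so gaps between the end of one segment and the start of the next become visible.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from any library): print where each segment
 * of a 64-bit ELF file lives inside the file.
 * Assumes the file has the same byte order as the host.
 * Returns the number of program headers, or -1 on error. */
int dump_segments(const char *path, FILE *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(f);
        return -1;                      /* not a 64-bit ELF file */
    }

    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        /* Program headers are e_phentsize bytes apart, starting at e_phoff. */
        long pos = (long)eh.e_phoff + (long)i * eh.e_phentsize;
        if (fseek(f, pos, SEEK_SET) != 0 ||
            fread(&ph, sizeof ph, 1, f) != 1) {
            fclose(f);
            return -1;
        }
        fprintf(out, "type=%#llx offset=%#llx filesz=%#llx align=%#llx\n",
                (unsigned long long)ph.p_type,
                (unsigned long long)ph.p_offset,
                (unsigned long long)ph.p_filesz,
                (unsigned long long)ph.p_align);
    }

    fclose(f);
    return eh.e_phnum;
}
```

Calling dump_segments("./split", stdout) from a small "main" and comparing each "offset" with the previous "offset + filesz" would show whether padding between segments accounts for a noticeable share of the file. Whether that is the case for the binaries above is not verified here.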