您现在的位置是:首页 >技术交流 >通过例子深入了解c++/c的构建系统网站首页技术交流

通过例子深入了解c++/c的构建系统

张博208 2023-04-26 11:53:59
简介通过例子深入了解c++/c的构建系统

Understanding C/C++ Build system by building a simple project

C/C++ is the mother of many popular programming languages out there today, all the fancy programming languages we use today like Python, JavaScript are built using C/C++. For example, the standard python interpreter CPython is built using C and the most popular JavaScript implementation V8 is built using C/C++, C/C++ also powers most of the underlying libraries used by Node.js, In other words, C/C++ powers most of the open source software ever written by humans. One of the main reasons we prefer high level languages like Python is because of the robust package management tools they provide, we don't have to worry about managing dependencies anymore pip automatically manages it for us. Same case holds true for JavaScript as well. These languages also have robust build systems which allow us to build and ship the software more easily.

C/C++ also has few popular build systems like cmake and bazel which manages dependencies automatically, but in this post, we will be compiling a C/C++ project without making use of these tools in order to understand how things work internally.

We will be building a simple system logger that logs total free RAM memory of the system every 5 seconds.

First things first! Let's create a project structure:

Project structure has to be easily understandable and should isolate different functionalities as much as possible to avoid confusion. No one will ever stop us from using our own project structure, but most of the open source projects built with C/C++ use this structure :

project_root
  - include
  - src
      - module-1
      - module-2
      - module-n
      - main.c/main.cc (depends on the project)
  - Makefile
  - README
  - LICENSE
  - misc files

Let's have a look at what each and every file/directory means:

  1. include - This is the place where all our header files live.
  2. src - The directory that contains all our source code. We can have multiple sub-directories/modules inside src. Also we can have a main function file inside src.
  3. Makefile : Makefiles are used by make command, we will be using make to build our project.

For our project we will have the following structure:

memlogger
   - bin  - will explain the need for this
   - include
       - free_memory_api.h
       - file_writer.h
   - src
       - free_memory_api
          - free_memory_api.c
       - file_writer
          - file_writer.c
       - main.c
   - Makefile

Let's Code

We can start writing code once we are done with the project structure setup. I will not be explaining the code in depth to avoid writing very long post, but we will stress more on the concepts.

What are header files?

Header files are blueprints of our actual C code. For every C module we write, it is a good practice to export the header-file. These header files are used by the compiler to understand what all functions are exported by a module. Once compilation is done, header files are not used anywhere. The actual use of header files comes into picture when our project/module is used as a module in some other project, other programmers can simply include our header file to use the function declarations we exported.

Let us create free_memory_api.h as per the structure :

#ifndef __FREE_MEMORY_API
#define __FREE_MEMORY_API
//this is our API function which returns free memory in bytes
unsigned long long get_free_system_memory();
#endif

Let us create file_writer.h which declares file writer API

#ifndef __FILE_WRITER_API
#define __FILE_WRITER_API

#include <stdio.h>
//opens the log file for writing
FILE * open_log_file(char * path);
// we will use this function to write contents to the log file
void write_log_to_file(FILE * file, unsigned long long free_memory);
//closes the log file
void close_log_file(FILE * file);
#endif

Let's define these APIs:

We declared what all APIs we need, but we did not write the underlying code for those APIs. We will be writing the code for file logging and getting free memory from the system. Before writing the code, we have to import the blueprint we declared before.
file_writer.c

#include "file_writer.h"

// Open the log-file in append mode and return it
FILE * open_log_file(char * file_path) {
    FILE * fp = fopen(file_path, "a");
    return fp;
}

// Close the file 
void close_log_file(FILE * fp) {
    if(fp) {
        fclose(fp);
    }
}

//write log entry into the file
void write_log_to_file(FILE * fp, unsigned long long free_memory) {
    if(fp) {
        fprintf(fp, "free_memory=%llu
", free_memory);
    }
}

Now let us define the free memory api i.e free_memory_api.c:

#include <sys/sysinfo.h>
#include "free_memory_api.h"

unsigned long long get_free_system_memory() {
    struct sysinfo info;
    if (sysinfo(&info) < 0) {
         return 0;
    }
    return info.freeram;;
}

And finally, main.c

#include "file_writer.h"
#include "free_memory_api.h"

#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2) {
        printf("Provide log file name
");
        return 0;
    } 

    unsigned long long free_memory = 0;
    while(1) {
        free_memory = get_free_system_memory();
        FILE * log = open_log_file(argv[1]);
        write_log_to_file(log, free_memory);
        close_log_file(log);
        sleep(5);
    }
}

Let's start building the project

Now that we have written the code, it is time to compile the project. Now we have multiple modules in our project. These modules can be linked together to build a standalone executable, or we can build individual modules alone as shared libraries and link them together at runtime.

Building a static-monolithic executable

In this section, we build a single binary that can be shipped, there are many ways we can build a C/C++ project. In this post we will only build a standalone executable which is the most easiest way of building a C project.
We make use of make a Linux command-line utility that can automate any task, it is a series of shell commands which can be grouped and tagged under a name to perform a specific task. We can write multiple such tasks conveniently using a Makefile. Let's see our Makefile now.

COMPILER=gcc

file_writer:
    @$(COMPILER) -c src/file_writer/*.c -Iinclude/ -o bin/file_writer.o
    @echo "Built file_writer.o"

free_memory_api:
    @$(COMPILER) -c src/free_memory_api/*.c -Iinclude/ -o bin/free_memory_api.o 
    @echo "Built free_memory_api.o"

project:
    $(COMPILER) -c src/main.c -Iinclude/ -o bin/main.o
    @$(COMPILER) bin/free_memory_api.o bin/file_writer.o bin/main.o -o memlogger
    @echo "Finished building memlogger"

Even though we can build the entire project in a single command, I have divided this into three phases.

  1. file_writer : This make rule will generate file_writer.o object file under ./bin.
  2. free_memory_api : This rule generates free_memory_api.o under ./bin.
  3. project : This builds the entire project,it generates main.o and links main.o with other two object files to create a standalone executable called memlogger.

Let's execute these commands with make:
Step-1. file_writer.o

   make file_writer

Step-2. free_memory_api.o

   make free_memory_api

Step-3. Final binary:

   make project

Executing the binary:

We can execute the binary by running it like a normal linux executable:

memlogger logs.txt

After 15 seconds, we see 3 entries in the log file:

free_memory=8322523136
free_memory=8330776576
free_memory=8335728640

Which is approximately 8GB free out of 16GB RAM and it is correct. hurray! we created a small system logger.

What happened under the hood??

It is important to understand the build process we used here. To understand that, we need to know the concept of object files.

What we did in the Makefile??

We defined three rules, each rule builds a module, the third rule goes ahead by one step and links all the three modules.
We used gcc compiler, (g++ for C++ projects). Options used:

  1. -c : This tells the compiler to only compile and don't perform linking, since we are explicitly linking the object files in Step-3.
  2. -I : Since we defined our own headers, we have to provide it to the compiler during compile time, by default compiler searches for these headers in standard locations, we can also tell the compiler to include our custom location for resolving the headers using -I.
  3. -o : The output file name.

What are object files?

An object file is a linux ELF - Executable and linkable format binary produced by the compiler. ELF is designed by POSIX standards and all Linux distributions can understand what an object file is. What this object file contains in layman terms is a mapping table and symbol definitions.

  1. mapping-table or Symbol table : The mapping table contains a set of symbols defined by the object file and an offset in text segment where actual code for the symbols are defined.
  2. Symbol definitions : This section contains machine code of all our functions. So, to get the machine code of a function/symbol we do these two steps: First, we lookup the symbol table of the object file to get its offset in the text segment. Second, we go to the text segment and obtain its code.

These are just layman terms, ELF has standard definitions for mapping-table and Segment definitions which are confusing for a beginner.

Let's see the object file contents of file_writer.o, we use a tool called readelf which is default in all the linux systems.

readelf -h bin/file_writer.o

Output:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          1168 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         13
  Section header string table index: 12

These are ELF headers. Now let's see the Symbol table (or mapping table in our terms)

readelf --syms bin/file_writer.o

Output:

Symbol table '.symtab' contains 16 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file_writer.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7 
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    8 
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
     9: 0000000000000000    41 FUNC    GLOBAL DEFAULT    1 open_log_file
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND fopen
    12: 0000000000000029    34 FUNC    GLOBAL DEFAULT    1 close_log_file
    13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND fclose
    14: 000000000000004b    54 FUNC    GLOBAL DEFAULT    1 write_log_to_file
    15: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND fprintf

As we can see in the table, we have entries for open_log_fileclose_log_filewrite_log_to_file which are the API functions we defined. Hurray! Our object file is correct. Also, if you observe carefully, we see the presence of fclosefopen and fprintf and they are prefixed with UND which means these symbol addresses are not known yet, but C/C++ runtime resolves them during linking which is in step-3, it either links to these functions statically or dynamically during the runtime, we will see the concept of shared libraries in the next part.

Similarly, we can run the same command for free_memory_api.o to see it's symbol table. We get the output as :

Symbol table '.symtab' contains 14 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS free_memory_api.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7 
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    8 
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
     9: 0000000000000000    89 FUNC    GLOBAL DEFAULT    1 get_free_system_memory
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND sysinfo
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
    13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __stack_chk_fail

We can see get_free_system_memory as a FUNC and sysinfo which is undefined.

What we did in the final step?

In the first two steps, we compiled the modules and generated the object files, but they can't be executed because they don't have the main function definition which is the entry-point of any C/C++ program. We have two commands in Makefile under project rule (step-3), the first command only compiles the main.c file into main.o, let's try to run it, it should run because it has main function.

./main.o

Output:

bash: ./bin/main.o: cannot execute binary file: Exec format error

We cannot run it because it is not an executable file, it is an object file, the final step is still remaining which links all the three object files and generate the final executable binary.
Before that we will try to link only main.o and discard the remaining two modules, let's see what happens:

gcc bin/main.o -o memlogger

Output:

bin/main.o: In function `main':
main.c:(.text+0x36): undefined reference to `get_free_system_memory'
main.c:(.text+0x4d): undefined reference to `open_log_file'
main.c:(.text+0x64): undefined reference to `write_log_to_file'
main.c:(.text+0x70): undefined reference to `close_log_file'
collect2: error: ld returned 1 exit status

This is exactly what was supposed to happen, the executable file needs the following functions but don't know where they are. So we need to link it with remaining two object files.

gcc bin/free_memory_api.o bin/file_writer.o bin/main.o -o memlogger

Now the compiler looks into the Symbol tables of file_writer.o and free_memory_api.o to resolve the functions which were undefined in our previous command. Since the Symbol tables of these two object files defines those symbols/functions the linking is successful and the final executable is generated.

Let's see the mapping table or Symbol table of our memlogger binary:

readelf --syms memlogger

Output:

    26: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    27: 0000000000000780     0 FUNC    LOCAL  DEFAULT   14 deregister_tm_clones
    28: 00000000000007c0     0 FUNC    LOCAL  DEFAULT   14 register_tm_clones
    29: 0000000000000810     0 FUNC    LOCAL  DEFAULT   14 __do_global_dtors_aux
    30: 0000000000201010     1 OBJECT  LOCAL  DEFAULT   24 completed.7698
    31: 0000000000200d88     0 OBJECT  LOCAL  DEFAULT   20 __do_global_dtors_aux_fin
    32: 0000000000000850     0 FUNC    LOCAL  DEFAULT   14 frame_dummy
    33: 0000000000200d80     0 OBJECT  LOCAL  DEFAULT   19 __frame_dummy_init_array_
    34: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS free_memory_api.c
    35: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file_writer.c
    36: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
    37: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    38: 0000000000000c5c     0 OBJECT  LOCAL  DEFAULT   18 __FRAME_END__
    39: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS 
    40: 0000000000200d88     0 NOTYPE  LOCAL  DEFAULT   19 __init_array_end
    41: 0000000000200d90     0 OBJECT  LOCAL  DEFAULT   21 _DYNAMIC
    42: 0000000000200d80     0 NOTYPE  LOCAL  DEFAULT   19 __init_array_start
    43: 0000000000000a78     0 NOTYPE  LOCAL  DEFAULT   17 __GNU_EH_FRAME_HDR
    44: 0000000000200f80     0 OBJECT  LOCAL  DEFAULT   22 _GLOBAL_OFFSET_TABLE_
    45: 0000000000000a30     2 FUNC    GLOBAL DEFAULT   14 __libc_csu_fini
    46: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
    47: 0000000000201000     0 NOTYPE  WEAK   DEFAULT   23 data_start
    48: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@@GLIBC_2.2.5
    49: 0000000000201010     0 NOTYPE  GLOBAL DEFAULT   23 _edata
    50: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fclose@@GLIBC_2.2.5
    51: 0000000000000a34     0 FUNC    GLOBAL DEFAULT   15 _fini
    52: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __stack_chk_fail@@GLIBC_2
    53: 00000000000008dc    34 FUNC    GLOBAL DEFAULT   14 close_log_file
    54: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@@GLIBC_2.2.5
    55: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
    56: 0000000000201000     0 NOTYPE  GLOBAL DEFAULT   23 __data_start
    57: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fprintf@@GLIBC_2.2.5
    58: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    59: 0000000000201008     0 OBJECT  GLOBAL HIDDEN    23 __dso_handle
    60: 0000000000000a40     4 OBJECT  GLOBAL DEFAULT   16 _IO_stdin_used
    61: 000000000000085a    89 FUNC    GLOBAL DEFAULT   14 get_free_system_memory
    62: 00000000000009c0   101 FUNC    GLOBAL DEFAULT   14 __libc_csu_init
    63: 0000000000201018     0 NOTYPE  GLOBAL DEFAULT   24 _end
    64: 0000000000000750    43 FUNC    GLOBAL DEFAULT   14 _start
    65: 0000000000201010     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
    66: 0000000000000934   130 FUNC    GLOBAL DEFAULT   14 main
    67: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fopen@@GLIBC_2.2.5
    68: 00000000000008fe    54 FUNC    GLOBAL DEFAULT   14 write_log_to_file
    69: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND sysinfo@@GLIBC_2.2.5
    70: 0000000000201010     0 OBJECT  GLOBAL HIDDEN    23 __TMC_END__
    71: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
    72: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND sleep@@GLIBC_2.2.5
    73: 00000000000008b3    41 FUNC    GLOBAL DEFAULT   14 open_log_file
    74: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@@GLIBC_2.2
    75: 0000000000000698     0 FUNC    GLOBAL DEFAULT   11 _init

Since the Symbol table is very big, I have pasted entries starting from 26. As you can see our final executable has all our definitions from all the three modules and also there are no UND symbols, these symbols are now replaced with something like fopen@@GLIBC_2.2.5. This means, the code for these functions are not copied into our binary, instead they have to be resolved during the runtime, it is the responsibility of Linux loader ld.so to link these symbols dynamically during runtime.

So this is it! We are done with the first part of the Post. If you are reading this line. I would like to really thank you for reading the entire Post, keep the compliments even if you skipped everything and came here directly.

Thank you :) Have a good time.

风语者!平时喜欢研究各种技术,目前在从事后端开发工作,热爱生活、热爱工作。