Compilation Steps and Memory Layout of the C Program

If you are new to embedded systems, you have probably been asked this question: what are the stages of the compilation process, and what is the memory layout of a program? This tutorial discusses the compilation steps and memory layout of a C program.

Compilation Steps of C

Compiling a C program is a multi-stage process that uses several different tools. It consists of four stages:

  1. Preprocessing
  2. Compilation
  3. Assembly
  4. Linking

In this post, I’ll walk through each of the four stages of compiling the following C program:

Preprocessing

The first stage of compilation is called preprocessing. In this stage, lines starting with a # character are interpreted by the preprocessor as preprocessor commands.  Before interpreting commands, the preprocessor does some initial processing. This includes joining continued lines (lines ending with a \) and stripping comments.

To print the result of the preprocessing stage, pass the -E option to cc:

cc -E hello_world.c

Given the “Hello, World!” example above, the preprocessor will produce the contents of the stdio.h header file joined with the contents of the hello_world.c file, stripped of its leading comment.

Compilation

The second stage of compilation is, confusingly enough, called compilation. In this stage, the preprocessed code is translated to assembly instructions specific to the target processor architecture. These form an intermediate, human-readable language.

The existence of this step allows for C code to contain inline assembly instructions and for different assemblers to be used.

Some compilers also support the use of an integrated assembler, in which the compilation stage generates machine code directly, avoiding the overhead of generating the intermediate assembly instructions and invoking the assembler.

To save the result of the compilation stage, pass the -S option to cc:

cc -S hello_world.c

This will create a file named hello_world.s, containing the generated assembly instructions. The exact output depends on the compiler and the target platform; the author's was generated on Mac OS 10.10.4, where cc is an alias for clang.

Assembly

During the assembly stage, an assembler is used to translate the assembly instructions to machine code, or object code. The output consists of actual instructions to be run by the target processor.

To save the result of the assembly stage, pass the -c option to cc:

cc -c hello_world.c

Running the above command will create a file named hello_world.o, containing the object code of the program. The contents of this file are in a binary format and can be inspected using hexdump or od, for example by running hexdump hello_world.o or od -c hello_world.o.

Linking

The object code generated in the assembly stage is composed of machine instructions that the processor understands, but some pieces of the program are out of order or missing. To produce an executable program, the existing pieces have to be rearranged and the missing ones filled in. This process is called linking.

The linker will arrange the pieces of object code so that functions in some pieces can successfully call functions in other pieces. It will also add pieces containing the instructions for library functions used by the program. In the case of the “Hello, World!” program, the linker will add the object code for the puts function.

The result of this stage is the final executable program. When run without options, cc will name this file a.out. To name the file something else, pass the -o option to cc:

cc -o hello_world hello_world.c

For quick reference:

[Figure: Compilation steps]

Memory Layout of the C Program

When you run a C program, its executable image is loaded into the computer's RAM in an organized manner. This arrangement is called the process address space, or the memory layout of the C program.
It is organized into the following segments:

  1. Text or Code Segment
  2. Initialized Data Segments
  3. Uninitialized Data Segments (bss)
  4. Stack Segment
  5. Heap Segment
  6. Unmapped or Reserved Segment

[Figure: Memory layout of a C program]

Text or Code Segment

The code segment, also known as the text segment, contains the machine code of the compiled program. The text segment of an executable object file is often read-only, which prevents the program from being accidentally modified. It is part of the executable image, which may be a .bin, .exe, or .hex file, for example.

As a memory region, the text segment may be placed below the heap or stack in order to prevent heap and stack overflows from overwriting it.

Data Segments

The data segment stores program data. This data can be initialized or uninitialized variables, and it can be local or global. The data segment is further divided into four sub-segments (the initialized data segment, the uninitialized or .bss data segment, the stack, and the heap), which store variables depending on whether they are local or global, and initialized or uninitialized.

Initialized Data Segment

The initialized data segment, or simply the data segment, stores all global, static, constant, and external variables (declared with the extern keyword) that are given an explicit non-zero initial value in the source code.

Note that the data segment is not read-only, since the values of the variables can be altered at run time.

This segment can be further classified into an initialized read-only area and an initialized read-write area.

All initialized global, static, and external variables are stored in the read-write area, except those declared const, which go in the read-only area.

Uninitialized Data Segment

The uninitialized data segment is also called the BSS segment. BSS stands for ‘Block Started by Symbol’, named after an old assembler operator. This segment contains all global and static variables that are initialized to zero or have no explicit initialization in the source code.

Stack Segment

The stack is where automatic variables are stored, along with information that is saved each time a function is called. Each time a function is called, the address of where to return to and certain information about the caller’s environment, such as some of the machine registers, are saved on the stack. The newly called function then allocates room on the stack for its automatic and temporary variables. This is how recursive functions in C can work. Each time a recursive function calls itself, a new stack frame is used, so one set of variables doesn’t interfere with the variables from another instance of the function.

A stack frame therefore contains data such as the return address, the arguments passed to the function, its local variables, and any other information needed by the invoked function.

A stack pointer (SP) keeps track of the top of the stack: each push or pop operation adjusts the stack pointer to the next or previous address.

The stack area traditionally adjoined the heap area and grew in the opposite direction; when the stack pointer met the heap pointer, free memory was exhausted. (With modern large address spaces and virtual memory techniques they may be placed almost anywhere, but they still typically grow in opposite directions.)

Heap Segment

The heap is the segment where dynamic memory allocation usually takes place.

The heap area begins at the end of the BSS segment and grows to larger addresses from there. The heap area is managed by malloc, realloc, and free, which may use the brk and sbrk system calls to adjust its size (note that the use of brk/sbrk and a single “heap area” is not required to fulfill the contract of malloc/realloc/free; they may also be implemented using mmap to reserve potentially non-contiguous regions of virtual memory into the process’ virtual address space). The heap area is shared by all shared libraries and dynamically loaded modules in a process.

Unmapped or reserved segment

The unmapped or reserved segment contains command-line arguments and other program-related data, such as the lower and higher addresses of the executable image.

The example below walks through the memory layout in practice.

Example

Step 1:

We will look at the memory layout of the program below.

Compile and Check memory

Step 2:

Let us add one uninitialized global variable to the program, then check the size of bss.

Compile and Check Memory

Step 3:

Let us add one uninitialized static variable, which is also stored in bss.

Compile and Check

Step 4:

Let us initialize the static variable to a non-zero value, which moves it to the initialized data segment (DS).

Compile and Check

Step 5:

Let us initialize the global variable to a non-zero value, which moves it to the initialized data segment (DS) as well.

Compile and Check

Last Word

In this tutorial we covered the steps involved in compilation and the memory layout of a C program, including its various segments (text or code, data, .bss, stack, and heap). I hope you enjoyed this article. Thanks for reading!