Get Simple Application Compiled

Let's try to compile simple application of infinite loop, called test_cpp_simple.

A linker script is required to get all the generated objects successfully linked. It states what code/data sections need to be loaded at what addresses as well as defines several symbols that may be required by the sources. Here is a good manual of linker script syntax and here is the linker script I use to get applications linked for Raspberry Pi platform.

Depending on your compiler, the link may fail because some symbols are missing. For example __exidx_start and __exidx_end are needed when the application is compiled with exceptions support, or __bss_start__ and __bss_end__ may be required by standard library if it contains the code for zeroing .bss section.

Every application must have a startup code usually written in Assembler. This startup code must perform the following steps: 1. Write the interrupt vector table at appropriate location (usually at address 0x0000). 1. Set the stack pointers for every runtime mode. 1. Zero the .bss section 1. Call constructors of global (static) objects (applicable only to C++) 1. Call the main function.

It may happen that compiler generates some startup code for you, especially if you haven't excluded standard library (stdlib) from compilation. To check whether this is the case, we need to analyse assembler listing of the successfully compiled and linked image binary. All the generated files for a test application will reside in <build_dir>/src/test_cpp/<app_name>. The assembler listing file will have kernel.list name.

Side note: the assembler listing can be generated using the following command:

arm-none-eabi-objdump -D -S app_binary > app.list

Open the listing file and look for function with CRT string in it. CRT stands for “C Run-Time”. When using this compiler, the function that compiler has generated, is called _mainCRTStartup. Let's take closer look what this function does.

00008198 <_mainCRTStartup>:

Load the address of the end of the RAM and assign its value to stack pointer (sp).

    8198:    e59f30f0     ldr    r3, [pc, #240]    ; 8290 <_mainCRTStartup+0xf8> 
    819c:    e3530000     cmp    r3, #0 
    81a0:    059f30e4     ldreq    r3, [pc, #228]    ; 828c <_mainCRTStartup+0xf4> 
    81a4:    e1a0d003     mov    sp, r3

Set the value of sp for various modes, the sizes of the stacks are determined by the compiler itself.

    81a8:    e10f2000     mrs    r2, CPSR 
    81ac:    e312000f     tst    r2, #15 
    81b0:    0a000015     beq    820c <_mainCRTStartup+0x74> 
    81b4:    e321f0d1     msr    CPSR_c, #209    ; 0xd1 
    81b8:    e1a0d003     mov    sp, r3 
    81bc:    e24daa01     sub    sl, sp, #4096    ; 0x1000 
    81c0:    e1a0300a     mov    r3, sl 
    81c4:    e321f0d7     msr    CPSR_c, #215    ; 0xd7 
    81c8:    e1a0d003     mov    sp, r3 
    81cc:    e2433a01     sub    r3, r3, #4096    ; 0x1000 
    81d0:    e321f0db     msr    CPSR_c, #219    ; 0xdb 
    81d4:    e1a0d003     mov    sp, r3 
    81d8:    e2433a01     sub    r3, r3, #4096    ; 0x1000 
    81dc:    e321f0d2     msr    CPSR_c, #210    ; 0xd2 
    81e0:    e1a0d003     mov    sp, r3 
    81e4:    e2433a02     sub    r3, r3, #8192    ; 0x2000 
    81e8:    e321f0d3     msr    CPSR_c, #211    ; 0xd3 
    81ec:    e1a0d003     mov    sp, r3 
    81f0:    e2433902     sub    r3, r3, #32768    ; 0x8000 
    81f4:    e3c330ff     bic    r3, r3, #255    ; 0xff 
    81f8:    e3c33cff     bic    r3, r3, #65280    ; 0xff00 
    81fc:    e5033004     str    r3, [r3, #-4] 
    8200:    e9532000     ldmdb    r3, {sp}^ 
    8204:    e38220c0     orr    r2, r2, #192    ; 0xc0 
    8208:    e121f002     msr    CPSR_c, r2 
    820c:    e243a801     sub    sl, r3, #65536    ; 0x10000 
    8210:    e3b01000     movs    r1, #0 
    8214:    e1a0b001     mov    fp, r1 
    8218:    e1a07001     mov    r7, r1

Load the addresses of __bss_start__ and __bss_end__ symbols and zero all the area in between.

    821c:    e59f0078     ldr    r0, [pc, #120]    ; 829c <_mainCRTStartup+0x104> 
    8220:    e59f2078     ldr    r2, [pc, #120]    ; 82a0 <_mainCRTStartup+0x108> 
    8224:    e0522000     subs    r2, r2, r0 
    8228:    eb00004a     bl    8358 <memset> 

   ... Then comes some code, purpose of which is not clear

Call the __libc_init_array function provided by standard library which will initialise all the global objects. It will treat the area between __init_array_start and __init_array_end as list of pointers to initialisation functions and call them one by one.

    8278:    eb000014     bl    82d0 <__libc_init_array>

Call the main function.

    8284:    eb000010     bl    82cc <main>

If main function returns for some reason, call the exit function, which probably must be implemented as infinite loop or jumping back to the beginning of the startup code.

    8288:    eb000008     bl    82b0 <exit>

Here comes local data

    828c:    00080000     andeq    r0, r8, r0 
    8290:    04008000     streq    r8, [r0], #-0 
    ... 
    829c:    00008458     andeq    r8, r0, r8, asr r4 
    82a0:    00008474     andeq    r8, r0, r4, ror r4

The only missing stage in the startup process is updating the interrupt vector table. After the latter is updated properly, it is possible to call the provided _mainCRTStartup function. However, if your compiler doesn't provide such function you have no other choice but to write the whole startup code yourself. Here is an example of such code.

Please note, that .bss section by definition contains uninitialised data that must be zeroed at startup. Even if you don't have uninitialised variables in your code, zeroing .bss is a must have operation. This is because compiler might put variables that are explicitly initialised to 0 into the .bss for performance reasons and count on this section being zeroed at startup.

Also note, that pointers to initialisation functions of global variables reside in .init.array section. To initialise your global objects you just iterate over all entries in this section and call them one by one.

To implement the missing stage for use the following assembler instructions:

_entry: 
    ldr pc,reset_handler_ptr        ;@  Processor Reset handler 
    ldr pc,undefined_handler_ptr    ;@  Undefined instruction handler 
    ldr pc,swi_handler_ptr          ;@  Software interrupt 
    ldr pc,prefetch_handler_ptr     ;@  Prefetch/abort handler. 
    ldr pc,data_handler_ptr         ;@  Data abort handler/ 
    ldr pc,unused_handler_ptr       ;@ 
    ldr pc,irq_handler_ptr          ;@  IRQ handler 
    ldr pc,fiq_handler_ptr          ;@  Fast interrupt handler. 

    ;@ Set the branch addresses 
    reset_handler_ptr:      .word reset 
    undefined_handler_ptr:  .word hang 
    swi_handler_ptr:        .word hang 
    prefetch_handler_ptr:   .word hang 
    data_handler_ptr:       .word hang 
    unused_handler_ptr:     .word hang 
    irq_handler_ptr:        .word irq_handler 
    fiq_handler_ptr:        .word hang 

reset: 
    ;@ Disable interrupts 
    cpsid if 

    ;@ Copy interrupt vector to its place 
    ldr r0,=_entry 
    mov r1,#0x0000 

    ;@  Here we copy the branching instructions 
    ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} 
    stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} 

    ;@  Here we copy the branching addresses 
    ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} 
    stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9}

Please note that at interrupt vector table that resides at address 0x0000 contains branch instructions to the appropriate handlers, not just addresses of the handlers. Let's take a closer look how these branching instructions look in our assembler listing file:

_entry: 
    800c:    e59ff018     ldr    pc, [pc, #24]    ; 802c <reset_handler_ptr> 
    8010:    e59ff018     ldr    pc, [pc, #24]    ; 8030 <undefined_handler_ptr> 
    8014:    e59ff018     ldr    pc, [pc, #24]    ; 8034 <swi_handler_ptr> 
    8018:    e59ff018     ldr    pc, [pc, #24]    ; 8038 <prefetch_handler_ptr> 
    801c:    e59ff018     ldr    pc, [pc, #24]    ; 803c <data_handler_ptr> 
    8020:    e59ff018     ldr    pc, [pc, #24]    ; 8040 <unused_handler_ptr> 
    8024:    e59ff018     ldr    pc, [pc, #24]    ; 8044 <irq_handler_ptr> 
    8028:    e59ff018     ldr    pc, [pc, #24]    ; 8048 <fiq_handler_ptr> 

0000802c <reset_handler_ptr>: 
    802c:    0000804c     andeq    r8, r0, ip, asr #32 

00008030 <undefined_handler_ptr>: 
    8030:    000082b4             ; <UNDEFINED> instruction: 0x000082b4 

00008034 <swi_handler_ptr>: 
    8034:    000082b4             ; <UNDEFINED> instruction: 0x000082b4 

00008038 <prefetch_handler_ptr>: 
    8038:    000082b4             ; <UNDEFINED> instruction: 0x000082b4 

0000803c <data_handler_ptr>: 
    803c:    000082b4             ; <UNDEFINED> instruction: 0x000082b4 

00008040 <unused_handler_ptr>: 
    8040:    000082b4             ; <UNDEFINED> instruction: 0x000082b4 

00008044 <irq_handler_ptr>: 
    8044:    000082b8             ; <UNDEFINED> instruction: 0x000082b8 

00008048 <fiq_handler_ptr>: 
    8048:    000082b4             ; <UNDEFINED> instruction: 0x000082b4

The branching instructions load address of the interrupt function to “pc” register. However the address of the function is stored somewhere and compiler generates access to this storage using relative offset to current “pc” register. This is the reason why we have to copy not just the branching instructions, but also the storage area where addresses of interrupt routines are stored:

    ;@ Copy interrupt vector to its place 
    ldr r0,=_entry 
    mov r1,#0x0000 

    ;@  Here we copy the branching instructions 
    ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} 
    stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9} 

    ;@  Here we copy the branching addresses 
    ldmia r0!,{r2,r3,r4,r5,r6,r7,r8,r9} 
    stmia r1!,{r2,r3,r4,r5,r6,r7,r8,r9}

Last updated