Tail Call Optimization


Tail Call Optimization is an optimization strategy used by compiler to generate code in which subroutine/function call is done without adding stack frame to call stack. More info is available here,


In Action

I will try to explain using below simple program,

$ cat x.c
#include <stdio.h>

int bar(int val)
        volatile int cnt;
        cnt = val;
        while (cnt--)

int foo(int val)

int main()
        return 0;

Let’s compile without using any optimization flag first and see output disassembly,

$ arm-none-eabi-gcc -c x.c

$ arm-none-eabi-size x.o
   text    data     bss     dec     hex filename
    156       0       0     156      9c x.o

$ arm-none-eabi-objdump -S x.o | less
Disassembly of section .text:

00000000 <bar>:
   0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   4:   e28db000        add     fp, sp, #0
   8:   e24dd014        sub     sp, sp, #20
   c:   e50b0010        str     r0, [fp, #-16]
  10:   e51b3010        ldr     r3, [fp, #-16]
  14:   e50b3008        str     r3, [fp, #-8]
  18:   e1a00000        nop                     ; (mov r0, r0)
  1c:   e51b3008        ldr     r3, [fp, #-8]
  20:   e3530000        cmp     r3, #0
  24:   03a02000        moveq   r2, #0
  28:   13a02001        movne   r2, #1
  2c:   e20220ff        and     r2, r2, #255    ; 0xff
  30:   e2433001        sub     r3, r3, #1
  34:   e50b3008        str     r3, [fp, #-8]
  38:   e3520000        cmp     r2, #0
  3c:   1afffff6        bne     1c <bar+0x1c>
  40:   e1a00003        mov     r0, r3
  44:   e28bd000        add     sp, fp, #0
  48:   e8bd0800        pop     {fp}
  4c:   e12fff1e        bx      lr

00000050 <foo>:
  50:   e92d4800        push    {fp, lr}
  54:   e28db004        add     fp, sp, #4
  58:   e24dd008        sub     sp, sp, #8
  5c:   e50b0008        str     r0, [fp, #-8]
  60:   e51b0008        ldr     r0, [fp, #-8]
  64:   ebfffffe        bl      0 <bar>
  68:   e1a00003        mov     r0, r3
  6c:   e24bd004        sub     sp, fp, #4
  70:   e8bd4800        pop     {fp, lr}
  74:   e12fff1e        bx      lr

00000078 <main>:
  78:   e92d4800        push    {fp, lr}
  7c:   e28db004        add     fp, sp, #4
  80:   e3a0000a        mov     r0, #10
  84:   ebfffffe        bl      50 <foo>
  88:   e3a03000        mov     r3, #0
  8c:   e1a00003        mov     r0, r3
  90:   e24bd004        sub     sp, fp, #4
  94:   e8bd4800        pop     {fp, lr}
  98:   e12fff1e        bx      lr

Points to note here are:

  1. Stack frame for foo is prepared by main in transition from main->foo
  2. Stack frame for bar is preapred by foo in transition from foor->bar
  3. Take a note of size of output object file, 156 bytes

Note: For ARM architecure lr stores return address of caller, bl instruction refers to branch and link operation whereas b instrction refers to branch only.

Let’s compile with optimization enabled now,

$ arm-none-eabi-gcc -Os -c x.c

$ arm-none-eabi-size x.o
   text    data     bss     dec     hex filename
     64       0       0      64      40 x.o

$ arm-none-eabi-objdump -S x.o | less
Disassembly of section .text:

00000000 <bar>:
   0:   e24dd008        sub     sp, sp, #8
   4:   e58d0004        str     r0, [sp, #4]
   8:   e59d3004        ldr     r3, [sp, #4]
   c:   e2432001        sub     r2, r3, #1
  10:   e3530000        cmp     r3, #0
  14:   e58d2004        str     r2, [sp, #4]
  18:   1afffffa        bne     8 <bar+0x8>
  1c:   e28dd008        add     sp, sp, #8
  20:   e12fff1e        bx      lr

00000024 <foo>:
  24:   eafffffe        b       0 <bar>

Disassembly of section .text.startup:

00000000 <main>:
   0:   e92d4008        push    {r3, lr}
   4:   e3a0000a        mov     r0, #10
   8:   ebfffffe        bl      24 <foo>
   c:   e3a00000        mov     r0, #0
  10:   e8bd4008        pop     {r3, lr}
  14:   e12fff1e        bx      lr

Points to note here are:

  1. Stack frame for foo is prepared by main in transition from main->foo
  2. No stack frame for bar, just a direct branch
  3. With size enabled optimization (-Os), size reduced to 64 bytes

From (2) above it is quite clear that, bar() will return directly to main(), at address c. This is what is “Tail Call Optimization”.


  1. Possibly can impact debugging process, because of lack of stack frame in call stack traversal.
  2. If in call, local storage is used then (as is will not be stored/passed in stack) may cause issues. But I guess compiler should handle this situation.

More Analysis

Is available here,


This entry was posted in Coding and tagged , , , . Bookmark the permalink.

One Response to Tail Call Optimization

  1. Pingback: Compile Optimization | Eternal Struggle

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s