Ftrace is part of the Linux kernel and includes a list of tracers that help developers to know what is going on inside the kernel.

prerequisites

Since the Linaro kernels do not enable ftrace by default you may create a custom kernel deb package.

general kernel config options

  • CONFIG_FTRACE

    • location: Kernel hacking -> Tracers

    • enables the kernel tracing infrastructure
    • note that some of the tracers significantly increase the kernel size and overhead - see the kernel docs for details
  • CONFIG_DYNAMIC_FTRACE

    • location: Kernel hacking -> Tracers -> enable/disable ftrace tracepoints dynamically

    • allows the kernel to patch out the calls to ftrace at runtime
    • this reduces the overhead when a tracer is not enabled

tracer/profiler/tests

Ftrace is controlled through various files in the tracing directory of the debugfs. If ftrace is configured into the linaro kernel the debugfs will be mounted at /sys/kernel/debug:

  • # mount | grep debugfs
    none on /sys/kernel/debug type debugfs (rw)

    The most important files in the /sys/kernel/debug/tracing directory are:

    • available_tracers shows which tracers are available:

      # cat /sys/kernel/debug/tracing/available_tracers 
      blk kmemtrace wakeup_rt wakeup irqsoff function sched_switch nop

    • current_tracer shows the current tracer and allows to set a tracer:

      • enable the function tracer: echo function > current_tracer

      • disable the function tracer: echo nop > current_tracer

    • tracing_enabled indicates whether the current tracer is enabled or not

      • echo 1 > tracing_enabled enables the current_tracer

      • echo 0 > tracing_enabled disables the current_tracer

    • trace_options allows to configure the output of the tracer

    • trace and trace_pipe are the interfaces to obtain the tracing data

      • echo > trace empties the trace log

function tracer

The function tracer allows to trace every kernel function.

  • CONFIG_HAVE_FUNCTION_TRACER

  • location: Kernel hacking -> Tracers -> Kernel Function Tracer

    cd /sys/kernel/debug/tracing
    echo > trace &&\
     echo 0 > tracing_enabled &&\
     echo function > current_tracer &&\
     echo 1 > tracing_enabled &&\
     sleep 1 &&\
     echo 0 > tracing_enabled
    head trace

    Output:

    # tracer: function
    #
    #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
    #              | |       |          |         |
                bash-2005  [000]  5767.440000: _raw_spin_unlock_irqrestore <-tracing_ctrl_write
                bash-2005  [000]  5767.440000: mutex_unlock <-tracing_ctrl_write
                bash-2005  [000]  5767.440000: inotify_inode_queue_event <-vfs_write
                bash-2005  [000]  5767.440000: __fsnotify_parent <-vfs_write
                bash-2005  [000]  5767.440000: inotify_dentry_parent_queue_event <-vfs_write
                bash-2005  [000]  5767.440000: fsnotify <-vfs_write

    To reduce the amount of output you may apply a filter using set_ftrace_filter:

    echo > trace &&\
     echo 0 > tracing_enabled &&\
     echo function > current_tracer &&\
     echo 1 > tracing_enabled &&\
     ls > /dev/null &&\
     echo 0 > tracing_enabled
    head trace

    Output:

    # tracer: function
    #
    #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
    #              | |       |          |         |
                  ls-1282  [002]  1947.230000: sys_open <-ret_fast_syscall
                  ls-1282  [002]  1947.230000: sys_open <-ret_fast_syscall
                  ls-1282  [002]  1947.230000: sys_open <-ret_fast_syscall
                  ls-1282  [002]  1947.230000: sys_open <-ret_fast_syscall
                  ls-1282  [002]  1947.230000: sys_open <-ret_fast_syscall
                  ls-1282  [002]  1947.230000: sys_open <-ret_fast_syscall

    You can also restrict ftrace to trace a certain PID only.

function profiler

Counts the number of calls to kernel functions.

  • CONFIG_FUNCTION_PROFILER

  • location: Kernel hacking -> Tracers -> Kernel function profiler The stats are available for each CPU through trace_stats:

    cd /sys/kernel/debug/tracing
    echo 1 > function_profile_enabled &&\
     dd if=/dev/urandom of=/dev/null bs=1M count=1 2> /dev/null &&\
     echo 0 > function_profile_enabled
    head -n 7 trace_stat/function*

    The output looks like this:

    ==> trace_stat/function0 <==
      Function                               Hit
      --------                               ---
      default_idle                          1843
      _raw_spin_lock                        1159
      _raw_spin_unlock_irqrestore            891
      _raw_spin_lock_irqsave                 873
      idle_cpu                               728
    
    ==> trace_stat/function1 <==
      Function                               Hit
      --------                               ---
      _raw_spin_unlock_irqrestore         125795
      _raw_spin_lock_irqsave              125777
      mix_pool_bytes_extract              104864
      extract_buf                         104860
      l2x0_cache_sync                      16296
    
    ==> trace_stat/function2 <==
      Function                               Hit
      --------                               ---
      default_idle                          2757
      idle_cpu                               690
      scale_rt_power                         640
      rcu_bh_qs                              507
      source_load                            480
    
    ==> trace_stat/function3 <==
      Function                               Hit
      --------                               ---
      default_idle                          2764
      idle_cpu                               670
      scale_rt_power                         600
      rcu_bh_qs                              507
      source_load                            450

context switch tracer

Activating the CONFIG_HAVE_FUNCTION_TRACER option also enables CONTEXT_SWITCH_TRACER that allows to trace process context switches:

  • cd /sys/kernel/debug/tracing
    echo > trace &&\
     echo 0 > tracing_enabled &&\
     echo sched_switch > current_tracer &&\
     echo 1 > tracing_enabled &&\
     sleep 1 &&\
     echo 0 > tracing_enabled
    less trace

    Output:

    # tracer: sched_switch
    #
    #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
    #              | |       |          |         |
                bash-1114  [002]  1656.370000:   1114:120:R   + [003]  1192:120:R sleep
                bash-1114  [002]  1656.370000:   1114:120:R   + [002]  1192:120:R sleep
                bash-1114  [002]  1656.370000:   1114:120:S ==> [002]  1192:120:R sleep
               sleep-1192  [002]  1656.370000:   1192:120:R   + [002]     9:  0:R migration/2
               sleep-1192  [002]  1656.370000:   1192:120:D ==> [002]     9:  0:R migration/2
         migration/2-9     [002]  1656.370000:      9:  0:R   + [002]  1192:120:R sleep
         migration/2-9     [002]  1656.370000:      9:  0:S ==> [002]  1192:120:R sleep
    [...]

block device queue tracer

Allows tracing the traffic happening on a block device queue.

  • CONFIG_BLK_DEV_IO_TRACE

  • location: Kernel hacking -> Tracers -> Support for tracing block IO actions

    cd /sys/kernel/debug/tracing
    echo > trace &&\
     echo 0 > tracing_enabled &&\
     echo 1 > /sys/block/sda/trace/enable &&\
     echo blk > current_tracer &&\
     echo 1 > tracing_enabled &&\
     dd if=/dev/sda of=/dev/null bs=1M count=1 2> /dev/null &&\
     echo 0 > tracing_enabled &&\
     echo 0 > /sys/block/sda/trace/enable
    less trace

    Output:

    # tracer: blk
    #
                  dd-1134  [001]   184.750000:   8,0    Q   R 2344 + 8 [dd]
                  dd-1134  [001]   184.750000:   8,0    B   R 2344 + 8 [dd]
                  dd-1134  [001]   184.750000:   8,0    G   R 2344 + 8 [dd]
                  dd-1134  [001]   184.750000:   8,0    P   N [dd]
                  dd-1134  [001]   184.750000:   8,0    I   R 2344 + 8 [dd]
                  dd-1134  [001]   184.750000:   8,0    Q   R 2352 + 8 [dd]
                  dd-1134  [001]   184.750000:   8,0    B   R 2352 + 8 [dd]
                  dd-1134  [001]   184.750000:   8,0    M   R 2352 + 8 [dd]
    [...]

wakeup

Measures the latency of highest priority task to wake up.

  • CONFIG_SCHED_TRACER

  • location: Kernel hacking -> Tracers -> Scheduling Latency Tracer

  • tracks the latency of the highest priority task to be scheduled in

    cd /sys/kernel/debug/tracing
    echo > trace &&\
     echo wakeup > current_tracer &&\
     echo latency-format > trace_options
    echo 0 > tracing_max_latency &&\
     echo 1 > tracing_enabled &&\
     chrt -f 10 sleep 1 &&\
     echo 0 > tracing_enabled
    less trace

    TODO: Investigate why the trace doesn't show any real information but the trace_pipe does:

       sleep-1268    2d....    0us :   1268:120:R   + [002]     9:  0:R migration/2
       sleep-1268    2d....    0us : 0
       sleep-1268    2d....    0us : check_preempt_wakeup <-try_to_wake_up
       sleep-1268    2d....    0us : resched_task <-check_preempt_wakeup
       sleep-1268    2dN...    0us : task_woken_rt <-try_to_wake_up
       sleep-1268    2dN...    0us : _raw_spin_unlock_irqrestore <-try_to_wake_up
       sleep-1268    2dN...    0us : _raw_spin_unlock_irqrestore <-stop_one_cpu
       sleep-1268    2.N...    0us : wait_for_completion <-stop_one_cpu
       sleep-1268    2.N...    0us : wait_for_common <-stop_one_cpu
       sleep-1268    2.N...    0us : _raw_spin_lock_irq <-wait_for_common
       [...]

interrupts-off latency tracer

This tracer measures the time spent in irqs-off critical sections.

  • CONFIG_IRQSOFF_TRACER

  • location: Kernel hacking -> Tracers -> Interrupts-off Latency Tracer

    echo irqsoff > current_tracer
    echo > trace && echo 0 > tracing_max_latency
    # do something
    echo latency-format > trace_options
    less trace

    Output:

    # tracer: irqsoff
    #
        bash-1114    0d....    0us : _raw_spin_lock_irqsave
        bash-1114    0d....    0us : _raw_spin_unlock_irqrestore <-__down_write_nested
        bash-1114    0d....    0us : _raw_spin_unlock_irqrestore
        bash-1114    0d....    0us : trace_hardirqs_on <-_raw_spin_unlock_irqrestore
        bash-1114    0d....    0us : _raw_spin_lock_irqsave
        bash-1114    0d....    0us : _raw_spin_unlock_irqrestore <-notify_change
        bash-1114    0d....    0us : _raw_spin_unlock_irqrestore
        bash-1114    0d....    0us : trace_hardirqs_on <-_raw_spin_unlock_irqrestore

workqueue tracer

Provides statistical information about each CPU workqueue thread.

  • CONFIG_WORKQUEUE_TRACER
  • location: Kernel hacking -> Tracers -> Trace workqueues

    # cat trace_stat/workqueues          
    # CPU  INSERTED  EXECUTED   NAME
    # |      |         |          |
      0   4832       4832       events/0
      0    430        430       khelper
      0      0          0       pm
      0     40         40       kblockd/0
      0   3415       3415       kmmcd
      0      0          0       aio/0
      0      0          0       kpsmoused
      0      0          0       kstriped
      0      0          0       kmpathd/0
      0      0          0       kmpath_handlerd
      0      0          0       ksnapd
      0      0          0       usbhid_resumer
      0      0          0       ext4-dio-unwrit
      1   3680       3680       events/1
      1      0          0       kblockd/1
      1      0          0       aio/1
      1      0          0       kmpathd/1
      1      0          0       ext4-dio-unwrit
      2  10432      10432       events/2
      2      0          0       kblockd/2
      2      0          0       aio/2
      2      0          0       kmpathd/2
      2      0          0       ext4-dio-unwrit
      3   3712       3712       events/3
      3     18         18       kblockd/3
      3      0          0       aio/3
      3      0          0       kmpathd/3
      3      0          0       ext4-dio-unwrit

kmemtrace

Allows to debug/profile the kernel slab allocator functions (kmalloc and friends).

  • CONFIG_KMEMTRACE

  • location: Kernel hacking -> Tracers -> Trace SLAB allocations

    cd /sys/kernel/debug/tracing &&\
     echo > trace &&\
     echo 0 > tracing_enabled &&\
     echo kmemtrace > current_tracer &&\
     echo 1 > tracing_enabled &&\
     sleep 1 &&\
     echo 0 > tracing_enabled &&\
     less trace

    Output:

    # tracer: kmemtrace
    #
                bash-1111  [000]   320.540000: type_id 0 call_site security_file_free+0x1c/0x24 ptr 3876888032
                bash-1111  [000]   320.540000: type_id 1 call_site getname+0x20/0xd8 ptr 3885162496 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
                bash-1111  [000]   320.540000: type_id 1 call_site user_path_at+0x74/0x94 ptr 3885162496
                bash-1111  [000]   320.540000: type_id 1 call_site getname+0x20/0xd8 ptr 3885162496 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
                bash-1111  [000]   320.540000: type_id 1 call_site user_path_at+0x74/0x94 ptr 3885162496
                bash-1111  [000]   320.540000: type_id 1 call_site getname+0x20/0xd8 ptr 3885162496 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
                bash-1111  [000]   320.540000: type_id 1 call_site user_path_at+0x74/0x94 ptr 3885162496
                bash-1111  [000]   320.540000: type_id 1 call_site getname+0x20/0xd8 ptr 3885162496 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1

stack tracer

Shows the maximum stack footprint of the kernel at /sys/kernel/debug/tracing/stack_trace

  • CONFIG_STACK_TRACER

  • location: Kernel hacking -> Tracers -> Trace max stack

  • add stacktrace to the bootcmd of the kernel to get the maximum stack size during bootup

    cd /sys/kernel/debug/tracing
    echo 1 > tracing_enabled
    echo 1 > /proc/sys/kernel/stack_tracer_enabled
    echo 0 > /proc/sys/kernel/stack_tracer_enabled
    cat stack_max_size
    cat stack_trace

    Output:

            Depth    Size   Location    (2 entries)
            -----    ----   --------
      0)     2188       8   ftrace_test_stop_func+0x20/0x2c
      1)     2180    2180   ftrace_call+0x4/0xc

branch profiling

  • CONFIG_PROFILE_ANNOTATED_BRANCHES

    • location: Kernel hacking -> Tracers -> Branch Profiling -> Trace likely/unlikely profiler

    • profiles all the likely and unlikely macros in the kernel
    • results: /sys/kernel/debug/tracing/profile_annotated_branch
    • warning: adds significant overhead
    Example:

    head trace_stat/branch_annotated 
     correct incorrect  %        Function                  File              Line
     ------- ---------  -        --------                  ----              ----
           0    32895 100 page_mapping                   mm.h                 659
           0    32890 100 mapping_unevictable            pagemap.h            50
           0    30661 100 trace_workqueue_insertion      workqueue.h          38
           0    30661 100 trace_workqueue_execution      workqueue.h          45
           0     1067 100 pre_schedule_rt                sched_rt.c           1476
           0      670 100 check_context                  mmu_context.h        62
           0      122 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 143
           0       28 100 trace_workqueue_creation       workqueue.h          68

  • CONFIG_PROFILE_ALL_BRANCHES

    • location: Kernel hacking -> Tracers -> Branch Profiling -> Profile all if conditionals

    • profiles all branch conditions
    • warning: adds great overhead (every "if" taken in the kernel is recorded whether it hit or miss)
    • results: /sys/kernel/debug/tracing/profile_branch

boot initcall tracer

This tracer helps developers to optimize the boot time. It records the timings of the initcalls and traces the key events and the identity of tasks that can cause boot delays.

  • CONFIG_BOOT_TRACER

  • location: Kernel hacking -> Tracers -> Trace boot initcalls In order to collect the initcall trace data the initcall_debug ftrace=initcall options need to be added to the kernel boot command line. As usual the trace can be found at /sys/kernel/debug/tracing/trace and may be visualized using scripts/bootgraph.pl. For the vexpress the bootcmd would look like this:

    setenv bootargs 'root=/dev/sda2 ro mem=1024M raid=noautodetect rootwait vmalloc=256MB devtmpfs.mount=0 mtdparts=armflash:1M@0x4000000(uboot)ro,7M@0x4100000(kernel),24M@0x4800000(initrd) mmci.fmax=190000 initcall_debug ftrace=initcall'
    cp 0x44800000 0x61000000 0x1800000; bootm 0x44100000 0x61000000

    During bootup the init calls are traced:

    [...]
    ftrace: allocating 15291 entries in 30 pages
    Calibrating local timer... 199.99MHz.
    calling  migration_init+0x0/0x68 @ 1
    initcall migration_init+0x0/0x68 returned 0 after 0 usecs
    calling  spawn_ksoftirqd+0x0/0x60 @ 1
    initcall spawn_ksoftirqd+0x0/0x60 returned 0 after 0 usecs
    [...]

    Create a SVG graphic that shows which functions take how much time:

    cat /sys/kernel/debug/tracing/trace | perl linux-linaro/scripts/bootgraph.pl > vexpress-bootgraph-`uname -r`.svg

    Here is an example of the 2.6.35-1010-linaro-vexpress kernel bootup:

    2.6.35-1010-linaro-vexpress init call

ring buffer benchmark

Stress tests the ring buffer and benchmark it:

  • CONFIG_RING_BUFFER_BENCHMARK

  • location: Kernel hacking -> Tracers -> Ring buffer benchmark stress tester

  • module name: ring_buffer_benchmark.ko

    modinfo ring_buffer_benchmark
    echo > /sys/kernel/debug/tracing/trace &&\
     modprobe ring_buffer_benchmark producer_fifo=10 &&\
     sleep 12 &&\
     cat /sys/kernel/debug/tracing/trace &&\
     rmmod ring_buffer_benchmark

    The output should look like this:

               <...>-1639  [002]  2553.570000: ring_buffer_producer: Starting ring buffer hammer
               <...>-1639  [002]  2562.850000: ring_buffer_producer: End ring buffer hammer
               <...>-1639  [002]  2563.180000: ring_buffer_producer: Running Consumer at nice: 19
               <...>-1639  [002]  2563.180000: ring_buffer_producer: Running Producer at SCHED_FIFO 10
               <...>-1639  [002]  2563.180000: ring_buffer_producer: Time:     9277103 (usecs)
               <...>-1639  [002]  2563.180000: ring_buffer_producer: Overruns: 4842450
               <...>-1639  [002]  2563.180000: ring_buffer_producer: Read:     1740600  (by events)
               <...>-1639  [002]  2563.180000: ring_buffer_producer: Entries:  0
               <...>-1639  [002]  2563.180000: ring_buffer_producer: Total:    6583050
               <...>-1639  [002]  2563.180000: ring_buffer_producer: Missed:   0
               <...>-1639  [002]  2563.180000: ring_buffer_producer: Hit:      6583050
               <...>-1639  [002]  2563.180000: ring_buffer_producer: Entries per millisec: 709
               <...>-1639  [002]  2563.180000: ring_buffer_producer: 1410 ns per entry
               <...>-1639  [002]  2563.180000: ring_buffer_producer_thread: Sleeping for 10 secs

ftrace startup test

Tests the configured tracers on bootup to verify they are working

  • CONFIG_FTRACE_STARTUP_TEST

  • location: Kernel hacking -> Tracers -> Perform a startup test on ftrace

  • In addition the CONFIG_EVENT_TRACE_TEST_SYSCALLS at Kernel hacking -> Tracers -> Run selftest on syscall events enables the testing of syscalls events

    After bootup dmesg will show messages like this:

    Testing event sched_stat_iowait: OK
    Testing event sched_stat_runtime: OK
    Running tests on trace event systems:
    Testing event system skb: OK
    Testing event system scsi: OK
    [...]
    Testing event system irq: OK
    Testing event system sched: OK
    Running tests on all trace events:
    Testing all events: OK
    Running tests again, along with the function tracer
    Running tests on trace events:
    Testing event kfree_skb: OK
    Testing event skb_copy_datagram_iovec: OK
    Testing event scsi_dispatch_cmd_start: OK
    Testing event scsi_dispatch_cmd_error: OK
    [...]
    Testing event sched_stat_iowait: OK
    Testing event sched_stat_runtime: OK
    Running tests on trace event systems:
    Testing event system skb: OK
    Testing event system scsi: OK
    [...]
    Testing event system irq: OK
    Testing event system sched: OK
    Running tests on all trace events:
    Testing all events: OK

pointers

KenWerner/Sandbox/ftrace (last modified 2010-12-10 11:40:20)