Adaptive Tickless (a.k.a. full NO_HZ)

Introduction

The Linux kernel has supported tickless idle (a.k.a. NOHZ) for some time now. This feature disables scheduling-clock interrupts when a CPU is idle. Starting with the v3.10 kernel, work has begun to broaden the scope of tickless support. The initial use case is a CPU with only a single task running on it. When that happens, the goal is to eliminate the OS jitter caused by scheduling-clock interrupts and by the various other work that is tied to the periodic tick.

Background

For a detailed background, see the excellent LWN article: http://lwn.net/Articles/549580/. Also, the kernel source (>= v3.10) has additional documentation in Documentation/timers/NO_HZ.txt.

Current Status

In the v3.10 kernel, support has been merged to enter "full tickless" mode when there is a single task running on a CPU. However, because some of the work tied to the scheduling-clock interrupt cannot be deferred forever (cf. Documentation/timers/NO_HZ.txt), the current code enforces a maximum tick deferment, currently set to 1 second. (A patch has been proposed to make this value configurable for ease of experimentation.)
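Full tickless mode only applies to CPUs that were designated as adaptive-tick CPUs at boot, which Documentation/timers/NO_HZ.txt describes via the nohz_full= boot parameter on a kernel built with CONFIG_NO_HZ_FULL=y. As a minimal sketch for checking which CPUs that covers, the following helper expands the CPU list from a kernel command line. The function names are illustrative, and only the plain "N" and "N-M" list forms are handled; this is an assumption for brevity, not a full implementation of the kernel's CPU-list syntax.

```python
def parse_cpu_list(spec):
    """Expand a simple CPU list such as "1-3,5" into a sorted list of ints.

    Assumption: only single CPUs ("5") and inclusive ranges ("1-3") appear.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def nohz_full_cpus(cmdline):
    """Return the CPUs named by nohz_full= in a kernel command line, or []."""
    prefix = "nohz_full="
    for arg in cmdline.split():
        if arg.startswith(prefix):
            return parse_cpu_list(arg[len(prefix):])
    return []
```

On a running system, the command line can be read from /proc/cmdline and passed to nohz_full_cpus(). An empty result means no CPU was requested as adaptive-tick through this parameter.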

ARM support

ARM support is merged for v3.10 as well, but currently requires some minor, Kconfig-only patches since one of the dependencies (namely, CONFIG_VIRT_CPU_ACCOUNTING_GEN) is only allowed for 64-bit platforms. The ARM patches simply remove this limitation.

The ARM patches are available in the arm-nohz-v3.10 branch of this git repository: https://git.kernel.org/cgit/linux/kernel/git/khilman/linux.git. They have been tested on ARM SMP (OMAP4/Panda) and ARM big.LITTLE (TC2).

Example

As a simple demonstration of adaptive tickless in action, a 2-CPU ARM system (OMAP4 PandaBoard) was used as a test platform. Load was generated using the stress utility to keep one CPU busy and to vary the load on the second CPU, and cpusets were used to isolate tasks to specific cores.
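For readers who want to reproduce the "single busy task on one CPU" condition without the stress utility, here is a rough stand-in. It is not the script used for the test above: it swaps cpusets for plain CPU affinity (os.sched_setaffinity, Linux only), which pins the calling process but does not by itself keep other tasks off that CPU. The function name spin_on_cpu is hypothetical.

```python
import os
import time


def spin_on_cpu(cpu, seconds):
    """Pin the calling process to `cpu` and busy-loop for `seconds`.

    Returns the number of loop iterations completed. Affinity alone does
    not isolate the CPU; other tasks may still be scheduled there.
    """
    os.sched_setaffinity(0, {cpu})       # pid 0 means the calling process
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:   # pure user-space busy loop
        iterations += 1
    return iterations
```

Calling spin_on_cpu(1, 20) while tracing would, under these assumptions, keep CPU1 occupied by one CPU-bound task for 20 seconds. Whether the tick actually stops still depends on the kernel configuration and on nothing else being runnable on that CPU.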

The screenshot below shows a kernelshark visualization of the trace resulting from the test.

  • nohz-trace.png

The important thing to notice in this trace is how few events occur while a single task is running. Looking closely, the remaining events form a clear periodic pattern. These events are caused by the 1-second maximum tick deferment.

To confirm this, the maximum deferment can be disabled completely using the proposed patch mentioned above, which makes the value configurable. The following trace shows the same test with the maximum deferment disabled:

  • nohz-trace-max-defer.png

From this trace, it is much clearer that the single task on CPU1 has run without any interference from the OS. In fact, it runs uninterrupted for more than 14 seconds. It stops at around 14 seconds because the 32-bit timer, running at 300 MHz, overflows.
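The 14-second figure follows directly from the counter width and clock rate: a 32-bit counter wraps after 2^32 ticks, and at 300 MHz that takes a little over 14 seconds. A small sketch of the arithmetic (the helper name is illustrative):

```python
def wrap_time_seconds(counter_bits, frequency_hz):
    """Seconds until a free-running counter of the given width wraps."""
    return (2 ** counter_bits) / frequency_hz


# 32-bit counter clocked at 300 MHz: 4294967296 / 300000000
print(round(wrap_time_seconds(32, 300_000_000), 2))  # prints 14.32
```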

This shell script was used to generate the load and the trace data.

WorkingGroups/PowerManagement/Doc/AdaptiveTickless (last modified 2013-06-28 04:47:19)