Introduction

big.LITTLE offers the best of both worlds: performance when needed and power savings otherwise. The Linux scheduler needs to be enhanced to utilise this capability. The discussions at the Scheduler mini-summit point to a long-term effort to make the Linux scheduler more power-aware. This work should be generally useful to heterogeneous systems.

The goal is to provide a usable mechanism that reliably allows all work to be moved off a CPU, so that the CPU can be powered off and back on under user-application control.

This specification outlines the efforts happening inside Linaro to help the community come up with a good solution for HMP scheduling.

Planning

2012 Q1-Q2

Goals

  • Make it easy for scheduler developers to work with big.LITTLE (by emulation on current x86 hardware)
  • Make it easy for scheduler developers to test against typical mobile workloads
  • Create a HOWTO/white paper (for medium term use) on how to use current tools/techniques on big.LITTLE systems
  • Identify issues preventing the CPU from reaching a quiescent state (e.g. timers, workqueues)
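
As a taste of the kind of existing technique such a HOWTO could cover, the sketch below pins a process to one cluster using CPU affinity. It is only an illustration: the `CLUSTERS` mapping (CPUs 0-1 as "little", 2-3 as "big") and the helper names are hypothetical, and real CPU numbering is platform-specific. It relies on Python's `os.sched_getaffinity` and `os.sched_setaffinity`, which are available on Linux.

```python
import os

# Hypothetical layout: which CPU numbers belong to which cluster.
# Real numbering depends on the platform and must be checked there.
CLUSTERS = {"little": {0, 1}, "big": {2, 3}}


def cluster_cpus(cluster, clusters=CLUSTERS, allowed=None, pid=0):
    """Return the CPUs of `cluster` that the process is allowed to run on."""
    if allowed is None:
        allowed = os.sched_getaffinity(pid)  # Linux-only call
    return set(clusters[cluster]) & set(allowed)


def pin_to_cluster(cluster, pid=0, clusters=CLUSTERS):
    """Restrict `pid` (0 means the calling process) to one cluster's CPUs."""
    cpus = cluster_cpus(cluster, clusters, pid=pid)
    if not cpus:
        raise ValueError(f"no usable CPUs in cluster {cluster!r}")
    os.sched_setaffinity(pid, cpus)
    return cpus
```

A background task could call `pin_to_cluster("little")` at startup. Affinity is a hard constraint, so this trades scheduler flexibility for predictability; cpusets offer the same restriction for groups of tasks.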

Tasks

  • Review and test Paul Turner's per-task load tracking patchset and help it get integrated into mainline: Vincent
  • Provide best practices for using existing facilities (cgroups, cpusets, affinity, etc.) to adapt workloads to big.LITTLE systems: Vincent
  • Provide an easy way to emulate a big.LITTLE system on x86 workstations for everyone to test workloads on. Although this is not a substitute for real hardware, it should enable substantial software experimentation and development. There are currently several ways to achieve this; figure out the best one: Amit K
  • Produce synthetic test load(s) for typical smartphone use cases: Dmitry
    1. Easily used by kernel hackers (must not require an Android setup, preferably a C program or sh/perl/python script)
    2. Must produce useful figures of merit:
      1. User experience (e.g., response time).
      2. Proxy for energy consumption, for example idle residency or DVFS state residency.
    3. Need an interactivity test better than cyclictest. Possibly build on Con Kolivas's work. [PJT & APZ]

  • A sched-top tool to analyze scheduler traces and statistics.

  • Accumulate a list of the attributes that SoC vendors believe to be important to include in the scheduling decision: Vincent

The current list contains:

  • Power-domain and clock-domain constraints. For example, many ARM SoCs require that all CPUs in a cluster run at the same clock rate.

  • Thermal feedback and tradeoffs
  • Process type
  • Relative benefit of reducing frequency of several CPUs as opposed to consolidating workload on a small number of CPUs.
  • Instructions-per-clock (IPC) measurements, and the correlation between clock rate and useful forward progress.
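
The "idle residency" figure of merit mentioned in the tasks above can be approximated from the cpuidle sysfs files, where each `stateN/time` file reports the cumulative time spent in that idle state in microseconds. The following is a minimal sketch, not a finished test tool: the function names are hypothetical, the caller is assumed to measure the wall-clock interval between snapshots, and the `root` parameter exists only so the code can be pointed at a different directory.

```python
from pathlib import Path


def read_idle_times_us(cpu, root="/sys/devices/system/cpu"):
    """Return {idle state name: cumulative microseconds} for one CPU."""
    times = {}
    cpuidle = Path(root) / f"cpu{cpu}" / "cpuidle"
    for state_dir in sorted(cpuidle.glob("state*")):
        name = (state_dir / "name").read_text().strip()
        times[name] = int((state_dir / "time").read_text())
    return times


def idle_residency(before, after, wall_us):
    """Fraction of the interval `wall_us` spent in each idle state.

    `before` and `after` are two snapshots from read_idle_times_us().
    """
    return {
        name: (after[name] - before.get(name, 0)) / wall_us
        for name in after
    }
```

A synthetic workload would take one snapshot before it starts and one after it ends; higher residency in deep idle states then serves as a rough proxy for lower energy use.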

2012 Q3-Q4

Goals

Tasks

  • Improving quiescence of CPUs:
    1. Create a way for a single userspace operation to evacuate processes, timers, and IRQs from a specified CPU, which then goes idle. Similarly, there needs to be a single userspace operation to restore the CPU to a runnable state.
  • Experiments to reduce CPU-hotplug system-wide disturbance:
    1. When a CPU is removed, don't kill its kthreads and don't deallocate its data. This also requires that the CPU-online path check for first bringup of a given CPU, at which point kthreads must be created and data structures must be allocated.

      This should greatly reduce CPU-online overhead and latency.

    2. There needs to be a how-to for "parking" per-CPU kthreads whose CPU is offline. Challenges:
      1. A wakeup can be delayed, so that it arrives at the kthread after the corresponding CPU has gone offline. This defeats the simple approach of just having the kthread sleep.
      2. Many of the per-CPU kthreads are coded assuming that they will always be running on their CPU.
      3. The kthread must sleep interruptibly, otherwise soft-lockup messages will be emitted.

        Does "kill -STOP" work? ;-)

    3. Wean CPU hotplug from stop_machine(). This requires review and possible updates to the CPU_DYING notifiers, and possibly other adjustments. [Paul E. McKenney to remove the stop-machine dependencies in RCU's CPU_DYING notifiers.]

      This reduces OS jitter and avoids waking idle CPUs.

    4. Measure the speed of offlining and onlining a CPU. The target is a handful of milliseconds, less than five.
  • Remove sched_mc. [Peter Zijlstra]
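
A baseline for the offline/online measurement listed above can be taken with the existing CPU-hotplug control file, `/sys/devices/system/cpu/cpuN/online`. The sketch below is a rough timing harness only: it is not the proposed single-operation evacuation mechanism, the function names are hypothetical, it needs root on a real system, and it assumes the write returns once the hotplug transition has completed.

```python
import time
from pathlib import Path

SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def set_cpu_online(cpu, online, root=SYSFS_CPU_ROOT):
    """Write 1 (online) or 0 (offline) to cpuN/online; return seconds taken."""
    control = Path(root) / f"cpu{cpu}" / "online"
    start = time.monotonic()
    control.write_text("1" if online else "0")
    return time.monotonic() - start


def measure_hotplug_cycle(cpu, root=SYSFS_CPU_ROOT):
    """Offline then online one CPU; return (offline_ms, online_ms)."""
    offline_s = set_cpu_online(cpu, False, root)
    online_s = set_cpu_online(cpu, True, root)
    return offline_s * 1000.0, online_s * 1000.0
```

Repeating `measure_hotplug_cycle(1)` many times and looking at the distribution, not just the mean, gives a fairer picture against the five-millisecond target.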

2012 Q4 - 2013

  • See if 3D gaming engines can make good use of big.LITTLE, even in the presence of thermal throttling.
  • Investigate alternative scheduler disciplines, e.g. SCHED_EDF.
  • Investigate modal scheduling. Paul Turner gave the following as an example:
    1. Low load of interactive, low-utilization tasks might favor race to idle.
    2. Moderate load of periodic media-feeding tasks might lower frequency to the smallest value that allows the task to keep up with its hardware.
    3. High load of CPU-bound tasks in the absence of thermal limitations might increase frequency.
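
The three modes in this example can be written down as a small policy function, which may help when comparing alternatives. This is a toy sketch under stated assumptions, not a proposed governor: the `LoadMode` names, the `choose_frequency` signature, and the choice to hold the current frequency when thermally limited are all hypothetical, and classifying a workload into a mode is left out entirely.

```python
from enum import Enum


class LoadMode(Enum):
    LOW_INTERACTIVE = 1    # low load of interactive, low-utilization tasks
    MODERATE_PERIODIC = 2  # moderate load of periodic media-feeding tasks
    HIGH_CPU_BOUND = 3     # high load of CPU-bound tasks


def choose_frequency(mode, available_khz, current_khz,
                     required_khz=None, thermally_limited=False):
    """Pick a target frequency (kHz) for the given load mode."""
    freqs = sorted(available_khz)
    if mode is LoadMode.LOW_INTERACTIVE:
        # Race to idle: finish quickly at the top frequency, then sleep.
        return freqs[-1]
    if mode is LoadMode.MODERATE_PERIODIC:
        # Smallest frequency that still keeps up with the hardware.
        for f in freqs:
            if required_khz is not None and f >= required_khz:
                return f
        return freqs[-1]
    # HIGH_CPU_BOUND: raise frequency only without thermal limits.
    # Holding the current frequency otherwise is an assumption.
    return current_khz if thermally_limited else freqs[-1]
```

For instance, with frequencies of 300, 600, and 1200 MHz, a media task that needs 500 MHz would be given 600 MHz.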

WorkingGroups/PowerManagement/Archives/OldSpecs/PowerAwareScheduling (last modified 2013-08-22 10:02:01)