Cluster/CPU idle management

  • Lorenzo Pieralisi
  • Differences between ARM/Intel
    • On Intel, cache layout completely transparent to the kernel
    • On Intel, package state managed by the hardware
  • CPUIdle framework has no notion of a "cluster" of CPUs
    • Each platform implements its own solution to handle cluster shutdown
    • Goal: find a generic solution that can be used across the different platforms
  • ARM SMP CPU power management has changed recently in Linux
    • migration from hotplug to coupled C-states
    • "coupled C-states not a holistic solution" - unlike Nico & Dave's code

  • Introduce CPU idle cluster management
  • What is a cluster?
    • On ARM, it's defined by the MPIDR register
      • affinity levels 0, 1, 2
    • No relation to voltage domains
    • Affinity levels don't always define voltage domain boundaries
  • Kernel must alter scheduling to focus tasks into clusters
    • The big.LITTLE (b.L) mini-summit on Thursday will touch on this, as will some other b.L sessions during the week
  • CPUIdle - coupled C-states
    • must wake CPUs up from WFI via IPIs to enter "cluster idle"
    • useful when cpu0 has some work to do for the shutdown, as on OMAP4, where only cpu0 can shut down the cluster
  • CPUIdle governor problem
    • Current governors make decisions on a per-CPU basis
    • Different CPUs enter idle at different times
      • next_event must become a cluster concept, not just a per-CPU concept
      • if the last CPU is going idle and will therefore initiate the cluster shutdown, a timer tied to another CPU may wake up the entire cluster while it is being shut down. This is not a correctness problem if races are handled properly, but it is inefficient because it breaks the governor's prediction
    • On a big.LITTLE system, there will be two CPUIdle drivers: one for A15s, one for A7s
      • under the theory that a CPUIdle driver is just a set of state data
      • not yet merged (target 3.8)
  • Logical vs. physical CPU ID differences?
  • LoUIS cache flushing support added
    • How to optimize broadcasts
      • Must not wake up clusters just to relay IRQs
    • In cluster shutdown, L2 cache flush is by far the primary time consumer - up to a millisecond in synthetic tests run by Lorenzo - l1_l2_clean_env()
    • Does governor work at cluster level or cpu-level?
      • Is next_event system-wide?
      • TODO: look into timer wheel
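
Two points from the notes above can be illustrated together: clusters are identified through the MPIDR affinity levels, and next_event has to become a per-cluster value rather than a per-CPU one. The following is a minimal sketch, not kernel code. It assumes the ARMv7 MPIDR layout (Aff0 in bits [7:0], Aff1 in bits [15:8], Aff2 in bits [23:16]) and assumes that Aff1 identifies the cluster, which is typical but not guaranteed on every platform. The function names, the MPIDR values and the next-event times are all hypothetical.

```python
def mpidr_affinity(mpidr, level):
    """Return the 8-bit affinity field for level 0, 1 or 2 (ARMv7 MPIDR layout)."""
    if level not in (0, 1, 2):
        raise ValueError("affinity level must be 0, 1 or 2")
    return (mpidr >> (8 * level)) & 0xFF


def cluster_next_events(cpus):
    """Compute a cluster-wide next_event from per-CPU values.

    cpus: iterable of (mpidr, next_event_us) pairs, one per CPU.
    Returns {cluster_id: earliest next_event_us among that cluster's CPUs}.
    Assumption: affinity level 1 is the cluster ID.
    """
    earliest = {}
    for mpidr, next_event_us in cpus:
        cluster = mpidr_affinity(mpidr, 1)
        # The cluster can only stay down until its earliest pending event.
        earliest[cluster] = min(earliest.get(cluster, next_event_us), next_event_us)
    return earliest


# Hypothetical two-cluster system: (MPIDR, time until next event in microseconds)
cpus = [
    (0x80000000, 5000),   # cluster 0, CPU 0
    (0x80000001, 300),    # cluster 0, CPU 1
    (0x80000100, 12000),  # cluster 1, CPU 0
    (0x80000101, 9000),   # cluster 1, CPU 1
]

if __name__ == "__main__":
    print(cluster_next_events(cpus))
```

In this sketch the last CPU to go idle in cluster 0 would see a cluster-wide next event of 300 us, even if its own next event is 5000 us away, so a governor working at cluster level could decline a cluster shutdown that a per-CPU decision would have accepted.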

WorkingGroups/PowerManagement/Archives/ConfNotes/2012-10-Connect-Copenhagen/ClusterShutdown (last modified 2013-08-21 12:46:42)