TC2 Update

The live stream is available for viewing in the following locations:

  • Linaro Onair Google+ Page: https://plus.google.com/u/0/116754366033915823792/posts

  • Linaro Onair Youtube Channel: http://www.youtube.com/user/LinaroOnAir

  • In-Kernel Switcher (IKS)
    • Targeted at first-generation big.LITTLE products
    • Kernel not aware of the actual CPU topology
    • Short-term solution
  • MP solution
    • Expose both big and little cores to the kernel scheduler
    • Long-term solution
  • TC2 Overview
    • Versatile Express tile
    • 2x A15 @ 1.2 GHz, 3x A7 @ 1 GHz
    • No GPU - big problem for Android testing
    • Frequency scaling support
      • Independent frequencies for A15/A7 clusters
    • Cluster power gating, not core power gating
    • Voltage scaling range is fairly narrow - only 3 voltage levels
    • Benchmarking
      • Automated
      • Choose workload, CPU mode, # of active cores, DVFS governor
      • Long term plan is to share this infrastructure
    • IKS approach:
      • big.LITTLE switching is presented as an extension of DVFS (see the sketch after this list)
      • when load is low, tasks run on the A7s
      • when load is higher, tasks run on the A15s
    • IKS audio use case:
      • No performance advantage to running on the A15s
      • 70% power saving when running on the A7s
    • IKS: bbench + audio use case:
      • IKS adds another point to the power/performance curve, close to the A15 point
    • Interactive governor
      • the governor keeps CPUs in overdrive OPPs all the time, which results in a power penalty
      • a modified Hispeed2 governor was written to address this
    • Hispeed2
      • reduces power consumption with negligible performance penalty
      • room for tuning
  • Minimum IKS switch latency is 60 µs
    • 45 µs of that is spent with interrupts disabled
  • An IKS switch could take up to 2 ms if the target cluster was off
    • both clusters would be powered on for most of that time
  • The A7 only has 16 FP registers, but the A15 has 32. How should this be handled? Prevent the A7 from being used if a task uses all 32 FP registers?
  • MP solution:
    • Treat big and LITTLE CPUs as separate scheduling domains
    • Use Paul Turner's (PJT's) per-entity load tracking patches to track individual task load
      • Measures the amount of time a task stays on the runqueue
    • Migrate tasks based on task load
    • Patch set available from Linaro
  • MP experimental implementation (see the placement sketch after this list):
    • checks task history
    • if a task did not spend much time on the runqueue in the past, it gets placed on an A7; otherwise, on an A15
      • this runs into problems when a task's runqueue usage profile changes
    • a forced-migration mechanism can push a task to a big core if it turns out to be CPU intensive
  • MP audio test results:
    • Not as good as IKS
    • Extra power spent on A15 cluster
      • Looking at the trace, A15s did not execute a single userspace instruction
    • Spurious wakeups
      • timers, softirqs, RCU
    • Vincent Guittot is working on this in his patch sets
  • Scale-invariant load:
    • Load accumulation rate does not scale with the available compute capacity (CPU frequency, big vs. little CPUs)
    • There is no link between the scheduler and cpufreq
      • Tasks may get migrated away from a low-frequency CPU by the scheduler before cpufreq has a chance to increase the CPU frequency
    • Fixing this should prevent undesired migration between the big and little clusters (a scaling sketch follows this list)
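
The IKS notes above describe big.LITTLE switching as an extension of DVFS. One way to picture this is a single virtual frequency table in which the low operating points are served by the A7 cluster and the high ones by the A15 cluster; picking a frequency past the boundary triggers a cluster switch. The sketch below only illustrates that idea: the frequencies, the opp_table, and the pick_cluster() helper are hypothetical and are not taken from the actual IKS code.

    /* Illustrative sketch: a combined "virtual" OPP table for one big.LITTLE
     * CPU pair.  Low OPPs run on the A7, high OPPs on the A15.  All values
     * are hypothetical, not TC2 numbers. */
    #include <stdio.h>

    enum cluster { CLUSTER_A7, CLUSTER_A15 };

    struct virtual_opp {
            unsigned int freq_khz;  /* frequency presented to the DVFS governor */
            enum cluster cluster;   /* cluster that actually provides this OPP  */
    };

    /* One table spanning both clusters: the governor simply picks a frequency,
     * and the switcher changes cluster when the chosen OPP lives on the other one. */
    static const struct virtual_opp opp_table[] = {
            {  350000, CLUSTER_A7  },
            {  500000, CLUSTER_A7  },
            { 1000000, CLUSTER_A7  },
            { 1200000, CLUSTER_A15 },  /* requests above 1 GHz move to the A15s */
            { 1500000, CLUSTER_A15 },
    };

    static enum cluster pick_cluster(unsigned int requested_khz)
    {
            unsigned int i;

            for (i = 0; i < sizeof(opp_table) / sizeof(opp_table[0]); i++)
                    if (requested_khz <= opp_table[i].freq_khz)
                            return opp_table[i].cluster;

            return CLUSTER_A15;  /* above the table: use the fastest cluster */
    }

    int main(void)
    {
            unsigned int requests[] = { 400000, 1000000, 1400000 };
            unsigned int i;

            for (i = 0; i < 3; i++)
                    printf("request %u kHz -> %s\n", requests[i],
                           pick_cluster(requests[i]) == CLUSTER_A7 ? "A7" : "A15");
            return 0;
    }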
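
The experimental MP placement described in the notes above (check the task's history; light tasks go to an A7, heavy tasks to an A15, with forced migration for tasks that turn out to be CPU intensive) can be pictured as a threshold test on the tracked per-task load with some hysteresis. The following is a user-space illustration only; the thresholds, the task_stats structure, and place_task() are invented and do not come from the Linaro MP patch set.

    /* Illustration of threshold-based big.LITTLE task placement.  All names
     * and numbers are hypothetical; this is not the Linaro MP patch set. */
    #include <stdio.h>
    #include <stdbool.h>

    struct task_stats {
            const char *name;
            unsigned int tracked_load;  /* 0..1024, time spent runnable */
            bool on_big;                /* currently placed on the A15 cluster? */
    };

    /* Hypothetical thresholds: push to big above UP, pull back below DOWN.
     * Two thresholds give hysteresis and avoid ping-ponging between clusters. */
    #define UP_THRESHOLD   700
    #define DOWN_THRESHOLD 300

    static void place_task(struct task_stats *t)
    {
            if (!t->on_big && t->tracked_load > UP_THRESHOLD)
                    t->on_big = true;   /* forced up-migration to an A15 */
            else if (t->on_big && t->tracked_load < DOWN_THRESHOLD)
                    t->on_big = false;  /* task has gone quiet, back to an A7 */
    }

    int main(void)
    {
            struct task_stats tasks[] = {
                    { "audio",   120, false },  /* light, periodic: stays on an A7 */
                    { "render",  900, false },  /* CPU intensive: pushed to an A15 */
                    { "old_hog", 150, true  },  /* was busy, now idle: pulled back */
            };
            unsigned int i;

            for (i = 0; i < 3; i++) {
                    place_task(&tasks[i]);
                    printf("%-8s load=%4u -> %s\n", tasks[i].name,
                           tasks[i].tracked_load, tasks[i].on_big ? "A15" : "A7");
            }
            return 0;
    }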
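
The scale-invariance issue above is that the raw runnable time a task accumulates means different things at different frequencies and on different core types. One natural fix, sketched here purely for illustration, is to scale each runnable interval by the CPU's current frequency and relative capacity before adding it to the tracked load, so a task looks equally heavy regardless of where it happened to run. The capacities, frequencies, and the scaled_load() helper below are assumptions, not code from any particular patch set.

    /* Illustration of frequency/capacity-scaled ("scale invariant") load
     * accounting.  The numbers and helper names are hypothetical. */
    #include <stdio.h>

    /* Capacity relative to the fastest CPU at its highest frequency. */
    #define MAX_CAPACITY 1024

    struct cpu_state {
            unsigned int capacity;      /* relative capacity of this core type */
            unsigned int cur_freq_khz;  /* current frequency */
            unsigned int max_freq_khz;  /* maximum frequency */
    };

    /* Unscaled accounting: 10 ms runnable adds the same load everywhere. */
    static unsigned int raw_load(unsigned int runnable_us)
    {
            return runnable_us;
    }

    /* Scaled accounting: the same 10 ms adds less load on a slow little core
     * at a low frequency, because less work was actually done in that time. */
    static unsigned int scaled_load(unsigned int runnable_us,
                                    const struct cpu_state *cpu)
    {
            unsigned long long scaled = runnable_us;

            scaled = scaled * cpu->cur_freq_khz / cpu->max_freq_khz;
            scaled = scaled * cpu->capacity / MAX_CAPACITY;
            return (unsigned int)scaled;
    }

    int main(void)
    {
            struct cpu_state little = { 512, 500000, 1000000 };    /* little core at half speed */
            struct cpu_state big    = { 1024, 1200000, 1200000 };  /* big core at max speed */

            printf("10 ms runnable, raw:              %u\n", raw_load(10000));
            printf("10 ms runnable on little at 50%%:  %u\n", scaled_load(10000, &little));
            printf("10 ms runnable on big at max:     %u\n", scaled_load(10000, &big));
            return 0;
    }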

  • Load accumulation rate
    • For some workloads, the tracked load saturates too fast
    • This leads to unnecessary task migrations
    • Extending the tracked-load history reduces variations in the tracked load caused by sudden changes in load characteristics
    • Increasing the history leads to a more conservative tracked load for workloads that mix long activity periods with long idle periods
      • The trade-off is that it increases the up/down migration delay for tasks that do need to be migrated (a toy example follows at the end of these notes)
  • MP top issues:
    • Spurious wakeups
    • CPU wakeup prioritisation
    • Global balancing
    • Cluster-aware cpufreq governors
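
The history-length trade-off described under "Load accumulation rate" above can be made concrete with a toy decaying-average tracker: a short effective history saturates quickly within a burst and collapses during idle gaps (responsive, but migration-happy), while a long history smooths the bursts out and reacts more slowly when a task genuinely changes behaviour. The decay factors and names below are hypothetical and are not the values used by the kernel's load tracking.

    /* Toy decaying-average load tracker showing the history-length trade-off.
     * Constants and names are hypothetical. */
    #include <stdio.h>

    /* Each step: new_load = old_load * decay + (runnable ? contribution : 0).
     * A decay factor close to 1.0 means a long effective history. */
    static double track(double load, double decay, int runnable)
    {
            return load * decay + (runnable ? (1.0 - decay) * 100.0 : 0.0);
    }

    int main(void)
    {
            /* A bursty workload: 20 busy steps, then 20 idle steps, repeated. */
            double short_hist = 0.0, long_hist = 0.0;
            int step;

            for (step = 0; step < 80; step++) {
                    int runnable = (step / 20) % 2 == 0;

                    short_hist = track(short_hist, 0.80, runnable);  /* short history */
                    long_hist  = track(long_hist,  0.98, runnable);  /* long history  */

                    if (step % 10 == 9)
                            printf("step %2d  short=%6.1f  long=%6.1f\n",
                                   step, short_hist, long_hist);
            }

            /* The short-history tracker saturates within each burst and collapses
             * in the idle gaps (fast up/down response, but noisy), while the
             * long-history one stays nearer the long-run average (stable, but it
             * delays migration when behaviour really changes). */
            return 0;
    }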
