This is a work in progress, trying to thrash out the QEMU work for the 11.11 cycle. Eventually this will turn into a set of blueprints. Not everything listed here will go into 11.11; there may be too much work to complete in the six months.
Michael/Mounir's summary spreadsheet for all of toolchain:
Top level headings correspond to Tn.n entries in the summary spreadsheet; second level headings are supposed to be well-defined chunks of work that will become blueprints. Work items are listed in a format which can be copied into a blueprint whiteboard.
Top level headings are sorted by TSC priority (bracketed); subheadings are sorted in my opinion of priority. I have occasionally split up a TSC toplevel heading where there are some definitely-required (higher priority) blueprints but also some more speculative lower-priority blueprints in the same general area.
We think we probably have about 8 man-months worth of effort for QEMU this cycle; this implies that we expect (starting from the top) to complete everything down to and including the performance work, and then have a month or so for some "QEMU improvements" work.
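As a rough cross-check of that budget, the sketch below sums the per-blueprint estimates given in the headings further down this page, from the top of the list down to and including the performance work. The four-weeks-per-month conversion is an assumption made here for illustration, not something the plan states.

```python
# Estimates in weeks, copied from the blueprint headings on this page,
# in priority order down to and including the performance work.
ESTIMATES_WEEKS = [
    ("Upstream OMAP3/Beagle patches", 8),
    ("Fix I/O problems", 1),
    ("Add OMAP3 USB support", 2),
    ("Track down the causes of crashes seen under OBS", 2),
    ("Basic Device Tree integration", 1),
    ("Help Android move towards upstream", 2),
    ("Make six releases of Linaro QEMU", 2),
    ("Add support for A15's new user-level instructions", 1),
    ("Plan for possible system level A15 model", 2),
    ("QEMU speed improvements (ARM front-end)", 3),
    ("Prototyping use of traces", 4),
]

WEEKS_PER_MONTH = 4  # assumption: a planning month is four working weeks
BUDGET_MONTHS = 8    # "about 8 man-months" from the paragraph above

committed_weeks = sum(weeks for _, weeks in ESTIMATES_WEEKS)
committed_months = committed_weeks / WEEKS_PER_MONTH
remaining_months = BUDGET_MONTHS - committed_months

print(f"committed: {committed_weeks} weeks ({committed_months:.1f} months)")
print(f"left for QEMU improvements: {remaining_months:.1f} months")
```

Under that assumption the committed work comes to 28 weeks, which leaves about a month of the budget for the "QEMU improvements" items.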
- T4.1 [HIGH] Maintain/upstream existing models [VE, beagle]
- T4.5 Device Tree support
- Tx.x [HIGH] No TR but work we are committed to
- T3.3 [MED] Regularly release and support Linaro QEMU
- T4.2 [MED] Initial A15 support
- T4.3 [MED] A15 system emulation planning
- T4.5 [LOW] Emulation speed
- T4.6 [LOW] QEMU improvements
- Help upstream with AREG0 removal (1 week)
- cp15 infrastructure rework (2 weeks)
- Make risu usable for regression testing (2 weeks)
- Improve test coverage (1 week)
- Implement TrustZone (5 weeks)
- Selective feature enabling (1 week)
- Save/restore (1 week)
- Improve emulation of x86 binaries on ARM hosts (2 weeks?)
- Tx.x Device Tree enhancements
- T5.3 [WISHLIST] QEMU bluesky
- T4.7 [WISHLIST] Low cost A9 model
T4.1 [HIGH] Maintain/upstream existing models [VE, beagle]
Upstream OMAP3/Beagle patches (8 weeks)
There are nearly a hundred OMAP3 related patches in qemu-linaro. We'd like to get these cleaned up and submitted upstream so that you can boot Linaro beagle images on vanilla upstream QEMU.
find omap1/omap2 image for regression testing: TODO
test omap1/2 image on qemu-linaro and fix any regressions: TODO
analyse patch stack: TODO
collapse patches which fix bugs in files added earlier: TODO
reorder patch stack to put simple fixes earlier: TODO
test and submit generic fixes #1: TODO
test and submit generic fixes #2: TODO
handle review issues on generic fixes #1: TODO
handle review issues on generic fixes #2: TODO
split patches which are per-file to be per-feature instead: TODO
split patches #2: TODO
split patches #3: TODO
qdevify any new devices which are not qdev: TODO
add save/restore support to new devices which are missing it: TODO
test save/restore support: TODO
submit omap3 patch series: TODO
handle issues raised in omap3 review (round 1): TODO
handle issues raised in omap3 review (round 2): TODO
handle issues raised in omap3 review (round 3): TODO
rebase qemu-linaro to use upstream commits and drop local patches: TODO
(The #1,#2... items here are aimed at keeping each work item to about two days worth of work.)
Estimate: 2 months (over a longer elapsed time)
Fix I/O problems (1 week?)
Currently the OMAP3 and Versatile Express models boot but are pretty near unusable for serious work because I/O (probably disk I/O) is either very slow or tends to lock up or stall. We need to investigate the cause of this bug and fix it. (My suspicion is that this is something to do with SD card emulation, because I don't think it happens when QEMU is emulating a hard disk.)
Find out why the I/O is so slow: TODO
Write patch and submit upstream: TODO
Handle issues raised in patch review: TODO
Estimate: 1 week (with the proviso that estimates for "find this bug" are almost always worthless)
Add OMAP3 USB support (2 weeks?)
At the moment the OMAP3/beagle model doesn't have working USB. This matters because on beagle the keyboard, mouse and networking are all USB devices. The situation is slightly complicated because QEMU currently doesn't support EHCI USB, only OHCI (EHCI support is being worked on but there is no ETA for it). Beagle hardware has (1) USB OTG and (2) EHCI USB (although OMAP3 itself has both OHCI and EHCI), and the Linux omap3 kernel doesn't try to use the OHCI hardware at all, only EHCI and the USB OTG. The goal here is a beagle model which can do keyboard, mouse and networking, plus documentation on the wiki of any necessary config/command line options.
Update: it looks as if EHCI support may be landing in QEMU in the not-too-distant future, although it is likely to be unstable initially.
The OTG support was at one point disabled in the kernel, but in Linaro's 1105 beta release at least it seems to be back. USB OTG is therefore the most plausible route forward.
Test USB-OTG on Beagle xM hardware and confirm that it works and how to configure it: TODO
Identify QEMU command line options for USB kbd/mouse and confirm (via gdb) that QEMU has connected them to the USB-OTG controller: TODO
Fix USB-OTG model bugs that make this not work: TODO
Get USB networking working over USB OTG, if possible: TODO
Update the wiki to document any necessary config options needed when launching a beagle model: TODO
Submit patches to meego tree (and/or upstream if we've managed to get omap3 upstream by this point): TODO
Handle issues raised in patch review: TODO
Estimate: 2 weeks.
Track down the causes of crashes seen under OBS (2 weeks?)
Martin Mohring reports that using qemu-linaro to do Meego builds in OBS reveals some persistent crash issues. I had thought that this was caused by TCG's locking issues with multithreaded linux-user apps. However Martin tested a patch by me which I believe fixes those locking problems and still saw crashes.
This sort of cross-build assist is a really popular use of QEMU so we should devote time to fixing these issues.
Get some concrete reproducible test cases from Martin: TODO
Investigate crash 1: TODO
Feed back fix for crash 1 to Martin for testing: TODO
Investigate crash 2: TODO
Feed back fix for crash 2 for testing: TODO
Clean up and send fixes upstream: TODO
Handle any issues raised in upstream review: TODO
Estimate: 2 weeks (a guess, on the assumption there are a few underlying causes rather than dozens)
T4.5 Device Tree support
Basic Device Tree integration (1 week)
Spec page: QEMUDeviceTree
This was asked for by the TSC but seems to have been missed from the set of technical topics.
Device tree support in the kernel and so on is moving forward; we should coordinate with Grant Likely to identify what QEMU work would be useful to integrate with this, and what may already have been prototyped or done but not yet taken upstream.
This blueprint addresses the immediately obvious and non-controversial parts of the spec: basic support for device tree, on par with that already in QEMU for PPC and Microblaze. QEMU's built in boot loader (-kernel/-initrd command line options) needs to accept a devicetree blob from the user, make minimal changes to set the amount of memory, initrd location and so on, and then pass it to the kernel.
The more advanced/complicated/controversial parts of the spec above are in a separate blueprint.
Find test kernel and device tree blob for vexpress: TODO
Add code to take blob, update it with memory etc, and pass to kernel: TODO
Clean up and submit patch upstream: TODO
Handle any patch review comments: TODO
Estimate: 1 week.
Tx.x [HIGH] No TR but work we are committed to
Help Android move towards upstream (2 weeks)
Currently the Android emulator uses an elderly fork of QEMU; they've cherry-picked some later fixes, but unfortunately never had the resources to try to upstream their device models or rebase on newer upstream QEMU versions. There is a QEMU Google Summer of Code project to start by doing upstream QEMU versions of the Goldfish virtual platform devices; I (Peter Maydell) am the official 'mentor' for the student. Google suggest that this will take an average of 5 hours a week from the mentor until the end of August.
Make sure student is started, status meetings set up, etc: TODO
Support student (June): TODO
Submit mid-term student evaluation: TODO
Support student (July): TODO
Support student (August): TODO
Submit final student evaluation: TODO
Estimate: 2 weeks total (5 hrs * 15 weeks / 37.5 hrs)
T3.3 [MED] Regularly release and support Linaro QEMU
I'm just counting this one as the usual "test and release" work; fixing bugs is generally T4.4.
Make six releases of Linaro QEMU (2 weeks)
Release qemu-linaro 2011.06: TODO
Release qemu-linaro 2011.07: TODO
Release qemu-linaro 2011.08: TODO
Release qemu-linaro 2011.09: TODO
Release qemu-linaro 2011.10: TODO
Release qemu-linaro 2011.11: TODO
Estimate: 2 weeks (a little less than two days per release, for a total of six monthly releases)
T4.2 [MED] Initial A15 support
Add support for A15's new user-level instructions to QEMU (1 week)
Dependency: version of ARM ARM defining VFPv4 and UDIV/SDIV. We're going to implement this as user-mode-only support for the new instructions defined by the architecture, so we don't need to explicitly define an A15 CPU that you can select in QEMU. (That would really depend on having an A15 TRM as well as the ARM ARM update.)
Check that UDIV,SDIV match the A profile definitions of them: TODO
Implement fused-multiply (FMAC) in softfloat and add decode, helper to use it: TODO
Test and upstream patches: TODO
Handle any issues raised in review: TODO
Estimate: 1 week
T4.3 [MED] A15 system emulation planning
Plan for possible implementation of a system level A15 model (2 weeks)
The deliverable here should be a writeup giving a definite estimate, a decision of whether this is worth doing, and a list of any necessary prerequisite QEMU fixes (eg trustzone). (We did a back-of-the-envelope guess that this might be six months of work.)
Read specs: TODO
Identify useful prerequisite qemu fixes, necessary restructuring, etc: TODO
Identify suitable test kernel: TODO
Identify whether hypervisor for testing is or will be available: TODO
Identify required board model/device work [probably modified vexpress]: TODO
Identify required core QEMU work based on previous estimates: TODO
Write up results: TODO
Estimate: 2 weeks (1 week each PMM, DG)
T4.5 [LOW] Emulation speed
QEMU speed improvements (ARM front-end) (3 weeks)
This blueprint is for improvements to QEMU's speed, focusing on the ARM front end. This includes inlining helper functions and identifying places where we're generating suboptimal TCG op sequences. Say three weeks' work, including initially identifying and setting up some plausible benchmarks so that we can see how much we're improving.
Identify and set up useful benchmarking environment: TODO
Benchmark perf work by upstream contributors to confirm it is good for ARM target: TODO
Inline important helper functions: TODO
Experiment with inlining other helpers: TODO
Profile and identify other useful improvements: TODO
Handle issues raised in code review: TODO
Estimate: 3 weeks (maybe 4 if we find enough things to fix)
Prototyping use of traces for QEMU speed improvements (4 weeks)
This blueprint is for prototyping a more significant change to QEMU's internals which might produce a better perf improvement or lay the foundation for future improvements by giving scope for more advanced optimisations to work. The issue we're trying to address is that TCG basic blocks are typically very short, because they end at any branch. This means that there's not much potential for optimisations to actually kick in. So we want to prototype some sort of 'trace' setup which allows the codegen and optimisation to work on a larger chunk of code. For a month's worth of work we'd hope to come out with a prototype suitable for posting upstream as an 'RFC' patchset (for example, we might make any required frontend changes only to the ARM frontend, and backend changes only to the x86 backend). Actually creating a completely mergeable patchset would be a separate blueprint and probably another month.
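To make the 'trace' idea concrete, here is a minimal sketch of one possible way to pick a trace: start at a block and keep following the most frequently taken successor while the branch is strongly biased. This is not a description of TCG or of any agreed design; the `Block` record, the `build_trace` function, the profile counts and the thresholds are all hypothetical.

```python
from collections import namedtuple

# Hypothetical basic-block record: start address, instruction count, and
# successors as {target_address: times_taken} from some profiling run.
Block = namedtuple("Block", "start length successors")


def build_trace(blocks, head, max_blocks=8, min_bias=0.7):
    """Greedily follow the hottest successor of each block from `head`."""
    trace = [head]
    seen = {head}
    current = blocks[head]
    while len(trace) < max_blocks:
        total = sum(current.successors.values())
        if total == 0:
            break  # block has no recorded successors
        target, count = max(current.successors.items(), key=lambda kv: kv[1])
        # Stop at weakly biased branches, loops back into the trace,
        # and targets we have no block for.
        if count / total < min_bias or target in seen or target not in blocks:
            break
        trace.append(target)
        seen.add(target)
        current = blocks[target]
    return trace


# Three short blocks forming a hot loop, plus a rarely taken exit block.
blocks = {
    0x1000: Block(0x1000, 3, {0x100C: 95, 0x2000: 5}),
    0x100C: Block(0x100C, 2, {0x1014: 100}),
    0x1014: Block(0x1014, 4, {0x1000: 90, 0x3000: 10}),
    0x2000: Block(0x2000, 6, {}),
}

trace = build_trace(blocks, 0x1000)
trace_insns = sum(blocks[addr].length for addr in trace)
print([hex(addr) for addr in trace], trace_insns)
```

In this toy example the optimiser would get nine instructions to work on in one unit rather than blocks of two to four.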
Some other people in QEMU upstream are already looking at generic TCG speed improvements (eg Aurelien, Kirill), so we need to make sure we cooperate here.
This should have about 10 work items for a 4-week blueprint; I've given in and made some of them generic to get the count right.
Become familiar with QEMU's current codegen approach: TODO
Sketch out a design for adding traces: TODO
Propose upstream, collect feedback: TODO
Implement prototype 1: TODO
Implement prototype 2: TODO
Implement prototype 3: TODO
Implement prototype 4: TODO
Benchmark and instrument to see how effective it is: TODO
Tweaks based on benchmarking results: TODO
Submit RFC patchseries upstream: TODO
Estimate: 4 weeks (see above for caveats, scope)
T4.6 [LOW] QEMU improvements
I haven't included a "correctness fixes" blueprint, because we've fixed almost all of the known correctness issues this cycle, so any further work is going to be (a) the odd low-priority loose end and (b) reactive fixing of bugs as we notice them.
Help upstream with AREG0 removal (1 week)
QEMU currently has a global register which stores a pointer to the current CPU state, which is used in both TCG generated code and some helper functions. There's been a recent decision to try to move towards getting rid of this global. We should put in the work on the ARM front-end.
The justification is that this will force helper functions to be more explicit about when they mess with CPU state, which means TCG can do better optimisation because it doesn't have to be pessimistic; however the chances are that it won't be an immediate win (it might even be a short-term performance loss) so 'qemu improvements' seems a better fit than 'speed improvements'.
Redo and resubmit "FPSCR flags" patchset not to move functions to op_helper.c: TODO
Write and submit patchset reverting '*_helper.c have access to global env' change: TODO
Write and submit patchset which moves things out of op_helper.c where possible: TODO
Handle issues raised in review: TODO
Estimate: 1 week
cp15 infrastructure rework (2 weeks)
At the moment QEMU handles cp15 accesses by calling out to a single helper function which is an enormous set of nested switch statements to handle the different coprocessor registers. Access permissions are checked separately at translate time. This design makes specifying board-dependent or cpu-dependent registers somewhat painful; it's also easy for the access permission checks to get out of sync with the register implementations. There is no support for banked cp15 registers either (needed for trustzone and virtualisation). We need a better design which lets a board model or CPU core model register handler routines for individual cp15 registers. This will make the code cleaner and more maintainable as a base for new features.
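One possible shape for such a design is a table of register descriptions, each carrying its own access permissions, banking flag and optional handlers, so that permissions and implementation live in one place. The sketch below is only an illustration in Python, not QEMU code: the `CPReg` and `CP15` classes and all their parameters are hypothetical, and the two register encodings are given for illustration and should be checked against the ARM ARM.

```python
PL0, PL1 = 0, 1  # privilege levels: user, privileged


class UndefinedInstruction(Exception):
    """Access to a missing register or one the caller may not touch."""


class CPReg:
    """One cp15 register: permissions, banking and optional handlers."""

    def __init__(self, name, read_pl=PL1, write_pl=PL1, banked=False,
                 reset=0, readfn=None, writefn=None):
        self.name = name
        self.read_pl = read_pl
        self.write_pl = write_pl
        self.banked = banked
        self.readfn = readfn
        self.writefn = writefn
        # Slot 0 is the secure (or only) copy, slot 1 the nonsecure copy.
        self.values = [reset, reset]

    def slot(self, secure):
        return 1 if (self.banked and not secure) else 0


class CP15:
    """Table of registers keyed by (crn, opc1, crm, opc2)."""

    def __init__(self):
        self.regs = {}

    def register(self, crn, opc1, crm, opc2, reg):
        key = (crn, opc1, crm, opc2)
        if key in self.regs:
            raise ValueError(f"cp15 register {key} defined twice")
        self.regs[key] = reg

    def _lookup(self, key, pl, write):
        reg = self.regs.get(key)
        if reg is None:
            raise UndefinedInstruction(f"no cp15 register at {key}")
        needed = reg.write_pl if write else reg.read_pl
        if pl < needed:
            raise UndefinedInstruction(f"{reg.name}: insufficient privilege")
        return reg

    def read(self, key, pl, secure=True):
        reg = self._lookup(key, pl, write=False)
        if reg.readfn:
            return reg.readfn(reg, secure)
        return reg.values[reg.slot(secure)]

    def write(self, key, value, pl, secure=True):
        reg = self._lookup(key, pl, write=True)
        if reg.writefn:
            reg.writefn(reg, value, secure)
        else:
            reg.values[reg.slot(secure)] = value


# A core (or board) model registers only the registers it actually has.
cp15 = CP15()
cp15.register(2, 0, 0, 0, CPReg("TTBR0", banked=True))
cp15.register(13, 0, 0, 2,
              CPReg("TPIDRURW", read_pl=PL0, write_pl=PL0, banked=True))

TTBR0 = (2, 0, 0, 0)
cp15.write(TTBR0, 0x4000, pl=PL1, secure=True)
cp15.write(TTBR0, 0x8000, pl=PL1, secure=False)
```

Because the permission check and the storage lookup both go through the same table entry, a register cannot be implemented without also stating who may access it.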
confirm requirements (usual number of cp15 regs, different banking arrangements, access permissions): TODO
write up sketch of proposed design, gather any comments from qemu-devel: TODO
implement design: TODO
testing and bugfixing: TODO
clean up and submit patches upstream: TODO
handle any issues raised in code review: TODO
Estimate: two weeks
Make risu usable for regression testing (2 weeks)
At the moment the risu instruction-sequence tester is great for testing patches for specific bug fixes, as it's easy to generate a test for the instruction being fixed. However it's missing the consistent coverage and "just run all the tests" functionality that is needed to use it as an automatic regression test.
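The record/replay idea can be sketched as follows: save the register state seen after each test instruction on real hardware to a "golden" file, then later compare an emulator run against that file with no hardware attached. This is only an illustration and says nothing about risu's actual file format or options; the `record` and `replay` functions, the JSON layout and the register values are all hypothetical.

```python
import json
import os
import tempfile


def record(snapshots, path):
    """Save per-instruction register snapshots as the golden results."""
    with open(path, "w") as f:
        json.dump(snapshots, f)


def replay(snapshots, path):
    """Compare a run against the golden file.

    Returns None on a full match, otherwise (index, expected, actual)
    for the first differing snapshot. A length mismatch is reported
    with None standing in for the missing side.
    """
    with open(path) as f:
        golden = json.load(f)
    for i, (expected, actual) in enumerate(zip(golden, snapshots)):
        if expected != actual:
            return (i, expected, actual)
    if len(golden) != len(snapshots):
        i = min(len(golden), len(snapshots))
        expected = golden[i] if i < len(golden) else None
        actual = snapshots[i] if i < len(snapshots) else None
        return (i, expected, actual)
    return None


# Made-up register state after each of three test instructions.
hardware_run = [
    {"r0": 1, "r1": 2, "cpsr": 0x10},
    {"r0": 3, "r1": 2, "cpsr": 0x10},
    {"r0": 3, "r1": 5, "cpsr": 0x20000010},
]
# An emulator run that gets the flags wrong on the last instruction.
emulator_run = [dict(s) for s in hardware_run]
emulator_run[2]["cpsr"] = 0x10

golden_path = os.path.join(tempfile.mkdtemp(), "golden.json")
record(hardware_run, golden_path)

good = replay(hardware_run, golden_path)
bad = replay(emulator_run, golden_path)
print(good, bad)
```

Keeping the golden results in a separate file per test is one answer to the "same file or separate" design question in the work items below; the other option would need a different layout.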
design work: command line options, are golden results in same file or separate, avoiding version skew: TODO
add ability to "record" golden test results when running on real hardware: TODO
add ability to "replay" the golden test results rather than needing real hardware to cross-check emulator against: TODO
add makefiles etc so it's easy to create golden results for a whole suite of tests: TODO
add wrappers so it's easy to automatically run the whole suite against the emulator: TODO
extend the risu coverage so we have a useful test suite: TODO
add a jenkins job so we run these tests automatically: TODO
Total estimated time: 2 weeks.
Improve test coverage (1 week)
(You could file this under T4.4 if you preferred.)
At the moment we have a very basic automated continuous integration setup: it builds qemu-linaro from git and confirms that two images (beagle, versatile) boot OK. We should extend this to cover other architectures, linux-user testing, and ARM correctness testing. We should also set up automated benchmarking of builds so that we can track whether performance is improving or not.
We had a session at UDS-O which covered this. If we can provide straightforward instructions for what needs testing the validation team should be able to set something up either via jenkins or abrek or both.
write requirements email to send to Paul Larson etc: TODO
identify useful benchmarks and tests, and sort them by order of bang-for-the-buck: TODO
send validation folk necessary info, command lines, minor wrapper scripts: TODO
Estimated time: 1 week, spread over a longer elapsed timeframe
Implement TrustZone (5 weeks)
Spec page: QEMUTrustZone
QEMU doesn't currently implement TrustZone. We'd like it for a couple of reasons. Firstly, it's used by the omap3 model (the qemu-linaro tree contains a half-implementation of just enough of it to make things work, but I don't think it's upstreamable). Secondly, it's a prerequisite for virtualization.
Pull appropriate bits of monitor mode and SMC implementation from meego patches: TODO
Bank all the required CP15 registers for secure/nonsecure mode: TODO
Modify interrupt entry to select appropriate mode for entry based on security configuration: TODO
Modify GIC to restrict modification of secure interrupts to be done from secure mode: TODO
Modify MMU/TLB walk code to examine NS bits and use correct (banked, etc) cp15 regs: TODO
Add new QEMU "MMU modes" for "secure user" and "secure priv" so they get different QEMU TLBs to nonsecure: TODO
Make CPU start properly in secure mode: TODO
Make relevant CPUs have trustzone feature bit, confirm this doesn't break existing images: TODO
Test that omap3_boot's use of trustzone works OK: TODO
Implement at least some of the A9 Versatile Express trustzone hardware: TODO
Provide a simple piece of monitor mode setup/test code: TODO
Test with more complicated trustzone images if available: TODO
Clean up patchset and submit upstream: TODO
Handle issues raised in code review: TODO
Estimate: about 5 weeks; we might be able to do a slightly chopped down "CPU bits only" version sufficient for omap3 in 3 weeks.
Selective feature enabling (1 week)
At the moment QEMU lets you specify a CPU, but you always get a fully-featured version of that CPU (for instance an A9 will always have Neon). We should allow the user to ask for an A9 with only VFPv3-D16, for example, so they can test code which will run on hardware which doesn't have all the options. QEMU already has this concept implemented in the x86 target: the general idea is that you have optional flags you tack on to the CPU name.
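The "flags tacked on to the CPU name" idea could look something like the sketch below, which parses a model string such as "cortex-a9,-neon" into a base CPU plus an adjusted feature set. It is an illustration only: the `parse_cpu_model` function, the `KNOWN_CPUS` table and the feature names in it are hypothetical and are not QEMU's actual ARM feature list.

```python
# Hypothetical table of CPU models and the features each has by default.
KNOWN_CPUS = {
    "cortex-a8": {"vfp3", "neon"},
    "cortex-a9": {"vfp3", "neon", "thumb2ee"},
}
ALL_FEATURES = set().union(*KNOWN_CPUS.values())


def parse_cpu_model(spec, known_cpus=KNOWN_CPUS):
    """Parse 'name[,+feat][,-feat]...' into (name, set of features)."""
    name, *flags = spec.split(",")
    if name not in known_cpus:
        raise ValueError(f"unknown CPU model '{name}'")
    features = set(known_cpus[name])  # copy, so defaults are not modified
    for flag in flags:
        if len(flag) < 2 or flag[0] not in "+-":
            raise ValueError(f"bad feature flag '{flag}' (want +feat or -feat)")
        feature = flag[1:]
        if feature not in ALL_FEATURES:
            raise ValueError(f"unknown feature '{feature}'")
        if flag[0] == "+":
            features.add(feature)
        else:
            features.discard(feature)
    return name, features


name, features = parse_cpu_model("cortex-a9,-neon")
print(name, sorted(features))
```

Rejecting unknown feature names up front, rather than silently ignoring them, means a typo in a command line is reported instead of quietly producing a fully-featured CPU.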
Implement CPU feature flag parsing framework: TODO
Test that it works as expected in system and linux-user mode: TODO
Clean up and send patches upstream: TODO
Handle any issues raised in review: TODO
Estimated time: 1 week to put in the basic framework and support for turning on and off existing features. Adding configurability not currently supported as a QEMU feature switch (eg "I only have 16 VFP registers") would be more work.
Save/restore (1 week)
Save/restore support is a handy QEMU feature for debugging. However you can only use it if every device in your model supports it, and many of the ARM target devices don't. Fortunately it's easy to add. This blueprint covers fixing the ARM devboard models which are already upstream. (Fixing the OMAP3 models will be done as part of the upstreaming of OMAP3 patches.)
Add save/restore to vexpress devices which are missing it (PL061, PL181, a9mpcore, lan9118): TODO
Test that save/restore works on vexpress: TODO
Send patchset upstream: TODO
Handle any review comments: TODO
Estimate: 1 week for the vexpress and other arm devboard platforms.
Improve emulation of x86 binaries on ARM hosts (2 weeks?)
I think this is a low priority but I'd like to put in an entry for it here since we've had some bug reports about it. There are currently problems with running x86 binaries on ARM hosts:
- QEMU's x86 target code doesn't support multithreaded programs; this is largely because its exclusive access support is not set up to pass exclusive accesses up to the qemu linux-user top level loop so it can be turned into a mutex. This is harder on x86 because it can be done with a LOCK prefix on many instructions, compared to ARM/MIPS/etc where only a few instructions are exclusive-accesses. 760413, 758424
- QEMU's ARM TCG backend doesn't do the right thing with unaligned accesses on ARMv5. Linaro's focus isn't ARMv5, but on the other hand there is QEMU code that means this ought to work, so it seems worth a few days to try to get it sorted.
Debug problems with unaligned accesses on ARMv5 hosts: TODO
Write and submit patch for unaligned access problems: TODO
Check x86 docs, identify requirements for exclusive accesses: TODO
Design and implement means for passing this up to linux-user main loop: TODO
Enable NPTL in x86 target linux-user and test: TODO
Clean up and upstream patches for x86 multithreaded linux-user support: TODO
Handle any issues raised in code review: TODO
Estimate: 2 weeks
Tx.x Device Tree enhancements
Advanced Device Tree integration
Spec page: QEMUDeviceTree
This blueprint covers the parts of the spec dealing with making QEMU instantiate a board model based on the input device tree. This is (a) more complicated and (b) potentially controversial upstream, or at least likely to have to be done in a generalised and cross-architecture way. So it is worth splitting it into a second and possibly lower priority blueprint.
I haven't attempted to estimate or produce work items for this because (a) the scope and value of this more complicated extra work is unclear and (b) it does not seem likely that we will have the resources to work on it this cycle anyway.
T5.3 [WISHLIST] QEMU bluesky
Produce proposals for 'blue sky' projects for future cycles (3 weeks?)
There are a number of longer term possibilities for QEMU; some of these verge on research projects. We should be aiming to investigate at least some of them so we can write up what would be involved and what the benefits are, so that we can get informed opinion from members about whether these are worth doing.
- better diagnostics -- finer control of whether qemu complains about things which are likely OS bugs (register reads at wrong size, attempting UNPREDICTABLE behaviour, etc)
- tracepoints -- allow hooking in to interesting events like "memory access happened", "insn executed" and much more, for debugging, profiling and so on
- timing/power info -- getting useful timing info from a model is tricky but Nokia R&D have prototyped an approach which calibrates against real hardware to decide how much to weight various events (cache miss, branch predictor miss, etc)
- record/replay and reversible debugging -- extending QEMU's vm save-and-restore support to allow "step backwards" type debugging as vmware does
- modularization (ie splitting device models from TCG/KVM core) -- there may be some interest in this from KVM folk too
Brainstorm a 'long-list' of interesting stuff we could be doing: TODO
Preliminary investigation to narrow down to a shortlist of three feasible topics: TODO
Investigate topic one: TODO
Write up topic one: TODO
Investigate topic two: TODO
Write up topic two: TODO
Investigate topic three: TODO
Write up topic three: TODO
Estimate: three weeks
T4.7 [WISHLIST] Low cost A9 model
No blueprints, this TR is POSTPONED
The current best candidate for this is the Elba (Tuscan) board. This is just starting to become available to Linaro, so we could start modelling it. However, now we have the Versatile Express model for A9 this requirement seems less urgent. It might be preferable to let this drop down the priority list in favour of other work this cycle. Q: how useful is a Tuscan model without a model of the graphics chipset?
Status: on hold for this cycle.
PeterMaydell/Qemu1111 (last modified 2011-05-24 13:50:14)