This is a work in progress that tries to thrash out the QEMU work for the 11.11 cycle. Eventually it will turn into a set of blueprints. Not everything listed here will go into 11.11; there may be too much work to complete in the six months.

Michael/Mounir's summary spreadsheet for all of toolchain:

Top level headings correspond to Tn.n entries in the summary spreadsheet; second level headings are supposed to be well-defined chunks of work that will become blueprints. Work items are listed in a format which can be copied into a blueprint whiteboard.

Top level headings are sorted by TSC priority (bracketed); subheadings are sorted by my own view of their priority. I have occasionally split up a TSC toplevel heading where there are some definitely-required (higher priority) blueprints but also some more speculative lower-priority blueprints in the same general area.

We think we probably have about 8 man-months worth of effort for QEMU this cycle; this implies that we expect (starting from the top) to complete everything down to and including the performance work, and then have a month or so for some "QEMU improvements" work.
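As a rough cross-check of that claim, the per-blueprint estimates on this page can be added up. The sketch below copies the week counts from the blueprint headings down to and including the performance work, and assumes four working weeks per month (the same conversion the page uses when it treats 8 weeks as 2 months).

```python
# Estimates in weeks, copied from the blueprint headings on this page,
# from the top down to and including the emulation speed work.
ESTIMATES_WEEKS = {
    "Upstream OMAP3/Beagle patches": 8,
    "Fix I/O problems": 1,
    "Add OMAP3 USB support": 2,
    "Track down the causes of crashes seen under OBS": 2,
    "Basic Device Tree integration": 1,
    "Help Android move towards upstream": 2,
    "Make six releases of Linaro QEMU": 2,
    "Add support for A15's new user-level instructions": 1,
    "Plan for possible implementation of a system level A15 model": 2,
    "QEMU speed improvements (ARM front-end)": 3,
    "Prototyping use of traces for QEMU speed improvements": 4,
}

WEEKS_PER_MONTH = 4    # assumption: 8 weeks is counted as 2 months
AVAILABLE_MONTHS = 8   # "about 8 man-months" for this cycle

committed_weeks = sum(ESTIMATES_WEEKS.values())
committed_months = committed_weeks / WEEKS_PER_MONTH
remaining_months = AVAILABLE_MONTHS - committed_months

print("committed: %d weeks (%.1f months)" % (committed_weeks, committed_months))
print("left for QEMU improvements: %.1f months" % remaining_months)
```

Several of the individual estimates carry question marks, so the remainder is only indicative.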

T4.1 [HIGH] Maintain/upstream existing models [VE, beagle]

Upstream OMAP3/Beagle patches (8 weeks)

There are nearly a hundred OMAP3-related patches in qemu-linaro. We'd like to get these cleaned up and submitted upstream so that you can boot Linaro beagle images on vanilla upstream QEMU.

find omap1/omap2 image for regression testing: TODO
test omap1/2 image on qemu-linaro and fix any regressions: TODO
analyse patch stack: TODO
collapse patches which fix bugs in files added earlier: TODO
reorder patch stack to put simple fixes earlier: TODO
test and submit generic fixes #1: TODO
test and submit generic fixes #2: TODO
handle review issues on generic fixes #1: TODO
handle review issues on generic fixes #2: TODO
split patches which are per-file to be per-feature instead: TODO
split patches #2: TODO
split patches #3: TODO
qdevify any new devices which are not qdev: TODO
add save/restore support to new devices which are missing it: TODO
test save/restore support: TODO
submit omap3 patch series: TODO
handle issues raised in omap3 review (round 1): TODO
handle issues raised in omap3 review (round 2): TODO
handle issues raised in omap3 review (round 3): TODO
rebase qemu-linaro to use upstream commits and drop local patches: TODO

(The #1,#2... items here are aimed at keeping each work item to about two days worth of work.)

Estimate: 2 months (over a longer elapsed time)

Fix I/O problems (1 week?)

Currently the OMAP3 and Versatile Express models boot but are pretty near unusable for serious work because I/O (probably disk I/O) is either very slow or tends to lock up or stall. We need to investigate the cause of this bug and fix it. (My suspicion is that this is something to do with SD card emulation, because I don't think it happens when QEMU is emulating a hard disk.)

Find out why the I/O is so slow: TODO
Write patch and submit upstream: TODO
Handle issues raised in patch review: TODO

Estimate: 1 week (with the proviso that estimates for "find this bug" are almost always worthless)

Add OMAP3 USB support (2 weeks?)

At the moment the OMAP3/beagle model doesn't have working USB. This matters because on beagle the keyboard, mouse and networking are all USB devices. The situation is slightly complicated because QEMU currently supports only OHCI USB, not EHCI (EHCI support is being worked on but there is no ETA for it). Beagle hardware has (1) USB OTG and (2) EHCI USB (although OMAP3 itself has both OHCI and EHCI), and the Linux omap3 kernel doesn't try to use the OHCI hardware at all, only EHCI and the USB OTG. The goal here is a beagle model which can do keyboard, mouse and networking, plus documentation on the wiki of any necessary config/command line options.

Update: it looks as if EHCI support may be landing in QEMU in the not-too-distant future, although it is likely to be unstable initially.

The OTG support was at one point disabled in the kernel, but in Linaro's 1105 beta release at least it seems to be back. USB OTG is therefore the most plausible route forwards.

Test USB-OTG on Beagle xM hardware and confirm that it works and how to configure it: TODO
Identify QEMU command line options for USB kbd/mouse and confirm (via gdb) that QEMU has connected them to the USB-OTG controller: TODO
Fix USB-OTG model bugs that make this not work: TODO
Get USB networking working over USB OTG, if possible: TODO
Update the wiki to document any necessary config options needed when launching a beagle model: TODO
Submit patches to meego tree (and/or upstream if we've managed to get omap3 upstream by this point): TODO
Handle issues raised in patch review: TODO

Estimate: 2 weeks.

Track down the causes of crashes seen under OBS (2 weeks?)

Martin Mohring reports that using qemu-linaro to do Meego builds in OBS reveals some persistent crash issues. I had thought that this was caused by TCG's locking issues with multithreaded linux-user apps. However Martin tested a patch by me which I believe fixes those locking problems and still saw crashes.

This sort of cross-build assist is a really popular use of QEMU so we should devote time to fixing these issues.

Get some concrete reproducible test cases from Martin: TODO
Investigate crash 1: TODO
Feed back fix for crash 1 to Martin for testing: TODO
Investigate crash 2: TODO
Feed back fix for crash 2 for testing: TODO
Clean up and send fixes upstream: TODO
Handle any issues raised in upstream review: TODO

Estimate: 2 weeks (a guess, on the assumption there are a few underlying causes rather than dozens)

T4.5 Device Tree support

Basic Device Tree integration (1 week)

Spec page: QEMUDeviceTree

This was asked for by the TSC but seems to have been missed from the set of technical topics.

Device tree support in the kernel and so on is moving forward; we should coordinate with Grant Likely to identify what QEMU work would be useful to integrate with this, and what may already have been prototyped or done but not yet taken upstream.

This blueprint addresses the immediately obvious and non-controversial parts of the spec: basic support for device tree, on par with that already in QEMU for PPC and Microblaze. QEMU's built in boot loader (-kernel/-initrd command line options) needs to accept a devicetree blob from the user, make minimal changes to set the amount of memory, initrd location and so on, and then pass it to the kernel.
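A minimal sketch of the "minimal changes" step, to make the scope concrete. It models the device tree as nested Python dicts rather than a real flattened blob, and the function name and arguments are hypothetical. The property names (`reg` under the memory node, `linux,initrd-start` and `linux,initrd-end` under `chosen`) are the usual Linux conventions; the addresses in the usage lines are purely illustrative.

```python
import copy

def prepare_device_tree(user_tree, ram_base, ram_size,
                        initrd_start=None, initrd_size=0):
    """Return a copy of the user's tree with memory and initrd details set.

    The tree is a nested dict standing in for a parsed device tree blob;
    everything the user supplied is kept apart from the updated properties.
    """
    tree = copy.deepcopy(user_tree)

    # The memory node's "reg" property holds (base address, size) of RAM.
    memory = tree.setdefault("memory", {})
    memory["device_type"] = "memory"
    memory["reg"] = (ram_base, ram_size)

    # The kernel finds the initrd through properties in the chosen node.
    if initrd_start is not None:
        chosen = tree.setdefault("chosen", {})
        chosen["linux,initrd-start"] = initrd_start
        chosen["linux,initrd-end"] = initrd_start + initrd_size

    return tree

# Illustrative values only; a real board model would supply its own.
user_blob = {"model": "hypothetical-board",
             "chosen": {"bootargs": "console=ttyAMA0"}}
patched = prepare_device_tree(user_blob, 0x60000000, 512 * 1024 * 1024,
                              initrd_start=0x60800000, initrd_size=0x100000)
```

The user's tree is copied rather than modified in place, so the same input can be reused with a different memory size.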

The more advanced/complicated/controversial parts of the spec above are in a separate blueprint.

Find test kernel and device tree blob for vexpress: TODO
Add code to take blob, update it with memory etc, and pass to kernel: TODO
Clean up and submit patch upstream: TODO
Handle any patch review comments: TODO

Estimate: 1 week.

Tx.x [HIGH] No TR but work we are committed to

Help Android move towards upstream (2 weeks)

Currently the Android emulator uses an elderly fork of QEMU; they've cherry-picked some later fixes, but unfortunately never had the resources to try to upstream their device models or rebase on newer upstream QEMU versions. There is a QEMU Google Summer of Code project to start by doing upstream QEMU versions of the Goldfish virtual platform devices; I (Peter Maydell) am the official 'mentor' for the student. Google suggest that this will take an average of 5 hours a week from the mentor until the end of August.

Make sure student is started, status meetings set up, etc: TODO
Support student (June): TODO
Submit mid-term student evaluation: TODO
Support student (July): TODO
Support student (August): TODO
Submit final student evaluation: TODO

Estimate: 2 weeks total (5 hrs * 15 weeks / 37.5 hrs)

T3.3 [MED] Regularly release and support Linaro QEMU

I'm just counting this one as the usual "test and release" work; fixing bugs is generally T4.4.

Make six releases of Linaro QEMU (2 weeks)

Release qemu-linaro 2011.06: TODO
Release qemu-linaro 2011.07: TODO
Release qemu-linaro 2011.08: TODO
Release qemu-linaro 2011.09: TODO
Release qemu-linaro 2011.10: TODO
Release qemu-linaro 2011.11: TODO

Estimate: 2 weeks (a little less than two days per release, for a total of six monthly releases)

T4.2 [MED] Initial A15 support

Add support for A15's new user-level instructions to QEMU (1 week)

Dependency: version of ARM ARM defining VFPv4 and UDIV/SDIV. We're going to implement this as user-mode-only support for the new instructions defined by the architecture, so we don't need to explicitly define an A15 CPU that you can select in QEMU. (That would really depend on having an A15 TRM as well as the ARM ARM update.)

Check that UDIV,SDIV match the A profile definitions of them: TODO
Implement fused-multiply (FMAC) in softfloat and add decode, helper to use it: TODO
Test and upstream patches: TODO
Handle any issues raised in review: TODO
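
To illustrate why fused multiply-add needs its own softfloat support rather than a multiply followed by an add: the fused operation rounds only once, after the exact product and sum have been formed. The sketch below is not softfloat code; it uses Python doubles and exact fractions, and the function names are made up for the example.

```python
from fractions import Fraction

def fused_multiply_add(a, b, c):
    """Compute a*b + c exactly, then round once to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

def unfused_multiply_add(a, b, c):
    """Round after the multiply and again after the add."""
    return a * b + c

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the unfused version loses the small term entirely.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

fused = fused_multiply_add(a, b, c)      # keeps the -2**-60 term
unfused = unfused_multiply_add(a, b, c)  # collapses to 0.0
```

The two results differ, so a correct emulation cannot build the fused operation out of the existing separate multiply and add helpers.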

Estimate: 1 week

T4.3 [MED] A15 system emulation planning

Plan for possible implementation of a system level A15 model (2 weeks)

The deliverable here should be a writeup giving a definite estimate, a decision of whether this is worth doing, and a list of any necessary prerequisite QEMU fixes (eg trustzone). (We did a back-of-the-envelope guess that this might be six months of work.)

Read specs: TODO
Identify useful prerequisite qemu fixes, necessary restructuring, etc: TODO
Identify suitable test kernel: TODO
Identify whether hypervisor for testing is or will be available: TODO
Identify required board model/device work [probably modified vexpress]: TODO
Identify required core QEMU work based on previous estimates: TODO
Write up results: TODO

Estimate: 2 weeks (1 week each PMM, DG)

T4.5 [LOW] Emulation speed

QEMU speed improvements (ARM front-end) (3 weeks)

This blueprint is for improvements to QEMU's speed, focusing on the ARM front end. This includes inlining helper functions and identifying places where we're generating suboptimal TCG op sequences. Say three weeks of work, including the initial effort of identifying and setting up some plausible benchmarks so we can see how much we're improving.

Identify and set up useful benchmarking environment: TODO
Benchmark perf work by upstream contributors to confirm it is good for ARM target: TODO
Inline important helper functions: TODO
Experiment with inlining other helpers: TODO
Profile and identify other useful improvements: TODO
Handle issues raised in code review: TODO

Estimate: 3 weeks (maybe 4 if we find enough things to fix)

Prototyping use of traces for QEMU speed improvements (4 weeks)

This blueprint is for prototyping a more significant change to QEMU's internals which might produce a better perf improvement or lay the foundation for future improvements by giving scope for more advanced optimisations to work. The issue we're trying to address is that TCG basic blocks are typically very short, because they end at any branch. This means that there's not much potential for optimisations to actually kick in. So we want to prototype some sort of 'trace' setup which allows the codegen and optimisation to work on a larger chunk of code. For a month's worth of work we'd hope to come out with a prototype suitable for posting upstream as an 'RFC' patchset (for example, we might make any required frontend changes only to the ARM frontend, and backend changes only to the x86 backend). Actually creating a completely mergeable patchset would be a separate blueprint and probably another month.
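
A toy sketch of the idea, under heavy assumptions: instructions are just mnemonic strings, the set of branch mnemonics is made up for the example, and the "hot successor" table stands in for whatever profiling a real prototype would use. It shows blocks ending at every branch, and a trace built by stitching blocks together along the frequently taken path.

```python
# Hypothetical set of mnemonics that end a basic block.
BRANCH_MNEMONICS = {"b", "beq", "bne", "bl"}

def split_basic_blocks(insns):
    """Split a list of mnemonics into blocks, ending a block at each branch."""
    blocks, current = [], []
    for insn in insns:
        current.append(insn)
        if insn in BRANCH_MNEMONICS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

def build_trace(blocks, hot_successor, start, max_blocks=8):
    """Concatenate blocks along the most frequently taken path.

    hot_successor maps a block index to the index of the block that
    usually runs next. The trace stops at a loop back-edge, at a block
    with no known successor, or after max_blocks blocks.
    """
    trace, seen, index = [], set(), start
    while index is not None and index not in seen and len(seen) < max_blocks:
        seen.add(index)
        trace.extend(blocks[index])
        index = hot_successor.get(index)
    return trace

insns = ["ldr", "add", "cmp", "bne", "str", "sub", "b", "mov", "add", "beq"]
blocks = split_basic_blocks(insns)
# Assumed profile: block 0 usually continues at block 2, which loops to 0.
trace = build_trace(blocks, {0: 2, 2: 0}, start=0)
```

A real implementation would also need side exits wherever a branch inside the trace leaves the hot path; the sketch ignores that.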

Some other people in QEMU upstream are already looking at generic TCG speed improvements (eg Aurelien, Kirill), so we need to make sure we cooperate here.

This should have about 10 work items for a 4-week blueprint; I've given in and made some of them generic to get the count right.

Become familiar with QEMU's current codegen approach: TODO
Sketch out a design for adding traces: TODO
Propose upstream, collect feedback: TODO
Implement prototype 1: TODO
Implement prototype 2: TODO
Implement prototype 3: TODO
Implement prototype 4: TODO
Benchmark and instrument to see how effective it is: TODO
Tweaks based on benchmarking results: TODO
Submit RFC patchseries upstream: TODO

Estimate: 4 weeks (see above for caveats, scope)

T4.6 [LOW] QEMU improvements

I haven't included a "correctness fixes" blueprint, because we've fixed almost all of the known correctness issues this cycle, so any further work is going to be (a) the odd low-priority loose end and (b) reactive fixing of bugs as we notice them.

Help upstream with AREG0 removal (1 week)

QEMU currently has a global register which stores a pointer to the current CPU state, which is used in both TCG generated code and some helper functions. There's been a recent decision to try to move towards getting rid of this global. We should put in the work on the ARM front-end.

The justification is that this will force helper functions to be more explicit about when they mess with CPU state, which means TCG can do better optimisation because it doesn't have to be pessimistic; however the chances are that it won't be an immediate win (it might even be a short-term performance loss) so 'qemu improvements' seems a better fit than 'speed improvements'.
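
The difference between the two styles can be sketched as below. This is Python standing in for the C helpers, with made-up helper names and a toy state class; it only illustrates why an explicit parameter makes the dependency on CPU state visible to a caller.

```python
import inspect

class CPUState:
    """Toy stand-in for the per-CPU state structure."""
    def __init__(self):
        self.fpscr = 0

# Style 1: the helper reaches the CPU state through a global, so nothing
# in its signature says that it modifies that state.
current_env = CPUState()

def helper_set_fpscr_global(value):
    current_env.fpscr = value

# Style 2: the state is an explicit parameter.
def helper_set_fpscr(env, value):
    env.fpscr = value

def helper_saturating_add(a, b):
    """A pure helper: it takes no env and touches no CPU state."""
    return min(a + b, 0x7FFFFFFF)

def may_touch_cpu_state(helper):
    """Judge a helper purely from its signature, as a code generator must."""
    return "env" in inspect.signature(helper).parameters
```

With the global style, `may_touch_cpu_state(helper_set_fpscr_global)` is False even though the helper does change state, so a cautious caller has to assume every helper might. With explicit parameters the signature can be trusted.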

Redo and resubmit "FPSCR flags" patchset not to move functions to op_helper.c: TODO
Write and submit patchset reverting '*_helper.c have access to global env' change: TODO
Write and submit patchset which moves things out of op_helper.c where possible: TODO
Handle issues raised in review: TODO

Estimate: 1 week

cp15 infrastructure rework (2 weeks)

At the moment QEMU handles cp15 accesses by calling out to a single helper function which is an enormous set of nested switch statements to handle the different coprocessor registers. Access permissions are checked separately at translate time. This design makes specifying board-dependent or cpu-dependent registers somewhat painful; it's also easy for the access permission checks to get out of sync with the register implementations. There is no support for banked cp15 registers either (needed for trustzone and virtualisation). We need a better design which lets a board or CPU core register its own handler routines for cp15 registers. This will make the code cleaner and more maintainable as a base for new features.
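
One possible shape for such a design, as a sketch only: a table keyed by the (CRn, opc1, CRm, opc2) encoding, where each entry carries both its handlers and its access permission, so the two cannot drift apart. The class and method names are hypothetical, the MIDR value is illustrative, and treating every rejected access as an undefined instruction is a simplification.

```python
class UndefinedInstruction(Exception):
    """Raised when the register is unknown or the access is not permitted."""

class Cp15Registry:
    def __init__(self):
        self._regs = {}

    def define(self, encoding, name, read, write=None, user_access=False):
        """Register handlers for one (crn, opc1, crm, opc2) encoding."""
        self._regs[encoding] = (name, read, write, user_access)

    def _lookup(self, encoding, privileged):
        try:
            entry = self._regs[encoding]
        except KeyError:
            raise UndefinedInstruction("no cp15 register at %r" % (encoding,))
        if not privileged and not entry[3]:
            raise UndefinedInstruction("%s is privileged-only" % entry[0])
        return entry

    def read(self, env, encoding, privileged):
        return self._lookup(encoding, privileged)[1](env)

    def write(self, env, encoding, value, privileged):
        name, _, write, _ = self._lookup(encoding, privileged)
        if write is None:
            raise UndefinedInstruction("%s is read-only" % name)
        write(env, value)

# A core (or board) registers only the registers it actually has.
registry = Cp15Registry()
registry.define((0, 0, 0, 0), "MIDR", read=lambda env: env["midr"])
registry.define((13, 0, 0, 2), "TPIDRURW",
                read=lambda env: env["tpidrurw"],
                write=lambda env, v: env.__setitem__("tpidrurw", v),
                user_access=True)

env = {"midr": 0x410FC090, "tpidrurw": 0}  # illustrative values
```

Banking for trustzone could then be an extra field on each entry rather than another level of switch statement, though that is not shown here.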

confirm requirements (usual number of cp15 regs, different banking arrangements, access permissions): TODO
write up sketch of proposed design, gather any comments from qemu-devel: TODO
implement design: TODO
testing and bugfixing: TODO
clean up and submit patches upstream: TODO
handle any issues raised in code review: TODO

Estimate: two weeks

Make risu usable for regression testing (2 weeks)

At the moment the risu instruction-sequence tester is great for testing patches for specific bug fixes, as it's easy to generate a test for the instruction being fixed. However it's missing the consistent coverage and "just run all the tests" functionality that is needed to use it as an automatic regression test.

design work: command line options, are golden results in same file or separate, avoiding version skew: TODO
add ability to "record" golden test results when running on real hardware: TODO
add ability to "replay" the golden test results rather than needing real hardware to cross-check emulator against: TODO
add makefiles etc so it's easy to create golden results for a whole suite of tests: TODO
add wrappers so it's easy to automatically run the whole suite against the emulator: TODO
extend the risu coverage so we have a useful test suite: TODO
add a jenkins job so we run these tests automatically: TODO
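
The record/replay items above could look roughly like this. It is a sketch under assumptions, not risu's actual file format or command line: the function names are made up, golden results are stored as JSON in a separate file, and `run_test` is any callable that runs one named test and returns the resulting register values as a list.

```python
import json

def record_golden(run_test, test_names, path):
    """Run each test on the reference (real hardware) and save the results."""
    golden = {name: run_test(name) for name in test_names}
    with open(path, "w") as f:
        json.dump(golden, f, indent=2, sort_keys=True)

def replay_golden(run_test, path):
    """Re-run each recorded test and return the names of those that differ."""
    with open(path) as f:
        golden = json.load(f)
    return [name for name, expected in sorted(golden.items())
            if run_test(name) != expected]
```

Recording would be done once on hardware; the replay step then needs only the golden file and the emulator, which is what makes it suitable for an automatic job. An empty list from `replay_golden` means the emulator matched every recorded result.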

Total estimated time: 2 weeks.

Improve test coverage (1 week)

(You could file this under T4.4 if you preferred.)

At the moment we have a very basic automated continuous integration setup: it builds qemu-linaro from git and confirms that two images (beagle, versatile) boot OK. We should extend this to cover other architectures, linux-user testing, and ARM correctness testing. We should also set up automated benchmarking of builds so that we can track whether performance is improving or not.

We had a session at UDS-O which covered this. If we can provide straightforward instructions for what needs testing, the validation team should be able to set something up either via jenkins or abrek or both.

write requirements email to send to Paul Larson etc: TODO
identify useful benchmarks and tests, and sort them by order of bang-for-the-buck: TODO
send validation folk necessary info, command lines, minor wrapper scripts: TODO

Estimated time: 1 week, spread over a longer elapsed timeframe

Implement TrustZone (5 weeks)

Spec page: QEMUTrustZone

QEMU doesn't currently implement TrustZone. We'd like it for a couple of reasons. Firstly, it's used by the omap3 model (the qemu-linaro tree contains a half-implementation of just enough of it to make things work, but I don't think it's upstreamable). Secondly, it's a prerequisite for virtualization.

Pull appropriate bits of monitor mode and SMC implementation from meego patches: TODO
Bank all the required CP15 registers for secure/nonsecure mode: TODO
Modify interrupt entry to select appropriate mode for entry based on security configuration: TODO
Modify GIC to restrict modification of secure interrupts to be done from secure mode: TODO
Modify MMU/TLB walk code to examine NS bits and use correct (banked, etc) cp15 regs: TODO
Add new QEMU "MMU modes" for "secure user" and "secure priv" so they get different QEMU TLBs to nonsecure: TODO
Make CPU start properly in secure mode: TODO
Make relevant CPUs have trustzone feature bit, confirm this doesn't break existing images: TODO
Test that omap3_boot's use of trustzone works OK: TODO
Implement at least some of the A9 Versatile Express trustzone hardware: TODO
Provide a simple piece of monitor mode setup/test code: TODO
Test with more complicated trustzone images if available: TODO
Clean up patchset and submit upstream: TODO
Handle issues raised in code review: TODO

Estimate: about 5 weeks; we might be able to do a slightly chopped down "CPU bits only" version sufficient for omap3 in 3 weeks.

Selective feature enabling (1 week)

At the moment QEMU lets you specify a CPU, but you always get a fully-featured version of that CPU (for instance an A9 will always have Neon). We should allow the user to ask for an A9 with only VFPv3-D16, for example, so they can test code which will run on hardware which doesn't have all the options. QEMU already has this concept implemented in the x86 target: the general idea is that you have optional flags you tack on to the CPU name.
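
A sketch of what the parsing might look like, assuming we follow the x86 target's `model,+feature,-feature` style. The function names and the feature spellings (`neon`, `vfp3`, `thumb2ee`) are placeholders for the example, not existing QEMU option names.

```python
def parse_cpu_option(option, known_features):
    """Split 'model,+feat,-feat' into (model, enabled set, disabled set)."""
    model, *flags = option.split(",")
    if not model:
        raise ValueError("missing CPU model name")
    enabled, disabled = set(), set()
    for flag in flags:
        if len(flag) < 2 or flag[0] not in "+-":
            raise ValueError("bad feature flag %r" % flag)
        name = flag[1:]
        if name not in known_features:
            raise ValueError("unknown feature %r" % name)
        # If a feature is named twice, the last mention wins.
        if flag[0] == "+":
            enabled.add(name)
            disabled.discard(name)
        else:
            disabled.add(name)
            enabled.discard(name)
    return model, enabled, disabled

def apply_feature_flags(default_features, enabled, disabled):
    """Start from the CPU's default feature set and apply the user's flags."""
    return (set(default_features) | enabled) - disabled

KNOWN = {"neon", "vfp3", "thumb2ee"}          # placeholder feature names
model, on, off = parse_cpu_option("cortex-a9,-neon", KNOWN)
features = apply_feature_flags({"neon", "vfp3", "thumb2ee"}, on, off)
```

Here `features` ends up without `neon`, which is the "A9 without Neon" case. Rejecting unknown names up front seems safer than silently ignoring a typo in a test setup.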

Implement CPU feature flag parsing framework: TODO
Test that it works as expected in system and linux-user mode: TODO
Clean up and send patches upstream: TODO
Handle any issues raised in review: TODO

Estimated time: 1 week to put in the basic framework and support for turning on and off existing features. Adding configurability not currently supported as a QEMU feature switch (eg "I only have 16 VFP registers") would be more work.

Save/restore (1 week)

Save/restore support is a handy QEMU feature for debugging. However you can only use it if every device in your model supports it, and many of the ARM target devices don't. Fortunately it's easy to add. This blueprint covers fixing the ARM devboard models which are already upstream. (Fixing the OMAP3 models will be done as part of the upstreaming of OMAP3 patches.)

Add save/restore to vexpress devices which are missing it (PL061, PL181, a9mpcore, lan9118): TODO
Test that save/restore works on vexpress: TODO
Send patchset upstream: TODO
Handle any review comments: TODO

Estimate: 1 week for the vexpress and other arm devboard platforms.

Improve emulation of x86 binaries on ARM hosts (2 weeks?)

I think this is a low priority but I'd like to put in an entry for it here since we've had some bug reports about it. There are currently problems with running x86 binaries on ARM hosts:

  • QEMU's x86 target code doesn't support multithreaded programs; this is largely because its exclusive access support is not set up to pass exclusive accesses up to the qemu linux-user top level loop, where they can be turned into operations done under a mutex. This is harder on x86 because an exclusive access can be requested with a LOCK prefix on many instructions, compared to ARM/MIPS/etc where only a few instructions are exclusive accesses. Bug reports: 760413, 758424

  • QEMU's ARM TCG backend doesn't do the right thing with unaligned accesses on ARMv5. Linaro's focus isn't ARMv5, but on the other hand there is QEMU code that means this ought to work, so it seems worth a few days to try to get it sorted.

Debug problems with unaligned accesses on ARMv5 hosts: TODO
Write and submit patch for unaligned access problems: TODO
Check x86 docs, identify requirements for exclusive accesses: TODO
Design and implement means for passing this up to linux-user main loop: TODO
Enable NPTL in x86 target linux-user and test: TODO
Clean up and upstream patches for x86 multithreaded linux-user support: TODO
Handle any issues raised in code review: TODO

Estimate: 2 weeks

Tx.x Device Tree enhancements

Advanced Device Tree integration

Spec page: QEMUDeviceTree

This blueprint covers the parts of the spec dealing with making QEMU instantiate a board model based on the input device tree. This is (a) more complicated and (b) potentially controversial upstream, or at least likely to have to be done in a generalised and cross-architecture way. So it is worth splitting it into a second and possibly lower priority blueprint.

I haven't attempted to estimate or produce work items for this because (a) the scope and value of this more complicated extra work is unclear and (b) it does not seem likely that we will have the resources to work on it this cycle anyway.

T5.3 [WISHLIST] QEMU bluesky

Produce proposals for 'blue sky' projects for future cycles (3 weeks?)

There are a number of longer term possibilities for QEMU; some of these verge on research projects. We should be aiming to investigate at least some of them so we can write up what would be involved and what the benefits are, so that we can get informed opinion from members about whether these are worth doing.

Possible topics:

  • better diagnostics -- finer control of whether qemu complains about things which are likely OS bugs (register reads at wrong size, attempting UNPREDICTABLE behaviour, etc)
  • tracepoints -- allow hooking in to interesting events like "memory access happened", "insn executed" and much more, for debugging, profiling and so on
  • timing/power info -- getting useful timing info from a model is tricky but Nokia R&D have prototyped an approach which calibrates against real hardware to decide how much to weight various events (cache miss, branch predictor miss, etc)
  • record/replay and reversible debugging -- extending QEMU's VM save-and-restore support to allow "step backwards" type debugging as VMware does
  • modularization (ie splitting device models from TCG/KVM core) -- there may be some interest in this from KVM folk too

Brainstorm a 'long-list' of interesting stuff we could be doing: TODO
Preliminary investigation to narrow down to a shortlist of three feasible topics: TODO
Investigate topic one: TODO
Write up topic one: TODO
Investigate topic two: TODO
Write up topic two: TODO
Investigate topic three: TODO
Write up topic three: TODO

Estimate: three weeks

T4.7 [WISHLIST] Low cost A9 model

No blueprints, this TR is POSTPONED

The current best candidate for this is the Elba (Tuscan) board. This is just starting to become available to Linaro, so we could start modelling it. However, now that we have the Versatile Express model for A9 this requirement seems less urgent. It might be preferable to let this drop down the priority list in favour of other work this cycle. Q: how useful is a Tuscan model without a model of the graphics chipset?

Status: on hold for this cycle.

PeterMaydell/Qemu1111 (last modified 2011-05-24 13:50:14)