What should Linaro be doing in order to validate its engineering efforts?
- Get best practice benchmarks from Linaro members (organise via a TSC technical session)
- Feed benchmarking needs to the benchmark and validation team as they are explored for each new area (for example, Server)
- This is a big topic and needs to be handled outside of Office of CTO
Validation and Benchmarking
Linaro's validation so far consists of
- Individual efforts in working groups
- Aligned with upstream code base (for example gcc)
- Dog fooding
- Running Linaro-built projects on our own ARM hardware (limited by hardware availability; testing tends to be ad hoc, but finds gross errors)
There is an expectation from the upstream open source projects that patches are properly formed and tested. How high those expectations are depends on the project. Contributors who break things are discouraged, so there's peer pressure to maintain stability. Some codebases (gcc is a good example) have explicit sets of tests, but their coverage varies.
The Toolchain working group publishes benchmark results at http://ex.seabright.co.nz/helpers/benchcompare
A daily cron job polls Launchpad and makes a new tarball snapshot. Assuming this works, a build is started on the toolchain build machines (WorkingGroups/ToolChain/Hardware). The builds are controlled by a set of makefiles that build gcc, run the gcc testsuite, and then use this gcc to build, test, and benchmark a range of programs including eglibc, Python, ffmpeg, bzip, and Vorbis.
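As a rough illustration of this nightly flow, the stages can be thought of as an ordered list of steps where a failure stops everything after it. This is only a sketch: the step names and the `run_pipeline` helper are hypothetical, the real system is driven by makefiles, and whether a testsuite failure actually blocks the later stages is an assumption.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> List[Tuple[str, str]]:
    """Run steps in order; once one fails, mark the remaining steps as skipped."""
    results = []
    failed = False
    for name, action in steps:
        if failed:
            results.append((name, "skipped"))
            continue
        ok = action()
        results.append((name, "pass" if ok else "fail"))
        failed = not ok
    return results


if __name__ == "__main__":
    # Placeholder actions (assumptions): a real setup would invoke the
    # makefile targets here instead of returning a constant.
    nightly = [
        ("snapshot", lambda: True),
        ("build-gcc", lambda: True),
        ("gcc-testsuite", lambda: True),
        ("build-programs", lambda: True),
        ("benchmark", lambda: True),
    ]
    for name, status in run_pipeline(nightly):
        print(f"{name}: {status}")
```

Recording "skipped" rather than silently dropping later stages makes it obvious from the results that, for example, no benchmark numbers exist for a day when the snapshot failed.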
The build system supports variants, and uses this to benchmark in different modes such as with NEON, optimised for size, and so on. It would be good to extend this test to earlier architectures as part of the toolchain's 'do no harm' statement.
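To make the idea of variants concrete, here is a minimal sketch of a variant table and the build matrix it produces. The variant names, the `VARIANTS` table and the `build_matrix` helper are assumptions for illustration and are not taken from the actual makefiles; the flags themselves are standard GCC options, and `armv5te` is just one example of an "earlier architecture".

```python
from typing import Dict, List, Tuple

# Hypothetical variant names mapped to the compiler flags each one would use.
VARIANTS: Dict[str, List[str]] = {
    "default": ["-O2"],
    "neon": ["-O2", "-mfpu=neon", "-mfloat-abi=softfp"],
    "size": ["-Os"],
    "armv5te": ["-O2", "-march=armv5te"],  # example of an earlier architecture
}


def build_matrix(programs: List[str], variants: List[str]) -> List[Tuple[str, str, str]]:
    """Return one (program, variant, CFLAGS) entry per program/variant pair."""
    return [
        (program, variant, " ".join(VARIANTS[variant]))
        for program in programs
        for variant in variants
    ]


if __name__ == "__main__":
    for program, variant, cflags in build_matrix(["python", "ffmpeg"], ["default", "neon", "size"]):
        print(f"{program:8s} {variant:8s} CFLAGS='{cflags}'")
```

Adding an older architecture then only means adding one row to the table, which is what makes a 'do no harm' check cheap to extend.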
The same system is used for gdb and llvm. Build logs and test results are recorded (http://builds.linaro.org/toolchain/). Builds are run for ARM, x86, and x86_64, but the benchmarks aren't valid on the last two. There are a few helper scripts to track build and benchmark results, but nothing formal.
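A more formal version of the benchmark tracking could start from something as small as the following sketch, which compares two sets of timings and lists anything that got noticeably slower. The `find_regressions` helper, the 5% threshold and the sample numbers are all assumptions, not a description of the existing helper scripts; it would only be fed ARM results, since the benchmarks aren't valid on x86 and x86_64.

```python
from typing import Dict, List, Tuple


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    threshold: float = 0.05,
) -> List[Tuple[str, float]]:
    """Return (benchmark, fractional slowdown) for every benchmark whose run
    time grew by more than `threshold`. Values are seconds; lower is better."""
    regressions = []
    for name in sorted(baseline):
        # Skip benchmarks missing from the new run or with an unusable baseline.
        if name not in current or baseline[name] <= 0:
            continue
        change = (current[name] - baseline[name]) / baseline[name]
        if change > threshold:
            regressions.append((name, change))
    return regressions


if __name__ == "__main__":
    # Made-up timings, purely for illustration.
    last_week = {"bzip": 10.0, "vorbis": 20.0, "ffmpeg": 30.0}
    today = {"bzip": 11.0, "vorbis": 20.2, "ffmpeg": 29.0}
    for name, change in find_regressions(last_week, today):
        print(f"{name}: {change:.1%} slower")
```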
Ubuntu picks up the Linaro compiler and uses it to rebuild a large number of packages, and often runs the package test suite as part of that build.
Question: is this automatic or do they wait for us to signal that the build was good?
The platform team supports weekly and milestone testing (see http://qatracker.linaro.org). This is all well documented and tracked, but the testing is essentially done by hand: engineers need to install new images, run a series of test cases, and report the results. The test heads supported are ALIP, headless, Netbook and Plasma.
This platform testing is fairly spotty; the platform team is using partner-assigned QA engineers to get better coverage for each platform and to carry out milestone testing. Currently they are working on test plans that map out the specific features of each board that need test coverage.
The Platform team maintains a list of benchmarks at Platform/Validation/AbrekTestsuites.
Other things being worked on are:
- Continuous Integration - https://blueprints.edge.launchpad.net/linaro/+spec/other-linaro-n-continuous-integration
- Results display in launch control - https://blueprints.edge.launchpad.net/linaro/+spec/other-linaro-n-test-result-display-in-launch-control
This includes running a subset of the Phoronix benchmarks (see http://www.phoronix-test-suite.com/)
- Automated testing of images in QEMU
- Automated testing of images on supported hardware platforms
Taking each Linaro deliverable in turn, we have:
- Upstream donations are tested in the working group to the standards of the upstream code base
- There's some testing of the monthly consolidation trees (gcc for example), but that is limited to the tests available to that code base
- The 6 monthly baseline testing is a set of manual ad hoc 'volunteer' tests
- As distributions are tending towards taking consolidation trees, we need more automatic testing of those trees. This may be the same testing as the 6 monthly baseline testing (or a subset)
- Hardware will help; we could dog food the six monthly release if we were all using ARM-based platforms in our day-to-day work
- I like the culture of "we all test" rather than having testing as "someone else's job" (so tend towards a central team continuously running tests, plus creating and supporting tools that allow anyone, anywhere, to run some subset of those tests locally).
- Where tests are specific to an upstream project, we should work with and donate code to that project
- Originally, I thought that the silicon members could bring SoC tests into Linaro; I think that is starting to happen
- What level of testing do the distributions want? A related question is what level of control do they want on the deliverables as they progress through their release cycles which will, by definition, be out of sync with both the Linaro monthly and six monthly releases?
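The "central suite, run a subset anywhere" idea in the list above could look something like this sketch, where every test carries tags and a run selects by tag. The `SuiteRegistry` class, the tag names and the sample tests are hypothetical and only meant to show the shape of such a tool.

```python
from typing import Callable, Dict, List, Set, Tuple


class SuiteRegistry:
    """Holds named tests with tags so that a subset can be selected and run."""

    def __init__(self) -> None:
        self._tests: Dict[str, Tuple[Set[str], Callable[[], bool]]] = {}

    def register(self, name: str, tags: Set[str], func: Callable[[], bool]) -> None:
        self._tests[name] = (set(tags), func)

    def select(self, wanted: Set[str]) -> List[str]:
        """Names of tests that carry at least one of the wanted tags."""
        return sorted(n for n, (tags, _) in self._tests.items() if tags & wanted)

    def run(self, wanted: Set[str]) -> Dict[str, bool]:
        """Run the selected tests and return a name -> passed mapping."""
        return {name: self._tests[name][1]() for name in self.select(wanted)}


if __name__ == "__main__":
    suite = SuiteRegistry()
    # Placeholder tests (assumptions); real ones would exercise an image or tree.
    suite.register("boot", {"baseline", "smoke"}, lambda: True)
    suite.register("toolchain-selfcheck", {"consolidation"}, lambda: True)
    suite.register("full-graphics", {"baseline"}, lambda: True)

    # A central team might run everything tagged "baseline"; an engineer at
    # their desk might only run the "smoke" subset.
    print(suite.run({"smoke"}))
```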
LTP (Linux Test Project, http://ltp.sourceforge.net/) seems interesting; its scope is gcc and the Linux kernel.
OfficeofCTO/Validation (last modified 2011-01-18 12:02:10)