Summary

Dispatches test jobs from the server (master node) to the target boards in the validation farm and publishes the test results back to the server. It is called by the validation scheduler.

Release Note

No release yet; this should not affect end users.

Rationale

Users submit test tasks or jobs to the validation environment, and a component is needed to get those jobs done: delivering the job to the target board, monitoring it, and collecting the result. The job dispatcher focuses on how a single job is carried out.

User stories

  • Andrew would like to test booting the latest hwpack with the latest developer image for a specific board. He has a Hudson job that inserts the URLs of these images into a pre-written job file and launches the job scheduler with that job control file.
  • Matthew would like to run LTP under abrek on a specific developer image and hwpack version. He constructs a job control file that points to those image artifacts and runs LTP under abrek.
  • Isaac would like to submit results from a test he runs under the dispatcher to a specific stream in the dashboard. The submit_results command in his job control file allows him to specify the stream to which results are submitted.
  • Miriam submits a job from the LAVA scheduler. The scheduler puts the job in a queue, and the job dispatcher is launched with that job control file when the requested machine becomes available.
  • Dave would like to do regression testing. He defines a job file containing the cases that failed on the last image, and it runs whenever a new daily release image arrives.
  • James would like his job to run as far as possible. If the job cannot finish normally, he wants to know the root cause and to see the results and logs of the tests that did complete.
  • Victor would like image preparation and package installation to be faster, since his network is not very fast. He would like a caching mode that keeps the necessary packages for validating future releases.
  • Amy would like to run commands that are not defined in abrek; she wants to use general system commands like free.
  • Ackey would like to validate a kernel and packages that she compiled herself.
  • Mark would like to run an audio recording test for 12 hours. After that, the test should exit, and he wants to make sure there were no errors during the recording.
  • Rebecca would like to know the approximate waiting time and running time for her job.

Assumptions

  • There is a maintained device database used to determine which board runs a job, e.g. a board ID associated with a serial port number and an IP address.
  • The target board can be remotely controlled via serial port.
  • The system will have a console accessible from the serial port after booting.
  • The network is available when the board boots up (needed for deploying validation tools such as abrek).
    • If the test image's network driver is broken or unavailable, the job dispatcher will switch to the master image and ask it to deploy the validation tools and test suites, since the master image maintains a stable network. The master image can call the dispatcher's deployment functions, using chroot or by copying files directly to the test partitions. It needs to be considered whether tools/suites deployment should always be done from the master image.

Design

After users submit jobs, the scheduler places the jobs in a queue for the requested target board. The dispatcher is launched when there is a job in the queue. It picks up the job control file, interprets the actions in it, and performs those actions on the target board. Actions include things like deploying a specific image, controlling uboot on the target to force it to boot to the new image, running tests, and pushing results to the dashboard.
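
A minimal sketch of this flow, assuming a hypothetical registry that maps the command names found in the job control file to action implementations (the function and registry names here are illustrative, not the dispatcher's actual API):

  • import json
    
    ACTIONS = {}
    
    def action(name):
        # register a callable under the command name used in job files
        def decorator(func):
            ACTIONS[name] = func
            return func
        return decorator
    
    @action("boot_linaro_image")
    def boot_linaro_image(target, **params):
        print("booting the test image on %s" % target)  # placeholder body
    
    def dispatch(job_file):
        # read the job control file and perform its actions in order
        with open(job_file) as f:
            job = json.load(f)
        for step in job["actions"]:
            handler = ACTIONS[step["command"]]
            handler(job["target"], **step.get("parameters", {}))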

Server dispatcher

Responsible for:

  • receive the job request from the scheduler / job message queue
  • parse job files
  • call image-deploy to deploy test images
  • handle dependencies on specific images
  • deploy validation tools, test suites and required Python libraries (e.g. abrek, the client dispatcher) given only a location (e.g. a URL) to fetch them from; switch to the master image and let it call the dispatcher's deployment functions if the network is unavailable
  • start tests or test execution frameworks on the client and get status from them
  • work with the scheduler to ensure it has the proper job and device status
  • log the serial output when running test suites
  • gather results and submit them

Implementation

Job description

In the dispatcher, a job is described in a .json file; it specifies the target device and the list of actions to perform. It looks like:

  • {
      "job_name": "foo",
      "target": "panda01",
      "timeout": 18000,
      "actions": [
        {
          "command": "deploy_linaro_image",
          "parameters":
            {
              "rootfs": "http://snapshots.linaro.org/11.05-daily/linaro-developer/20110208/0/images/tar/linaro-n-developer-tar-20110208-0.tar.gz",
              "hwpack": "http://snapshots.linaro.org/11.05-daily/linaro-hwpacks/panda/20110208/0/images/hwpack/hwpack_linaro-panda_20110208-0_armel_supported.tar.gz"
            }
        },
        {
          "command": "install_abrek",
          "parameters":
            {
              "tests": ["ltp"]
            }
        },
        {
          "command": "boot_linaro_image"
        },
        {
          "command": "test_abrek",
          "parameters":
            {
              "test_name": "ltp"
            }
        },
        {
          "command": "submit_results",
          "parameters":
            {
              "server": "http://dashboard.linaro.org",
              "stream": "panda01-ltp"
            }
        }
      ]
    }
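
A job file like the one above can be produced mechanically, for example by a Hudson job that fills image URLs into a pre-written template (as in Andrew's user story). A small illustrative helper, assuming the template has the same shape as the example above:

  • import json
    
    def write_job(template_path, rootfs_url, hwpack_url, out_path="job.json"):
        # load a pre-written job template and fill in the image artifact URLs
        with open(template_path) as f:
            job = json.load(f)
        for step in job["actions"]:
            if step["command"] == "deploy_linaro_image":
                step["parameters"]["rootfs"] = rootfs_url
                step["parameters"]["hwpack"] = hwpack_url
        with open(out_path, "w") as f:
            json.dump(job, f, indent=2)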

Test Result Format

A test result bundle is composed of individual test case results; each test case result includes the basic information of a pass/fail/timeout test case. The format is defined in lp:launch-control. If abrek is used to execute tests, the result bundle in JSON format can be picked up from a predetermined location in the test image after the test is complete and the board is rebooted back to the master image. The serial log should be added as an attachment before using lc-tool to submit the results bundle to the dashboard. Here is an example of a single test case result:

  • {
      "test_case_id": "test-case-0",
      "result": "pass",
      "message": "Test result Message scrubbed from the log file",
      "timestamp": "2010-09-17T16:34:21Z",
      "log_filename": "testoutput.log",
      "log_lineno": 1,
      "duration": "1d 1s 1us",
      "attributes": {
        "attr1": "value1",
        "attr2": "value2"
      }
    }

And a test result bundle:

  • {
      "test_runs": [
        {
          "test_results": [
            {
              "test_case_id": "test-case-0",
              "result": "pass",
              "message": "Test result Message scrubbed from the log file",
              "timestamp": "2010-09-17T16:34:21Z",
              "log_filename": "testoutput.log",
              "log_lineno": 1,
              "duration": "1d 1s 1us",
              "attributes": {
                "attr1": "value1",
                "attr2": "value2"
              }
            },
            {
              "test_case_id": "test-case-1",
              "result": "fail",
              "message": "Test result Message scrubbed from the log file",
              "timestamp": "2010-09-18T16:34:21Z",
              "log_filename": "testoutput.log",
              "log_lineno": 2,
              "duration": "1d 1s 1us",
              "attributes": {
                "attr1": "value1",
                "attr2": "value2"
              }
            }
          ],
          "analyzer_assigned_date": "2010-10-15T22:05:45Z",
          "time_check_performed": false,
          "analyzer_assigned_uuid": "00000000-0000-0000-0000-000000000005",
          "test_id": "detailed-test-results"
        }
      ],
      "format": "Dashboard Bundle Format 1.0"
    }
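
Before submission, the dispatcher adds the captured serial log to the bundle as an attachment. A sketch of that step, assuming attachments are represented as a mapping from file name to lines of text (the authoritative representation is defined by the bundle format in lp:launch-control):

  • import json
    
    def attach_serial_log(bundle_path, serial_log_path, out_path):
        # add the serial log to every test run in the result bundle
        with open(bundle_path) as f:
            bundle = json.load(f)
        with open(serial_log_path) as f:
            serial_lines = f.readlines()
        for run in bundle.get("test_runs", []):
            run.setdefault("attachments", {})["serial.log"] = serial_lines
        with open(out_path, "w") as f:
            json.dump(bundle, f, indent=2)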

Error Handler

Error and exception types

There are two categories of errors and exceptions: critical and non-critical.

System Error:

  • OSError
    • utils.py, already handled
  • RuntimeError: Critical Error

    • deploy.py when l-m-c fails
    • Critical error; the serial log needs to be returned and a deployment error reported

Errors:

  • NetworkError: Critical Error

    • client.py when network detection fails
    • Critical error; the serial log needs to be returned and reported. Alternatively, manual network configuration can be attempted if available.
  • pexpect.EOF, pexpect.TIMEOUT:
    • raised by the pexpect module from time to time, potentially in any related action
    • Whether it is critical depends on the action, so it is necessary to identify which action raised these exceptions

Exceptions:

Error Logging

Every thrown error or exception is logged to the metadata in the summary bundle; specifically, it is recorded in test_results.message. If a series of exceptions occurs within an action, all of them are merged into test_results.message so that they can be displayed in the dashboard results.
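
A sketch of how the exceptions raised while running an action could be folded into test_results.message; CriticalError is an illustrative marker class, not an existing dispatcher type:

  • class CriticalError(Exception):
        # marks errors that should abort the job after reporting the serial log
        pass
    
    def run_steps(steps):
        # run the callables belonging to one action, merging the text of
        # every exception into a single test_results.message
        messages = []
        for step in steps:
            try:
                step()
            except CriticalError as exc:
                messages.append("critical: %s" % exc)
                break                 # critical errors stop the action
            except Exception as exc:
                messages.append(str(exc))
        result = "fail" if messages else "pass"
        return {"result": result, "message": "; ".join(messages)}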

Supported Actions

Here are the proposed initial set of actions to support in the job dispatcher:

  • boot_linaro_image - use the serial line to force uboot to boot into the test image for the target board/board-type
  • boot_master_image - soft reboot if possible to get a proper shutdown, otherwise detect the hang and hard reboot using remote power control. The default bootable image should be set to the master image, but this command should ensure that we actually get to a valid master image shell prompt (see the sketch after this list).
  • deploy_linaro_image - Takes a hwpack and image tarball as arguments. Uses linaro-media-create to construct an image file from which the boot and rootfs are extracted as tarballs and dropped in a location the master image can get to via http. This image is then deployed to the test partitions on the remote machine via serial control of the master image.
  • install_abrek - Install abrek on the remote machine (the network needs to work for this right now) and install the requested tests. It runs after deploy_linaro_image, taking advantage of the master image's network, and uses chroot to install abrek and the related test suites into the test image.
  • deploy_test_tool - Deploy the necessary test tools and Python scripts. There may be a mapping between test cases and their tool dependencies.
  • test_abrek - runs the test with abrek. Results are dropped in a predetermined location (recommendation: /lava/results)
  • submit_results - reboots to the master image, mounts the test rootfs partition, gets results if they exist, attaches serial log, and submits results to the specified stream on the specified dashboard server.
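
As an illustration, boot_master_image could look roughly like the sketch below; the console command, prompt string and hard-reset command are assumptions made for the example, not the real action's interface:

  • import pexpect
    
    MASTER_PROMPT = "root@master:"                 # assumed master image prompt
    
    def boot_master_image(console_command, hard_reset_command, timeout=300):
        # attach to the board's serial console and try a soft reboot first
        proc = pexpect.spawn(console_command, timeout=timeout)
        proc.sendline("reboot")
        try:
            proc.expect(MASTER_PROMPT)             # clean shutdown and reboot
        except (pexpect.TIMEOUT, pexpect.EOF):
            # the board hung: hard reboot via remote power control, reattach
            pexpect.run(hard_reset_command)
            proc = pexpect.spawn(console_command, timeout=timeout)
            proc.expect(MASTER_PROMPT)
        return proc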

deploy_linaro_image

The deploy_linaro_image action consists of three internal actions: generate_tarballs, deploy_linaro_rootfs and deploy_linaro_bootfs. The latter two actions are for test image deployment.

Before the test image deployment actually happens, LAVA needs to generate kernel/boot and rootfs tarballs from the generic Linaro rootfs binary and hwpack. generate_tarballs uses linaro-media-create to build a qemu image file, extracts the boot files and rootfs from it, packages them into tarballs, places the tarballs on a local HTTP server, and records an HTTP URL for the master image to fetch them from.

deploy_linaro_rootfs and deploy_linaro_bootfs run from the master image: they fetch the bootfs and rootfs tarballs from the HTTP server and extract them onto the SD card test partitions labelled "testrootfs" and "testboot".
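
A sketch of generate_tarballs under these assumptions; the linaro-media-create options shown are from memory and should be checked against the installed version, and the partition extraction step is only outlined:

  • import os
    import subprocess
    
    def generate_tarballs(rootfs_path, hwpack_path, board, work_dir, http_root):
        # rootfs_path and hwpack_path are the downloaded artifacts from the job
        image = os.path.join(work_dir, "lava.img")
        subprocess.check_call([
            "linaro-media-create",
            "--image_file", image,          # build a qemu image file
            "--dev", board,
            "--binary", rootfs_path,
            "--hwpack", hwpack_path,
            "--hwpack-force-yes",
        ])
        # extracting the boot and root partitions from the image (e.g. with
        # kpartx/mount) and tarring them up is omitted here; the tarballs end
        # up under the local HTTP server root for the master image to fetch
        return (os.path.join(http_root, "boot.tgz"),
                os.path.join(http_root, "root.tgz"))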

install_abrek

The install_abrek action can be divided into two parts: environment preparation for the chroot, and command execution inside the chroot.

During environment preparation, the master image shares its network configuration files (e.g. resolv.conf) and apt repository settings with the test image.

It then installs abrek inside the chroot: before checking out the abrek source code, the test image needs bzr, python-apt and python-distutils-extra installed; finally, the test suites are installed with the abrek command.

Finally, it cleans up the environment and restores the configuration files in the test image.
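
A rough outline of these steps, where run() stands for executing a shell command on the master image console (its implementation is not shown); the mount point, package list, branch location and abrek sub-commands are assumptions made for illustration:

  • def install_abrek_via_chroot(run, chroot_dir="/mnt/root", tests=("ltp",)):
        # 1. share the master image's network and apt configuration with the
        #    test rootfs mounted at chroot_dir (originals should be backed up)
        run("cp /etc/resolv.conf %s/etc/resolv.conf" % chroot_dir)
        run("cp /etc/apt/sources.list %s/etc/apt/sources.list" % chroot_dir)
        # 2. install the prerequisites and abrek itself inside the chroot
        run("chroot %s apt-get update" % chroot_dir)
        run("chroot %s apt-get install -y bzr python-apt python-distutils-extra"
            % chroot_dir)
        run("chroot %s bzr branch lp:abrek /root/abrek" % chroot_dir)
        run("chroot %s sh -c 'cd /root/abrek && python setup.py install'"
            % chroot_dir)
        # 3. install the requested test suites with the abrek command
        for test in tests:
            run("chroot %s abrek install %s" % (chroot_dir, test))
        # 4. clean-up and restoration of the original configuration files is
        #    left out of this sketch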

submit_results

The submit_results action submits the test suite logs to the launch-control (l-c) dashboard server. It follows these steps:

  1. reboots to the master image
  2. mounts the test image rootfs partition
  3. gets all bundle files from /lava/results and transfers them using a simple socket thread class (sketched after these steps)
  4. attaches serial log
  5. submits results to the specified stream on the specified dashboard server.
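
A minimal sketch of the socket thread mentioned in step 3, receiving one bundle pushed from the master image (the protocol here, a raw byte stream per connection, is an assumption):

  • import socket
    import threading
    
    class BundleReceiver(threading.Thread):
        # listens on a port and collects whatever the master image sends
        def __init__(self, host="0.0.0.0", port=0):
            super(BundleReceiver, self).__init__()
            self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.sock.bind((host, port))
            self.sock.listen(1)
            self.port = self.sock.getsockname()[1]   # port chosen by the OS
            self.data = b""
    
        def run(self):
            conn, _ = self.sock.accept()
            try:
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    self.data += chunk
            finally:
                conn.close()
                self.sock.close()

The master image side can then be told over the serial console to pipe each bundle file to that port, for example with netcat.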

Unresolved issues

This section should highlight any issues that need to be addressed in further specifications, not problems with the specification itself, since any specification with problems cannot be approved.


CategorySpec
