Summary

Validation Scheduler is a functional part of LAVA that lets users remotely schedule test jobs on the different boards in the Linaro validation farm. Scheduling will primarily support simple queuing of test jobs; time-based scheduling is a secondary goal.

Release Note

An end user can schedule a test job, follow its status, cancel it (if they own it), and follow a link to the test results once the job has finished. Admin users can, for instance, edit the device registry by adding, deleting, and modifying device information. Admin users can also cancel any scheduled job.

Rationale

Users submit test jobs to the Validation Scheduler, and admins maintain its functional parts (for instance, the device registry). The scheduler is responsible for providing a user-friendly interface, scheduling jobs by keeping the job queue updated, updating test job status in real time, communicating with other LAVA components, and logging test jobs for future reference.

User stories

  • Dave wants to define and submit a test job manually using web UI and schedule it for one or more specific boards or board types
  • Dave wants to define and submit a test job manually by providing a test job definition (JSON) file and schedule it for one or more specific boards or board types
  • Dave wants to fetch a previously executed test job from history and resubmit it for new execution
  • Stephen wants to add a new machine to the hardware pool and adds a new machine description; the new machine will get included in future test scheduling.
  • Zyga adds a new abrek test suite; he adds a new test suite description to ensure that changes/updates to this test suite will be considered for scheduling in the validation farm.
  • Offspring produces a new image/hw_pack; the scheduler will provide an API for test job submission using the new build.
  • Panter adds a new test description; the scheduler will provide an API so that tests in the test description get run with the right images/hwpacks on the right machines.
  • Devman updates an abrek test suite; the scheduler will provide an API so that all test descriptions referring to that test suite get enqueued on the validation scheduler.
  • Andrew has fixed the master image that caused failures when updating test images on a machine class; he manually triggers a rerun of all test cases matching that hardware class to verify that everything works now.

Assumptions

  • Validation Scheduler will be integrated with launch-control: the user authentication policy will be shared, and the user interface should be aligned with launch-control.

Design

Validation Scheduler will consist of two functional parts. One part will be a Django application called the Scheduler application, which provides a web interface to end users, primarily for job submission and follow-up. The other part will be a daemon called the Scheduler Daemon, which will take care of the actual job execution by using the LAVA Dispatcher.

End-users will have two roles: normal and admin users. Normal users will be able to:

  • submit a test job for execution,
  • follow the status of a submitted test job,
  • list the test job history containing all jobs,
  • resubmit an old test job from history,
  • cancel their own ongoing jobs.

Admin users will have the same permissions as normal users, and additionally they will be able to:

  • administer the device table, which contains all connected devices,
  • administer the test suite and test case tables, which contain all supported tests,
  • cancel any scheduled test job from any user.

Each submitted test job will be saved in a database table, which will contain all the necessary information about that job.

Scheduler Daemon will fetch test jobs from the test job table, start a Dispatcher session to execute each job, and save the final job status in the database when execution finishes. The Daemon will fetch and execute test jobs using a first-come, first-served (FCFS) policy.
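
The FCFS selection can be sketched as follows. This is a minimal illustration, not the Daemon's actual code: the `Job` record and the `next_job` helper are hypothetical names, and the only behaviour taken from this spec is that jobs in Submitted state are served in submission order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    # Hypothetical in-memory stand-in for a row in the test job table.
    job_id: int
    submit_time: float  # seconds since the epoch
    status: str = "Submitted"


def next_job(jobs: List[Job]) -> Optional[Job]:
    """Return the oldest job still in Submitted state, or None if there is none."""
    waiting = [j for j in jobs if j.status == "Submitted"]
    if not waiting:
        return None
    # FCFS: the earliest submit time wins; job_id breaks ties deterministically.
    return min(waiting, key=lambda j: (j.submit_time, j.job_id))
```

A daemon loop would call `next_job` repeatedly, mark the returned job as Running, and hand it to the Dispatcher.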

Test jobs

Test jobs will be configured primarily by end users, who will be offered a number of choices in the web UI. A test job can be simple, executing one test suite on one board, or complex, executing many tests. QEMU support will also be available. In general, a test job will contain the following information:

  • Build image and hardware pack
  • Device ID or device type or tag
  • Test suite(s) and test case(s)

The build image will be chosen from the available Linaro build variants, and the hardware pack shall correspond to the chosen device. Device IDs and test cases are defined by admin users and list the supported devices and tests available for test job submission.

A test job definition in JSON format will be produced and saved in the database when a user submits the job. The Dispatcher uses that definition as the instructions, or steps, for job execution.
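
To make the idea of a JSON job definition concrete, here is a minimal sketch. The spec does not fix a schema, so every key and value below is an illustrative assumption (placeholders, not real device names or URLs); only the three kinds of information listed above (image and hardware pack, device selection, tests) come from this document.

```python
import json

# All keys and values are hypothetical; the spec does not define the schema.
job_definition = {
    "target": {"device_type": "example-board"},
    "deploy": {"image": "<image URL>", "hwpack": "<hwpack URL>"},
    "tests": [{"suite": "example-suite", "cases": ["example-case"]}],
}

# This string is what would be stored in the job's definition field.
serialized = json.dumps(job_definition, indent=2, sort_keys=True)

# The Dispatcher side would parse it back into a structure of steps.
restored = json.loads(serialized)
```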

Test job status

Validation Scheduler will provide status information for each submitted test job. Test job status will be saved and updated by the Scheduler in the test job table. The test job status can have the following values:

  • Submitted
  • Running
  • Complete
  • Incomplete
  • Canceled

The Submitted state will occur when the Scheduler receives a job from a user and puts it in the test job table. The Scheduler Daemon will pull jobs from that table and change their state to Running. When execution finishes without problems, the Daemon will change the state to Complete. If something goes wrong during test job execution, the Dispatcher will report that to the Scheduler Daemon, which will change the state to Incomplete. For instance, if a complex test job consisting of many tests fails, the job will end in the Incomplete state.

The Canceled state will occur when a user cancels a test job. A test job can be canceled while it is in the Submitted or Running state. When a job is canceled in the Submitted state, the Scheduler will simply mark it as Canceled. When a job is canceled in the Running state, the Scheduler shall send a cancel command to the Daemon, which will then instruct the Dispatcher to cancel the job and send confirmation to the Scheduler. After that, the Daemon will change the job state to Canceled.
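
The state changes described above can be written down as a small transition table. This is a sketch under one assumption that the spec does not state explicitly: Complete, Incomplete and Canceled are treated as terminal states. `JobStatus`, `TRANSITIONS` and `change_status` are hypothetical names.

```python
from enum import Enum


class JobStatus(Enum):
    SUBMITTED = "Submitted"
    RUNNING = "Running"
    COMPLETE = "Complete"
    INCOMPLETE = "Incomplete"
    CANCELED = "Canceled"


# Allowed moves, taken from the prose above; terminal states are an assumption.
TRANSITIONS = {
    JobStatus.SUBMITTED: {JobStatus.RUNNING, JobStatus.CANCELED},
    JobStatus.RUNNING: {
        JobStatus.COMPLETE,
        JobStatus.INCOMPLETE,
        JobStatus.CANCELED,
    },
    JobStatus.COMPLETE: set(),
    JobStatus.INCOMPLETE: set(),
    JobStatus.CANCELED: set(),
}


def change_status(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new
```

Keeping the rules in one table makes it easy to reject, for example, an attempt to cancel a job that has already completed.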

Test results in Launch Control

When a test job is finished, Validation Scheduler needs to get the test result bundle ID, which will be provided by the Dispatcher. This bundle ID will be used to provide a link in the Scheduler web UI to the test result for that specific test job in Launch Control.

Test job status vs. test results

It is important to understand the distinction between the test job status (used in the Scheduler) and the final test results (used in Launch Control). A failed test can certainly have a Complete job status. Incomplete, as a job status, just means that the Dispatcher was unable to finish all the steps in the job. For instance, consider a job that requires an image to be deployed and booted, and a test to be run on it. If we deploy the image and hit a kernel panic on reboot, that is an incomplete job, because it never got far enough to run the specified test.
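
The distinction can be illustrated with two independent functions. This is only a sketch: the function names and the shape of the result records are assumptions, not part of the Scheduler or Launch Control.

```python
from typing import Dict, List


def job_status(all_steps_finished: bool) -> str:
    """Job status depends only on whether the Dispatcher finished every step."""
    return "Complete" if all_steps_finished else "Incomplete"


def failed_tests(results: List[Dict[str, str]]) -> List[str]:
    """Test outcomes are a separate concern, reported in Launch Control."""
    return [r["test"] for r in results if r["result"] == "fail"]


# A job whose steps all ran is Complete even though one test failed.
results = [
    {"test": "example-a", "result": "pass"},
    {"test": "example-b", "result": "fail"},
]
status = job_status(all_steps_finished=True)
failures = failed_tests(results)
```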

Test suites and test cases

Test suite and test case database tables will contain available test suites and cases for execution in the test jobs. Today that information can be obtained from this page: https://wiki.linaro.org/Platform/Validation/AbrekTestsuites. There will be an admin part for updating supported tests via web UI.

Devices

Devices available for job execution will be stored in the device table. An admin part for updating this table via the web UI will be provided. Each device will be assigned a unique ID, type, tag, and hostname. Device status will also be recorded, and the following states will be used:

  • Offline
  • Idle
  • Running

Only the Scheduler Daemon will be permitted to update the device status in the database.

Build and hardware pack images

Users will be able to choose which build/hwpack they want to test in a test job. Validation Scheduler will take care of the retrieval of all needed image files from the place they are stored (currently http://snapshots.linaro.org).

Validation Scheduler API

Validation Scheduler will provide an API so that test jobs can be submitted not only through the web interface but also by other programs, machines, or a command line interface. One user story where this is useful is when Offspring produces a new image: reacting to that event, a program shall be able to define and submit a test job in our validation farm. For other user stories where this applies, see the User stories section.

Implementation

LAVA database

The LAVA database will contain all the tables needed for scheduling jobs and for keeping lists of supported devices and test cases. The tables correspond to Django models and will be used by both the Scheduler Django application and the Scheduler daemon.

The main models (tables in the database) in the Scheduler application are:

  • TestJob - contains all the submitted test jobs and acts as a job queue for the Scheduler daemon
  • Device - contains all the devices available for test execution
  • TestSuite - contains all supported test suites, each containing a number of test cases
  • TestCase - contains all supported test cases, which are organized into test suites

TestJob will keep all the submitted test jobs. Once a job has been submitted and saved in the table, the Status field will be kept up to date with the current job status. The Submitter field will contain the user name, while SubmitTime, StartTime and EndTime will track the test job's duration. The Definition field will contain the test job definition in JSON format. This table will serve as a history of all submitted jobs, which will be shown in the Scheduler web UI.

Device will contain all the devices available for test execution, together with device-related information such as device type and tags. The Status field will contain the current status of each device and will be updated by the Scheduler daemon. Admins will add devices to the Device table via the web UI.

The TestSuite and TestCase tables will contain all supported test suites and test cases. They will also have a web UI, which admins will use to add supported tests.
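
As a rough picture of the TestJob and Device tables, the sketch below uses Python's standard `sqlite3` module as a stand-in for the Django models. Column names follow the fields named above, written in snake_case; the column types, the default values, and the `create_db` and `submit` helpers are assumptions. The TestSuite and TestCase tables are left out for brevity.

```python
import sqlite3

# Types and defaults are assumptions; only the field names come from the spec.
SCHEMA = """
CREATE TABLE device (
    id INTEGER PRIMARY KEY,
    hostname TEXT NOT NULL,
    device_type TEXT NOT NULL,
    tag TEXT,
    status TEXT NOT NULL DEFAULT 'Offline'
);
CREATE TABLE testjob (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    submitter TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'Submitted',
    submit_time TEXT NOT NULL,
    start_time TEXT,
    end_time TEXT,
    definition TEXT NOT NULL
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def submit(conn: sqlite3.Connection, submitter: str,
           submit_time: str, definition_json: str) -> int:
    """Insert a new job in Submitted state and return its ID."""
    cur = conn.execute(
        "INSERT INTO testjob (submitter, submit_time, definition) "
        "VALUES (?, ?, ?)",
        (submitter, submit_time, definition_json),
    )
    conn.commit()
    return cur.lastrowid
```

Because rows are never deleted, the same table can serve both as the job queue and as the job history.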

Scheduler API

The Scheduler will provide an API for usage by external entities, for instance Hudson. The API will expose some key functionality for handling test jobs:

  • Submit new test job
  • Cancel test job
  • Resubmit test job
  • Get link to test result in Launch Control
  • Retrieve test job history

The API will be based on the XML-RPC protocol. An XML-RPC message is sent as an HTTP POST request; the API will process the request and send back a response formatted in XML.
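
The shape of such an exchange can be shown with Python's standard `xmlrpc.client` module, without any network traffic. The method name `submit_job`, the payload, and the returned job ID are all hypothetical; the spec lists the API operations but does not name the methods.

```python
import xmlrpc.client

# Hypothetical method name and payload; the spec does not define them.
definition_json = '{"tests": []}'

# This XML string is what a client would send as the HTTP POST body.
request_body = xmlrpc.client.dumps((definition_json,), methodname="submit_job")

# The server side decodes the POST body into parameters and a method name.
params, method = xmlrpc.client.loads(request_body)

# A response carrying a made-up job ID back to the caller.
response_body = xmlrpc.client.dumps((42,), methodresponse=True)
(job_id,), _ = xmlrpc.client.loads(response_body)
```

In practice a client would more likely use `xmlrpc.client.ServerProxy`, which performs the same encoding and the HTTP POST in one call.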

UI Changes

Scheduler web UI will have these main views:

  • Jobs view
  • Submit job view
  • Device status view

Jobs view

Jobs view is shown as the index page of Scheduler, and it will list all existing jobs (running, complete, incomplete, etc.). Clicking on a job will provide all important job details, and also options to resubmit or cancel a job. Resubmit will submit a new identical job to the queue. Cancel will just cancel an ongoing job and will be limited to the user who submitted the job or an admin.

The job list shall be sortable and paginated, and it shall be possible to show only the user's own submitted test jobs.

Submit job view

Submit job is the view where users will submit new test jobs. It will provide a number of selectable choices and information fields for users to choose from or fill in by themselves.

The following choices and fields will be provided to users in this view:

  • Select device - predefined list of choices, where users will be able to choose a specific device or device type, or even a device tag.
  • Deploy image - users will be able to select image type (linaro-*, android, ubuntu...), image, hardware pack and kernel.
  • Run tests - a predefined list of test suites and test cases, from which more than one test case can be selected; it will also be possible to request a reboot between tests.
  • Upload results - users will be able to select where to upload test results, i.e. results server and pathname or stream to send it to.

Another choice in this view will be to reserve a device. A reservation is treated just like a regular job: if the device is busy, the reservation job will wait in the queue until the device is free. The user shall be notified, by email or some other means, when the device has been reserved.

When a user specifies a device type or tag instead of a specific device, the Scheduler will pick the matching device with the shortest queue at the time the job is submitted and assign the job to it.
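
The shortest-queue rule can be sketched as below. The `pick_device` helper and the shape of `queues` (device hostname mapped to a list of queued job IDs) are assumptions. The tie-breaking rule (first candidate wins) is also an assumption, since the spec does not say how ties are resolved. Device state (for example Offline) is not considered here.

```python
from typing import Dict, List, Optional


def pick_device(queues: Dict[str, List[int]],
                candidates: List[str]) -> Optional[str]:
    """Pick the candidate device with the fewest queued jobs.

    candidates are the devices matching the requested type or tag.
    A device with no entry in queues is treated as having an empty queue.
    Ties go to the first candidate in the list.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda device: len(queues.get(device, [])))
```

For example, with queues of length 2, 1 and 1 for three matching boards, the second board is chosen.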

Device status view

Device status will list all devices in the system and their current state (idle, running, etc.). A click on a device shall provide another view of jobs queued for that device.

Code Changes

Since no previous version of this software exists, all code will be new; there are no changes to existing code.

Test/Demo Plan

Unit tests will be developed for the implemented functions. More complex test cases will also be written and executed to validate complex user stories and scenarios. A demo version of the complete LAVA system shall be available from a server in our validation farm.

Unresolved issues

Time scheduling is a part of the Scheduler that is not addressed directly in this specification.


CategorySpec

Platform/Validation/Specs/ValidationScheduler (last modified 2011-04-19 15:32:09)