Summary

Provide remote development board access to Linaro engineers. The boards should be available for exclusive use upon request, with a lease system to avoid starvation when people forget to release a board (for example, before going on holiday).

Rationale

Linaro engineers occasionally need exclusive access to specific ARM boards, for example to test kernel changes or run benchmarks. However, it's not practical to give every supported board to every engineer who might need one, so we'll instead have a pool of boards that Linaro engineers can access remotely.

User stories

  • As a developer I want exclusive access to a specific board as soon as it's available so that I can debug board-specific issues
  • As a kernel developer I want to boot the board with a particular kernel that I just built so that I can test/debug it
  • As a platform engineer I want a board running a specific hwpack and rootfs so that I can test an image type other than the default one

Design

LAVA will grow the ability to lease boards to Linaro engineers upon request; these requests will be made either via a web UI or a command line interface and the engineers will be notified once the board is ready for remote access, via ssh. For the duration of the lease, the board will be accessible only to the engineer who requested it, and it will also be possible to extend the current lease, up to a certain limit, if needed.
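The lease rules described above (exclusive access for one engineer, extensions capped at a limit) can be sketched as a small data model. This is only an illustration: the `Lease` class, its method names, and the `DEFAULT_LEASE` and `MAX_TOTAL_LEASE` values are assumptions made for the example, not part of LAVA or of this spec, which leaves the actual limits undecided.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical values: the spec only says extensions are allowed
# "up to a certain limit" and does not fix any durations.
DEFAULT_LEASE = timedelta(hours=4)
MAX_TOTAL_LEASE = timedelta(hours=24)


@dataclass
class Lease:
    """A single engineer's exclusive claim on one board (illustrative only)."""
    board: str
    owner: str
    start: datetime
    end: datetime

    @classmethod
    def grant(cls, board, owner, now):
        """Create a lease of the default length starting at `now`."""
        return cls(board=board, owner=owner, start=now, end=now + DEFAULT_LEASE)

    def is_active(self, now):
        return self.start <= now < self.end

    def may_access(self, user, now):
        """Only the requester may use the board, and only while the lease runs."""
        return user == self.owner and self.is_active(now)

    def extend(self, extra, now):
        """Extend the lease by `extra`, clamped so the total never exceeds
        MAX_TOTAL_LEASE. Returns the new end time."""
        if not self.is_active(now):
            raise ValueError("cannot extend a lease that is not active")
        if extra <= timedelta(0):
            raise ValueError("extension must be positive")
        latest_end = self.start + MAX_TOTAL_LEASE
        self.end = min(self.end + extra, latest_end)
        return self.end
```

Clamping the extension rather than rejecting it is one possible choice; refusing requests that would exceed the limit would work equally well and might be clearer to users.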

The boards used for remote access may be dedicated, leased from the validation pool, or a combination of the two. If we end up using boards from the validation pool, we need to be careful and make sure we don't starve the validation jobs.

Implementation

The following needs to be implemented in LAVA:

  • The LAVA scheduler, to keep track of board availability (this is being worked on already)
  • A web interface/command line for requesting that a specific board be set up and reserved for the logged-in user once it becomes available
  • A web interface reporting information about the lease (e.g. whether or not the board is ready, how much time is left, who it is leased to, etc) and allowing users to extend/terminate the lease
    • This *may* involve the scheduler being able to collect info from the running job while it is active.
  • A dispatcher action to set up the board for remote access so that the requester can log in to it
  • Ability for the dispatcher job to notify both the user and the scheduler that the board is ready.
    • To notify the user, an email notification would be the obvious choice, but it might be worth considering other alternatives as it may take a long time for the email to be delivered/read, thus wasting resources.
    • To notify the scheduler we'd need to either have the dispatcher job make an API call or instrument the scheduler so that it polls the board (perhaps by logging in via ssh and checking that it's set up) until it's found to be ready. Polling may make more sense, as we could keep polling until the end of the lease and report errors.
  • Being able to reboot using a different kernel
    • Do we actually need to allow people to reboot a board that's currently leased to them, or would it be enough to let them specify the kernel they want to use when requesting the lease? How to implement this has not been worked out yet.
  • Data about use of the farm so that we can know whether we need more hardware.
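The polling option for notifying the scheduler could look roughly like the sketch below. It is not LAVA code: `wait_until_ready`, `ssh_probe`, and the marker file path are hypothetical names chosen for the example, and the assumption that the dispatcher action leaves a marker file on the board when setup finishes is ours, not the spec's.

```python
import subprocess
import time


def ssh_probe(host, marker="/var/run/lease-ready"):
    """Build a probe that checks over ssh whether a marker file exists.

    The marker path is hypothetical; it assumes the dispatcher action
    creates such a file once the board is set up for remote access.
    """
    def probe():
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
             host, "test", "-e", marker],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return probe


def wait_until_ready(probe, timeout_s, interval_s=10.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() repeatedly until it returns True or timeout_s elapses.

    Returns True if the board became ready, False on timeout. The clock
    and sleep functions are injectable so the loop can be tested without
    waiting in real time.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Failure to run the probe at all is treated as "not ready yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A scheduler might call `wait_until_ready(ssh_probe("board.example"), timeout_s=1800)` after starting the dispatcher job. The same probe could then be reused at a lower frequency for the rest of the lease to detect and report a board that has stopped responding.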

If the scheduler is implemented to execute the dispatcher directly, it will probably consider the job done once the dispatcher returns as that'd make sense when running test suites. In our case, though, our job wouldn't have anything to do after the board is ready for remote login, and although we could change it to sit there idling after the board is ready, it'd make things less robust in the event of failures in the dispatcher or our job itself. Given that, it'd make sense to change the scheduler to not end a lease when the dispatcher returns. Other advantages of this approach are:

  • We wouldn't terminate a lease unless there was another job in the queue
  • It'd be a lot simpler to extend existing leases
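One way to read the proposal above is that the dispatcher returning is simply not an input to the decision to end a lease. The sketch below encodes that reading; `BoardState` and `next_state` are hypothetical names, and the rule that an expired lease is only reclaimed when another job is queued is our interpretation of the first bullet rather than a settled design.

```python
from datetime import datetime
from enum import Enum


class BoardState(Enum):
    IDLE = "idle"
    LEASED = "leased"


def next_state(state, lease_end, now, queued_jobs):
    """Decide whether a leased board should be reclaimed.

    Note what is absent: whether the dispatcher has returned. The lease
    ends only when it has expired AND at least one job is waiting.
    """
    if state is BoardState.LEASED and now >= lease_end and queued_jobs > 0:
        return BoardState.IDLE
    return state
```

Under this rule, extending a lease amounts to moving `lease_end` forward, with no running dispatcher process to coordinate with.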

There are a few extra things we may want to do in the future, so it's a good idea to keep them in mind when implementing:

  • Requesting a lease for a specific timeframe rather than when-it-becomes-available
  • Ability to conserve a user's "development environment" so that it doesn't have to be recreated on every lease

BoF agenda and discussion

RemoteDevBoards



Platform/Specs/11.11/RemoteDevelopmentBoards (last modified 2011-06-07 11:49:59)