Proposed 11.11 scheduler blueprints
Here I have gathered blueprint proposals for the 11.11 cycle, based on user stories created during LDS, to be used for further discussion.
BP: CLI submit job
I guess this is lava-tool or part of it.
- implement XML-RPC function to receive a JSON job definition file and return the status of the submission
- implement and use "validate json definition" function
- implement CLI agent (lava-tool) to take a job file and submit it to scheduler
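To make the three items above concrete for discussion, here is a minimal sketch of how the validation function and the XML-RPC entry point could fit together. It uses only the Python 3 standard library. The required keys, the function names (validate_job_definition, submit_job, make_server) and the shape of the returned status are all assumptions of mine, not an agreed interface.

```python
import json
import uuid
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical required keys; the real job schema is not defined on this page.
REQUIRED_KEYS = ("job_name", "target", "timeout", "actions")


def validate_job_definition(text):
    """Return a list of problems found in a JSON job definition (empty if valid)."""
    try:
        job = json.loads(text)
    except ValueError as exc:
        return ["not valid JSON: %s" % exc]
    if not isinstance(job, dict):
        return ["top-level value must be a JSON object"]
    return ["missing key: %s" % key for key in REQUIRED_KEYS if key not in job]


def submit_job(text):
    """XML-RPC entry point: validate the job and report the submission status."""
    errors = validate_job_definition(text)
    if errors:
        return {"status": "rejected", "errors": errors}
    # A real scheduler would store the job here; this sketch only assigns an id.
    return {"status": "accepted", "job_id": str(uuid.uuid4())}


def make_server(host="127.0.0.1", port=8000):
    """Build (but do not start) an XML-RPC server exposing submit_job."""
    server = SimpleXMLRPCServer((host, port), allow_none=True, logRequests=False)
    server.register_function(submit_job, "submit_job")
    return server
```

On the client side, the CLI agent would then only need to read the job file and call submit_job through xmlrpc.client.ServerProxy, printing the returned status.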
BP: Scheduler submit job
- fix "submitter" field: use the user name from OpenID
- add "job_id" field to test job table, will be used for storing UUID values
- add support for "tags" field in web UI when submitting jobs
- Can we use device_type as tag?
- We actually have an issue here, since we already need to decide in this step which board the job shall run on. Device types and tags only narrow down which devices are eligible, regardless of whether they are idle or running, so we need a clever way to choose among them. The first choice would of course be a device that is idle. If all devices that fulfil the criteria are running, we could choose the one that has been used less frequently than the others, or apply a similar rule.
- define priority levels and change "priority" field to choice field
- define usage of "timeout" field and implement according to the definition
- This is about how the timeout value is used: shall we allow users to provide any value, or shall we have predefined values?
- define and implement proper usage of dashboard specific fields like server and stream
- There was a question during the LDS session about which stream path to use. Shall one be created for each user, or how should this be handled? Do we need to handle this at all?
- finalize web UI LAF (fix submit page, decide what to show and how)
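For the device-selection question raised in this blueprint (idle boards first, otherwise the least-used matching board), a rough sketch could look like the following. The Device fields, the status strings and the jobs_run counter are hypothetical names I picked for illustration; "least used" could just as well be measured in run time instead of job count.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    hostname: str
    device_type: str
    tags: frozenset = field(default_factory=frozenset)
    status: str = "idle"      # "idle" or "running"
    jobs_run: int = 0         # how many jobs this board has executed so far


def choose_device(devices, device_type, required_tags=()):
    """Pick a board for a job: idle boards first, then the least-used one."""
    wanted = frozenset(required_tags)
    candidates = [
        d for d in devices
        if d.device_type == device_type and wanted <= d.tags
    ]
    if not candidates:
        return None
    # False sorts before True, so idle boards win; ties go to the least-used board.
    return min(candidates, key=lambda d: (d.status != "idle", d.jobs_run))
```

Returning None when nothing matches leaves it to the caller to reject the job at submit time or keep it queued until a suitable board is added.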
BP: Daemon multiprocessing
- implement multiprocessing
- implement get next job and spawn new dispatcher instances for each job
- Define the rules for job priority (define priority levels, if not already done) combined with first come, first served. This could be accompanied by an event-driven system (maybe using database triggers) that acts each time a board's status changes from running to idle. A function could then check whether any job is waiting for that board to become free, and run that job directly. If there is no such job, the board would just idle and wait for the next daemon cycle.
- handle job timeouts
- Do we need a specific job state for this to track, like TIMEOUT or similar?
- handle successful jobs
- handle unsuccessful jobs
- This is when something goes wrong in the dispatcher (not in the test cases!). Do we need to collect some logs, and how should we show them? Maybe this can be treated differently for admins and normal users. Admins are interested in debugging what actually happened, so we might just need to save a log and name it with the job id. Normal users should maybe just get a link to resubmit the job if they wish (which is already covered in another BP). We might also want to send notifications to users via mail; what do you think about this? It can, however, be added later.
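The "priority first, then first come, first served" rule discussed in this blueprint can be sketched with a heap and a submission counter. The three priority levels and the JobQueue class are placeholders of mine, since the actual levels are still to be defined. The next_job_for method is what a board-became-idle event (or the regular daemon cycle) would call before spawning a dispatcher process for the returned job.

```python
import heapq
import itertools

# Hypothetical priority levels; the blueprint leaves the actual levels open.
PRIORITY = {"high": 0, "medium": 1, "low": 2}


class JobQueue:
    """Pending jobs ordered by priority, then by submission order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: first come, first served

    def submit(self, job_id, device_type, priority="medium"):
        entry = (PRIORITY[priority], next(self._counter), job_id, device_type)
        heapq.heappush(self._heap, entry)

    def next_job_for(self, device_type):
        """Pop the best waiting job that can run on the given device type."""
        matching = [e for e in self._heap if e[3] == device_type]
        if not matching:
            return None  # nothing waiting: the board stays idle until the next cycle
        best = min(matching)
        self._heap.remove(best)
        heapq.heapify(self._heap)
        return best[2]
```

In a real deployment the queue would live in the database rather than in memory; the ordering key (priority, then submission order) is the part worth agreeing on.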
BP: Create JSON definition
Here we need an algorithm to produce a JSON definition according to some predefined rules.
- discuss and define JSON definition algorithm
- Do we need templates and tags, and if so, how should they be used?
- implement create function
BP: Scheduler admin
- show only scheduler tables, not django parts (if not django admin?)
- use same LAF as scheduler app (how important is this?)
- add link to scheduler app from admin
BP: Job history
One idea here is to show a personal job history by default when a user is logged in, and a generic history otherwise. Another idea, if it is possible to identify users belonging to working groups, is to show the history for that group by default.
- show history with max XX (how many do we need to show?) jobs per page
- generate and add link to test result in launch control
- fix page browsing
- implement sorting by board, board type and tag (anything to add or remove here?)
BP: Resubmit job
- add resubmit option to web UI for each complete and incomplete job in history
- implement resubmit function to resubmit an identical copy of a job
- implement resubmit function to resubmit an updated copy of a job
BP: Cancel job
- add cancel option to web UI for each job in history that has not finished yet (neither complete nor incomplete)
- The cancel option should only apply to a user's own test jobs.
- implement cancel function
- We have two cases: (1) cancel submitted (not started) jobs (relatively easy, since the daemon has not started the execution yet); (2) cancel started jobs, which needs implementation in daemon and probably also in dispatcher.
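The two cancel cases above, plus the "own jobs only" rule, could be sketched as below. The state names (SUBMITTED, RUNNING, CANCELING, CANCELED), the job dictionary layout and the idea of stopping the dispatcher with SIGTERM are assumptions for illustration only; how a running dispatcher should really be stopped is exactly what still needs to be designed.

```python
import os
import signal


class CancelError(Exception):
    """Raised when a job cannot be cancelled."""


def cancel_job(job, user, kill=os.kill):
    """Cancel a job dict with 'submitter', 'state' and optional 'pid' keys."""
    if job["submitter"] != user:
        raise CancelError("users may only cancel their own jobs")
    if job["state"] == "SUBMITTED":
        # Case 1: the daemon has not started the job, so a state change is enough.
        job["state"] = "CANCELED"
    elif job["state"] == "RUNNING":
        # Case 2: ask the dispatcher process to stop; the daemon would mark the
        # job CANCELED once the process has exited.
        kill(job["pid"], signal.SIGTERM)
        job["state"] = "CANCELING"
    else:
        raise CancelError("job already finished: %s" % job["state"])
    return job["state"]
```

The kill parameter exists only so the function can be exercised without a real dispatcher process.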
BP: Usage reports
- show usage reports in web UI about which boards are used most and how much, for which test suites and cases, and which boards fail more often than others
- Anything to add/remove here?
BP: Time estimates for test jobs
- implement function to calculate time estimate for when each submitted job is expected to start and when it is expected to end
- Expected to start - We need to check the job queue in front of this job and add up all of their time estimates to get this time. This sounds like a time-consuming task; is it really needed?
- Expected to end - This could be calculated both as the average time from previous runs and from the given timeout. Maybe a good idea is to track this per dispatcher step.
- Also again, do we need to send any type of notifications to users when their jobs are started and completed, and include links to scheduler/dashboard?
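As a rough sketch of the calculation described above: the expected duration is the average of previous runs capped by the timeout, and each job's expected start is the expected end of the job in front of it. The function names and the queue tuple layout are my own assumptions, and the sketch covers a single board only.

```python
from datetime import datetime, timedelta
from statistics import mean


def estimate_duration(previous_runs, timeout):
    """Expected run time in seconds: average of earlier runs, capped by the timeout."""
    if not previous_runs:
        return timeout  # no history yet, so fall back to the worst case
    return min(mean(previous_runs), timeout)


def estimate_schedule(queue, now):
    """Return {job_id: (start, end)} for jobs queued on a single board.

    `queue` is a list of (job_id, previous_runs, timeout) in execution order.
    """
    schedule = {}
    start = now
    for job_id, previous_runs, timeout in queue:
        end = start + timedelta(seconds=estimate_duration(previous_runs, timeout))
        schedule[job_id] = (start, end)
        start = end  # the next job starts when this one is expected to end
    return schedule
```

Since this is a single pass over the queue, the cost grows linearly with the number of waiting jobs, which may ease the concern about the start-time estimate being expensive.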
BP: Progress bar for submitted jobs
This could be possible, and most likely we need to have some help from dispatcher, to give us some info for each action of the job, then we could calculate this in a neat way. But I guess this feature goes under "good to have".
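If the dispatcher can report how many actions of a job have finished, one possible calculation is to weight each action by its estimated duration. This is only a sketch; job_progress and its inputs are hypothetical, and the per-action estimates would have to come from somewhere like the time-estimates BP.

```python
def job_progress(action_estimates, completed_actions):
    """Percentage done, weighting each dispatcher action by its estimated seconds."""
    total = sum(action_estimates)
    if total <= 0:
        return 0
    done = sum(action_estimates[:completed_actions])
    return int(100 * done / total)
```

Weighting by time avoids a bar that jumps to 50% after a short first action and then sits there during a long test run.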
Platform/Validation/11.11SchedulerBlueprints (last modified 2011-05-18 23:45:48)