PatchTracking

Summary

We wish to provide a system for tracking the status of patches and for providing metrics on them.

Rationale

Linaro produces a large number of patches, and we want a way to track them.

Some working groups and landing teams want to track the patches they carry that diverge from upstream, along with the submission status of each one.

In addition, the members want metrics on how many patches each working group (WG) creates and how many of those it upstreams.

The platform team would like to be able to track the patches created by the working groups to ensure that as many as feasible are integrated into the evaluation builds.

User stories

There are many user stories that could be considered:

  • As a kernel developer, I would like to ensure that all patches sent to the mailing list are identified and acted on, so that none are overlooked.
  • As a toolchain developer, I would like to identify changes upstream that we do not yet have, and decide whether to ignore them or backport them to our downstream branch.
  • As a toolchain developer, I would like to identify changes carried in our branch that have not yet gone upstream, and track whether they have been pushed upstream, when upstream accepted them, or whether they should never go upstream.
  • As a project manager, I would like to see metrics such as how many patches are being carried against an upstream project, how long each patch has been carried, and how many patches have/haven't been accepted upstream.
  • As a foundations developer, I would like to track fixes for issues such as cross-compilation across many packages, and whether those fixes have gone upstream.

We will be focusing on two user stories for the rest of this document:

  • As a Linaro member I would like to know roughly how many patches are being upstreamed by Linaro. (We will call this the "metrics" user story.)
  • As a platform developer I would like to see the patches that have been sent upstream by the working groups to ensure that all those appropriate for the release are included in the evaluation builds. (We will call this the "platform" user story.)

Requirements

The tool must be as resilient as possible, so that it does not crash when it encounters unexpected input, and it should log extensively to keep information for later debugging. This is important because we will essentially be parsing email messages, and we are very likely to encounter unexpected situations.
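A minimal sketch of the "log and keep going" requirement, assuming each incoming message is handled independently. The names `process_messages` and `require_diff` are hypothetical and are not Patchwork functions. A failure in one message is logged with its traceback and then skipped, so the failure does not stop the whole run.

```python
import email
import logging

# Hypothetical logger name for the patch collector.
log = logging.getLogger("patchtracking")


def process_messages(raw_messages, handle):
    """Apply `handle` to each raw email message so that one bad message
    cannot abort the whole run.

    Returns a (processed, failed) tuple of counts.
    """
    processed = failed = 0
    for index, raw in enumerate(raw_messages):
        try:
            msg = email.message_from_string(raw)
            handle(msg)
            processed += 1
        except Exception:
            # log.exception() logs at ERROR level and includes the traceback,
            # plus a short prefix of the raw message for later debugging.
            log.exception("skipping message %d: %r", index, raw[:80])
            failed += 1
    return processed, failed


def require_diff(msg):
    """Example handler (hypothetical): reject messages with no git-style diff."""
    if "diff --git" not in msg.get_payload():
        raise ValueError("no patch found")
```

For example, `process_messages(["Subject: a\n\ndiff --git a/x b/x\n", "Subject: b\n\nno patch\n"], require_diff)` processes the first message, logs the second as a failure, and keeps going.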

Metrics

  • Capture each patch sent upstream by Linaro engineers, recording the date they were sent
  • Where possible record when each patch is accepted upstream
  • Attribute patches to the team that they came from
  • Count these numbers per-timeslice (month)
  • Possibly detect when patches are merged upstream
  • Possibly be able to detect different versions of the same patch?

Nice to have:

  • A list of Linaro engineers that submitted patches from an address other than @linaro.org, and the last date on which they did so.

Platform

  • Capture each patch sent upstream by Linaro engineers, recording the date they were sent
  • Identify the package that the patch should be in
  • Allow listing the patches that need to be reviewed for inclusion, and marking patches as included/not wanted etc.
  • Attempt to identify automatically whether the patch is already included in the package
  • Do we only want to consider patches accepted upstream?

Completion

How will we know when we are done?

Metrics

We can report the number of patches forwarded and accepted upstream each month by each Linaro team.

Design

We assume that all patches are sent to a mailing list related to the project in question and copied to patches@linaro.org.

Metrics

  • We will watch patches@linaro.org to capture patches sent upstream by Linaro.

  • We will use the address of the mailing list to which a patch was originally sent to identify the project it applies to.
  • We may also decide to watch the mailing lists to keep track of discussions about patches. (Patchwork already provides this, so it would be cheap to do and may prove beneficial in the future.)
  • We will use a mapping of email address->person->team to attribute patches to a particular team. This will require some effort to ensure that all email addresses used by Linaro engineers are known, but we may be able to automate some of that using information in Launchpad and/or GPG keys.

  • We will automatically update the state of patches when they are committed to the master branch of the project to which they apply. For all other state transitions (e.g. patch being rejected), the developers themselves will have to change the state of their patches.
  • This information will then be stored in a database so that we can produce reports of the number of patches forwarded upstream per timeslice.
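The attribution and per-timeslice counting described above can be sketched as follows. This assumes the email address->person->team mapping is available as two dictionaries and that the timeslice is a calendar month. The sample entries and function names are hypothetical. Unknown senders are counted under `None` rather than dropped, which makes gaps in the address mapping visible.

```python
from collections import Counter
from datetime import date

# Hypothetical sample mapping; in practice it would live in the database,
# possibly populated from Launchpad team membership.
EMAIL_TO_PERSON = {
    "alice@linaro.org": "alice",
    "alice.dev@example.com": "alice",
    "bob@linaro.org": "bob",
}
PERSON_TO_TEAM = {"alice": "Toolchain WG", "bob": "Kernel WG"}


def team_for(address):
    """Resolve a sender address to a team, or None if the address is unknown."""
    person = EMAIL_TO_PERSON.get(address.lower())
    return PERSON_TO_TEAM.get(person) if person else None


def forwarded_per_month(patches):
    """Count patches per (team, 'YYYY-MM'), keyed on the date they were sent.

    `patches` is an iterable of (sender_address, date_sent) pairs.
    """
    counts = Counter()
    for sender, sent in patches:
        counts[(team_for(sender), sent.strftime("%Y-%m"))] += 1
    return counts
```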

Platform

The patches@linaro.org address allows us to collect every patch sent upstream by Linaro.

We can then store these patches such that they can be listed by an interested platform developer.

The developer can then mark patches as included/not wanted etc. as needed.

In this case we don't have a revision history, because the packages are built from tarballs. Also, the patches may be in the debian/ directory, which means they are only applied when the package is built, so we can't easily detect included patches automatically. One alternative, suggested by James, would be to let developers tell Patchwork which upstream revision ID a package is built from. We could then mark all patches committed before that revision as present in the package.
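James's revision-ID idea can be sketched as below. This assumes a linear upstream history listed oldest first, and a known upstream commit for each merged patch. The function and parameter names are hypothetical. Real git histories are not always linear, so an implementation would more likely ask the VCS directly, for example with `git merge-base --is-ancestor <patch-commit> <package-revision>`.

```python
def patches_in_package(history, package_revision, patch_commits):
    """Return the IDs of patches whose upstream commit is at or before
    the revision the package was built from.

    history: upstream revision IDs, oldest first (assumes linear history).
    package_revision: revision ID the package is built from.
    patch_commits: mapping of patch ID -> upstream revision ID it was
                   committed in, or None if not (yet) committed.
    """
    position = {rev: i for i, rev in enumerate(history)}
    cutoff = position[package_revision]  # KeyError if the revision is unknown
    return {
        patch_id
        for patch_id, rev in patch_commits.items()
        if rev is not None and rev in position and position[rev] <= cutoff
    }
```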

Although we expect part of the work done to support the metrics use case to benefit this use case as well, our initial goal is to support only the metrics use case.

Implementation

We will use Patchwork because it provides a significant chunk of the functionality we need. For the things that Patchwork does not provide, we will either extend it (where the changes make sense to go upstream) or develop separate Django apps that are used in conjunction with Patchwork.

Patchwork already provides the following:

  • The concept of projects, so we can group patches based on their projects
  • Ability to parse email messages to extract patches and other important information
  • A relational database where the patches and discussion about them are stored
  • Ability for users to change the state of their own patches using the web UI
  • An XML-RPC client that can read/write data from/to a patchwork instance
  • A way to filter patches of a given project based on people/state

Other things we need that are not currently implemented by Patchwork:

  • Use email addresses (as a fallback to List-ID headers) to look up the project to which a patch applies.
  • The mapping from email address to Person to Linaro team, possibly using a script which periodically scans the members of every subteam of the Linaro team and stores these links in the database. There should also be a way to edit existing entries or add new ones via the web UI.

  • Ability to link a git/bzr branch as the master branch of a project
  • A script to frequently scan the master branch of each project and update the state of patches that have been merged
  • A report containing the number of patches forwarded/accepted upstream by each Linaro team. Ideally this report will be generated dynamically so that changes to the Email->Person->Team mapping are reflected in it immediately.

  • Use Launchpad's login service to authenticate
  • For patches where none of the recipient addresses match that of a Patchwork project we'll assign the patch to the "Other" project and provide a UI for mass-moving patches from one project to another.
  • Make it easy to register a new project using the address of one of the recipients of a patch. (Nice to have)
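A minimal sketch of the project lookup described in the list above: try the List-Id header first, fall back to the To/Cc addresses, and otherwise assign the patch to "Other". The two lookup tables and their entries are hypothetical sample data, not Patchwork's schema. Only the standard-library `email` package is used.

```python
import email
from email.utils import getaddresses

# Hypothetical project tables (list-id -> project, list address -> project).
PROJECTS_BY_LIST_ID = {"linux-arm-kernel.lists.infradead.org": "linux-arm-kernel"}
PROJECTS_BY_ADDRESS = {"gcc-patches@gcc.gnu.org": "gcc"}


def list_id_of(msg):
    """Extract the bare identifier from e.g. 'Description <list.id.example>'."""
    value = msg.get("List-Id", "")
    start, end = value.find("<"), value.find(">")
    if start != -1 and end > start:
        value = value[start + 1:end]
    return value.strip().lower()


def find_project(msg):
    """Return the project a patch email applies to, or "Other" if none match."""
    project = PROJECTS_BY_LIST_ID.get(list_id_of(msg))
    if project:
        return project
    # Fallback: look at every recipient address in To: and Cc:.
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    for _name, address in getaddresses(headers):
        project = PROJECTS_BY_ADDRESS.get(address.lower())
        if project:
            return project
    return "Other"
```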

Detecting when things are merged upstream

In the case of regular patches, Patchwork already has that facility: a script (tools/patchwork-update-commits) scans all the commits in a given git repository (we may need a similar one for bzr?) and updates the state of the matching patches in a Patchwork instance (using pwclient). Because the script works by comparing normalised patch hashes, it may not find a match when the committed change is not identical to the patch stored in Patchwork, but developers can always update the state manually in these cases.
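The idea behind matching by normalised hash can be illustrated with the simplified sketch below. This is not Patchwork's actual hashing code. It ignores `index` lines and hunk line offsets, and it strips whitespace within each diff line, so that the same change applied at a different offset still produces the same hash. Any change to the content itself produces a different hash, which is why modified commits need a manual state update.

```python
import hashlib
import re

# Matches unified-diff hunk headers such as "@@ -10,3 +10,4 @@ func()".
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def normalised_hash(diff_text):
    """Hash a unified diff, ignoring line offsets and whitespace changes."""
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("index "):
            continue  # blob IDs depend on the tree the patch was made against
        if HUNK_RE.match(line):
            line = "@@"  # drop line numbers and trailing function context
        elif line[:1] in (" ", "+", "-"):
            # Keep the diff marker but ignore whitespace within the content.
            line = line[0] + "".join(line[1:].split())
        h.update(line.encode("utf-8") + b"\n")
    return h.hexdigest()
```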

In the case of git we could also be dealing with requests to pull from long-lived branches (i.e. a moving target), which would make it tricky to identify when the changes are merged upstream. However, most pull requests seem to include a diff, so if we could make Patchwork store the diff as well as the reference to the git repo to pull from (http://lists.ozlabs.org/pipermail/patchwork/2011-February/000387.html), we could treat them as if they were regular patches.

With bzr everything is much simpler, as long as we can assume that developers use Launchpad with short-lived branches: we can then just query Launchpad to find out whether a given branch was merged. XXX: Do we need to worry about bzr now, or would it be better to concentrate on git and save bzr support for later?

Generating the metrics

In order to generate metrics we'll need an additional db column to track when a patch is committed upstream.

Often a given change results in multiple patches sent to a mailing list; in these cases they will be counted as a single patch (as long as the patch author uses the Patchwork web UI to bundle them together).
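A sketch of how the extra commit-date column and bundles could feed the monthly "accepted" counts. `date_committed` stands in for the proposed new column, and the `Patch` record and `bundle` field are hypothetical simplifications of Patchwork's models. The sketch assumes that each bundle counts once and is dated by the latest commit date among its patches. This is one possible policy. Another would be to count a bundle only once every patch in it has been committed.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patch:
    patch_id: int
    team: str
    date_committed: Optional[date]  # proposed column; None = not (yet) committed
    bundle: Optional[str] = None    # bundle name if grouped with other patches


def accepted_per_month(patches):
    """Count accepted changes per (team, 'YYYY-MM' of the commit date).

    All committed patches in one bundle count as a single change, dated by
    the latest commit date seen in that bundle.
    """
    changes = {}
    for p in patches:
        if p.date_committed is None:
            continue
        key = ("bundle", p.bundle) if p.bundle else ("patch", p.patch_id)
        previous = changes.get(key)
        if previous is None or p.date_committed > previous[1]:
            changes[key] = (p.team, p.date_committed)
    return Counter((team, d.strftime("%Y-%m")) for team, d in changes.values())
```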

Notes

  • We currently have the Linaro patch tracker in LP subscribed to revision notifications on several different branches that Linaro works on. This means patches@l.o receives an email for every commit on those branches (including commits from non-Linaro engineers), and these messages come from noreply@lp.net with no header indicating who the author/committer was. We can parse the message body to get that, but that is obviously not very reliable. We need to decide whether to try to abuse Patchwork to parse these messages, or to write a separate collector script that scans the commits/merge proposals on Launchpad branches and adds them to the same database used by Patchwork.

  • We could show the LWN statistics for the kernel to see where Linaro fits in, or perhaps provide LWN with our email address list so that they can include us directly.
  • There may be a concern about showing the number of patches for each engineer, so just break it down to the team level for now.

Stakeholder review I notes

  • Want a count of patches accepted per-month (rather than tying that to when the patches were submitted)
  • Don't do a per-cycle report, if someone wants that info they can easily calculate it
  • A couple of useful graphs are wanted, but it wasn't specified exactly what they should show
  • Summary info/totals information at the top is important. This should just be a couple of summary pieces of information to give a flavour of the data.
  • Scott should be asked about whether any extra aggregate/comparative information should be displayed for the Landing Teams.
  • Need a solution to the problem of people leaving/moving teams, but implementation simplicity trumps perfect tracking.
    • We should be able to use Launchpad's membership date-joined to identify the team to which a given patch should be assigned, based on when it was submitted and which teams the submitter belonged to at that time. We would need to think carefully about detecting cases where members leave a team and are re-added shortly afterwards.
  • The front page should have the metrics and links to patch lists, perhaps with a link to the user's outstanding patches
  • Exclude non-linaro.org addresses. A private/well-hidden report should be available of the non-linaro.org addresses that recently submitted patches.
  • It's ok to parse and handle the From: pseudo-headers.
  • Totals on the metrics tables are important, on both axes (total per-month and total ever/in display period per-project/per-team).
  • An About page/explanation page/FAQ should be prominent
  • The project management team should be excluded from the report if possible.
  • The Android team should be included.
  • Information on time-to-patch-acceptance averages should be available.
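The membership-date idea from the notes above could work roughly as sketched below. A patch is attributed to the team its submitter belonged to on the submission date. The `left` date is an assumption: the notes only mention Launchpad's date-joined, so leaving dates would have to be recorded separately. When several memberships overlap, the most recently joined one wins, which is also a hypothetical tie-break rule. A person who leaves and rejoins has two separate intervals.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Membership:
    person: str
    team: str
    joined: date
    left: Optional[date] = None  # None while still a member (assumed field)


def team_at(memberships, person, when):
    """Return the team `person` belonged to on date `when`, or None."""
    candidates = [
        m for m in memberships
        if m.person == person
        and m.joined <= when
        and (m.left is None or when < m.left)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.joined).team
```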

Original Notes

Requirements

  • Track upstream commits - identify patches that have gone upstream and allow them to be flagged for cherry-picking or ignoring
  • Track local commits - track local patches to identify whether they are:
    • local only
    • need-upstream
    • already-upstream
  • Track progress of upstreaming patches
  • Capture versions that a patch was originally generated from, and was upstreamed against
  • Provide metrics:
    • How many patches are currently being carried by a working group
    • How long has each patch been carried
    • How many patches have/haven't been submitted upstream

Design

Once complete, the tool should be able to handle projects that track patches sent to a mailing list, or projects that track changeset differences between two branches. So far, no use cases have been presented that would require a combination of these two things for a single project.

The ability to link a patch, or changeset to a bug report should be optional, and would be useful for both scenarios described above.

Tracking kernel patches and pull requests to a mailing list

These features already exist in Patchwork today. A Patchwork instance should be created to track these for each kernel tree.

Few changes should be needed to cover this: just establishing an instance for each kernel and setting up and configuring Patchwork for the mailing list it needs to watch. However, to facilitate metrics, an additional datestamp should be tracked for when a patch is actually accepted, so that this information is easier to get at for reporting.

Tracking differences between upstream/downstream branches

Michael Hope has a tool at http://ex.seabright.co.nz/helpers/patchtrack that currently provides this functionality. In the short term, that is working for them. The longer-term solution is to modify Patchwork to incorporate the features seen there. Specifically:

  • Monitoring bzr branches for changes
  • Link to a bug report
  • Track additional states:
    • Patch not intended for upstream
    • Patch not yet submitted upstream
    • Patch submitted upstream
    • Patch accepted upstream (and track revision # where it was taken)

For the toolchain working group, bzr will be required, but kernel teams may wish to use this feature with git branches. Support for both bzr and git should be implemented.
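The additional states listed above could be modelled roughly as follows, with the optional bug link and the upstream revision recorded on acceptance. The class and field names are hypothetical, and the spec does not define allowed transitions between states. The only rule this sketch enforces is that an upstream revision is recorded if and only if the change is marked as accepted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UpstreamState(Enum):
    NOT_FOR_UPSTREAM = "not intended for upstream"
    NOT_SUBMITTED = "not yet submitted upstream"
    SUBMITTED = "submitted upstream"
    ACCEPTED = "accepted upstream"


@dataclass
class TrackedChange:
    revision: str                            # downstream revision carrying the change
    state: UpstreamState = UpstreamState.NOT_SUBMITTED
    bug_url: Optional[str] = None            # optional link to a bug report
    upstream_revision: Optional[str] = None  # revision where upstream took it

    def __post_init__(self):
        accepted = self.state is UpstreamState.ACCEPTED
        if accepted != (self.upstream_revision is not None):
            raise ValueError("upstream_revision must be set exactly when accepted")

    def mark_accepted(self, upstream_revision):
        """Record upstream acceptance together with the upstream revision."""
        self.state = UpstreamState.ACCEPTED
        self.upstream_revision = upstream_revision
```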

Tracking a small number of patches against a large number of packages

This is a use case that is fundamentally different from the other use cases described here. Discussions at UDS and with stakeholders have led to the idea of not trying to implement it with the same tools as the other use cases. Instead, to cover this scenario, we will try using a Launchpad project. For instance, if the overall effort is to modify packages for cross-compilation, we could create a project in Launchpad for it.

  1. When a package is discovered that does not cross compile correctly, a bug is opened against this cross-compile-mods project
  2. Bugs against this project provide a work-items queue for the overall effort
  3. Bug status against the cross-compile-mods task can be used to provide feedback on the status of providing a fix for that package
  4. Once a good, working patch is available, a new task is opened against upstream to track the status of getting the change pushed upstream

Implementation

To get landing teams up and running quickly

  1. Set up an instance of Patchwork for landing teams to use

Tracking differences between upstream/downstream branches

  1. Extend model to allow for new project type, and to allow linking to bugs
  2. Create scripts to pull target and comparison branches
  3. Create scripts to find differences in branches
  4. Create scripts to update branches periodically, and regenerate comparison
  5. Allow changesets to be targeted to an upstream release (for backports)
  6. Create scripts to poll bug linked to changesets for automatic updates
  7. Extend UI for display of branch comparison projects
  8. Collect and display metrics for how many patches are in the target branch that do not exist in the upstream branch, and how many patches have already gone upstream
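For git branches, one possible way to implement steps 3 and 8 is to use `git cherry`. This is an option the spec does not prescribe, and bzr would need a separate approach. `git cherry -v <upstream> <head>` lists each commit in `<head>` that is not in `<upstream>`. Commits prefixed with `+` exist only downstream. Commits prefixed with `-` have an equivalent change already upstream. The function names below are hypothetical.

```python
import subprocess


def parse_git_cherry(output):
    """Split `git cherry -v` output into (only_downstream, already_upstream).

    Each line looks like "+ <sha1> <subject>" or "- <sha1> <subject>".
    """
    only_downstream, already_upstream = [], []
    for line in output.splitlines():
        if not line.strip():
            continue
        sign, sha, *rest = line.split(" ", 2)
        entry = (sha, rest[0] if rest else "")
        (only_downstream if sign == "+" else already_upstream).append(entry)
    return only_downstream, already_upstream


def compare_branches(repo, upstream, downstream):
    """Run `git cherry -v` in `repo` and parse its output."""
    result = subprocess.run(
        ["git", "-C", repo, "cherry", "-v", upstream, downstream],
        check=True, capture_output=True, text=True,
    )
    return parse_git_cherry(result.stdout)
```

The lengths of the two returned lists give the "carried downstream" and "already upstream" counts for step 8.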

Tracking a small number of patches against a large number of packages

  1. Create project in launchpad
  2. Ensure bugs exist for each outstanding cross-build problem we know of
  3. Open task for upstream project/package to track status of upstreaming

Stakeholders

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.



internal/archive/Platform/Infrastructure/Specs/PatchTracking (last modified 2013-08-23 02:05:20)