Splitting the archiveuploader out of Launchpad

We want to break the archive uploader out of LP so that it can be used by Vostok as well.

The uploader uses way too much logic from LP model classes, which makes it tricky to split it out of Launchpad, be it as a standalone service or as a library.

As a standalone service

Even though we could break the logic needed by the uploader out of model classes, moving it into a separate service means we'd need to duplicate the model classes themselves in the uploader service. And making sure the models are kept in sync with the parent service's implementation (either Launchpad or Vostok) would be very painful.

We could avoid that by splitting out just the subset of the uploader that doesn't need DB access and make a standalone service out of that, but then we'd probably end up duplicating the rest later, when the time comes for us to extend Vostok's uploader.

Also, some of the code used in the uploader is used in the web UI as well, which makes it even more difficult to break it into a separate service. The bits shared by the web UI and the uploader could be split into a library, though.

As a library

We don't want to duplicate the model-class logic used by the uploader in Vostok, so we need to find a way of moving it into components that can be used by both Vostok and Launchpad.

We could move that logic into ORM-agnostic adapters for model classes, so that they work with both Vostok's (possibly Django-ORM) models and Launchpad's Storm models. For this we'd need our model classes to provide high-level APIs for interacting with the DB, which would be used by the adapters. In theory that could even allow us to use a different DB schema than the one used by Soyuz, but I don't think that's a good idea.
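To make the adapter idea concrete, here is a minimal sketch in plain Python. Every name in it (UploadableArchive, ArchiveUploadAdapter, uploader_names(), rejection_reason() and the two stand-in model classes) is hypothetical and invented for illustration; only displayname comes from the list of Archive attributes further down. The stand-ins do not call Storm or Django, they only mimic the shape of data each ORM might hand back.

```python
from typing import Optional, Protocol, Set


class UploadableArchive(Protocol):
    """High-level API the adapter expects from a model class.

    Hypothetical: these names are made up for this sketch.
    """

    displayname: str

    def uploader_names(self) -> Set[str]:
        ...


class ArchiveUploadAdapter:
    """Uploader logic that knows nothing about the ORM underneath."""

    def __init__(self, archive: UploadableArchive) -> None:
        self.archive = archive

    def rejection_reason(self, person_name: str) -> Optional[str]:
        """Return None if the upload is allowed, else a message."""
        if person_name in self.archive.uploader_names():
            return None
        return "%s may not upload to %s" % (
            person_name, self.archive.displayname)


class StormStyleArchive:
    """Stand-in for a Storm-backed model (no real Storm calls)."""

    def __init__(self, displayname, rows):
        self.displayname = displayname
        self._rows = rows  # pretend result set of (person_name,) rows

    def uploader_names(self):
        return {name for (name,) in self._rows}


class DjangoStyleArchive:
    """Stand-in for a Django-backed model (no real Django calls)."""

    def __init__(self, displayname, uploaders):
        self.displayname = displayname
        self._uploaders = uploaders  # pretend queryset of dicts

    def uploader_names(self):
        return {row["name"] for row in self._uploaders}
```

The adapter only ever talks to uploader_names() and displayname, so the same logic runs unchanged whichever model class it wraps. The cost is that each model class has to grow and maintain that high-level API.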

Unanswered questions

Answering the following questions should help in deciding which bits of the archive uploader we need to split into a separate service/library.

  • What are the parts of the uploader that we need in Vostok now?
  • What are the parts that we expect to need in Vostok in the future?
  • What are the parts that we're sure we won't ever need in Vostok?

Launchpad bits used by the archive uploader

These are the model bits used by the uploader:

    - BinaryPackagePublishingHistory.binarypackagerelease
    - SourcePackagePublishingHistory.sourcepackagerelease
    - GPGHandler.getVerifiedSignatureResilient()
    - GPGKeySet.getByFingerprint()
    - Pocket.name
    - LibraryFileAliasSet.create()
    - Component.name
    - SourceFileMixin.is_orig
    - Section.name
    - BinaryPackageBuildSet.getByBuildID()
    - BinaryPackageBuild.createBinaryPackageRelease()
    - LibraryFileContent.md5
    - LibraryFileAlias.content
    - SourcePackageRecipeBuildSource.getById()

    - DistributionSet
        .__getitem__()
        .getByName()

    - ArchiveSet
        .getByDistroPurpose()
        .get()

    - SourcePackageRecipeBuild
        .buildstate
        .upload_log

    - PersonSet
        .getByName()
        .getByEmail()
        .ensurePerson()

    - GPGKey
        .active
        .owner

    - SectionSet
        .__getitem__()
        .__iter__()

    - ComponentSet
        .__getitem__()
        .__iter__()

    - Person
        .preferredemail
        .name

    - SourcePackageNameSet
        .queryByName()
        .getOrCreateByName()

    - BinaryPackageNameSet
        .getOrCreateByName()
        .queryByName()

    - Distribution
        .getDistroSeriesAndPocket()
        .getFileByName()
        .getArchiveByComponent()
        .main_archive

    - PackageDiff
        .from_source
        .title

    - PackageUpload
        .addSource()
        .addCustom()
        .acceptFromUploader()
        .setRejected()
        .setUnapproved()
        .pocket
        .builds
        .sources
        .sourcepackagerelease

    - SourcePackageRelease
        .getBuildByArch()
        .createBuild()
        .requestDiffTo()
        .creator
        .version
        .title
        .name

    - BinaryPackageRelease
        .addFile()
        .version
        .title

    - DistroArchSeries
        .getReleasedPackages()
        .architecturetag

    - Archive
        .checkUpload()
        .purpose
        .is_copy
        .private
        .displayname

    - DistroSeries
        .createUploadedSourcePackageRelease()
        .__getitem__()
        .getPublishedReleases()
        .isSourcePackageFormatPermitted()
        .getQueueItems()
        .createQueueEntry()
        .architecturecount
        .nominatedarchindep
        .main_archive
        .architectures
        .distribution

If we decide to split it out as a library, here's a high-level plan of how we could do it.

  1. Refactor all the methods/properties listed above, moving any ORM-specific bits into helper methods.
  2. Move all the methods/properties above into adapters for the methods' old interfaces.
  3. Test these new adapters using test doubles that don't require a DB, but make the test cases configurable so that we can run them against real implementations of the interface that is adapted.
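Step 3 could be sketched roughly as follows, using only the standard unittest module. DistroSeriesFormatAdapter, format_error(), FakeDistroSeries and FormatAdapterContract are hypothetical names. isSourcePackageFormatPermitted() is taken from the DistroSeries list above, but its argument is assumed to be a plain string here, which may not match the real signature.

```python
import unittest
from typing import Optional


class DistroSeriesFormatAdapter:
    """Hypothetical adapter holding logic moved out of the model."""

    def __init__(self, series):
        self.series = series

    def format_error(self, source_format: str) -> Optional[str]:
        if self.series.isSourcePackageFormatPermitted(source_format):
            return None
        return "Format %r is not permitted" % source_format


class FakeDistroSeries:
    """Test double: answers from memory, needs no DB."""

    def __init__(self, permitted_formats):
        self._permitted = set(permitted_formats)

    def isSourcePackageFormatPermitted(self, source_format):
        return source_format in self._permitted


class FormatAdapterContract:
    """Test cases shared by every implementation of the interface.

    Subclasses choose the implementation by overriding make_series().
    """

    def make_series(self, permitted_formats):
        raise NotImplementedError

    def test_permitted_format(self):
        adapter = DistroSeriesFormatAdapter(self.make_series(["1.0"]))
        self.assertIsNone(adapter.format_error("1.0"))

    def test_unpermitted_format(self):
        adapter = DistroSeriesFormatAdapter(self.make_series(["1.0"]))
        self.assertEqual(
            "Format '3.0 (quilt)' is not permitted",
            adapter.format_error("3.0 (quilt)"))


class TestAgainstFake(FormatAdapterContract, unittest.TestCase):
    """Runs the shared cases against the DB-free double."""

    def make_series(self, permitted_formats):
        return FakeDistroSeries(permitted_formats)
```

A DB-backed run would add one more subclass whose make_series() builds a real DistroSeries (Storm in Launchpad, possibly Django in Vostok), so the same test cases check that the double and the real implementations agree.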

Other things to consider

  • need error reporting utility (currently uses webapp.ErrorReportingUtility)

  • uses a bunch of DB enums (e.g. SeriesStatus and PackagePublishingPocket)

  • uses copy_and_close() from librarian.utils, but that function has nothing librarian-specific, so it could be moved somewhere else.

  • SourcePackageRecipeUploadPolicy shouldn't live here, but it does because we rely on uploadpolicy.py being imported for the registration of policies to happen.

  • the process_upload.py script lives outside of archiveuploader/, but it just calls UploadProcessor.
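On the copy_and_close() point in the list above: the sketch below shows what a librarian-independent helper of that name might look like. It is not copied from librarian.utils. The signature, the chunk_size parameter and the behaviour of closing both files even on error are assumptions.

```python
import io
import os
import shutil
import tempfile


def copy_and_close(from_file, to_file, chunk_size=256 * 1024):
    """Copy from_file to to_file in chunks, then close both.

    Assumed behaviour; nothing here depends on the librarian.
    """
    try:
        shutil.copyfileobj(from_file, to_file, chunk_size)
    finally:
        from_file.close()
        to_file.close()


# Usage: any pair of file-like objects works.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "copy.bin")
    copy_and_close(io.BytesIO(b"package contents"), open(path, "wb"))
    with open(path, "rb") as copied_file:
        copied = copied_file.read()
```

Since the helper only needs read(), write() and close(), it could live in a generic utilities module that both the uploader library and the librarian import.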

internal/archive/Platform/Infrastructure/SplittingArchiveUploaderOutOfLP (last modified 2013-08-23 02:02:35)