EC2 Best Practices and Known Issues

Jenkins-based Build Services

Jenkins Known Issues with EC2

On pages such as https://android-build.linaro.org/jenkins/computer/ , slave types are identified in the <select> element by the AMI they use. As a result, if two different slave types use the same AMI, the page cannot distinguish between them.

Jenkins master 3-volume setup

One issue we experienced with the Jenkins master is that the build archive may fill the disk, leaving Jenkins no space to store its current state, such as the record of build slaves it has started. This sends it into a vicious circle: it doesn't see the existing slaves, so it starts a new one, but can't record that fact, so it goes on to start another. The result is a "zombie slave storm", limited only by instance caps, which can easily lead to 20-50 instances being spawned in vain before it is caught.

The solution to this problem is to keep the Jenkins build archive (the jobs/ subdirectory of JENKINS_HOME) on a separate partition, so that even if it overflows, the JENKINS_HOME partition is not affected and Jenkins doesn't lose control of its build slaves. So, the following mount scheme was established:

  • / - System (OS) volume, 8GB. We keep Jenkins separate from it to ease OS upgrades (OS partitions change, Jenkins partitions stay).
  • /mnt2 - Jenkins home volume, 2GB (Jenkins files actually take <500MB). /var/lib/jenkins is a symlink to /mnt2/jenkins (the volume is not mounted there directly, for historical and maintenance reasons).
  • /mnt2/jenkins/jobs - Jenkins jobs volume (100s of GB).
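
As a rough way to confirm that the three locations above really are separate mounts, something like the following could be used. This is only a sketch: the function names mount_of and on_separate_volumes are hypothetical, and it assumes a POSIX df and mount points without spaces in their names.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original setup.

# Print the mount point that holds the given path
# (sixth column of the POSIX `df -P` output; assumes no spaces in it).
mount_of() {
    df -P "$1" | awk 'NR==2 {print $6}'
}

# Succeed only if every given path lives under a different mount point.
on_separate_volumes() {
    seen=""
    for p in "$@"; do
        mp=$(mount_of "$p")
        # An empty result means df could not resolve the path.
        [ -n "$mp" ] || { echo "cannot resolve $p" >&2; return 1; }
        case " $seen " in
            *" $mp "*)
                echo "$p shares mount point $mp with another path" >&2
                return 1
                ;;
        esac
        seen="$seen $mp"
    done
}
```

With the scheme above, on_separate_volumes / /mnt2 /mnt2/jenkins/jobs would be expected to succeed, and to fail if, say, the jobs volume was not mounted after a reboot.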

One issue with this setup is that, besides the jobs/builds archive, we usually also have auxiliary data on the master which may grow without bound, e.g. mirrors of repositories/tarballs. Thus, it can't be kept on the Jenkins home volume. To avoid a proliferation of volumes, it would ideally go on the Jenkins jobs volume, except that this volume has the individual job directories at its top level (symlinking the jobs/ dir didn't work, so we can't organize it like the Jenkins home volume).

There's no elegant solution to this problem, but the following practical approach was used: an _extra directory was created at the top level of the jobs volume, any auxiliary directories are put under it, and they are then symlinked from the appropriate places. For example, on android-build we have:

$ ls -l /mnt2
total 20
drwxr-x--x 16 jenkins adm   4096 2012-05-08 11:03 jenkins
drwx------  2 root    root 16384 2012-05-04 19:42 lost+found
lrwxrwxrwx  1 root    root    24 2012-05-04 08:42 seed -> jenkins/jobs/_extra/seed
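
A layout like the one in the listing could be created along these lines. This is a minimal sketch rather than the procedure actually used: link_extra is a hypothetical helper, and it creates an absolute symlink, whereas the listing above shows a relative one.

```shell
#!/bin/sh
# Hypothetical helper: create <jobs_dir>/_extra/<name> and symlink it
# from <link_path>.
link_extra() {
    jobs_dir=$1    # e.g. /mnt2/jenkins/jobs
    name=$2        # e.g. seed
    link_path=$3   # e.g. /mnt2/seed

    mkdir -p "$jobs_dir/_extra/$name" || return 1

    # Refuse to replace anything already present at the link location
    # (-L also catches dangling symlinks, which -e does not).
    if [ -e "$link_path" ] || [ -L "$link_path" ]; then
        echo "link_extra: $link_path already exists" >&2
        return 1
    fi

    ln -s "$jobs_dir/_extra/$name" "$link_path"
}
```

For the seed mirror shown above, the call would be link_extra /mnt2/jenkins/jobs seed /mnt2/seed, run as a user allowed to write to both locations.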

Platform/Systems/EC2BestPractices (last modified 2014-06-24 17:47:44)