Richard North’s Blog

Faster builds with highly par­al­lel GitHub Actions

It’s a near-uni­ver­sal truth of con­tin­u­ous in­te­gra­tion builds: no mat­ter how fast they are, they’re just never quite fast enough.

On the Testcontainers pro­ject, we feel this pain par­tic­u­larly acutely. Testcontainers is an in­te­gra­tion test­ing tool, so to test it ac­cu­rately we have to have a wide set of in­te­gra­tion tests. Each in­te­gra­tion test can in­volve pulling Docker im­ages, start­ing up servers and wait­ing for them to be ready. To make things even more dif­fi­cult, we sup­port a fairly eye-wa­ter­ing range of in­te­gra­tions. For ex­am­ple, just for our data­base sup­port we have to test against 14 dif­fer­ent data­bases, rang­ing from lean to … pretty heavy­weight.

This all adds up to a very long build!

As a hum­ble open source pro­ject with­out com­mer­cial back­ing, we’re also keen to keep costs down. This means we rely on the gen­eros­ity of cloud CI providers’ free plans, which are of­ten re­source con­strained.

Build op­ti­mi­sa­tions in the past #

In to­tal, we cur­rently have 51 Gradle sub­pro­jects within our build. A cou­ple of core com­po­nents are de­pended upon by a range of mod­ules, each of which is a Testcontainers im­ple­men­ta­tion for a par­tic­u­lar prod­uct.

Long ago, we managed to eliminate some of the build time using the myniva/gradle-s3-build-cache remote Gradle cache plugin. Every build on our master branch populates a globally readable Gradle cache, so that CI jobs and developers working locally do not need to rebuild, or retest, unchanged modules. This improved our best-case build performance: if most of our 5 CI jobs (described next) did not involve changes, they would be fairly short. But a change to the core module (that all others depend upon) would cause all 5 jobs to run a full rebuild of their modules.

At an­other point in the past we had man­u­ally de­fined 5 sep­a­rate jobs on our pri­mary CircleCI build: core, examples, selenium mod­ule, jdbc mod­ules, and a no-jdbc-no-selenium mod­ule — ba­si­cally every­thing else.

A fairly typical build on CircleCI. It's not as slow as it would be without parallelization, but there's clear room for improvement

These jobs might have been reasonably balanced back then, but over time, as we've added user-contributed modules, some build jobs have grown. For example, the no-jdbc-no-selenium job took a whopping 27 minutes if a full build was required.

We ob­vi­ously needed to re-bal­ance our par­al­lel build jobs — that is, do a bet­ter job of bin-pack­ing our Gradle sub­pro­jects into a set of par­al­lel jobs. We could do that man­u­ally, right? Well, it might work at first, but it seems in­evitable that we’d end up in a sim­i­lar sit­u­a­tion one day as we add new mod­ules or test du­ra­tions evolve. We quickly re­alised that we could do bet­ter.

Variation leads to waste: even running all five jobs in parallel, the results from the fastest build jobs are essentially meaningless before the longest job completes. While waiting for the longest job to complete we're essentially wasting time

Leave the bin-pack­ing to a queue #

What if we es­sen­tially adopted queue-based load lev­el­ing: split our build into far more than five jobs, and let the CI ex­ecu­tors com­pete to pick up new jobs as soon as there’s ca­pac­ity? Our bin-pack­ing prob­lem would be solved au­to­mat­i­cally, with­out up-front de­sign.
If we were to cre­ate a dis­tinct build job for each Gradle sub­pro­ject, we could achieve the tight­est bin-pack­ing:

Fine-grained jobs bin-pack far better, even if execution times vary significantly
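To make the effect concrete, here is a minimal simulation sketch of queue-based scheduling. The module durations and the grouping into five coarse jobs are hypothetical, chosen only to illustrate the idea; they are not measurements from our build.

```python
import heapq

def makespan(durations, workers):
    """Simulate workers pulling jobs from a queue in order.

    Each job goes to whichever worker becomes free first. Returns the
    wall-clock time at which the last job finishes.
    """
    free_at = [0] * workers
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)

# Hypothetical per-subproject durations, in minutes (not real data).
modules = [10, 8, 6, 5, 5, 4, 4, 3, 3, 3, 2, 2, 2, 2, 1]

# A hand-made split into five coarse jobs, with one large
# "everything else" bucket at the end.
coarse_groups = [modules[0:1], modules[1:2], modules[2:3],
                 modules[3:6], modules[6:]]
coarse_jobs = [sum(group) for group in coarse_groups]

coarse_time = makespan(coarse_jobs, workers=5)  # 22 minutes
fine_time = makespan(modules, workers=5)        # 13 minutes
print(coarse_time, fine_time)
```

With the same five workers and the same 60 minutes of total work, the fine-grained queue finishes in 13 minutes, close to the ideal of 12, while the coarse split is held up by its 22-minute bucket.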

Iterative im­ple­men­ta­tion #

We chose to try this pat­tern as part of our work to mi­grate our main CI jobs to GitHub Actions.

An ini­tial (hacky!) bash script proved the con­cept. The script gen­er­ated a mas­sive work­flow YAML file with a job that would run the check task in each Gradle sub­pro­ject. In the­ory, this bash script could be run pe­ri­od­i­cally, when­ever a new sub­pro­ject is added to our build. But we could do bet­ter than that!
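The idea behind that first script can be sketched as follows. This is an illustration in Python rather than the original bash, and the helper names, the trigger and the job layout are all assumptions made for the example.

```python
def job_id(task_path):
    """Turn a Gradle task path into a valid workflow job id.

    Example: ":docs:examples:check" becomes "docs_examples_check".
    """
    return task_path.strip(":").replace(":", "_")

def render_workflow(task_paths):
    """Render a static workflow with one job per Gradle check task."""
    lines = ["name: CI", "on: [pull_request]", "jobs:"]
    for path in task_paths:
        lines += [
            f"  {job_id(path)}:",
            "    runs-on: ubuntu-18.04",
            "    steps:",
            "      - uses: actions/checkout@v2",
            f"      - run: ./gradlew --no-daemon {path}",
        ]
    return "\n".join(lines) + "\n"

# Hypothetical task list; a real script would ask Gradle for it.
print(render_workflow([":mysql:check", ":selenium:check"]))
```

The weakness is visible straight away: the generated file is only as current as the last time somebody ran the generator.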

We quickly iterated on this, exploiting an extremely powerful feature of GitHub Actions workflows: dynamic build matrices. Simply put, this allows one job in a workflow to programmatically generate a matrix of parameters to be run in a subsequent job. The GitHub Actions documentation gives an example.

Our im­ple­men­ta­tion looks a lit­tle like this (summarized):


name: CI

on:
  pull_request: {}
  push: { branches: [ master ] }

jobs:
  find_gradle_jobs:
    runs-on: ubuntu-18.04
    outputs:
      # Declare our output variable
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      # Checkout step assumed here; it is elided in this summary
      - uses: actions/checkout@v2
      - id: set-matrix
        # The below outputs a JSON array of check tasks for each subproject
        # and uses GitHub Actions magic (::set-output) to set an output
        # variable
        run: |
          TASKS=$(./gradlew --no-daemon --parallel -q testMatrix)
          echo $TASKS
          echo "::set-output name=matrix::{\"gradle_args\":$TASKS}"

  check:
    # We need the other job's output
    needs: find_gradle_jobs
    strategy:
      fail-fast: false
      # Read the variable, parsing as JSON, so that `matrix` becomes a
      # list of check tasks
      matrix: ${{ fromJson(needs.find_gradle_jobs.outputs.matrix) }}
    runs-on: ubuntu-18.04
    steps:
      # Checkout step assumed here; it is elided in this summary
      - uses: actions/checkout@v2
      - name: Build and test with Gradle (${{matrix.gradle_args}})
        # Matrix execution will cause the below to be run many times,
        # one for each check task to be run
        run: |
          ./gradlew --no-daemon --continue \
            --scan --info ${{matrix.gradle_args}}
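The hand-off between the two jobs is just a small JSON document. As a rough sketch of what happens to it (the function names are illustrative, and the expansion step only mimics what GitHub Actions does for a matrix with a single key):

```python
import json

def matrix_payload(check_tasks):
    """Build the JSON object that the first job publishes as `matrix`."""
    return json.dumps({"gradle_args": check_tasks}, separators=(",", ":"))

def expand(payload):
    """Mimic matrix expansion: one job per value of `gradle_args`."""
    matrix = json.loads(payload)
    return [f"check ({arg})" for arg in matrix["gradle_args"]]

payload = matrix_payload([":mysql:check", ":selenium:check"])
print(payload)          # {"gradle_args":[":mysql:check",":selenium:check"]}
print(expand(payload))  # ['check (:mysql:check)', 'check (:selenium:check)']
```

Each element of the array becomes its own job, named after the matrix value it received.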

The testMatrix task is a cus­tom task which emits the list of sub­pro­jects’ check tasks in JSON for­mat, and looks like:

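gradle/ci-support.gradle (a subset)
import groovy.json.JsonOutput

task testMatrix {
    project.afterEvaluate {
        // Closure bodies are truncated in this excerpt; the two lines marked
        // "assumed" are a reconstruction from the description above
        def checkTasks = subprojects.collect {
            it.tasks.findByName("check") // assumed: null if no check task
        }.findAll { it != null }

        doLast {
            println(JsonOutput.toJson(checkTasks.collect { it.path })) // assumed
        }
    }
}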

Part of the list of build jobs gen­er­ated on-the-fly


As a re­sult of this small amount of script work, we have an au­to­mat­i­cally gen­er­ated list of jobs for GitHub Actions to ex­e­cute.
This list will never go out of date, be­cause it is based on Gradle’s own view of the sub­pro­jects.

It gets bet­ter #

Testcontainers co-main­tainer Sergei Egorov is a Gradle ma­gi­cian, and de­liv­ered the ic­ing on the cake…

With our new dy­namic ma­trix in place, we’d have a job for every Gradle sub­pro­ject. Many of these might ex­e­cute quickly if they found that a cached re­sult al­ready ex­isted. It’s per­haps a lit­tle waste­ful to have CI jobs that do noth­ing, though.

Sergei quickly realised that Gradle's build cache mechanism already has the ability to detect which subprojects have been modified or need to be tested. This could be used by our testMatrix task, to avoid generating a CI job altogether for unchanged modules. After some amendments, the final testMatrix task works extremely well in preventing unnecessary CI jobs. For example, PRs that only change documentation or ‘leaf-node’ modules now complete far faster.
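The principle can be sketched in a few lines. This is a simplified model, not Gradle's actual implementation: the module names, the dependency map and the idea of hashing a single string per module are all assumptions made for illustration.

```python
import hashlib

def cache_key(module, sources, deps):
    """Hash a module's sources together with the keys of its dependencies."""
    digest = hashlib.sha256(sources[module].encode("utf-8"))
    for dep in sorted(deps.get(module, ())):
        digest.update(cache_key(dep, sources, deps).encode("utf-8"))
    return digest.hexdigest()

def tasks_to_run(sources, deps, cached_keys):
    """Return check tasks only for modules whose key is not already cached."""
    return [
        f":{module}:check"
        for module in sorted(sources)
        if cache_key(module, sources, deps) not in cached_keys
    ]

# Hypothetical project: two leaf modules that both depend on core.
deps = {"mysql": ["core"], "localstack": ["core"]}
baseline = {"core": "v1", "mysql": "v1", "localstack": "v1"}
cached = {cache_key(m, baseline, deps) for m in baseline}

leaf_change = dict(baseline, localstack="v2")
core_change = dict(baseline, core="v2")

print(tasks_to_run(leaf_change, deps, cached))  # [':localstack:check']
print(tasks_to_run(core_change, deps, cached))
```

A change to a leaf module yields a single job, while a change to core invalidates every key that includes it, so all three jobs are generated.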

Summary: What have we done? #

The re­sults so far #

In short, we’re see­ing mas­sive im­prove­ments in build times for PRs 🎉

Here’s one ex­am­ple, a de­pen­dency bump in the Localstack mod­ule. Like many PRs, this af­fects a sin­gle mod­ule:

Localstack module CI timings

We can see:

As-is, this PR had com­plete feed­back in 5 min­utes — a dras­tic re­duc­tion from the build times we were see­ing pre­vi­ously!

We per­ceived that build times had im­proved, but is this truly the case? Let’s analyse some re­cent builds.

Quantitative analy­sis #

Is there an im­prove­ment upon our orig­i­nal build jobs? #

The plot be­low com­pares build time du­ra­tion be­tween our pre­vi­ous and new CI jobs. As hoped for, and match­ing our sub­jec­tive ex­pe­ri­ences, there is a dra­matic im­prove­ment:

Distribution of build durations, in seconds

                  min    25%    50%    75%      max
CircleCI           35    615    824   1831    75814
GitHub Actions     79    365    500   1539     2711
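To put a number on the improvement, the quartiles can be compared directly. A minimal sketch: the values are copied from the table above, and the helper name is purely illustrative.

```python
# Quartiles of build duration in seconds, copied from the table above.
circleci = {"25%": 615, "50%": 824, "75%": 1831}
github_actions = {"25%": 365, "50%": 500, "75%": 1539}

def percent_reduction(before, after):
    """Relative reduction in duration, as a percentage of the old value."""
    return round(100 * (before - after) / before, 1)

reduction = {
    q: percent_reduction(circleci[q], github_actions[q]) for q in circleci
}
print(reduction)  # {'25%': 40.7, '50%': 39.3, '75%': 15.9}
```

The median build is roughly 39% shorter, but the upper quartile improves by only about 16%, which suggests the slowest builds benefit least.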

Is there a dif­fer­ence in run du­ra­tion be­tween suc­cess­ful and failed builds? #

CI builds have two roles to play: pro­vid­ing as­sur­ance that a PR/commit is re­li­able, and pro­vid­ing a sig­nal when it is not. Clearly we’d like both of these sce­nar­ios to be quick.

Intuitively, PRs with­out many changes will tend to run the fewest tests and suc­ceed more of­ten. This ap­pears to be the case in our data. Builds that go on to fail tend to take longer to do so.

We’re happy that many suc­cess­ful builds com­plete quickly, but less happy that there’s slower feed­back for build fail­ures.

Distribution of build du­ra­tions, in sec­onds

What are the slow­est mod­ules to build? #

Recall that our no-jdbc-no-selenium ‘miscellaneous modules’ build used to be the longest running, at up to 27 minutes. Having split the build into more parallel jobs, we've removed this bottleneck on our build performance, but has the bottleneck shifted elsewhere?

Analysing our new build du­ra­tions on a per-mod­ule ba­sis we can see that it has in­deed:

Build duration of the ten slowest modules, in seconds

Job name                                       Median duration   Count
check (:testcontainers:check)                           1123.0      62
check (:mysql:check)                                     513.0      54
check (:selenium:check)                                  484.0      53
check (:db2:check)                                       436.5      54
check (:ongdb:check)                                     404.0       1
check (:mariadb:check)                                   343.5      54
check (:presto:check)                                    331.0      55
check (:cassandra:check)                                 322.0      53
additional_checks                                        311.0       1
check (:docs:examples:junit4:generic:check)              288.0      50

We can see that the :testcontainers:check build job (our core module) now takes the longest, with a median duration of around 19 minutes. PRs that modify the core module will therefore still take at least this long, even though the other modules run quickly in parallel.

Not every PR touches core, but one that does is likely to take some time.
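Put differently, with enough parallel runners a PR's feedback time is bounded below by its slowest job. A small sketch using a few of the median durations from the table above (the helper name is illustrative):

```python
# Median job durations in seconds, taken from the table above.
median_seconds = {
    ":testcontainers:check": 1123.0,
    ":mysql:check": 513.0,
    ":selenium:check": 484.0,
    ":db2:check": 436.5,
}

def feedback_lower_bound(jobs):
    """With unlimited parallelism, the slowest job sets the feedback time."""
    return max(median_seconds[job] for job in jobs)

# A change to core triggers every job; a leaf change triggers only one.
core_pr = feedback_lower_bound(median_seconds)
leaf_pr = feedback_lower_bound([":mysql:check"])

print(round(core_pr / 60, 1))  # 18.7 (minutes)
print(leaf_pr)                 # 513.0 (seconds)
```

No amount of extra parallelism across modules brings a core change below that figure; only making the core job itself faster, or splitting it, can.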

We believe this accounts for the ‘bulge’ of build durations between 1000 and 1500 seconds seen in our distribution plots above, which we'd like to try to remove.

So, with this in mind, our next steps will fo­cus on the core mod­ule’s test per­for­mance: mak­ing our tests more ef­fi­cient, or split­ting the mod­ule’s tests in a way that helps us run them in par­al­lel.

Conclusion and next steps #

Go forth and par­al­lelise!
