Warning: wall of text ahead, sorry!
As part of my “why should you vote for me” statement in the technical board elections I wrote that I wanted to improve the Django contribution experience, and I specifically called Jenkins out as something I wanted to “improve”. My problems with our current CI setup are these:
- The CI configuration is invisible and versioned outside of Django
- Jenkins build logs are deleted after 24 hours (so annoying!)
- Our build matrix is very small for the number of python/database/os versions we support
- Previewing documentation changes is not easy
A lot of technologies have appeared and matured since our Jenkins setup was born that are significantly easier to use, better integrated with developers’ workflows and offer a generally better developer experience. Below I’m going to outline a proof of concept that I’ve worked on to unify the local development experience with our CI definitions and bring them all into Django itself, with the aim of running exactly the same test setup locally and on the CI. I believe this would greatly improve the general contribution experience, making everyone happier!
tl;dr
I’m proposing we bring django-docker-box into core and make it the “blessed” way to spin up an environment for running the Django test suite. Following on from that, we can use GitHub Actions with docker-box to greatly expand our database test coverage and unify local tests with the CI tests. We can then also integrate with a free tool like Netlify to build deploy previews of all documentation changes that can be viewed in the browser.
You can view a proof of concept of all this here, with the complete set of changes discussed below:
Along with a documentation preview for that PR:
The specifics
Integrating docker-box
Running a full Django test suite has never been super simple. It’s fairly easy to run the sqlite tests, and we bundle a simple settings file for that, but the moment you want to make any more complex changes (DB-specific features, cache backends, multi-db, etc.) you’re on your own. I initially created django-docker-box as a proof of concept for moving off Jenkins and onto Travis CI, but I found that simply being able to run the test suite on any supported Python/database version without needing to install anything was insanely useful, and so it became its own thing.
Django wasn’t quite ready to include this in core, so it currently lives in a separate repository. This is a shame as, at the end of the day, it’s just a docker-compose.yml file, and running it from a separate project directory increases friction and makes the implementation more complex. Most tooling that integrates with docker-compose (like IDEs) also assumes docker-compose.yml lives in the project root rather than driving a project that lives somewhere else.
So, I’d like to move this into core. I think it has proved its usefulness and I believe it offers a really nice initial contributor experience. It would offer a much better experience during sprints (the images are surprisingly small) and it would also be simpler to maintain: changes made to Django currently have to be synced across to django-docker-box (e.g. https://github.com/django/django-docker-box/pull/22), and because it lives in a separate tree it’s not simple to use against non-master versions of Django.
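To make this concrete, here is roughly what the local workflow could look like once the docker-compose.yml lives in the Django checkout. The service names, test-label arguments and PYTHON_VERSION variable below follow the conventions django-docker-box uses today; treat this as a sketch rather than a final interface:

```shell
# Run the full suite against SQLite (the image is pulled/built on first use).
docker-compose run sqlite

# Run only a couple of test labels against PostgreSQL.
docker-compose run postgres queries expressions

# Try a different Python version without installing anything locally.
PYTHON_VERSION=3.9 docker-compose run sqlite
```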
Integrating docker-box with GitHub Actions
If we have docker-box inside Django itself it unlocks a pretty powerful way of running tests: exactly the same tests can run locally and in the CI. No more weird Jenkins-specific failures, and the CI definitions can be changed as part of a pull request! Conceptually it’s as simple as running docker-compose run sqlite/postgres/etc as part of a CI job.
The full-matrix build that used to run as part of the django-docker-box repository has caught a few issues that we’ve missed in our Jenkins tests, often around MariaDB if I recall correctly. An advantage of versioning the CI files in the Django repository is that we can easily adjust the “quick” build matrix (discussed below) if more problems arise with specific databases.
For example, we seem to have a test failure on Postgres 11 with PostGIS 2.5: https://github.com/orf/django/pull/3/checks?check_run_id=1641111944#step:3:110.
Why GitHub Actions and not Jenkins?
We could just run docker-box commands as part of the standard Jenkins pipeline we currently have, but I believe that GitHub Actions is a superior product that offers a much richer ecosystem and better integration with GitHub itself.
One of the best features is third-party workflows. In the POC I’m using a third-party action that integrates flake8 and isort errors directly into the PR diff (https://github.com/orf/django/pull/6/files).
We can expand this to automate some nice things: we can welcome new contributors (users with no prior pull requests) and link them to the contribution guide, we can try to ensure that each PR has an associated Trac ticket in the title and description, and so on.
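As an illustration of the kind of automation this unlocks, a small workflow could fail fast when a pull request title doesn’t reference a Trac ticket. Everything below (the workflow name, the regex, the message) is a hypothetical sketch, not a finished design:

```yaml
name: pr-title-check
on:
  pull_request:
    types: [opened, edited, reopened]
jobs:
  check-title:
    runs-on: ubuntu-latest
    steps:
      - name: Check for a Trac ticket reference
        env:
          TITLE: ${{ github.event.pull_request.title }}
        run: |
          # Expect something like "Fixed #12345 -- Made things better." in the title.
          echo "$TITLE" | grep -Eq '#[0-9]+' && exit 0
          echo "Please reference the Trac ticket in the PR title, e.g. 'Fixed #12345 -- ...'."
          exit 1
```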
The actual integration
The integration is quite simple. We define a GitHub Actions workflow that looks like this:
jobs:
  sqlite:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python: [ 3.7, 3.8, 3.9 ]
    name: sqlite-${{ matrix.python }}
    steps:
      - uses: actions/checkout@v2
      - name: Run tests
        uses: ./.github/test-action/
        with:
          app: ${{ github.job }}
          python: ${{ matrix.python }}
          docker_token: ${{ secrets.DOCKER_TOKEN }}
The test-action basically runs this:
runs:
  using: "composite"
  steps:
    - run: |
        docker-compose pull -q --include-deps ${{ inputs.app }} || true
        docker-compose build --pull ${{ inputs.app }}
        docker-compose run --user=root ${{ inputs.app }} ${{ inputs.test_name }}
        [[ ! -z "${{ inputs.DOCKER_TOKEN }}" ]] && (echo "${{ inputs.DOCKER_TOKEN }}" | docker login -u "tomforbes" --password-stdin && docker-compose push ${{ inputs.app }})
      shell: bash
Specifically the steps are:
- We pull the latest docker image and all associated dependencies (databases etc).
- We run docker-compose build to include any changes. If the cache is fresh, and no new Python versions are released, this should be a no-op.
- We then run the test suite (in the case above, sqlite).
- We then push the built docker image to the registry, to be used by future builds. This should only happen on master builds for obvious reasons (see the sketch below).
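One way to express that restriction, if the push were lifted out of the composite action and into the workflow itself, is an ordinary step-level condition. This is just a sketch of the idea, not the final wiring:

```yaml
- name: Push image to the registry
  # Only master builds get this far; pull request builds never see the credentials.
  if: github.ref == 'refs/heads/master'
  run: |
    echo "${{ secrets.DOCKER_TOKEN }}" | docker login -u "tomforbes" --password-stdin
    docker-compose push ${{ github.job }}
```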
macOS and Windows
The Windows and macOS runners don’t support Docker, so those builds just run a series of plain steps like so:
non_docker_test:
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      os: [ windows-latest, macos-latest ]
      python: [ 3.8 ]
  name: sqlite-${{ matrix.python }}-${{ matrix.os }}
  steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
      with:
        python-version: ${{ matrix.python }}
    - run: brew install libmemcached
      if: ${{ matrix.os == 'macos-latest' }}
    - run: pip install -q wheel pip
    - run: pip install -q -r tests/requirements/py3.txt -e .
    - run: python tests/runtests.py -v1
Slow vs quick builds
The full build matrix includes over 60 jobs, which might be excessive. I think we should have ~20 or so jobs that run on every change, but include the ability to tag a PR as full-build. You can do this by commenting @buildbot full-build, which will automatically add the tag to the PR, or just via the interface. Any push to a PR that carries the full-build tag will run the full build matrix of 60 jobs. We could even make it more specific by having commands like @buildbot full-postgres.
This would all be automated via a GitHub Action, which lets you respond to events like comments and tags on pull requests.
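As a rough sketch of what that could look like (the label name and bot command are the ones proposed above; everything else is illustrative and would need fleshing out), a workflow could listen for the comment and apply the label:

```yaml
name: buildbot
on:
  issue_comment:
    types: [created]
jobs:
  full-build-label:
    # Only react to comments on pull requests that contain the command.
    if: github.event.issue.pull_request && contains(github.event.comment.body, '@buildbot full-build')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/github-script@v3
        with:
          script: |
            await github.issues.addLabels({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              labels: ['full-build'],
            })
```

The full-matrix jobs could then be gated on the label being present, e.g. with a contains() check against github.event.pull_request.labels.*.name.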
Automated documentation builds
Adding the Netlify integration to the repository and adding a netlify.toml file to the root of the repository gives us free deploy previews of our docs:
[build]
publish = "_build/html/"
command = "make html"
base = "docs/"
[context.deploy-preview.environment]
PYTHON_VERSION = "3.7"
Each build takes about 3 minutes and includes a bunch of automated caching.
Problems
Oracle
Yeah. Oracle. Urgh. The major issue with supporting Oracle is that the docker image requires authentication to pull, and this poses big issues around security (sensitive secrets are not injected into non-master builds). Even if we could work around this, the startup time for an Oracle container is absolutely ridiculous (like, hours).
If we want to support Oracle builds in the CI then I think we need to create a custom django-oracle image that inherits from the standard upstream Oracle image. The image would need to pre-initialize the database to avoid long startup times and be hosted on a private Docker Hub repository that is only available to the Django CI runners (Oracle do not like you distributing the image publicly). However, for now, we could just keep Jenkins around for the Oracle integration.
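If we did eventually go down the custom-image route, the compose side of it might look something like the sketch below. The registry host, image name and tag are placeholders, and the ORACLE_PWD variable just follows the convention of Oracle’s official container images; none of this exists yet:

```yaml
oracle-db:
  # Placeholder: a pre-initialised image hosted on a registry only the CI runners can pull from.
  image: private-registry.example.org/django/django-oracle:19.3
  environment:
    ORACLE_PWD: oracle
  ports:
    - "1521:1521"
```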
Permissions
GitHub Actions requires containers to run as root, which messes with three permission-based tests. I’ve currently just added @unittest.skipIf(os.getuid() == 0, ...) to them, but something better needs to be worked out.
Maxmind GeoIP database
MaxMind recently put all their previously public (and free) GeoIP databases behind a service that requires authentication. This is a problem similar to the Oracle one above: it’s going to be hard to work out how to include these in the Docker image in an automated way.
Final thoughts
I really think this idea has legs, and the PoC was surprisingly easy to get mostly working. If feedback here is favourable then I can spend some more time working on this and an associated DEP.