March 23, 2020 · 7 min read
More and more organisations are embracing mono-repositories, because they make dependency management simpler (read more here). However, implementing and maintaining a mono-repository at scale has its challenges: building and testing a large repository is no simple feat. At the core is a question of performance: at each commit, we would need to re-build and re-test the whole repository. That sounds very expensive, so a caching mechanism must be introduced. The question then becomes: how do we find out which parts need to be rebuilt and retested?
Larger organisations such as Google, Facebook and Twitter have adopted the mono-repository approach as they have the means to maintain such an infrastructure, but unfortunately solutions for smaller organisations are limited. At Tomorrow, we wanted to embrace the mono-repository structure. This post explains why we built brick, our home-made build tool for mono-repositories.
One of our products is electricityMap.
It shows in real-time the origin of electricity used across the world, alongside its carbon footprint.
We have a paid API which provides forecasts enabling anyone to optimise their electricity usage to reduce cost and emissions.
electricityMap is divided into geographical zones that are defined in a definition file called zones.json. This definition file contains bounding boxes, data acquisition parameters and much more (I encourage you to take a look at it here).
In our mono-repository, we have a multitude of micro-services, each represented as a folder. Here are some of them:
api: The backend which hosts our API
api/www: The paid API website
contrib: A git submodule which points to the open-source electricityMap frontend
forecasting: The forecasting pipeline used by the API
weather: The weather forecast fetching service used by the forecasting pipeline
We use continuous deployment, which means that each pull request that changes the zones.json definition updates several systems in one go.
This is in stark contrast with a multi-repository setting, where one would first update and release a new version of the zones.json definition, then, for each dependent service, submit a pull request to update the dependency. This is quite cumbersome and doesn't scale well.
Having a single repository enables us to submit a pull request representing a single atomic change.
It enables us to move faster, keep a very modular codebase and increase the amount of code shared.
The issue faced by mono-repositories is that building and testing takes time: all build steps and tests must run, because it is not known which changes affect which builds or tests. Several tools address this need, like Bazel used at Google, Buck at Facebook or Pants at Twitter (see a longer list here). The problem is that those tools come with a steep learning curve for new developers. We wanted to simplify that drastically.
When assessing existing build tools and designing our own, we looked at the following requirements:
It turns out that buildkit, the engine that builds Docker images, already fulfills a lot of these requirements:
A small CLI called brick was then built in Python. It searches for a BUILD.yaml build definition file and then passes all relevant flags to the buildkit build engine.
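The lookup itself can be pictured as a short walk up the directory tree. Here is a simplified sketch of that idea; `find_build_file` is an illustrative name, not brick's actual API:

```python
from pathlib import Path
from typing import Optional

def find_build_file(start: Path) -> Optional[Path]:
    """Walk from `start` up through its parent directories and return
    the first BUILD.yaml found, or None if the search hits the root."""
    for directory in (start, *start.parents):
        candidate = directory / "BUILD.yaml"
        if candidate.is_file():
            return candidate
    return None
```

Invoked from anywhere inside a target's folder, this finds the nearest build definition, which is what lets the CLI work from any subdirectory.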
As an example, here's the BUILD.yaml file we use to build and deploy our API website from the ./api folder of our monorepo:

```yaml
name: www
steps:
  prepare:
    # This is the docker image that will be used (optional)
    image: node:10.3
    # These are the commands that will be run
    commands:
      - yarn
    # Failing to declare those inputs will cause the `yarn`
    # command to fail due to missing files
    inputs:
      - package.json
      - yarn.lock
  build:
    commands:
      - yarn lint
      - yarn build
    inputs:
      - static/
      - src/
      - gatsby-*.js
      - .eslintrc.js
    # Outputs will be copied from the image to the host
    outputs:
      - public
```
Running the build is as easy as running brick build from that folder.
For each stage, if the commands have not changed and the inputs have not changed, then the build results are read from cache (this is done by the docker engine itself, which caches intermediate layers).
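Conceptually, each stage behaves as if it were keyed by a digest of its commands and the contents of its declared inputs. A minimal sketch of that idea follows; it is illustrative only, since the real caching is Docker layer caching, and `cache_key` is a hypothetical helper:

```python
import hashlib
from pathlib import Path

def cache_key(commands: list, inputs: list) -> str:
    """Derive a stable cache key from the build commands and the
    contents of the declared input files. If neither the commands
    nor any input file changes, the key stays the same and the
    cached build results can be reused."""
    digest = hashlib.sha256()
    for command in commands:
        digest.update(command.encode())
        digest.update(b"\0")
    for path in sorted(Path(p) for p in inputs):
        digest.update(str(path).encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

Changing a single byte in package.json, or editing a command, produces a different key and therefore a cache miss for that stage and every stage downstream of it.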
Under the hood, what actually happens is that brick generates a temporary Dockerfile that it builds.
Once the build is done, brick reads the outputs array defined in the BUILD.yaml and copies the files from the image to the host machine.
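For the www example above, the generated Dockerfile might look roughly like this. This is an illustrative reconstruction based on the declared steps, not brick's exact output:

```dockerfile
# prepare stage: base image plus dependency installation,
# cached as long as package.json and yarn.lock are unchanged
FROM node:10.3 AS prepare
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn

# build stage: re-uses the prepare stage's layers
FROM prepare AS build
COPY static/ static/
COPY src/ src/
COPY gatsby-*.js .eslintrc.js ./
RUN yarn lint
RUN yarn build
# the declared outputs (public/) are then copied back to the host
```

Because each step's inputs map to COPY instructions and each command to a RUN instruction, Docker's layer cache gives the per-stage caching described above for free.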
A no-op build with ~15 targets (i.e. micro-services) runs in 30 seconds on our CI server (an n1-standard-2 on Google Cloud Compute Engine).
As inputs and outputs are clearly defined, dependencies can automatically be detected: if an input intersects the output of another build, then that other build will automatically be triggered.
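That intersection test can be sketched as follows; `depends_on` is a hypothetical helper, not brick's internals:

```python
from pathlib import PurePosixPath

def depends_on(inputs: list, other_outputs: list) -> bool:
    """True if any declared input equals, lies inside, or contains an
    output of another build -- meaning that build must run first."""
    for inp in inputs:
        for out in other_outputs:
            ip, op = PurePosixPath(inp), PurePosixPath(out)
            if ip == op or op in ip.parents or ip in op.parents:
                return True
    return False
```

Running this check over every pair of targets yields a dependency graph, from which the builds affected by a change can be scheduled in the right order.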
Starting to work on a new target is as simple as running brick develop, given that the right configuration is added to the BUILD.yaml:

```yaml
develop:
  command: yarn develop -H 0.0.0.0
  ports:
    - 8000
```
What happens under the hood is that a docker run command is issued, re-using the prepare step to make sure every developer has the same dependencies.
Using our API website as an example, if a new dependency is added to the package.json file, then brick develop will automatically re-run the prepare step. This avoids the problem of developers having to remember to run yarn when dependencies have changed.
Furthermore, it guarantees that developers are always working on a consistent system.
brick develop runs in an isolated container, and automatically mounts volumes to the host filesystem based on the inputs declarations.
This ensures that watching file changes works, while still keeping the guarantee that missing inputs will cause errors, forcing the developer to declare them all.
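Assembling such a docker run invocation from the declared inputs and ports could look like this. It is a sketch with hypothetical names, not brick's actual code:

```python
from pathlib import Path

def develop_command(image: str, inputs: list, ports: list,
                    workdir: str = "/app") -> list:
    """Assemble a `docker run` argument list that bind-mounts each
    declared input into the container and publishes the develop
    ports, so file watching works and undeclared inputs are absent."""
    args = ["docker", "run", "--rm", "-it", "-w", workdir]
    for inp in inputs:
        host_path = Path.cwd() / inp
        args += ["-v", f"{host_path}:{workdir}/{inp}"]
    for port in ports:
        args += ["-p", f"{port}:{port}"]
    return args + [image]
```

Only the declared inputs are visible inside the container, which is exactly what forces a developer to declare every file a step depends on.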
Tests can also be defined in a similar fashion:
```yaml
test:
  commands:
    - yarn test
```
Deployments can also be cached.
In this example, we define a deployment configuration which copies the output of our build step (the
public folder) to a Google Cloud Bucket.
Secrets are also defined, which will be mounted during the deployment stage.
```yaml
deploy:
  commands:
    - gsutil -m cp -a public-read -r public/* gs://static.electricitymap.org/api
  secrets:
    gcloud:
      src: ~/.config/gcloud
      target: /root/.config/gcloud
```
brick is still very experimental, and the documentation is sparse. However, it is already used internally for all of our builds and tests. If you'd like to help us make it more robust, please reach out on our Slack or join the conversation on GitHub!