GitHub Docker caching on Self-Hosted Runners

2025-01-11, a 7-minute read for Software Engineers

TL;DR

Just don’t use it, because of these problems:

  1. GitHub cache storage network bandwidth is very slow: ours was limited to ~32 MB/s when running from AWS Self-Hosted Runners[1]. At that speed it takes about a minute to upload 1 GB of cache, and another minute to download it on the next run.
  2. If your dependencies are not cached on the runner, each build may fetch slightly different artifacts, which invalidates the cache for the “install dependencies” layer and, with it, every subsequent layer.

Docker custom caching is a feature that lets you export the layers of your Docker images to a location of your choice and import them on later builds. This can speed up the build process of your Docker images.

Prerequisites

A solid understanding of:

  • Docker
  • GitHub Actions

Docker Cache Modes

Docker offers two caching modes, min and max:

In min cache mode (the default), only layers that are exported into the resulting image are cached, while in max cache mode, all layers are cached, even those of intermediate steps.

While min cache is typically smaller (which speeds up import/export times, and reduces storage costs), max cache is more likely to get more cache hits. Depending on the complexity and location of your build, you should experiment with both parameters to find the results that work best for you.

In this post, I'm using max mode.
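For reference, here is how the two modes are selected on a plain docker buildx build (the image name and cache paths are placeholders; local cache export needs a builder that supports it, which setup-buildx-action provides later):

# min (the default): only the final image's layers are exported
docker buildx build \
  --cache-to type=local,dest=./tmp/.buildx-cache \
  --cache-from type=local,src=./tmp/.buildx-cache \
  -t myapp:dev .

# max: intermediate layers (e.g. earlier multi-stage targets) are cached too
docker buildx build \
  --cache-to type=local,dest=./tmp/.buildx-cache,mode=max \
  --cache-from type=local,src=./tmp/.buildx-cache \
  -t myapp:dev .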

Implementation Requirements

There are two ways to implement this:

  1. With bake-action,
  2. Manually caching and restoring using cache action

Due to environment limitations, I had to choose the manual option.
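Although I didn't use it, option 1 would look roughly like this (a sketch based on docker/bake-action's documented inputs; the file names are placeholders):

- name: Build with bake
  uses: docker/bake-action@v5
  with:
    files: |
      docker-compose.yml
      docker-compose.dev.yml
    load: true
    set: |
      *.cache-from=type=local,src=./tmp/.buildx-cache
      *.cache-to=type=local,dest=./tmp/.buildx-cache,mode=max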

The application I wanted to cache consists of multiple images that need to be built and tested.

Docker image setup

If you want to cache every image, this can be done with the cache_to and cache_from options; the command should be invoked like this in your CI:

docker buildx bake \
  -f docker-compose.yml \
  -f docker-compose.dev.yml \
  --load \
  --set "*.cache-from=type=local,src=./tmp/.buildx-cache" \
  --set "*.cache-to=type=local,dest=./tmp/.buildx-cache"
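Two details worth noting: quoting the --set values stops the shell from glob-expanding the leading *, and the * itself is bake's target pattern, so the cache settings apply to every service in the compose files.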

However, if you want control over which services to cache, then include these in your docker-compose file:

cache_from:
  - type=local,src=./tmp/.buildx-cache-restored
cache_to:
  - type=local,dest=./tmp/.buildx-cache,mode=max
The full docker-compose file:
services:
  frontend:
    build:
      target: builder
      cache_from:
        - type=local,src=./tmp/.buildx-cache-restored
      cache_to:
        - type=local,dest=./tmp/.buildx-cache,mode=max
    command: npm run dev
    ...
  api:
    build:
      args:
        - NODE_ENV=development
      target: builder
      cache_from:
        - type=local,src=./tmp/.buildx-cache-restored
      cache_to:
        - type=local,dest=./tmp/.buildx-cache,mode=max
    ...
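In case you're wondering why cache_from reads from ./tmp/.buildx-cache-restored while cache_to writes to ./tmp/.buildx-cache: the local cache exporter does not prune old entries, so exporting on top of the directory you imported from grows the cache on every run. Renaming the restored cache and exporting to a fresh directory (the "move" step in the workflow below) is the usual workaround, so each run only uploads the layers of the latest build.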

Workflow file

Required GitHub Action setup

  1. Set up Docker Buildx with docker/setup-buildx-action@v3
  2. Restore the cache using actions/cache/restore@v4
  3. Rename the cache directory on a cache hit
  4. Create the cache directories if there was no cache hit, because Docker complains if a directory does not exist
  5. Build the images using docker buildx bake -f docker-compose.yml --load --provenance=false
  6. Save the cache using actions/cache/save@v4
  7. Clean up the cache directories
The full workflow file:
name: Docker Caching

on:
  pull_request:

jobs:
  tests:
    runs-on: self-hosted # your runner name here
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3

      - name: Set up Docker build cache
        id: cache-docker-layer
        uses: actions/cache/restore@v4
        with:
          path: ./tmp/.buildx-cache
          key: ${{ runner.os }}-docker-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-docker-

      - name: Move cache to buildx-cache-restored
        # cache-hit is only 'true' on an exact key match (and empty when nothing
        # was restored), so check cache-matched-key to also cover restore-keys hits
        if: steps.cache-docker-layer.outputs.cache-matched-key != ''
        run: mv ./tmp/.buildx-cache ./tmp/.buildx-cache-restored || true

      - name: ls ./tmp/.buildx-cache-restored
        if: steps.cache-docker-layer.outputs.cache-matched-key != ''
        run: ls ./tmp/.buildx-cache-restored || true

      - name: Create dirs for cache if no cache hit
        if: steps.cache-docker-layer.outputs.cache-matched-key == ''
        run: |
          mkdir -p ./tmp/.buildx-cache-restored || true
          chmod -R 777 ./tmp/.buildx-cache-restored || true
          mkdir -p ./tmp/.buildx-cache || true
          chmod -R 777 ./tmp/.buildx-cache || true

      - name: Build images
        # multiple -f files extend and override the base docker-compose
        run: docker buildx bake -f docker-compose.base.yml -f docker-compose.e2e.yml -f docker-compose.ci.yml --load --provenance=false

      - name: Run tests here
        run: echo "Running tests..."

      - name: ls ./tmp/.buildx-cache
        run: ls ./tmp/.buildx-cache

      - name: Cache Docker layers
        if: always()
        uses: actions/cache/save@v4
        with:
          path: ./tmp/.buildx-cache
          key: ${{ runner.os }}-docker-${{ github.sha }}

      - name: Cleanup cache directories
        if: always()
        run: |
          rm -rf ./tmp/.buildx-cache || true
          rm -rf ./tmp/.buildx-cache-restored || true
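Two things to note in the workflow: the save step runs under if: always() so a warm cache is uploaded even when the tests fail, and the final cleanup removes both cache directories so a reused runner does not accumulate stale layers on disk.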

Network Speed Analysis

The transfer speed on a cache hit for ~1 GB of cache is shown below.

Network Speed GitHub Runner vs Self-Hosted Runners

As you can see from the chart above:

  • GitHub-hosted runners: ~120 MB/s (≈1 Gbps) on average
  • Self-hosted runners: ~32 MB/s[2]

Impact: cache transfers on self-hosted runners run at roughly a quarter of the speed of GitHub-hosted runners.
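To put that in perspective: at ~32 MB/s, a 1 GB cache takes roughly 1024 / 32 ≈ 32 seconds per direction, so the restore-plus-save round trip alone adds about a minute to every run before any packing and unpacking overhead; at ~120 MB/s the same round trip is under 20 seconds.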

Build Time Analysis

  • Original without docker caching: 7 mins
  • With docker caching:
    • On first run: 11-12 mins
    • Subsequent runs:
      • Expected: 4-5 mins (if application dependencies were cached)
      • Reality:
        • Worst case: 11-12 mins (Dependency installation variations)
        • Best case: ~5 mins (when the stars align)

Despite only caching ~1 GB worth of layers on GitHub, the runtime was hit and miss (pun intended).

I suspect this is because the application dependencies (such as react) are not cached on the runner: each build can download a slightly different variation of a dependency (there is no guarantee that the react 17.1.2 downloaded three hours ago is byte-for-byte identical to the react 17.1.2 downloaded today, unless an integrity hash pins it), so the cache is invalidated at the “npm install” layer, which in turn invalidates all subsequent layers.
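A common mitigation (a hypothetical Dockerfile sketch, assuming a committed package-lock.json) is to copy only the manifests before installing and use npm ci, so the integrity hashes in the lockfile pin the exact artifacts and the install layer is only invalidated when the lockfile actually changes:

# Sketch: a deterministic install layer (paths and base image are assumptions)
FROM node:20-alpine AS builder
WORKDIR /app
# Copy only the manifests first, so this layer's cache key depends on the
# lockfile rather than the whole source tree
COPY package.json package-lock.json ./
# npm ci installs exactly what the lockfile pins, verified by integrity hashes
RUN npm ci
# Copy the sources afterwards; changes here no longer bust the install layer
COPY . .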

However, seeing that my local builds were always faster, I decided to reuse a runner instead of letting it be recycled[3] and rely on Docker's default local caching. Doing so decreased the run time by more than 50%, to only 2-3 mins.

Given that this job is triggered many times a day, the mean time approaches 3 mins without any manual caching, which is good enough. However, runner reuse comes with security risks, so I ultimately abandoned it.

I haven't looked into why Docker's default caching produced more consistent builds than explicit caching, i.e. what guarantees the install-dependencies step gets reused. That would be interesting to investigate in the future.

O Keeper of Memory, keep my cache always warm

FIN

Just don't use it. I offer two alternatives:

  1. Runner reuse (warm cache) is more effective than relying on GitHub caching for this use case, if your security team allows it.

  2. If you still want to do caching on Self-Hosted runners, you should look into GitHub Actions Cache Server


[1] This probably varies by region, cloud provider, whether traffic goes over the public internet, and other factors; you might get better results on Azure-based runners.

[2] Sometimes it just stalls, according to this discussion on the GitHub community forum.

[3] Not a good practice; runners should be ephemeral.