Why debby ran out of /tmp (and why we self-host CI on a $1k box)

Updated May 22, 2026

Debby ran out of /tmp today

This morning our self-hosted GitHub Actions runner — a single AMD Ryzen 9 7940HS box named debby — started failing every Go build with a cascade of "build failed" errors across dozens of packages. The cause wasn't the code. /tmp was 100% full at 31 GiB, and Go's link stage scratches to /tmp before producing test binaries. ENOSPC, so every package turns red.

The culprit was a 28 GiB /tmp/xgo directory: accumulated go-instrument patch cache from our xgo-based test mocking, growing one CI run at a time. systemd's default /tmp is a tmpfs sized at 50% of RAM (31 GiB on this box), so the cache was eating into RAM until builds couldn't allocate scratch.
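
For the next time a tmpfs fills up, here is a minimal sketch of the hunt for the biggest offender, roughly what du would report. The helper names (dir_size_bytes, largest_children) are made up for illustration, and the sketch sums apparent file sizes rather than allocated blocks.

```python
import os


def dir_size_bytes(path):
    """Sum apparent sizes of all files under path, skipping unreadable entries."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def largest_children(root, top=5):
    """Return (size_bytes, path) for the biggest immediate children of root."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    size = dir_size_bytes(entry.path)
                else:
                    size = entry.stat(follow_symlinks=False).st_size
            except OSError:
                continue
            sizes.append((size, entry.path))
    # Largest first; ties fall back to path order.
    return sorted(sizes, reverse=True)[:top]
```

Pointed at /tmp on a box in today's state, a call like largest_children("/tmp") would be expected to put /tmp/xgo at the top of the list.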

We live-resized /tmp to 90 GiB and persisted the change with a systemd drop-in at /etc/systemd/system/tmp.mount.d/size.conf:

[Mount]
Options=mode=1777,strictatime,nosuid,nodev,size=90G,nr_inodes=1m

90 GiB is the realistic ceiling on this hardware. tmpfs pages can spill to swap, so the total backing is RAM (60 GiB) + swap (49 GiB) = 109 GiB. Going higher would risk OOM kills under sustained load.
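
The headroom arithmetic can be sketched as a small check against /proc/meminfo-style text. This is an illustration, not a tool we run: the function names are hypothetical, and the sample figures are just the 60 GiB and 49 GiB from above converted to kB.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def tmpfs_headroom_gib(meminfo_text, tmpfs_size_gib):
    """GiB of RAM plus swap left over if the tmpfs were completely full."""
    info = parse_meminfo(meminfo_text)
    backing_kib = info["MemTotal"] + info["SwapTotal"]
    return backing_kib / (1024 * 1024) - tmpfs_size_gib


# Sample input built from the numbers in this post (60 GiB RAM, 49 GiB swap).
SAMPLE = "MemTotal:       62914560 kB\nSwapTotal:      51380224 kB\n"
headroom = tmpfs_headroom_gib(SAMPLE, 90)  # 109 GiB backing minus a 90 GiB tmpfs
```

On the real box the input would come from reading /proc/meminfo; the remaining headroom is what everything else on the machine gets if /tmp ever fills completely.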

Why we self-host CI

GitHub-hosted runners are convenient and expensive. At our merge volume, the math has been clear for a while: a ~$1,000 mini-PC pays for itself in a month or two compared to GitHub's per-minute pricing, and after that it's pure savings — hundreds of dollars a month, every month.

The box: AMD Ryzen 9 7940HS, 16 threads, 64 GiB RAM, 1 TB NVMe. One-time hardware, runs on a shelf, draws ~50W.
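
To make the payback claim checkable, here is a rough break-even sketch. The $1,000 hardware price and ~50W draw come from above; the hosted bill of $600 a month and the $0.15 per kWh electricity rate are placeholder assumptions, not figures from our invoices.

```python
def breakeven_months(hardware_usd, hosted_usd_per_month, watts, usd_per_kwh):
    """Months until the box has paid for itself, counting electricity (30-day months)."""
    power_usd_per_month = watts / 1000 * 24 * 30 * usd_per_kwh
    net_saving = hosted_usd_per_month - power_usd_per_month
    if net_saving <= 0:
        return float("inf")  # self-hosting never pays off at these rates
    return hardware_usd / net_saving


# Placeholder inputs: only the 1000 and the 50 come from this post.
months = breakeven_months(
    hardware_usd=1000,
    hosted_usd_per_month=600,  # assumed hosted-runner bill
    watts=50,
    usd_per_kwh=0.15,          # assumed electricity price
)
```

With those placeholder inputs the result lands between one and two months; plug in your own bill and power price to see where you come out.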

The unique part: eight concurrent runners on one box

Most self-hosting guides assume one runner per machine, which wastes the hardware. Debby runs eight GitHub Actions runners as separate systemd units (github-runner-debby-1.service through -8), each effectively assigned 2 threads. Eight PRs can have CI running simultaneously on one $1k box.
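
As an illustration of the layout, the sketch below generates the eight unit names and one possible per-runner CPU cap. The unit naming follows the pattern above; using a CPUQuota= drop-in to get "2 threads each" is an assumption made for the example, not a description of how debby is actually configured.

```python
def runner_units(host, count):
    """Unit names in the github-runner-<host>-<n>.service pattern, numbered from 1."""
    return [f"github-runner-{host}-{i}.service" for i in range(1, count + 1)]


def cpu_quota_dropin(total_threads, count):
    """Drop-in text capping each runner at an equal share of the hardware threads.

    CPUQuota= is a systemd resource-control setting where 100% means one full CPU,
    so 200% corresponds to two threads' worth of CPU time.
    """
    threads_each = total_threads // count
    return f"[Service]\nCPUQuota={threads_each * 100}%\n"


units = runner_units("debby", 8)
dropin = cpu_quota_dropin(total_threads=16, count=8)
```

A quota caps CPU time without pinning a runner to specific cores, which is one way to read "effectively assigned" here.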

What makes that work is a shared cache directory at /opt/github-runners/.shared-cache/:

  • toolcache/: the Go SDK installed once via actions/setup-go, reused by all runners
  • go-mod/: module downloads (GOMODCACHE) shared across all eight runners
  • go-build/: compiled artifacts (GOCACHE) shared across all eight runners
  • gopath/: GOPATH shared across all eight runners

When runner #6 builds a package, runner #2 doesn't redownload golang.org/x/... or rebuild pkg/services/llm. Cache hit rates approach 100% on hot paths. A test job that takes 4 minutes cold takes seconds warm. Eight runners and one cache is meaningfully different from eight runners and eight caches — the latter is the default and it's a waste of both disk and time.
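
One way to wire every runner to the same cache is through environment variables on each unit. The sketch below builds that environment from the shared root. GOMODCACHE, GOCACHE and GOPATH are standard Go settings; pointing the tool cache via RUNNER_TOOL_CACHE, and delivering everything as systemd Environment= lines, are assumptions for illustration rather than a record of debby's actual setup.

```python
import posixpath


def shared_cache_env(root):
    """Map cache-related environment variables onto subdirectories of root."""
    return {
        "RUNNER_TOOL_CACHE": posixpath.join(root, "toolcache"),  # assumed variable
        "GOMODCACHE": posixpath.join(root, "go-mod"),
        "GOCACHE": posixpath.join(root, "go-build"),
        "GOPATH": posixpath.join(root, "gopath"),
    }


def environment_lines(env):
    """Render the mapping as systemd Environment= lines, sorted by variable name."""
    return [f"Environment={key}={value}" for key, value in sorted(env.items())]


env = shared_cache_env("/opt/github-runners/.shared-cache")
lines = environment_lines(env)
```

Because every runner gets the same values, a module downloaded or a package compiled by one runner is immediately visible to the other seven.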

What maintaining it actually involves

Day-to-day: basically nothing. It just runs. Maintenance shows up as occasional incidents like today's:

  • tmpfs sizing: done above.
  • Patch-cache bloat: /tmp/xgo will rebuild on the next run, so clearing it is safe. Of the persistent caches in /opt/github-runners/.shared-cache/, the build cache (GOCACHE) is trimmed by Go's own cache GC; the module cache (GOMODCACHE) is not, so it needs an occasional go clean -modcache.
  • Disk pruning: the NVMe is 1 TB, currently 14% used. Plenty of runway.
  • Runner version pinning: actions/runner updates a few times a year. Bump the systemd unit, restart, done.
  • Emergency intervention: admin account has sudo for situations like today; a read-only logs user exists for journal inspection without the keys to break anything.
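
To keep a rebuildable directory like /tmp/xgo from growing without bound, a periodic prune is one option. The sketch below is an assumption about how that could look, not something we have deployed: prune_stale is a hypothetical helper, the age threshold is arbitrary, and it defaults to a dry run.

```python
import os
import shutil
import time


def prune_stale(root, max_age_days, now=None, dry_run=True):
    """Remove immediate children of root whose mtime is older than max_age_days.

    Returns the sorted list of paths that were (or, in a dry run, would be) removed.
    A directory's mtime only changes when its direct entries change, so this is a
    coarse staleness signal.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    with os.scandir(root) as it:
        entries = list(it)  # snapshot first so deletion never races the iterator
    removed = []
    for entry in entries:
        try:
            mtime = entry.stat(follow_symlinks=False).st_mtime
        except OSError:
            continue  # entry vanished between listing and stat
        if mtime >= cutoff:
            continue
        removed.append(entry.path)
        if not dry_run:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path, ignore_errors=True)
            else:
                os.unlink(entry.path)
    return sorted(removed)
```

Running it with dry_run=True first shows what would go; only safe for directories whose contents rebuild on demand, which is the case for the xgo patch cache.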

The honest cost of self-hosting isn't the hardware or the electricity. It's owning one more thing that can break in a way GitHub's status page won't tell you about. Today that cost was about 20 minutes — diagnose, remount, persist, write this up.

Cheaper than a year of GitHub-hosted minutes, easily.