prod-ditto-1 Healthcheck & Capacity Report (2026-05-14)


Updated May 21, 2026


Date: 2026-05-14 12:25 CEST
Host: ditto-prod-1 (Hetzner, 195.201.110.230)
Uptime: 50 days

Hardware

Resource | Spec
CPU | Intel Xeon E3-1275 v5 @ 3.6 GHz, 4 cores / 8 threads (8 logical CPUs)
RAM | 62 GiB + 4 GiB swap
Disk | 464 GB software RAID (md2)

Current Utilization

Resource | Used | Free / headroom
CPU | load avg 0.10 / 0.08 / 0.09 (1 / 5 / 15 min), 99% idle | ~99% free (≈100× headroom)
RAM | 12 GiB used, 0 swap in use | 50 GiB available
Disk | 41 GB (10%) | 400 GB free
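The headroom column can be checked with simple arithmetic. A minimal Python sketch, assuming headroom means total capacity divided by current use and ignoring bursts: the 99% idle figure gives ≈100×, while the 1-minute load average against 8 logical CPUs gives a somewhat more conservative ≈80×. The function names are illustrative only.

```python
# Headroom arithmetic for the Current Utilization figures.
# Assumption: headroom = total capacity / current use, ignoring burstiness.


def cpu_headroom_from_idle(idle_fraction: float) -> float:
    """Headroom implied by the idle share (0.99 idle means ~100x)."""
    busy = 1.0 - idle_fraction
    return float("inf") if busy <= 0 else 1.0 / busy


def cpu_headroom_from_load(load_avg: float, logical_cpus: int) -> float:
    """Headroom implied by a load average against the logical CPU count."""
    return float("inf") if load_avg <= 0 else logical_cpus / load_avg


def used_fraction(used: float, total: float) -> float:
    """Share of a resource in use (both arguments in the same units)."""
    return used / total


if __name__ == "__main__":
    print(f"CPU headroom (idle %):  {cpu_headroom_from_idle(0.99):.0f}x")
    print(f"CPU headroom (load 1m): {cpu_headroom_from_load(0.10, 8):.0f}x")
    # 62 GiB total, 50 GiB available -> 12 GiB in use
    print(f"RAM in use:             {used_fraction(62 - 50, 62):.0%}")
```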

Service Status — all healthy

  • ditto-backend-green (prod, port 3400): 41.6 MB RSS
  • ditto-share-green (prod, port 3410): 20.8 MB RSS
  • caddy: 253 MB, up 51 days
  • postgresql: ~9.7 GiB RSS, 76 backend processes
  • staging / staging-2 / staging-3 / staging-4 backend+share: all active

No errors of concern: zero 5xx responses in the last 24 h. The 209 "request aborted: unauthorized" entries come from clients sending stale JWTs, not from server faults.

Load Tally

Window | Requests | Avg rate | Unique users | /api/v2/prompt calls
Last 1 h | 4,452 | 1.2 req/s | 10 | 65
Last 24 h | 16,302 | 0.19 req/s | 16 (DAU) | 474

Most of the volume is cheap /prompt/status polling (~1.7 req/s, sub-millisecond, status 200). Actual LLM calls (/api/v2/prompt) average ~20/hour over 24 h (474 ÷ 24), with 65 in the last hour.
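A tally like the one above can be reproduced from Caddy's access logs. The sketch below is a minimal Python version, assuming the logs use Caddy's JSON format with a Unix-seconds "ts" field, an integer "status", and "request"."uri". The path rules (exact /api/v2/prompt, anything ending in /prompt/status) are assumptions about Ditto's routes, and the `tally` helper is hypothetical. Unique-user counts are left out because they depend on how the backend identifies users.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def tally(lines, window_start: float, window_end: float):
    """Count requests in [window_start, window_end) from JSON access-log lines.

    Assumes Caddy-style JSON entries with "ts" (Unix seconds), "status",
    and "request" -> "uri". The route matching below is an assumption.
    """
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        ts = entry.get("ts")
        if not isinstance(ts, (int, float)) or not window_start <= ts < window_end:
            continue
        counts["requests"] += 1
        status = entry.get("status")
        if isinstance(status, int) and 500 <= status <= 599:
            counts["5xx"] += 1
        request = entry.get("request")
        uri = request.get("uri", "") if isinstance(request, dict) else ""
        path = urlsplit(uri).path  # drop any query string
        if path == "/api/v2/prompt":
            counts["prompt_calls"] += 1
        elif path.endswith("/prompt/status"):
            counts["status_polls"] += 1
    seconds = window_end - window_start
    rate = counts["requests"] / seconds if seconds > 0 else 0.0
    return counts, rate
```

For the gzipped rotated files in /opt/ditto/logs/, pass `gzip.open(path, "rt")` as `lines`.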

Capacity Estimate

Current utilization is low on every dimension: CPU ~1%, RAM ~19%, disk ~10%. Linear extrapolation from the current 16 DAU at 0.19 req/s:

Tier | Approx. DAU | Notes
Comfortable | ~1,000 (≈60×) | No config changes needed
Stretch | ~10,000 (≈600×) | Verify Postgres max_connections and pool sizing; shared_buffers already holds ~10 GiB
Hard hardware ceiling | ~20,000+ | Depends on prompt-call mix
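The tier multipliers come from straight proportional scaling. A minimal Python sketch of that extrapolation, assuming load grows linearly with DAU: ≈60× and ≈600× are 62.5× and 625× rounded, and ~10,000 DAU works out to roughly 119 req/s on average. This ignores peaks; the last hour ran at 1.2 req/s, about 6× the 24 h average, so pool sizing should budget for a peak factor on top.

```python
# Linear extrapolation from the 24 h baseline in the load tally.
# Assumption: load scales proportionally with DAU; no peak-hour correction.
BASELINE_DAU = 16
BASELINE_REQ_PER_S = 0.19        # 16,302 requests / 86,400 s
BASELINE_PROMPTS_PER_DAY = 474   # /api/v2/prompt calls in the last 24 h


def extrapolate(target_dau: int) -> dict:
    """Scale the 24 h baseline linearly to a target DAU."""
    factor = target_dau / BASELINE_DAU
    return {
        "multiplier": factor,
        "avg_req_per_s": BASELINE_REQ_PER_S * factor,
        "prompt_calls_per_day": BASELINE_PROMPTS_PER_DAY * factor,
    }


if __name__ == "__main__":
    for dau in (1_000, 10_000, 20_000):
        p = extrapolate(dau)
        print(
            f"{dau:>6,} DAU: {p['multiplier']:.1f}x, "
            f"{p['avg_req_per_s']:.1f} req/s avg, "
            f"{p['prompt_calls_per_day']:,.0f} prompt calls/day"
        )
```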

The real bottleneck, well before hardware: LLM provider quotas, i.e. requests per minute (RPM) and tokens per minute (TPM), for Gemini, Claude, OpenAI, etc. Ditto infrastructure is not the constraint.
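To see where provider quotas bind, the tally can be turned into a projected peak call rate. A minimal Python sketch, assuming prompt calls scale linearly with DAU and that the last hour's 65 calls versus the 24 h hourly average (474 ÷ 24 ≈ 20) is a representative peak factor of about 3.3×. The function names are hypothetical, and the quota values in the usage line are placeholders, not any provider's actual limits.

```python
from typing import Optional

# Assumptions: prompt calls scale linearly with DAU, and the last hour's
# ratio to the 24 h hourly average (65 vs 474/24) is a fair peak factor.
CALLS_PER_DAU_PER_DAY = 474 / 16   # ~29.6 /api/v2/prompt calls per user per day
PEAK_FACTOR = 65 / (474 / 24)      # ~3.3x the average hourly rate


def peak_calls_per_minute(dau: float, peak_factor: float = PEAK_FACTOR) -> float:
    """Projected peak /api/v2/prompt calls per minute for a given DAU."""
    return dau * CALLS_PER_DAU_PER_DAY / (24 * 60) * peak_factor


def max_dau_under_quota(rpm_quota: float,
                        tpm_quota: Optional[float] = None,
                        avg_tokens_per_call: Optional[float] = None,
                        peak_factor: float = PEAK_FACTOR) -> float:
    """Largest DAU whose projected peak fits the RPM (and optional TPM) quota."""
    per_user_rpm = peak_calls_per_minute(1, peak_factor)
    limit = rpm_quota / per_user_rpm
    if tpm_quota is not None and avg_tokens_per_call:
        # Token budget per minute divided by tokens each user consumes per minute
        limit = min(limit, tpm_quota / (per_user_rpm * avg_tokens_per_call))
    return limit
```

For example, `max_dau_under_quota(rpm_quota=1_000)` estimates the DAU a hypothetical 1,000 RPM limit would support; pass `tpm_quota` and an assumed `avg_tokens_per_call` to apply a token-per-minute limit as well.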

Log Retention: rotation and deletion confirmed

  • systemd journal (/etc/systemd/journald.conf.d/ditto.conf): SystemMaxUse=2G, SystemMaxFileSize=500M, MaxRetentionSec=30day, Compress=yes. The journal is currently at the 2 GB cap, and at present volume the oldest entry is only ~24 h old, so in practice the size cap, not MaxRetentionSec, bounds retention.
  • Caddy access logs (/opt/ditto/logs/): size-rotated (~6 MB gzipped per file), ~3–4 days of compressed history retained. Total 148 MB.
  • DB backups (/opt/ditto/backups/daily/): 9-day rolling window prod, 8-day staging. Offsite B2 tier: 30 days.

No log-growth risk.
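Because the journal is already at its size cap with only ~24 h of history, effective retention is set by SystemMaxUse rather than MaxRetentionSec. A minimal Python sketch of that relationship, assuming daily volume means on-disk journal growth and that journald size suffixes are base-1024; the helper names are illustrative.

```python
# Effective journald retention: whichever limit is hit first wins.
# Assumptions: daily volume = on-disk journal growth; size suffixes
# K/M/G/T are base-1024 as in journald.conf.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value: str) -> int:
    """Parse a journald-style size such as '2G' or '500M' into bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(float(value[:-1]) * UNITS[value[-1]])
    return int(value)


def effective_retention_days(system_max_use: str, daily_bytes: float,
                             max_retention_days: float) -> float:
    """Days of history kept: the smaller of the size-bound and time-bound limits."""
    size_bound_days = parse_size(system_max_use) / daily_bytes
    return min(size_bound_days, max_retention_days)


def daily_budget_bytes(system_max_use: str, target_days: float) -> float:
    """Daily journal volume that would let the size cap hold target_days."""
    return parse_size(system_max_use) / target_days
```

At ~2 GiB/day, `effective_retention_days("2G", 2 * 1024**3, 30)` returns 1.0; holding the full 30 days under the same cap would need volume of about `daily_budget_bytes("2G", 30)`, roughly 68 MiB/day.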

Minor observations

  • mcp GET handlers log duration: 125107 ms. That is the 125-second SSE keepalive timeout: expected, not a stall.
  • /prompt/status was polled 6,252 times in 1 h by 10 users (≈10/min/user). If traffic ever grows, the cheapest optimization is to replace polling with WebSocket or SSE push. Today the polling costs effectively nothing.
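The polling cost and the push alternative can be compared with back-of-the-envelope rates. A minimal Python sketch, assuming each active user polls at the observed ≈10.4 polls/min and that a push channel sends one message per status change; `EVENTS_PER_PROMPT` is a hypothetical count (for example queued, running, done), not a measured value.

```python
# Polling vs push load for /prompt/status, from the 1 h figures.
# Assumptions: each active user polls at the observed average rate;
# EVENTS_PER_PROMPT is a hypothetical number of status changes per prompt.
POLLS_PER_USER_PER_MIN = 6_252 / 60 / 10   # ~10.4 polls per user per minute
EVENTS_PER_PROMPT = 3                      # hypothetical, e.g. queued/running/done


def polling_req_per_s(active_users: float) -> float:
    """Status-poll requests per second if each active user polls at today's rate."""
    return active_users * POLLS_PER_USER_PER_MIN / 60


def push_msgs_per_s(prompts_per_hour: float,
                    events_per_prompt: int = EVENTS_PER_PROMPT) -> float:
    """Messages per second if the server pushes only on status changes."""
    return prompts_per_hour * events_per_prompt / 3600


if __name__ == "__main__":
    print(f"polling, last hour: {polling_req_per_s(10):.2f} req/s")
    print(f"push, last hour:    {push_msgs_per_s(65):.3f} msg/s")
```

Push still holds one open connection per active user, so the saving is in request volume, not connection count.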

Bottom line

This box has headroom for roughly 1,000× current load (~16,000 DAU). You will not outgrow the hardware for a long time; LLM provider quotas will bind first. Log retention is configured and working correctly.