The Place of Single-server Operations
When a service is in its early stage, it is often most reasonable to bring everything up on a single VPS with Docker Compose. Operational cognitive load is small and costs are predictable. This post looks at how far that model goes, where it hits its limits, and what the next step looks like, including k8s.
1. A Typical Single-server + Compose Layout
- One Linux host (Ubuntu, Debian, Rocky, and the like).
- Caddy or nginx terminating on 80 / 443.
- Application, DB, Redis, and queue coexist as containers on the same host.
- Logs and metrics go to the host or a limited external service.
- Regular backups go to a separate disk or object storage.
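A minimal sketch of that layout in Compose, assuming a hypothetical application image (myorg/app) and Caddy as the proxy; images, credentials, and volumes are placeholders to adapt:

```yaml
# compose.yaml -- illustrative single-host layout (image names and secrets are placeholders)
services:
  proxy:
    image: caddy:2
    ports: ["80:80", "443:443"]
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
    restart: unless-stopped

  app:
    image: myorg/app:latest          # placeholder application image
    environment:
      DATABASE_URL: postgres://app:secret@db:5432/app
      REDIS_URL: redis://redis:6379/0
    depends_on: [db, redis]
    restart: unless-stopped

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped

  redis:
    image: redis:7
    restart: unless-stopped

volumes:
  caddy_data:
  pgdata:
```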
Vertical scaling (a beefier machine) takes you a long way. A well-built single server has been reported to handle millions of requests per day (depending heavily on the kind of work).
2. Vertical vs Horizontal Scaling
- Vertical (scale up) — a bigger single machine. Simple. Limited by machine spec and a single failure domain.
- Horizontal (scale out) — distributed across multiple machines. Brings availability and scalability. The cost is operational complexity.
Vertical scaling needs almost no code change. Horizontal scaling brings a chain of decisions — shared state, sessions, cache, DB routing, log aggregation, and so on.
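To make the contrast concrete, here is a hypothetical Caddyfile fragment splitting traffic across two app hosts (the addresses are invented). The moment a second backend exists, the questions about sessions and shared state above become unavoidable:

```
# Caddyfile fragment -- horizontal split across two hypothetical app hosts
example.com {
    # requests are load-balanced across both upstreams;
    # sticky sessions or shared session storage now become your problem
    reverse_proxy 10.0.0.11:8080 10.0.0.12:8080
}
```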
3. Operations Tooling
- compose.yaml — service definitions plus restart: unless-stopped.
- systemd — auto-start Compose itself at boot (a unit like docker-compose@.service).
- Log rotation — Docker log driver options or logrotate.
- Backups — cron-driven pg_dump / pg_basebackup with object-storage upload.
- Monitoring — Prometheus + Grafana containers, or external SaaS (uptime, APM).
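For the log-rotation point above, per-service options on the json-file log driver are one way to cap growth; the sizes here are arbitrary examples:

```yaml
# compose.yaml fragment -- cap json-file logs per container (values are examples)
services:
  app:
    image: myorg/app:latest        # placeholder
    logging:
      driver: json-file
      options:
        max-size: "10m"            # rotate when a log file reaches 10 MB
        max-file: "3"              # keep at most 3 rotated files
```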
4. The Basics of Availability
Auto-restart — Compose's restart: unless-stopped, systemd's Restart=always.
Healthcheck — container healthcheck plus an external status monitor.
These two alone auto-recover most simple failures. The host itself dying is a separate story.
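A minimal sketch of both knobs in a Compose service, assuming the app exposes an HTTP health endpoint at /healthz and ships curl in its image (the path and the timings are made up):

```yaml
# compose.yaml fragment -- restart policy plus healthcheck (endpoint and timings are examples)
services:
  app:
    image: myorg/app:latest        # placeholder; assumes curl exists inside the image
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/healthz"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 20s
```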
5. Other Paths
| Tool | First release | Note |
|---|---|---|
| Docker Swarm | 2014–2015 | Docker's own orchestrator. Relatively simple. Active development has slowed. |
| Nomad (HashiCorp) | 2015 | Single binary. Handles non-container jobs too. Medium operational complexity. |
| Kubernetes | 2014 (Google) → CNCF 1.0 (2015) | The de facto standard for container orchestration. Huge ecosystem. |
| Nomad + Consul + Vault | — | A different combination of distributed-systems building blocks. |
It's a common observation that k8s is powerful but cognitively heavy. A small team running k8s itself can spend more time maintaining the cluster than on the actual product. Managed k8s (EKS / GKE / AKS) reduces that burden, but learning and operations costs remain.
6. Where k8s is Overkill
A single host is often enough in cases like:
- Small team (a few people) with few services (< 10).
- Traffic that one host can handle.
- Strict zero-downtime deploys aren't business-critical.
- Limited operations time.
Conversely, when these accumulate, a single host hits its limits:
- A single point of failure is a business risk.
- Traffic must be split across multiple machines.
- Multiple teams want independent deploys.
- Multi-region with latency-sensitive users.
7. Boot Automation Example
# /etc/systemd/system/myapp.service
[Unit]
Description=My App via Docker Compose
After=docker.service
Requires=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
[Install]
WantedBy=multi-user.target
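Enabling the unit is then the usual systemd flow:

```bash
# register and start the unit (standard systemd commands)
sudo systemctl daemon-reload
sudo systemctl enable --now myapp.service
```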
8. A Backup Job (PostgreSQL Example)
#!/usr/bin/env bash
# /etc/cron.daily/db-backup
set -euo pipefail
TS=$(date +%Y%m%d-%H%M)
docker exec db pg_dump -U postgres app | gzip > /backup/db-$TS.sql.gz
find /backup -name 'db-*.sql.gz' -mtime +14 -delete
# offsite upload
rclone copy /backup remote:backup/
What best reduces the risk of data loss isn't the backup itself but a restore rehearsal. Regularly running the procedure to restore on a test host is the safer path.
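A rehearsal can be as small as restoring the newest dump into a throwaway container and running a sanity query. A rough sketch, assuming the backup layout from the script above and that postgres:16 matches the production major version:

```bash
#!/usr/bin/env bash
# restore-rehearsal.sh -- sketch: restore the newest dump into a disposable Postgres container
set -euo pipefail

LATEST=$(ls -t /backup/db-*.sql.gz | head -n 1)

docker run -d --name restore-test -e POSTGRES_PASSWORD=test postgres:16
sleep 10                                   # crude wait; a pg_isready loop is more robust

docker exec restore-test createdb -U postgres app
gunzip -c "$LATEST" | docker exec -i restore-test psql -U postgres -d app
docker exec restore-test psql -U postgres -d app -c '\dt'   # sanity check: list restored tables

docker rm -f restore-test
```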
9. Single-host Zero-downtime Deploy
- Bring up a new container on a different port.
- Once healthchecks pass, switch the proxy (Caddy / nginx) upstream to the new port.
- Stop the old container.
Caddy's dynamic config or traefik's label-based discovery helps with this flow.
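A rough shape of that flow in shell, assuming Caddy runs directly on the host via its systemd service and the app serves HTTP on 8080 inside the container; container names, ports, the health path, and the Caddyfile location are all illustrative:

```bash
#!/usr/bin/env bash
# deploy.sh -- sketch of the port-switch flow (names, ports, and paths are examples)
set -euo pipefail

OLD_PORT=8081
NEW_PORT=8082

# 1. bring up the new version on a free host port (image name is a placeholder)
docker run -d --name app-new -p "${NEW_PORT}:8080" myorg/app:latest

# 2. wait until the new container answers its health endpoint
until curl -fsS "http://127.0.0.1:${NEW_PORT}/healthz" > /dev/null; do sleep 2; done

# 3. switch the Caddy upstream to the new port and reload
sed -i "s/127\.0\.0\.1:${OLD_PORT}/127.0.0.1:${NEW_PORT}/" /etc/caddy/Caddyfile
systemctl reload caddy

# 4. retire the old container and promote the new one
docker rm -f app-old
docker rename app-new app-old
```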
10. Common Pitfalls
Single-disk SPOF — without RAID, regular backups, and offsite copies, a single disk failure can wipe everything.
Downtime during host OS updates — kernel updates need a reboot. Pre-announce or migrate temporarily to another host.
Memory limits — as containers multiply, OOM gets frequent. Tune per-container mem_limit and the DB's shared_buffers.
Disk usage from logs and metrics — without log rotation and Prometheus retention settings, the disk fills and the service stops.
Ad-hoc changes on the production host — SSH-ing in, installing a package, then forgetting. Keep changes in code (Compose, Ansible) as much as possible.
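For the memory-limit point above, a Compose fragment with placeholder values; the right numbers depend entirely on the workload:

```yaml
# compose.yaml fragment -- per-container memory caps (values are placeholders)
services:
  app:
    image: myorg/app:latest
    mem_limit: 512m                # hard cap; the container is OOM-killed above this
  db:
    image: postgres:16
    mem_limit: 1g
    command: ["postgres", "-c", "shared_buffers=256MB"]   # keep in proportion to the cap
```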
Closing thoughts
A single server plus Compose is the most reasonable starting point for a small team. It lets 95% of the code you write be business logic, free of the k8s learning curve and the infrastructure operations burden. Once the limits start showing, that's the time to climb the horizontal-scaling steps one at a time (separate the DB → managed k8s), carefully.
Next
- local-https-mkcert
- cloud-emulator-stack
Refer to the Compose production guide, Docker restart policies, Kubernetes, Nomad, and the PostgreSQL backup docs.