SQL as the single source of truth
Where should the schema's truth live? ORM models, migration files, DDL SQL — there are several candidates. We compare the cumulative-migration model with the single-SQL-file + CREATE IF NOT EXISTS model.
1. The place of migration tools
Representative migration tools:
| Tool | First appeared | Language/ecosystem | Model |
|---|---|---|---|
| Liquibase | 2006 | JVM | XML/YAML/SQL changesets accumulated |
| Flyway | 2010 | JVM | Versioned SQL files accumulated (V1__init.sql) |
| Alembic | early 2010s | Python (Mike Bayer, SQLAlchemy) | Python scripts accumulated |
| Django migrations | 2014 (1.7) | Python/Django | Models → auto-generated + accumulated |
| Rails migrations | 2005 | Ruby | Ruby DSL accumulated |
| TypeORM migrations | 2016 | TS/JS | TS classes accumulated |
| Prisma Migrate | 2020 | TS/JS | Schema file + diff-based SQL generation |
| Drizzle Kit | 2023 | TS | Schema code + diff-based SQL generation |
These all share the same view of truth: "the order of accumulated changes." A meta table tracks which migrations have been applied to a DB (e.g. Flyway's flyway_schema_history).
The other model is the declarative single SQL file. All tables of a domain are written in one SQL file with CREATE TABLE IF NOT EXISTS · CREATE INDEX IF NOT EXISTS, and changes are made by editing the same file. Application is idempotent.
2. The cumulative ALTER model
db/migrations/
├── V1__init.sql # CREATE TABLE users(...)
├── V2__add_email.sql # ALTER TABLE users ADD COLUMN email TEXT
└── V3__index_email.sql # CREATE INDEX idx_users_email ...
Tools record hashes and versions to ensure each file is applied only once. The promise that any environment (dev/staging/prod) reaches the same result through the same order is the strength. The downside is that as files accumulate, bootstrapping a new environment takes longer, and to know a table's current shape we need to merge several files.
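The core loop of such a tool can be sketched in a few lines: read the versions already recorded in the meta table, compare against the files on disk, and apply only the remainder in order. This is a hypothetical, simplified sketch (real tools parse numeric versions and verify checksums; lexical sort happens to work for V1..V3 here).

```python
# Hypothetical sketch of a Flyway-style runner's decision step.
applied = {"V1", "V2"}  # normally read from flyway_schema_history
files = ["V1__init.sql", "V2__add_email.sql", "V3__index_email.sql"]

def pending(files, applied):
    """Return migration files whose version prefix is not yet applied, in order."""
    todo = []
    for f in sorted(files):                # real tools sort numerically, not lexically
        version = f.split("__", 1)[0]      # "V3__index_email.sql" -> "V3"
        if version not in applied:
            todo.append(f)
    return todo

print(pending(files, applied))  # ['V3__index_email.sql']
```

Everything returned by `pending` gets executed and then recorded in the meta table, which is exactly what makes re-runs safe in this model.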
3. Declarative single file + IF NOT EXISTS
CREATE TABLE IF NOT EXISTS users (
id BIGSERIAL PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);
Strengths:
- One file shows the current schema.
- Bootstrap is short (run the file once).
- Intent shows up directly in code review.
Limits:
- Column changes and deletions do not follow automatically; changes that need ALTER require a separate procedure.
- Cross-environment auto-sync is weak: there is no meta table to track changes already applied to a production DB.
- Data migrations (DML) do not fit in the same file.
4. Separating DDL and DML
DDL (schema) and DML (data) are commonly separated.
- DDL is easy to make idempotent (IF NOT EXISTS).
- DML is hard to make idempotent (the same INSERT run twice gives a different result), and its transaction/rollback policy differs.
- Data seeding usually lives in a separate script (seed.sql) or fixture.
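The DML side can still be made re-runnable with an explicit conflict clause. A minimal sketch with stdlib sqlite3 (the table and values are illustrative; PostgreSQL's ON CONFLICT behaves the same way):

```python
# Why seed DML needs ON CONFLICT: a plain INSERT duplicates or errors on the
# second run, while ON CONFLICT ... DO NOTHING stays idempotent.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE IF NOT EXISTS regions (code TEXT PRIMARY KEY, name TEXT)")

def seed(con):
    # Idempotent seed: safe to run on every deploy.
    con.execute(
        "INSERT INTO regions (code, name) VALUES ('KR', 'Korea') "
        "ON CONFLICT (code) DO NOTHING"
    )

seed(con)
seed(con)  # second run is a no-op, not a duplicate row or an error
count = con.execute("SELECT COUNT(*) FROM regions").fetchone()[0]
print(count)  # 1
```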
5. Tradeoffs of migration tools
- Reproducibility — the cumulative model wins. Any environment at any point in time can be brought to the same state.
- Rollback — most cumulative tools support down migrations, but in production a "no down, only forward" policy is common. After data has already been mutated, down is not safe.
- Multi-environment sync — meta-table tracking is the strength.
- Operational complexity — needs tool, CLI, and CI integration.
6. The diff-generation model
Tools like Prisma · Drizzle · Atlas look at the difference between the desired state (schema file) and the current DB and generate SQL. This combines the convenience of the declarative model with part of the cumulative model's tracking. The auto-generated ALTERs still need review.
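At its core, diff generation is a set comparison between desired and current columns, emitting ALTERs for review. A minimal sketch (table and column names are hypothetical; real tools also diff types, constraints, and indexes):

```python
# Sketch of the diff step: desired state (from the schema file) vs. current
# state (from the live DB) -> candidate ALTER statements.
desired = {"id": "BIGINT", "email": "TEXT", "created_at": "TIMESTAMPTZ"}
current = {"id": "BIGINT", "email": "TEXT"}

def diff_to_alters(table, desired, current):
    stmts = []
    for col, typ in desired.items():
        if col not in current:
            stmts.append(f"ALTER TABLE {table} ADD COLUMN {col} {typ};")
    for col in current:
        if col not in desired:
            # Destructive: emitted as a comment so a human must opt in.
            stmts.append(f"-- review: ALTER TABLE {table} DROP COLUMN {col};")
    return stmts

print(diff_to_alters("users", desired, current))
# ['ALTER TABLE users ADD COLUMN created_at TIMESTAMPTZ;']
```

Emitting destructive changes as comments mirrors the point above: the generated SQL is a proposal, not truth.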
7. Where each model fits
Where the declarative single file + IF NOT EXISTS fits:
- Small services in early stages.
- Projects where the schema changes often but environment counts are small.
- "Regenerable" data such as caches and aggregate tables for public data or external APIs.
Where cumulative migrations fit:
- Many production environments (blue/green, region split).
- Places where data loss is a corporate risk.
- Several teams editing the same schema concurrently.
In larger organizations, a compromise is also common: production schemas are managed with cumulative migrations, while the initial SQL for a new domain starts declarative and moves to the migration tool once it stabilizes.
8. Common pitfalls
Silence of IF NOT EXISTS — if a table already exists, a new definition is ignored even if the column definitions differ. To catch definition drift, separate verification (a schema comparison tool) is needed.
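The silence is easy to demonstrate. A sketch with stdlib sqlite3 (PostgreSQL behaves the same way, emitting only a NOTICE):

```python
# The second CREATE TABLE IF NOT EXISTS is skipped entirely, even though its
# column list differs from what already exists.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY)")
# Later, someone adds a column to the SSOT file and re-runs it:
con.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)")

cols = [row[1] for row in con.execute("PRAGMA table_info(users)")]
print(cols)  # ['id'] — the email column was silently never created
```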
Index name collisions — in PostgreSQL, index names must be unique within the schema. The combination of IF NOT EXISTS and a same-name policy can leave behind an index that differs from intent.
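The same silence applies to indexes, since IF NOT EXISTS checks only the name, not the definition. A sqlite3 sketch (index names are also schema-unique there):

```python
# An index with the right name but the wrong definition survives unnoticed.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
con.execute("CREATE INDEX IF NOT EXISTS idx_users_email ON users(name)")   # wrong column
con.execute("CREATE INDEX IF NOT EXISTS idx_users_email ON users(email)")  # silently skipped

(sql,) = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'idx_users_email'"
).fetchone()
print("users(name)" in sql, "users(email)" in sql)  # True False
```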
Corruption of a migration tool's meta table — manual edits to flyway_schema_history are dangerous.
Mixing DML with DDL migrations — lock-hold time and transaction semantics differ and can lead to operational incidents.
9. A case — column-level drift
In the declarative single-file model, the most frightening state is "the table name matches but the columns are off." One project nearly let this exact mistake reach production.
The crawler code called INSERT INTO stores (store_id, name, address, tel, region, area, last_crawled_at, ...) while the SSOT for that domain was stores (store_id, store_nm, addr, tel_no, region_cd, region_nm, area_cd, area_nm, updated_at, ...). Even the table names differed, and 6 column mappings were misaligned. It went unnoticed because the route is rarely called in production.
There are two ways to catch this.
① Table-level comparison
Diff between SSOT CREATE TABLE names and pg_tables names
→ Even matching 73:73 counts is not enough (definition diffs are not caught)
② Column-level comparison
Regex-extract columns from raw SQL in code → diff against information_schema.columns
→ Each mismatch is a latent BUG
The second approach is simple but powerful. One project's scripts/sql_column_audit.py is about 200 lines and does:
- Regex out INSERT/UPDATE/SELECT statements and normalize column tokens.
- One query against information_schema.columns to build a {table: {col1, col2, ...}} map.
- For each SQL statement, print any used column not in the map.
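The core of that audit fits in a few lines. A minimal sketch (the real scripts/sql_column_audit.py is ~200 lines; here the live-schema map is hard-coded instead of queried from information_schema.columns, and only INSERT is handled):

```python
# Column-level audit, reduced to its essence: regex the columns out of SQL
# found in code, then diff against the live schema map.
import re

# {table: {columns}} — in production, built from one query against
# information_schema.columns.
schema = {"stores": {"store_id", "store_nm", "addr", "tel_no"}}

sql_in_code = "INSERT INTO stores (store_id, name, address, tel) VALUES (%s, %s, %s, %s)"

def audit(sql, schema):
    """Return columns used in an INSERT that the live table does not have."""
    m = re.search(r"INSERT\s+INTO\s+(\w+)\s*\(([^)]*)\)", sql, re.IGNORECASE)
    if not m:
        return []
    table = m.group(1)
    cols = [c.strip() for c in m.group(2).split(",")]
    known = schema.get(table, set())
    return [c for c in cols if c not in known]

print(audit(sql_in_code, schema))  # ['name', 'address', 'tel'] — each a latent BUG
```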
This single tool caught the product_crawler's phantom table and the order_tracking's undefined table at the same time. This is a kind of drift the cumulative ALTER model would not catch either — when environments where an ALTER ran coexist with environments where it did not, the migration meta table cannot guarantee truth.
The core lesson is this. The declarative SSOT model does not guarantee consistency by definition alone. Code and SSOT are two strands that both claim to be truth, and the moment they diverge, which one is truth depends on who checks. A column-level automated audit is what makes the SSOT mean anything.
10. A case — collision between FK constraints and TRUNCATE
In the declarative SSOT model, adding a new FK is common. A few days after one project added the product_prices.store_id → stores.store_id FK, we found that the data_crawler normalization had been silently failing every hour.
ERROR: cannot truncate a table referenced in a foreign key constraint
DETAIL: Table "product_prices" references "stores".
HINT: Truncate table "product_prices" at the same time, or use TRUNCATE ... CASCADE.
refresh_normalized_tables() had been written as TRUNCATE + INSERT to recompute stores from 1.1M raw_items. The pattern broke the moment the FK was introduced. The POST response still returned 200, but the normalization result froze on yesterday's date — the BUG is invisible without checking the error log.
Three fix candidates:
① TRUNCATE ... CASCADE
Wipes children too → unsuitable for accumulating data like product_prices
② Make FK NULLABLE + SET NULL
Preserves child rows / loses parent reference meaning
③ DELETE WHERE NOT IN(current keys) + INSERT ON CONFLICT DO UPDATE
Cleans only parents that disappeared (children clean up via ON DELETE CASCADE)
Surviving parents UPSERT — FK respected
The third is two lines of single SQL and preserves accumulated child data.
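Option ③ can be sketched end to end with stdlib sqlite3. The table and column names follow the case above; ON DELETE CASCADE on the FK is assumed, and PostgreSQL's ON CONFLICT upsert syntax works the same way:

```python
# Refresh the parent table without TRUNCATE: delete only parents that
# disappeared (children follow via ON DELETE CASCADE), upsert the rest.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # sqlite needs this to enforce FKs
con.execute("CREATE TABLE stores (store_id TEXT PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE product_prices (
    id INTEGER PRIMARY KEY,
    store_id TEXT REFERENCES stores(store_id) ON DELETE CASCADE,
    price INTEGER)""")
con.execute("INSERT INTO stores VALUES ('A', 'Store A'), ('B', 'Store B')")
con.execute("INSERT INTO product_prices (store_id, price) VALUES ('A', 100), ('B', 200)")

# Recomputed current key set no longer contains 'B':
current = [("A", "Store A v2")]
placeholders = ",".join("?" * len(current))
con.execute(
    f"DELETE FROM stores WHERE store_id NOT IN ({placeholders})",
    [k for k, _ in current],
)
con.executemany(
    "INSERT INTO stores VALUES (?, ?) "
    "ON CONFLICT (store_id) DO UPDATE SET name = excluded.name",
    current,
)

rows = con.execute("SELECT store_id, price FROM product_prices").fetchall()
print(rows)  # [('A', 100)] — B's child rows cascaded away, A's survived
```

No TRUNCATE is involved, so the FK never blocks the refresh, and accumulated child data for surviving parents is untouched.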
Lesson — the moment a master-child relationship is created, any TRUNCATE of the master becomes a BUG waiting to fire. Whenever a FK is added to the SSOT, search the codebase for "who wipes the master?".
Closing thoughts
The declarative SSOT model is the simplest starting point for a small team. But by definition alone it is not safe. Putting a single column-audit tool next to it cuts down operational incidents significantly.
Next
- api-handler-pattern
- jobs-apscheduler
See Flyway · Liquibase · Alembic · Prisma Migrate · Drizzle Kit · Atlas · PostgreSQL DDL.