Real-world vitest · pytest infrastructure

Test infrastructure isn't built in one shot. The project described here also started with no tests, and brought in infrastructure one area at a time only after regressions had broken production a few times. This post records the shape and intent of the vitest (admin) and pytest (python-backend) infrastructure added on 2026-04-26 and 2026-04-27.

1. What lives where

Service	Library	Location	Run
frontend/web-app	vitest (4.1.5)	`vitest.config.ts` at root + `src/**/*.test.ts`	`pnpm vitest run`
frontend/admin	vitest (4.1.5) — added 2026-04-27 (2026-05-01: 9 files / 44 tests)	same as above	`pnpm test`
frontend/cms-app	vitest (4.1.5) — added 2026-05-01 (3 files / 45 tests: cms·metadata·markdown)	same as above (`environment: node`)	`pnpm test`
frontend/food-app	vitest (4.1.5) — added 2026-04-25 (6 files / 29 tests: sort·food·useFoodStore·sortStore·sourceStore·exportFoods)	same as above + Tauri mock pattern	`pnpm test`
frontend/language-app	vitest (4.1.5) + jsdom — added 2026-05-01 (1 file / 15 tests: utils·logger console spy)	same as above (`environment: jsdom`)	`pnpm test`
backend/python-backend	pytest (9.x) — added 2026-04-27	`pyproject.toml [dependency-groups].dev` + `tests/`	`uv sync --group dev && uv run pytest tests/`

playwright e2e-dev is separate (each frontend's playwright.dev.config.ts). vitest's exclude lists **/tests/e2e-dev/** explicitly.

2. Shape of the pytest setup

pyproject.toml:

[dependency-groups]
dev = [
    "pytest>=8.3.0",
    "pytest-asyncio>=0.24.0",
    "pytest-mock>=3.14.0",
    "pytest-httpx>=0.30.0",
]

[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
addopts = "-v --tb=short"

tests/conftest.py:

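import pytest
from unittest.mock import MagicMock

@pytest.fixture
def mock_db(monkeypatch):
    # Stand-in DB whose query methods return canned values.
    db = MagicMock()
    db.fetch_all.return_value = []
    db.fetch_one_and_commit.return_value = (1,)
    # Patch the name the scheduler module imported; accept any call signature.
    monkeypatch.setattr("crawlers.product_scheduler.get_db", lambda *_a, **_kw: db)
    return db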

The monkeypatch target is the name as imported into the calling module. crawlers.product_scheduler runs from db_connection import get_db, which binds its own reference to the function, so we replace crawlers.product_scheduler.get_db rather than db_connection.get_db itself. Routers follow the same rule: patch routers.web_app.product.get_db.
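The rule is easier to see in isolation. The sketch below is a minimal, self-contained illustration that uses unittest.mock.patch instead of the monkeypatch fixture. The scheduler_demo module and its run() function are hypothetical stand-ins for crawlers.product_scheduler, and db_connection here is a fake module built in memory, not the project's real one.

```python
import sys
import types
from unittest.mock import MagicMock, patch

# Fake "db_connection" module (assumption: stands in for the real one).
db_connection = types.ModuleType("db_connection")
db_connection.get_db = lambda name: "real-db"
sys.modules["db_connection"] = db_connection

# Hypothetical caller module that does `from db_connection import get_db`.
scheduler = types.ModuleType("scheduler_demo")
exec(
    "from db_connection import get_db\n"
    "def run():\n"
    "    return get_db('main')\n",
    scheduler.__dict__,
)
sys.modules["scheduler_demo"] = scheduler

fake_db = MagicMock(name="fake_db")

# Patching the defining module does not reach the caller's own reference.
with patch("db_connection.get_db", lambda *_a, **_kw: fake_db):
    patched_at_source = scheduler.run()

# Patching the name inside the calling module does.
with patch("scheduler_demo.get_db", lambda *_a, **_kw: fake_db):
    patched_at_caller = scheduler.run()
```

Here patched_at_source is still the real value, while patched_at_caller is the mock. The same lookup rule applies to monkeypatch.setattr with a dotted string target.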

3. The vitest hoisting trap

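vitest hoists vi.mock() calls to the top of the file, so the mock factory can run before a const declared above it has been initialized. The following code therefore throws a ReferenceError.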

const mockQuery = vi.fn();              // evaluated after the hoisted vi.mock
vi.mock("@/lib/db", () => ({ pool: { query: mockQuery } }));

vi.hoisted() is lifted together with vi.mock(), so mockQuery already exists when the factory runs.

const { mockQuery } = vi.hoisted(() => ({ mockQuery: vi.fn() }));
vi.mock("@/lib/db", () => ({ pool: { query: mockQuery } }));

This pattern shows up in admin's audit.test.ts and points/actions.test.ts. Server actions almost always need to replace exports of the host module, so vi.hoisted() is effectively required there.

4. What we picked as test targets

The new infrastructure was filled in according to these priorities.

1. Regression-as-test: keep already-discovered BUGs from coming back.

test_product_scheduler.py: BUG #5 regression (phantom stores table). Asserts that stores and region_nm appear in the SQL string.
test_product_crawler.py: BUG #6 to #8 regression (column mapping + missing NOT NULL region_cd). Inspects the INSERT columns and the parameter tuple as-is.
audit.test.ts: BUG #3 regression (actor is null when the request is missing). Verifies the next/headers cookies() fallback path.

2. Specification-as-test: places where the external promise (response keys, validation rules) must not break.

test_product_router.py: /api/product/findcode response keys (name · region · area).
points/actions.test.ts: updateBalance rejects amount=0 and caps reason at 30 chars.

3. Unit utilities: pure functions with clear input → output.

common.test.ts: formatPrice · formatDateTime · sanitizeText · mention regex.
i18n/sync.test.ts: the key-set difference between ko.json and en.json is empty.
cms-app/markdown.test.ts: generateSlug · addHeadingAnchors · highlightCode (language aliases · unsupported languages keep the original text · HTML entity restoration) · renderMarkdown (GFM table · XSS sanitize). 18 tests.
language-app/utils.test.ts: cn · truncateText · shuffleArray (non-destructive Fisher–Yates) · logger (console.log/warn/error/debug spies + DEV guard). 15 tests.

4. External integration helpers (env mock + global stub): wrappers around side effects like fetch/Tauri.

admin/blog-revalidate.test.ts: vi.stubEnv for BLOG_REVALIDATE_URL·SECRET branching + vi.stubGlobal('fetch', vi.fn()) for 200/401/network failure. 6 tests.
food-app/exportFoods.test.ts: mocks for @tauri-apps/plugin-dialog.save · plugin-fs.writeTextFile · sonner.toast (4 mocks). Covers user cancel (save → null) · normal export · error branching. 5 tests.

Filling just these three or four buckets took the suite to 89/89 passing in less than half a day. Cumulative total as of 2026-05-01: admin 44 + cms-app 45 + food-app 29 + language-app 15 + web-app 26+ = 159+ tests.
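The regression-as-test bucket mostly comes down to inspecting what was handed to the mocked DB. The following is a minimal sketch of that idea, not the project's actual test: save_product, the products table, and the product_cd / product_nm columns are hypothetical, and only region_cd comes from the bug description above.

```python
from unittest.mock import MagicMock


def save_product(db, product):
    """Hypothetical crawler step: one INSERT with a parameter tuple."""
    sql = (
        "INSERT INTO products (product_cd, product_nm, region_cd) "
        "VALUES (%s, %s, %s)"
    )
    db.execute_query(sql, (product["code"], product["name"], product["region"]))


db = MagicMock()
save_product(db, {"code": "P-1", "name": "Sample", "region": "11"})

# call_args.args holds the positional arguments of the most recent call.
sql, params = db.execute_query.call_args.args

# Regression checks: the NOT NULL column is in the INSERT, and the
# parameter tuple has one value per placeholder, in column order.
assert "region_cd" in sql
assert len(params) == sql.count("%s")
assert params == ("P-1", "Sample", "11")
```

Assertions on the SQL string are brittle by design: they fail as soon as someone drops the column again, which is exactly the regression being guarded against.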

5. What we deliberately left out

Container integration tests: mocks are enough for stage 1. Tests that need a real DB are deferred until testcontainers is in place.

E2E UI scenarios: this is playwright e2e-dev's territory. It runs separately from vitest/pytest.

APScheduler behavior itself: sidestepped with a fixture that disables lifespan. Cron is isolated from the dev DB, so we add no separate verification.

6. How a regression gets caught — one example

When BUG #5 was first found, only crawlers/product_scheduler.py got fixed. Days later, the same bug (the stores table) was still living in crawlers/product_crawler.py. At that point we added two more things.

1. Added SQL string verification in tests/test_product_crawler.py
   assert "stores" in sql

2. Created scripts/sql_column_audit.py
   Extract raw SQL from routers/+crawlers/ → diff against information_schema.columns

The latter caught two more phantom tables (order_tracking_urls and order_tracking_history). One line of regression automation led to the discovery of further bugs.
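The audit script itself is not shown in this post. As a rough sketch of the idea, assuming a simplified table-level check instead of the column-level diff against information_schema.columns, something like the following would flag phantom tables. The regex, the function names, and the hard-coded known_tables set are all assumptions for illustration; a real run would load the known names from the database.

```python
import re

# Simplified: picks the identifier after FROM / JOIN / INTO / UPDATE.
# It ignores quoted names, schemas and subqueries, and would misread
# expressions such as EXTRACT(year FROM col).
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][A-Za-z0-9_]*)",
    re.IGNORECASE,
)


def referenced_tables(sql: str) -> set[str]:
    """Return the lower-cased table names a raw SQL string refers to."""
    return {name.lower() for name in TABLE_REF.findall(sql)}


def phantom_tables(sql_statements, known_tables) -> set[str]:
    """Tables referenced in SQL that are missing from the known schema."""
    known = {t.lower() for t in known_tables}
    found = set()
    for sql in sql_statements:
        found |= referenced_tables(sql)
    return found - known


statements = [
    "SELECT s.name FROM stores s JOIN regions r ON r.id = s.region_id",
    "INSERT INTO order_tracking_urls (order_id, url) VALUES (%s, %s)",
]
missing = phantom_tables(statements, known_tables={"stores", "regions"})
```

In this example missing contains only order_tracking_urls. The value of such a script is less in the regex than in running it over every router and crawler at once, so that a bug fixed in one file is also found in its siblings.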

7. What to touch next

TestClient + pgvector: integration tests for the vector search router. They are only meaningful when LM Studio is running in dev, so a separate fixture flag is needed.
playwright e2e-dev → CI: currently depends on the host dev compose. CI needs docker-in-docker or dedicated service containers.
Benchmark / load tests: whether the rate limiter thresholds themselves are appropriate. The limits enforced through slowapi are better exercised by load tests than by unit tests.
desktop-app backend JUnit 5: Spring Boot's MessageService · MessageRepository · MessageCleanupScheduler still lack unit tests. @DataJpaTest + Testcontainers postgres would be valuable for blocking regressions in the native query (findRecentThreads).
food-app/language-app component tests: introduce @testing-library/react + jsdom and start with simple display components like ItemCard · HistoryList, as a defense against lifecycle and event handler regressions.
mutation testing: Stryker · pytest-mutpy. The goal is to measure whether the current pass rate reflects actual defect detection. A mutation score below 50% across the 159 tests would signal many ineffective assertions.
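To make the mutation score in the last item concrete, here is a hand-rolled sketch of what a mutation tool automates. Everything in it is hypothetical: cap_reason is an invented helper loosely modeled on the 30-character reason cap, and the single off-by-one mutant is written by hand rather than generated.

```python
def cap_reason(reason: str, limit: int = 30) -> str:
    """Hypothetical helper: truncate a reason string to `limit` characters."""
    return reason[:limit]


def cap_reason_mutant(reason: str, limit: int = 30) -> str:
    """Hand-written mutant: off-by-one in the slice bound."""
    return reason[: limit + 1]


def weak_test(fn) -> bool:
    # Only checks the return type, so it cannot tell the mutant apart.
    return isinstance(fn("x" * 40), str)


def strong_test(fn) -> bool:
    # Pins the boundary, so the off-by-one mutant fails.
    return len(fn("x" * 40)) == 30


def mutation_score(test, original, mutants) -> float:
    """Fraction of mutants that make a test fail (killed / total)."""
    assert test(original), "the test must pass on the unmutated code"
    killed = sum(1 for mutant in mutants if not test(mutant))
    return killed / len(mutants)


weak_score = mutation_score(weak_test, cap_reason, [cap_reason_mutant])
strong_score = mutation_score(strong_test, cap_reason, [cap_reason_mutant])
```

Both tests pass on the original code, so a plain pass rate cannot distinguish them. Only the strong test kills the mutant, which is the kind of difference a mutation score is meant to surface.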

Closing thoughts

Test infrastructure isn't a one-shot job. Once one area settles, the next area's shape comes into view. Filling spots where regressions hit twice produces the most value in the shortest time.

warragon rounds 6–9 case studies (2026-05-04)

After the 159-test vitest snapshot, rounds 6–9 grew the suite to 1,226 tests (frontend vitest 439 + e2e 334 + Java @Test 217 + Python pytest 197 + MCP 39). The infrastructure decisions evolved as follows:

testcontainers compile-only gate — CI cost of 30 ControllerTests in one round

Round 8 added 30 da2ari-api ControllerTests (R8-A1 through A4). All of them inherit AbstractIntegrationTest (PG 17 + 21 supabase migrations; about 5 minutes of boot time if all run). Booting 30 containers per PR is unrealistic, so the PR gate is ./gradlew :da2ari-api:compileTestJava (compile only), and full execution is deferred to nightly CI or local environments.

import static org.junit.jupiter.api.Assertions.assertTrue;

// MockMvc smoke rule: passes on 200/401/4xx without seed data, fails on any 5xx
private void assertRouted(int s) {
    assertTrue(s >= 200 && s < 500, "routing abnormal status=" + s);
}

The 5xx-block rule is the only PROD gate. 200-payload validation goes into a separate round (R7-B1) and only for read-only public endpoints.

pytest monkeypatch — APScheduler 17-job idempotency simulation

Round 6-A3's tests/test_scheduler_jobs.py mocks DB calls and validates idempotency for 17 jobs in one file:

from unittest.mock import MagicMock

def _mock_db(monkeypatch, module_path: str, fetch_all_default=None):
    # Replace get_db in the job module with a factory returning one shared mock.
    db = MagicMock()
    db.fetch_all.return_value = fetch_all_default or []
    db.execute_query.return_value = True
    monkeypatch.setattr(f"{module_path}.get_db", lambda *_a, **_kw: db)
    return db

def test_price_alerts_two_calls_db_interaction_exactly_doubled(monkeypatch):
    db = _mock_db(monkeypatch, "schedulers.price_alert_checker", fetch_all_default=[])
    # Import after patching so the job resolves the mocked get_db.
    from schedulers.price_alert_checker import check_price_alerts
    check_price_alerts()
    first = db.fetch_all.call_count
    assert first > 0  # guard against a vacuous 0 == 0 pass
    check_price_alerts()
    assert db.fetch_all.call_count == first * 2  # blocks regression of WHERE NOT EXISTS

The UPSERT / ON CONFLICT / WHERE NOT EXISTS paths are validated through mock call counts: a second run must issue exactly the same DB interactions as the first. This checks idempotency without a real DB.
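The same check can be tried without the project's modules. The sketch below is an assumption-laden miniature: sync_alerts, the alerts and alert_hits tables, and the SQL text are all hypothetical, and the job takes the DB as an argument instead of calling get_db so that the example stays self-contained.

```python
from unittest.mock import MagicMock


def sync_alerts(db):
    """Hypothetical job: read active alerts, record a hit for each one."""
    rows = db.fetch_all("SELECT id, price FROM alerts WHERE active")
    for row in rows:
        db.execute_query(
            "INSERT INTO alert_hits (alert_id) VALUES (%s) "
            "ON CONFLICT (alert_id) DO NOTHING",
            (row["id"],),
        )


db = MagicMock()
db.fetch_all.return_value = [{"id": 1, "price": 100}, {"id": 2, "price": 250}]

sync_alerts(db)
first_calls = list(db.mock_calls)  # snapshot of every call in run 1

sync_alerts(db)
second_calls = list(db.mock_calls[len(first_calls):])  # calls made by run 2 only
```

If the second run records the same calls as the first, the job does not branch on hidden state between runs. Whether the database then treats the repeated INSERT as a no-op still depends on the ON CONFLICT clause, which a mock cannot execute, so this is a call-level check rather than proof of idempotency at the data level.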

sed → tsc cycle — 51-file bulk migration SOP

Round 8-D-jwks verifyJwt → verifyJwtAsync migration (commit aa91c142):

1. sed: bulk function rename plus await injection
2. tsc --noEmit: catches top-level-await errors and Promise vs JwtPayload mismatches
3. fix: the 3 files tsc flagged need sync → async signature changes (await propagates to callers)
4. re-run tsc, then validate with vitest

This cycle migrated 134 call sites without errors. The compiler serves as the first line of regression detection.
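The actual sed command is not recorded here. As a sketch of what the rename-and-inject step does, the following Python version applies two regular expressions to source text. Both patterns and the migrate function are assumptions for illustration, and they deliberately stay naive: a function declaration named verifyJwt would also get an await prefix, which is the kind of mistake the tsc pass is there to catch.

```python
import re

# Bare call sites: an optional existing `await`, then `verifyJwt(`.
# The lookbehind skips method calls such as `auth.verifyJwt(`.
CALL = re.compile(r"(?<![\w.])(?:await\s+)?verifyJwt\(")

# Remaining bare references (imports, re-exports). `verifyJwtAsync`
# does not match because there is no word boundary after `verifyJwt`.
NAME = re.compile(r"(?<![\w.])verifyJwt\b")


def migrate(source: str) -> str:
    """Rename verifyJwt to verifyJwtAsync and put `await` before each call."""
    source = CALL.sub("await verifyJwtAsync(", source)
    return NAME.sub("verifyJwtAsync", source)


before = (
    'import { verifyJwt } from "@/lib/jwt";\n'
    "const payload = verifyJwt(token);\n"
    "const again = await verifyJwt(other);\n"
)
after = migrate(before)
```

Running migrate a second time leaves the text unchanged, which matters when a bulk rewrite is re-applied after partial manual fixes.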

Reproducible 1,226-test counts via grep

From round 6 the cumulative counts are always reproducible by grep:

# frontend vitest
find frontend/{da2ari,admin,pryzeet,dmddksl}/src -name '*.test.ts*' \
  -not -path '*/node_modules/*' | xargs grep -hE '^\s*(it|test)\(' | wc -l

# Java @Test
find backend/java-backend -name '*Test.java' -path '*/src/test/*' \
  | xargs grep -hc '@Test' | awk '{s+=$1} END {print s}'

# Python pytest
cd backend/python-backend && uv run pytest --collect-only -q | tail -3

A gap between the declared and the measured count is a regression signal. Rounds 1–5 declared 585 tests while round 6 measured 1,011; the gap was explained by (a) the MCP manuals split, (b) per-round additions that had been missed, and (c) double-counting that was fixed.
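For environments without GNU find and grep, the frontend count can be approximated in Python. This is a sketch under the same assumptions as the grep line: it counts lines that start with it( or test( and therefore misses variants such as it.each( or test.skip(. The function name count_vitest_cases is hypothetical.

```python
import re
from pathlib import Path

# Same shape as the grep pattern: optional indentation, then it( or test(.
TEST_CALL = re.compile(r"^[ \t]*(?:it|test)\(", re.MULTILINE)


def count_vitest_cases(root) -> int:
    """Count it(/test( lines in *.test.ts* files under root, skipping node_modules."""
    total = 0
    for path in Path(root).rglob("*.test.ts*"):
        if "node_modules" in path.parts or not path.is_file():
            continue
        total += len(TEST_CALL.findall(path.read_text(encoding="utf-8")))
    return total
```

Calling count_vitest_cases("frontend/admin/src") would then give a number to compare against the declared count. Like the grep version, it measures source lines rather than collected tests, so parameterized cases are undercounted.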

See the official Vitest and pytest documentation, plus pytest-asyncio and pytest-mock.
