Real-world vitest · pytest infrastructure
Test infrastructure isn't built in one shot. One project also started with no tests, and brought in infrastructure one area at a time only after regressions had broken production a few times. This post records the shape and intent of the vitest (admin) and pytest (python-backend) infrastructure added between 2026-04-26 and 2026-04-27.
1. What lives where
| Service | Library | Location | Run |
|---|---|---|---|
| frontend/web-app | vitest (4.1.5) | vitest.config.ts at root + src/**/*.test.ts | pnpm vitest run |
| frontend/admin | vitest (4.1.5), added 2026-04-27 (2026-05-01: 9 files / 44 tests) | same as above | pnpm test |
| frontend/cms-app | vitest (4.1.5), added 2026-05-01 (3 files / 45 tests: cms·metadata·markdown) | same as above (environment: node) | pnpm test |
| frontend/food-app | vitest (4.1.5), added 2026-04-25 (6 files / 29 tests: sort·food·useFoodStore·sortStore·sourceStore·exportFoods) | same as above + Tauri mock pattern | pnpm test |
| frontend/language-app | vitest (4.1.5) + jsdom, added 2026-05-01 (1 file / 15 tests: utils·logger console spy) | same as above (environment: jsdom) | pnpm test |
| backend/python-backend | pytest (9.x), added 2026-04-27 | pyproject.toml [dependency-groups].dev + tests/ | uv sync --group dev && uv run pytest tests/ |
playwright e2e-dev is separate (each frontend's playwright.dev.config.ts). vitest's exclude lists **/tests/e2e-dev/** explicitly.
2. Shape of the pytest setup
pyproject.toml:
[dependency-groups]
dev = [
    "pytest>=8.3.0",
    "pytest-asyncio>=0.24.0",
    "pytest-mock>=3.14.0",
    "pytest-httpx>=0.30.0",
]

[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
addopts = "-v --tb=short"
tests/conftest.py:
import pytest
from unittest.mock import MagicMock


@pytest.fixture
def mock_db(monkeypatch):
    # Stand-in DB handle with canned return values.
    db = MagicMock()
    db.fetch_all.return_value = []
    db.fetch_one_and_commit.return_value = (1,)
    # Patch the name as imported by the calling module, not db_connection.get_db.
    monkeypatch.setattr("crawlers.product_scheduler.get_db", lambda _: db)
    return db
The monkeypatch target is the import name in the calling module. We replace what crawlers.product_scheduler imported via from db_connection import get_db, not db_connection.get_db itself. Routers follow the same rule — patch routers.web_app.product.get_db.
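To see why the patch target matters, here is a minimal self-contained sketch. The demo_scheduler module and its run function are hypothetical stand-ins built in memory, and unittest.mock.patch is used instead of pytest's monkeypatch only so the snippet runs without pytest; the name lookup rule is the same.

```python
import sys
import types
from unittest.mock import MagicMock, patch

# In-memory stand-in for db_connection.py (the real module is not shown in this post).
db_connection = types.ModuleType("db_connection")
exec("def get_db(name):\n    return 'real-db'\n", db_connection.__dict__)
sys.modules["db_connection"] = db_connection

# Hypothetical caller that binds get_db into its own namespace at import time.
demo_scheduler = types.ModuleType("demo_scheduler")
sys.modules["demo_scheduler"] = demo_scheduler
exec(
    "from db_connection import get_db\n"
    "def run():\n"
    "    return get_db('main')\n",
    demo_scheduler.__dict__,
)

fake = MagicMock(name="fake_db")

# Patching the defining module leaves the caller's own reference untouched.
with patch("db_connection.get_db", lambda _: fake):
    patched_at_source = demo_scheduler.run()

# Patching the name inside the calling module is what takes effect.
with patch("demo_scheduler.get_db", lambda _: fake):
    patched_at_caller = demo_scheduler.run()

print(patched_at_source)          # still the real return value
print(patched_at_caller is fake)  # the mock was used
```

If the caller had written import db_connection and called db_connection.get_db(...) instead, patching db_connection.get_db would work, because the attribute would then be looked up at call time.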
3. The vitest hoisting trap
vitest hoists vi.mock() to the top of the file. So this code throws ReferenceError.
const mockQuery = vi.fn(); // evaluated after the hoisted vi.mock
vi.mock("@/lib/db", () => ({ pool: { query: mockQuery } }));
vi.hoisted() lifts it together and resolves it.
const { mockQuery } = vi.hoisted(() => ({ mockQuery: vi.fn() }));
vi.mock("@/lib/db", () => ({ pool: { query: mockQuery } }));
This pattern shows up in admin's audit.test.ts and points/actions.test.ts. Server actions almost always need to replace exports of the host module, so hoisted is essential.
4. What we picked as test targets
The new infrastructure was filled in by these priorities.
1. Regression-as-test: keep already-discovered BUGs from coming back.
   - test_product_scheduler.py: BUG #5 regression (phantom stores table). Asserts that stores and region_nm appear in the SQL string.
   - test_product_crawler.py: BUG #6~#8 regression (column mapping + missing NOT NULL region_cd). Inspects the INSERT columns and the parameter tuple as-is.
   - audit.test.ts: BUG #3 regression (actor null when request is missing). Verifies the next/headers cookies() fallback path.
2. Specification-as-test: places where the external promise (response keys, validation rules) must not break.
   - test_product_router.py: /api/product/findcode response keys (name·region·area).
   - points/actions.test.ts: updateBalance rejects amount=0 and caps reason at 30 chars.
3. Unit utilities: pure functions with clear input → output.
   - common.test.ts: formatPrice · formatDateTime · sanitizeText · mention regex.
   - i18n/sync.test.ts: empty key set difference between ko.json and en.json.
   - cms-app/markdown.test.ts: generateSlug · addHeadingAnchors · highlightCode (language alias · unsupported language original preservation · HTML entity restoration) · renderMarkdown (GFM table · XSS sanitize). 18 tests.
   - language-app/utils.test.ts: cn · truncateText · shuffleArray (Fisher–Yates non-destructive) · logger (console.log/warn/error/debug spy + DEV guard). 15 tests.
4. External integration helpers (env mock + global stub): wrappers around side effects like fetch/Tauri.
   - admin/blog-revalidate.test.ts: vi.stubEnv for BLOG_REVALIDATE_URL · SECRET branching + vi.stubGlobal('fetch', vi.fn()) for 200/401/network failure. 6 tests.
   - food-app/exportFoods.test.ts: @tauri-apps/plugin-dialog.save · plugin-fs.writeTextFile · sonner.toast, 4 mocks. User cancel (save → null) · normal export · error branching. 5 tests.
Filling just these four buckets grew the suite to 89/89 PASS in less than half a day. As of 2026-05-01, cumulative: admin 44 + cms-app 45 + food-app 29 + language-app 15 + web-app 26+ = 159+ tests.
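The regression-as-test bucket can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual test: load_store_regions, its SQL text, and the active column are hypothetical; only the idea of asserting on the SQL string and the parameter tuple handed to the mocked DB comes from the list above.

```python
from unittest.mock import MagicMock


def load_store_regions(db):
    """Hypothetical stand-in for scheduler code that queries the stores table."""
    return db.fetch_all(
        "SELECT store_id, region_nm FROM stores WHERE active = %s",
        (True,),
    )


def test_query_uses_stores_and_region_nm():
    db = MagicMock()
    db.fetch_all.return_value = []

    load_store_regions(db)

    # call_args.args holds the positional arguments of the most recent call.
    sql, params = db.fetch_all.call_args.args
    assert "stores" in sql       # table name the original bug got wrong
    assert "region_nm" in sql    # column the query must select
    assert params == (True,)     # parameter tuple inspected as-is


test_query_uses_stores_and_region_nm()
```

A substring check like this is deliberately coarse: it pins the identifiers that regressed without freezing the whole query text.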
5. What we deliberately left out
- Container integration tests: stage 1 is enough with mocks. Tests that need a real DB are deferred until testcontainers is in.
- E2E UI scenarios: playwright e2e-dev's territory. Run separately from vitest/pytest.
- APScheduler behavior itself: sidestepped with a fixture that disables lifespan. Cron is isolated from the dev DB, so we don't add separate verification.
6. How a regression gets caught — one example
When BUG #5 was first found, only crawlers/product_scheduler.py got fixed. Days later, the same bug (the stores table) was still living in crawlers/product_crawler.py. At that point we added two more things.
1. Added SQL string verification in tests/test_product_crawler.py
assert "stores" in sql
2. Created scripts/sql_column_audit.py
Extract raw SQL from routers/+crawlers/ → diff against information_schema.columns
The latter caught two more phantom tables (order_tracking_urls and order_tracking_history). One line of regression automation led to the discovery of further bugs.
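The audit idea can be sketched in a few lines. This is not the contents of scripts/sql_column_audit.py, which is not shown here: the real script diffs against information_schema.columns, whereas this simplified version works at table level only, uses a hard-coded KNOWN_TABLES set in place of a live schema query, and relies on a naive regex. The sample statements and the products table are hypothetical.

```python
import re

# Assumption: stands in for table names that would be read from information_schema.
KNOWN_TABLES = {"stores", "products"}

# Naive pattern: an identifier following FROM / JOIN / INTO / UPDATE.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][A-Za-z0-9_]*)",
    re.IGNORECASE,
)


def referenced_tables(sql: str) -> set[str]:
    """Return lower-cased table names referenced by one SQL statement."""
    return {name.lower() for name in TABLE_REF.findall(sql)}


def phantom_tables(statements, known=KNOWN_TABLES) -> set[str]:
    """Return referenced tables that are absent from the known schema."""
    found: set[str] = set()
    for sql in statements:
        found |= referenced_tables(sql)
    return found - set(known)


statements = [
    "SELECT s.region_nm FROM stores s JOIN products p ON p.store_id = s.id",
    "INSERT INTO order_tracking_urls (order_id, url) VALUES (%s, %s)",
    "UPDATE order_tracking_history SET status = %s WHERE id = %s",
]

print(sorted(phantom_tables(statements)))
```

A regex like this misses subqueries in unusual positions, quoted identifiers, and schema-qualified names, so it is a first filter rather than a parser.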
7. What to touch next
- TestClient + pgvector — integration tests for the vector search router. Only meaningful when LM Studio is running in dev, so a separate fixture flag is needed.
- playwright e2e-dev → CI — currently depends on the host dev compose. CI needs docker-in-docker or dedicated service containers.
- Benchmark / load tests — appropriateness of the rate limiter thresholds themselves. slowapi's token bucket suits load tests more than unit tests.
- desktop-app backend JUnit 5: Spring Boot's MessageService · MessageRepository · MessageCleanupScheduler still lack unit tests. @DataJpaTest + Testcontainers postgres has high value for blocking regressions in the native query (findRecentThreads).
- food-app/language-app component tests: start with simple display components like ItemCard · HistoryList when introducing @testing-library/react + jsdom. Defends against lifecycle and event handler regressions.
- mutation testing: Stryker · pytest-mutpy. Measures whether the current pass rate reflects actual defect detection. If the mutation score of the 159 tests is below 50%, it signals many ineffective assertions.
Closing thoughts
Test infrastructure isn't a one-shot job. Once one area settles, the next area's shape comes into view. Filling spots where regressions hit twice produces the most value in the shortest time.
warragon rounds 6~9 case studies (2026-05-04)
After the vitest 159 snapshot, rounds 6~9 grew the suite to 1,226 (frontend vitest 439 + e2e 334 + java @Test 217 + python pytest 197 + MCP 39). How the infrastructure decisions evolved:
testcontainers compile-only gate — CI cost of 30 ControllerTests in one round
Round 8 added 30 da2ari-api ControllerTests (R8-A1~A4). All inherit AbstractIntegrationTest (PG 17 + 21 supabase migrations, ~5 min boot if all run). Booting 30 containers per PR is unrealistic. Decision: PR gate is ./gradlew :da2ari-api:compileTestJava (compile only); full execution defers to nightly CI or local environments.
// MockMvc smoke rule: passes on 200/401/4xx without seed data, fails on 5xx
private void assertRouted(int s) {
    assertTrue(s >= 200 && s < 500, "routing abnormal status=" + s);
}
The 5xx-block rule is the only PROD gate. 200-payload validation goes into a separate round (R7-B1) and only for read-only public endpoints.
pytest monkeypatch — APScheduler 17-job idempotency simulation
Round 6-A3's tests/test_scheduler_jobs.py mocks DB calls and validates idempotency for 17 jobs in one file:
from unittest.mock import MagicMock


def _mock_db(monkeypatch, module_path: str, fetch_all_default=None):
    db = MagicMock()
    db.fetch_all.return_value = fetch_all_default or []
    db.execute_query.return_value = True
    # Accept any call signature so every job's get_db(...) call resolves to the mock.
    monkeypatch.setattr(f"{module_path}.get_db", lambda *_a, **_kw: db)
    return db


def test_price_alerts_two_calls_db_interaction_exactly_doubled(monkeypatch):
    db = _mock_db(monkeypatch, "schedulers.price_alert_checker", fetch_all_default=[])
    from schedulers.price_alert_checker import check_price_alerts

    check_price_alerts()
    first = db.fetch_all.call_count
    check_price_alerts()
    assert db.fetch_all.call_count == first * 2  # blocks regression of WHERE NOT EXISTS
UPSERT / ON CONFLICT / WHERE NOT EXISTS patterns are validated through mock call counts, so idempotency is checked without a real DB.
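A self-contained variant of the same idea, for readers who want to run it without the project. Everything here is an assumption for illustration: the check_price_alerts body, the price_alerts and notifications tables, and the choice to pass db as an argument instead of patching get_db. It compares the statements issued by two consecutive runs, not just the call counts.

```python
from unittest.mock import MagicMock

INSERT_IF_MISSING = (
    "INSERT INTO notifications (alert_id) SELECT %s "
    "WHERE NOT EXISTS (SELECT 1 FROM notifications WHERE alert_id = %s)"
)


def check_price_alerts(db):
    """Hypothetical job: reads pending alerts and inserts missing notifications."""
    rows = db.fetch_all("SELECT id FROM price_alerts WHERE triggered = false")
    for (alert_id,) in rows:
        db.execute_query(INSERT_IF_MISSING, (alert_id, alert_id))


db = MagicMock()
db.fetch_all.return_value = [(1,), (2,)]
db.execute_query.return_value = True

check_price_alerts(db)
first_run = list(db.execute_query.call_args_list)

check_price_alerts(db)
second_run = list(db.execute_query.call_args_list[len(first_run):])

# Both runs must issue the same guarded statements with the same parameters.
assert first_run == second_run
assert all("WHERE NOT EXISTS" in call.args[0] for call in first_run)
print(len(first_run), len(second_run))
```

A mock cannot prove that the database ends up in the same state after the second run; what it pins down is that the job keeps sending the guarded statement, so dropping the WHERE NOT EXISTS clause would fail the test.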
sed → tsc cycle — 51-file bulk migration SOP
Round 8-D-jwks verifyJwt → verifyJwtAsync migration (commit aa91c142):
- sed: bulk function-name + await-injection
- tsc --noEmit: catches top-level-await errors and Promise vs JwtPayload mismatches
- fix: the 3 files tsc flagged need sync → async signature changes (await propagates to callers)
- re-run + vitest validation
This cycle migrated 134 call sites without errors. Use the compiler as the first line of regression detection.
Reproducible 1,226-test counts via grep
From round 6 the cumulative counts are always reproducible by grep:
# frontend vitest
find frontend/{da2ari,admin,pryzeet,dmddksl}/src -name '*.test.ts*' \
-not -path '*/node_modules/*' | xargs grep -hE '^\s*(it|test)\(' | wc -l
# Java @Test
find backend/java-backend -name '*Test.java' -path '*/src/test/*' \
| xargs grep -hc '@Test' | awk '{s+=$1} END {print s}'
# Python pytest
cd backend/python-backend && uv run pytest --collect-only -q | tail -3
A gap between the declared and the measured count is a regression signal. The gap between the 585 declared for rounds 1~5 and the 1,011 measured in round 6 was explained by (a) the MCP manuals split, (b) per-round additions that were missed, and (c) a double-counting fix.
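The declared-versus-measured comparison can also be scripted. The sketch below mirrors the grep pattern for vitest cases in Python; the function names and the sample source are hypothetical, and only the two counts (585 and 1,011) come from the text above.

```python
import re

# Same pattern as the grep above: a line that starts with it( or test(.
TEST_CALL = re.compile(r"^\s*(?:it|test)\(", re.MULTILINE)


def count_vitest_cases(source: str) -> int:
    """Count lines in a test file that open an it(...) or test(...) call."""
    return len(TEST_CALL.findall(source))


def count_gap(declared: int, measured: int) -> int:
    """Positive result: more tests exist than were declared."""
    return measured - declared


sample = """
describe("formatPrice", () => {
  it("adds separators", () => {});
  test("handles zero", () => {});
  // it("commented out", () => {});
  it.each([1, 2])("case %i", () => {});
});
"""

print(count_vitest_cases(sample))
print(count_gap(declared=585, measured=1011))
```

Note that the pattern, like the grep it copies, skips it.each(...) and similar variants, so the measured number is a lower bound for parameterized suites.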
Next
- testcontainers
- vitest-philosophy
See Vitest official, pytest official, pytest-asyncio, and pytest-mock.