Building public-data crawlers
Build an ethical crawler in six steps with Playwright, http_utils, and APScheduler.
- Difficulty: intermediate
- Lessons: 6
- Total time: 145 min
Public data sources such as NPS, DART, and HIRA are open to everyone, but automating access to them comes with rules — robots.txt, rate limits, terms of service. This course covers six steps to an ethical, sustainable crawler.
Who it's for
- Developers who need more control than portal APIs offer
- Anyone who has been blocked by a crawl target
- Teams who want incremental collection, schedules, and observability
What you can do afterwards
- Separate dynamic pages (Playwright) from static ones (BS4)
- Apply robots.txt + rate limit + backoff (see the sketch after this list)
- Schedule in KST with APScheduler
- Combine public APIs, ministry CSVs, and web scraping
- Incremental collection, dedup, checkpoints
- Healthchecks and failure alerts
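The robots.txt check referenced above needs nothing beyond the standard library. A minimal sketch, assuming a hypothetical user-agent string and a placeholder target host:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string; identify your real crawler here.
USER_AGENT = "public-data-crawler/0.1"

def allowed(url: str, robots_url: str) -> bool:
    """Return True if robots.txt permits USER_AGENT to fetch url."""
    rp = RobotFileParser()
    rp.set_url(robots_url)
    rp.read()  # downloads and parses robots.txt
    return rp.can_fetch(USER_AGENT, url)

# Placeholder host for illustration:
if allowed("https://example.com/notices", "https://example.com/robots.txt"):
    print("fetch permitted by robots.txt")
```

RobotFileParser can also report crawl_delay() and request_rate() when a site declares them, which feeds directly into the rate-limit rule.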
Steps
- Crawler ethics and legal boundaries — robots.txt · terms · personal data
- Static vs dynamic — BS4 + Playwright — pick the right tool
- Rate limiting · retries · backoff — exponential + jitter (sketched below)
- APScheduler + KST — idempotency · replace_existing=True · double-trigger defence (sketched below)
- Incremental collection · deduplication — checkpoints · unique keys · change detection (sketched below)
- Observability · alerts — success rate · latency · Slack · PagerDuty
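To make the third step concrete, exponential backoff with jitter might look like the sketch below. It uses requests rather than the course's http_utils, and the retryable status codes and delay cap are assumptions:

```python
import random
import time

import requests  # stand-in HTTP client; the course's http_utils may differ

def fetch_with_backoff(url: str, max_retries: int = 5) -> requests.Response:
    """Retry transient failures with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            resp = requests.get(url, timeout=10)
            # Retry only on typical transient statuses (assumption).
            if resp.status_code not in (429, 500, 502, 503, 504):
                return resp
        except requests.RequestException:
            pass  # network error: fall through to the backoff sleep
        # Exponential base delay (1s, 2s, 4s, ...) capped at 60s,
        # with full jitter to avoid synchronized retry storms.
        delay = min(2 ** attempt, 60)
        time.sleep(random.uniform(0, delay))
    raise RuntimeError(f"{url}: still failing after {max_retries} attempts")
```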
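The fourth step's scheduling pattern, sketched with APScheduler 3's cron trigger pinned to KST. The job id, the 06:00 schedule, and collect_daily are placeholders; coalesce and misfire_grace_time are one plausible reading of the double-trigger defence the lesson names:

```python
from apscheduler.schedulers.blocking import BlockingScheduler

def collect_daily():
    """Placeholder for the real collection job."""
    print("collecting...")

# Pin the scheduler to KST so cron fields mean Korean wall-clock time.
scheduler = BlockingScheduler(timezone="Asia/Seoul")
scheduler.add_job(
    collect_daily,
    "cron",
    hour=6,                   # placeholder schedule: daily at 06:00 KST
    id="daily_collect",       # stable id: re-registering replaces, never duplicates
    replace_existing=True,    # idempotent registration across restarts
    coalesce=True,            # collapse a backlog of missed runs into one
    misfire_grace_time=3600,  # run up to an hour late, otherwise skip
)
scheduler.start()
```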
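And the fifth step's checkpointing can be as small as a SQLite table of keys and content hashes. A sketch under the assumption that each record exposes a stable unique key; the table and function names are invented:

```python
import hashlib
import sqlite3

def open_checkpoint(path: str = "checkpoint.db") -> sqlite3.Connection:
    """Open (or create) the dedup/checkpoint store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen (key TEXT PRIMARY KEY, content_hash TEXT)"
    )
    return conn

def is_new_or_changed(conn: sqlite3.Connection, key: str, content: str) -> bool:
    """True if a record is unseen, or seen but changed since the last run."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM seen WHERE key = ?", (key,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # exact duplicate: skip it
    # Record (or update) the hash so the next run sees this version.
    conn.execute("INSERT OR REPLACE INTO seen VALUES (?, ?)", (key, digest))
    conn.commit()
    return True
```

Hashing the content as well as keying it gives change detection for free: unchanged records are skipped, so incremental runs only touch deltas.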
Prerequisites — complete python-data-pipeline.