Scheduled jobs and APScheduler
Periodic tasks show up in any backend. Nightly aggregates, external data collection, expired-token cleanup. At small scale, cron or an in-process scheduler is enough; at larger scale, distributed queues and workers appear.
1. About APScheduler
APScheduler is a Python library started by Alex Grönholm and reportedly first published around 2008 — a long-standing project. It is an in-process scheduler that bundles cron expressions, intervals, and date triggers behind a single API.
| Trigger | Meaning |
|---|---|
| `cron` | Cron expressions like "every day at 03:00." |
| `interval` | Fixed intervals like "every 30 seconds." |
| `date` | A one-shot run at a specific time. |
There are several scheduler types — BlockingScheduler (occupies the main thread), BackgroundScheduler (its own thread), AsyncIOScheduler (event loop), and others. Job stores can be memory, SQLAlchemy, MongoDB, or Redis, allowing job restoration after restarts.
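A persistent job store is what makes restoration after restarts possible. A minimal sketch, assuming `apscheduler` and `sqlalchemy` are installed; the SQLite URL here is a placeholder:

```python
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

# Job schedules are serialized into SQLite, so they survive a
# process restart instead of living only in memory.
sched = BackgroundScheduler(
    jobstores={'default': SQLAlchemyJobStore(url='sqlite:///jobs.db')},
    timezone='Asia/Seoul',
)
```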
2. Triggers and jobs
```python
from apscheduler.schedulers.background import BackgroundScheduler

sched = BackgroundScheduler(timezone='Asia/Seoul')

@sched.scheduled_job('cron', hour=3, minute=0, id='daily-aggregate')
def daily_aggregate():
    ...

@sched.scheduled_job('interval', seconds=30, id='heartbeat')
def heartbeat():
    ...

sched.start()
```
Specifying an `id`, combined with `replace_existing=True`, prevents duplicate registration of the same job: even when the code changes while state is preserved in the job store, the same `id` updates the stored job in place.
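The update-in-place behavior can be sketched with `add_job` (a hypothetical `daily_aggregate` stands in for a real job function):

```python
from apscheduler.schedulers.background import BackgroundScheduler

sched = BackgroundScheduler(timezone='Asia/Seoul')

def daily_aggregate():
    ...

# A later deploy that changes the schedule but keeps the same id
# updates the stored job instead of adding a duplicate.
sched.add_job(daily_aggregate, 'cron', hour=3,
              id='daily-aggregate', replace_existing=True)
sched.add_job(daily_aggregate, 'cron', hour=4,
              id='daily-aggregate', replace_existing=True)
```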
3. Single-instance assumption and idempotency
When several processes share the same job store, APScheduler does not automatically guarantee job distribution (a separate distributed lock is needed). Running on a single worker is the simplest assumption. When multiple workers are needed, choose one of:
- Job store + DB row locks so only one worker runs at a time.
- Redis-based distributed lock (algorithms like Redlock).
- An operational convention of "the scheduler is on only on one instance."
By design, assume the same job may run twice and write the body idempotently (the same input gives the same result, or a safe no-op).
4. misfire and coalesce
Policies for how to handle scheduled times that pass while the worker is down.
- `misfire_grace_time` — how late a run may start and still be executed.
- `coalesce` — whether to collapse accumulated missed executions into one run.
The defaults are conservative, and explicit configuration is recommended so that accumulated executions do not strain the system.
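One way to set these explicitly is through `job_defaults` at scheduler construction; the values below are illustrative, not recommendations:

```python
from apscheduler.schedulers.background import BackgroundScheduler

sched = BackgroundScheduler(
    job_defaults={
        'coalesce': True,           # collapse a backlog of missed runs into one
        'misfire_grace_time': 300,  # a run up to 5 minutes late still executes
        'max_instances': 1,         # never overlap a still-running instance
    }
)
```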
5. Other tools
| Tool | First appeared | Model |
|---|---|---|
| cron (Unix) | 1975 | OS-level scheduler. The simplest. |
| systemd timers | 2010s | systemd's cron replacement. Single host. |
| APScheduler | 2008 | Python in-process. |
| Celery | 2009, Ask Solem | Python distributed task queue (broker: RabbitMQ/Redis). celery beat schedules. |
| RQ | 2012 | Python + Redis. Simpler than Celery. |
| Sidekiq | 2012, Mike Perham | Ruby + Redis. A standard for large-scale operation. |
| BullMQ | 2018 (formerly Bull) | Node + Redis. |
| Quartz | 2001 | Long-standing JVM scheduler. Standard Spring integration. |
| Temporal | 2019 (Cadence fork) | Workflow engine. State, retries, and timers are first-class. |
| AWS EventBridge / GCP Cloud Scheduler | late 2010s | Managed cron. |
One axis of choice is "must execution be distributed across hosts?" If distribution is needed, the queue model; if a single host is enough, in-process schedulers work.
6. Combining with FastAPI
```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from apscheduler.schedulers.asyncio import AsyncIOScheduler

sched = AsyncIOScheduler()

@asynccontextmanager
async def lifespan(app: FastAPI):
    sched.start()
    yield
    sched.shutdown(wait=False)

app = FastAPI(lifespan=lifespan)
```
The lifespan event aligns the scheduler's lifetime with the app's.
7. Guards in job bodies
```python
from datetime import date

def daily_aggregate():
    # already_done/mark_done are app-defined persistence helpers.
    if already_done(date.today()):
        return
    do_work()
    mark_done(date.today())
```
This shape is the starting point of idempotency. A simple DB flag guarantees that "waking up twice for the same date does the work only once."
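One possible backing for `already_done`/`mark_done` is a table keyed uniquely per date. A minimal sketch using stdlib `sqlite3` (the table name is made up here):

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # use a real file or shared DB in practice
conn.execute('CREATE TABLE IF NOT EXISTS job_runs (day TEXT PRIMARY KEY)')

def already_done(day) -> bool:
    row = conn.execute('SELECT 1 FROM job_runs WHERE day = ?',
                       (str(day),)).fetchone()
    return row is not None

def mark_done(day) -> None:
    # INSERT OR IGNORE makes marking itself idempotent: a second
    # wake-up for the same date is a no-op at the DB level too.
    conn.execute('INSERT OR IGNORE INTO job_runs (day) VALUES (?)',
                 (str(day),))
    conn.commit()
```

The PRIMARY KEY constraint is what carries the guarantee; the guard in the job body is just a fast path in front of it.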
8. Distributed locks (Redis)
When several instances run the scheduler together and we want at most one job running at a time, the Redis SET key NX EX <ttl> pattern is common. The Redlock debate kicked off by Martin Kleppmann's article and Redis author antirez's response (2016) is well known. When strong consistency is required, consensus-based tools like ZooKeeper, etcd, or DB row locks are also seen as more appropriate.
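The shape of the pattern, sketched here against an in-memory stand-in for Redis so the logic is visible (the real thing would use redis-py's `r.set(key, token, nx=True, ex=ttl)`, and the check-and-delete on release must be a Lua script to be atomic):

```python
import time
import uuid

class FakeRedis:
    """In-memory stand-in for Redis: just enough for SET NX EX / GET / DEL."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set_nx_ex(self, key, value, ttl):
        entry = self._data.get(key)
        if entry and entry[1] > time.monotonic():
            return False  # key exists and has not expired: NX fails
        self._data[key] = (value, time.monotonic() + ttl)
        return True

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry and entry[1] > time.monotonic() else None

    def delete(self, key):
        self._data.pop(key, None)

def acquire_lock(r, key, ttl):
    token = str(uuid.uuid4())  # random token identifies this holder
    return token if r.set_nx_ex(key, token, ttl) else None

def release_lock(r, key, token):
    # Only the holder may release: a worker whose lock already expired
    # must not delete a lock now owned by someone else.
    if r.get(key) == token:
        r.delete(key)
```

The per-holder token is the important detail; releasing by key alone would let a slow worker delete a lock it no longer owns.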
9. Common pitfalls
Auto-reload during development — modes like uvicorn --reload can effectively spawn two processes, causing jobs to register twice. Either disable the scheduler in development, or register jobs with an explicit id and pass replace_existing=True when adding them.
Missing timezone configuration — schedule times get interpreted as UTC and drift from intent. Specify timezone when constructing the scheduler.
Long-running job overlapping the next trigger — the next run triggers before the previous one finishes. Review options like max_instances=1 and coalesce=True.
Lock TTL shorter than the work duration — when the work runs longer than the lock TTL, another worker can grab the lock and the job runs twice. Measure the work-duration distribution and set the TTL conservatively above it.
Closing thoughts
An in-process scheduler is the most cost-effective stop before bigger systems become necessary. Starting with a few lines of APScheduler and sticking to idempotent job bodies plus the single-instance assumption keeps operational burden small. When distribution becomes necessary, moving to tools like Celery or Temporal is a natural next step.
Next
- typeorm-readonly
- crawler-ethics
See APScheduler · APScheduler GitHub · Celery · Sidekiq · Quartz · Temporal · Redlock debate (Martin Kleppmann).