hardbyte/postgresql-job-queue-benchmarking

postgresql-job-queue-benchmarking

A benchmarking harness for comparing PostgreSQL-backed job queue systems under realistic, long-horizon workloads.

The goal is a fair, reproducible, public-API-only comparison of how different queue libraries behave when you push them past warm-up — focusing on the things that show up in production: latency tail, throughput stability, table bloat, and recovery from chaos.

What the latest run found

Eight Postgres-backed queues, same hardware, same harness. The lineup spans three contracts (event bus, job queue, and visibility-timeout queue), so the throughput list isn't a single ranking. The 2026-05-09 sweep has the per-cell numbers, chaos behaviour, and bloat resistance.

Figure: Peak throughput by queue contract

Figure: Tail latency at each system's peak throughput

Headline comparisons from that run:

  • Peak clean throughput: pgque 39.9 k jobs/s in single-consumer event-bus mode; awa 14.2 k as the fastest full job queue; pgmq 11.3 k as a visibility-timeout queue before anti-scaling at higher worker counts.
  • Chaos recovery: awa, pgque, and river recover from every scenario. The other five adapters either hit zero or fail to produce recovery samples in at least one chaos cell.
  • Bloat / pressure: five adapters time out under at least one sustained-pressure cell; only awa, oban, and pgque complete all four pressure scenarios.

| System | Contract | Chaos recovery | Pressure cells | Notable caveat |
| --- | --- | --- | --- | --- |
| awa | job queue | 5/5 | 4/4 | Full job-queue feature surface; fastest job queue in this run. |
| pgque | event/message bus | 5/5 | 4/4 | Single-consumer mode; batched success ack is a different contract. |
| river | job queue | 5/5 | 2/4 | Times out in two sustained-pressure cells. |
| oban | job queue | 4/5 | 4/4 | Handles pressure cells but has lower throughput in this run. |
| pg-boss | job queue | 3/5 | 2/4 | Postgres-level chaos exits the worker; times out in two pressure cells. |
| absurd | job queue | 3/5 | 2/4 | Shutdown timeout under pressure. |
| procrastinate | job queue | 3/5 | 2/4 | Weak repeated-kill recovery; times out in two pressure cells. |
| pgmq | visibility-timeout queue | 3/5 | 2/4 | Anti-scales past 16 workers and has the active-readers cliff. |

Feature comparison

Throughput is one shape of the question. The other shape is what each system actually gives you. This table captures the documented feature surface — things you'd reach for in real applications. Cells reflect what's available out of the box on the default open-source distribution.

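| | awa | Absurd | pg-boss | pgmq | pgque | Oban | Procrastinate | River |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Language / runtime | Rust + Python | Python | Node.js | Postgres extension (Rust core) | Postgres extension (PL/pgSQL) | Elixir | Python | Go |
| Postgres extension required | no | no | no | yes [1] | optional [2] | no | no | no |
| Producer surface: bulk insert | | | | | | | | [3] |
| Storage shape on hot path | append-only + receipt ring | row-mutating | row-mutating | partitioned archive | append-only + ticker | row-mutating | row-mutating | row-mutating |
| Priorities | [4] | | | | | | | |
| Retries with backoff | | | | [5] | | | | |
| Cron / scheduled jobs | | | | | [6] | | | |
| Dead-letter queue | [7] | | [8] | [9] | | [10] | [10] | |
| Unique jobs / dedup | | | [11] | | | | | |
| Rate limiting per queue | | | [12] | | | [13] | [14] | |
| Callbacks / external waits | | [15] | [16] | | | | | |
| Web UI for ops | [17] | | [18] | | | [19] | [20] | |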

Dashes indicate "not provided as a documented feature out of the box", not "impossible". pgmq / pgque in particular are intentionally minimal — you build the worker, you choose the lifecycle. If you spot something wrong, please open a PR — corrections welcome from the maintainers of any of the systems listed.

What's in the lineup

Each system maps onto one of three application contracts.

Job queues — send a job, a worker runs it, the queue tracks retries and dead-lettering: awa, pg-boss, river, oban, absurd, procrastinate.

Visibility-timeout queue — pgmq. Send / read with timeout / ack-or-redeliver. No per-job retry counter, no scheduling, no DLQ beyond an archive table.
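
To make the send / read with timeout / ack-or-redeliver contract concrete, here is a minimal in-memory sketch. It is a toy model, not pgmq's API: the class name, method names, and the explicit `now` argument are all assumptions chosen so the example runs without a database or a clock.

```python
import itertools


class VisibilityTimeoutQueue:
    """In-memory toy model of send / read-with-timeout / ack-or-redeliver."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._messages = {}  # msg_id -> [payload, visible_at]

    def send(self, payload, now=0.0):
        msg_id = next(self._ids)
        self._messages[msg_id] = [payload, now]
        return msg_id

    def read(self, vt, now):
        """Return the oldest visible message and hide it for `vt` seconds."""
        for msg_id, entry in sorted(self._messages.items()):
            if entry[1] <= now:
                entry[1] = now + vt
                return msg_id, entry[0]
        return None

    def ack(self, msg_id):
        """Delete the message; True if it was still present."""
        return self._messages.pop(msg_id, None) is not None


q = VisibilityTimeoutQueue()
first = q.send("resize-image", now=0.0)
claimed = q.read(vt=30, now=1.0)        # hidden until t=31
hidden = q.read(vt=30, now=10.0)        # nothing visible yet
redelivered = q.read(vt=30, now=31.0)   # never acked, so it reappears
acked = q.ack(first)
```

Note that nothing in this model counts attempts or applies backoff: redelivery is driven only by the timeout expiring, which is the distinction the contract description draws.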

Event/message bus — pgque (PgQ lineage). Append-only event log, ticker forms batch boundaries, multiple consumer groups each track a cursor over the shared log (upstream calls it Kafka-shaped). pgque also runs as a single-consumer competing-consumers queue, which is how this bench drives it: one consumer per replica, --worker-count controls in-flight handler concurrency within that consumer.
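
The shape described above (append-only log, ticker-defined batches, one cursor per consumer group) can be sketched in a few lines. This is an illustrative model only, assuming every group subscribes at the start of the log; `EventLog` and its methods are hypothetical names, not pgque's SQL functions.

```python
class EventLog:
    """Toy append-only log with ticker-defined batches and per-group cursors."""

    def __init__(self):
        self.events = []    # append-only event log
        self.ticks = [0]    # log positions recorded by the ticker
        self.cursors = {}   # consumer group -> index into self.ticks

    def append(self, event):
        self.events.append(event)

    def tick(self):
        """Close a batch at the current end of the log, if anything is new."""
        if len(self.events) > self.ticks[-1]:
            self.ticks.append(len(self.events))

    def subscribe(self, group):
        self.cursors.setdefault(group, 0)

    def next_batch(self, group):
        """Return the next closed batch for `group`, or None if caught up."""
        i = self.cursors[group]
        if i + 1 >= len(self.ticks):
            return None
        return self.events[self.ticks[i]:self.ticks[i + 1]]

    def finish_batch(self, group):
        """Advance the group's cursor past the batch it was just given."""
        if self.cursors[group] + 1 < len(self.ticks):
            self.cursors[group] += 1


log = EventLog()
log.subscribe("billing")
log.subscribe("search")
for event in ["a", "b", "c"]:
    log.append(event)
log.tick()
log.append("d")  # appended after the tick, so not in a closed batch yet

billing_first = log.next_batch("billing")
log.finish_batch("billing")
billing_second = log.next_batch("billing")
search_first = log.next_batch("search")
```

Each group sees the same batch independently, and an event appended after the last tick stays invisible until the ticker runs again.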

| System | Contract | Peak (jobs/s) | At (replicas × workers) |
| --- | --- | --- | --- |
| pgque (single-consumer mode) | event bus | 39,898 | 1×256 w |
| awa | job queue | 14,158 | 1×256 w |
| pgmq | visibility-timeout | 11,277 | 1×16 w |
| pg-boss | job queue | 2,387 | 1×64 w |
| river | job queue | 501 | 1×64 w |
| absurd | job queue | 410 | 1×128 w |
| oban | job queue | 284 | 1×64 w |
| procrastinate | job queue | 269 | flat |

pgque's number is its single-consumer mode; native fan-out across multiple consumer groups isn't exercised here. pgmq peaks at 1×16 w and anti-scales to 3.2 k at 1×256 w (audit).
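
As a quick worked check of what "anti-scales" means here, the sketch below finds the peak worker count and the fraction of peak throughput retained at the largest worker count. It uses only the two pgmq figures quoted above (3.2 k is the rounded value from the text); the helper name is hypothetical.

```python
def peak_and_retention(cells):
    """cells maps worker count -> jobs/s. Returns (peak workers, retained fraction)."""
    peak_workers = max(cells, key=cells.get)
    largest = max(cells)
    return peak_workers, cells[largest] / cells[peak_workers]


pgmq = {16: 11_277, 256: 3_200}  # jobs/s at 1x16 w and 1x256 w, from the text
peak_workers, retained = peak_and_retention(pgmq)
print(peak_workers, round(retained, 2))
```

On these two points, pgmq at 256 workers delivers a bit under a third of its 16-worker peak.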

What pgque trades for the throughput

In the bench's single-consumer mode, pgque competes with the job queues. Two ways it differs from awa and the other five:

  • Feature surface. Default install ships retries with backoff, per-message nack, DLQ. No priorities, no aging, no dedup, no rate limiting, no web UI. Delayed delivery (send_at) is in sql/experimental/.
  • Ack granularity. receive returns a batch and ack(batch_id) finishes the batch in one row update. Failure handling is still per-message via nack(batch_id, msg_id, retry_after, reason). If a consumer crashes mid-batch without acking, the whole batch is redelivered on the next claim.

Whether that trade fits depends on your workload. Analytics events that are cheap and idempotent are comfortable with batched ack. Long-running, side-effecting jobs are better served by the per-job ack the six job queues give you.
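
The consequence of batch-level ack is easiest to see in a simulation. The sketch below is a toy model under stated assumptions, not pgque's implementation: `BatchQueue`, `run_consumer`, and the simplified `nack(msg)` signature are hypothetical, and retry delays are ignored.

```python
class BatchQueue:
    """Toy batch-ack queue: ack finishes a whole batch, nack defers one message."""

    def __init__(self, messages, batch_size):
        self.pending = list(messages)
        self.batch_size = batch_size
        self.open_batch = None  # handed out but not yet acked
        self.retry = []         # nacked messages awaiting a later batch

    def receive(self):
        if self.open_batch is None:
            if not self.pending:
                return None
            self.open_batch = self.pending[:self.batch_size]
            self.pending = self.pending[self.batch_size:]
        # An unacked batch is handed out again in full.
        return list(self.open_batch)

    def nack(self, msg):
        if msg not in self.retry:
            self.retry.append(msg)

    def ack(self):
        self.pending.extend(self.retry)
        self.retry = []
        self.open_batch = None


def run_consumer(queue, handled, crash_after=None, fail=()):
    """Process one batch; optionally 'crash' before acking."""
    batch = queue.receive()
    if batch is None:
        return
    for n, msg in enumerate(batch):
        if crash_after is not None and n == crash_after:
            return  # simulated crash: the batch is never acked
        if msg in fail:
            queue.nack(msg)
        else:
            handled.append(msg)
    queue.ack()


handled = []
q = BatchQueue(["m1", "m2", "m3", "m4"], batch_size=4)
run_consumer(q, handled, crash_after=2)  # handles m1, m2, then crashes
run_consumer(q, handled, fail={"m3"})    # whole batch again; m3 is nacked
run_consumer(q, handled)                 # m3 arrives in a later batch
```

After the crash, m1 and m2 are handled a second time, which is harmless for idempotent handlers and a problem for side-effecting ones.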

Earlier reference runs: 2026-05-08 awa vs pgque v2 deep-dive · 2026-05-02 alpha.3 sweep · awa under a 10-minute held writing transaction · awa extended scaling (W=256/512/1024).

Author bias: this repo is owned by the author of awa, one of the systems benchmarked. Numbers are reproducible — re-run on your hardware and check.

Chaos / correctness

Chaos scenarios run inside the same bench.py harness, as named compositions of phase types. Steady-state metrics, wait-event histograms, and per-phase aggregates carry over; the harness also emits jobs_lost and chaos_recovery_time_s into the recovery phase's summary.json.
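
This README does not spell out how those two fields are computed; docs/method.md is the reference. Purely as an illustration, the sketch below uses assumed definitions: recovery time is the delay from the end of the chaos phase to the first sample back at 90% of baseline throughput, and lost jobs are enqueued ids that never completed. The function names, the threshold, and the sample data are all hypothetical.

```python
def recovery_time_s(samples, chaos_end_s, baseline_rate, threshold=0.9):
    """Seconds from chaos end to the first sample at or above threshold * baseline.

    `samples` is a list of (timestamp_s, jobs_per_s). Returns None if the
    system never gets back to the threshold.
    """
    for t, rate in samples:
        if t >= chaos_end_s and rate >= threshold * baseline_rate:
            return t - chaos_end_s
    return None


def count_jobs_lost(enqueued_ids, completed_ids):
    """Jobs that were enqueued but never observed as completed."""
    return len(set(enqueued_ids) - set(completed_ids))


# Made-up throughput trace: chaos ends at t=20 s, baseline is 200 jobs/s.
trace = [(0, 200), (10, 0), (20, 0), (30, 50), (40, 190), (50, 200)]
recovered_after = recovery_time_s(trace, chaos_end_s=20, baseline_rate=200)
never = recovery_time_s([(20, 0), (30, 0)], chaos_end_s=20, baseline_rate=200)
lost = count_jobs_lost(range(10), [0, 1, 2, 3, 5, 6, 7, 8, 9])
```

Under these assumed definitions, a system that stays at zero after the chaos phase yields no recovery time at all rather than a large one.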

The headline picture across all eight adapters is in the 2026-05-09 sweep — Phase B (40 cells, 5 scenarios × 8 systems). Three systems recover from every chaos scenario; the other five hit zero on at least one. The per-adapter audits in the same run name the root causes.

The available chaos scenarios are documented in docs/method.md. The cross-system chaos tracker is #12.

Adapters

  • awa (Rust + Python) — 2026-05-09 sweep on v0.6.0-alpha.9.
  • Absurd (Python)
  • Oban (Elixir)
  • pg-boss (Node.js)
  • pgmq (Postgres extension; Python adapter; needs an extension-bearing image, run separately from the shared-image matrix)
  • PgQue (plain SQL, no extension required; Python adapter; pg_cron optional, since the harness runs the ticker and maintenance loops in-process instead)
  • Procrastinate (Python)
  • River (Go)

Design principles

  • Public APIs only. Each adapter integrates the system the way a real consumer would. No reaching into internal modules, no privileged SQL.
  • Subprocess contract. Adapters are language-agnostic processes that emit one JSON sample per line on stdout. Adding a new system means writing one binary that respects the contract — see CONTRIBUTING_ADAPTERS.md.
  • One Postgres for everyone. All systems run against the same postgres:18.3-alpine instance with the same postgres.conf — no per-system tuning advantage. (pgmq is the exception; it requires the Postgres extension and runs on a separate pg18-pgmq image.) The compose default caps Postgres at 4 CPUs for repeatable laptop and CI runs; set POSTGRES_CPUS=N when measuring a larger machine envelope.
  • Long-horizon. Bloat and latency drift only show up after the first few minutes. Default scenarios run 30+ minutes.
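
For a feel of the subprocess contract, here is a minimal sketch of the harness side: launch a process and parse one JSON object per stdout line. The real sample schema lives in CONTRIBUTING_ADAPTERS.md; the field names `t_s` and `completed`, the stand-in adapter, and `read_samples` are assumptions made for this example.

```python
import json
import subprocess
import sys

# Stand-in adapter: prints one JSON object per line. Field names are hypothetical.
ADAPTER_SOURCE = """
import json
for i in range(3):
    print(json.dumps({"t_s": i, "completed": 100 + i}), flush=True)
"""


def read_samples(cmd):
    """Run an adapter process and parse each non-empty stdout line as JSON."""
    samples = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if line.strip():
                samples.append(json.loads(line))
    if proc.returncode != 0:
        raise RuntimeError(f"adapter exited with status {proc.returncode}")
    return samples


samples = read_samples([sys.executable, "-c", ADAPTER_SOURCE])
```

Because the only coupling is line-delimited JSON on stdout, the adapter could equally be a Go, Rust, Elixir, or Node.js binary.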

Quick start

# Init the pgque submodule (vendored at a pinned upstream SHA)
git submodule update --init --recursive

# Bring up Postgres (port 15555 by default)
docker compose up -d postgres

# Run a 5-minute smoke against one system
uv run bench run \
  --systems procrastinate \
  --producer-rate 200 \
  --worker-count 4 \
  --replicas 1 \
  --phase warmup=warmup:30s \
  --phase clean=clean:5m
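
The two --phase values above look like label=type:duration. A minimal sketch of parsing that shape follows, assuming durations are an integer plus an s, m, or h suffix; the regex, the h unit, and `parse_phase` are illustrative and not taken from bench.py.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_phase(spec):
    """Split 'label=type:duration' into (label, phase_type, seconds)."""
    match = re.fullmatch(r"([\w-]+)=([\w-]+):(\d+)([smh])", spec)
    if match is None:
        raise ValueError(f"unrecognised phase spec: {spec!r}")
    label, phase_type, amount, unit = match.groups()
    return label, phase_type, int(amount) * UNIT_SECONDS[unit]


phases = [parse_phase(s) for s in ["warmup=warmup:30s", "clean=clean:5m"]]
total_seconds = sum(seconds for _, _, seconds in phases)
```

Read this way, the smoke run is a 30-second warm-up followed by a 5-minute clean phase, 330 seconds in total.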

Outputs land under results/<run-id>/<system>/ as manifest.json + summary.json + per-sample samples.ndjson. To compare runs:

uv run bench compare results/<run-id>

Method reference

Scenarios, phase types, and Postgres-side diagnostics (wait events, notification queue usage, active transactions) are documented in docs/method.md.

Repo layout

bench_harness/        # orchestrator, sample contract, comparison/plot
                      # tooling — independent of any specific SUT
tests/                # pytest suite for the harness itself
<system>-bench/       # one directory per system-under-test, each
                      # producing a binary that talks the JSON contract
docker-compose.yml    # shared Postgres + sidecars
postgres.conf         # shared tuning (work_mem, autovacuum, etc.)
bench.py              # main CLI: run | combine | compare

Contributing a system

See CONTRIBUTING_ADAPTERS.md for the JSON contract and an end-to-end walk-through.

License

MIT — see LICENSE.

Footnotes

  1. pgmq can also be installed as SQL, but the benchmark and the common packaged distribution use the pgmq Postgres extension.

  2. pgque itself is PL/pgSQL. pg_cron is needed for the convenience pgque.start() ticker; callers may drive the ticker themselves instead.

  3. River's fast bulk path uses the Postgres COPY protocol.

  4. awa priorities include aging so lower-priority work is eventually promoted.

  5. pgmq is a visibility-timeout queue: redelivery is controlled by the visibility timeout rather than a job-framework retry policy with counted attempts and backoff.

  6. pgque supports delayed visibility, but not cron-style periodic scheduling.

  7. awa DLQ routing is opt-in via dlq_enabled_by_default or a per-queue override.

  8. pg-boss keeps failed/expired job history rather than exposing a separate DLQ queue abstraction.

  9. pgmq archives messages into queue-specific archive tables; that is retention/replay storage rather than a job-framework DLQ policy.

  10. Oban and Procrastinate retain exhausted failures in discarded/failed states rather than moving them to a separate queue table.

  11. pg-boss deduplication is expressed through singleton keys and singleton windows.

  12. pg-boss rate limiting is exposed as throttling.

  13. Oban OSS supports local queue limits; global rate limiting is an Oban Pro feature.

  14. Procrastinate can limit concurrency with locks/queueing policy, but does not expose a named per-queue rate-limit primitive.

  15. Absurd models external waits as durable workflow steps rather than queue-level callbacks.

  16. pg-boss exposes job lifecycle events/subscriptions rather than durable external-wait callbacks.

  17. awa includes the awa serve ops UI.

  18. pg-boss has third-party dashboards such as pgboss-dashboard, not an official bundled UI.

  19. Oban Web is part of Oban Pro.

  20. Procrastinate has community/third-party admin surfaces rather than a bundled official UI.
