The Most Underrated Background Job System in Node.js

Most engineers learn Kafka before they actually need Kafka.

That sentence sounds backwards, but it's the pattern I keep seeing. Teams reach for Kafka to solve problems that a database table and a worker process would have handled fine. I've watched teams introduce Kafka for PDF generation, email sending, invoice creation, Excel imports, and webhook retries — and six months later they're maintaining brokers, partitions, consumer groups, and rebalancing issues for workloads that never needed event streaming in the first place.

For a large category of backend workloads, a job scheduler like Agenda is the simpler, more practical choice. This post explains where Agenda fits, how it works internally, and how it differs from BullMQ and Kafka — and why those three aren't actually competing for the same job.

If you're newer to backend systems, the first two sections build the foundation. If you've shipped distributed systems before, skip ahead to Agenda vs Kafka — that's the part most people get wrong regardless of seniority.

The Problem: Everything Happening in the Request

Imagine a user confirms an invoice. The API needs to:

Save the invoice
Generate a PDF
Create an e-invoice
Call a government API
Generate an e-way bill
Send a notification email
Publish accounting entries

A naive implementation does all of this inside the request lifecycle:

User
  |
  v
POST /invoice
  |
  v
+--------------------+
| Save Invoice       |
| Generate PDF       |
| Generate E-Invoice |
| Generate E-Way Bill|
| Send Email         |
| Create Posting     |
+--------------------+
  |
  v
Response

The response now depends on every downstream system succeeding. If the government API takes 15 seconds, your user waits 15 seconds. If PDF generation fails, the entire request fails — even though the invoice itself was saved correctly. If email sending times out, you've turned a successful business transaction into an error page.

The problem here isn't invoice creation. It's coupling. The request handler is doing work that doesn't need to happen before the user gets a response.

Decoupling the Request

The fix is to do the minimum necessary work synchronously, and defer everything else:

User
  |
  v
POST /invoice
  |
  v
+----------------+
| Save Invoice   |
| Create Job     |
+----------------+
  |
  v
Response (~200ms)

The heavy work moves to a worker that runs independently of the request:

Worker
  |
  v
Pick Up Job
  |
  v
+---------------------+
| Generate PDF        |
| Generate E-Invoice  |
| Generate E-Way Bill |
| Send Email          |
+---------------------+
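The split above can be sketched in a few lines. This is a minimal in-memory illustration, not Agenda code: the `invoices` map, the `jobs` array, and the `process-invoice` job name are all hypothetical stand-ins for a real database and real handlers.

```javascript
// In-memory stand-ins for the invoices table and the jobs table (assumptions).
const invoices = new Map();
const jobs = [];

// Request path: persist the invoice, record the deferred work, return.
function handleCreateInvoice(invoice) {
  invoices.set(invoice.id, { ...invoice, status: "confirmed" });
  jobs.push({
    name: "process-invoice",
    data: { invoiceId: invoice.id },
    status: "pending",
  });
  return { statusCode: 201, body: { invoiceId: invoice.id } };
}

// Worker path: runs in a separate process or loop, never inside the request.
function runPendingJobs(handlers) {
  for (const job of jobs) {
    if (job.status !== "pending") continue;
    try {
      handlers[job.name](job.data);
      job.status = "done";
    } catch (err) {
      // A failing side effect marks the job as failed, not the invoice.
      job.status = "failed";
      job.error = err.message;
    }
  }
}

// The response is produced before any side effect has run.
const response = handleCreateInvoice({ id: 123, total: 4500 });

const sideEffects = [];
runPendingJobs({
  "process-invoice": ({ invoiceId }) => {
    sideEffects.push(`pdf:${invoiceId}`, `email:${invoiceId}`);
  },
});
```

The point of the sketch is the ordering: the invoice is saved and the response exists before the PDF or email work starts, so a failure in that work can only affect the job record.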

This is the point where most engineers reach for a queue. The question is which one — and that decision is usually made on reputation rather than fit.

What Agenda Actually Is

Agenda is a MongoDB-backed job scheduler. Conceptually, it's:

MongoDB + Job Definitions + Worker Processes + Retries + Scheduling

When you schedule a job:

// Assumes `agenda` is an already-configured, started Agenda instance.
await agenda.now("generate-invoice", {
  invoiceId: 123
});

Agenda writes a document into a MongoDB collection:

{
  "name": "generate-invoice",
  "data": { "invoiceId": 123 },
  "nextRunAt": "2026-06-17T10:00:00Z",
  "lockedAt": null
}

A worker process polls that collection on an interval, looking for jobs whose nextRunAt has passed and that aren't currently locked. When it finds one, it atomically locks the document and runs the matching handler.

+-----------+      poll       +-----------+      run        +-------------+
| MongoDB   | --------------> |  Worker   | --------------> | Job Handler |
| (jobs col)|                 |           |                 |             |
+-----------+ <-------------- +-----------+                 +-------------+
                  lock + ack
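Here is a rough model of the poll-and-lock step, using an in-memory array in place of the jobs collection. It is a sketch under assumptions, not Agenda's implementation: `claimNextJob` is a hypothetical helper, and the synchronous find-then-set stands in for what would be a single atomic operation in MongoDB (for example `findOneAndUpdate`).

```javascript
// In-memory stand-in for the jobs collection, mirroring the document shape above.
const jobsCollection = [
  {
    name: "generate-invoice",
    data: { invoiceId: 123 },
    nextRunAt: new Date("2026-06-17T10:00:00Z"),
    lockedAt: null,
  },
  {
    name: "generate-invoice",
    data: { invoiceId: 124 },
    nextRunAt: new Date("2026-06-17T12:00:00Z"), // not due yet
    lockedAt: null,
  },
];

// Find one due, unlocked job and lock it in the same step.
// Returns the locked job, or null when nothing is ready.
function claimNextJob(collection, now) {
  const job = collection.find(
    (j) => j.lockedAt === null && j.nextRunAt !== null && j.nextRunAt <= now
  );
  if (!job) return null;
  job.lockedAt = now;
  return job;
}

// Two workers polling at the same instant: only one can win the job.
const pollTime = new Date("2026-06-17T10:00:05Z");
const claimedByWorkerA = claimNextJob(jobsCollection, pollTime);
const claimedByWorkerB = claimNextJob(jobsCollection, pollTime);
```

Worker A gets job 123 and worker B gets nothing, because the only other job is not due yet. The lock is what makes it safe to run several workers against the same collection.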

That's the entire architecture. No broker. No partitions. No cluster coordination. Just a database collection and one or more worker processes reading from it.

Why this matters more than it sounds

For a junior engineer, the appeal here is operational: there's nothing new to run. If your app already talks to MongoDB, Agenda adds zero new infrastructure. You don't provision anything, you don't learn a new ops surface, and failure modes are familiar — it's just another query pattern against a database you already monitor.

For a senior engineer, the appeal is different: it's about matching the consistency model to the actual requirement. Job state lives in the same database (often the same cluster) as the business data the job operates on. That makes certain failure scenarios — like "did this job actually run after the process crashed mid-execution" — much easier to reason about, because you're not coordinating consistency across two different storage systems.
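One way to reason about the crash case is lock expiry: a lock that is older than some lifetime is treated as abandoned, and the job becomes claimable again. Agenda exposes a lockLifetime option for this; the sketch below is a simplified model of the idea, and the `isClaimable` helper and the 10-minute value are assumptions for illustration.

```javascript
const LOCK_LIFETIME_MS = 10 * 60 * 1000; // assumed value for illustration

// A job can be claimed if it is due and either unlocked or holding a stale lock.
function isClaimable(job, now, lockLifetimeMs = LOCK_LIFETIME_MS) {
  if (job.nextRunAt === null || job.nextRunAt > now) return false; // not due
  if (job.lockedAt === null) return true; // nobody holds it
  // A lock older than the lifetime is treated as left behind by a crashed worker.
  return now - job.lockedAt > lockLifetimeMs;
}

const crashedJob = {
  name: "generate-invoice",
  nextRunAt: new Date("2026-06-17T10:00:00Z"),
  lockedAt: new Date("2026-06-17T10:00:01Z"), // worker died right after locking
};

const twoMinutesLater = new Date("2026-06-17T10:02:01Z");
const elevenMinutesLater = new Date("2026-06-17T10:11:02Z");

const claimableSoon = isClaimable(crashedJob, twoMinutesLater); // lock still fresh
const claimableLater = isClaimable(crashedJob, elevenMinutesLater); // lock expired
```

The consequence worth noting is that a job can run more than once: if the first worker crashed after doing part of the work, the second run repeats it. Handlers that tolerate a second run (idempotent handlers) make this recovery model safe.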

A Concrete Deployment

                 +----------------+
                 |   HTTP API     |
                 +----------------+
                          |
                     create job
                          |
                          v
                 +----------------+
                 |    MongoDB     |
                 |  agendaJobs    |
                 +----------------+
                          |
                    poll & lock
                          |
                          v
                 +----------------+
                 | Agenda Worker  |
                 +----------------+
                          |
                   execute handler
                          |
                          v
                 +----------------+
                 | External       |
                 | Systems        |
                 +----------------+

The API server never executes the job — it only writes a record describing the work. This separation is what gives you resilience for free: if a worker crashes, the API keeps serving traffic, because it was never blocked on the worker in the first place. If the API gets hit with a traffic spike, workers keep draining the queue at their own pace, unaffected by request volume.

This is a small architectural decision with an outsized payoff, and it's the same payoff you'd get from BullMQ or Kafka. The infrastructure differs; the separation of concerns doesn't.

A Real Example: Excel Import

Suppose users upload large Excel files, and processing takes 30 seconds — parsing rows, validating data, creating records, generating a report.

Without a job queue, every upload request stays open for the full 30 seconds:

Upload Excel
     |
     v
+----------------+
| Parse File     |
| Validate Rows  |
| Create Records |
| Generate Report|
+----------------+
     |
     v
Response (30s later)

With Agenda, the request returns almost immediately, and the user polls for status or gets notified asynchronously:

Upload Excel
     |
     v
+----------------+
| Store File     |
| Create Job     |
+----------------+
     |
     v
Response (~200ms)


Worker (separately)
     |
     v
+----------------+
| Parse File     |
| Validate Rows  |
| Create Records |
| Generate Report|
+----------------+

This is the canonical Agenda use case: a single unit of deferred work, owned by one worker, with a clear start and end.
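The upload flow can be modelled as three small functions: one that accepts the upload, one that reports status, and one that does the work. Everything here is a hypothetical sketch: the `importJobs` map replaces the jobs collection, the 202 status code is my assumption, and rows are passed in directly so the example doesn't depend on any Excel parsing library.

```javascript
const importJobs = new Map(); // jobId -> job record (in-memory stand-in)
let nextJobId = 1;

// Upload request: remember where the file is, create a job, return immediately.
function handleUpload(filePath) {
  const jobId = nextJobId++;
  importJobs.set(jobId, { filePath, status: "queued", processed: 0, errors: [] });
  return { statusCode: 202, body: { jobId } };
}

// Status request: the client polls this until the job is completed.
function handleStatus(jobId) {
  const job = importJobs.get(jobId);
  if (!job) return { statusCode: 404, body: { error: "unknown job" } };
  const { status, processed, errors } = job;
  return { statusCode: 200, body: { status, processed, errors } };
}

// Worker: validates each row and records progress on the job itself.
function runImport(jobId, rows) {
  const job = importJobs.get(jobId);
  job.status = "running";
  rows.forEach((row, index) => {
    if (typeof row.amount !== "number" || Number.isNaN(row.amount)) {
      job.errors.push({ row: index + 1, reason: "amount must be a number" });
      return;
    }
    job.processed += 1; // a real worker would create the record here
  });
  job.status = "completed";
}

const upload = handleUpload("uploads/invoices.xlsx");
const statusBefore = handleStatus(upload.body.jobId);
runImport(upload.body.jobId, [{ amount: 100 }, { amount: "abc" }, { amount: 250 }]);
const statusAfter = handleStatus(upload.body.jobId);
```

Keeping progress and row-level errors on the job record means the status endpoint never needs to talk to the worker; it only reads what the worker has written.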

Agenda vs BullMQ

Engineers often compare these as if one replaces the other. In practice they solve the same category of problem using different storage backends — the architecture is nearly identical, the operational tradeoffs aren't.

Feature                 Agenda                          BullMQ
----------------------  ------------------------------  -----------------------------------------
Storage                 MongoDB                         Redis
Scheduling              Strong                          Strong
Delayed jobs            Yes                             Yes
Retries                 Yes                             Yes
Throughput              Moderate                        High
Operational complexity  Low (if you already run Mongo)  Medium (Redis tuning, persistence config)
Ecosystem / tooling     Smaller                         Larger (Bull Board, strong community)

Structurally, both look the same:

Agenda:    MongoDB  ->  Jobs  ->  Workers
BullMQ:    Redis    ->  Jobs  ->  Workers

The real decision driver is almost never "which is more powerful." It's which storage system you already operate, and what your throughput actually requires. If your platform already runs MongoDB and job volume is in the hundreds-to-low-thousands per minute range, Agenda is usually sufficient and adds no new moving parts. If you're pushing tens of thousands of jobs per second, or need BullMQ features like rate-limited workers and parent-child job flows out of the box, BullMQ is the better fit.

Neither is "the wrong choice" in the abstract. They're the wrong choice only relative to what you're already running.

Agenda vs Kafka

This is where the comparison actually breaks down — and where engineers at every level tend to make the same mistake. Kafka isn't a bigger, more scalable job queue. It's solving a different problem entirely.

A job queue answers: "Can somebody perform this work?"

Producer  ->  Queue  ->  Worker

One worker picks up the job. The work completes. Done. Nobody else needs to know it happened.

Event streaming answers: "Can everyone interested in this event find out about it?"

                              Order Created
                                    |
       +-----------------+----------+-------+--------------------+
       |                 |                  |                    |
Billing Service  Analytics Service  Inventory Service  Recommendation Engine

Multiple independent consumers react to the same event, often for entirely different reasons, and the producer doesn't know or care who's listening. The goal is distribution and replayability, not task completion.

Side by side

AGENDA (task execution)

  Create Invoice
       |
       v
  Agenda Job
       |
       v
   Worker
       |
       v
  Generate PDF


KAFKA (event distribution)

              Invoice Created Event
                        |
      +-----------+-----+-----+------------+
      |           |           |            |
  Accounting  Analytics  SearchIndex  AuditService

One is "make this thing happen." The other is "tell everyone this thing happened." Conflating them is the single most common architecture mistake I see — and it's not a junior mistake specifically. I've seen senior engineers introduce Kafka because it's the "scalable" choice, without asking whether the requirement was distribution at all.
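The difference in delivery semantics is easy to see in a toy model. The two classes below are hypothetical in-memory sketches, not Agenda or Kafka APIs: `JobQueue` hands each job to exactly one taker and then forgets it, while `EventLog` keeps every event and lets each consumer group track its own read position.

```javascript
// Job queue: a job disappears from the queue once one worker takes it.
class JobQueue {
  constructor() {
    this.jobs = [];
  }
  enqueue(job) {
    this.jobs.push(job);
  }
  take() {
    return this.jobs.shift() ?? null;
  }
}

// Event log: events are appended and retained; each group has its own offset.
class EventLog {
  constructor() {
    this.events = [];
    this.offsets = new Map(); // group name -> next index to read
  }
  publish(event) {
    this.events.push(event);
  }
  poll(group) {
    const offset = this.offsets.get(group) ?? 0;
    const batch = this.events.slice(offset);
    this.offsets.set(group, this.events.length);
    return batch;
  }
  // Replay: rewind a group so it reads the full history again.
  seekToBeginning(group) {
    this.offsets.set(group, 0);
  }
}

const queue = new JobQueue();
queue.enqueue({ name: "generate-pdf", invoiceId: 123 });
const takenByWorkerA = queue.take(); // gets the job
const takenByWorkerB = queue.take(); // nothing left

const log = new EventLog();
log.publish({ type: "InvoiceCreated", invoiceId: 123 });
const seenByAccounting = log.poll("accounting"); // sees the event
const seenByAnalytics = log.poll("analytics"); // also sees the same event
log.seekToBeginning("analytics");
const replayedByAnalytics = log.poll("analytics"); // sees it again on replay
```

In the queue, the second worker gets nothing because the work is already owned. In the log, every group sees every event, and a group can rewind and read it again. Those two behaviors are the requirement you are actually choosing between.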

A note for more senior readers

The nuance worth sitting with: Kafka can be used as a durable job queue (a topic keyed by job ID and read by a single consumer group, optionally with Kafka Streams or compacted topics), and people do this successfully at scale. The reason it's usually a bad default isn't capability; it's operational cost relative to requirement. You're taking on partition rebalancing, consumer lag monitoring, and the complexity of exactly-once semantics to get a feature set that a job table already gives you for free. The right question isn't "can Kafka do this," it's "does this workload need fan-out, replay, and multi-consumer distribution." If the answer is no, the operational bill you're signing up for buys you nothing.

When to Choose Agenda

Choose it when:

You already run MongoDB
The work is a business workflow with a single owner (one job, one outcome)
You need retries and delayed/scheduled execution
Job volume is moderate, not firehose-scale

Good fits: PDF generation, invoice workflows, Excel processing, email delivery, report generation, retryable third-party API calls.
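For the retry and delayed-execution requirement, the core mechanic is just rewriting the job's next run time after a failure. The sketch below shows one common policy, exponential backoff with a cap on attempts. It is a generic pattern rather than Agenda's built-in behavior, and `scheduleAfterFailure`, `failCount`, `baseDelayMs`, and `maxAttempts` are hypothetical names.

```javascript
// Decide what happens to a job after a failed run.
// Returns a new job record with either a later nextRunAt or a terminal status.
function scheduleAfterFailure(job, now, { baseDelayMs = 1000, maxAttempts = 5 } = {}) {
  const failCount = (job.failCount ?? 0) + 1;
  if (failCount >= maxAttempts) {
    // Out of attempts: stop scheduling and leave the job for manual review.
    return { ...job, failCount, nextRunAt: null, status: "failed" };
  }
  // Delay doubles with every failure: base, 2x base, 4x base, ...
  const delayMs = baseDelayMs * 2 ** (failCount - 1);
  return {
    ...job,
    failCount,
    nextRunAt: new Date(now.getTime() + delayMs),
    status: "retrying",
  };
}

const failedAt = new Date("2026-06-17T10:00:00Z");
const options = { baseDelayMs: 1000, maxAttempts: 3 };

const afterFirstFailure = scheduleAfterFailure(
  { name: "call-government-api", failCount: 0 },
  failedAt,
  options
);
const afterSecondFailure = scheduleAfterFailure(afterFirstFailure, failedAt, options);
const afterThirdFailure = scheduleAfterFailure(afterSecondFailure, failedAt, options);
```

Because a retry is only a future `nextRunAt`, the same polling loop that runs scheduled jobs also runs retries; no separate retry mechanism is needed.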

When Not to Choose Agenda

Avoid it for:

Event streaming or multi-consumer fan-out
Real-time analytics pipelines
High-volume log ingestion
Millions of messages per hour
Cross-team event distribution where other teams need to subscribe independently

Those are Kafka problems, not job-scheduling problems — and reaching for Agenda (or BullMQ) there just relocates the pain rather than solving it.

The Mistake Most Teams Make

The recurring failure mode isn't technical; it's that technology gets chosen by reputation instead of requirements. A surprising number of production systems can run successfully on:

Database + Job Table + Worker Process

instead of:

Kafka + KRaft/ZooKeeper + Consumer Groups + Partitions + Rebalancing + Ongoing Ops Burden

Complexity isn't architecture. Appropriate complexity is architecture. The systems that age well are the ones where someone asked what the workload actually needed before reaching for the tool with the best conference talks.

Final Thought

A rule of thumb that's served me well, regardless of which queue or broker is fashionable this year:

If the requirement is "somebody needs to do this work later" — reach for Agenda or BullMQ.

If the requirement is "many independent systems need to know this happened" — reach for Kafka.

Confusing these two requirements is one of the most common design mistakes in backend systems, and it's not a seniority problem — it's a habit of choosing infrastructure before naming the actual question you're trying to answer.
