Reliable Webhook Delivery in Node.js
Webhooks sound simple: make an HTTP POST when something happens. But reliable webhook delivery is surprisingly hard. Network failures, timeouts, duplicate deliveries, and unresponsive endpoints turn a simple concept into an engineering problem.
Here’s how to build a webhook system that actually works.
The naive approach (and why it fails)
// Don't do this
async function sendWebhook(url, payload) {
  await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
}
There's no retry and no record of the attempt, so deliveries are lost when:
- The endpoint is temporarily down (503)
- The network times out
- The server returns an error status (fetch resolves anyway, so nothing surfaces)
- The process crashes between event creation and delivery
The architecture
A reliable webhook system has four components:
- Event table — records what happened
- Delivery queue — tracks pending deliveries
- Worker — processes the queue with retries
- Dead letter queue — stores permanently failed deliveries
CREATE TABLE webhook_events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  type TEXT NOT NULL,
  payload TEXT NOT NULL,
  created TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE webhook_deliveries (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  event_id INTEGER NOT NULL REFERENCES webhook_events(id),
  endpoint_url TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending', -- pending, delivered, dead
  attempts INTEGER NOT NULL DEFAULT 0,
  max_attempts INTEGER NOT NULL DEFAULT 5,
  next_attempt TEXT NOT NULL DEFAULT (datetime('now')),
  last_error TEXT,
  last_status_code INTEGER,
  delivered_at TEXT,
  created TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_deliveries_pending
ON webhook_deliveries(next_attempt)
WHERE status = 'pending';
The partial index on status = 'pending' is critical — it keeps the queue scan fast even when the table has millions of completed rows.
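To sanity-check that the queue scan actually hits the index, ask SQLite for the query plan. Here is a minimal sketch, assuming the same synchronous, better-sqlite3-style db handle used by the worker below; the query must include the status = 'pending' predicate, or SQLite cannot prove the partial index applies.
// Sketch: confirm the partial index is used for the queue scan
const plan = db.prepare(`
  EXPLAIN QUERY PLAN
  SELECT id FROM webhook_deliveries
  WHERE status = 'pending' AND next_attempt <= datetime('now')
`).all();

console.log(plan.map(row => row.detail));
// Expect something like:
// "SEARCH webhook_deliveries USING INDEX idx_deliveries_pending (next_attempt<?)"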
HMAC signatures
Recipients need to verify that webhooks actually came from you. The standard approach: sign the payload with a shared secret.
import { createHmac, randomUUID } from 'node:crypto';

function signPayload(payload, secret) {
  const body = typeof payload === 'string' ? payload : JSON.stringify(payload);
  return createHmac('sha256', secret).update(body).digest('hex');
}

function buildHeaders(payload, secret) {
  const timestamp = Math.floor(Date.now() / 1000);
  const body = JSON.stringify(payload);
  const signature = signPayload(`${timestamp}.${body}`, secret);
  return {
    'Content-Type': 'application/json',
    'X-Webhook-Signature': `t=${timestamp},v1=${signature}`,
    'X-Webhook-Id': randomUUID(),
  };
}
The X-Webhook-Id header enables idempotency — recipients can deduplicate based on this ID if they receive the same webhook twice.
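Here is a minimal sketch of that deduplication on the recipient side. It assumes the recipient also uses a synchronous SQLite client and a hypothetical processed_webhooks table; any store with a unique key works the same way.
// Hypothetical recipient-side dedup: record each X-Webhook-Id exactly once
db.exec(`
  CREATE TABLE IF NOT EXISTS processed_webhooks (
    webhook_id TEXT PRIMARY KEY,
    received TEXT NOT NULL DEFAULT (datetime('now'))
  )
`);

function isFirstDelivery(db, webhookId) {
  // INSERT OR IGNORE reports changes = 0 when the ID was already recorded
  const result = db
    .prepare('INSERT OR IGNORE INTO processed_webhooks (webhook_id) VALUES (?)')
    .run(webhookId);
  return result.changes === 1;
}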
Including the timestamp in the signature prevents replay attacks. Recipients should reject signatures older than 5 minutes.
Verification on the receiving side
import { createHmac, timingSafeEqual } from 'node:crypto';

function verifyWebhook(body, signatureHeader, secret) {
  const parts = Object.fromEntries(
    signatureHeader.split(',').map(p => p.split('='))
  );
  const timestamp = parseInt(parts.t, 10);
  const signature = parts.v1;
  if (!signature || Number.isNaN(timestamp)) return false;

  // Reject old signatures (replay protection)
  const age = Math.floor(Date.now() / 1000) - timestamp;
  if (age > 300) return false; // 5 minute window

  const expected = createHmac('sha256', secret)
    .update(`${timestamp}.${body}`)
    .digest('hex');

  // Timing-safe comparison; check lengths first because
  // timingSafeEqual throws on buffers of different lengths
  const sigBuf = Buffer.from(signature, 'hex');
  const expectedBuf = Buffer.from(expected, 'hex');
  if (sigBuf.length !== expectedBuf.length) return false;
  return timingSafeEqual(sigBuf, expectedBuf);
}
Use timingSafeEqual instead of === to prevent timing attacks.
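For completeness, here is one way to wire the check into a receiving endpoint. This is a sketch assuming Express and a WEBHOOK_SECRET environment variable; the important detail is express.raw(), because the signature must be verified against the exact bytes that were signed, not a re-parsed and re-serialized body.
import express from 'express';

const app = express();

// Keep the body as raw bytes for signature verification
app.post('/webhooks', express.raw({ type: 'application/json' }), (req, res) => {
  const ok = verifyWebhook(
    req.body.toString('utf8'),
    req.get('X-Webhook-Signature') ?? '',
    process.env.WEBHOOK_SECRET
  );
  if (!ok) return res.status(401).end();

  const event = JSON.parse(req.body);
  // ...handle the event, ideally after deduplicating on X-Webhook-Id...
  res.status(204).end();
});

app.listen(3000);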
Exponential backoff with jitter
When a delivery fails, retry — but not immediately. Each retry should wait longer than the last:
function nextRetryDelay(attempt) {
  // attempt is the number of attempts made so far (1-based):
  // 1 → 30s, 2 → 2m, 3 → 8m, 4 → 32m, later attempts capped at 2 hours
  const base = 30 * Math.pow(4, attempt - 1);
  // Add jitter (±25%) to prevent thundering herd
  const jitter = base * 0.25 * (Math.random() * 2 - 1);
  return Math.min(base + jitter, 7200); // cap at 2 hours
}
The jitter prevents a wave of retries all hitting the same endpoint simultaneously after a recovery.
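You can see the spread by sampling the function directly; each first retry lands somewhere near 30 seconds rather than at exactly 30.
// Sampling the first retry delay shows the jitter spread (roughly 22.5 to 37.5 seconds)
for (let i = 0; i < 3; i++) {
  console.log(`sample ${i + 1}: ${nextRetryDelay(1).toFixed(1)}s`);
}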
The delivery worker
const BATCH_SIZE = 10;
const TIMEOUT = 10000; // 10 seconds

async function processQueue(db) {
  const pending = db.prepare(`
    SELECT d.*, e.type, e.payload
    FROM webhook_deliveries d
    JOIN webhook_events e ON e.id = d.event_id
    WHERE d.status = 'pending'
      AND d.next_attempt <= datetime('now')
    ORDER BY d.next_attempt ASC
    LIMIT ?
  `).all(BATCH_SIZE);

  const results = await Promise.allSettled(
    pending.map(delivery => deliverOne(db, delivery))
  );
  return results;
}
async function deliverOne(db, delivery) {
  const payload = JSON.parse(delivery.payload);
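  // getEndpointSecret() is not shown in this post; it is assumed to return
  // the shared signing secret configured for this endpoint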
  const secret = getEndpointSecret(delivery.endpoint_url);
  const headers = buildHeaders(payload, secret);

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), TIMEOUT);

  try {
    const response = await fetch(delivery.endpoint_url, {
      method: 'POST',
      headers,
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
    clearTimeout(timer);

    if (response.ok) {
      // Success — mark delivered
      db.prepare(`
        UPDATE webhook_deliveries
        SET status = 'delivered', delivered_at = datetime('now'),
            attempts = attempts + 1, last_status_code = ?
        WHERE id = ?
      `).run(response.status, delivery.id);
      return;
    }
    throw new Error(`HTTP ${response.status}`);
  } catch (err) {
    clearTimeout(timer);
    const newAttempts = delivery.attempts + 1;

    if (newAttempts >= delivery.max_attempts) {
      // Exhausted retries — move to dead letter
      db.prepare(`
        UPDATE webhook_deliveries
        SET status = 'dead', attempts = ?, last_error = ?
        WHERE id = ?
      `).run(newAttempts, err.message, delivery.id);
    } else {
      // Schedule retry
      const delay = nextRetryDelay(newAttempts);
      db.prepare(`
        UPDATE webhook_deliveries
        SET attempts = ?, last_error = ?,
            next_attempt = datetime('now', '+' || ? || ' seconds')
        WHERE id = ?
      `).run(newAttempts, err.message, Math.round(delay), delivery.id);
    }
  }
}
Running the worker
The worker runs on a simple interval. No Redis, no Bull, no external queue infrastructure:
const POLL_INTERVAL = 5000; // 5 seconds

// Guard against overlapping batches: a batch can outlive one poll
// (10s timeout vs 5s interval), and re-entering would double-send rows.
let running = false;
setInterval(async () => {
  if (running) return;
  running = true;
  try {
    await processQueue(db);
  } catch (err) {
    console.error('Queue processing error:', err.message);
  } finally {
    running = false;
  }
}, POLL_INTERVAL);
For higher throughput, decrease the poll interval and increase the batch size. SQLite handles this comfortably up to hundreds of deliveries per second.
Monitoring the queue
function queueStats(db) {
  return db.prepare(`
    SELECT
      status,
      COUNT(*) as count,
      AVG(attempts) as avg_attempts
    FROM webhook_deliveries
    GROUP BY status
  `).all();
}
Expose this as a health endpoint. Alert if pending count grows faster than it shrinks, or if dead count spikes.
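A sketch of what that might look like, assuming an Express app running alongside the worker; the thresholds below are placeholders to tune for your volume, and alerting on trends beats alerting on absolute counts.
// Hypothetical health endpoint for the sender process
app.get('/health/webhooks', (req, res) => {
  const stats = queueStats(db);
  const count = status =>
    stats.find(row => row.status === status)?.count ?? 0;

  // Placeholder thresholds
  const unhealthy = count('dead') > 0 || count('pending') > 1000;
  res.status(unhealthy ? 503 : 200).json(stats);
});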
The complete pattern
- Something happens → Insert into webhook_events
- For each subscriber → Insert into webhook_deliveries with status pending
- Worker polls → Find pending deliveries where next_attempt <= now
- Deliver → POST with HMAC signature and timeout
- Success → Mark delivered
- Failure → Increment attempts, schedule retry with exponential backoff
- Max retries → Mark dead, alert operator
- Dead letters → Manually retry or investigate
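The first two steps are the write path the rest of the post takes for granted. Here is a minimal sketch, assuming better-sqlite3 (its db.transaction() keeps the event and its fan-out atomic) and a hypothetical listSubscribers() that returns the endpoint URLs subscribed to an event type.
const insertEvent = db.prepare(`
  INSERT INTO webhook_events (type, payload) VALUES (?, ?)
`);
const insertDelivery = db.prepare(`
  INSERT INTO webhook_deliveries (event_id, endpoint_url) VALUES (?, ?)
`);

// Record the event and fan out one pending delivery per subscriber, atomically
const emitEvent = db.transaction((type, payload) => {
  const { lastInsertRowid: eventId } = insertEvent.run(type, JSON.stringify(payload));
  for (const url of listSubscribers(type)) {
    insertDelivery.run(eventId, url);
  }
  return eventId;
});

// Usage (illustrative event only): emitEvent('invoice.paid', { invoiceId: 'inv_123' });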
This entire system runs in a single Node.js process with SQLite. No message broker, no background job framework, no separate queue service. The database IS the queue — and SQLite is remarkably good at it.
When to use a real queue
Upgrade to Redis/BullMQ when:
- You need sub-second delivery latency (SQLite polling adds a 1-5 second delay)
- Delivery volume exceeds ~1000/second sustained
- You need multiple worker processes on different machines
For most applications — SaaS webhooks, notification delivery, event forwarding — the SQLite approach handles the load with far less operational complexity.