Rolling Updates for n8n Using Docker

Ahmed

I’ve run n8n in production behind reverse proxies where even a 10-second restart could break inbound webhooks and annoy paying users.


Rolling updates for n8n on Docker let you ship new versions while keeping triggers alive, the editor reachable, and executions safe.



What “zero downtime” actually means in n8n

If you update the container and your instance disappears for a moment, you usually see one (or more) of these failures:

  • Missed inbound webhooks: external services retry later (or never), and you lose events.
  • Broken OAuth callback flows: users get redirected into a dead request path.
  • Half-finished executions: long-running workflows get killed mid-flight if your shutdown is abrupt.
  • Editor disconnects: not fatal, but it looks like instability.

Your goal is simple: keep at least one healthy n8n “main” available to accept web traffic while updates happen, and make shutdowns graceful so in-flight work finishes safely.


Non-negotiables before you attempt rolling updates

1) Put n8n behind a reverse proxy and set the webhook URL correctly

If you run n8n behind a proxy (which you should for TLS and routing), set WEBHOOK_URL so n8n generates and registers the correct external URLs. This is the difference between “it works in the editor” and “it works in production.”


Use the official reference for reverse proxy webhook configuration here: n8n reverse proxy webhook URL configuration


Real-world challenge: The most common rolling-update “bug” isn’t Docker—it’s n8n producing internal URLs (like port 5678) when WEBHOOK_URL isn’t set correctly.


Fix: Set WEBHOOK_URL to your public HTTPS URL and verify new webhook registrations show the correct domain.
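
For reference, here is a minimal sketch of the variables involved, matching the full stack examples later in this guide (your-domain.com is a placeholder for your public domain):

# Minimal proxy-aware environment sketch (Compose-style)
environment:
  - N8N_HOST=your-domain.com
  - N8N_PROTOCOL=https
  - WEBHOOK_URL=https://your-domain.com/
  - N8N_PROXY_HOPS=1   # number of reverse proxies in front of n8n
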


2) Use a real database (not SQLite) and keep it stable across replicas

Rolling updates only make sense when your state survives container churn. That means a persistent database (commonly Postgres) and predictable connectivity from each n8n instance.


If you’re using Docker, follow the official n8n Docker hosting guidance: n8n Docker installation docs


Real-world challenge: Database latency spikes during deployments can look like “n8n is unhealthy,” causing your proxy or orchestrator to flap containers.


Fix: Add health checks that test the app, not the database directly, and keep your DB on stable storage with proper resources.
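
On the n8n side, pointing every replica at the same Postgres is just a handful of environment variables; this sketch uses the same connection settings as the stack examples below (values are placeholders):

# Postgres connection sketch; every n8n instance points at the same database
environment:
  - DB_TYPE=postgresdb
  - DB_POSTGRESDB_HOST=postgres
  - DB_POSTGRESDB_DATABASE=n8n
  - DB_POSTGRESDB_USER=n8n
  - DB_POSTGRESDB_PASSWORD=REPLACE_ME
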


3) Decide if you need Queue Mode (you probably do if you scale)

If you plan to run multiple containers, Queue Mode is the cleanest way to separate web-facing “main” instances from execution “workers.” It reduces the risk of concurrency conflicts and gives you predictable scaling.


Reference the official Queue Mode documentation: n8n Queue Mode configuration


Real-world challenge: Queue Mode adds Redis and introduces more moving parts to monitor.


Fix: Keep Redis private on an internal network, use a strong Redis password if applicable, and start with one worker, then scale gradually while watching execution throughput.
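
As a minimal sketch of what Queue Mode changes in a Compose or Swarm file (same variables as the stack example below): the main instance and every worker share the Redis settings, and workers run the same image started with the worker command.

# Shared by the main and every worker (hostnames are placeholders from the stack example)
environment:
  - EXECUTIONS_MODE=queue
  - QUEUE_BULL_REDIS_HOST=redis
  - QUEUE_BULL_REDIS_PORT=6379

# Workers are the same image started with the worker command:
# command: worker
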


Two production-ready patterns for rolling updates

You can do rolling updates in Docker in two different ways depending on your environment:

  • Pattern A (recommended for true rolling updates): Docker Swarm services with rolling update policy.
  • Pattern B (works on a single host with Docker Compose): Blue/green using two n8n “main” containers behind a proxy.

Pattern A: Docker Swarm rolling updates (true start-first behavior)

Swarm gives you first-class rolling updates: it can start a new task, wait until healthy, then stop the old task. That “overlap” is what prevents downtime.


Core idea

  • Run 2+ replicas of the n8n “main” service behind a reverse proxy.
  • Use update order = start-first so a new container is ready before the old one exits.
  • Use health checks so Swarm only routes traffic to healthy tasks.

Swarm stack example (main + workers in queue mode)

version: "3.8"

services: n8n-main: image: n8nio/n8n:latest environment: - NODE_ENV=production - N8N_HOST=your-domain.com - N8N_PROTOCOL=https - WEBHOOK_URL=https://your-domain.com/ - N8N_PROXY_HOPS=1 - N8N_ENCRYPTION_KEY=REPLACE_WITH_LONG_RANDOM_SECRET - DB_TYPE=postgresdb - DB_POSTGRESDB_HOST=postgres - DB_POSTGRESDB_DATABASE=n8n - DB_POSTGRESDB_USER=n8n - DB_POSTGRESDB_PASSWORD=REPLACE_ME - EXECUTIONS_MODE=queue - QUEUE_BULL_REDIS_HOST=redis - QUEUE_BULL_REDIS_PORT=6379 networks: - internal - edge deploy: replicas: 2 update_config: parallelism: 1 delay: 10s order: start-first failure_action: rollback rollback_config: parallelism: 1 order: stop-first restart_policy: condition: on-failure healthcheck: test: ["CMD-SHELL", "wget -qO- http://localhost:5678/healthz || exit 1"] interval: 10s timeout: 3s retries: 6 start_period: 25s n8n-worker: image: n8nio/n8n:latest environment: - NODE_ENV=production - N8N_ENCRYPTION_KEY=REPLACE_WITH_LONG_RANDOM_SECRET - DB_TYPE=postgresdb - DB_POSTGRESDB_HOST=postgres - DB_POSTGRESDB_DATABASE=n8n - DB_POSTGRESDB_USER=n8n - DB_POSTGRESDB_PASSWORD=REPLACE_ME - EXECUTIONS_MODE=queue - QUEUE_BULL_REDIS_HOST=redis - QUEUE_BULL_REDIS_PORT=6379 command: worker networks: - internal deploy: replicas: 2 update_config: parallelism: 1 delay: 10s order: start-first postgres: image: postgres:16 environment: - POSTGRES_DB=n8n - POSTGRES_USER=n8n - POSTGRES_PASSWORD=REPLACE_ME volumes: - postgres_data:/var/lib/postgresql/data networks: - internal redis: image: redis:7 networks: - internal networks: internal: driver: overlay edge: driver: overlay volumes:
postgres_data:

Real-world challenge: Swarm is solid for rolling updates, but the ecosystem is smaller than Kubernetes, and some teams don’t have standardized Swarm ops.


Fix: Keep your stack minimal, document your update commands, and test updates in staging with a realistic webhook load before you do it on your primary domain.


Deploy and update commands

# First deploy
docker stack deploy -c stack.yml n8n

# Update image (rolling update happens automatically due to update_config)
docker service update --image n8nio/n8n:latest n8n_n8n-main
docker service update --image n8nio/n8n:latest n8n_n8n-worker

# Optional: force a rolling restart (even if the tag didn't change)
docker service update --force n8n_n8n-main

For official Docker service update behavior, use: Docker service update reference
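
If an update misbehaves despite failure_action: rollback, you can also roll back and inspect task state manually:

# Roll the service back to its previously deployed spec
docker service rollback n8n_n8n-main

# Watch task states while an update or rollback progresses
docker service ps n8n_n8n-main
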


Pattern B: Docker Compose blue/green (practical on a single VPS)

If you’re on a single VPS and don’t want Swarm, you can still get near-zero downtime by running two “main” containers and switching traffic at the proxy.


Core idea

  • Run n8n-a and n8n-b (only one is “active” at a time).
  • Both point to the same database and use the same N8N_ENCRYPTION_KEY.
  • Your reverse proxy routes traffic to the active container.
  • Update the inactive container first, verify it’s healthy, then flip the proxy route.

Compose example for blue/green mains

services:
  n8n-a:
    image: n8nio/n8n:latest
    environment:
      - NODE_ENV=production
      - WEBHOOK_URL=https://your-domain.com/
      - N8N_PROXY_HOPS=1
      - N8N_ENCRYPTION_KEY=REPLACE_WITH_LONG_RANDOM_SECRET
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_PASSWORD=REPLACE_ME
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:5678/healthz || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 6
      start_period: 25s
    networks: [internal]

  n8n-b:
    image: n8nio/n8n:latest
    environment:
      - NODE_ENV=production
      - WEBHOOK_URL=https://your-domain.com/
      - N8N_PROXY_HOPS=1
      - N8N_ENCRYPTION_KEY=REPLACE_WITH_LONG_RANDOM_SECRET
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_PASSWORD=REPLACE_ME
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:5678/healthz || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 6
      start_period: 25s
    networks: [internal]

  postgres:
    image: postgres:16
    environment:
      - POSTGRES_DB=n8n
      - POSTGRES_USER=n8n
      - POSTGRES_PASSWORD=REPLACE_ME
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks: [internal]

networks:
  internal:

volumes:
  postgres_data:

Real-world challenge: Compose can’t natively “roll” replicas the way Swarm does, so you must manage the traffic switch at the proxy.


Fix: Automate your switch with a simple runbook: update inactive → health verify → flip route → drain old → remove old.


Traefik or NGINX: which proxy should you choose?

Traefik is popular with Docker because it can discover containers automatically and health-check upstreams.


Official Traefik service/healthcheck docs: Traefik routing services & health checks


Real-world challenge: Label-based configuration can become messy, especially when you’re switching between n8n-a and n8n-b.


Fix: Keep one stable router name (your domain) and only change the service target, not the public route.
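
One way to do that is with Traefik's file provider, so the blue/green flip is a one-line change; this is a sketch, and the entrypoint name websecure plus the service name n8n-active are assumptions you should adapt to your setup:

# Traefik dynamic configuration sketch: the router stays stable, only the server URL flips
http:
  routers:
    n8n:
      rule: "Host(`your-domain.com`)"
      entryPoints: ["websecure"]
      service: n8n-active
  services:
    n8n-active:
      loadBalancer:
        servers:
          - url: "http://n8n-a:5678"   # flip to http://n8n-b:5678 during a blue/green switch
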


NGINX is a strong option if you prefer explicit config files and predictable behavior.


Official NGINX documentation: NGINX docs


Real-world challenge: Misconfigured reloads or upstream timeouts can cause short request failures during a flip.


Fix: Use a health-checked upstream, a quick reload command, and verify the new upstream responds before switching traffic.
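
As a sketch of that pattern, here is a minimal upstream and location pair, assuming the container names from the Compose example above and that the location sits inside your existing HTTPS server block. Open-source NGINX only does passive health checks (max_fails / fail_timeout), so pair this with the pre-flip verification described later.

# Passive health checking: n8n-a is primary, n8n-b only takes traffic if n8n-a fails
upstream n8n_backend {
    server n8n-a:5678 max_fails=3 fail_timeout=10s;
    server n8n-b:5678 backup;
}

# Inside your existing server block for your-domain.com:
location / {
    proxy_pass http://n8n_backend;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # the n8n editor keeps a push connection open, which may run over WebSockets
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}

After editing, nginx -t followed by nginx -s reload applies the change gracefully: old worker processes finish their in-flight requests before exiting.
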


Comparison table: pick the right update approach

| Approach | Downtime risk | Operational complexity | Best fit |
| --- | --- | --- | --- |
| Single container (stop/start) | High | Low | Internal testing, non-critical workflows |
| Compose blue/green (n8n-a / n8n-b) | Low (if the proxy flip is clean) | Medium | Single VPS where you want near-zero downtime without Swarm |
| Docker Swarm rolling service updates | Very low (with health checks) | Medium | Production webhooks, multiple replicas, repeatable deployments |

Hardening your rolling updates: the details that prevent surprises

Use graceful shutdown so executions don’t get cut off

When you stop a container, Docker sends a termination signal and eventually kills the process if it doesn’t exit quickly. Long webhooks, retries, and active executions can get disrupted if your shutdown window is too short.


Practical move: extend stop timeouts so n8n can finish current work before exit.

# Docker Compose example
services:
  n8n:
    image: n8nio/n8n:latest
    stop_grace_period: 60s

Health checks must reflect “ready for traffic,” not just “process is alive”

If your health check is too weak, your proxy may route traffic to a container that hasn’t fully started. If it’s too strict, you’ll get unnecessary restarts.


Real-world challenge: Many teams only check the TCP port, which passes even when the app isn’t ready.


Fix: Hit a lightweight HTTP endpoint and require consistent success before the instance is considered healthy.
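
For the Compose blue/green flow, that can be as simple as requiring a few consecutive successful responses from the candidate before you flip; a sketch assuming the service names and /healthz check from the Compose example above:

# Require five consecutive healthy responses from n8n-b before flipping traffic to it
for i in 1 2 3 4 5; do
  docker compose exec -T n8n-b wget -qO- http://localhost:5678/healthz >/dev/null || { echo "n8n-b not ready"; exit 1; }
  sleep 2
done
echo "n8n-b is ready for traffic"
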


Sticky sessions: helpful for the editor, risky during flips

Sticky sessions can reduce “log in again” moments in some setups, but they can also pin a user to an old instance that you’re trying to drain.


Real-world challenge: During a flip, a sticky cookie can keep sending editor requests to a container that is stopping.


Fix: If you enable stickiness, shorten cookie lifetime and always do start-first updates so an alternate healthy instance is ready.
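
If you do enable stickiness with Traefik, it is configured per service; a sketch using Traefik v2+ labels, where the service name n8n and cookie name are placeholders:

# Sticky cookie sketch for a Traefik-managed n8n service
labels:
  - "traefik.http.services.n8n.loadbalancer.sticky.cookie=true"
  - "traefik.http.services.n8n.loadbalancer.sticky.cookie.name=n8n_sticky"
  - "traefik.http.services.n8n.loadbalancer.sticky.cookie.secure=true"
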


A practical blue/green runbook you can execute in minutes

This sequence works reliably on a single VPS with Compose and a proxy flip:

# 1) Pick the inactive main (example: n8n-b is inactive)
docker compose pull n8n-b
docker compose up -d n8n-b

# 2) Verify it becomes healthy (use your own check)
docker inspect --format='{{json .State.Health.Status}}' $(docker compose ps -q n8n-b)

# 3) Flip the reverse proxy to route traffic to n8n-b
#    (This step depends on your proxy: update upstream/labels and reload)

# 4) Drain old main (n8n-a) by removing it after traffic is confirmed stable
docker compose stop n8n-a
docker compose rm -f n8n-a

Common mistakes that quietly cause downtime

  • Updating the only “main” container: if there’s no second main, there’s always downtime.
  • Forgetting WEBHOOK_URL: external integrations register incorrect callback URLs and stop triggering.
  • Changing N8N_ENCRYPTION_KEY between replicas: credentials become unreadable across instances.
  • Not validating health before routing: your proxy routes traffic to a container still warming up.
  • Deploying big version jumps without staging: migrations happen under live traffic and amplify risk.

FAQ

Can you run multiple n8n “main” containers at the same time?

Yes, as long as they share the same database and the same N8N_ENCRYPTION_KEY. For higher throughput and safer scaling, use Queue Mode so workers handle executions while mains focus on web traffic and orchestration.


Do rolling updates break active webhooks?

They don’t have to. If your proxy always has one healthy upstream and you update with start-first behavior (Swarm) or blue/green flip (Compose), inbound webhooks keep landing on a live instance.


What’s the safest way to update when you’re running heavy workflows?

Use Queue Mode and scale workers separately. Update workers one at a time, then update mains. Keep a longer stop grace period so running executions have time to finish cleanly.
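
In Swarm terms, that ordering is just two explicit update commands, workers first; the service names follow the stack example earlier and the tag is a placeholder:

# Update workers first; update_config paces the rollout one task at a time
docker service update --image n8nio/n8n:<pinned-tag> n8n_n8n-worker

# Then update the web-facing mains once workers are healthy
docker service update --image n8nio/n8n:<pinned-tag> n8n_n8n-main
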


How do you prevent credential issues during rolling updates?

Keep N8N_ENCRYPTION_KEY identical across every instance, and don’t rotate it casually. If you must rotate secrets, plan it as a controlled maintenance event and validate credential decryption on a staging clone first.


Should you pin the n8n image tag instead of using latest?

Pinning a version reduces surprise changes and makes rollbacks predictable. Use a controlled update cadence, validate the version in staging, then deploy the exact same tag in production.
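
In practice that means replacing latest with an explicit tag in your stack or Compose file; the tag below is a placeholder for whatever version you validated in staging:

services:
  n8n-main:
    image: n8nio/n8n:<tested-version>   # deploy the exact tag you validated in staging, not latest
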


Can you do rolling updates without a reverse proxy?

Not reliably. You need something that can keep a stable public endpoint (TLS + routing) while containers come and go. That stable edge is what makes “start-first” and blue/green switches possible.



Conclusion

If you want the most reliable rolling updates, run two n8n mains behind a proxy and let Docker Swarm handle start-first updates with health checks. If you’re on a single VPS and prefer Compose, blue/green with a clean proxy flip gets you very close to zero downtime—just make health verification and WEBHOOK_URL configuration non-negotiable.

