If you’ve ever deployed a Django monolith that uses Celery for background tasks, you probably ran into this:

“I want to ship a new version of my app… but what happens to the long-running tasks that my Celery workers are crunching?”

Do I kill the workers? Do I just restart everything with docker compose up -d and pray? Spoiler: if you do that, you’ll interrupt tasks mid-way, lose work, and probably confuse users. Not fun.

Let’s break down the problem and see how to solve it in a way that’s simple for DevOps but also makes the most of Celery’s features.


Why is this tricky?

In a typical Django + Celery setup, you have something like:

services:
  web:
    image: myapp:latest
    command: gunicorn myproj.wsgi:application

  worker:
    image: myapp:latest
    command: celery -A myproj worker -l info

  beat:
    image: myapp:latest
    command: celery -A myproj beat -l info

  redis:
    image: redis:7

The worker service connects to Redis (or RabbitMQ) and pulls tasks off a queue. If you simply redeploy the worker container, Docker sends it a SIGTERM. Celery actually treats SIGTERM as a warm shutdown and tries to finish what it’s working on, but Docker only waits 10 seconds by default before following up with SIGKILL. If a task needs longer than that, it gets cut off halfway.

That’s the nightmare scenario: imagine a worker processing a payment, resizing a video, or sending out 5,000 emails… and suddenly poof it disappears.
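
To make that concrete: the rest of this post keeps coming back to a video-resize job, so picture something like this hypothetical task. It churns for several minutes and has no way to resume if its worker vanishes part-way through (the names and the ffmpeg command are purely illustrative):

# tasks.py -- a hypothetical long-running task (illustrative, not from a real project)
import subprocess

from celery import shared_task

@shared_task
def resize_video(source_path, target_path):
    # several minutes of CPU work; if the worker is killed part-way through,
    # the half-written output file is useless and the job is simply gone
    subprocess.run(
        ["ffmpeg", "-i", source_path, "-vf", "scale=1280:-2", target_path],
        check=True,
    )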


What we want

A sane deployment should let us:

  1. Start new workers (with the new version of the code).
  2. Let old workers finish what they’ve already started.
  3. Stop old workers once they’re done.
  4. Never lose a task.

All without adding Kubernetes, service meshes, or a PhD in DevOps. Just Docker Compose + Celery’s built-in powers.


Solution 1: Lean on Celery’s graceful shutdown

Good news: Celery already knows how to shut down gracefully. If you give it enough time, it will stop taking new tasks and finish the ones it’s working on.

In docker-compose.yml, add a long stop_grace_period:

worker:
  image: myapp:latest
  command: celery -A myproj worker -l info
  stop_grace_period: 1h   # give long tasks time to finish
  init: true              # tiny init system forwards signals properly

Now when you run:

docker compose up -d worker

Docker will stop the old container by sending SIGTERM. Celery sees that and says: “Okay, no new tasks, let me finish what I’ve got.”

If your tasks usually finish in a few minutes, set stop_grace_period: 10m. If they can run for an hour, give it longer. Simple.
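
If you want to see the warm shutdown in your logs, Celery fires a worker_shutting_down signal when it begins. A small, optional hook like this sketch (the log message is my own) makes it visible:

# put this somewhere that gets imported when the worker starts, e.g. next to your Celery app
import logging

from celery.signals import worker_shutting_down

logger = logging.getLogger(__name__)

@worker_shutting_down.connect
def log_warm_shutdown(sig=None, how=None, exitcode=None, **kwargs):
    # "how" is "Warm" for SIGTERM: no new tasks are picked up,
    # but tasks that are already running get to finish
    logger.warning("Worker shutting down (%s, signal=%s), finishing in-flight tasks", how, sig)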


Solution 2: Run new and old workers side by side

Sometimes you want to be extra safe: bring up new workers before touching the old ones. That way there’s always someone available to pick up tasks.

Example:

worker_v2:
  image: myapp:v2
  command: celery -A myproj worker -l info
  stop_grace_period: 1h

Then:

  1. docker compose up -d worker_v2 (new workers join the party).

  2. Tell old workers to stop taking tasks:

    docker compose exec worker celery -A myproj control cancel_consumer default
    
  3. Wait until they finish current tasks:

    docker compose exec worker celery -A myproj inspect active
    
  4. Shut them down:

    docker compose stop worker
    docker compose rm worker
    
  5. Rename worker_v2 back to worker in your Compose file.

You’ve now done a zero-downtime worker rollout.
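
Once you’ve done this a couple of times, the steps are easy to wrap in a script. Here’s a rough bash sketch using the service and queue names from this article; the “wait” step is left manual because the exact output of inspect active varies between Celery versions:

#!/usr/bin/env bash
# deploy_workers.sh -- rough sketch of the rollout above; adjust names to your project
set -euo pipefail

docker compose up -d worker_v2                                                # 1. new workers join
docker compose exec worker celery -A myproj control cancel_consumer default  # 2. old workers stop taking tasks

# 3. check what the old workers are still busy with; re-run until nothing is active
docker compose exec worker celery -A myproj inspect active
read -rp "No active tasks left on the old workers? Press Enter to retire them... "

docker compose stop worker   # 4. graceful stop (honours stop_grace_period)
docker compose rm -f worker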


Solution 3: Versioned queues (for breaking changes)

What if your new release changes the shape of tasks or the database schema? Old workers wouldn’t even know how to handle those tasks.

This is where versioned queues shine. Example:

  • Old release writes to default.v1.
  • New release writes to default.v2.
  • Old workers keep listening on .v1 until empty, then shut down.
  • New workers only listen on .v2.

This way you never have a code mismatch between producer and consumer.
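
Celery has no built-in notion of “queue versions”, but plain configuration gets you there. A minimal sketch, assuming each image bakes in a TASK_QUEUE_VERSION environment variable (a name I’m making up for illustration):

# celery.py -- versioned default queue (sketch; TASK_QUEUE_VERSION is an invented env var)
import os

from celery import Celery

app = Celery("myproj")

# the v1 image ships with TASK_QUEUE_VERSION=v1, the v2 image with TASK_QUEUE_VERSION=v2
queue_version = os.environ.get("TASK_QUEUE_VERSION", "v1")

# producers (the web container) publish to the versioned queue...
app.conf.task_default_queue = f"default.{queue_version}"

# ...and workers consume from it, either implicitly (no -Q means "use the default queue")
# or explicitly:  celery -A myproj worker -Q default.v2 -l info

Retiring the old workers then works exactly like Solution 2: wait until inspect active (and the .v1 queue) are empty, then stop them.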


A quick example

Let’s say you deploy a new version while a worker is resizing a video.

Bad approach

docker compose up -d

The worker gets killed → video job lost. User uploads again → angry email.

Good approach

docker compose up -d worker_v2
docker compose exec worker celery -A myproj control cancel_consumer default
docker compose exec worker celery -A myproj inspect active
# wait until nothing’s active
docker compose stop worker

Old worker finishes the video resize. New worker is already online. No user even noticed.


Tips to make this smoother

  • Use worker_prefetch_multiplier = 1 in Celery so workers don’t “hog” a pile of reserved tasks during shutdown.
  • Use acks_late = True so tasks are only marked “done” after the worker actually finishes them.
  • For very long jobs, add soft_time_limit so they don’t hang forever (these three settings are sketched in the config below).
  • If you run beat, make sure you don’t have two beat instances running at the same time: they can duplicate schedules.
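
In Celery’s new-style configuration, the first three tips look roughly like this (values are examples; acks_late and soft_time_limit can also be set per task in the @task decorator):

# celery.py -- the shutdown-related settings from the tips above (example values)
from celery import Celery

app = Celery("myproj")

app.conf.worker_prefetch_multiplier = 1  # don't reserve a pile of tasks ahead of time
app.conf.task_acks_late = True           # ack only after the task finishes, so an interrupted task is redelivered
app.conf.task_soft_time_limit = 3600     # raise SoftTimeLimitExceeded after an hour so nothing hangs forever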

Wrapping up

You don’t need Kubernetes or fancy blue-green deploy systems to safely redeploy Celery. Just:

  • Start new workers.
  • Stop old ones gracefully.
  • Let Celery do the heavy lifting.

This keeps your DevOps simple, avoids interrupted jobs, and makes deployments boring, which is exactly what you want.