Executing code in the background is a very common thing in our industry.
This opens up a lot of options for how to do it, and we’ve historically used Celery, which lets you achieve this via Celery tasks.
One thing we also leverage from this framework is the ability to run these tasks at a specific time, or at least not before it.
This is done by setting an ETA, and it’s something we rely on a lot.
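For illustration, here’s a minimal sketch of what enqueuing a task with an ETA looks like using Celery’s standard `apply_async` API (the task and broker URL are made up):

```python
from datetime import datetime, timedelta, timezone

from celery import Celery

# Broker URL and task are purely illustrative.
app = Celery("tasks", broker="amqp://guest:guest@localhost//")


@app.task
def send_reminder(user_id):
    print(f"Reminding user {user_id}")


# Publish the task now, but don't execute it before the given point in time.
send_reminder.apply_async(args=[42], eta=datetime.now(timezone.utc) + timedelta(hours=1))
```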
Celery requires you to choose a broker to publish tasks to, and we have been using RabbitMQ v3.13.x, which we needed to upgrade to v4.x.
This upgrade came with the removal of two critical features:
- The removal of global QoS meant that tasks with ETAs would now block Celery workers.
- The removal of classic queue mirroring meant that we had to switch to quorum queues if we ever wanted to keep using RabbitMQ’s queue replication.
And since RabbitMQ doesn’t allow you to change a queue’s definition “in-place”, we had to transfer all the messages from each classic queue to a new quorum queue.
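To make that concrete, here’s a rough pika sketch of declaring a quorum queue (the queue name and connection details are illustrative). The queue type is fixed at declaration time via the x-queue-type argument, which is why an existing classic queue can’t simply be re-declared as a quorum one:

```python
import pika

# Connection details and queue name are illustrative.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# The queue type is baked in when the queue is declared; re-declaring an
# existing classic queue with a different x-queue-type is rejected by the
# broker, so the messages have to be moved to a brand new quorum queue.
channel.queue_declare(
    queue="celery-quorum",
    durable=True,
    arguments={"x-queue-type": "quorum"},
)
```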
To give you an idea of the scale, some environments can peak at 8M messages per day.
However, transferring these messages wasn’t as straightforward as a copy-paste operation, because:
- Celery behaves differently when enqueuing tasks with ETAs on quorum queues, as it uses what’s called Native Delayed Delivery (see the configuration sketch after this list).
- We had to do this with zero downtime per our SLAs.
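As a hedged sketch of what the Celery side can look like (assuming Celery 5.5+, where quorum queue support and Native Delayed Delivery landed; the exact behaviour should be checked against the Celery docs for your version), the quorum queue type can be set through the queue’s arguments:

```python
from celery import Celery
from kombu import Exchange, Queue

# Broker URL, vhost and queue names are illustrative.
app = Celery("tasks", broker="amqp://guest:guest@localhost/quorum-vhost")

app.conf.task_queues = [
    # Declare the task queue as a quorum queue via its queue arguments.
    Queue(
        "celery",
        Exchange("celery"),
        routing_key="celery",
        queue_arguments={"x-queue-type": "quorum"},
    ),
]

# When recent Celery versions detect quorum queues, ETA tasks are routed
# through a delayed-delivery exchange topology on the broker instead of
# being held in worker memory, which is what changes how ETA messages look.
```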
Considering all of this, we had to come up with a migration strategy that would allow us to move forward cautiously.
In a nutshell, this involved:
- Creating another virtual host (vhost) that would host the new quorum queues
- Choosing migration windows per environment, when incoming message traffic would be at its lowest
- Transferring all the messages from the classic queues to their “sister” quorum queues in the new vhost
- Applying a special transformation to tasks that had ETAs (a rough sketch of this transfer step follows below)
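To make the last two steps more concrete, here’s a rough sketch (using pika) of draining a classic queue and re-publishing its messages to the quorum counterpart. The vhost and queue names are made up, and transform_eta_message is a hypothetical placeholder for the real ETA transformation described in the full write-up:

```python
import pika

OLD_VHOST, NEW_VHOST = "celery", "celery-quorum"  # hypothetical vhost names
QUEUE = "celery"                                   # hypothetical queue name


def transform_eta_message(props, queue):
    # Hypothetical placeholder: the real migration rewrites the message so the
    # quorum side's Native Delayed Delivery honours the remaining ETA. Here we
    # pass it through unchanged just to keep the sketch runnable.
    return "", queue, props


src_conn = pika.BlockingConnection(pika.ConnectionParameters(virtual_host=OLD_VHOST))
dst_conn = pika.BlockingConnection(pika.ConnectionParameters(virtual_host=NEW_VHOST))
src, dst = src_conn.channel(), dst_conn.channel()

# The "sister" queue in the new vhost has to be a quorum queue from the start.
dst.queue_declare(queue=QUEUE, durable=True, arguments={"x-queue-type": "quorum"})

while True:
    method, props, body = src.basic_get(queue=QUEUE, auto_ack=False)
    if method is None:
        break  # classic queue drained

    headers = (props.headers or {}) if props else {}
    if headers.get("eta"):
        # With Celery's v2 task protocol the ETA lives in the message headers;
        # such tasks need the extra transformation before re-publishing.
        exchange, routing_key, props = transform_eta_message(props, QUEUE)
    else:
        exchange, routing_key = "", QUEUE  # default exchange -> same queue name

    dst.basic_publish(exchange=exchange, routing_key=routing_key, body=body, properties=props)
    src.basic_ack(delivery_tag=method.delivery_tag)
```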
I’ve written a detailed technical breakdown on my blog, covering the full migration strategy, code examples, and lessons learned.
There was a lot of trial and error, of course, but after the lessons learned during the testing phase we were able to safely and seamlessly migrate environments at a rate of ~340k messages per minute.