In this article, we will understand what asynchronous processing is, how to set it up for a Django project, and what its common use cases are.
Let's start with the problem statement. Say you have a website and, on signup, you want to send an email to the new customer. Does not seem like a problem, right? The developer decides to insert the logic directly in the signup flow, along the following lines.
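A minimal sketch of that naive approach, in plain Python standing in for the Django view (the function names and the `time.sleep` standing in for the SMTP round-trip are illustrative, not from a specific codebase):

```python
import time

def send_welcome_email(address):
    """Stand-in for a real SMTP call; the sleep simulates network latency."""
    time.sleep(0.01)
    return f"sent to {address}"

def signup(email):
    """Signup handler: the user is created, then the email is sent inline,
    so the caller waits for the email round-trip before getting a response."""
    user = {"email": email}            # pretend the user row was saved here
    send_welcome_email(user["email"])  # blocking call inside the request
    return {"status": "ok", "user": user}
```

The key point is that `send_welcome_email` runs inside the request/response cycle, so its latency is added directly to the signup API's latency.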
At first glance this looks perfect: the developer pushes the code, deploys it, and everyone is happy. After the latest change, they notice that signup API latency has gone up. Still, the latency is not bad enough for anyone to bother. Days later, management wants signup to send a text message and a notification in addition to the email, so the developer refactors the code again as per the requirement.
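The refactored flow might look like the following sketch (again plain Python standing in for the view; the names and sleep durations are illustrative). Each channel is still called synchronously, so their latencies add up:

```python
import time

def send_email(address):
    time.sleep(0.01)  # simulated SMTP round-trip

def send_sms(phone):
    time.sleep(0.01)  # simulated SMS gateway call

def send_push_notification(device):
    time.sleep(0.01)  # simulated push-provider call

def signup(email, phone, device):
    user = {"email": email, "phone": phone, "device": device}  # pretend row saved
    # All three calls block the request; the response waits for their sum.
    send_email(user["email"])
    send_sms(user["phone"])
    send_push_notification(user["device"])
    return {"status": "ok"}
```

With real providers, each of those calls can take hundreds of milliseconds to seconds, which is exactly the 2-3 second regression described next.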
Again, the change looks fine. However, this time the latency of the signup API has gone up by 2-3 seconds, and drop-offs have increased. Can you help our developer here?
Well, this is a good place to introduce asynchronous tasks. Asynchronous events are those that occur independently of the main program flow. If we use Celery for asynchronous processing in our use case, the Celery processes will run alongside the main process, which is the Django process.
Celery is a distributed asynchronous task queue/job system to process messages or events. It’s a task queue with a focus on real-time processing, while also supporting task scheduling.
Let's see pseudo-code of how we refactor our code to use asynchronous processing.
send_communication is now asynchronous, and we will see the API latency go back down. But what happened under the hood? How did the latency drop? And how do we know whether those communications are actually reaching users?
Now that we have some context about asynchronous event handling, let's see different components of celery.
- Producer: Application that sends the messages.
- Consumer: Application that receives the messages and processes them.
- Message Queue/Message Broker: Often used interchangeably, both are message buffers. The queue/broker holds messages until they are picked up by the asynchronous processor. In the above example, we want the asynchronous handler to send communications to end users, and that information is stored in the queue/broker. Anything that can hold data can act as a broker; databases, for instance, are potential brokers. Redis is often used as a broker because of its fast operations. There is also software built specifically to act as a queue; RabbitMQ is one such example. At its core, RabbitMQ is just a buffer of messages with some additional queue-specific functionality.
- Task (Celery): Tasks are the unit of execution in Celery. A task is nothing but a function that will be processed asynchronously; in our example, send_communication is a task.
- Workers (Celery): Tasks are handled by workers. Workers listen to particular queues, pick up tasks to be processed, and execute them.
- Scheduled/Periodic Tasks (Celery): Tasks can be pre-scheduled just like cron jobs. For example, if we want to send an email report at the end of each day from Monday to Friday, a periodic task can do that for us.
As we can see in the architecture, producers dump events/messages into the queue, which can be RabbitMQ/Redis. When a task is written to the queue, it is assigned a task_id, which can be used to find out the status of the task. Note that tasks are picked up as per the availability of Celery workers, i.e. if workers are busy processing a big task (say, sending 1 million emails), other tasks can wait in the queue for a long time. Therefore, it is not possible to guarantee when a particular task will be picked up by Celery. However, a task can be delayed by a particular time: in Celery, a minimum time after which a task will be picked up can be specified with the countdown (or eta) argument of apply_async.