select_related and prefetch_related are ways to optimize database querying patterns when accessing related items. Both works on the same principle of prefetching data from the database, however, they take different approaches to do so.

We will be using a decorator to print all the stats for the query.

from django.db import connection
from django.db import reset_queries


def database_debug(func):
    def inner_func(*args, **kwargs):
        reset_queries()
        results = func()
        query_info = connection.queries
        print('function_name: {}'.format(func.__name__))
        print('query_count: {}'.format(len(query_info)))
        queries = ['{}\n'.format(query['sql']) for query in query_info]
        print('queries: \n{}'.format(''.join(queries)))
        return results
    return inner_func

Let's define our models and have some related entries. We have quizzes, which have multiple questions and each question has multiple options.

from django.db import models


class Quiz(models.Model):
    name = models.CharField(max_length=10)
    start_time = models.DateTimeField()
    end_time = models.DateTimeField()


class Option(models.Model):
    content = models.TextField()
    is_correct = models.BooleanField()


class Question(models.Model):
    OBJECTIVE = 'OBJ'
    SUBJECTIVE = 'SBJ'
    QUESTION_TYPES = (
        (OBJECTIVE, 'Objective'),
        (SUBJECTIVE, 'Subjective'),
    )
    content = models.TextField()
    quiz = models.ForeignKey(Quiz, related_name='questions', on_delete=models.CASCADE)
    question_type = models.CharField(max_length=3, choices=QUESTION_TYPES, default=OBJECTIVE)
    options = models.ManyToManyField(Option, related_name='options')

Case I: Without Any Prefetching.

We will simply try to get the quiz's name from questions. Please note that we are not using any prefetching.

@database_debug
def without_prefetching():
    questions = Question.objects.all()
    return [question.quiz.name for question in questions]
function_name: without_prefetching
query_count: 9
queries: 
SELECT `common_question`.`id`, `common_question`.`created_at`, `common_question`.`updated_at`, `common_question`.`content`, `common_question`.`quiz_id` FROM `common_question`
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz` WHERE `common_quiz`.`id` = 1 LIMIT 21
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz` WHERE `common_quiz`.`id` = 1 LIMIT 21
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz` WHERE `common_quiz`.`id` = 1 LIMIT 21
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz` WHERE `common_quiz`.`id` = 1 LIMIT 21
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz` WHERE `common_quiz`.`id` = 2 LIMIT 21
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz` WHERE `common_quiz`.`id` = 2 LIMIT 21
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz` WHERE `common_quiz`.`id` = 2 LIMIT 21
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz` WHERE `common_quiz`.`id` = 2 LIMIT 21

We can see that it does in a total of 9 queries. 1st query to get all the questions and 8 queries to get the quiz data for each question, also, we can see that there are 4 queries for each quizId which is repeated. So, we have a scope of optimization.

Case II: With select_related

We will try to get the same quizzes name from questions, let's do the same but use select_related.

@database_debug
def with_select_related():
    questions = Question.objects.select_related('quiz').all()
    return [question.quiz.name for question in questions]
function_name: with_select_related
query_count: 1
queries: 
SELECT `common_question`.`id`, `common_question`.`created_at`, `common_question`.`updated_at`, `common_question`.`content`, `common_question`.`quiz_id`, `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_question` INNER JOIN `common_quiz` ON (`common_question`.`quiz_id` = `common_quiz`.`id`)

Now, we see that query_count has reduced to 1 from 9, while checking the query we can see that INNER JOIN is being used to get all the questions and quizzes data at once.

Case III: With prefetch_related

The same query again and this time let's use prefetch_related.

@database_debug
def with_prefetch_related():
    questions = Question.objects.prefetch_related('quiz').all()
    return [question.quiz.name for question in questions]
function_name: with_prefetch_related
query_count: 2
queries: 
SELECT `common_question`.`id`, `common_question`.`created_at`, `common_question`.`updated_at`, `common_question`.`content`, `common_question`.`quiz_id` FROM `common_question`
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz` WHERE `common_quiz`.`id` IN (1, 2)

Let's see, what's happening here, the query count is now 2, 1st query is to get all the questions and 2nd query is to get the quiz data for all the quiz_ids. We have often read that in prefetch_realted join happens in python, which is happening here, all distinct quiz_ids are collected and then called at once.

Let's see a more advanced example.

@database_debug
def with_prefetch_related_advanced():
    quizzes = Quiz.objects.prefetch_related('questions')
    objective_questions_count = {}
    for quiz in quizzes:
        objective_questions_count[quiz.name] = quiz.questions.filter(question_type=Question.OBJECTIVE).count()
    return objective_questions_count
function_name: with_prefetch_related_advanced
query_count: 4
queries: 
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz`
SELECT `common_question`.`id`, `common_question`.`created_at`, `common_question`.`updated_at`, `common_question`.`content`, `common_question`.`quiz_id`, `common_question`.`question_type` FROM `common_question` WHERE `common_question`.`quiz_id` IN (1, 2)
SELECT COUNT(*) AS `__count` FROM `common_question` WHERE (`common_question`.`quiz_id` = 1 AND `common_question`.`question_type` = 'OBJ')
SELECT COUNT(*) AS `__count` FROM `common_question` WHERE (`common_question`.`quiz_id` = 2 AND `common_question`.`question_type` = 'OBJ')

We want to get the question's count for OBJECTIVE question_type in the above case, while we have already prefetched questions for all the quizzes. We can see that 4 queries were fired, 1 to get all the quizzes, 1 to get all the quizzes for all the quizzes. and 2 more for each quiz to get the count the objective type. The issue here is that prefetched has already happened and at a later point, we want to get only the objective types question count, however, this is not optimized.

Let's use Prefetch to fix the above issue.

from django.db.models import Prefetch


@database_debug
def with_prefetch_related_advanced_corrected():
    quizzes = Quiz.objects.prefetch_related(
        Prefetch('questions', queryset=Question.objects.filter(question_type=Question.OBJECTIVE))
    )
    objective_questions_count = {}
    for quiz in quizzes:
        objective_questions_count[quiz.name] = quiz.questions.count()
    return objective_questions_count
function_name: with_prefetch_related_advanced_corrected
query_count: 2
queries: 
SELECT `common_quiz`.`id`, `common_quiz`.`created_at`, `common_quiz`.`updated_at`, `common_quiz`.`name`, `common_quiz`.`start_time`, `common_quiz`.`end_time` FROM `common_quiz`
SELECT `common_question`.`id`, `common_question`.`created_at`, `common_question`.`updated_at`, `common_question`.`content`, `common_question`.`quiz_id`, `common_question`.`question_type` FROM `common_question` WHERE (`common_question`.`question_type` = 'OBJ' AND `common_question`.`quiz_id` IN (1, 2))

In the above case, only 2 queries are fired, 1st one to get all the quizzes and 2nd one to get all the questions of OBJECTIVE type.

And there we have perfect optimized queries using select_related and prefetch_related.

Conclusion:

  • select_related uses native database joins, whereas in the case of prefetch_related additional queries are fired later and can be therefore said that joining happens in python rather than the database.
  • Since select_related uses database joins Django allows this only in case of OneToOneField or ForeignKeys , but, this restriction is not present in the case of prefetch_related. Therefore, all selected_related can be technically replaced with prefetch_related, however, another way round is not possible.
  • Always use Prefetch() while prefetching, if we knew the filters we are going to apply on the prefetched rows for further optimization.

QuerySet API reference | Django documentation | Django