db.models.query.QuerySet.prefetch_related()

prefetch_related(*lookups)

Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.

This has a similar purpose to select_related, in that both are designed to stop the deluge of database queries that is caused by accessing related objects, but the strategy is quite different.

select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.

prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey; however, it must be restricted to a homogeneous set of results. For example, prefetching objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.

For example, suppose you have these models:

from django.db import models

class Topping(models.Model):
    name = models.CharField(max_length=30)

class Pizza(models.Model):
    name = models.CharField(max_length=50)
    toppings = models.ManyToManyField(Topping)

    def __str__(self):              # __unicode__ on Python 2
        return "%s (%s)" % (
            self.name,
            ", ".join(topping.name for topping in self.toppings.all()),
        )

and run:

>>> Pizza.objects.all()
["Hawaiian (ham, pineapple)", "Seafood (prawns, smoked salmon)"...

The problem with this is that every time Pizza.__str__() asks for self.toppings.all() it has to query the database, so Pizza.objects.all() will run a query on the Topping table for every item in the Pizza QuerySet.

We can reduce to just two queries using prefetch_related:

>>> Pizza.objects.all().prefetch_related('toppings')

This implies a self.toppings.all() for each Pizza; now each time self.toppings.all() is called, instead of having to go to the database for the items, it will find them in a prefetched QuerySet cache that was populated in a single query.

That is, all the relevant toppings will have been fetched in a single query, and used to make QuerySets that have a pre-filled cache of the relevant results; these QuerySets are then used in the self.toppings.all() calls.
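
The two-query strategy described above can be illustrated outside Django with the standard-library sqlite3 module. This is only a sketch of the idea, not Django's implementation: the table names (pizza, topping, pizza_toppings) and the helper fetch_pizzas_with_toppings are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema standing in for the Pizza and Topping models above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pizza (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE topping (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pizza_toppings (pizza_id INTEGER, topping_id INTEGER);
    INSERT INTO pizza VALUES (1, 'Hawaiian'), (2, 'Seafood');
    INSERT INTO topping VALUES (1, 'ham'), (2, 'pineapple'), (3, 'prawns');
    INSERT INTO pizza_toppings VALUES (1, 1), (1, 2), (2, 3);
""")


def fetch_pizzas_with_toppings(conn):
    """Return ({pizza name: [topping names]}, number of queries issued)."""
    queries = 0

    # Query 1: the primary query.
    pizzas = conn.execute("SELECT id, name FROM pizza ORDER BY id").fetchall()
    queries += 1

    toppings_by_pizza = defaultdict(list)
    if pizzas:
        # Query 2: one lookup for all related rows, using an IN clause.
        placeholders = ", ".join("?" for _ in pizzas)
        rows = conn.execute(
            "SELECT pt.pizza_id, t.name FROM topping t "
            "JOIN pizza_toppings pt ON pt.topping_id = t.id "
            "WHERE pt.pizza_id IN (%s) ORDER BY t.id" % placeholders,
            [pizza_id for pizza_id, _ in pizzas],
        ).fetchall()
        queries += 1

        # The 'join' between pizzas and toppings happens here, in Python.
        for pizza_id, topping_name in rows:
            toppings_by_pizza[pizza_id].append(topping_name)

    menu = {name: toppings_by_pizza[pizza_id] for pizza_id, name in pizzas}
    return menu, queries


menu, query_count = fetch_pizzas_with_toppings(conn)
print(menu, query_count)
```

The query count stays at two no matter how many pizzas there are, whereas fetching the toppings inside a per-pizza loop would add one query per pizza.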

The additional queries in prefetch_related() are executed after the QuerySet has begun to be evaluated and the primary query has been executed.

If you have an iterable of model instances, you can prefetch related attributes on those instances using the prefetch_related_objects() function.

Note that the result cache of the primary QuerySet and all specified related objects will then be fully loaded into memory. This changes the typical behavior of QuerySets, which normally try to avoid loading all objects into memory before they are needed, even after a query has been executed in the database.

Note

Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query. So, if you write the following:

>>> pizzas = Pizza.objects.prefetch_related('toppings')
>>> [list(pizza.toppings.filter(spicy=True)) for pizza in pizzas]

...then the fact that pizza.toppings.all() has been prefetched will not help you. The prefetch_related('toppings') implied pizza.toppings.all(), but pizza.toppings.filter() is a new and different query. The prefetched cache can’t help here; in fact it hurts performance, since you have done a database query that you haven’t used. So use this feature with caution!
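
The difference between all() and filter() here can be modelled with a toy class. RelatedManagerSketch and matches are hypothetical names, not Django APIs; the sketch only assumes that all() is served from the pre-filled cache while filter() always counts as a new query.

```python
def matches(row, conditions):
    """Return True if every key/value pair in conditions holds for row."""
    return all(row.get(key) == value for key, value in conditions.items())


class RelatedManagerSketch:
    """Toy stand-in for a related manager; not Django's implementation."""

    def __init__(self, rows):
        self._rows = rows          # pretend these rows live in the database
        self._prefetched = None    # filled in by prefetch()
        self.query_count = 0

    def _run_query(self, predicate=None):
        # Every call here represents one round trip to the database.
        self.query_count += 1
        return [row for row in self._rows if predicate is None or predicate(row)]

    def prefetch(self):
        self._prefetched = self._run_query()

    def all(self):
        # Served from the cache when a prefetch has already happened.
        if self._prefetched is not None:
            return list(self._prefetched)
        return self._run_query()

    def filter(self, **conditions):
        # A filtered lookup is a different query, so the cache is bypassed.
        return self._run_query(lambda row: matches(row, conditions))


toppings = RelatedManagerSketch([
    {"name": "ham", "spicy": False},
    {"name": "jalapeno", "spicy": True},
])
toppings.prefetch()                              # query 1
cached = toppings.all()                          # no new query
spicy_from_db = toppings.filter(spicy=True)      # query 2
spicy_in_python = [t for t in toppings.all() if t["spicy"]]  # no new query
```

In this model, filtering the cached list in Python gives the same rows as filter() without a further query; whether that trade-off is worthwhile in a real project depends on how many related objects are loaded.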

You can also use the normal join syntax to prefetch related fields of related fields. Suppose we have an additional model to the example above:

class Restaurant(models.Model):
    pizzas = models.ManyToManyField(Pizza, related_name='restaurants')
    best_pizza = models.ForeignKey(Pizza, related_name='championed_by', on_delete=models.CASCADE)

The following are all legal:

>>> Restaurant.objects.prefetch_related('pizzas__toppings')

This will prefetch all pizzas belonging to restaurants, and all toppings belonging to those pizzas. This will result in a total of 3 database queries - one for the restaurants, one for the pizzas, and one for the toppings.

>>> Restaurant.objects.prefetch_related('best_pizza__toppings')

This will fetch the best pizza and all the toppings for the best pizza for each restaurant. This will be done in 3 database queries - one for the restaurants, one for the ‘best pizzas’, and one for the toppings.

Of course, the best_pizza relationship could also be fetched using select_related to reduce the query count to 2:

>>> Restaurant.objects.select_related('best_pizza').prefetch_related('best_pizza__toppings')

Since the prefetch is executed after the main query (which includes the joins needed by select_related), it is able to detect that the best_pizza objects have already been fetched, and it will skip fetching them again.

Chaining prefetch_related calls will accumulate the lookups that are prefetched. To clear any prefetch_related behavior, pass None as a parameter:

>>> non_prefetched = qs.prefetch_related(None)

One difference to note when using prefetch_related is that objects created by a query can be shared between the different objects that they are related to; that is, a single Python model instance can appear at more than one point in the tree of objects that are returned. This will normally happen with foreign key relationships. Typically this behavior will not be a problem, and will in fact save both memory and CPU time.
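
The sharing follows from how related objects are attached: each fetched row becomes one Python object, which is then referenced from every parent that points to it. A minimal illustration with plain classes (PizzaSketch and RestaurantSketch are hypothetical stand-ins, not Django models):

```python
class PizzaSketch:
    def __init__(self, pk, name):
        self.pk = pk
        self.name = name


class RestaurantSketch:
    def __init__(self, name, best_pizza_id):
        self.name = name
        self.best_pizza_id = best_pizza_id
        self.best_pizza = None


restaurants = [RestaurantSketch("Luigi's", 1), RestaurantSketch("Mario's", 1)]

# Result of one lookup: each primary key maps to exactly one Python object.
fetched = {1: PizzaSketch(1, "Hawaiian")}

# Attach the related object to every parent that references it.
for restaurant in restaurants:
    restaurant.best_pizza = fetched[restaurant.best_pizza_id]

# Both restaurants now hold the very same object, not two equal copies.
shared = restaurants[0].best_pizza is restaurants[1].best_pizza

# A change made through one parent is therefore visible through the other.
restaurants[0].best_pizza.name = "Hawaiian Deluxe"
```

The one case to watch for is in-place modification: because the instance is shared, an attribute changed via one parent appears changed via all of them.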

While prefetch_related supports prefetching GenericForeignKey relationships, the number of queries will depend on the data. Since a GenericForeignKey can reference data in multiple tables, one query per table referenced is needed, rather than one query for all the items. There could be additional queries on the ContentType table if the relevant rows have not already been fetched.
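
The data-dependent query count can be seen by grouping generic references by the table they point to. In this sketch the (content type, object id) pairs and the table names are made up for illustration; each distinct table would need its own lookup.

```python
from collections import defaultdict

# Hypothetical (content_type, object_id) pairs stored on the fetched rows.
references = [
    ("pizza", 1),
    ("restaurant", 7),
    ("pizza", 2),
    ("topping", 3),
    ("pizza", 1),
]

# Group the ids by target table; duplicates collapse because sets are used.
ids_by_table = defaultdict(set)
for table, object_id in references:
    ids_by_table[table].add(object_id)

# One lookup per distinct table, regardless of how many rows point at it.
queries_needed = len(ids_by_table)
print(queries_needed, dict(ids_by_table))
```

With the same number of rows but all references pointing at a single table, queries_needed would drop to one, which is why the count cannot be known from the lookup alone.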

prefetch_related in most cases will be implemented using an SQL query that uses the ‘IN’ operator. This means that for a large QuerySet a large ‘IN’ clause could be generated, which, depending on the database, might have performance problems of its own when it comes to parsing or executing the SQL query. Always profile for your use case!

Note that if you use iterator() to run the query, prefetch_related() calls will be ignored since these two optimizations do not make sense together.

You can use the Prefetch object to further control the prefetch operation.

In its simplest form Prefetch is equivalent to the traditional string based lookups:

>>> from django.db.models import Prefetch
>>> Restaurant.objects.prefetch_related(Prefetch('pizzas__toppings'))

You can provide a custom queryset with the optional queryset argument. This can be used to change the default ordering of the queryset:

>>> Restaurant.objects.prefetch_related(
...     Prefetch('pizzas__toppings', queryset=Topping.objects.order_by('name')))

Or to call select_related() when applicable to reduce the number of queries even further:

>>> Pizza.objects.prefetch_related(
...     Prefetch('restaurants', queryset=Restaurant.objects.select_related('best_pizza')))

You can also assign the prefetched result to a custom attribute with the optional to_attr argument. The result will be stored directly in a list.

This allows prefetching the same relation multiple times with a different QuerySet; for instance:

>>> vegetarian_pizzas = Pizza.objects.filter(vegetarian=True)
>>> Restaurant.objects.prefetch_related(
...     Prefetch('pizzas', to_attr='menu'),
...     Prefetch('pizzas', queryset=vegetarian_pizzas, to_attr='vegetarian_menu'))

Lookups created with custom to_attr can still be traversed as usual by other lookups:

>>> vegetarian_pizzas = Pizza.objects.filter(vegetarian=True)
>>> Restaurant.objects.prefetch_related(
...     Prefetch('pizzas', queryset=vegetarian_pizzas, to_attr='vegetarian_menu'),
...     'vegetarian_menu__toppings')

Using to_attr is recommended when filtering down the prefetch result as it is less ambiguous than storing a filtered result in the related manager’s cache:

>>> queryset = Pizza.objects.filter(vegetarian=True)
>>>
>>> # Recommended:
>>> restaurants = Restaurant.objects.prefetch_related(
...     Prefetch('pizzas', queryset=queryset, to_attr='vegetarian_pizzas'))
>>> vegetarian_pizzas = restaurants[0].vegetarian_pizzas
>>>
>>> # Not recommended:
>>> restaurants = Restaurant.objects.prefetch_related(
...     Prefetch('pizzas', queryset=queryset))
>>> vegetarian_pizzas = restaurants[0].pizzas.all()

Custom prefetching also works with single related relations like forward ForeignKey or OneToOneField. Generally you’ll want to use select_related() for these relations, but there are a number of cases where prefetching with a custom QuerySet is useful:

  • You want to use a QuerySet that performs further prefetching on related models.
  • You want to prefetch only a subset of the related objects.
  • You want to use performance optimization techniques like deferred fields:

    >>> queryset = Pizza.objects.only('name')
    >>>
    >>> restaurants = Restaurant.objects.prefetch_related(
    ...     Prefetch('best_pizza', queryset=queryset))
    

Note

The ordering of lookups matters.

Take the following examples:

>>> prefetch_related('pizzas__toppings', 'pizzas')

This works even though it’s unordered because 'pizzas__toppings' already contains all the needed information, therefore the second argument 'pizzas' is actually redundant.

>>> prefetch_related('pizzas__toppings', Prefetch('pizzas', queryset=Pizza.objects.all()))

This will raise a ValueError because of the attempt to redefine the queryset of a previously seen lookup. Note that an implicit queryset was created to traverse 'pizzas' as part of the 'pizzas__toppings' lookup.

>>> prefetch_related('pizza_list__toppings', Prefetch('pizzas', to_attr='pizza_list'))

This will trigger an AttributeError because 'pizza_list' doesn’t exist yet when 'pizza_list__toppings' is being processed.

This consideration is not limited to the use of Prefetch objects. Some advanced techniques may require that the lookups be performed in a specific order to avoid creating extra queries; therefore it’s recommended to always carefully order prefetch_related arguments.
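
The three cases above can be reproduced with a toy model of left-to-right lookup processing. This is a simplified sketch, not Django's code: check_lookup_order and outcome are hypothetical helpers, each lookup is reduced to a (path, queryset, to_attr) tuple, and only the first segment of a path is checked for existence.

```python
def check_lookup_order(lookups, relations):
    """Process lookups in order and raise the errors described above.

    lookups: list of (path, queryset, to_attr) tuples.
    relations: relation names that exist on the model from the start.
    """
    known = set(relations)   # names that can be traversed so far
    done = set()             # destination paths already prefetched
    for path, queryset, to_attr in lookups:
        parts = path.split("__")
        # With to_attr, the result is stored under that name instead.
        target_parts = parts[:-1] + [to_attr] if to_attr else parts
        target = "__".join(target_parts)

        if target in done:
            if queryset is not None:
                raise ValueError("%r was already seen with a different queryset" % target)
            continue  # redundant lookup, nothing to do

        if parts[0] not in known:
            raise AttributeError("cannot find %r when processing %r" % (parts[0], path))

        # Traversing a path implicitly prefetches every prefix of it.
        for i in range(1, len(target_parts) + 1):
            done.add("__".join(target_parts[:i]))
        if to_attr and len(parts) == 1:
            known.add(to_attr)
    return done


def outcome(lookups):
    """Return 'ok' or the name of the exception raised for these lookups."""
    try:
        check_lookup_order(lookups, relations={"pizzas", "best_pizza"})
    except (ValueError, AttributeError) as exc:
        return type(exc).__name__
    return "ok"


redundant = outcome([("pizzas__toppings", None, None), ("pizzas", None, None)])
redefined = outcome([("pizzas__toppings", None, None), ("pizzas", "custom", None)])
too_early = outcome([("pizza_list__toppings", None, None), ("pizzas", None, "pizza_list")])
reordered = outcome([("pizzas", None, "pizza_list"), ("pizza_list__toppings", None, None)])
print(redundant, redefined, too_early, reordered)
```

The last case shows the fix for the AttributeError example: placing the Prefetch that defines 'pizza_list' before the lookup that traverses it.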
