db.models.query.QuerySet.distinct()

distinct(*fields)

Returns a new QuerySet that uses SELECT DISTINCT in its SQL query. This eliminates duplicate rows from the query results.

By default, a QuerySet will not eliminate duplicate rows. In practice, this is rarely a problem, because simple queries such as Blog.objects.all() don’t introduce the possibility of duplicate result rows. However, if your query spans multiple tables, it’s possible to get duplicate results when a QuerySet is evaluated. That’s when you’d use distinct().

Note

Any fields used in an order_by() call are included in the SQL SELECT columns. This can sometimes lead to unexpected results when used in conjunction with distinct(). If you order by fields from a related model, those fields will be added to the selected columns and they may make otherwise duplicate rows appear to be distinct. Since the extra columns don’t appear in the returned results (they are only there to support ordering), it sometimes looks like non-distinct results are being returned.

Similarly, if you use a values() query to restrict the columns selected, the columns used in any order_by() (or default model ordering) will still be involved and may affect uniqueness of the results.

The moral here is that if you are using distinct() be careful about ordering by related models. Similarly, when using distinct() and values() together, be careful when ordering by fields not in the values() call.

On PostgreSQL only, you can pass positional arguments (*fields) in order to specify the names of fields to which the DISTINCT should apply. This translates to a SELECT DISTINCT ON SQL query. Here’s the difference. For a normal distinct() call, the database compares each field in each row when determining which rows are distinct. For a distinct() call with specified field names, the database will only compare the specified field names.

Note

When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.

For example, SELECT DISTINCT ON (a) gives you the first row for each value in column a. If you don’t specify an order, you’ll get some arbitrary row.

Examples (those after the first will only work on PostgreSQL):

>>> Author.objects.distinct()
[...]

>>> Entry.objects.order_by('pub_date').distinct('pub_date')
[...]

>>> Entry.objects.order_by('blog').distinct('blog')
[...]

>>> Entry.objects.order_by('author', 'pub_date').distinct('author', 'pub_date')
[...]

>>> Entry.objects.order_by('blog__name', 'mod_date').distinct('blog__name', 'mod_date')
[...]

>>> Entry.objects.order_by('author', 'pub_date').distinct('author')
[...]

Note

Keep in mind that order_by() uses any default related model ordering that has been defined. You might have to explicitly order by the relation _id or referenced field to make sure the DISTINCT ON expressions match those at the beginning of the ORDER BY clause. For example, if the Blog model defined an ordering by name:

Entry.objects.order_by('blog').distinct('blog')

...wouldn’t work because the query would be ordered by blog__name thus mismatching the DISTINCT ON expression. You’d have to explicitly order by the relation _id field (blog_id in this case) or the referenced one (blog__pk) to make sure both expressions match.

Links:

https://docs.djangoproject.com/en/1.10/ref/models/querysets/#django.db.models.query.QuerySet.distinct

doc_Django

2025-01-10 15:47:30

Comments