distinct(*fields)
Returns a new QuerySet
that uses SELECT DISTINCT
in its SQL query. This eliminates duplicate rows from the query results.
By default, a QuerySet
will not eliminate duplicate rows. In practice, this is rarely a problem, because simple queries such as Blog.objects.all()
don’t introduce the possibility of duplicate result rows. However, if your query spans multiple tables, it’s possible to get duplicate results when a QuerySet
is evaluated. That’s when you’d use distinct()
.
Note
Any fields used in an order_by()
call are included in the SQL SELECT
columns. This can sometimes lead to unexpected results when used in conjunction with distinct()
. If you order by fields from a related model, those fields will be added to the selected columns and they may make otherwise duplicate rows appear to be distinct. Since the extra columns don’t appear in the returned results (they are only there to support ordering), it sometimes looks like non-distinct results are being returned.
Similarly, if you use a values()
query to restrict the columns selected, the columns used in any order_by()
(or default model ordering) will still be involved and may affect uniqueness of the results.
The moral here is that if you are using distinct()
be careful about ordering by related models. Similarly, when using distinct()
and values()
together, be careful when ordering by fields not in the values()
call.
On PostgreSQL only, you can pass positional arguments (*fields
) in order to specify the names of fields to which the DISTINCT
should apply. This translates to a SELECT DISTINCT ON
SQL query. Here’s the difference. For a normal distinct()
call, the database compares each field in each row when determining which rows are distinct. For a distinct()
call with specified field names, the database will only compare the specified field names.
Note
When you specify field names, you must provide an order_by()
in the QuerySet
, and the fields in order_by()
must start with the fields in distinct()
, in the same order.
For example, SELECT DISTINCT ON (a)
gives you the first row for each value in column a
. If you don’t specify an order, you’ll get some arbitrary row.
Examples (those after the first will only work on PostgreSQL):
>>> Author.objects.distinct() [...] >>> Entry.objects.order_by('pub_date').distinct('pub_date') [...] >>> Entry.objects.order_by('blog').distinct('blog') [...] >>> Entry.objects.order_by('author', 'pub_date').distinct('author', 'pub_date') [...] >>> Entry.objects.order_by('blog__name', 'mod_date').distinct('blog__name', 'mod_date') [...] >>> Entry.objects.order_by('author', 'pub_date').distinct('author') [...]
Note
Keep in mind that order_by()
uses any default related model ordering that has been defined. You might have to explicitly order by the relation _id
or referenced field to make sure the DISTINCT ON
expressions match those at the beginning of the ORDER BY
clause. For example, if the Blog
model defined an ordering
by name
:
Entry.objects.order_by('blog').distinct('blog')
...wouldn’t work because the query would be ordered by blog__name
thus mismatching the DISTINCT ON
expression. You’d have to explicitly order by the relation _id
field (blog_id
in this case) or the referenced one (blog__pk
) to make sure both expressions match.
Please login to continue.