When I get API with ordering by product__company__name, the API responds with the result as expected, but with 2 records duplicated. I read the Note in this docs https://docs.djangoproject.com/en/3.2/ref/models/querysets/#distinct that If you order by fields from a related model, those fields will be added to the selected columns and they may make otherwise duplicate rows appear to be distinct. Since the extra columns don’t appear in the returned results (they are only there to support ordering), it sometimes looks like non-distinct results are being returned.
So how can I resolve this problems ? I just want ordering product base on company name.
My code example:
It might be helpful if you posted the models involved.
What database engine are you using?
Are you needing all the fields in MyModel? Or are you just working with some of them in this situation?
Depending on these factors, you may need to perform the “deduplication” of data yourself within your code - the ORM may not have a facility available to limit this directly in a query.
Since you’re using PostgreSQL, you might be able to use the distinct function with specifying the fields that you want to use. For example, if you just want distinct entries of DefaultShelf, you may be able to use .distinct('id'). (I’m not sure if that’s going to work if you’re not including it in the ordering of those rows. If it doesn’t work, then it may be up to you do manage this in code.)
I can’t use distinct(‘id’) because SELECT DISTINCT ON expressions must match initial ORDER BY expressions from docs: QuerySet API reference | Django documentation | Django. I also tried with qs.distinct("product__company__name").order_by("product__company__name") and the result still has the duplicate record.
I was thinking of approaching this from the other direction - keep your query as you have it written, then perform the “deduplication” on the resulting queryset.
Thank you so much. Could you explain more about this? I really don’t know a way to resolve deduplication except to use distinct(). In my case, the respond API still count=2 as me as expected, but with 3 records in the result with 2 records duplicated.
While the specifics are going to depend upon precisely what you’re doing in the query and in the view, the general idea is that you write code to examine the queryset to determine which elements of the queryset are duplicates of other elements. There are a couple different ways to do this - you can either examine the queryset before you iterate over it for your template, or you write code that tracks which rows you are processing in your template, and skip any that you have already seen.
If you need more information, a web search for “removing duplicates from a list in python” (or some such similar phrasing) should help you locate any number of examples.