Although I know how this sounds (it reminds me of when I thought I found a bug in perl in 2000), I think I found a bug in Django… but before I go through the effort of filing a bug report, I thought I would ask here to see if perhaps this is not a bug…
I have a rather complex database and an advanced search interface that creates queries through numerous many-to-many tables and it’s been working great for over a year now.
I recently added a feature to count distinct related table record occurrences in the joined results but I ran into a bug of my own in my code where I wasn’t counting those M:M related table records completely. When I applied a fix to supply missing fields to .distinct()
, I couldn’t get the test to execute without hitting an InvalidColumnReference error. And though I’m supplying the same expanded dict to .order_by()
that I am to .distinct()
, the error claims that SELECT DISTINCT ON expressions must match initial ORDER BY expressions
…
When I print the SQL, the place where it notes a difference has a weird T8
reference where the model name should be, e.g. ..., "DataRepo_peakgroup"."id" ASC, T8."name" ASC, "DataRepo_compoundsynonym"."name" ASC, ...
. Note the **T8**."name"
.
The corresponding DISTINCT ON
has “DataRepo_compoundsynonym” instead of T8
.
Here’s the full SQL created:
SELECT DISTINCT ON ("DataRepo_peakgroup"."name", "DataRepo_peakgroup"."id", "DataRepo_compoundsynonym"."compound_id", "DataRepo_compoundsynonym"."name", "DataRepo_compoundsynonym"."name", "DataRepo_compound"."name", "DataRepo_peakgroup_compounds"."compound_id", "DataRepo_study"."name", "DataRepo_animal_studies"."study_id") "DataRepo_peakgroup"."id", "DataRepo_peakgroup"."name", "DataRepo_peakgroup"."formula", "DataRepo_peakgroup"."msrun_id", "DataRepo_peakgroup"."peak_group_set_id" FROM "DataRepo_peakgroup" INNER JOIN "DataRepo_msrun" ON ("DataRepo_peakgroup"."msrun_id" = "DataRepo_msrun"."id") INNER JOIN "DataRepo_sample" ON ("DataRepo_msrun"."sample_id" = "DataRepo_sample"."id") INNER JOIN "DataRepo_tissue" ON ("DataRepo_sample"."tissue_id" = "DataRepo_tissue"."id") LEFT OUTER JOIN "DataRepo_peakgroup_compounds" ON ("DataRepo_peakgroup"."id" = "DataRepo_peakgroup_compounds"."peakgroup_id") LEFT OUTER JOIN "DataRepo_compound" ON ("DataRepo_peakgroup_compounds"."compound_id" = "DataRepo_compound"."id") LEFT OUTER JOIN "DataRepo_compoundsynonym" ON ("DataRepo_compound"."id" = "DataRepo_compoundsynonym"."compound_id") LEFT OUTER JOIN "DataRepo_compound" T8 ON ("DataRepo_compoundsynonym"."compound_id" = T8."id") INNER JOIN "DataRepo_animal" ON ("DataRepo_sample"."animal_id" = "DataRepo_animal"."id") LEFT OUTER JOIN "DataRepo_animal_studies" ON ("DataRepo_animal"."id" = "DataRepo_animal_studies"."animal_id") LEFT OUTER JOIN "DataRepo_study" ON ("DataRepo_animal_studies"."study_id" = "DataRepo_study"."id") WHERE UPPER("DataRepo_tissue"."name"::text) = UPPER(Brain) ORDER BY "DataRepo_peakgroup"."name" ASC, "DataRepo_peakgroup"."id" ASC, T8."name" ASC, "DataRepo_compoundsynonym"."name" ASC, "DataRepo_compound"."name" ASC, "DataRepo_peakgroup_compounds"."compound_id" ASC, "DataRepo_study"."name" ASC, "DataRepo_animal_studies"."study_id" ASC
And here’s the code that generates it:
q_exp = Q(msrun__sample__tissue__name__iexact="brain")
fmt_distinct_fields = ['name', 'pk', 'compounds__synonyms__compound', 'compounds__synonyms__name', 'compounds__synonyms__pk', 'compounds__name', 'compounds__pk', 'msrun__sample__animal__studies__name', 'msrun__sample__animal__studies__pk']
all_fields = ['name', 'pk', 'compounds__synonyms__compound', 'compounds__synonyms__name', 'compounds__synonyms__pk', 'compounds__name', 'compounds__pk', 'msrun__sample__animal__studies__name', 'msrun__sample__animal__studies__pk', 'msrun__sample__animal__name', 'peak_data__labeled_element', 'compounds__name', 'msrun__sample__name', 'msrun__sample__tissue__name', 'msrun__sample__animal__tracer_compound__name', 'msrun__sample__animal__studies__name', 'msrun__sample__animal__feeding_status', 'msrun__sample__animal__tracer_infusion_rate', 'msrun__sample__animal__tracer_infusion_concentration']
PeakGroup.objects.filter(q_exp).order_by(*fmt_distinct_fields).distinct(*fmt_distinct_fields).values_list(*all_fields)
The above code works using smaller examples. Previously, I was only adding things to fmt_distinct_fields
in specific instances based on the display of the joined table in our view, but I realized this was leading to inaccurate counts in the displayed stats, so I basically decided I needed to include all M:M table fields in that list. And it’s when I include them all that I hit the error.