Multiple Columns using MYSQL IN(...) clause

Does Django support multiple columns defined in a MYSQL IN() clause?

SELECT
*
FROM
contacts_people
WHERE
(name, age, gender) IN (
(‘Jared’,22,‘M’),
(‘Tom’,24,‘M’)
);

Lets say I have a CSV file that contains hundreds of rows, and I want to check if the columns in the CSV match those in the database. One very efficient method of doing this is to use multiple columns with a MYSQL SELECT IN() clause. As shown above,

Ideally I would like to avoid using RAW SQL if possible, so I was wondering does Django have this functionally built?

After doing research i have found some possible solutions such as using & and | to combine columns.

Q(name=‘Jared’) & Q(age=22) & Q(gender=‘M’) | Q(name=‘Tom’) & Q(age=24) & Q(gender=‘M’)

However this is nowhere neither as efficient as just using an IN() clause, and I haven’t found any examples of Django code that contain multiple columns with an IN clause.

1 Like

To directly answer your question, no, the ORM syntax as provided does not allow that. However, it may well be possible for you to construct a function to implement it.

Just out of curiousity, how have you determined that? (Did you compare the query execution plans of each?)

How did you arrive at that conclusion? When utilizing the IN clause, the MYSQL optimizer enjoys considerable flexibility in selecting rows. In contrast, using Q() & Q() | … introduces significant complexity to the query, especially when dealing with hundreds of rows. This approach results in a complex MYSQL query that the optimizer must grapple with.

There are also comments on Stack Overflow suggesting that using Q() becomes slower when dealing with a large volume of data. It’s unfortunate that Django doesn’t provide built-in support for this functionality. Are there any plans to incorporate such a feature in the future?

I don’t disagree that that may be the case. On the other hand, it’s all conjecture without looking at the results of an EXPLAIN on both queries. It would be interesting to see what the optimizer does with “hundreds of rows” in both cases.

Side note: I have learned the hard way to trust nothing from SO without independent verification. It’s a bad resource in many ways. (Useful as a starting point to be sure, but never to be blindly trusted.)

Side note 2: In a simpler case (3 rows at least), the PostgreSQL query planner transforms that syntactical form into the equivalent sets of AND and OR predicates. In other words it internally treats your first form as if it were written in the second. And so there would be no difference between the two aside from the size of the query itself.

<opinion>
I think it would be difficult to identify a syntax for this that would “fit” within the general patterns of the ORM conditionals as they exist. I think also that given the relatively niche needs for this syntactical form that it’s probably not a good candidate for inclusion in core.
</opinion>

But those thoughts don’t preclude the opportunity for someone to implement this as a third-party package to see just how much interest there is for it.

Addendum - this is documented for PostgreSQL at PostgreSQL: Documentation: 16: 9.24. Row and Array Comparisons, where it identifies that the “IN” predicate is “… a shorthand notation for …” a series of “OR” conditions.

I will use a format similar to the one shown below in that case then.

from contacts.models import People

csv_data = [('Jared', 22, 'M'),('Tom', 24, 'M'),...] # Data fetched from HTTP POST request.FILES

conditions = Q()
for row in csv_data:
    conditions |= Q(name=row[0], age=row[1], gender=row[2])

qs = People.objects.filter(conditions)

There is primitive support for IN (...) in the ORM but it’s only used for multi-column JOINs (source)

Yes, the ORM support multi-column joins when ForeignObject(from_fields, to_fields) is used with multiple columns but it’s not a well known fact because it is only used indirectly through ForeignKey which doesn’t allow multiple fields to be used.

I suspect that a lot of this will have to change to if we want to support composite fields though. Hopefully when this work lands what you’re after should as simple as

People.objects.filter(
    CompositeIn(
        ExpressionTuple("name", "age", "gender"),
        tuples,
    )
)
2 Likes

We have merged TupleIn to core, but it’s not a public API yet and it’s not easy to use (due to the ColPairs class).
If we’re going to make this a public API, I wonder what interface would be the best.

1 Like

TupleIn can be used the following way now:

.filter(
    TupleIn(
        (F("name"), F("age"), F("gender")),
        [("Jared", 22, "M"), ("Tom", 24, "M"), ...]
    )
)

It’s not a public API yet though, so things might change.

1 Like