SerialFields - Yay or Nay?

Hello!

We’re looking to merge SerialFields #27452.

If you’re not familiar with it, serial is a PostgreSQL data type for storing an auto-incrementing sequence of integers. Django used serial for AutoFields in the past but now it uses identity.

I am looking to add SerialFields to django.contrib.postgres.

AutoFields must be a primary key, but SerialFields can be used anywhere where a non-primary auto-incrementing sequence of integers is required.

Use cases:

  1. I’m looking to merge composite primary keys. AutoFields are not compatible with composite primary keys. SerialFields would provide an alternative, so users can still use surrogate keys in composite primary keys.
  2. An “order column”, where objects are automatically assigned an auto-incremented value, but the users can change this value for the purpose of ordering the items in a list.

The alternative is to allow AutoFields to not be primary keys #8576. This would only work on certain database backends.

We wanted to discuss what does the community think about this?
So let’s align please.
cc. @apollo13 , @sarahboyce

I think we should view this independent of postgres. Are auto fields as part of a composite key useful? If yes (and I think so), then having a serial field in django.contrib.postgres is not the answer.

I agree with this sentiment. Let’s not add and support PostgreSQL-specific fields if we can make non-PK AutoField work on multiple backends. It might be more work to make that happen, but I think it will lead Django to a better place.

Yes, I think auto fields would be useful in composite primary keys.
Of course, users could always use django-sequences… still, I feel like Django should encourage the use of surrogate keys in composite primary keys out of the box.

With serial fields, I was aiming to kill two birds with one stone. I was under the impression this was already needed by the community before I came along with my use case.

Supporting reuse of auto fields makes it work on multiple databases, so I’d say that would be preferred over a PostgreSQL specific serial field

Are there any databases other than Postgres that would work if we went forward with this change to AutoField? #8576 lists problems with MySQL, sqlite and Oracle. I don’t know if the situation has changed for these in the past 15 years.

If Postgres is the only database this would work for, I think keeping with the SerialField is the better route.

Edit: Actually, would AutoField even work on Postgres without re-introducing use of SERIAL as an implementation detail?

1 Like

My thinking here is that it would be trivial to implement SerialField on Postgres first and we could then consider other backends. A bit like we did with postgres.JSONField and how postgres.ArrayField could now be implemented on all backends by using JSON primitives.

In the case of SerialField, adding support on other backends is far from trivial and full of gotcha. For example, on MySQL we can’t simply use AUTO_INCREMENT on InnoDB as it has its own limitation

  • AUTO_INCREMENT must be used part of a unique key
  • Only a single column can use AUTO_INCREMENT at a time. Even if it wasn’t the case the lack of RETURNING support forces the usage of LAST_INSERT_ID() but the latter can only return a single value at a time (same reason why we don’t support primary key assignment on bulk_create)
  • Naive emulation through INSERT triggers (e.g. SELECT MAX(col) + 1)) is prone to race conditions particularly under READ COMMITTED

I wouldn’t be surprised if Oracle also had limitations in this regard and haven’t tried with SQLite but it seems to have a few gotchas that would force the usage of a trigger.

All that to say that I don’t believe that we can simply disregard this accepted ticket by saying that it should be implemented on all backends given these limitations.

The main motivations for adding it on Postgres first was that by migrating AutoField to identity we left our user base in the dust with their possibly valid usage of the serial type and that it’s relatively trivial to implement on Postgres (hell it powered AutoField for a decade). Neither of these arguments apply to MySQL, Oracle, or SQLite.

1 Like

I don’t necessarily see the MySQL limitations as a blocker. Just because a database requires an Autofield to be part of a key or only allows for a single one shouldn’t be a blocker to remove the primary_key=True requirement from it (even though mainly other databases would benefit from it.

My thinking goes like this:

  • If removing the primary_key=True requirement of AutoField allows for usage in compound primary keys on more than Postgresql then we should do that no matter the other issues.
  • Once the first step is done the question remains if SerialField is still needed in django.contrib.postgresql as opposed to have it in a 3rd party library (covering the 80:20 usecase an all that – Aside from semantically being nicer, is there an upside to have a serial instead of an identity?).

Please note that I am not suggesting to implement an actual SerialField for MySQL.

Afaik, SQLite only supports AUTOINCREMENT if it’s combined with PRIMARY KEY. And I mean combined inline, so it wouldn’t work with composite primary keys.

Given we need to rebuild SQLite tables on most operations and have the logic to do so is it really an issue?

Thank you for sharing your thinking @apollo13, I missed that you were suggesting adding support for AutoField(primary_key=False) as an alternative to postgres.SerialField.

It seems to have been an alternative that was pushed against a few times already in Trac so I assumed we didn’t want to venture there? That’s what motivated the creation of a ticket to add postgres.SerialField IIRC.

My take on it is that a few packages out there assume that isinstance(field, AutoField) implies that it represents the primary key so they could break but that’s something we’ll have to deal with anyway when moving to composite primary keys.

Aside from semantically being nicer, is there an upside to have a serial instead of an identity?

That’s a good question. Once you are using identity I see very little reasons for going back to serial unless you have exotic needs (e.g. sharing the same sequence between multiple columns) that we don’t plan to support anyway. In cases where you were using serial before we moved AutoField to identity and are stuck with it as altering the primary key of a hot table is not always possible I believe there remains a need to use SerialField to ensure a coherent Django / Database representation of schemas.

These are the reasons listed in Postgres Wiki as well.

I would argue that we should not have moved AutoField to identity without providing postgres.SerialField in the first place but the ship has sailed now so maybe it’s time to revisit AutoField(primary_key=False) as well.

The migrations are another thing, I’m saying it’s impossible to have both a composite primary key and an AUTOINCREMENT column (AutoField) in the same SQLite table.

Ah ok I assumed that by inline you meant that the PRIMARY KEY (...) statement had to be inlined in the CREATE TABLE statement.

So, here’s my review of database backends with regard to how they support AutoField:

  • SQLite
    • The least flexible
    • AUTOINCREMENT only works if the field is also a PRIMARY KEY.
    • It doesn’t work with composite primary keys.
  • MySQL
    • AUTO_INCREMENT column must be a key.
    • Setting unique=True on the AutoField should do the trick.
    • One allowed per table.
    • It works with composite primary keys.
  • Oracle
    • One IDENTITY column allowed per table.
    • It works with composite primary keys.
  • PostgreSQL
    • The most flexible.
    • Multiple IDENTITY columns allowed per table.
    • It works with composite primary keys.
Feature SQLite MySQL Oracle PostgreSQL
Multiple AutoFields no no no yes
AutoField in composite PK no yes yes yes
  1. If we’re considering using AutoFields in composite primary keys, all backends except SQLite support that.

  2. If we’re considering using AutoFields for any other purpose (e.g. “order column”), then only PostgreSQL supports that.

SerialFields were originally intended for (2). We just also saw the opportunity to use it for (1).

2 Likes

Any thoughts on the above @apollo13 ?

I suppose the question is, is it okay to change the ORM in a way that it won’t work on one of the supported backends (SQLite)? Doesn’t that go against the philosophy of the ORM? I’m asking because I’m not sure.

We have other features as well that only work on certain databases. Ie select_for_update supports no_key which is only supported on postgresql.

Alright. In that case, I could submit a PR for this later to remove the primary key limitation. If the other use case is irrelevant, we should close the serial field ticket as wontfix.

If the other use case is not compelling enough to add this, please close the ticket.
cc. @apollo13 , @adamchainz , @sarahboyce

Hey everyone, thank you for contributing to the discussion. Since I’ve worked on the ticket and my intention was to merge it, I feel like it’s not appropriate for me to close it myself. Please, if there’s no intention to merge (which I understand), close it as wontfix. Thanks!