GSoC 2024 proposal: Django ORM support for composite primary keys

Hi!

I have a proposal for GSoC 2024.
Your feedback would be greatly appreciated :pray:

Hello,

I am delighted to have read your proposal and to have learned from it. Your mastery of the subject is truly impressive. However, I encourage you to look at @Lily-Foote’s work in this PR: Support multiple column fields #17279. It seems she is working on the same issue. I wonder whether it would be possible for you to take this problem on as a GSoC 2024 project, given that Lily is already working on it.

Indeed, thanks @HamaBarhamou. The last commit to that PR was 6 months ago, so I’m not sure if it’s still being worked on. My proposal is a little different, but of course I’m willing to adjust it if there’s a consensus on how this should be implemented.

It is likely that the Django team will comment on your proposal. Regardless, I wish you all the best in this endeavor.

Lily wrote on the PR that she will return to it in the spring. I think we should let her continue working on it. Maybe she would be interested in mentoring a project to finish off the work; let’s wait to see if she responds here.

@Lily-Foote , let us know what you think. :pray:

Hi @csirmazbendeguz. Thanks for your proposal, I just had a quick read. The big problem here is that composite primary keys are the easy part. Where things get really difficult is composite foreign keys. My understanding is that we would need to see a solid solution for foreign keys to consider this solved.

As mentioned above, I took a look at this last year. For various reasons I haven’t made any progress on this since my last comments on the PR. I would love to see this get further, but it’s the oldest open ticket in Django for a reason. There is a lot of prior art trying to solve this, some of which has partially landed, which means it’s quite difficult to know how much work is actually left to do.

When I picked it up I was hoping I could experiment my way to something that worked cleanly, but I got rather stuck with how ForeignKey configures itself to store data in the database.

Specifically, I was trying to get something like this to work:

from django.db import models
from django.test import TestCase


class House(models.Model):
    address = models.CompositeField(
        street=models.CharField(max_length=255),
        number=models.PositiveIntegerField(),
        primary_key=True
    )


class Owner(models.Model):
    name = models.CharField(max_length=255)
    home = models.ForeignKey(House, on_delete=models.DO_NOTHING)
    

class CompositePKTests(TestCase):
    def test_set_composite_foreign_key(self):
        house = House.objects.create(
            street="Candlewood Lane",
            number=698,
        )

        Owner.objects.create(name="Jessica Fletcher", home=house)

But most of the Django ORM expects a field to correspond one-to-one to a column, so working out what to change is not straightforward. It may very well be the case that Django’s internals would still need a fairly major refactor to allow this to work cleanly.
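
To illustrate the mismatch, here is a toy sketch in plain Python. It is not Django's internals: `ToyField`, `ToyCompositeField` and `insert_sql` are made-up names, and the column names are assumptions. The point is only that any code building SQL has to flatten a list of columns per field instead of assuming one column per field.

```python
from dataclasses import dataclass


@dataclass
class ToyField:
    """Hypothetical stand-in for a model field (not Django's Field class)."""

    name: str

    def columns(self):
        # The common case: one field maps to exactly one column.
        return [self.name]


@dataclass
class ToyCompositeField(ToyField):
    """Hypothetical field that spans several columns."""

    parts: tuple = ()

    def columns(self):
        return list(self.parts)


def insert_sql(table, fields):
    # Flatten every field's columns rather than assuming
    # len(fields) == number of columns.
    cols = [col for field in fields for col in field.columns()]
    placeholders = ", ".join("?" for _ in cols)
    return f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"


fields = [
    ToyField("name"),
    ToyCompositeField("home", parts=("home_street", "home_number")),
]
sql = insert_sql("owner", fields)
```

Here two fields expand to three columns, which is the situation the one-to-one assumption does not cover.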

I’m not certain this would be suitable for GSoC, but on the other hand I’m also not certain it would be unsuitable. If it is picked, I would be happy to be involved.

@charettes Do you have any additional thoughts here? I think you have more day-to-day understanding of the design of Django’s ORM.

Hello guys!
Just giving my tiny bits here.
Instead of the design above, where the sub-fields are declared inside CompositeField, I believe it would be better to stay consistent with GenericForeignKey, which already handles a similar use case by using more than one field to “generate” another, like:

class House(models.Model):
  street = models.CharField(max_length=255)
  number = models.PositiveIntegerField()
  address = models.CompositeField("street", "number", primary_key=True)
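
As a rough idea of how such a "generated" attribute could behave, here is a plain-Python sketch using a descriptor. It does not use Django at all; `CompositeValue` is a hypothetical name and this is only one possible way to do it, not how CompositeField would necessarily be implemented.

```python
class CompositeValue:
    """Hypothetical descriptor: exposes several attributes as one tuple."""

    def __init__(self, *field_names):
        self.field_names = field_names

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Build the composite value from the underlying attributes.
        return tuple(getattr(instance, name) for name in self.field_names)

    def __set__(self, instance, value):
        if len(value) != len(self.field_names):
            raise ValueError("expected %d values" % len(self.field_names))
        # Write each part back to the attribute it came from.
        for name, part in zip(self.field_names, value):
            setattr(instance, name, part)


class House:
    address = CompositeValue("street", "number")

    def __init__(self, street, number):
        self.street = street
        self.number = number


house = House("Candlewood Lane", 698)
```

Reading `house.address` returns the tuple of `street` and `number`, and assigning a tuple to `house.address` updates both attributes, so the real data still lives in the two ordinary fields.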

Yes, I also ran into the composite foreign key issue when I was experimenting.

As far as I understand [1], ForeignObject kind of works like a virtual composite foreign key, so it could be used with composite primary keys. Of course, I agree it would be best if ForeignKey supported this as well, with a database-level constraint and automatic indexing.
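
For reference, this is what the database-level constraint looks like in plain SQL. The sketch uses Python's built-in sqlite3 module rather than Django, and the table and column names (`house`, `owner`, `home_street`, `home_number`) are assumptions chosen to match the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """
    CREATE TABLE house (
        street TEXT NOT NULL,
        number INTEGER NOT NULL,
        PRIMARY KEY (street, number)
    )
    """
)
conn.execute(
    """
    CREATE TABLE owner (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        home_street TEXT NOT NULL,
        home_number INTEGER NOT NULL,
        FOREIGN KEY (home_street, home_number)
            REFERENCES house (street, number)
    )
    """
)
conn.execute("INSERT INTO house VALUES (?, ?)", ("Candlewood Lane", 698))
conn.execute(
    "INSERT INTO owner (name, home_street, home_number) VALUES (?, ?, ?)",
    ("Jessica Fletcher", "Candlewood Lane", 698),
)

# A row pointing at a house that does not exist should be rejected.
try:
    conn.execute(
        "INSERT INTO owner (name, home_street, home_number) VALUES (?, ?, ?)",
        ("Nobody", "Nowhere Road", 1),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The composite foreign key is checked as a pair, so a matching street with the wrong number is rejected just like a completely unknown address.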

I would be happy to help implement this. It’s a feature I would like to use in my personal projects [2], hence my GSoC proposal. And, if necessary, we could add composite ForeignKey support to the acceptance criteria.

@charettes, let us know what you think. :pray:

Your proposal looks solid to me @csirmazbendeguz.

The research you did on subjects such as prior requests to add support for non-primary key auto fields, ForeignObject support for multi-joins, and the challenges associated with changing Meta.pk to return a tuple is spot on.

On the subject of Meta.pk, I think that targeting a small part of the ORM API surface first, as you suggested (3.1), is the right way to approach the problem. Having someone go through the lengthy process of adjusting the source to explicitly raise NotImplementedError("Feature X is not implemented yet for composite primary keys") while supporting the most common use cases (migrations being able to track and generate the right SQL, foreign key support, JOIN support) should go a long way toward getting us on track for incremental improvements such as the generic composite field support that @Lily-Foote has started to explore.
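
The guard pattern I have in mind could look something like the sketch below. It is plain Python with hypothetical names (`is_composite`, `feature_x`), and it assumes for illustration that a composite primary key value is represented as a tuple; none of this is existing Django code.

```python
def is_composite(pk):
    # Assumption for this sketch: a composite pk value is a tuple.
    return isinstance(pk, tuple)


def feature_x(pks):
    """Hypothetical ORM entry point that does not handle composite keys yet."""
    if any(is_composite(pk) for pk in pks):
        # Fail loudly instead of silently producing a wrong query.
        raise NotImplementedError(
            "Feature X is not implemented yet for composite primary keys"
        )
    # Single-column behaviour stays exactly as it was.
    return {"pk__in": list(pks)}
```

Each unsupported code path gets an explicit error like this, and the guards can then be removed one at a time as support lands.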

The fact that it is possible today to define composite foreign keys by using ForeignObject and defining a ForeignKeyConstraint(BaseConstraint) subclass makes me believe that composite primary keys are the area we should focus on resolving, and I think your proposal does a good job of describing the scope of the problem and proposing a plan to resolve it.

The only part of your proposal I would suggest de-emphasizing is GenericForeignKey and friends. It’s the kind of problem I would personally put in the someday / maybe bucket and default to raising NotImplementedError for now.

Really excited to see work on this front happening, and I’m sure many others are as well, given the general interest in the work @Lily-Foote spearheaded a few weeks ago already.

cc. @adamchainz, @Lily-Foote

Thank you so much for the positive feedback!

If I apply to GSoC, can I expect to be accepted?

Also, let’s try to reach a consensus on the API.

I feel like it’s been decided to go with the CompositeField approach, is that correct? And that’s because there’s another use case for a generic CompositeField other than composite pk/fk that I don’t quite understand? I’ve seen #5929, but as far as I understand what they’re asking for has more to do with storing multiple values in a single field (e.g. PostgreSQL’s Composite Types)? @HamaBarhamou , maybe you have some insight?

Let me know, please. I would like to contribute, I have some time as I’m not working right now. If you could give me some pointers I could start working on the task right away. I just don’t want to do spec work where I get my PR rejected after I submit it because I made the wrong design decision - it’s better to discuss the issue first.

I’m not involved in the GSoC selection process, so I can’t answer if this will be accepted.

I understand @Lily-Foote , I would still be interested in your opinion about the API though. What’s the vision for the CompositeField? I’m just trying to understand the use cases this needs to support. I understand how it relates to composite primary keys and composite foreign keys, but what else can it be used for? The concept of a generic composite field doesn’t seem very useful to me, but I’m sure I’m missing something.

For your work to be accepted, it’s not necessary for it to be approved for GSoC. In my opinion, the most important thing is that we push the boundaries further together or solve the problem definitively. The real question is how to combine efforts between the work already started and your proposals, considering that @Lily-Foote is already engaged in it. I believe we can work together on the already initiated PR. @csirmazbendeguz, your understanding of the issue will be invaluable. What do you think?

Yeah I agree @HamaBarhamou :+1:

@csirmazbendeguz For the API question, there’s some discussion of that at #373 (Add support for multi-columns fields.) – Django. Since that discussion was 11 years ago, it’s possible that the decision could be reconsidered now. I suggest you review that old discussion and if you still think the decision is wrong you should explain why here. As for the extra use-cases from #5929 (Allow Fields to use multiple db columns (complex datatypes)) – Django, I’m not really sure if they’re still relevant.

I don’t see how CompositeField relates to #5929. As far as I understand, #5929 tries to solve the issue of storing composite types in a single field, e.g. money (1 USD, 2 GBP, etc.), weight (1 kg, 2 lbs), temperature (1 °C, 2 °F), etc. I don’t think CompositeField can solve this issue, since users would still need to create separate fields. So I agree, I don’t think this is relevant.
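
To make the distinction concrete, this is roughly the kind of value I understand #5929 to be about. It is a plain-Python sketch: `Money`, `to_columns` and `from_columns` are made-up names, the column naming scheme is an assumption, and nothing here is Django API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    """One logical value that needs two columns to be stored."""

    amount: Decimal
    currency: str

    def to_columns(self, prefix):
        # Hypothetical mapping to "<prefix>_amount" and "<prefix>_currency".
        return {
            f"{prefix}_amount": self.amount,
            f"{prefix}_currency": self.currency,
        }

    @classmethod
    def from_columns(cls, prefix, row):
        # Rebuild the single value from its two stored parts.
        return cls(row[f"{prefix}_amount"], row[f"{prefix}_currency"])


price = Money(Decimal("1.00"), "USD")
row = price.to_columns("price")
```

Here the user declares one field and the columns are an implementation detail, which is the opposite of a CompositeField that groups fields the user has already declared.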

Thanks for the link! I don’t have anything against CompositeField. As a Django user, virtual fields are not the first thing that comes to my mind when someone mentions composite primary keys, but that’s just a personal preference and I accept the decision.