Manually writing JOINS possible?

I’ve been really trying hard to write a JOIN directly but it seems like that might not be possible based on this other thead.

Overall Goal

I have an ETL job that unifies 2 or 3 different datasources into a single table and I want to annotate a query with some “far away” data.

Models

# Source A
class SourceACompany(models.Model):
  pass
class SourceADirector(models.Model):
  company = models.ForeignKey(SourceACompany, on_delete=models.CASCADE)

# Source B
class SourceBCompany(models.Model):
  pass
class SourceBDirector(models.Model):
  company = models.ForeignKey(SourceBCompany, on_delete=models.CASCADE)

# Unified
class UnifiedCompany(models.Model):
  source_a = models.OneToOneField(SourceACompany, on_delete=models.CASCADE)
  source_b = models.OneToOneField(SourceBCompany, on_delete=models.CASCADE)
class UnifiedDirector(models.Model):
  company = models.ForeignKey(UnifiedCompany, on_delete=models.CASCADE)
  source_a = models.OneToOneField(SourceADirector, on_delete=models.CASCADE)
  source_b = models.OneToOneField(SourceBDirector, on_delete=models.CASCADE)

Desired Query

At this point, I have ETL’ed Companies together and started to fill Directors. So, as I’m inserting a new UnifiedDirector, I want to find which UnifiedCompany it belongs to. My goal is to annotate the unified_company.id while I’m obtaining SourceADirector.

So the query would be:

SELECT source_a_directors.*, unified_company.id AS unified_company_id 
  FROM "source_a_directors"
  JOIN "unified_company" ON "unified_company"."source_a_id" = "source_a_directors"."company_id"

The problem seems to be there is no direct FK relationship between UnifiedCompany and SourceADirector. Their columns are already being used in FK relationships for sourcing.

Questions

  1. Is it possible to write this join through ORM?
  2. Or, is it possible to relate UnifiedCompany and SourceADirector by “reusing” the existing columns ("unified_company"."source_a_id" = "source_a_directors"."company_id") so that Django can know how to relate the two models directly? It seems like a trivial JOIN if Django knows how they’re related.

I tried looking at extra, and am really trying to avoid manual query because there are downstream filter+annotates.

Thanks so much for any help.

Welcome @nitsujri !

I’m sorry, I’m not sure I’m following what it is you’re trying to do here.

If I’m understanding this, your situation is that all models other than UnifiedDirector are populated with the correct relationships. Is that correct?

You’re trying to add a UnifiedDirector and determine which UnifiedCompany it should be related to, based upon the SourceADirector, which you know. Is this correct?

It appears to me then that what you’re really trying to do is identify a UnifiedCompany from a SourceADirector. Is that correct?

If all that is correct, then you have the issue from an ORM perspective that for any instance of SourceACompany, you could have multiple UnifiedCompany related to it.
In other words, if you have an instance of SourceADirector named source_a_director, then the ORM expression for all of the UnifiedCompany related to it would be source_a_director.company.unifiedcompany_set.all().

(If you know that there is only ever going to be one UnifiedCompany for any one SourceACompany, then this would be appropriately modeled as a OneToOneField and not as a foreign key.)

Ah, thanks so much for the ultra fast reply! Terribly sorry you’re right about the OneToOne, I’ve updated the above.

You’re trying to add a UnifiedDirector and determine which UnifiedCompany it should be related to, based upon the SourceADirector , which you know. Is this correct?

Correct, the SourceADirector.company = SourceACompany and thusly has a SourceACompany.unifiedcompany.

It appears to me then that what you’re really trying to do is identify a UnifiedCompany from a SourceADirector . Is that correct?

Exactly! Apologies again on the wrong relationship, built the example too hastily.

No apologies necessary, we’re here to help.

Yes, given the revision, then if source_a is your SourceADirector, then source_a.company.unifiedcompany is the reference to the UnifiedCompany.

AH! I can do the double join instead of going so directly. Wow I feel dumb

SourceADirector.objects.annotate(
   unified_company_id=F('company__unified_company__id')
).filter(
   company__unified_company__source_a_id=F('company_id')
)

Thanks so much!