Handling models with foreign keys to possibly nonexistent users

I am working on an enterprise LMS powered by Django REST framework.

Authentication is done via Google OAuth 2.0 using the package drf-social-oauth2, and the target organizations for my software are schools and universities.

Some models have foreign keys to the user model. However, due to the nature of my application, users often want to reference someone who isn’t in the database yet. Users are created in the database on their first login via OAuth, but the person creating a model instance that references another user may want to do so before that user has logged in for the first time. For example, a teacher may want to pre-enroll a list of students in their new course, but those students might not have logged in yet and therefore might not exist in the database.

I’ll give a concrete example with a model in my application:

from django.db import models

# User and Course are the project's own models (imports not shown)


class UserCoursePrivilege(models.Model):
    """
    Represents the administrative permissions a user has over a course.
    See logic.privileges.py for the available permissions.
    """

    user = models.ForeignKey(
        User,
        on_delete=models.CASCADE,
        related_name="privileged_courses",
    )
    course = models.ForeignKey(
        Course,
        on_delete=models.CASCADE,
        related_name="privileged_users",
    )
    allow_privileges = models.JSONField(default=list, blank=True)
    deny_privileges = models.JSONField(default=list, blank=True)

This object is created in the frontend through a table that shows all registered users and allows turning on switches that correspond to the specific permissions for each user.

More than once, a teacher has emailed me saying they couldn’t find a colleague to whom they wanted to grant permissions for a course, and I had to tell them to have the colleague log in first and then come back and find them in the user table.

However, this isn’t very user-friendly, and it’s somewhat counterintuitive, since my application doesn’t provide an explicit user creation process: the users’ mental model is that their account somehow “already exists” and they just need to sign in.

I’m looking for a way to handle this as transparently as possible.
The target user experience is something like this: if the user cannot find the person they want to create the object for, the interface shows them a banner like “Can’t find the person you’re looking for?” and allows them to type in the email address of that person and proceed like normal (in the example above, that would entail showing them all the toggles to select permissions to grant).

Then, an instance of the correct model would be created, but with a null foreign key to user.
Then I would have a model that looks like this:

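from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models


class PendingModelInstance(models.Model):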
    """
    Represents a model instance which references a user that doesn't exist yet
    """

    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.TextField()
    content_object = GenericForeignKey("content_type", "object_id")

    email_address = models.TextField()

An instance of this would be created referencing the “partial” instance (the one with the missing FK), together with the email address of the user.

Then, upon user creation, a query would retrieve all PendingModelInstance objects with the new user’s email; the model instances they reference would then be retrieved and updated with an FK to the new user.
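A minimal sketch of that reconciliation step, assuming every referenced model names its foreign key `user` and that the caller passes in the `PendingModelInstance` rows matching the new user’s email (for instance from a `post_save` handler on the user model when `created` is true). The function name is hypothetical, and it only relies on duck typing, so it can be exercised without a database:

```python
def resolve_pending_instances(pending_instances, new_user):
    """Point each partial instance at new_user, then drop its pending record.

    Hypothetical helper: assumes the partial model's FK field is called `user`.
    Returns the number of instances that were updated.
    """
    updated = 0
    for pending in pending_instances:
        obj = pending.content_object  # resolved through the GenericForeignKey
        if obj is not None:  # the partial instance may have been deleted meanwhile
            obj.user = new_user
            obj.save(update_fields=["user"])
            updated += 1
        pending.delete()  # the pending record is no longer needed either way
    return updated
```

In Django you would probably call it inside `transaction.atomic()` with something like `PendingModelInstance.objects.filter(email_address__iexact=new_user.email)`, so that a failure halfway through leaves both tables untouched.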

This approach seems like it could work fine, but it introduces an issue I don’t really like: it makes foreign keys nullable, which they don’t need to be and shouldn’t be.

Can you think of a better alternative? Have you ever faced this kind of situation?

Maybe you’re missing a Student model. Instead of an FK to a User that may not exist yet, you could create a Student that has a nullable reference to User; when the student logs in, a User is created and assigned to it. You can then define the permissions directly on the Student model, so students won’t need to log in before their permissions can be set.
Done this way, you might need a custom authentication backend to authorize these Students; I’m not familiar enough with drf-social-oauth2 to advise you on how to do this.

This idea is good from the standpoint of (almost) not touching existing models and keeping existing FKs non-nullable.

However, I am not sure about the permanently added level of indirection: I fear it might cause too many extra queries each time you have to access a user and first have to pass through their Profile. (I’m calling it Profile here instead of Student because these kinds of issues don’t arise only when associating models with students, but also with not-yet-registered teachers; that’s easily generalized, though, by having a model that applies to all users and not just students.)

That’s a misplaced fear. It doesn’t need to add any additional queries. It may add an additional join to your query, but that’s not necessarily a bad thing. Relational databases exist to optimize that type of data reference. (You will also want to consider whether that relationship between “Profile” and “User” should be a ForeignKey or OneToOne relationship.)
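To illustrate the point about joins, here is a sketch using Python’s built-in `sqlite3` with simplified, hypothetical `profile` and `auth_user` tables (not Django’s real schema): a single `LEFT JOIN` returns every profile together with its user, and profiles whose person has not signed in yet simply come back with `NULL` user columns. In the Django ORM, `Profile.objects.select_related("user")` produces this kind of single joined query (a `LEFT OUTER JOIN` when the relation is nullable).

```python
import sqlite3


def profiles_with_users():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
        -- user_id is nullable: a profile can exist before its user has logged in
        CREATE TABLE profile (
            id INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE,
            user_id INTEGER REFERENCES auth_user(id)
        );
        INSERT INTO auth_user (id, username) VALUES (1, 'teacher@school.edu');
        INSERT INTO profile (id, email, user_id) VALUES
            (1, 'teacher@school.edu', 1),
            (2, 'student@school.edu', NULL);
        """
    )
    # One round trip: the LEFT JOIN keeps profiles that have no user yet.
    rows = conn.execute(
        """
        SELECT p.email, u.username
        FROM profile AS p
        LEFT JOIN auth_user AS u ON u.id = p.user_id
        ORDER BY p.id
        """
    ).fetchall()
    conn.close()
    return rows


print(profiles_with_users())
# [('teacher@school.edu', 'teacher@school.edu'), ('student@school.edu', None)]
```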

You’re correct—that actually wasn’t the real issue with this solution.

I examined this possible solution a little more closely, and what I realized is that, while it does have some pros, it would require changes in several places of my application, which is a pretty big undertaking:

  • all affected models would need to change the model their FK points to
  • all places where those models are used would need to be re-wired to the new relation (e.g. in permission checking, to retrieve the object that represents user permissions, I would now need to query for the profile, which changes how the ORM methods are used)
  • a profile would need to be created for all existing users, and all existing models that reference users would need to be migrated

The advantage would be that most of the new logic would be transparent with regards to the fact that a profile’s user might not exist yet in the database.

I have thought of another alternative, which is similar to the one I originally proposed but doesn’t require the FK to be made nullable.

I could have a model like the PendingModelInstance I mentioned initially, which looks like this:

class PendingModelInstance(models.Model):
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    fields = models.JSONField()
    email_address = models.TextField()

fields would hold a JSON representation of the fields of the to-be-created model. Then, when a new user is created, something like this would be run to create all pending instances for that user:

pending_instances = PendingModelInstance.objects.filter(email_address=new_user.email)
for instance in pending_instances:
    cls = instance.content_type.model_class()
    cls.objects.create(**instance.fields, user=new_user)

The only issue I can see comes from storing raw model data in a JSON field, but I guess that shouldn’t be too big a problem, because (1) validation still happens when that JSON object is actually used to create the model instance, and (2) users wouldn’t be able to create these PendingModelInstances arbitrarily through the API: their creation would be tightly regulated by business-logic rules, which in some sense ensure the JSON field is only filled with valid data.
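One caveat on point (1): in Django, `Model.objects.create()` calls `save()`, and `save()` does not call `full_clean()`, so field validators and the model’s `clean()` method are skipped; only database constraints are enforced. A sketch of the creation loop that validates explicitly might look like this. The function name is hypothetical, and it assumes each target model’s FK to the user is called `user` and that the JSON stores other foreign keys by their `<name>_id` attribute (e.g. `course_id`):

```python
def materialize_pending(pending_instances, new_user):
    """Create the real model instances recorded for new_user.

    Hypothetical helper: assumes every target model has an FK named `user`.
    Returns the list of created objects.
    """
    created = []
    for pending in pending_instances:
        model_class = pending.content_type.model_class()
        if model_class is None:
            # the model no longer exists; leave the record for manual inspection
            continue
        obj = model_class(**pending.fields, user=new_user)
        obj.full_clean()  # raises ValidationError on bad data; create() would skip this
        obj.save()
        pending.delete()
        created.append(obj)
    return created
```

Running it inside `transaction.atomic()` would keep a single invalid record from leaving the user with only some of their pending objects.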

What do you think?

Actually, I’d go the other way.

If you’re matching people by email address, I’d perform the registration process by creating the User object for currently-unknown individuals and set the is_active flag to false.

Then, when a person tries to log in for the first time, I’d check that flag. If the user already exists and is_active is False, I’d process that registration as a new registrant.
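A framework-agnostic sketch of that login-time check, assuming users are matched by email and placeholder accounts are created with `is_active=False`. The function and its `find_user_by_email` / `create_user` callables are hypothetical stand-ins for whatever the authentication flow actually provides:

```python
def complete_first_login(email, find_user_by_email, create_user):
    """Return (user, is_new_registrant) for someone who just signed in.

    A placeholder account (pre-created by a teacher with is_active=False) is
    activated and treated as a new registration instead of creating a duplicate.
    """
    user = find_user_by_email(email)
    if user is None:
        return create_user(email), True  # genuinely unknown person
    if not user.is_active:
        user.is_active = True  # first real sign-in for a pre-created account
        user.save()
        return user, True
    return user, False  # returning user
```

One design point to keep in mind: Django’s documentation recommends setting `is_active` to `False` instead of deleting accounts, so if the flag is also used to disable users, placeholders need a separate marker; otherwise a deactivated user would be reactivated simply by signing in.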


I decided to give your approach a try. I’ll show where I landed here, so you can give me some feedback as to how my solution looks.

One thing I didn’t want to change was the fact that I use user IDs in requests. I also didn’t want to duplicate logic across different API endpoints, with one specialized for requests that contain a user ID and another for requests by email.

So I chose to use this solution:

  • by default, the endpoint accepts the user pk as a lookup parameter and attempts to get the user by calling get_object
  • if the method returns a user, we’re fine and can proceed as before the change
  • otherwise, we see if there’s an email query param in the request
  • if there is not, we raise 404
  • otherwise, we use that to create a new user account on the fly and proceed using that user

My frontend will know when the user is trying to reach a user who doesn’t exist, and will perform a call to a url which looks like this: /my/endpoint/-1/?email=<email_address>, where -1 is a dummy value given as the user id lookup arg.
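One detail worth handling on the client side: the email must be URL-encoded before it goes into the query string, otherwise a `+` in an address such as `jane+lms@school.edu` is decoded as a space when Django parses the query string. A sketch with Python’s standard library (the frontend would use its own equivalent, e.g. `URLSearchParams` in JavaScript); the path mirrors the example above and the function name is hypothetical:

```python
from urllib.parse import urlencode


def pending_user_url(base_path, email):
    """Build the dummy-id URL used when the target user does not exist yet."""
    # -1 is the placeholder pk; urlencode percent-escapes '+', '@' and similar characters
    return f"{base_path.rstrip('/')}/-1/?{urlencode({'email': email})}"


print(pending_user_url("/my/endpoint/", "jane+lms@school.edu"))
# /my/endpoint/-1/?email=jane%2Blms%40school.edu
```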

As an example, here’s an endpoint to set permissions for a user in a course. I now want to allow setting permissions for a user that doesn’t exist yet.

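from django.http import Http404
from django.shortcuts import get_object_or_404
from rest_framework import status
from rest_framework.decorators import action
from rest_framework.response import Response

# Course, User, UserCoursePrivilege and UserSerializer are the project's own classes;
# the method below belongs to the user viewset (class definition not shown)

    @action(detail=True, methods=["patch"])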
    def privileges(self, request, **kwargs):
        params = request.query_params
        if "course_id" in params:
            course = get_object_or_404(Course, pk=params["course_id"])
        else:
            return Response(status=status.HTTP_400_BAD_REQUEST)

        try:
            user = self.get_object()
        except Http404:
            # view has been called with a dummy id, which means user may be trying to create
            # permissions for a nonexisting user on purpose. if requestor supplied an email
            # address, create a new user account with that address and associate the permissions
            # with the newly created account. this allows preemptively assigning permissions to
            # users who haven't registered yet
            email_address = params.get("email")
            if email_address is None:
                raise
            user = User.objects.create_user(username=email_address, email=email_address)

        try:
            new_privileges = request.data["course_privileges"]
            course_privileges, _ = UserCoursePrivilege.objects.get_or_create(
                user=user, course=course
            )
            course_privileges.allow_privileges = new_privileges
            course_privileges.save()
        except KeyError:  # "course_privileges" missing from the request payload
            return Response(status=status.HTTP_400_BAD_REQUEST)

        serializer = UserSerializer(
            user,
            context=self.get_serializer_context(),
        )
        return Response(serializer.data)

Before the change, the method looked the same except for the try - except around get_object.

I also had to make one more change: since I use drf_social_oauth2 to handle authentication, I had to add a step to the authentication pipeline, social_core.pipeline.social_auth.associate_by_email. Without it, a new user would be created when the person logged in for the first time, regardless of whether someone had already created an account with their email address using the method above. With it, the existing account is reused on the user’s first login.
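For reference, the python-social-auth documentation lists a default pipeline in which `associate_by_email` is disabled (commented out) and sits between `get_username` and `create_user`. Enabling it in Django settings looks roughly like this; treat it as a sketch and compare it with the pipeline your project already defines. The docs also flag this step as a potential security risk, because it trusts the email address reported by the provider, so it should only be used with providers that verify the addresses they return.

```python
# settings.py (sketch): the documented default social_core pipeline,
# with associate_by_email enabled.
SOCIAL_AUTH_PIPELINE = (
    "social_core.pipeline.social_auth.social_details",
    "social_core.pipeline.social_auth.social_uid",
    "social_core.pipeline.social_auth.auth_allowed",
    "social_core.pipeline.social_auth.social_user",
    "social_core.pipeline.user.get_username",
    # Must run before create_user so that a pre-created account is reused
    "social_core.pipeline.social_auth.associate_by_email",
    "social_core.pipeline.user.create_user",
    "social_core.pipeline.social_auth.associate_user",
    "social_core.pipeline.social_auth.load_extra_data",
    "social_core.pipeline.user.user_details",
)
```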

A few things I still need to think about:

  • are there any security risks in taking the email from a query param and using it as-is to create an account?
  • drf_social_oauth2 calls this method to give a username to new users. Currently, I’m just setting the username to the email address. This isn’t terribly important, as the username is never used anywhere, but for consistency I’d like to reuse the logic of that social_core method. I can’t simply call it here, though, because it depends on the auth backend used at runtime and a bunch of other things.
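On the username point: Django’s default `User.username` is limited to 150 characters, while email addresses can be up to 254, so copying the email verbatim can fail on save for unusually long addresses. Until the social_core logic can be reused, a hypothetical stand-in could normalize the address roughly the way Django’s `BaseUserManager.normalize_email` does (lowercasing only the domain) and then derive a unique, length-capped username. This is an illustrative sketch, not social_core’s algorithm:

```python
import hashlib

USERNAME_MAX_LENGTH = 150  # Django's default AbstractUser.username max_length


def normalize_email(email):
    """Lowercase the domain part only, roughly like BaseUserManager.normalize_email."""
    email = email.strip()
    local, sep, domain = email.rpartition("@")
    if not sep:  # no '@' at all: leave the value for validation to reject
        return email
    return f"{local}@{domain.lower()}"


def username_from_email(email, is_taken):
    """Use the email as username; fall back to a hashed suffix if too long or taken.

    is_taken is a callable, e.g. lambda n: User.objects.filter(username=n).exists()
    """
    candidate = normalize_email(email)
    if len(candidate) <= USERNAME_MAX_LENGTH and not is_taken(candidate):
        return candidate
    for attempt in range(1000):
        suffix = hashlib.sha1(f"{candidate}:{attempt}".encode()).hexdigest()[:8]
        name = f"{candidate[: USERNAME_MAX_LENGTH - len(suffix) - 1]}-{suffix}"
        if not is_taken(name):
            return name
    raise ValueError("could not find a free username")
```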

How does this approach look to you?

I’m going to assume that this view is protected against access by unauthenticated requests. I’m also assuming you’ve got some “sanity” test for the input. (Rule #1 is to never trust input.)

If so, then I would say that the risks are minimal-to-negligible.

As for your other questions or comments, I’m not really the person to answer those. I don’t use heavy front ends, oauth, or DRF.


Correct. In fact, only users with certain authorization privileges can access this view.

I crafted the example above in a hurry, but this is actually what’s going on: I created a specialized serializer which looks like this:

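from rest_framework import serializers
from rest_framework.validators import UniqueValidator

# User is the project's user model (import not shown)


class UserCreationSerializer(serializers.ModelSerializer):
    """
    A write-only serializer to create a user from an email address.
    It's used in certain views, such as the one to set user permissions, to
    preemptively create user accounts to assign certain relationships.
    """

    class Meta:
        model = User
        fields = ["email"]
        # Django's default User.email is blank=True, so without these overrides the
        # serializer would accept a missing or empty address (and an empty username)
        extra_kwargs = {
            "email": {
                "required": True,
                "allow_blank": False,
                "validators": [UniqueValidator(queryset=User.objects.all(), lookup="iexact")],
            }
        }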

    def create(self, validated_data):
        # set username to hold the same value as email
        validated_data["username"] = validated_data["email"]
        return super().create(validated_data)

which should provide more than enough validation, and fits in nicely with the rest of my application, since serializers are how I’m validating all incoming data. The email address provided in the query params is now being passed as data to this serializer, instead of directly as an argument to the user manager.
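A minimal sketch of how the `except Http404` branch might hand the email to that serializer; `create_user_from_email` is a hypothetical helper name. Calling `is_valid(raise_exception=True)` makes DRF answer with a 400 response listing the validation errors, instead of letting a bad address reach the database:

```python
def create_user_from_email(serializer_class, email, context=None):
    """Validate `email` with a DRF-style serializer and return the created user.

    serializer_class is expected to behave like UserCreationSerializer:
    constructed with data=..., validated with is_valid(), persisted with save().
    """
    serializer = serializer_class(data={"email": email}, context=context or {})
    # Raises rest_framework.exceptions.ValidationError, which DRF renders as a 400
    serializer.is_valid(raise_exception=True)
    return serializer.save()
```

Inside the view, the call would be along the lines of `user = create_user_from_email(UserCreationSerializer, email_address, self.get_serializer_context())`.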
