Trigger announcement posting on model update following REST convention

I have the following use case that I want to implement:

an LMS app powered by DRF has an Exam model, which has a state field which can either be DRAFT or PUBLISHED.

When a teacher publishes an exam, I want them to have the option to also create an announcement on the course (there a model called Announcement for that).

I’m looking for an elegant and efficient way to encode and utilize the information of whether the user wants to contextually publish an announcement when publishing an exam.

This requires identifying three steps:

  • how to encode whether the user wants to publish the annoncement in a REST request to publish the exam
  • how to intercept when an exam is being published to optionally publish an exam
  • how to pass the information about whether the user wants to publish an announcement over to whichever part of the application will be responsbile for actually publishing it.

For the second step, I decided to go with the django-lifecycle package.This beatiful package allows defining a method on the Exam model class itself to look something like:

 @hook(AFTER_UPDATE, when='state', is_now=PUBLISHED)
 def on_publish(self):
    # ... logic

As per the rest, here’s two possible solutions I’ve thought of, which I’d like some feedback on:

  1. define a boolean field publish_announcement on Exam. Inside of the update request for the exam, together with the other payload fields, the frontend will be able to either set that to true or false. Then, in the handler for the exam update, I can do something like:
 @hook(AFTER_UPDATE, when='state', is_now=PUBLISHED)
 def on_publish(self):
    if self.publish_announcement:
        publish_announcement(...)
    self.publish_announcement = False
    self.save()

The upside of this method is that it’s trivially easy to pass the information of whether, for a given update request, the user wants to trigger the task of publishing an announcement—all they have to do is put that into the PUT/PATCH request made to the exam. There’s also no additional code to handle that information as it’s just a field on the model.

The downside is that I’m actually using up a column to store something that isn’t a db information at all. As you can see, I’m resetting the value of that field as soon as I’m done updating (this is done to prevent subsequent updates to unintentionally trigger announcement publishing): all I really want to do is pass that information to the AFTER_UPDATE hook and then throw it away.

  1. use a field on the Exam model class which isn’t actually a Django field. The idea is that, when receiving an update request, the user could state they want to trigger announcement publishing via, say, a query param. The view would check for the existence of this query and, if truthy, would set a field on the Exam model. The difference here is that the field isn’t saved to the db, it’s just a class instance field created on the fly and thrown away as the object is disposed of. The hook method could just test that field similarly, this time without having to reset its value at the end because the model doesn’t have such a field.

This has the advantage of not having to create db fields which aren’t used as real fields, but it adds the need for some “glue” code that adds the temporary field on the model based on the request params.

Can you think of a better alternative?

Have you considered making the act of publishing itself a model (e.g. ExamPublish) that has one foreign key to Exam and one nullable foreign key to Announcement (along with timestamp, user-who-published, and potentially other metadata like what changed)? So instead of making it a side-effect of setting that attribute on the Exam model, its a POST to a different resource entirely.

On the project that I primarily work on, the list of things that have to be done during/after a publish kept growing over time. It might start with an announcement, but then it gets to search indexing, schedule updates, etc. Elevating it to its own concept/model can help you deal with that when the time comes. Also, treating the act of publishing as a first class entity can also be useful if you want to look over history later on.

FWIW, I’m trying to model something like this in some new code I’m developing. It’s not really the same use case, but if you find looking at any of it useful, please feel free:

(Please note that the code above was committed last week, is very incomplete, and not ready for production. I expect it to be used in a real system in a couple of months or so.)

Good luck!

I think this is a great idea.

I’m less convinced about this though: I believe a million things can happen in between two API calls, and I’d rather this be “atomic” from the point of view of the client.

Maybe I’m being paranoid about this… but I’d rather a single request both update the exam and trigger the task.

Anyway, I guess I should also mention that the pattern I’m looking to use for this use case will also be adopted for another, more complex feature, which it’s probably worth discussing here:

I’m developing some integrations with Google Classroom for my LMS, and one of them consists of being able to post an announcement on the Classroom class when an announcement is posted on a course hosted by my LMS.

Essentially, this integration would make the content on my LMS synchronize with that posted on a class over at Classroom. This is optional though, and should be done on a per-announcement basis.

Every time the state of an announcement gets changed from draft to published, the user is prompted in a modal whether they want to also publish that announcement on Classroom. I need a way to encode that information and use it to optionally do the posting on Classroom.

Taking a look at the Google Calendar API docs for inspiration, I found that a similar feature, the ability of sending email notifications to invitees when an event is created, is handled by having a query parameter sendNotification in the create request. This is useful to draw some inspiration as to how to treat that information, i.e. apparently Google doesn’t think it belongs in the payload to somehow describe the resource being created, but of course it doesn’t tell me anything about how they’re using it from then on (not that it would be applicable to any meaningful degree to my application).

So, I like your idea of treating something that’s conceptually an action as a model/entity, but now the problem is how to know when to create an instance of such a model. Of course, this can’t just be a check on the query params in the request of my viewsets, because I would have to put a ton of code to check the nature of the request (i.e. I need to know I’m actually publishing the announcement, and not just patching some other field) which wouldn’t belong there, and lose the ability of exploiting the handier API provided by django-lifecycle.

You are not being paranoid at all–having two dependent REST API calls like that (where you set Exam.state in one and create ExamPublish in another) will absolutely cause trouble at some point, and I didn’t mean to imply that you should do that. I’m suggesting that you don’t allow a direct update to the draft/publish status of an Exam model, and that the state either becomes read-only for the exam REST endpoint, or is really a computed value that checks to see if an ExamPublish exists (joining against the foreign key or OneToOneField there, depending on your business requirements).

This would be treating the publishing step as a separate action altogether, and not the same as updating other Exam attributes. I think this is intuitive to end-users as well–publishing is usually it’s own button, and not a simple checkbox or something.

So in a fat-models world, your ExamPublish model has some classmethod like:

    @classmethod
    def publish(cls, exam_id, create_announcement=False):
        with transaction.atomic():
            # Create the Announcement, update the Exam publish/draft
            # state, and create a new ExamPublish that links the two together.

I’ve never used django-lifecycle, but I think I get the gist of it based your link. I personally find this pattern difficult to maintain and debug in the long term. Say three hooks will fire based on the latest model changes–what order do they fire in? Say I have a test system that I’m anonymizing emails for, and I go through and change 20,000 email addresses–it might not be at all clear to me that this automatically triggers notifications to all those users, or that it takes an hour to run because of that.

Maintaining a consistent data model is important, as is making sure that certain actions accompany state changes. Raw models aren’t great for this, so you get this sort of signals/lifecycle approach of monitoring state changes to try to reactively force consistency. But it makes it difficult to reason about the side-effects and costs of any operation, and it can get hairy to debug when there are a chain of these buried in different models that are all being invoked in the same view.

The approach I prefer for this sort of thing is to avoid directly manipulating model fields, and instead call model classmethods or functions in a separate api module for your app. That’s not something you have to use everywhere. If all you’re doing is simple CRUD on a model, then it’s probably overkill. But the moment you get to doing things that cause actions across multiple models, emails to be sent, etc., it’s much easier for maintainers to see what’s going on if your view calls a classmethod, and that classmethod goes step by step through all the data it’s creating and actions it’s performing.

FWIW, I do think that signals are good for allowing third-party code that you don’t control to react to events. So I could easily imagine creating a custom signal that’s emitted when an Announcement is created, passing along few parameters describing the announcement, and letting plugin code do something with that.

1 Like

Oh okay, I misunderstood what you meant by “separate API call.” Definitely makes sense now. I really like this idea: as of now, every state change in my application is a PATCH request made to alter a specific field. Of course, I’m not mapping that to a checkbox or similar, but rather to a dedicated button on the UI, but under the hood it’s just a PATCH request. I’ve always felt this somehow needed to be treated differently, but I was also trying my best to stay true to the REST architecture so I didn’t just want to have a “publish” endpoint which, behind the scenes, would just end up updating that state field.

I think treating the action of publishing an entity as a first-class entity is a great idea.

However, as of now, you can probably imagine that would entail some significant rewriting of my codebase, which unfortunately is something I don’t have the resources to do (believe, I wish I would), as all of my resources are currently employed in developing the Classroom integration I mentioned, and what I’m looking for is a way to bridge my application with the new integration module that I’ll be developing which will touch as little existing code as possible.

To this avail, …

I am probably better off treating these new features as third party code, which is the best way to avoid messing too much with existing code. One other thing to keep in mind is that this integration will be something entirely optional which may be disabled on a per-user basis, or even per-application-instance basis, so it has to be something not very intrusive for the rest of my app. This is why I was thinking about a hook to begin with. The missing piece of the puzzle is how to communicate the hook that a certain action needs to be fired towards the “integration engine”, i.e. this new separate module I’ll be creating.

But again,

I’ll keep this approach in mind for the future because I think it’s really good.

If you had to work with the constraints I mentioned, though, how would you approach this?

I guess if I were trying to add it in a hurry while touching as little existing code as possible, I would send extra params to the PATCH request for whether to send an announcement or not–either via querystring, or just extra params next to your other fields. Then I would extend whatever view method you’re using in DRF to look for those things (and pop them out if they’re fields). That view method would then call to its super class implementation for normal CRUD, but your view would know whether it had to take extra steps related to announcements, and it would take care of that immediately afterwards.

The integration between your LMS and Google Classroom is an interesting case. Normally, I would think that you’d want to build this in such a way that all the Google Classroom integration happens in one app that the rest of your LMS doesn’t know about. If that were the case, that integration app could listen for certain things to happen via custom signals–so for instance, there would be a signal that your LMS emits when an announcement has been created, and the integration app would listen for that signal and decide whether or not it needs to be posted in Google Classroom as well.

But in this case, it sounds like your view is specifically asking the user whether or not they want to post in Google Classroom. So I feel like there are a few ways you can go with this:

  1. If Google Classroom is the only integration like this in the foreseeable future, or has a special paired relationship with your LMS, it’s fine for your app/view to know about it, and you can invoke it directly from there.
  2. If you want to keep things more decoupled because you plan for many other integrations to be built like this, you could do it the signals way but remove the user choice–either always echo it out or make some policy decision based on settings or app logic.
  3. Another possible variant on (2) is to re-frame the user prompt to be less Google Classroom specific. For instance, if you have categories of announcements, you could have the user select a category. Your integration app could then only echo certain “important” categories to Google Classroom. So you might get the same effect of choosing to send to Google Classroom or not without the direct coupling (or building extra plugin-like code).
  4. You could also make this a genuine plugin-like interface for your LMS. So you could put whatever your LMS needs to know about this Google Classroom integration point (message to display, integration app method/class to call, etc.) into settings, and have your LMS dynamically hook those together. This is can be more work though, particularly over the long term if you want to maintain that plugin API contract over time. I personally also find these to be hard to get right without a few different implementations.
1 Like

Thank you for your detailed answer,

That is pretty much what I’m aiming for, except I still don’t know how to achieve it because, as you mentioned, there are some Classroom-specific parts involved. The idea of using signals is pretty much what I was gonna go with, except I like django-lifecycle and already use it in other places to simply checking whether specific fields have changed, therefore the idea was to use the hook methods to fire events to some sort of interface that hides everything from the main app and exposes methods to signal that certain events happened.

This way, the main app would know there may be someone listening for certain events, but not much else–this “engine” might very well discard the events if no integration is available. This would also be done on a per-user basis, since individual permissions need to be granted for my app to access Classroom with the user’s identity.

This is really interesting. I don’t know if my LMS will ever get to the level of popularity where people might want to code integrations with arbitrary third party services, but on the other hand I’m not quite sure myself which direction I’m heading with the application, so I don’t want to prematurely rule that out.

I believe other integrations may be developed in the future, for example with Moodle, but for the time being, Classroom is where it’s at.

I have been thinking for a while about how a completely generic plugin system could be developed using Django, allowing the creation of integrations which could be handled in a uniform manner. Unfortunately, at this time I don’t have much of a clue about how I could do that, partly because I still don’t fully understand what integrating LMS systems entails. I have spoken with many teachers about this and they all have different and vague ideas about what they’d want from an integration between my system and another one. Some had excellent suggestions, like the one we’re discussing here – syncing exams and announcement on my system with coursework on Classroom --, but I haven’t been able to generalize these ideas and come up with an interface that I could expose for integrations to be created.

To this avail, since you mentioned the plugin solution, do you have any working code to share as to help me get a better idea what it would look like? That would be extremely helpful to me.

Entry Points

The Open edX platform primarily uses setuptools entry points for plugins. You can see a bunch of them implemented in the main edx-platform repo itself:

So that first one shows a list of different Course Tab implementations, with a mapping of course tab types to implementing classes. The system that renders course tabs will expect some set of methods/attributes to be present on those classes and invokes those to render the content.

The plugin app could have its own entry point defined in its setup.py, like this open-ended assessment XBlock (where XBlock is the plugin interface):

So when you pip install this edx-ora2 package, this entry point mapping gets added and edx-platform can read all the installed XBlocks for that Python install.

Django Settings

All that being said, I don’t think you should use entry points for your own project. As long as your plugins only have to work within Django, it’s easier to make something that you load in settings, like how Django has you list the middleware as string paths to the module/classes you want.

So your settings entry might look something like:

EVO_LMS_PLUGINS = {
    'ANNOUNCEMENTS' = [
        'path.to.your.google.classroom.integration.ClassName',
    ]
}

I have no idea what the shape of those plugin APIs should look like for you, but that would be the general mechanism I’d use to read them in. Your LMS would then dynamically load that class with django.utils.module_loading.import_string and invoke it in the right places, assume that it has the right methods, and be very defensive and log verbosely when the plugin inevitably fails in some weird way two years later, after a seemingly very minor upgrade of some dependent package.

For Later On

Our platform is really big and heavy to install, and so a more recent effort has been to make two smaller repos specifically for plugin-related events and filters:

There’s a presentation about this effort here:

I think this is likely overkill for your situation, but you did ask what we had. :slight_smile:

1 Like

Thank you very much—I took a look at entrypoints and they’re an interesting pattern.

I like this approach and it was something I originally thought about—Once again, it’s hard for me to imagine this working if the integration cannot be “general” enough to work without my application knowing it’s calling Classroom specifically.

To see if I could come up with a general interface, I tried to write a bullet point list of some of the features I want to implement, and see how Classroom-specific they are.

This is (part of) what I came up with:

  • Courses - ability to “pair” a course on Evo – my LMS – with a class on Classroom. This is what enables all the following actions
  • Exams - when an exam is published on Evo, create a coursework entry on Classroom. Still unsure whether to also do something with exam participations, e.g. creating a corresponding submission on Classroom
  • Lessons - when a lesson is posted on Evo, create a lesson (“material”) on Classroom with a link back to the lesson on Evo
  • Announcements - when an announcement is posed on Evo, also post it to Classroom + a link to the original on Evo

And some fancier ones which aren’t Classroom-only but also involve other G-suite apps:

  • Use Evo’s engine for creating randomized exams for creating Forms on Google Forms to be used as exams
  • Allow teachers to pick files from their Drive to insert into lessons on Evo
  • Export Evo exercises to Classroom in some form, for example via creating PDF’s on Drive

Something I realized about the first 4 ones is that, taking inspiration from your first advice of treating certain actions as entities, maybe the relationship of courses being paired, or exams being synced across Evo and Classroom, should be represented using a model.

For example, I could have an ExamIntegrationInstance which is created when I publish an exam and that exam is created as coursework on Classroom. It would contain a fk to my Exam on Evo and the id of the coursework created on Classroom. The act of actually sending the API call to Classroom may be done in a class method or by its manager (probably via a retriable Celery task).

I am still unsure whether I would create such a model instance for each integration, or whether a single instance of such a model per exam would serve as the interface for all possible integrations. But the immediate advantage is that I’d keep a handle on the remote object created on Classroom, as opposed to just shooting the action and forgetting about it.

I guess the next step is to try and think of what other possible integrations might come up in the future and see if there are any similarities.

I’m considering something with MS Teams, but thinking about what the features of such an integration may be, it appears it’d be a completely different set. For example, with such an integration, I could allow users to pick files directly from the Team files folders and insert them into lessons, but that’d be very different from what I would expect from the integration with Classroom.

It’s great that you’re thinking ahead about these things, but I’d be careful about trying to come up with a generalized set of integrations this early on. By all means, make an integrations package, and apps for Classroom and Teams in that package. As you add more integrations later, you can extract the most important bits and turn them into something more pluggable that third parties can use. But dynamically loading code from settings and thinking through common functionality with theoretical future integrations probably isn’t the best use of your time.

Plugin APIs will take a lot of effort to create, debug, document, and maintain over time. You might discover that you get the vast majority of your benefits from only one or two specific APIs, and that there isn’t much value in generalizing the rest of what your Classroom integration does. Trying to generalize too early before having multiple real integrations can lead you to create unnecessary abstractions/models/coupling that make it harder to refactor things into the shape you really want later.

In any case, it sounds like you’re working on a really exciting project. Good luck!

1 Like

Thank you for the tips you’ve given me.

I started working on this integration and I came up with a first draft of part of the architecture & implementation. If you’re willing to give me some feedback based on your experience, which is clearly greater than mine, I’d appreciate it a lot.

The integration framework I designed is composed of three main parts:

  • remote twin resources
  • integration classes & the registry
  • controller classes

I’ll give a brief outline of what each one is and does.

Remote twin resouces
These are models that represent resources on third party services, (e.g. Classroom) which are paired with resources on my LMS.

Here’s the base abstract model: sai_evo_backend/models.py at classroom_integration · Evo-Learning-project/sai_evo_backend · GitHub

The base fields include an id to the remote resource and a json field to store additional data about the remote resource that could be useful. For example, when a Google Classroom course is paired with a course on my LMS, an additional piece of information that I’m interested in keeping is the permalink to the Classroom course, which may be displayed in the UI for the user’s convenience.

Subclasses of this base model add a foreign key to a resource on my LMS. For example, the GoogleClassroomCourseTwin model is the key integration between a course on my platform and one on Classroom.

These models help with keeping track of what resources have a paired resource over at Classroom, and makes it easy to dispatch any actions related to my models, as well as reflect any updates/deletes on the remote resources

Integration classes & the registry
I created an abstract base class named BaseEvoIntegration which contains a series of handler methods which can be called when certain actions happen: when an exam is published, when a participation to an exam is turned in, when a lesson is published, and so on.

The GoogleClassroonIntegration subclass implements these methods in a specific way for Google Classroom. These integration classes know about the remote service they’re interacting with, about user credentials, and they are responsible for mainly dispatching actions that affect the remote Google Classroom resources, such as creating a coursework item on Classroom when an exam is published on my LMS.

Here’s an example handler method which is called when an exam is published on my LMS:

    def on_exam_published(self, user: User, exam: Event):
        course_id = self.get_classroom_course_id_from_evo_course(exam.course)
        service = self.get_service(user)
        exam_url = exam.get_absolute_url()
        coursework_payload = get_assignment_payload(
            title=exam.name,
            description=messages.EXAM_PUBLISHED,
            exam_url=exam_url,
        )
        results = (
            service.courses()
            .courseWork()
            .create(
                courseId=course_id,
                body=coursework_payload,
            )
            .execute()
        )

As I mentioned earlier in this thread, I wanted to use Django lifecycle hooks to dispatch actions, and I wanted it to be something that the models can just “fire and forget,” without knowing the details of any integrations. So, in order to decouple models from all the integration stuff, I created a registry class.

Here’s its dispatch method:

    def dispatch(self, action_name: str, course: "Course", **kwargs):
        integrations = self.get_enabled_integrations_for(course)

        # loop over all the integrations enabled for the given course, and
        # for each of them, if the dispatched action is supported, schedule
        # the corresponding handler to be run with the given arguments
        for integration_cls in integrations:
            integration = integration_cls()
            # check the current integration supports the dispatched action
            if action_name in integration.get_available_actions():
                method = getattr(
                    integration, integration_cls.ACTION_HANDLER_PREFIX + action_name
                )
                self.schedule_integration_method_execution(method, **kwargs)
            else:
                logger.warning(
                    f"{str(integration_cls)} doesn't support action {action_name}"
                )

For now, get_enabled_integrations_for just checks whether there is a Classroom twin resource for the passed course in order to determine whether the Classoom integration is enabled for it—this is because that’s the only type of integration we have for now, of course.

action_name is expected to be a string that’s the name of a method on the integration class minus the on_ prefix. So if I get an exam_published action name, I’ll call a method named on_exam_published on the integration class.

Controller classess
As I progressed with my implementation, I reliazed not all operations could be achieved via handlers that could be called from models at specific timings.

For example, all of the above methods assume a twin resource for a course could exist, but—how is it created? Another example is roster syncing: putting aside the pub-sub notifications from Classroom which are paid, if I wanted to go the free route, one way would be to periodically poll enrolled students from the Classroom API and create enrollment model instances for each one of them who isn’t already enrolled on my LMS. The key differences between these actions and the ones found on the integration classes are: (1) they aren’t called as handlers from lifecycle hooks—in fact, they can be called by the user itself via the REST API of my application, or they may be scheduled as periodic tasks, (2) they have a different set of responsabilities—they can create, update, and delete models on my application.

So I developed a controller class which exposes some methods that essentially combine primitives from the integration classes and also use methods of model managers and other pieces of business logic of my application. Some of these methods will be called in views of my REST API, others will be used as periodic tasks, and others could possibly be called in other ways.

Here’s an example from the controller class for Google Classroom:

def associate_evo_course_to_classroom_course(
        self,
        requesting_user: User,
        course: Course,
        classroom_course_id: str,
    ) -> GoogleClassroomCourseTwin:
        # fetch Google Classroom course using given id
        classroom_course = GoogleClassroomIntegration().get_course_by_id(
            requesting_user,
            classroom_course_id,
        )
        # create a twin resource that links the given course to the
        # specified classroom course
        twin_course = GoogleClassroomCourseTwin(
            course=course,
            remote_object_id=classroom_course_id,
        )
        twin_course.set_remote_object(classroom_course)
        twin_course.save()

        return twin_course

This will fetch the remote Classroom course and create a twin resource associating it to the specificed course. Notice the call to set_remote_object: this is to supply the twin model with a dict representing the remote object, which will allow it to fill up the data JSON field I mentioned earlier.

So this is a rough description of what I have in mind for now. There’s still a lot of missing details which I’ll have to work on, such as for example authentication and error handling.

Overall, how does it look so far?

I wish I could give you a more thoughtful analysis, but this month is going to be really crazy for me, so this is more of a set of quick take reactions. FWIW, I think your general approach is reasonable, and that you should feel fine going forward with it and iterating. My overall advice is to just critically examine whether you really need to make some parts (like the registry), and to be wary of ergonomic magic that may make the expected behavior difficult to understand for future maintainers (like the on_ dispatching). This isn’t something you have to decide up front, just something to keep in mind as you’re iterating.

Minor note: I didn’t understand what “twin resources” was at first, but “paired resources” was clear to me.

I’m not clear on why there’s a JSONField copy of the attributes of the remote object? Isn’t the twin going to be linked by OneToOneField anyway?

Why have the separate REMOTE_OBJECT_FIELDS here instead of making those actual model fields that you can query by? I can see using something like a JSONField if each row’s data could differ, but if they’re all going to have the same data, wouldn’t it be easier to query and validate using model fields?

Is an abstract base class really what you want for this? All extending classes will have to implement all methods or get an instantiation error, but that’s at odds with using the existence of methods to determine what actions are available–since other integrations might have to make an empty method just to satisfy the runtime. Also, doing it with an ABC means that adding a new abstractmethod to the ABC will immediately break all integrations because they’ll be missing that method in their implementation.

Another thing to watch out for in the long run with the kind of thing you’re doing with GoogleClassroomIntegration is that it’s going to encourage having a large, catch-all class, with exams, students, credentials, announcements, etc. It’s going to grow in size and complexity over time as you incrementally add more stuff to it, and it may become difficult for people to hold in their heads and find things.

Maybe I’m missing something, but I don’t think that a custom dispatch mechanism like this gets you much over defining custom signals in a signals.py file and sending them using Django’s built-in mechanisms.

Also, it seems like you’re currently making it the dispatcher’s responsibility to figure out whether or not the integration is enabled/set for a particular course based on the integration’s model. That’s okay, if the design you’re going for more centralized control/reporting and expect a very standard way to do those lookups. Another option is to just always do the dispatch and let the integrations figure out if they want to do anything with it, since the models they’d have to look through belong to them.

(Your idea of having this functionality live outside of a model/per-model class makes total sense, btw.)

Okay, quick word of caution here: data sync is a pain. It’s very useful, but things to beware:

  • It will fall out of sync at some point, whether because of some operational issue or a bug in someone’s code, and you’re going to need some kind of admin action or management command to re-sync it. And error logging/watching to see when it breaks.
  • Whenever possible, try to make the data flow unidirectional. Polling their list of enrollments to update yours is fine. Managing those concurrently with the ones you have (e.g. they enroll via sync from Google Classroom, but unenroll manually using your LMS) will lead to some weird edge cases and bugs. And doing bi-directional sync where you’re pushing up your LMS changes and syncing their Classroom changes can cause a lot of buggy/weird behavior that will suck up way more of your time than it’s worth.

When you are doing data sync tasks like this, you might want to make a model that always reflects the state of the truth-as-you-know it from your remote source, and then locally do the logic for how to bridge that with your local LMS model for whatever that thing is. This can have its own challenges, but it does make it easier to debug when weird interactions happen.

Take care, and best of luck.

1 Like

Thank you very much for the insights.

I heard this term used somewhere in the IoT and industry 4.0 world, and thought it sounded nice haha, but I do agree it isn’t the clearest name in the world.

Those are fields from the remote object, and specifically fields that my local model doesn’t have. For example, for classroom courses, I’m keeping the link to the course on classroom, the enrollment code, the name, and the descripion of the remote course. Those are all information that belong to the remote object and wouldn’t belong in my local model.

I was probably being a little lazy when I chose to use the JSON field, but the rationale was that I didn’t really ever need to query those fields – realistically, the only querying I would do on twin resources would be by local paired object, i.e. accessing the one-to-one field – and making changes to the list of fields wouldn’t require generating migrations. Overall, that field was pretty much meant to be a “raw” representation of a subset of some object returned by an external service, with no processing, querying, or changing ever done to its fields, just meant to be used “as is” and mostly for cosmetic purposes (think of the data of classroom courses used to display a message like “this course is paired with [name of classroom course] - [link to classroom course]”).

I agree this also feels sketchy to me. I’ll have to keep an eye on this.

I like the idea of not sticking everything inside a single class, but I can’t really imagine another way without increasing the complexity too much (at least now I have to fetch and instantiate a single class for the integration), maybe “mixins as interfaces”? A set of classes with abstract methods and the integration class inherits from them all?

Probably not. The thing is that using lifecycle hooks vs custom signals at least takes away the burden of checking if specific changes have been made, that I would need to check by hand. This comes in handier than having a lot of homemade code to do all the checks, but at that point I necessarily have to fire the event from the main application as opposed to registering the signals in the integration application, and that’s where I thought the registry would come in to help.

I felt pain just imagining this feature before I wrote the first line of code for it haha.

I know this is less of a technical question and more related to the application design itself, but what would you do if you were me with regards to enrollment sync? I understand that having a unidirectional flow makes things a lot easier. However, I cannot identify either Classroom or my LMS as the primary source of data when it comes to course enrollments. There is an equal chance that either will be used by students to enroll in courses, or by teachers to enroll students.

I also want to bring the best value proposition and give the most ergonomic experience to teachers and students, and I imagined the best possible experience would be to be able to have the ability to seamlessly navigate from resources in courses in my LMS to the ones in paired Classroom courses and vice versa, and to that regard, giving the ability to enroll from either of the two systems and keeping the other one synced sounded like the best option.

There are other types of resources in which it is easier to identify the primary application. My LMS has much more complex types of exercises and exam settings than what Classroom offers (i.e. just google forms), so my application will not bother fetching assignments from Calssroom. Instead, when an exam is published on my LMS, it’ll be pushed to Classroom with a link back to the exam on my LMS, and when a student turns in their submission, the corresponding submission will be turned in on Classroom and a “url attachment” will be added to it, linking to the exam participation on my LMS.

This sounds like the idea that prompted the creation of the twin resources we discussed.

I haven’t implemented roster syncing yet, but this is the general, high level algorithms I have in mind:

(first of all, here’s the simple model I use to represent a student enrollment, called UserCourseEnrollment: https://github.com/Evo-Learning-project/sai_evo_backend/blob/classroom_integration/courses/models.py#L141)

(push mode)

  • when a student enrolls in a course, call the classroom endpoint to enroll them in the paired course
  • when a student unenrolls from a course, call the classroom endpoing to unenroll them from the paired course

(pull mode)

  • call the classroom endpoint to get the list of enrolled students in the paired course
  • for each student in that list for whom there is no enrollment on my LMS for the given course, create a UserCourseEnrollment for that user and course
  • for each student for whom there exists a UserCourseEnrollment instance in the given course but who doesn’t appear in the list of enrolled students in the classroom course, delete that enrollment

Essentially, it seems to me there are a few caveats that I might need to keep in mind:

  • for the push mode, did the (un)enrollment correctly pushed to (or deleted from) classroom? Here I might solve this creating a twin resource model for enrollments, and having a boolean field propagated which tells me if the request to sync with classroom was successful. I can preriodically retry failed tasks looking them up by this field.

  • for pull mode, a faliure in the initial call to classroom wouldn’t be much of a deal, because I haven’t made any changes to my application state at that point. It also appears to me the whole algorithm is idempotent, for should there be a failure mid-way, it would be safe to retry the whole thing in a bit.

How are you suggesting to use the model that reflect remote state in this instance? Maybe I misunderstood your comment there.

I keep coming back to it, but you could emit custom Django signals for these events, and make it the particular integration’s responsibility to catch them however they want. So in integrations/signals.py, you define something like:

from django.dispatch import Signal

exam_published = Signal() 

Somewhere in your exam code, you do:

# send or send_robust, depending on desired behavior
if integration_active_for_course(course):
    exam_published.send_robust(self, course, user, exam)

And whatever receives it would do:

from django.dispatch.dispatcher import receiver

from integrations.signals import exam_published

@receiver(exam_published)
def on_exam_published(sender, course, user, exam, **kwargs):
    # probably spin off some celery task here

If you don’t like signals, it’s straightforward to have something in the specific integration call a function in your registration API that does the connecting of events to functions. Then your integration API doesn’t have to care about what shape the specific integration decides to use (functions, methods, etc.)

FWIW though, this isn’t that big a deal either way. If your API expects a certain class to hold all the event handlers, you can just organize it so that that class is relatively empty and calls to smaller classes, or make it so that the class is composed of mixin classes that are more discretely organized.

Side note: I wouldn’t worry too much about ABCs and forcing your integrations to inherit from them. If you ever do branch out into letting third parties make integrations, it’s going to be easier if you just let them duck-type, i.e. “we’ll call these methods on your integration class, and it’s your responsibility to make sure the right methods are defined”.

So the example way early in this thread about Announcements was fairly simple because there was one source of truth (your LMS), and you replicated it out to your integration. Data goes one way, and you pretty much only have to record whether (a) the data was intended to be replicated to your integration; and (b) whether it actually made it there.

A more complex version of this can be when you need to compose data from multiple sources to be your truth, but it’s still uni-directional. So let’s say there’s a system out there that defines due dates for certain exams, based on some central content system’s publishing workflow. Most of the time, that’s the due date that a student receives when they take the exam in your LMS. But we also decide that for particular exams, a course instructor is able to change that in your local LMS (but it does not get pushed back up to the central content system). Later on, we decide that we want to be able to change that due date for individual students in our course, to give extensions for when they get sick.

In that scenario, there three moving pieces, we probably want to use three separate models for:

  1. The content publishing system’s notion of a due date.
  2. A course-exam level notion of a local LMS due date.
  3. A student-specific override of a due date.

It’s good to keep them separate because they don’t have to step on each other. Say there’s a local LMS exam due date set (2), and then later the content publishing system’s due date for that exam changes. Since they’re in separate models, you don’t really have to worry about it overwriting your LMS local dates. It’s always clear why a certain due date is shown for a given user.

Things get trickier when your data flow is bi-directional, and I would suggest that you really think about whether that complexity is worth it, and whether you can get away with other things like:

  • one time imports of class rosters from Google Classroom (after which it’s managed in your LMS only)
  • toggling at the course level to always grab enrollments from Google Classroom and use that as your source of truth.

It’s not that doing the two way sync thing is impossible. It’s just that it’s almost always sucked up more time than I expect it to, and there are going to be so many more important things to work on. :stuck_out_tongue: Enrollment issues in particular are always high priority because they are so disruptive for students when there are problems.

It sounds like a reasonable first pass at it. I’d probably need to think about it for longer than I have time for to give more useful feedback about edge cases.

That being said, when weird issues do come up and you need to figure out why these four students keep getting unenrolled from this one class, you’ll want really solid logging–not just of what is being done but why. (“Deleted user {} enrollment in course {} because it has been deleted from Google Classroom course {}…”). You probably also want a history table using something like django-simple-history. And probably soft-deletes if you don’t already do that (so marking an enrollment as inactive instead of deleting the row). In particular, you want to be really careful about cascade deleting anything based on the enrollment going away–just in case that deletion was due to some weird bug.

:slight_smile: I’m afraid I’m pretty behind on my work this week, so this is likely the last I’m going to post on this thread until the weekend. Good luck!

1 Like

Thank you once again for your feedback, I kept reflecting on it.

I realized that maybe, for the time being, it’s not necessary to sync Classroom enrollments to my LMS; what’s really crucial is the other way around: since I need to sync exams on my LMS with coursework assignments on Classroom, and likewise sync participations to my exams with submissions on Classroom, and submissions are automatically created for all enrolled students, I need to ensure a student that takes an exam on my app is also enrolled on Classroom. But that’s way easier: before taking an exam, I’m already forcing the student to enroll in the course on my platform. All I have to do is to create the enrollment on Classroom whenever it is created on Evo: sai_evo_backend/integration.py at ae0b2c2db262d9067fe69dd04fb034706f7a874d · Evo-Learning-project/sai_evo_backend · GitHub

Now I’m facing another issue though, and it’s related to using the students’ access token to perform requests to the Classroom API. While for the teacher I use incremental auth and I have a button they need to click to give my app scopes when they want to use the integration, for students I would need to obtain permissions AND keep the token in my db during their normal login, so as to not add any critical steps to be performed possibly right before an exam (and by in-the-trench experience I can tell you there’s a lot that can go wrong with a student even if they just need to click a login button and select the correct Google account).

I am in need of some help or suggestions as to how I could modify the login flow in order to be able to keep the access and refresh token stored on my backend when the student logs in.

Here’s how my current login flow works, and a description of the problem:

If you would be willing to take a minute and give me some tips for that, if you maybe have already worked on something similar, I’d be very thankful.

Note that my “custom” process for incremental Auth currently works like this:

Unfortunately, I cannot quite just do that because the frontend also needs to have the token returned by Google in order to make a request to my endpoint convert-token from drf-social-oauth2 to exchange it for an in-house token.

So as you see, it’s pretty complicated. I need the frontend to have the access token in order to get the in-house token, but I also need to keep both the access token and the refresh token by Google on my backend. But using gapi library, the refresh token isn’t contained in the response. :weary:


As an aside, one solution I’ve thought of would be to have a domain administrator grant permissions to my app and never have students actually grant them—this way, the domain administrator could then be used to perform actions on behalf of the students, removing the need to ever request them extra scopes. The issue is that this is impossible for me to test as of now, because I am just using plain gmail accounts for that, so there is no “domain” and no domain administrator. For how appealing it sounds to skip students completely, I don’t think it’s practical for me to choose that route.


Anyway, on the bright side, here’s what I have been able to achieve thus far:

  • lessons on Evo are also published on Classroom
  • same for announcements
  • exams on Evo are also published on Classroom
  • when a student participates in an exam on Evo, their submission to the corresponding coursework on Classroom gets updated with an attachment containing a link to their participation on Evo
  • when the teacher finalized the grades on the exam on Evo, the grades are copied to the submissions on Classroom and published to students there
  • when a student enrolls in a course on Evo, they are also enrolled on Classroom in the paired course

I can’t complain! Once I figure out this auth issue, I pretty much have an MVP

I’m sorry, I don’t know off the top of my head, and I don’t have the time to look into this long enough to give you a thoughtful reply. Perhaps it would be worth making a new post thread in this forum for this topic (we’ve wandered pretty far away from the original)?

Good luck.

1 Like

No worries! After a fair amount of digging, I was able to put something together that achieves it.

I’m leaving this here for anybody who might need it in the future. It’s not the most elegant thing ever, but it does the job.

The first thing is to (ouch) monkey patch the SocialTokenGrant class from drf-social-oauth2 and, at this line, turn the if guard into if request.token is None and request.code is None to allow requests with a payload not containing a token but an authorization code.

Afterwards, you supply your own backend, which in my case must inherit from social_core.backends.google.GoogleOAuth2, and you override its do_auth method like this:

from google_auth_oauthlib.flow import Flow
def do_auth(self, access_token, *args, **kwargs):
        code = self.data.get("code")
        """
            An authorization code was provided - use it to fetch a pair of
            access and refresh tokens, store them, and complete normal
            authentication flow
            https://developers.google.com/identity/protocols/oauth2/web-server#exchange-authorization-code
        """
        if code is not None:
            flow = Flow.from_client_config(
                {
                    "installed": {
                        "client_id": os.environ.get("GOOGLE_INTEGRATION_CLIENT_ID"),
                        "project_id": os.environ.get("GOOGLE_INTEGRATION_PROJECT_ID"),
                        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
                        "token_uri": "https://oauth2.googleapis.com/token",
                        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
                        "client_secret": os.environ.get(
                            "GOOGLE_INTEGRATION_CLIENT_SECRET"
                        ),
                    }
                },
                redirect_uri=os.environ.get("BASE_FRONTEND_URL"),
                scopes=None,
            )
            response = flow.fetch_token(code=code)
            access_token = response["access_token"]

            # now that we've obtained an access token, complete normal flow
            user = super().do_auth(access_token, *args, **kwargs)
            # store user's credentials for offline use
            GoogleOAuth2Credentials.create_from_auth_response(user, response)
        else:
            user = super().do_auth(access_token, *args, **kwargs)

        return user

That’s it! Took some work to figure it out, but I got it to work. An important note, which took hours of work to figure out, is that the client id, client secret, and redirect uri must match those used on the frontend to get the authorization code (even if technically the latter isn’t used by the backend).