Best practice for async API/WS to a third-party service

Hello everyone, sorry if this question may seem off-topic, but I need some guidance to understand what the best practice is for a web app I’m developing with Django, Uvicorn, and Redis. In my web app, each user can:

  • request a set of machine data, which I serve through an API call to a third-party server that hosts a database
  • open a WebSocket connection to the machine of their choice, allowing real-time information exchange (both to and from the machine)

The points I’m uncertain about are as follows:

  1. Since multiple users can request the same data from the third-party database, I thought of keeping a local copy of that database to minimize API calls (which are also rate-limited). The problem is that the database is really huge, so that isn’t feasible unless I allocate a lot of space and resources to keep the copy in sync. What’s the best approach here?
  2. To allow the client to communicate with the machine via WebSocket, I need to go through the Django server (there’s also a unique access key for each user), because the communication must be maintained even when the client is closed. Do I need to establish a WebSocket connection for each user?

What are the best practices to make such a solution scalable over time? It’s important, however, that the WebSocket communication latency is as low as possible.

Any thoughts or suggestions?

Thank you

Can you confirm or clarify what connections are needed here?

From what I’m reading, I’m understanding the situation to be this:

[ Third party DB ] ← API request/response → [ Django server ]

[ Client browsers ] ← websocket → [ Django Channels consumer ]

Is this correct?
(Your use of the terms “machine” and “client” here is ambiguous, because there are multiple machines involved, and from the perspective of the third-party DB, the Django server would be a client.)

It might also help if you quantified some of the scale involved here:

  • Limited how? (You mentioned the API calls are limited.)
  • What’s the rate of updates?
  • What do you consider huge? What do you consider “a lot of resources”?
  • What communications? (Which communication must be maintained when the client is closed?)

Side note: what are your timing constraints for this?


Hi Ken, thanks for your response.

I apologize if my question was unclear; I’ll try to explain the application context better: by “client” I mean the front-end part of the application, and by “machine” I mean a production machine in a manufacturing department.

Specifically, here is the breakdown of the two scenarios:

  1. [ FE Client Browsers ] → API request → [ Django server ] → API request → [ Third party Server (with DB) ] → API response → [ Django server ] + (saving data to my DB?) → API response → [ FE Client Browsers ]
  2. [ FE Client Browsers ] ← WebSocket → [ Django Channels consumer ] ← WebSocket → [ Production Machine ]

My concern in point 1 is the best method for handling multiple, potentially identical API requests (for example, when 10 FE client browsers request exactly the same data). What’s the best way to handle this situation? Should I create a local copy of the third-party DB?

In point 2, my concern is how to handle the potential WebSocket connections that might open between different FE clients and different production machines.

Additionally, the API call discussed in point 1 simply returns the historical data that the machine sends via WebSocket in real time (over the WebSocket I receive only one piece of data at a time, which I could eventually store, but that brings me back to the issue in point 1).

Moreover, the API call in point 1 is currently handled with Celery, so that the request runs in the background; a rough sketch follows.
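Roughly, that task looks like this (the endpoint URL, task name, and parameters here are placeholders, not my real ones):

```python
# Rough sketch of the current Celery setup; all names are placeholders.
# The task fetches a slice of historical machine data from the
# third-party API in the background instead of blocking the request.
import requests
from celery import shared_task

THIRD_PARTY_API = "https://thirdparty.example.com/api/history"  # placeholder


@shared_task
def fetch_history(machine_id: str, date_from: str, date_to: str) -> dict:
    response = requests.get(
        THIRD_PARTY_API,
        params={"machine": machine_id, "from": date_from, "to": date_to},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```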

What’s the rate of updates?
Nearly 100 updates per minute

What do you consider huge? What do you consider “a lot of resources”?
By “huge” i consider nearly 1 TB of disk space
By “a lot of resources” I consider constantly syncing my local DB to the third party DB (not sure how many resources, but surely a 24/7 work.

What communications?
The websocket communications of the point 2

What are your timing constraints for this?
Less than 5 seconds

If you need further clarification, I’m available.

Thanks a lot!

Without a complete understanding of the data being retrieved (how it’s structured, the usage patterns, the amount of data per request, the number of users, how the data changes over time, and the frequency of retrievals), it’s tough to say. It might be possible to cache retrieved data and use it to augment what you fetch from the server, or you might be better off duplicating the data. It’s just hard to say from the information you can provide here. (And this really isn’t something practical, or even appropriate, for us to try to address here, because it gets outside the realm of being a Django issue.)
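As a sketch of the caching idea (everything here is hypothetical: the key scheme, the 5-minute TTL, and the import path for the task you sketched above), a cache-aside wrapper around your existing fetch might look like this, since you already have Redis available to back Django’s cache framework:

```python
# Hypothetical cache-aside sketch: keep recent query results in the
# Django cache (backed by the existing Redis) for a short TTL, so ten
# browsers asking for the same slice of data cause one upstream call.
from django.core.cache import cache

from myapp.tasks import fetch_history  # the task sketched earlier; path is hypothetical


def get_history_cached(machine_id: str, date_from: str, date_to: str) -> dict:
    cache_key = f"history:{machine_id}:{date_from}:{date_to}"
    data = cache.get(cache_key)
    if data is None:
        # Calling the task function directly runs it synchronously;
        # you could also .delay() it and cache from inside the task.
        data = fetch_history(machine_id, date_from, date_to)
        cache.set(cache_key, data, timeout=300)  # 5-minute TTL; tune to your freshness needs
    return data
```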

From the diagram it doesn’t look like the clients are connecting directly to the production machines. What you have are WebSocket connections opened by Channels to the production machine, which would be separate connections from the clients connecting to Channels.

What I would recommend in this situation is to create a Channels worker that connects to the production machine and forwards the data being received to a Channels group. Then, whichever client browsers want that data can join the group. Those (client browser) consumers would receive the data sent by the worker and forward it out to their respective browsers. A sketch of that shape follows.
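As a minimal sketch (the machine URL, group name, settings module, and the choice of the `websockets` client library are all assumptions; I’m showing the machine side as a standalone asyncio process, though a worker run via runworker would look similar):

```python
# Part 1: a bridge process that connects to the production machine as a
# WebSocket client and forwards every message to a Channels group.
# Settings module, machine URL, and group name are placeholders.
import asyncio
import json
import os

import django

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")  # placeholder
django.setup()

import websockets  # third-party client library; one possible choice
from channels.layers import get_channel_layer

MACHINE_WS_URL = "ws://machine-01.factory.local/stream"  # placeholder
GROUP_NAME = "machine_01"  # placeholder


async def bridge():
    channel_layer = get_channel_layer()
    async with websockets.connect(MACHINE_WS_URL) as machine_ws:
        async for raw in machine_ws:
            # The "machine.data" type maps to the machine_data() handler
            # on every consumer that has joined the group.
            await channel_layer.group_send(
                GROUP_NAME,
                {"type": "machine.data", "payload": json.loads(raw)},
            )


if __name__ == "__main__":
    asyncio.run(bridge())
```

The browser-facing side is then an ordinary Channels consumer; every browser that connects joins the machine’s group and receives whatever the bridge forwards:

```python
# Part 2: browser-facing consumer (group name is a placeholder).
from channels.generic.websocket import AsyncJsonWebsocketConsumer


class MachineFeedConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        # In practice, derive the group from the URL route and verify
        # the user's access key here before accepting the connection.
        self.group_name = "machine_01"  # placeholder
        await self.channel_layer.group_add(self.group_name, self.channel_name)
        await self.accept()

    async def disconnect(self, code):
        await self.channel_layer.group_discard(self.group_name, self.channel_name)

    async def machine_data(self, event):
        # Invoked once per group_send with type "machine.data".
        await self.send_json(event["payload"])
```

With Redis as the channel layer, the group fan-out should add only milliseconds, which comfortably fits your sub-5-second budget.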
