Hi there, I'm trying to create a Django app that polls crypto prices from multiple exchanges, and I'm wondering what the best practice for doing this would be.
My idea is to use Celery to schedule a background worker task for each exchange that polls the prices of different assets.
An alternative would be to run a crontab script, but I'm not sure cron is a valid use case here, since the task is intended to run in an infinite loop in the background rather than as a one-off task that runs periodically.
Appreciate advice if there’s a better alternative!
In this case, I would suggest setting it up as a custom Django management command that is started and managed by a process manager such as systemd or supervisord.
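As a rough sketch (the names and path are placeholders, not a working implementation): in a Django project this would live in a management command module such as `yourapp/management/commands/poll_exchanges.py`, started once by systemd or supervisord. The Django-independent core of such a command might look like:

```python
import time

def run_poll_loop(fetch_prices, interval_seconds, max_iterations=None):
    """Call fetch_prices() every interval_seconds and collect the results.

    fetch_prices is a placeholder for whatever function hits the
    exchange API. max_iterations exists only so this sketch can be
    tested; a real worker would loop until the process manager
    (systemd/supervisord) stops the process.
    """
    results = []
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        started = time.monotonic()
        results.append(fetch_prices())  # one poll of the exchange
        iterations += 1
        if max_iterations is not None and iterations >= max_iterations:
            break
        # Sleep for the remainder of the interval, accounting for
        # however long the request itself took.
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, interval_seconds - elapsed))
    return results
```

The management command's `handle()` method would just call this loop with the real fetch function and interval; the process manager takes care of restarts and shutdown.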
Hey, thanks Ken for the fast response! Sorry, just to clarify further: I'm planning to create a "poll" task for each exchange via an API (controlled from the frontend). The user can also terminate an active "poll" task from the frontend (via an API call).
My understanding is that Django management commands are usually used for one-off admin tasks. Another concern is how easy it would be to run a Django command in the background, or to terminate one via an API view.
Sorry if these questions sound silly! I'm still quite new to Django, but given my requirements, do you still think Django commands are the way to go? Thanks so much for sharing your thoughts.
Yes, I’m still inclined toward using a Django command for this. But Celery Beat is another possibility to consider. I’d need more details about the requirements before trying to provide specific suggestions.
How many of these exchanges may need to be polled? (1, 10, 100, 1000, more?) How frequently do they need to be polled? (Once per second, minute, hour, day, etc?) How long does each poll request/response take?
Is the situation one of “Person starts a poll of exchange “X”, and that exchange continues to be polled at the defined intervals until someone explicitly stops it”?
Is it possible (or meaningful) for two different people to each start the polling process for the same exchange? If so, does that mean that the same exchange is going to be polled twice as often? Do both people see both sets of responses?
If not two separate sets of polling, what happens if one person starts a poll and a different person stops it?
Hey Ken! Great questions! And thanks for the clarity on the django commands.
Please find my answers to your questions below
Each user can poll up to 10 exchanges. Each poll fetches a list of tradeable assets and the latest market price for each asset. The poll repeats every hour, but I intend to make the frequency customisable down to once per minute.
Is it possible (or meaningful) for two different people to each start the polling process for the same exchange? If so, does that mean that the same exchange is going to be polled twice as often? Do both people see both sets of responses?
Yes. The use case of this platform on a very high level, is for users to be notified of trading opportunities. Each poll will feed the results through a trading algorithm rule engine and notify the user if a suitable trading opportunity presents itself for an asset type on an exchange to open a long/short position.
Hence, for a given exchange, there can be any number of users polling it.
If not two separate sets of polling, what happens if one person starts a poll and a different person stops it?
The polls are separate for each user. The results of a poll are not shared amongst users. Hence, a user's decision to terminate a specific poll task does not affect other users' polling tasks.
First, ultimately, what is the maximum number of requests that you think you may need to perform in a 60-second window?
If each user can poll up to 10 exchanges, and the frequency can be customized to once/minute, then 100 users could create a situation where there are 1000 requests in a minute.
Second, how long does it take each request to process? For example, if a complete request takes 5 seconds, then each instance of your process can complete 12 requests per minute. If you have a situation where you need to complete 1000 requests in a minute, then you’re going to need about 85 processes to handle the load. On an 8-core server, that’s 9 servers dedicated just to this.
Third, what happens when someone starts a poll but never stops it? Are you planning any kind of limitation on the duration of those polling processes?
Thank you for your response. I did some deep thinking, and I think you're right. I could simply do a system-wide poll across all exchanges and update the users' trades with the historical prices.
I think there's no reason for every user to set up a long-running background task to poll the individual exchanges. That could result in a lot of redundant requests, which would put unnecessary load on the server.
Second, how long does it take each request to process? For example, if a complete request takes 5 seconds, then each instance of your process can complete 12 requests per minute. If you have a situation where you need to complete 1000 requests in a minute, then you’re going to need about 85 processes to handle the load. On an 8-core server, that’s 9 servers dedicated just to this.
Each poll request should take no longer than 2 seconds, as it simply fetches historical market prices from an exchange. But the stats you posted are very important to me. Could I ask if there is any Django documentation that would help me estimate the number of processes needed to handle the load? Or could I ask how you arrived at the figure of 85 processes?
This information is very valuable to me in estimating costs.
Third, what happens when someone starts a poll but never stops it? Are you planning any kind of limitation on the duration of those polling processes?
I think this is a valid point. I had planned to terminate polls automatically for certain groups of users (e.g. those on a free-tier plan), but it's probably better to have a system-wide poll that runs every 30 minutes and updates the trades for every user.
I think if I transition to a system-wide poll, then it makes sense to use crontab to execute a management command that fetches historical prices and updates the active trades for all users on the platform.
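For example (the paths and command name below are placeholders, not real ones), the crontab entry might look something like:

```shell
# hypothetical crontab entry: run the system-wide poll every 30 minutes
*/30 * * * * /path/to/venv/bin/python /path/to/project/manage.py poll_exchanges >> /var/log/poll_exchanges.log 2>&1
```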
If a request takes 5 seconds, then a process can complete 12 requests in a minute: 60 seconds per minute / 5 seconds per request = 12 requests per minute.
If one process can handle 12 requests per minute and you need to support 1000 requests per minute, then: 1000 requests per minute / 12 requests per minute per process = 83.3 processes.
(And I rounded it up to 85 to include the minimum processes you also need to serve your Django instance and PostgreSQL.)
Again, nothing fancy, and certainly nothing you can bank on - you’d really need to do a more detailed study to see what the scaling issues may be. But it serves as a rough guide to give you a general idea of what you may be dealing with.
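The same arithmetic in a few lines of Python, in case it helps with cost estimates (the 5-second figure is the worked example above; substitute your own measured request time):

```python
import math

# Back-of-the-envelope capacity estimate.
seconds_per_request = 5           # time for one complete poll request
target_requests_per_minute = 1000

# One process handles 60 / 5 = 12 requests per minute.
requests_per_minute_per_process = 60 // seconds_per_request

# 1000 / 12 = 83.3..., so 84 whole processes at minimum;
# pad this (e.g. to 85) for Django itself and PostgreSQL.
processes_needed = math.ceil(
    target_requests_per_minute / requests_per_minute_per_process
)

print(requests_per_minute_per_process, processes_needed)  # 12 84
```

With the 2-second requests mentioned earlier, the same formula gives 30 requests per minute per process and about 34 processes for the same 1000-request load.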
Or perhaps one execution per exchange. (10 exchanges is definitely manageable)