Hi everyone, I’ve been thinking about how to solve a certain problem and wouldn’t mind getting the opinions of others, to validate my thinking or maybe providing a different enlighten way to solve the problem.
Below is a simplified view of the models, to provide some context.
The technology is leveraging Bluetooth Gateways, which posts to a Django endpoint every 180 seconds, This gateway will submit any beacon data included in its whitelist. So far, I parse and update beacons and create temps in the table for storage. (Yes, I know its timeseries and potentially not super scalable in the long play, but when I have a scale problem, I tend to have $$$ to fix the scale problem).
I’m now extending to include temperature beacons with an external probe, this will be used to data log the temps of particular products, where the user would assign the probe/beacon to the Production Log, and while that log is active any temps from that beacon need to be stored with some form of marker (relational or not) to relate back to the production log for querying.
The idea being, while the smallgoods are cooling, the user opens the production log, assigns the probe, until the end-user unassigned the probe or marks the production log as completed. All temps from that probe need to be stored with relationship to the production log.
My line of thinking is this might be better split out in its own model and temperature table with another row in the temp table called LogID and LogType, which will help be flexible with different log types which exist in the application. The beacon table would include a simple Boolean for is_recording, which when true would cause the POST data from the Gateway to actually save, otherwise it will just ignore it.
Would appreciate any advice or counterpoints to help me consider other options.
I don’t see a more “straight-forward” approach, which is where I always start in the absence of more contextual information.
What I don’t know about your environment are the quantities, rates, volumes, and other potential relationships among the data, along with knowing what type of information you may be looking to extract from it, frequency and type of queries, etc. (Ultimately, collections of sensor data is useless if you can’t get the information you need from that data.)
For just an example for discussion purposes and not that I am actually recommending this, those factors above might lead me to consider factoring out LogType and LogID to a separate table, which would have the foreign key to BeaconProbe, and the Temperatures table would have an FK to “LogTypeId”.
Thanks Ken, to provide more context.
Rates, volumes (timeseries data, which is simply Datetime + Temperature) approx. 1 record every 180 seconds per device. Right now its like 20 devices, and if I get to couple of thousand, I will have more income and i would most likely hand this to a dedicated timeseries DB.
This data is used to graph the temperatures, so nothing too complex wise here.
Relationship wise, each beacon generates a temperature and that is the main focus, we then need to provide a relationship to the Production Log somehow, hence my thinking about the LogType and LogID, as the app already has multiple type of logs which are separate models (techdebt problem).
Don’t sell the “standard solutions” short.
Doing some “hand-wavey” math, if you have 3000 devices generating 1 row every 3 minutes, that’s 1000 rows / minute, or about 17 / second - pretty trivial rates. That also works out to about 1.5 million rows / day.
Just to give you an idea of what’s possible, we have (had - it’s now currently down) a project that was collecting traffic and traffic signal controller data. The traffic data was gathering (at peak) about 10 million rows / day, and traffic signal controllers generate data at 10/second, which across 82 intersections works out to 820 rows per second.
With just a couple minor tweaks to configuration, this data was all collected across three PostgreSQL databases across a 3-year span.
PostgreSQL is an amazing DB toolset and i’m not selling the standard solution short, saying it would be better to use a dedicated timeseries once I reach the scale issue, instead of having to tweak the PostgreSQL which is also holding all the other data and information used by the app.