Inefficient Query Execution

Ah well, it took me some time to get this straight; I kept ending up writing an essay about Postgres mechanics, which isn't really helpful here, so let's make this blunt:

I cannot tell you whether TimescaleDB is a win or a hindrance for your use case, as I don't know your other business constraints.

From what you have presented here, the select query performance is bad. There are several reasons for this (besides hypertable chunk compression), and I think you can speed it up considerably with a customized vanilla table like this:

  • remove all indexes, instead go with just one BRIN index on updateat (here I assume that updateat in fact has the lowest selectivity and comes in ordered, which might not be true)
  • try to get rid of varying types for the ...id columns (e.g. use int if it's an integer, or fixed-width string types); with fixed-width types you can easily avoid “out-of-line” storage indirections
  • condense value into a non-array field (e.g. a single text or bytea type)
  • make sure every field is stored “in row” and not toasted/compressed at all

That's a lot to test (and better done on a cloned copy of the table) and might not fit your demands at all, esp. around time-based aggregations, as it removes the hypertable functionality completely. Still, it should make the select query a lot faster (a rough sketch of such a table follows below).
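
To make the list above concrete, here is a rough sketch of what such a stripped-down table could look like (only an illustration - the int types for the ...id columns and all names are assumptions, since I don't know your real data):

-- stripped-down vanilla table: fixed-width columns, value condensed into one field
CREATE TABLE data_plain (
    id INT GENERATED BY DEFAULT AS IDENTITY,
    updateat TIMESTAMPTZ NOT NULL,
    lineid INT,       -- assuming the ids really are integers
    machineid INT,
    itemid INT,
    value TEXT        -- condensed, non-array value
);

-- keep value in-row, no toast/compression (the whole row must then fit into a page)
ALTER TABLE data_plain ALTER COLUMN value SET STORAGE PLAIN;

-- single BRIN index on the (ideally insert-ordered) time column
CREATE INDEX data_plain_updateat_brin ON data_plain USING brin(updateat);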

Sorry, I think I made it clear earlier; thanks anyway for the explanations regarding Postgres, it was helpful to know them.

We use timescaledb for its compression and for its optimization with time series.

What does this mean?

Did some measuring with fake data on my laptop (i7 Haswell with SSD, Dockerized timescaledb/pg-15):

  • CREATE TABLE data (
        id int GENERATED BY DEFAULT AS IDENTITY,
        updateat TIMESTAMPTZ NOT NULL,
        lineid TEXT,
        machineid TEXT,
        itemid TEXT,
        value TEXT[]  -- test: with just TEXT here
    );
    
  • timescaledb setup:
    SELECT create_hypertable('data', 'updateat');
    -- extended test with chunk compression:
    ALTER TABLE data SET (timescaledb.compress);
    SELECT compress_chunk(i, if_not_compressed=>true)
        FROM show_chunks('data', older_than => INTERVAL '2 weeks') i;
    
  • vanilla table setup:
    CREATE INDEX data_updateat_idx ON data USING btree(updateat desc);
    -- alternative test: BRIN index instead of the btree one above
    CREATE INDEX data_updateat_idx ON data USING brin(updateat);
    
  • fake data - 961921 rows (since I don't know the cardinality of your ...id columns I just faked the time progression with everything else static):
    INSERT INTO
        data(updateat, lineid, machineid, itemid, value)
    SELECT
        generate_series('2023-01-01', '2023-12-01', INTERVAL '30 sec'),
        '1',
        'blabla',
        'nix',
        '{0, ... ,249}';
    
  • test command:
    time docker exec -u postgres some-postgres psql -c "\
      SELECT value, id, (updateat + '2:00:00') AS time FROM data WHERE (
        itemid='nix'\
        AND lineid='1'\
        AND machineid='blabla'\
        AND updateat BETWEEN '2023-01-04 22:00:00+00:00' AND '2023-01-05 22:02:00+00:00'\
      ) ORDER BY updateat ASC;" > /dev/null
    

Test - select query on uncompressed hypertable vs. vanilla table: both at ~650ms, no difference measured (< 1% with no clear trend). TimescaleDB wins here in terms of offered functionality while not showing any runtime penalty. Furthermore, its automatic partitioning will win the big-table race in the long run.

Test - select query on uncompressed hypertable vs. compressed: compressed shows bigger variance in runtime than uncompressed. To my surprise, compressed is actually slightly faster if the data resides in one chunk only (avoiding an IO bottleneck?). But it loses ground if rows span multiple chunks (additional decompression overhead?). The difference though is really tiny at ±<3% in runtime. It seems to be a good idea to have hypertable compression enabled.

Test - select query on TEXT[] vs. TEXT for value: plain TEXT is ~20% faster than TEXT[] in both the vanilla and the hypertable setup - roughly 650ms vs. 500ms query time. (Side note: inserts take much longer with TEXT[] - for the fake data above it was ~42s vs. ~14s.)

Did a few more tests with the vanilla table and different toast/compression settings, and BRIN vs BTREE index type:

  • toast compression only adds ~3%, thus should not be a real concern
  • BRIN was slightly faster than BTREE (~5%). It is known that BRIN starts to pay off in much bigger tables (>10M rows), which I did not test with my ~1M entries. Since BRIN is more tricky to use and to keep healthy (the field data should have a natural preordering), I would not bother with it until hitting a row count far beyond 10M. It is also not clear to me whether a hypertable can take advantage of it (did not check the timescaledb docs).

Summary
You should stick with the hypertable, as it provides really great additional functionality out of the box like automated partitioning and compression with almost no runtime penalty, even if you don't plan to use any of the “hyperfunctions” later on.
The switch of value from TEXT[] to TEXT gives a significant benefit for your select query (~20%), and even more for inserts (~3x faster). I don't know if that is enough of a benefit for you to write additional field converters on the Python side.

Limitation
My tests have one big flaw - I did not use a multicolumn index across the ...id fields as you have. It does not make much sense to fake data for those fields without knowing their cardinality, as the query planner will change strategies based on that. So the results have limited value for you, and there still might be some speedup possible by trying different multicolumn index patterns (my guess here - mix in updateat early, as the hypertable is optimized for the time dimension; see the sketch below).
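
Just to illustrate that guess (the column order is pure speculation without knowing the cardinalities, so treat it as one pattern to benchmark, not a recommendation):

-- one possible multicolumn pattern to test, time column mixed in early
CREATE INDEX data_multi_idx ON data USING btree(updateat, lineid, machineid, itemid);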

Thank you for all these tests and explanations, you are amazing.

I will really consider turning the field into a TEXT. Just one thing: at the moment, the only TimescaleDB function I use is time_bucket() (Timescale Docs).
I don't think it can be used anymore if I transform the field into a TEXT, since an AVG(value) must be done, unless a deserializer of the TEXT field into single FLOAT/INT fields is implemented, and I don't want all this to lead to slowdowns again.

I didn't find much on the possibility that these indexes could improve performance. As I understand it, in this discussion, https://news.ycombinator.com/item?id=27068535, a Timescale engineer says that they have reimplemented how the indexes work, so I don't know if it could be useful.

This Stack Overflow question, “How brin index performs on non temporal data when compared with btree index in postgresql?”, says:

But you can almost never use a BRIN index. They only work if the physical order of the rows in the table is either identical or exactly opposite to the logical order of the column values you are indexing.

In my application I sometimes sort by ASC and sometimes by DESC.

I also implemented the unnest/cast and discard functions at the Python level instead of at the query level as before; it seems to be faster, but it also seems to allocate too much RAM.

I did an inspection with memray (GitHub - bloomberg/memray: the endgame Python memory profiler), and the unnest-and-cast function eats too much of it.

    def unnest_and_cast(cls, values: List[Dict[str, List]], value_key: str, datetime_key: str, output_type: Optional[AtomicDataType] = None):
        """
        [{'a': 0, 'b': [1, 2]}, {'a': 1, 'b': [3, 4]}] --> [{'a': 0, 'b': 1}, {'a': 0, 'b': 2}, {'a': 1, 'b': 3}, {'a': 1, 'b': 4}]

        """

        if not output_type or output_type == AtomicDataType.NONE:
            logger.info('output type not passed, the return value will be a string and the discard operation will not be performed')
            return [{datetime_key: item[datetime_key], value_key: value} for item in values for value in item[value_key]]

        transformed_data = []
        atomic_type = datatype_atomic_to_atomic(output_type)

        for item in values:
            # use a separate name so we do not shadow the `values` argument
            item_values = item[value_key]
            if atomic_type:
                item_values = list(map(atomic_type, item_values))

            for value in item_values:
                transformed_data.append({datetime_key: item[datetime_key], value_key: value})

        return transformed_data

    def discard_values(cls, discards: Dict[str, Optional[float]], values: List[Dict[str, List]], value_key: str, is_array: bool = False):

        discard_min = discards.get('discard_min', None)
        discard_max = discards.get('discard_max', None)

        if (discard_min is None and discard_max is None) or not values:
            return values

        try:
            if is_array:
                return [d for d in values if value_key in d and all((discard_min is None or discard_min <= val) and (discard_max is None or val <= discard_max) for val in d[value_key])]

            else:
                return [d for d in values if value_key in d and (discard_min is None or d[value_key] >= discard_min) and (discard_max is None or d[value_key] <= discard_max)]

        except Exception as exc:
            logger.exception('can\'t discard values, the return value will be undiscarded')
            return values

In the end I need to return the values in the shape produced by this function; this too is very expensive.

def make_single_dict(list_plot: List[Dict]) -> Dict:
    """
    [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}] -> {'a': [1, 3], 'b': [2, 4]}
    """

    if not list_plot:
        return {}

    return {key: [d[key] for d in list_plot] for key in list_plot[0].keys()}

Well, AVG on value cries for frontend stalls, simply because there are so many values in the array. If you need that to stay snappy over bigger time slices, you most likely will have to think about denormalization trickery, e.g. moving the workload into materialized views with predefined granularity during the night and such (see the sketch right below)…
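
To illustrate that idea, a minimal sketch with a plain materialized view on the fake data table from above (the view name and the daily granularity are just placeholders):

-- precalculate the daily AVG once, refreshed e.g. nightly via cron/pg_cron
CREATE MATERIALIZED VIEW data_daily_avg AS
SELECT
    time_bucket('24 hours', updateat) AS day,
    avg(elem::numeric) AS value_avg
FROM data, unnest(value) AS elem
GROUP BY day;

-- nightly refresh
REFRESH MATERIALIZED VIEW data_daily_avg;

-- the frontend query then only touches the precalculated rows
SELECT * FROM data_daily_avg ORDER BY day DESC LIMIT 10;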

Whatever, I tested the AVG case with the fake data from above with this command (on the uncompressed hypertable):

time docker exec -u postgres some-postgres psql -c "\
    SELECT\
        time_bucket('24 hours', updateat) AS day,\
        avg(func(value))\
    FROM data\
    GROUP BY day\
    ORDER BY day DESC\
    LIMIT 10"

where func is one of these:

  • for TEXT[] type
    -- calculate AVG of array values
    CREATE FUNCTION array_avg(_data anyarray)
    RETURNS numeric
    AS
    $$
        SELECT avg(a::numeric)
        FROM unnest(_data) as a
    $$ LANGUAGE SQL;
    
  • for TEXT type
    -- calculate AVG of array values contained in a string split by delimiter
    CREATE FUNCTION numstr_avg(_data text, _delimiter text)
    RETURNS numeric
    AS
    $$
        SELECT avg(a::numeric)
        FROM unnest(string_to_array(_data, _delimiter)) AS a
    $$ LANGUAGE SQL;
    

The command takes ~20% longer for TEXT (~3.5s) than for TEXT[] (~2.9s). It already has to account for 25921 rows * 250 = ~6.5M values, which of course takes its time.

Regarding the BRIN index - I would not bother with it yet. In the longer run the question is how fast rows come in over time, and once you hit multiples of 10M you might want to re-evaluate things. Btw, if you reach those row counts soon, you could also think about spanning the partitioning over other dimensions as well (like ranges of machineids, if they follow a well-defined distribution scheme; see the sketch below).
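
If you ever go down that road, this is roughly what an additional space dimension could look like in TimescaleDB (note that an extra dimension normally has to be added while the hypertable is still empty, and the partition count here is just a placeholder):

-- add a hash-partitioned space dimension on machineid
SELECT add_dimension('data', 'machineid', number_partitions => 4);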

Yes, that would work with BRIN (and is what was meant by “either identical or exactly opposite to the logical order”). The key point with BRIN is that the natural order should ideally be strictly maintained during inserts, which is normally the case for timestamps (also, you should never update timestamps in a way that disturbs that order).

Have not yet looked at your python unnest and cast function…

Well, that's quite unspecific. What eats too much? Is that only a temporary peak, or is the final dataset too big for your machine? You don't show any profiling data, and some of the used functions are unclear.

If the final data held in memory is too much for your machine, the following might help:

  • get more RAM :wink:
  • restructure code to use iterators/generators instead of list()
  • turn the datetime objects (really big) into unix timestamps (just a few bytes each)
  • use slim container abstractions, like tuples or even numpy arrays

Ofc there are also memory-mapping tricks possible with Python (like mmap), but this contradicts the idea of staying performant.

Addendum regarding the array_avg function above - out of curiosity I also tested it with INT[] for value, avoiding the a::numeric cast. With that the command finishes in 1.4s, thus is ~2 times faster than with TEXT[]. It is also much faster during inserts (~15s), meaning that the actual issue during insert of the fake data above is the int->text conversion (and probably differences in varying vs. fixed-width bucket creation). It does not change much for the select query that I tested first (which is clear, as pulling values from TEXT[] vs. INT[] without any cast is not much different).
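
For reference, the cast-free helper could look like this (the function name is made up, and the column type change itself is not shown):

-- calculate AVG of INT[] values, no per-element ::numeric cast needed
CREATE FUNCTION intarr_avg(_data int[])
RETURNS numeric
AS
$$
    SELECT avg(a)
    FROM unnest(_data) AS a
$$ LANGUAGE SQL;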

So proper data and type normalization actually helps a lot here performance-wise by avoiding casts from the more generic TEXT[] type.

I know that I repeat myself here regarding proper normalization - what is the reason not to unpack the array values into rows? To me it seems that you in fact want to have them separated, as both AVG and your other thread about query optimization need them column-separated.

Sorry, these are the reports I captured (WeTransfer link); you can find 2 different tests there. The commands to start the visualization can be found here: Summary Reporter - memray.

This only works on Linux, so you could create a Docker container, install memray in it, and run the commands memray summary celery.bin…

I've tried implementing the functions with numpy; from the tests I've done they actually seem slower.

  • make_single_dict: transform a query return with .values('a', 'b'), i.e. [{'a': 1, 'b': 2}, {...}], into a single dictionary with keys 'a' and 'b' and all their values in one list

  • unnest_and_cast: exactly PostgreSQL's unnest, and if given a type, map the TEXT[] values to that type

  • discard_values: this certainly is the function that is a little more difficult to understand by reading it, but it does nothing but discard values that are either < discard_min or >= discard_max, applied to a list of dictionaries containing arrays or single numbers; here is a more readable version:

result = []
for d in values:
    if all(
        (discard_min is None or discard_min <= val) and (discard_max is None or val < discard_max)
        for val in d[value_key]
    ):
        result.append(d)

return result

My app is a backend that communicates with a frontend that displays data like Grafana: https://grafana.com/, so it’s up to the user to decide which date ranges to display.

Before getting to the Django project that takes the data and displays it, there is a data collection service that acquires 1000 to 2000 variables from different engines. The reason for not putting each type into its own table is that I would have to create quite a few tables and manage each one with its own repository, so I thought that putting them all in one would be ideal.

Sorry for repeating this so late. Sadly, I cannot access the profiling data.

I'd suspect that the initial construction of the numpy arrays takes longer, which is clear, as it involves value copying, while Python containers just aggregate pointers to already existing objects. The pointer-based way is more versatile, but needs much more RAM. You cannot have both: the aggressive memory optimization of “big blob” numpy arrays removes some of the more Pythonic patterns.

About those additional Python functions and normalization:
They would not be needed if you had proper normalization with column separation in the first place.

In summary it seems you suffer from 2 main issues here:

  1. high overall data volume
  2. unnormalized data mixing different types

Point 1 creates high resource needs, and no matter how you approach the data - it has to be digested either by Postgres or by Python. Often the DBMS is the better choice here, if the data fits the relational approach (meaning it is well structured and can be normalized) and needs to be queried individually later on. Both seem to be the case here.
Which brings me to point 2 - if you decided to put the array data into a db, then your first goal should be to normalize things as far as possible. A possible way here would be to track the timings in one table and link the time data to other properly typed tables by a 1:n relation containing the array values column-separated (see the sketch below). This turns queries into joins (slower), but should greatly enhance the speed of things like the AVG calculation.
The downside of this approach is obvious - the db will grow much faster and thus take more resources over time. The data tables themselves will grow very fast and will need proper precautions like early partitioning due to the sheer amount of incoming data.
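
A minimal sketch of that 1:n layout (all names are made up, and the hypertable/partitioning details are left out):

-- one row per reading, only the meta data
CREATE TABLE measurement (
    id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    updateat TIMESTAMPTZ NOT NULL,
    lineid TEXT,
    machineid TEXT,
    itemid TEXT
);

-- 1:n relation, one row per array element, properly typed
CREATE TABLE measurement_value (
    measurement_id BIGINT NOT NULL REFERENCES measurement(id),
    pos INT NOT NULL,               -- original position inside the array
    value DOUBLE PRECISION NOT NULL
);

-- AVG then becomes a plain join + aggregate, no unnest/cast needed:
-- SELECT time_bucket('24 hours', m.updateat) AS day, avg(v.value)
-- FROM measurement m JOIN measurement_value v ON v.measurement_id = m.id
-- GROUP BY day ORDER BY day DESC LIMIT 10;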

So far all your data usage patterns suggest that you want that granularity on the data, thus this highly normalized layout should be your first goal. If a certain query later on starts to choke on that design, then you can start to think about explicit denormalization (like storing precalculated AVGs). Never do that right from the beginning, as it often leads to lousy db schemas.

Ofc there is also a quite different approach possible - if you don't need that level of granularity on the data, then don't put it in the db at all. There are different approaches possible here; due to the sheer amount of data, even HDF5 might be worth a look (it can interact with numpy directly). HDF5 is also quite efficient at handling big data volumes.
This would treat those array values more blob-like, and it is always a good idea not to put blob data in a database. As for how to interact with it from your time tables - well, store an access pointer to the HDF5 dataset in a column (sketch below).
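
Roughly like this (the table and column names are hypothetical, just to show the idea of the pointer column):

-- keep only the time/meta data in postgres and reference the raw
-- array values stored in an external HDF5 file
CREATE TABLE measurement_meta (
    updateat TIMESTAMPTZ NOT NULL,
    lineid TEXT,
    machineid TEXT,
    itemid TEXT,
    hdf5_path TEXT NOT NULL,        -- e.g. path to the .h5 file
    hdf5_dataset TEXT NOT NULL      -- dataset key inside the file
);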
