Inefficient Query Execution

This looks like you are missing the field "collection_basic_arrays"."itemid", as already pointed out by @plopidou.

I still think you are chasing the wrong rabbit trying to make this more performant. Are the times in the logs from consecutive Python calls, i.e. one issued right after the other? If so, then your raw query alone already takes ~2s (at how many table entries?). A few remarks:

  • Run the query with EXPLAIN ... and EXPLAIN ANALYZE .... Check for bottlenecks like big interim allocations (adjust the Postgres settings accordingly, see the sketch after this list) or sequential scans.
  • Move the +2h from the field evaluation into the WHERE clause. Re-run EXPLAIN. This should have a huge impact if updateat is properly indexed, shifting from a sequential to an index scan.
  • Do a dry run without the unnesting and see the difference in the query planner. This most likely has the biggest impact - see my answers in your other post…
  • Do a dry run without the double precision cast. This should also have some impact - if it is big enough, save the values with the correct type already.
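For reference, a minimal way to run these checks (the table and the time range are taken from your logs, the work_mem value is just an example):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM "collection_basic_arrays"
WHERE "updateat" BETWEEN '2023-04-04 22:00:00+00:00' AND '2023-04-05 06:02:00+00:00';

-- if the plan shows sorts or hashes spilling to disk, raise work_mem for the session and re-check
SET work_mem = '64MB';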

where "collection_basic_arrays"."itemid", is added in the SELECT.

This happened because I was running in VSCode debug mode and opening the object from "Variables"; the fields I clicked on got displayed. When I start it outside of debug mode, this is the output:

2023-06-13 11:27:00.433 GMT [58] LOG:  statement: SELECT (CAST(value_unnest AS double precision)) AS "value_conv", "collection_basic_arrays"."id", ("collection_basic_arrays"."updateat" + '2:00:00') AS "time" FROM "collection_basic_arrays" , LATERAL unnest(value) AS value_unnest WHERE ("collection_basic_arrays"."itemid" = 'ns=3;s="Motor"."11A1"."Out_X"' AND "collection_basic_arrays"."lineid" = '1' AND "collection_basic_arrays"."machineid" = '641c0b97d3c5a19affa77b7e' AND "collection_basic_arrays"."updateat" BETWEEN '2023-04-04 22:00:00+00:00' AND '2023-04-05 06:02:00+00:00') ORDER BY "collection_basic_arrays"."updateat" ASC
2023-06-13 11:27:00.937 GMT [58] LOG:  statement: DELETE FROM "cms_caches"
2023-06-13 11:27:00.941 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.053 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.069 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.084 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.101 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.118 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.133 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.150 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.166 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.181 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.197 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.213 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.228 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.244 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.259 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.275 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.291 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.307 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.323 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.339 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.356 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.373 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.390 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.408 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.424 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.441 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.458 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21
2023-06-13 11:27:01.474 GMT [58] LOG:  statement: SELECT "collection_basic_arrays"."id", "collection_basic_arrays"."updateat" FROM "collection_basic_arrays" WHERE "collection_basic_arrays"."id" = 5929741 LIMIT 21

Are the times in the logs from consecutive Python calls, i.e. one issued right after the other?

Yes

  • Move the +2h from the field evaluation into the WHERE clause. Re-run EXPLAIN. This should have a huge impact if updateat is properly indexed, shifting from a sequential to an index scan.

Do I need to .order_by(updateat) or .order_by(updateat + 2)?

Ah sorry, I misread the query - I was under the impression that you filter by the updateat + 2 field. But you already filter correctly on updateat only, so this will not change anything.

What you can try is a multicolumn index on the fields with the lowest selectivity (try different patterns, maybe with itemid/lineid/machineid). It should make a big difference in getting to the rows of interest very fast, but it comes at a price (multicolumn indexes get very big rather soon and are more expensive during inserts).
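A hedged sketch of what such an index could look like (the index name and the column order are placeholders and should be benchmarked against your real data):

CREATE INDEX ix_cba_itemid_machineid_lineid_updateat
    ON "collection_basic_arrays" ("itemid", "machineid", "lineid", "updateat");
-- on a TimescaleDB hypertable this index definition is propagated to the chunks automatically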

I found the first error: in time_key I passed the field without the +2 conversion, so it went to process that field every time.

  • save the values with the correct type already.

I can’t, this field has bool, float, int,…

multicolumn index

These are my table indexes

image

Oh well, your second index already joins all filter fields into one big index (just a guess from the name). Does it get used by the planner correctly? Can you post the query planner output here? (the output of EXPLAIN ANALYZE <your raw query>)

EXPLAIN ANALYZE SELECT (CAST(value_unnest AS double precision)) AS "value_conv", "collection_basic_arrays"."id", ("collection_basic_arrays"."updateat" + '2:00:00') AS "time" 
FROM "collection_basic_arrays" , LATERAL unnest(value) AS value_unnest 
WHERE ("collection_basic_arrays"."itemid" = 'ns=3;s="Motor"."11A1"."Out_Z"' 
	   AND "collection_basic_arrays"."lineid" = '1' 
	   AND "collection_basic_arrays"."machineid" = '641c0b97d3c5a19affa77b7e' 
	   AND "collection_basic_arrays"."updateat" BETWEEN '2023-04-04 22:00:00+00:00' AND '2023-04-05 06:02:00+00:00') 
	   ORDER BY "collection_basic_arrays"."updateat" ASC


"Nested Loop  (cost=0.45..4684.76 rows=220000 width=28) (actual time=10.021..33.881 rows=14750 loops=1)"
"  ->  Custom Scan (DecompressChunk) on collection_basic_arrays_70_chunk  (cost=0.44..9.58 rows=22000 width=1174) (actual time=9.976..10.106 rows=59 loops=1)"
"        Filter: ((updateat >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (updateat <= '2023-04-05 06:02:00+00'::timestamp with time zone))"
"        Rows Removed by Filter: 941"
"        ->  Index Scan using compress_hyper_4_146_chunk__compressed_hypertable_4_itemid_mach on compress_hyper_4_146_chunk  (cost=0.27..9.58 rows=22 width=166) (actual time=0.047..0.067 rows=1 loops=1)"
"              Index Cond: ((itemid = 'ns=3;s=""Motor"".""11A1"".""Out_Z""'::text) AND (machineid = '641c0b97d3c5a19affa77b7e'::text) AND (lineid = '1'::text))"
"              Filter: ((_ts_meta_max_1 >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2023-04-05 06:02:00+00'::timestamp with time zone))"
"              Rows Removed by Filter: 89"
"  ->  Memoize  (cost=0.01..0.11 rows=10 width=32) (actual time=0.025..0.200 rows=250 loops=59)"
"        Cache Key: collection_basic_arrays_70_chunk.value"
"        Cache Mode: binary"
"        Hits: 0  Misses: 59  Evictions: 0  Overflows: 0  Memory Usage: 853kB"
"        ->  Function Scan on unnest value_unnest  (cost=0.00..0.10 rows=10 width=32) (actual time=0.020..0.073 rows=250 loops=59)"
"Planning Time: 5.759 ms"
"Execution Time: 37.401 ms"
EXPLAIN SELECT (CAST(value_unnest AS double precision)) AS "value_conv", "collection_basic_arrays"."id", ("collection_basic_arrays"."updateat" + '2:00:00') AS "time" 
FROM "collection_basic_arrays" , LATERAL unnest(value) AS value_unnest 
WHERE ("collection_basic_arrays"."itemid" = 'ns=3;s="Motor"."11A1"."Out_Z"' 
	   AND "collection_basic_arrays"."lineid" = '1' 
	   AND "collection_basic_arrays"."machineid" = '641c0b97d3c5a19affa77b7e' 
	   AND "collection_basic_arrays"."updateat" BETWEEN '2023-04-04 22:00:00+00:00' AND '2023-04-05 06:02:00+00:00') 
	   ORDER BY "collection_basic_arrays"."updateat" ASC


"Nested Loop  (cost=0.45..4684.76 rows=220000 width=28)"
"  ->  Custom Scan (DecompressChunk) on collection_basic_arrays_70_chunk  (cost=0.44..9.58 rows=22000 width=1174)"
"        Filter: ((updateat >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (updateat <= '2023-04-05 06:02:00+00'::timestamp with time zone))"
"        ->  Index Scan using compress_hyper_4_146_chunk__compressed_hypertable_4_itemid_mach on compress_hyper_4_146_chunk  (cost=0.27..9.58 rows=22 width=166)"
"              Index Cond: ((itemid = 'ns=3;s=""Motor"".""11A1"".""Out_Z""'::text) AND (machineid = '641c0b97d3c5a19affa77b7e'::text) AND (lineid = '1'::text))"
"              Filter: ((_ts_meta_max_1 >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2023-04-05 06:02:00+00'::timestamp with time zone))"
"  ->  Memoize  (cost=0.01..0.11 rows=10 width=32)"
"        Cache Key: collection_basic_arrays_70_chunk.value"
"        Cache Mode: binary"
"        ->  Function Scan on unnest value_unnest  (cost=0.00..0.10 rows=10 width=32)"
EXPLAIN ANALYZE SELECT (CAST(value AS double precision[])) AS "value_conv", "collection_basic_arrays"."id", ("collection_basic_arrays"."updateat" + '2:00:00') AS "time" 
FROM "collection_basic_arrays" 
WHERE ("collection_basic_arrays"."itemid" = 'ns=3;s="Motor"."11A1"."Out_Z"' 
	   AND "collection_basic_arrays"."lineid" = '1' 
	   AND "collection_basic_arrays"."machineid" = '641c0b97d3c5a19affa77b7e' 
	   AND "collection_basic_arrays"."updateat" BETWEEN '2023-04-04 22:00:00+00:00' AND '2023-04-05 06:02:00+00:00') 
	   ORDER BY "collection_basic_arrays"."updateat" ASC


"Result  (cost=0.44..1494.58 rows=22000 width=52) (actual time=9.606..15.238 rows=59 loops=1)"
"  ->  Custom Scan (DecompressChunk) on collection_basic_arrays_70_chunk  (cost=0.44..9.58 rows=22000 width=1174) (actual time=9.519..9.628 rows=59 loops=1)"
"        Filter: ((updateat >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (updateat <= '2023-04-05 06:02:00+00'::timestamp with time zone))"
"        Rows Removed by Filter: 941"
"        ->  Index Scan using compress_hyper_4_146_chunk__compressed_hypertable_4_itemid_mach on compress_hyper_4_146_chunk  (cost=0.27..9.58 rows=22 width=166) (actual time=0.038..0.061 rows=1 loops=1)"
"              Index Cond: ((itemid = 'ns=3;s=""Motor"".""11A1"".""Out_Z""'::text) AND (machineid = '641c0b97d3c5a19affa77b7e'::text) AND (lineid = '1'::text))"
"              Filter: ((_ts_meta_max_1 >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2023-04-05 06:02:00+00'::timestamp with time zone))"
"              Rows Removed by Filter: 89"
"Planning Time: 5.642 ms"
"Execution Time: 15.431 ms"
EXPLAIN ANALYZE SELECT value_unnest, "collection_basic_arrays"."id", ("collection_basic_arrays"."updateat" + '2:00:00') AS "time" 
FROM "collection_basic_arrays" , LATERAL unnest(value) AS value_unnest 
WHERE ("collection_basic_arrays"."itemid" = 'ns=3;s="Motor"."11A1"."Out_Z"' 
	   AND "collection_basic_arrays"."lineid" = '1' 
	   AND "collection_basic_arrays"."machineid" = '641c0b97d3c5a19affa77b7e' 
	   AND "collection_basic_arrays"."updateat" BETWEEN '2023-04-04 22:00:00+00:00' AND '2023-04-05 06:02:00+00:00') 
	   ORDER BY "collection_basic_arrays"."updateat" ASC

"Nested Loop  (cost=0.45..3584.76 rows=220000 width=52) (actual time=10.239..28.340 rows=14750 loops=1)"
"  ->  Custom Scan (DecompressChunk) on collection_basic_arrays_70_chunk  (cost=0.44..9.58 rows=22000 width=1174) (actual time=10.195..10.317 rows=59 loops=1)"
"        Filter: ((updateat >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (updateat <= '2023-04-05 06:02:00+00'::timestamp with time zone))"
"        Rows Removed by Filter: 941"
"        ->  Index Scan using compress_hyper_4_146_chunk__compressed_hypertable_4_itemid_mach on compress_hyper_4_146_chunk  (cost=0.27..9.58 rows=22 width=166) (actual time=0.037..0.057 rows=1 loops=1)"
"              Index Cond: ((itemid = 'ns=3;s=""Motor"".""11A1"".""Out_Z""'::text) AND (machineid = '641c0b97d3c5a19affa77b7e'::text) AND (lineid = '1'::text))"
"              Filter: ((_ts_meta_max_1 >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2023-04-05 06:02:00+00'::timestamp with time zone))"
"              Rows Removed by Filter: 89"
"  ->  Memoize  (cost=0.01..0.11 rows=10 width=32) (actual time=0.024..0.196 rows=250 loops=59)"
"        Cache Key: collection_basic_arrays_70_chunk.value"
"        Cache Mode: binary"
"        Hits: 0  Misses: 59  Evictions: 0  Overflows: 0  Memory Usage: 853kB"
"        ->  Function Scan on unnest value_unnest  (cost=0.00..0.10 rows=10 width=32) (actual time=0.019..0.072 rows=250 loops=59)"
"Planning Time: 5.422 ms"
"Execution Time: 31.493 ms"

Hmm, that's very expensive already in the planning, and the execution is slowed down by the LATERAL unpacking. A runtime of >20s is really not good anymore for only ~220k rows.

Can you try the following:

  • remove the lateral unnesting from the query, just returning the row id (thus prefiltering the rows in question)
  • do a second query with WHERE ... IN using those ids, now with the unnesting
  • measure both in the planner again

EXPLAIN ANALYZE SELECT "collection_basic_arrays"."id"
FROM "collection_basic_arrays"
WHERE ("collection_basic_arrays"."itemid" = 'ns=3;s="Motor"."11A1"."Out_Z"' 
	   AND "collection_basic_arrays"."lineid" = '1' 
	   AND "collection_basic_arrays"."machineid" = '641c0b97d3c5a19affa77b7e' 
	   AND "collection_basic_arrays"."updateat" BETWEEN '2023-04-04 22:00:00+00:00' AND '2023-04-05 06:02:00+00:00') 
	   ORDER BY "collection_basic_arrays"."updateat" ASC


"Custom Scan (DecompressChunk) on collection_basic_arrays_70_chunk  (cost=0.44..9.58 rows=22000 width=12) (actual time=0.273..0.313 rows=59 loops=1)"
"  Filter: ((updateat >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (updateat <= '2023-04-05 06:02:00+00'::timestamp with time zone))"
"  Rows Removed by Filter: 941"
"  ->  Index Scan using compress_hyper_4_146_chunk__compressed_hypertable_4_itemid_mach on compress_hyper_4_146_chunk  (cost=0.27..9.58 rows=22 width=134) (actual time=0.071..0.082 rows=1 loops=1)"
"        Index Cond: ((itemid = 'ns=3;s=""Motor"".""11A1"".""Out_Z""'::text) AND (machineid = '641c0b97d3c5a19affa77b7e'::text) AND (lineid = '1'::text))"
"        Filter: ((_ts_meta_max_1 >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2023-04-05 06:02:00+00'::timestamp with time zone))"
"        Rows Removed by Filter: 89"
"Planning Time: 6.628 ms"
"Execution Time: 0.357 ms"

Too Long

EXPLAIN ANALYZE SELECT (CAST(value_unnest AS double precision)) AS "value_conv", "collection_basic_arrays"."id", ("collection_basic_arrays"."updateat" + '2:00:00') AS "time" 
FROM "collection_basic_arrays" , LATERAL unnest(value) AS value_unnest
WHERE "collection_basic_arrays"."id" IN (6108348, 6108349, 6108350, 6108351, 6108352, 6108353, 6108354, 6108355, 6108356, 6108357, 6108358, 6108359, 6108360, 6108361, 6108362, 6108363, 6108364, 6108365, 6108366, 6108367, 6108368, 6108369, 6108370, 6108371, 6108372, 6108373, 6108374, 6108375, 6108376, 6108377, 6108378, 6108379, 6108380, 6108381, 6108382, 6108383, 6108384, 6108385, 6108386, 6108387, 6108388, 6108389, 6108390, 6108391, 6108392, 6108393, 6108394, 6108395, 6108396, 6108397, 6108398, 6108399, 6108400, 6108401, 6108402, 6108403, 6108404, 6108405, 6108406)

"Nested Loop  (cost=0.05..1562557.70 rows=71820000 width=20) (actual time=91677.035..105215.907 rows=14750 loops=1)"
"  ->  Append  (cost=0.03..36152.82 rows=7182000 width=1190) (actual time=91671.356..105184.525 rows=59 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_5_chunk  (cost=0.03..7.16 rows=216000 width=1177) (actual time=2608.488..2610.109 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 214328"
"              ->  Seq Scan on compress_hyper_4_77_chunk  (cost=0.00..7.16 rows=216 width=124) (actual time=0.013..1.453 rows=216 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_14_chunk  (cost=0.03..5.62 rows=162000 width=1201) (actual time=1777.660..1778.643 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_116_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=0.007..0.748 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_21_chunk  (cost=0.03..5.62 rows=162000 width=1198) (actual time=1836.501..1837.560 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_117_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=0.007..0.376 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_29_chunk  (cost=0.03..8.70 rows=270000 width=1197) (actual time=3043.945..3044.666 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 267910"
"              ->  Seq Scan on compress_hyper_4_118_chunk  (cost=0.00..8.70 rows=270 width=124) (actual time=0.008..0.654 rows=270 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_40_chunk  (cost=0.03..5.62 rows=162000 width=1203) (actual time=1794.260..1794.791 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_119_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=0.016..0.400 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_44_chunk  (cost=0.03..5.62 rows=162000 width=1168) (actual time=2022.847..2023.312 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_120_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=0.008..0.597 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_45_chunk  (cost=0.03..7.16 rows=216000 width=1183) (actual time=2605.615..2606.372 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 214328"
"              ->  Seq Scan on compress_hyper_4_121_chunk  (cost=0.00..7.16 rows=216 width=124) (actual time=0.094..0.981 rows=216 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_46_chunk  (cost=0.03..7.16 rows=216000 width=1186) (actual time=2631.352..2631.777 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 214328"
"              ->  Seq Scan on compress_hyper_4_122_chunk  (cost=0.00..7.16 rows=216 width=124) (actual time=0.097..0.980 rows=216 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_47_chunk  (cost=0.03..5.62 rows=162000 width=1199) (actual time=2275.769..2276.187 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_123_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=0.087..0.779 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_48_chunk  (cost=0.03..5.62 rows=162000 width=1173) (actual time=2129.270..2129.708 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_124_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=0.122..0.808 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_49_chunk  (cost=0.03..8.70 rows=270000 width=1200) (actual time=3209.418..3209.777 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 267910"
"              ->  Seq Scan on compress_hyper_4_125_chunk  (cost=0.00..8.70 rows=270 width=124) (actual time=0.108..1.202 rows=270 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_50_chunk  (cost=0.03..5.62 rows=162000 width=1196) (actual time=2159.726..2159.936 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_126_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=0.129..0.846 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_51_chunk  (cost=0.03..5.62 rows=162000 width=1189) (actual time=2179.259..2179.265 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_127_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=0.094..0.808 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_52_chunk  (cost=0.03..7.16 rows=216000 width=1183) (actual time=2975.483..2975.774 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 214328"
"              ->  Seq Scan on compress_hyper_4_128_chunk  (cost=0.00..7.16 rows=216 width=124) (actual time=0.128..1.358 rows=216 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_53_chunk  (cost=0.03..7.16 rows=216000 width=1196) (actual time=2982.037..2982.307 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 214328"
"              ->  Seq Scan on compress_hyper_4_129_chunk  (cost=0.00..7.16 rows=216 width=124) (actual time=0.174..1.127 rows=216 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_54_chunk  (cost=0.03..5.62 rows=162000 width=1174) (actual time=2165.433..2165.438 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_130_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=0.107..0.744 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_55_chunk  (cost=0.03..5.62 rows=162000 width=1168) (actual time=2236.158..2236.448 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_131_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=0.108..0.742 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_56_chunk  (cost=0.03..8.70 rows=270000 width=1203) (actual time=3381.932..3382.206 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"

............................................. REPEATED LINES.......................................


"              Rows Removed by Filter: 214328"
"              ->  Seq Scan on compress_hyper_4_150_chunk  (cost=0.00..7.16 rows=216 width=124) (actual time=0.097..1.014 rows=216 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_75_chunk  (cost=0.03..5.62 rows=162000 width=1195) (actual time=2036.813..2036.814 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 160746"
"              ->  Seq Scan on compress_hyper_4_151_chunk  (cost=0.00..5.62 rows=162 width=124) (actual time=1.674..2.358 rows=162 loops=1)"
"        ->  Custom Scan (DecompressChunk) on collection_basic_arrays_76_chunk  (cost=0.05..2.54 rows=54000 width=1198) (actual time=717.785..717.786 rows=0 loops=1)"
"              Filter: (id = ANY ('{6108348,6108349,6108350,6108351,6108352,6108353,6108354,6108355,6108356,6108357,6108358,6108359,6108360,6108361,6108362,6108363,6108364,6108365,6108366,6108367,6108368,6108369,6108370,6108371,6108372,6108373,6108374,6108375,6108376,6108377,6108378,6108379,6108380,6108381,6108382,6108383,6108384,6108385,6108386,6108387,6108388,6108389,6108390,6108391,6108392,6108393,6108394,6108395,6108396,6108397,6108398,6108399,6108400,6108401,6108402,6108403,6108404,6108405,6108406}'::integer[]))"
"              Rows Removed by Filter: 53582"
"              ->  Seq Scan on compress_hyper_4_152_chunk  (cost=0.00..2.54 rows=54 width=124) (actual time=0.824..1.093 rows=54 loops=1)"
"  ->  Memoize  (cost=0.01..0.11 rows=10 width=32) (actual time=0.117..0.307 rows=250 loops=59)"
"        Cache Key: collection_basic_arrays_5_chunk.value"
"        Cache Mode: binary"
"        Hits: 0  Misses: 59  Evictions: 0  Overflows: 0  Memory Usage: 853kB"
"        ->  Function Scan on unnest value_unnest  (cost=0.00..0.10 rows=10 width=32) (actual time=0.035..0.087 rows=250 loops=59)"
"Planning Time: 18.938 ms"
"JIT:"
"  Functions: 82"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 3.072 ms, Inlining 16.733 ms, Optimization 98.150 ms, Emission 116.508 ms, Total 234.463 ms"
"Execution Time: 105245.496 ms"

Yepp, as I thought - the unnesting gets you into trouble here. Can you reshape the code so that you just grab the array as a single value from the DB and maybe unpack it on the Python side? Then you should be in the time range of the first query here (+ runtime in Python to unpack the values).
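A minimal sketch of how that could look on the Python side (assuming psycopg2, the column names from your query, and numeric array values; the DSN is a placeholder):

import psycopg2

conn = psycopg2.connect("dbname=DataAnalyser user=postgres")  # placeholder DSN
with conn.cursor() as cur:
    cur.execute("""
        SELECT value, id, updateat + interval '2:00:00' AS "time"
        FROM collection_basic_arrays
        WHERE itemid = %s AND lineid = %s AND machineid = %s
          AND updateat BETWEEN %s AND %s
        ORDER BY updateat ASC
    """, ('ns=3;s="Motor"."11A1"."Out_Z"', '1', '641c0b97d3c5a19affa77b7e',
          '2023-04-04 22:00:00+00:00', '2023-04-05 06:02:00+00:00'))
    rows = cur.fetchall()

# psycopg2 returns the text[] column as a Python list of strings;
# unpack and cast client-side instead of LATERAL unnest + CAST in SQL
unpacked = [
    {"time": time, "id": row_id, "value": float(v)}
    for value_array, row_id, time in rows
    for v in value_array
]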

Should I also move the conversion into code? Can I use Pandas, NumPy, DataFrames?

This would be the clean query; the analyses of the other 2 queries, with unnest and with cast, are above: Inefficient Query Execution - #13 by zN3utr4l

EXPLAIN ANALYZE SELECT value, "collection_basic_arrays"."id", ("collection_basic_arrays"."updateat" + '2:00:00') AS "time" 
FROM "collection_basic_arrays" 
WHERE ("collection_basic_arrays"."itemid" = 'ns=3;s="Motor"."11A1"."Out_Z"' 
	   AND "collection_basic_arrays"."lineid" = '1' 
	   AND "collection_basic_arrays"."machineid" = '641c0b97d3c5a19affa77b7e' 
	   AND "collection_basic_arrays"."updateat" BETWEEN '2023-04-04 22:00:00+00:00' AND '2023-04-05 22:02:00+00:00') 
	   ORDER BY "collection_basic_arrays"."updateat" ASC

"Result  (cost=0.29..2237.08 rows=33000 width=52) (actual time=10.589..1690.493 rows=17861 loops=1)"
"  ->  Custom Scan (DecompressChunk) on collection_basic_arrays_70_chunk  (cost=0.29..9.58 rows=33000 width=1174) (actual time=10.497..204.217 rows=17861 loops=1)"
"        Filter: ((updateat >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (updateat <= '2023-04-05 22:02:00+00'::timestamp with time zone))"
"        Rows Removed by Filter: 1139"
"        ->  Index Scan using compress_hyper_4_146_chunk__compressed_hypertable_4_itemid_mach on compress_hyper_4_146_chunk  (cost=0.27..9.58 rows=33 width=166) (actual time=0.034..0.197 rows=19 loops=1)"
"              Index Cond: ((itemid = 'ns=3;s=""Motor"".""11A1"".""Out_Z""'::text) AND (machineid = '641c0b97d3c5a19affa77b7e'::text) AND (lineid = '1'::text))"
"              Filter: ((_ts_meta_max_1 >= '2023-04-04 22:00:00+00'::timestamp with time zone) AND (_ts_meta_min_1 <= '2023-04-05 22:02:00+00'::timestamp with time zone))"
"              Rows Removed by Filter: 71"
"Planning Time: 4.854 ms"
"Execution Time: 1694.827 ms"

Ah, this looks much more reasonable. The query itself still seems rather slow to me, but this might come from value alone if it is quite big (I think you wrote something about 250 values per row in the other thread).

I cannot tell you how to go on from here - you write above that the values might be bool, float, int,… - I am not even sure how you managed to get that into an array type in Postgres, as arrays are still type-bound (are they string-encoded somehow?). Isn't such a mixup of different values/types more JSON-like than array-like?
With a JSON field you could get it converted fast with a proper fast JSON implementation. You could also mimic the old cartesian product behavior by shimming DB-like objects with a single value from the array in a loop. It really depends on how you further process these values.

The array itself contains values that are all of the same type, but in the same table each itemid can have different value types, i.e. itemid x bool[], itemid y int[], so bool[] and int[] are sent as strings, simply str().

Just to make sure I understand correctly - all different array types end up in the same column with each value string-converted beforehand? Thus the real column type is text[]?

If that's the case, and you don't need any array functionality at the DB level at all, then you might gain more performance by doing 3 more things:

  • remove the array column, instead save the values in a single string (implement to/from converters, see the sketch after this list)
  • compress the array value in the string as far as possible, e.g. save "101" instead of "[true, false, true]"
  • try to fit the values into Postgres' page memory by adjusting the TOAST compression settings (for this to work it is important not to go over 8KB in a single value, the default threshold is 2KB)
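A minimal sketch of such to/from converters on the Python side (pure illustration; the ";" separator and the bit-string encoding for booleans are assumptions):

def encode_values(values):
    """[True, False, True] -> '101', [1.5, 2.5] -> '1.5;2.5'"""
    if values and isinstance(values[0], bool):
        return ''.join('1' if v else '0' for v in values)
    return ';'.join(str(v) for v in values)

def decode_values(text, kind=float):
    """Inverse of encode_values for a known target type."""
    if kind is bool:
        return [c == '1' for c in text]
    return [kind(v) for v in text.split(';')]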

Avoiding the DB TOAST and the array abstraction will remove a lot of indirection in the query execution (lots of internal pointers and decompression), but it also comes at the price of faster page turnover (thus updates or deletions will fragment the table faster and you will need to vacuum more often).

If you cannot get well below 8KB, you can also try to switch the column compression to LZ4 on Postgres 14+ (see What is the new LZ4 TOAST compression in PostgreSQL 14, and how fast is it?).
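A hedged sketch of the relevant knobs (assuming the condensed single-string column is still called value; the threshold is just an example, and on a TimescaleDB hypertable these settings should be double-checked against the chunk tables):

-- prefer in-row storage (with compression) as long as the value fits the page
ALTER TABLE collection_basic_arrays ALTER COLUMN value SET STORAGE MAIN;

-- PostgreSQL 14+: switch the TOAST compression algorithm for this column to LZ4
ALTER TABLE collection_basic_arrays ALTER COLUMN value SET COMPRESSION lz4;

-- raise the threshold at which rows are considered for toasting (default ~2KB)
ALTER TABLE collection_basic_arrays SET (toast_tuple_target = 8160);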

With these changes the query should get much faster than the current ~1.6s. Hope this helps :slight_smile:

all different array types end up in the same column with each value string-converted beforehand? Thus the real column type is text[]?

Yes

  • remove the array column, instead save the values in a single string (implement to/from converters)

Can this implementation be done at the DB level or in Python?

  • try to fit the values into Postgres' page memory by adjusting the TOAST compression settings (for this to work it is important not to go over 8KB in a single value, the default threshold is 2KB)

At the moment there seems to be no compression. I don't know if using TimescaleDB, the PostgreSQL extension, changes anything; the table in question is a hypertable, so the data is not saved in the actual table but in chunks of it.

DataAnalyser=# \d+
                                                          List of relations
 Schema |                     Name                     |   Type   |  Owner   | Persistence | Access method |    Size    | Description 
--------+----------------------------------------------+----------+----------+-------------+---------------+------------+-------------
public | collection_basic_arrays                      | table    | postgres | permanent   | heap          | 8192 bytes | 
public | collection_basic_arrays_id_seq               | sequence | postgres | permanent   |               | 8192 bytes |

_____________________________________________________________________

DataAnalyser=# \d+ collection_basic_arrays
                                                        Table "public.collection_basic_arrays"
  Column   |           Type           | Collation | Nullable |             Default              | Storage  | Compression | Stats target | Description 
-----------+--------------------------+-----------+----------+----------------------------------+----------+-------------+--------------+-------------
 id        | integer                  |           | not null | generated by default as identity | plain    |             |              | 
 updateat  | timestamp with time zone |           | not null |                                  | plain    |             |              | 
 lineid    | text                     |           |          |                                  | extended |             |              | 
 machineid | text                     |           |          |                                  | extended |             |              | 
 itemid    | text                     |           |          |                                  | extended |             |              | 
 value     | text[]                   |           |          |                                  | extended |             |              | 
 type      | text                     |           | not null | 'BasicArray'::text               | extended |             |              | 
Indexes:
    "PK_collection_basic_arrays" PRIMARY KEY, btree (id, updateat)
    "IX_collection_basic_arrays_itemid" btree (itemid)
    "IX_collection_basic_arrays_itemid_machineid_lineid_updateat" UNIQUE, btree (itemid, machineid, lineid, updateat)
    "IX_collection_basic_arrays_itemid_updateat" UNIQUE, btree (itemid, updateat)
    "IX_collection_basic_arrays_lineid" btree (lineid)
    "IX_collection_basic_arrays_machineid" btree (machineid)
    "IX_collection_basic_arrays_updateat" btree (updateat)
Triggers:
    ts_insert_blocker BEFORE INSERT ON collection_basic_arrays FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal.collection_basic_arrays_14_chunk,
              _timescaledb_internal.collection_basic_arrays_21_chunk,
              _timescaledb_internal.collection_basic_arrays_29_chunk,
              _timescaledb_internal.collection_basic_arrays_40_chunk,
              _timescaledb_internal.collection_basic_arrays_44_chunk,
              _timescaledb_internal.collection_basic_arrays_45_chunk,
              _timescaledb_internal.collection_basic_arrays_46_chunk,
              _timescaledb_internal.collection_basic_arrays_47_chunk,
              _timescaledb_internal.collection_basic_arrays_48_chunk,
              _timescaledb_internal.collection_basic_arrays_49_chunk,
              _timescaledb_internal.collection_basic_arrays_50_chunk,
              _timescaledb_internal.collection_basic_arrays_51_chunk,
              _timescaledb_internal.collection_basic_arrays_52_chunk,
              _timescaledb_internal.collection_basic_arrays_53_chunk,
              _timescaledb_internal.collection_basic_arrays_54_chunk,
              _timescaledb_internal.collection_basic_arrays_55_chunk,
              _timescaledb_internal.collection_basic_arrays_56_chunk,
              _timescaledb_internal.collection_basic_arrays_57_chunk,
              _timescaledb_internal.collection_basic_arrays_58_chunk,
              _timescaledb_internal.collection_basic_arrays_59_chunk,
              _timescaledb_internal.collection_basic_arrays_5_chunk,
              _timescaledb_internal.collection_basic_arrays_60_chunk,
              _timescaledb_internal.collection_basic_arrays_61_chunk,
              _timescaledb_internal.collection_basic_arrays_62_chunk,
              _timescaledb_internal.collection_basic_arrays_63_chunk,
              _timescaledb_internal.collection_basic_arrays_64_chunk,
              _timescaledb_internal.collection_basic_arrays_65_chunk,
              _timescaledb_internal.collection_basic_arrays_66_chunk,
              _timescaledb_internal.collection_basic_arrays_67_chunk,
              _timescaledb_internal.collection_basic_arrays_68_chunk,
              _timescaledb_internal.collection_basic_arrays_69_chunk,
              _timescaledb_internal.collection_basic_arrays_70_chunk,
              _timescaledb_internal.collection_basic_arrays_71_chunk,
              _timescaledb_internal.collection_basic_arrays_72_chunk,
              _timescaledb_internal.collection_basic_arrays_73_chunk,
              _timescaledb_internal.collection_basic_arrays_74_chunk,
              _timescaledb_internal.collection_basic_arrays_75_chunk,
              _timescaledb_internal.collection_basic_arrays_76_chunk
Access method: heap

Ah well, it took me some time to get this straight - I always ended up writing an essay about Postgres mechanics, but that's not really helpful here, so let's make this blunt:

I cannot tell you whether TimescaleDB is a win or a hindrance for your use case, as I don't know your other business constraints.

From what you have presented here, the select query performance is bad. There are several reasons for this (besides hypertable chunk compression) and I think you can speed it up by multitudes with a customized vanilla table like this:

  • remove all indexes, instead go with just one BRIN index on updateat (here I assume that updateat in fact has the lowest selectivity and comes in ordered, which might not be true)
  • try to get rid of varying-width types for the ...id columns (e.g. use int if it is an integer, or fixed-width string types); with fixed-width types you can easily avoid "out-of-line" storage indirections
  • condense value into a non-array field (e.g. a single text or bytea type)
  • make sure every field is stored "in row" and not toasted/compressed at all

That's a lot to test (and better done on a copy-cloned table) and it might not fit your demands at all, esp. around time-based aggregations, as it removes the hypertable functionality completely. Still, it should make the select query a lot faster (see the sketch below).
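A rough sketch of such a customized vanilla table, using the column names from your \d+ output above (the narrower lineid type and the index choice are assumptions to be validated against your real data):

CREATE TABLE collection_basic_arrays_flat (
    id        integer GENERATED BY DEFAULT AS IDENTITY,
    updateat  timestamptz NOT NULL,
    lineid    integer,              -- assumption: '1' is really an integer
    machineid text,
    itemid    text,
    value     text                  -- condensed single string instead of text[]
);

-- keep value in the row (no out-of-line TOAST) as long as it stays well below the 8KB page
ALTER TABLE collection_basic_arrays_flat ALTER COLUMN value SET STORAGE MAIN;

-- single BRIN index on the time dimension instead of the many btree indexes
CREATE INDEX ON collection_basic_arrays_flat USING brin (updateat);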

Sorry, I think I made that clear earlier; thanks anyway for the explanations regarding Postgres, it was helpful to know them.

We use timescaledb for its compression and for its optimization with time series.

What does this mean?

Did some measuring with fake data on my laptop (i7 Haswell with SSD, dockerized timescaledb/pg-15):

  • CREATE TABLE data (
        id int GENERATED BY DEFAULT AS IDENTITY,
        updateat TIMESTAMPTZ NOT NULL,
        lineid TEXT,
        machineid TEXT,
        itemid TEXT,
        value TEXT[]  -- test: with just TEXT here
    );
    
  • timescaledb setup:
    SELECT create_hypertable('data', 'updateat');
    -- extended test with chunk compression:
    ALTER TABLE data SET (timescaledb.compress);
    SELECT compress_chunk(i, if_not_compressed=>true)
        FROM show_chunks('data', older_than => INTERVAL ' 2 weeks') i;
    
  • vanilla table setup:
    CREATE INDEX data_updateat_idx ON data USING btree(updateat desc);
    -- test with BRIN index
    CREATE INDEX data_updateat_idx ON data USING brin(updateat);
    
  • fake data - 961921 rows (since I don't know the cardinality of your ...id columns I just faked the time progression with everything else static):
    INSERT INTO
        data(updateat, lineid, machineid, itemid, value)
    SELECT
        generate_series('2023-01-01', '2023-12-01', INTERVAL '30 sec'),
        '1',
        'blabla',
        'nix',
        '{0, ... ,249}';
    
  • test command:
    time docker exec -u postgres some-postgres psql -c "\
      SELECT value, id, (updateat + '2:00:00') AS time FROM data WHERE (
        itemid='nix'\
        AND lineid='1'\
        AND machineid='blabla'\
        AND updateat BETWEEN '2023-01-04 22:00:00+00:00' AND '2023-01-05 22:02:00+00:00'\
      ) ORDER BY updateat ASC;" > /dev/null
    

Test - select query on uncompressed hypertable vs. vanilla table: both at ~650ms, no difference measured (it was <1% with no clear trend). TimescaleDB wins here in terms of offered functionality while not showing any runtime penalty. Furthermore, its automatic partitioning will win the big-table race in the long run.

Test - select query on uncompressed hypertable vs. compressed: compressed shows a bigger variance in runtime than uncompressed. To my surprise compressed is actually slightly faster if the data resides in one chunk only (avoiding an IO bottleneck?). But it loses ground if rows span multiple chunks (additional decompression overhead?). The difference though is really tiny with ± <3% in runtime. It seems to be a good idea to have hypertable compression enabled.

Test - select query on TEXT[] vs. TEXT for value: plain TEXT is ~20% faster than TEXT[] in both the vanilla and the hypertable setup - roughly 650ms vs. 500ms query time. (Side note: inserts take much longer with TEXT[] - for the fake data above it was ~42s vs. ~14s.)

Did a few more tests with the vanilla table and different toast/compression settings, and BRIN vs BTREE index type:

  • toast compression only adds ~3%, thus should not be a real concern
  • BRIN was slightly faster than BTREE (~5%). It is known that BRIN starts to pay off in much bigger tables (>10M rows), which I did not test with my ~1M entries. Since BRIN is trickier to use and to keep healthy (the field data should have a natural preordering), I would not bother with it until hitting a row count far beyond 10M. It is also not clear to me whether a hypertable can take advantage of it (I did not check the TimescaleDB docs).

Summary
You should stick with the hypertable, as it provides really great additional functionality out of the box like automated partitioning and compression with almost no runtime penalty, even if you don't plan to use any of the "hyperfunctions" later on.
The switch of value from TEXT[] to TEXT gives a significant benefit for your select query (~20%), and even more for inserts (~3x faster). I don't know if that is enough of a benefit for you to write additional field converters on the Python side.

Limitation
My tests have one big flaw - I did not use a multicolumn index across the ...id fields as you have. It does not make much sense to fake data for those fields without knowing their cardinality, as the query planner will change strategies based on that. So the results have limited value for you, and there still might be some speedup possible by trying different multicolumn index patterns (my guess here - mix in updateat early, as the hypertable is optimized for the time dimension; one such pattern is sketched below).
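Just to illustrate one such pattern (the index name is a placeholder, and whether it beats your existing unique index has to be measured):

CREATE INDEX ix_cba_updateat_itemid_machineid_lineid
    ON collection_basic_arrays (updateat, itemid, machineid, lineid);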

Thank you for all these tests and explanations, you are amazing.

I will really consider turning the field into a TEXT. Just one thing: at the moment the only TimescaleDB function I use is time_bucket() (Timescale Docs).
I don't think it can be used anymore if I transform the field into a TEXT, since an AVG(value) must be done, unless a deserializer of the TEXT field into single FLOAT/INT values is implemented, and I don't want all this to lead to slowdowns again.

I didn't find much on whether these indexes could improve performance; as I understand from this thread, https://news.ycombinator.com/item?id=27068535, a Timescale engineer says that they have reimplemented how the indexes work, so I don't know if it could be useful.

This question, indices - How brin index performs on non temporal data when compared with btree index in postgresql? - Stack Overflow, said:

But you can almost never use a BRIN index. They only work if the physical order of the rows in the table is either identical or exactly opposite to the logical order of the column values you are indexing.

In my application I sometimes sort by ASC and sometimes by DESC.

I also implemented the unnest/cast and discard functions at the Python level (what I did before at the query level). It seems to be faster, but it also seems to allocate too much RAM.

I did an inspection with memray, GitHub - bloomberg/memray: The endgame Python memory profiler, and the unnest and cast function eats too much of it.

    def unnest_and_cast(cls, values: List[Dict[str, List]], value_key: str, datetime_key: str, output_type: Optional[AtomicDataType] = None):
        """
        [{'a': 0, 'b': [1, 2]}, {'a': 1, 'b': [3, 4]}] --> [{'a': 0, 'b': 1}, {'a': 0, 'b': 2}, {'a': 1, 'b': 3}, {'a': 1, 'b': 4}]
        """

        if not output_type or output_type == AtomicDataType.NONE:
            logger.info('output type not passed, the return value will be a string and the discard operation will not be performed')
            return [{datetime_key: item[datetime_key], value_key: value} for item in values for value in item[value_key]]

        transformed_data = []
        atomic_type = datatype_atomic_to_atomic(output_type)

        for item in values:
            # do not shadow the outer `values` argument here
            item_values = item[value_key]
            if atomic_type:
                item_values = list(map(atomic_type, item_values))

            for value in item_values:
                transformed_data.append({datetime_key: item[datetime_key], value_key: value})

        return transformed_data

    def discard_values(cls, discards: Dict[str, Optional[float]], values: List[Dict[str, List]], value_key: str, is_array: bool = False):

        discard_min = discards.get('discard_min', None)
        discard_max = discards.get('discard_max', None)

        if (discard_min is None and discard_max is None) or not values:
            return values

        try:
            if is_array:
                return [d for d in values if value_key in d and all((discard_min is None or discard_min <= val) and (discard_max is None or val <= discard_max) for val in d[value_key])]

            else:
                return [d for d in values if value_key in d and (discard_min is None or d[value_key] >= discard_min) and (discard_max is None or d[value_key] <= discard_max)]

        except Exception as exc:
            logger.exception('can\'t discard values, the return value will be undiscarded')
            return values

In the end I need to return the values in the shape returned by this function; this too is very expensive.

def make_single_dict(list_plot: List[Dict]) -> Dict:
    """
    [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}] -> {'a': [1, 3], 'b': [2, 4]}
    """

    if not list_plot:
        return {}

    return {key: [d[key] for d in list_plot] for key in list_plot[0].keys()}

Well, AVG on value cries for frontend stalls, simply because there are so many values in the array. If you need that to stay snappy over bigger time slices, you will most likely have to think about denormalization trickery, e.g. moving the workload into materialized views with predefined granularity during the night and such (see the sketch further down)…

Anyway, I tested the AVG case with the fake data from above with this command (on the uncompressed hypertable):

time docker exec -u postgres some-postgres psql -c "\
    SELECT\
        time_bucket('24 hours', updateat) AS day,\
        avg(func(value))\
    FROM data\
    GROUP BY day\
    ORDER BY day\
    DESC LIMIT 10"

where func is one of these:

  • for TEXT[] type
    -- calculate AVG of array values
    CREATE FUNCTION array_avg(_data anyarray)
    RETURNS numeric
    AS
    $$
        SELECT avg(a::numeric)
        FROM unnest(_data) as a
    $$ LANGUAGE SQL;
    
  • for TEXT type
    -- calculate AVG of array values contained in a string split by delimiter
    CREATE FUNCTION numstr_avg(_data text, _delimiter text)
    RETURNS numeric
    AS
    $$
        SELECT avg(a::numeric)
        FROM unnest(string_to_array(_data, _delimiter)) AS a
    $$ LANGUAGE SQL;
    

The command takes ~20% longer for TEXT (~3.5s) than for TEXT[] (~2.9s). It already has to account for 25921 rows * 250 = ~6.5M values, which of course takes its time.
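If the aggregation ever becomes too slow for interactive use, a sketch of the materialized-view idea mentioned above could look like this (using the fake data table and the array_avg helper from this post; the view name and refresh scheduling are placeholders):

CREATE MATERIALIZED VIEW data_daily_avg AS
SELECT
    time_bucket('24 hours', updateat) AS day,
    avg(array_avg(value)) AS avg_value
FROM data
GROUP BY day;

-- refresh with predefined granularity during the night, e.g. via cron or pg_cron
REFRESH MATERIALIZED VIEW data_daily_avg;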

Regarding the BRIN index - I would not bother with it yet. In the longer run the question is how fast rows come in over time, and once you hit multiples of 10M you might want to re-evaluate things. By the way, if you reach those row counts soon, you can also think about spanning the partitioning over other dimensions as well (like ranges of machineids, if they follow a well-defined distribution scheme).

Yes, that would work with BRIN (and is what was meant by "either identical or exactly opposite to the logical order"). The key point with BRIN is that the natural order is ideally strictly maintained during inserts, which is normally the case for timestamps (also, you should never update timestamps in a way that disturbs that order).

I have not yet looked at your Python unnest and cast function…

Well, that's quite unspecific. What eats too much? Is it only a temporary peak, or is the final dataset too big for your machine? You don't show any profiling data, and some of the used functions are unclear.

If the final data held in memory is too much for your machine, the following might help:

  • get more RAM :wink:
  • restructure the code to use iterators/generators instead of list() (see the sketch at the end of this post)
  • turn the datetime object (really big) into a unix timestamp (only 4 bytes)
  • use slim container abstractions, like tuples or even numpy arrays

Of course there are also memory-mapping tricks possible with Python (like mmap), but this contradicts the idea of staying performant.
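For the generator point, a minimal sketch of a lazily evaluated variant of your unnest_and_cast (same input shape, just yielding instead of building the full list; whether it helps depends on the consumer also working incrementally):

def unnest_and_cast_iter(values, value_key, datetime_key, cast=None):
    """Lazily yield one flat dict per array element instead of materializing a list."""
    for item in values:
        timestamp = item[datetime_key]
        for value in item[value_key]:
            yield {datetime_key: timestamp, value_key: cast(value) if cast else value}

# usage sketch: consume incrementally instead of keeping everything in RAM
# for row in unnest_and_cast_iter(db_rows, 'value', 'time', cast=float):
#     process(row)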