[Bug]: "Aggregated Compressed Column Not Found" Error on Aggregation Query in TimescaleDB #7410

Open
uasiddiqi opened this issue Oct 30, 2024 · 3 comments · May be fixed by #7415

Comments

@uasiddiqi

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Adaptive chunking, Compression, Query executor, Other

What happened?

We're experiencing an intermittent issue with aggregation queries on compressed data in TimescaleDB (versions 2.16 and 2.17). Specifically, when querying a single id_number from a compressed hypertable over a time period of more than a week, we encounter an error. The error does not occur when the query includes multiple id_number values. We have two seemingly identical environments, but only one of them exhibits this behavior.

Steps to Reproduce:

  1. Set up compressed hypertable:
  • Create a hypertable (e.g., data_5min) with compression enabled.
  • Insert sample data and ensure the table has multiple chunks.
  2. Run Query:
  • Single ID (FAILS)
SELECT 
    id_number AS idNumber,
    AVG("value") AS "value",
    time_bucket('6 hour', ts) AS timeBucket
FROM 
    data_5min
WHERE 
    id_number IN (10) 
    AND ts >= '2024-10-15 00:00:00' 
    AND ts < '2024-10-27 00:00:00'
GROUP BY 
    id_number, timeBucket
ORDER BY 
    timeBucket;
  • Multiple ID (WORKS)
SELECT 
    id_number AS idNumber,
    AVG("value") AS "value",
    time_bucket('6 hour', ts) AS timeBucket
FROM 
    data_5min
WHERE 
    id_number IN (10, 20) 
    AND ts >= '2024-10-15 00:00:00' 
    AND ts < '2024-10-27 00:00:00'
GROUP BY 
    id_number, timeBucket
ORDER BY 
    timeBucket;

Hypertable definition:

 hypertable_schema | hypertable_name  |   owner    | num_dimensions | num_chunks | compression_enabled | tablespaces 
-------------------+------------------+------------+----------------+------------+---------------------+-------------
 public            | data_5min        |   admin    |              1 |         44 | t                   | 
(1 row)

Schema:

 column_name |        data_type         | is_nullable 
-------------+--------------------------+-------------
 id_number   | integer                  | NO
 value       | double precision         | NO
 ts          | timestamp with time zone | NO

Expected Behavior:
The query with a single id_number should execute without errors, as it does when querying with multiple IDs.

TimescaleDB version affected

2.17, 2.16

PostgreSQL version used

16.4

What operating system did you use?

Ubuntu 22.04 LTS

What installation method did you use?

Docker

What platform did you run on?

Other

Relevant log output and stack trace

Error Message:
ERROR: aggregated compressed column not found
DETAIL: Assertion 'value_column_description != NULL' failed.
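
The assertion message appears to come from TimescaleDB's vectorized aggregation code. As a hypothetical diagnostic, not something suggested by the maintainers in this thread, one could disable that path via the timescaledb.enable_vectorized_aggregation setting (available in recent 2.x releases) and re-run the failing query:

-- Untested sketch: if the error disappears with vectorized aggregation off,
-- the failure is likely in that code path. A diagnostic, not a fix.
SET timescaledb.enable_vectorized_aggregation = off;
SELECT id_number AS idNumber,
       AVG("value") AS "value",
       time_bucket('6 hour', ts) AS timeBucket
FROM data_5min
WHERE id_number IN (10)
  AND ts >= '2024-10-15 00:00:00'
  AND ts < '2024-10-27 00:00:00'
GROUP BY id_number, timeBucket
ORDER BY timeBucket;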

How can we reproduce the bug?

Set up compressed hypertable:
- Create a hypertable (e.g., data_5min) with compression enabled.
- Insert sample data and ensure the table has multiple chunks.
- Run the query for a single id_number as in the description above (a setup sketch follows this list)
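
A minimal setup sketch consistent with the schema and compression settings shown in this thread; the chunk interval, data volume, and inserted values are assumptions, not details from the affected system:

-- Hypothetical reproduction setup; schema matches the report, everything
-- else (time range, row counts) is an assumption.
CREATE TABLE data_5min (
    id_number integer          NOT NULL,
    "value"   double precision NOT NULL,
    ts        timestamptz      NOT NULL
);
SELECT create_hypertable('data_5min', 'ts');
ALTER TABLE data_5min SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'id_number',
    timescaledb.compress_orderby   = 'ts DESC'
);
INSERT INTO data_5min
SELECT id_number, random(), ts
FROM generate_series(1, 100) AS id_number,
     generate_series('2024-09-20'::timestamptz, '2024-10-27', interval '5 minutes') AS ts;
SELECT compress_chunk(c) FROM show_chunks('data_5min') AS c;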
@uasiddiqi uasiddiqi added the bug label Oct 30, 2024
@uasiddiqi
Author

TimescaleDB Forum

@akuzm
Member

akuzm commented Oct 31, 2024

I could not reproduce it after some experiments; here's the table definition I used:

create table data_5min(drop1 int8, id_number integer, drop2 int8,
    value double precision, drop3 int8, ts timestamptz);
select create_hypertable('data_5min', 'ts');
-- first batch, compressed while the dropX filler columns still exist
insert into data_5min
    select 1, id_number, 1, random() value, 1, ts
    from generate_series(1, 100) id_number,
        generate_series('2024-10-10', '2024-10-20', interval '1 hour') ts;
alter table data_5min set (timescaledb.compress);
select compress_chunk(x) from show_chunks('data_5min') x;
-- after the drops, new chunks number the surviving columns differently
alter table data_5min drop column drop1;
alter table data_5min drop column drop2;
alter table data_5min drop column drop3;
-- second batch, landing in chunks created after the drops
insert into data_5min
    select id_number, random() value, ts
    from generate_series(1, 100) id_number,
        generate_series('2024-10-20', '2024-11-01', interval '1 hour') ts;
select compress_chunk(x) from show_chunks('data_5min') x;
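
The dropX filler columns above appear intended to give chunks created before and after the drops different attribute numbers for the surviving columns. A hypothetical catalog query to check whether a hypertable has this kind of dropped-column skew:

-- Hypothetical diagnostic: if the attnum of "value" differs across chunks,
-- the hypertable has dropped-column skew between old and new chunks.
select attrelid::regclass as chunk, attnum
from pg_attribute
where attname = 'value'
  and attrelid in (select show_chunks('data_5min'))
order by 1;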

I'd like more information about your setup:

  • What are your compression settings?
  • EXPLAIN plans for both queries
  • Results of this query: \copy (with ht as (select id, compressed_hypertable_id from _timescaledb_catalog.hypertable where table_name = 'data_5min'), chunk as (select format('%s.%s', schema_name, table_name)::regclass chunk, * from _timescaledb_catalog.chunk, ht where hypertable_id in (ht.id, ht.compressed_hypertable_id)) select attrelid::regclass::text, x.*, chunk.*, pg_attribute.* from pg_attribute join chunk on attrelid = chunk left join timescaledb_information.chunks x on x.chunk_name = chunk.table_name) to 'chunks.txt' with header;

@uasiddiqi
Author

Dear @akuzm,
Thanks for trying to work on it. I understand that it is difficult to replicate; that is what troubled me too. As mentioned in the ticket, we have two identical systems and only one of them exhibits this behavior; the other one works fine. So I really don't know what's going on.

Regarding the questions, I found the following.

  • Compression settings

-[ RECORD 1 ]-----+---------------------------------------------------
job_id            | 1000
application_name  | Compression Policy [1000]
schedule_interval | 12:00:00
max_runtime       | 00:00:00
max_retries       | -1
retry_period      | 01:00:00
proc_schema       | _timescaledb_functions
proc_name         | policy_compression
owner             | "obs-admin"
scheduled         | t
fixed_schedule    | f
config            | {"hypertable_id": 1, "compress_after": "14 days"}
next_start        | 2024-11-04 20:19:31.516068+00
initial_start     | 
hypertable_schema | public
hypertable_name   | data_5min
check_schema      | _timescaledb_functions
check_name        | policy_compression_check

 hypertable_schema | hypertable_name | attname   | segmentby_column_index | orderby_column_index | orderby_asc | orderby_nullsfirst
-------------------+-----------------+-----------+------------------------+----------------------+-------------+--------------------
 public            | data_5min       | id_number |                      1 |                      |             | 
 public            | data_5min       | ts        |                        |                    1 | f           | t
(2 rows)
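
These two outputs match the column layouts of the timescaledb_information.jobs view and the timescaledb_information.compression_settings view (replaced by hypertable_compression_settings in newer releases), so they were presumably produced by queries along these lines; the exact queries are an assumption:

-- Assumed sources of the outputs above; view names inferred from the columns.
select * from timescaledb_information.jobs
where hypertable_name = 'data_5min';
select * from timescaledb_information.compression_settings
where hypertable_name = 'data_5min';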

  • EXPLAIN plans

EXPLAIN select id_number as idNumber,AVG("value") as "value",time_bucket('6 hour', ts) as timeBucket from data_5min where id_number IN (10) AND ts >= '2024-09-20 00:00:00' AND ts < '2024-10-21 00:00:00' group by id_number, timeBucket order by timeBucket;
ERROR: aggregated compressed column not found
DETAIL: Assertion 'value_column_description != NULL' failed

EXPLAIN select id_number as idNumber,AVG("value") as "value",time_bucket('6 hour', ts) as timeBucket from data_5min where id_number IN (10,20) AND ts >= '2024-09-20 00:00:00' AND ts < '2024-10-21 00:00:00' group by id_number, timeBucket order by timeBucket;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3097.69..3103.82 rows=2454 width=20)
   Sort Key: (time_bucket('06:00:00'::interval, _hyper_1_8_chunk.ts))
   ->  Finalize HashAggregate  (cost=2922.71..2959.52 rows=2454 width=20)
         Group Key: (time_bucket('06:00:00'::interval, _hyper_1_8_chunk.ts)), _hyper_1_8_chunk.id_number
         ->  Append  (cost=35.43..2830.68 rows=12270 width=44)
               ->  Partial HashAggregate  (cost=35.43..66.11 rows=2454 width=44)
                     Group Key: time_bucket('06:00:00'::interval, _hyper_1_8_chunk.ts), _hyper_1_8_chunk.id_number
                     ->  Custom Scan (DecompressChunk) on _hyper_1_8_chunk  (cost=1.36..5.43 rows=4000 width=20)
                           Vectorized Filter: ((ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))
                           ->  Index Scan using compress_hyper_2_49_chunk_id_number__ts_meta_min_1__ts_meta_idx on compress_hyper_2_49_chunk  (cost=0.28..5.43 rows=4 width=88)
                                 Index Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (_ts_meta_min_1 < '2024-10-21 00:00:00+00'::timestamp with time zone) AND (_ts_meta_max_1 >= '2024-09-20 00:00:00+00'::timestamp with time zone))
               ->  Partial HashAggregate  (cost=35.39..66.07 rows=2454 width=44)
                     Group Key: time_bucket('06:00:00'::interval, _hyper_1_9_chunk.ts), _hyper_1_9_chunk.id_number
                     ->  Custom Scan (DecompressChunk) on _hyper_1_9_chunk  (cost=1.35..5.39 rows=4000 width=20)
                           Vectorized Filter: ((ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))
                           ->  Index Scan using compress_hyper_2_659_chunk_id_number__ts_meta_min_1__ts_met_idx on compress_hyper_2_659_chunk  (cost=0.28..5.39 rows=4 width=88)
                                 Index Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (_ts_meta_min_1 < '2024-10-21 00:00:00+00'::timestamp with time zone) AND (_ts_meta_max_1 >= '2024-09-20 00:00:00+00'::timestamp with time zone))
               ->  Partial HashAggregate  (cost=51.42..82.10 rows=2454 width=44)
                     Group Key: time_bucket('06:00:00'::interval, _hyper_1_4_chunk.ts), _hyper_1_4_chunk.id_number
                     ->  Custom Scan (DecompressChunk) on _hyper_1_4_chunk  (cost=1.07..6.42 rows=6000 width=20)
                           Vectorized Filter: ((ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))
                           ->  Index Scan using compress_hyper_2_675_chunk_id_number__ts_meta_min_1__ts_met_idx on compress_hyper_2_675_chunk  (cost=0.28..6.42 rows=6 width=88)
                                 Index Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (_ts_meta_min_1 < '2024-10-21 00:00:00+00'::timestamp with time zone) AND (_ts_meta_max_1 >= '2024-09-20 00:00:00+00'::timestamp with time zone))
               ->  Partial HashAggregate  (cost=35.39..66.07 rows=2454 width=44)
                     Group Key: time_bucket('06:00:00'::interval, _hyper_1_10_chunk.ts), _hyper_1_10_chunk.id_number
                     ->  Custom Scan (DecompressChunk) on _hyper_1_10_chunk  (cost=1.35..5.39 rows=4000 width=20)
                           Vectorized Filter: ((ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))
                           ->  Index Scan using compress_hyper_2_684_chunk_id_number__ts_meta_min_1__ts_met_idx on compress_hyper_2_684_chunk  (cost=0.28..5.39 rows=4 width=88)
                                 Index Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (_ts_meta_min_1 < '2024-10-21 00:00:00+00'::timestamp with time zone) AND (_ts_meta_max_1 >= '2024-09-20 00:00:00+00'::timestamp with time zone))
               ->  Partial HashAggregate  (cost=2458.32..2489.00 rows=2454 width=44)
                     Group Key: time_bucket('06:00:00'::interval, _hyper_1_657_chunk.ts), _hyper_1_657_chunk.id_number
                     ->  Bitmap Heap Scan on _hyper_1_657_chunk  (cost=51.82..2439.99 rows=2444 width=20)
                           Recheck Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))
                           ->  Bitmap Index Scan on _hyper_1_657_chunk_pk_data_5min  (cost=0.00..51.20 rows=2444 width=0)
                                 Index Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))

  • Attached chunks.txt

Hope it helps. I cannot share data due to proprietary restrictions, but I can share metadata if that works.
chunks.txt
