[Bug]: "Aggregated Compressed Column Not Found" Error on Aggregation Query in TimescaleDB #7410

Open
uasiddiqi opened this issue Oct 30, 2024 · 3 comments · May be fixed by #7415

Comments

@uasiddiqi

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Adaptive chunking, Compression, Query executor, Other

What happened?

We're experiencing an intermittent issue with aggregation queries on compressed data in TimescaleDB (versions 2.16 and 2.17). Specifically, when querying a single id_number from a compressed hypertable over a time period of more than a week, we encounter an error. The error does not occur when the query includes multiple id_number values. We have two seemingly identical environments, but only one of them exhibits this behavior.

Steps to Reproduce:

  1. Set up compressed hypertable:
  • Create a hypertable (e.g., data_5min) with compression enabled.
  • Insert sample data and ensure the table has multiple chunks.
  2. Run Query:
  • Single ID (FAILS)
SELECT 
    id_number AS idNumber,
    AVG("value") AS "value",
    time_bucket('6 hour', ts) AS timeBucket
FROM 
    data_5min
WHERE 
    id_number IN (10) 
    AND ts >= '2024-10-15 00:00:00' 
    AND ts < '2024-10-27 00:00:00'
GROUP BY 
    id_number, timeBucket
ORDER BY 
    timeBucket;
  • Multiple ID (WORKS)
SELECT 
    id_number AS idNumber,
    AVG("value") AS "value",
    time_bucket('6 hour', ts) AS timeBucket
FROM 
    data_5min
WHERE 
    id_number IN (10, 20) 
    AND ts >= '2024-10-15 00:00:00' 
    AND ts < '2024-10-27 00:00:00'
GROUP BY 
    id_number, timeBucket
ORDER BY 
    timeBucket;

Hypertable definition:

 hypertable_schema | hypertable_name  |   owner    | num_dimensions | num_chunks | compression_enabled | tablespaces 
-------------------+------------------+------------+----------------+------------+---------------------+-------------
 public            | data_5min        |   admin    |              1 |         44 | t                   | 
(1 row)

Schema:

 column_name |        data_type         | is_nullable 
-------------+--------------------------+-------------
 id_number   | integer                  | NO
 value       | double precision         | NO
 ts          | timestamp with time zone | NO

Expected Behavior:
The query with a single id_number should execute without errors, as it does when querying with multiple IDs.

TimescaleDB version affected

2.17, 2.16

PostgreSQL version used

16.4

What operating system did you use?

Ubuntu 22.04 LTS

What installation method did you use?

Docker

What platform did you run on?

Other

Relevant log output and stack trace

Error Message:
ERROR: aggregated compressed column not found
DETAIL: Assertion 'value_column_description != NULL' failed.
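
The assertion message appears to come from TimescaleDB's vectorized aggregation code. As a hypothetical diagnostic, not something suggested by the maintainers in this thread, one could disable that path via the timescaledb.enable_vectorized_aggregation setting (available in recent 2.x releases) and re-run the failing query:

-- Untested sketch: if the error disappears with vectorized aggregation off,
-- the failure is likely in that code path. A diagnostic, not a fix.
SET timescaledb.enable_vectorized_aggregation = off;
SELECT id_number AS idNumber,
       AVG("value") AS "value",
       time_bucket('6 hour', ts) AS timeBucket
FROM data_5min
WHERE id_number IN (10)
  AND ts >= '2024-10-15 00:00:00'
  AND ts < '2024-10-27 00:00:00'
GROUP BY id_number, timeBucket
ORDER BY timeBucket;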

How can we reproduce the bug?

Set up compressed hypertable:
- Create a hypertable (e.g., data_5min) with compression enabled.
- Insert sample data and ensure the table has multiple chunks.
- Run the query for a single id_number as in the description above (a setup sketch follows this list)
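
A minimal setup sketch consistent with the schema and compression settings shown in this thread; the chunk interval, data volume, and inserted values are assumptions, not details from the affected system:

-- Hypothetical reproduction setup; schema matches the report, everything
-- else (time range, row counts) is an assumption.
CREATE TABLE data_5min (
    id_number integer          NOT NULL,
    "value"   double precision NOT NULL,
    ts        timestamptz      NOT NULL
);
SELECT create_hypertable('data_5min', 'ts');
ALTER TABLE data_5min SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'id_number',
    timescaledb.compress_orderby   = 'ts DESC'
);
INSERT INTO data_5min
SELECT id_number, random(), ts
FROM generate_series(1, 100) AS id_number,
     generate_series('2024-09-20'::timestamptz, '2024-10-27', interval '5 minutes') AS ts;
SELECT compress_chunk(c) FROM show_chunks('data_5min') AS c;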
@uasiddiqi uasiddiqi added the bug label Oct 30, 2024
@uasiddiqi
Author

TimescaleDB Forum

@akuzm
Member

akuzm commented Oct 31, 2024

I could not reproduce it after some experiments; here's the table definition I used:

create table data_5min(drop1 int8, id_number integer, drop2 int8,
    value double precision, drop3 int8, ts timestamptz);
select create_hypertable('data_5min', 'ts');
-- first batch, compressed while the dropX filler columns still exist
insert into data_5min
    select 1, id_number, 1, random() value, 1, ts
    from generate_series(1, 100) id_number,
        generate_series('2024-10-10', '2024-10-20', interval '1 hour') ts;
alter table data_5min set (timescaledb.compress);
select compress_chunk(x) from show_chunks('data_5min') x;
-- after the drops, new chunks number the surviving columns differently
alter table data_5min drop column drop1;
alter table data_5min drop column drop2;
alter table data_5min drop column drop3;
-- second batch, landing in chunks created after the drops
insert into data_5min
    select id_number, random() value, ts
    from generate_series(1, 100) id_number,
        generate_series('2024-10-20', '2024-11-01', interval '1 hour') ts;
select compress_chunk(x) from show_chunks('data_5min') x;
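
The dropX filler columns above appear intended to give chunks created before and after the drops different attribute numbers for the surviving columns. A hypothetical catalog query to check whether a hypertable has this kind of dropped-column skew:

-- Hypothetical diagnostic: if the attnum of "value" differs across chunks,
-- the hypertable has dropped-column skew between old and new chunks.
select attrelid::regclass as chunk, attnum
from pg_attribute
where attname = 'value'
  and attrelid in (select show_chunks('data_5min'))
order by 1;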

I'd like more information about your setup:

  • What are your compression settings?
  • EXPLAIN plans for both queries
  • Results of this query: \copy (with ht as (select id, compressed_hypertable_id from _timescaledb_catalog.hypertable where table_name = 'data_5min'), chunk as (select format('%s.%s', schema_name, table_name)::regclass chunk, * from _timescaledb_catalog.chunk, ht where hypertable_id in (ht.id, ht.compressed_hypertable_id)) select attrelid::regclass::text, x.*, chunk.*, pg_attribute.* from pg_attribute join chunk on attrelid = chunk left join timescaledb_information.chunks x on x.chunk_name = chunk.table_name) to 'chunks.txt' with header;

@uasiddiqi
Author

Dear @akuzm,
Thanks for trying to work on it. I understand that it is difficult to replicate; that is what troubled me too. As mentioned in the ticket, we have two identical systems and only one of them exhibits this behavior; the other one works fine. So I really don't know what's going on.

Regarding the questions, I found the following.

  • Compression settings

-[ RECORD 1 ]-----+---------------------------------------------------
job_id            | 1000
application_name  | Compression Policy [1000]
schedule_interval | 12:00:00
max_runtime       | 00:00:00
max_retries       | -1
retry_period      | 01:00:00
proc_schema       | _timescaledb_functions
proc_name         | policy_compression
owner             | "obs-admin"
scheduled         | t
fixed_schedule    | f
config            | {"hypertable_id": 1, "compress_after": "14 days"}
next_start        | 2024-11-04 20:19:31.516068+00
initial_start     | 
hypertable_schema | public
hypertable_name   | data_5min
check_schema      | _timescaledb_functions
check_name        | policy_compression_check

 hypertable_schema | hypertable_name | attname   | segmentby_column_index | orderby_column_index | orderby_asc | orderby_nullsfirst
-------------------+-----------------+-----------+------------------------+----------------------+-------------+--------------------
 public            | data_5min       | id_number |                      1 |                      |             | 
 public            | data_5min       | ts        |                        |                    1 | f           | t
(2 rows)
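
These two outputs match the column layouts of the timescaledb_information.jobs view and the timescaledb_information.compression_settings view (replaced by hypertable_compression_settings in newer releases), so they were presumably produced by queries along these lines; the exact queries are an assumption:

-- Assumed sources of the outputs above; view names inferred from the columns.
select * from timescaledb_information.jobs
where hypertable_name = 'data_5min';
select * from timescaledb_information.compression_settings
where hypertable_name = 'data_5min';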

  • EXPLAIN plans

EXPLAIN select id_number as idNumber,AVG("value") as "value",time_bucket('6 hour', ts) as timeBucket from data_5min where id_number IN (10) AND ts >= '2024-09-20 00:00:00' AND ts < '2024-10-21 00:00:00' group by id_number, timeBucket order by timeBucket;
ERROR: aggregated compressed column not found
DETAIL: Assertion 'value_column_description != NULL' failed

EXPLAIN select id_number as idNumber,AVG("value") as "value",time_bucket('6 hour', ts) as timeBucket from data_5min where id_number IN (10,20) AND ts >= '2024-09-20 00:00:00' AND ts < '2024-10-21 00:00:00' group by id_number, timeBucket order by timeBucket;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=3097.69..3103.82 rows=2454 width=20)
   Sort Key: (time_bucket('06:00:00'::interval, _hyper_1_8_chunk.ts))
   ->  Finalize HashAggregate  (cost=2922.71..2959.52 rows=2454 width=20)
         Group Key: (time_bucket('06:00:00'::interval, _hyper_1_8_chunk.ts)), _hyper_1_8_chunk.id_number
         ->  Append  (cost=35.43..2830.68 rows=12270 width=44)
               ->  Partial HashAggregate  (cost=35.43..66.11 rows=2454 width=44)
                     Group Key: time_bucket('06:00:00'::interval, _hyper_1_8_chunk.ts), _hyper_1_8_chunk.id_number
                     ->  Custom Scan (DecompressChunk) on _hyper_1_8_chunk  (cost=1.36..5.43 rows=4000 width=20)
                           Vectorized Filter: ((ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))
                           ->  Index Scan using compress_hyper_2_49_chunk_id_number__ts_meta_min_1__ts_meta_idx on compress_hyper_2_49_chunk  (cost=0.28..5.43 rows=4 width=88)
                                 Index Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (_ts_meta_min_1 < '2024-10-21 00:00:00+00'::timestamp with time zone) AND (_ts_meta_max_1 >= '2024-09-20 00:00:00+00'::timestamp with time zone))
               ->  Partial HashAggregate  (cost=35.39..66.07 rows=2454 width=44)
                     Group Key: time_bucket('06:00:00'::interval, _hyper_1_9_chunk.ts), _hyper_1_9_chunk.id_number
                     ->  Custom Scan (DecompressChunk) on _hyper_1_9_chunk  (cost=1.35..5.39 rows=4000 width=20)
                           Vectorized Filter: ((ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))
                           ->  Index Scan using compress_hyper_2_659_chunk_id_number__ts_meta_min_1__ts_met_idx on compress_hyper_2_659_chunk  (cost=0.28..5.39 rows=4 width=88)
                                 Index Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (_ts_meta_min_1 < '2024-10-21 00:00:00+00'::timestamp with time zone) AND (_ts_meta_max_1 >= '2024-09-20 00:00:00+00'::timestamp with time zone))
               ->  Partial HashAggregate  (cost=51.42..82.10 rows=2454 width=44)
                     Group Key: time_bucket('06:00:00'::interval, _hyper_1_4_chunk.ts), _hyper_1_4_chunk.id_number
                     ->  Custom Scan (DecompressChunk) on _hyper_1_4_chunk  (cost=1.07..6.42 rows=6000 width=20)
                           Vectorized Filter: ((ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))
                           ->  Index Scan using compress_hyper_2_675_chunk_id_number__ts_meta_min_1__ts_met_idx on compress_hyper_2_675_chunk  (cost=0.28..6.42 rows=6 width=88)
                                 Index Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (_ts_meta_min_1 < '2024-10-21 00:00:00+00'::timestamp with time zone) AND (_ts_meta_max_1 >= '2024-09-20 00:00:00+00'::timestamp with time zone))
               ->  Partial HashAggregate  (cost=35.39..66.07 rows=2454 width=44)
                     Group Key: time_bucket('06:00:00'::interval, _hyper_1_10_chunk.ts), _hyper_1_10_chunk.id_number
                     ->  Custom Scan (DecompressChunk) on _hyper_1_10_chunk  (cost=1.35..5.39 rows=4000 width=20)
                           Vectorized Filter: ((ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))
                           ->  Index Scan using compress_hyper_2_684_chunk_id_number__ts_meta_min_1__ts_met_idx on compress_hyper_2_684_chunk  (cost=0.28..5.39 rows=4 width=88)
                                 Index Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (_ts_meta_min_1 < '2024-10-21 00:00:00+00'::timestamp with time zone) AND (_ts_meta_max_1 >= '2024-09-20 00:00:00+00'::timestamp with time zone))
               ->  Partial HashAggregate  (cost=2458.32..2489.00 rows=2454 width=44)
                     Group Key: time_bucket('06:00:00'::interval, _hyper_1_657_chunk.ts), _hyper_1_657_chunk.id_number
                     ->  Bitmap Heap Scan on _hyper_1_657_chunk  (cost=51.82..2439.99 rows=2444 width=20)
                           Recheck Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))
                           ->  Bitmap Index Scan on _hyper_1_657_chunk_pk_data_5min  (cost=0.00..51.20 rows=2444 width=0)
                                 Index Cond: ((id_number = ANY ('{10,20}'::integer[])) AND (ts >= '2024-09-20 00:00:00+00'::timestamp with time zone) AND (ts < '2024-10-21 00:00:00+00'::timestamp with time zone))

  • Attached chunks.txt

Hope it helps. I cannot share data due to proprietary restrictions, but I can share metadata if that works.
chunks.txt
