
[Bug]: negative bitmapset member not allowed on query with sort #7384

Closed
dx034 opened this issue Oct 23, 2024 · 24 comments · Fixed by #7399

Comments

@dx034

dx034 commented Oct 23, 2024

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Query executor, Query planner

What happened?

Since upgrading to PostgreSQL 17 with timescaledb 2.17, I encounter the following error when planning or executing certain queries with sorting: SQL Error [XX000]: ERROR: negative bitmapset member not allowed.

Sample query:
SELECT ts, col1, col2, col3 FROM ts_rr WHERE ident=66091 order by ts

The query does not fail if the order by clause is missing. ts is a TIMESTAMP column, ident is int. The table is partitioned by time with a chunk_time_interval of 24 hours. The error also occurs if a filter is applied to ts that would restrict the lookup to one chunk, and it also happens if the query only filters on uncompressed chunks.

The table contains several billion rows, but with the filter the query shouldn't return more than 50 rows.
VACUUM ANALYZE ts_rr was performed before executing the query. The error also occurs on other tables with similar structure.

This might be related to the query planner as the error already occurs when I execute the query with explain.
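For context, a rough sketch of this kind of setup (the extra column names here are illustrative, not the exact production DDL, and this is not a guaranteed reproducer; the real schema and compression settings appear further down in this thread):

CREATE TABLE ts_rr (ident int4 NOT NULL, ts timestamp(0) NOT NULL, col1 int4, col2 int4, col3 int4);
SELECT create_hypertable('ts_rr', 'ts', chunk_time_interval => interval '24 hours');
ALTER TABLE ts_rr SET (timescaledb.compress, timescaledb.compress_segmentby = 'ident', timescaledb.compress_orderby = 'ts');
-- insert data spanning several days, then compress all chunks:
SELECT compress_chunk(c) FROM show_chunks('ts_rr') AS c;
-- planning alone is enough to hit the error:
EXPLAIN SELECT ts, col1, col2, col3 FROM ts_rr WHERE ident = 66091 ORDER BY ts;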

TimescaleDB version affected

2.17.1

PostgreSQL version used

17.0

What operating system did you use?

debian bookworm

What installation method did you use?

Deb/Apt

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

SQL Error [XX000]: ERROR: negative bitmapset member not allowed

How can we reproduce the bug?

I'm not really sure how to debug this further, but I'm happy to provide more information. I created a similar table and added a subset of the data, but wasn't able to reproduce the error. On the actual tables, the error occurs on every query that sorts the results, so I believe the number of rows in the table causes this error.

I believe that this is connected to Postgres 17, but as the upgrade to timescaledb 2.17 was performed shortly before upgrading Postgres, I'm not able to say for sure after which upgrade this started. However, this also occurs on chunks that were added after the upgrade, so I don't believe it's caused by upgrading.
@dx034 dx034 added the bug label Oct 23, 2024
@antekresic
Contributor

I'm afraid version 2.7.1 doesn't have support for pg17, which was only introduced in 2.17.

Is this a typo, i.e. did you write 2.7 instead of 2.17? If not, I would suggest upgrading to 2.17.
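In case it helps, the upgrade itself is usually just the following, run as the first statement in a fresh session once the new packages are installed:

ALTER EXTENSION timescaledb UPDATE;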

Thanks.

@dx034
Author

dx034 commented Oct 23, 2024

Sorry, I meant 2.17.1. I initially upgraded with 2.17.0, but upgrading to 2.17.1 doesn't solve the problem.

@antekresic
Contributor

Can you try this query on individual chunks? For instance, doing:

SELECT ts, col1, col2, col3 FROM _timescaledb_internal._hyper_x_x_chunk WHERE ident=66091 order by ts

If it succeeds, can you post the explain plan here? I suggest trying it on a compressed and uncompressed chunk as well.
This is just to pinpoint the chunk that might be causing this.
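If it helps to enumerate candidates, something like this (assuming the hypertable is public.ts_rr) lists the chunks and whether they are compressed:

SELECT chunk_schema, chunk_name, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'ts_rr'
ORDER BY range_start;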

Thanks.

@dx034
Author

dx034 commented Oct 23, 2024

Querying the chunk directly doesn't cause the error and returns results as expected.

QUERY PLAN
----------------------------------------------------------------------------------------------------------
Index Only Scan using "6896_3571_ts_rr_pkey" on _hyper_8_6896_chunk  (cost=0.57..38.23 rows=224 width=12)
  Index Cond: ((ident = 66091) AND (ts >= (now() - '04:00:00'::interval)) AND (ts <= now()))

The query I ran uses a time restriction that limits it to one chunk, but the same query fails on the main table.

I'll try it on all chunks and will report back.

@dx034
Author

dx034 commented Oct 23, 2024

I tested all chunks and the error occurs whenever I execute it on a compressed chunk. Executing on an uncompressed chunk directly does not lead to an error.

EDIT: It only affects chunks that were compressed since the update, so I guess there might be data corruption in those chunks? Chunks compressed prior to upgrading are not affected.

@antekresic
Contributor

Can you send the compression settings you are using on this hypertable?

Thanks.

@dx034
Author

dx034 commented Oct 23, 2024

From select * from timescaledb_information.compression_settings:

hypertable_schema | hypertable_name | attname | segmentby_column_index | orderby_column_index | orderby_asc | orderby_nullsfirst
------------------+-----------------+---------+------------------------+----------------------+-------------+--------------------
public            | ts_rr           | ident   |                      1 |                      |             |
public            | ts_rr           | ts      |                        |                    1 | true        | false

Please let me know if you need another output format or info from another table.

@antekresic
Contributor

Have you changed these settings on the hypertable? What I'm wondering is whether you have compressed chunks with different compression settings, and whether that might be causing the problem.

Can you check that with SELECT * FROM timescaledb_information.chunk_compression_settings;?

I understand you probably have a lot of chunks; I'm just trying to get to the bottom of the problem.

FWIW, I don't think this is data corruption; it looks more like a planning bug, given that the data is present when the chunk is queried directly.

@dx034
Author

dx034 commented Oct 23, 2024

I haven't changed the settings in a long time. Most of the affected tables don't have that many chunks, as old chunks are deleted. Here's the output of the query; the rows look the same for all chunks:

ts_rr  _timescaledb_internal._hyper_8_6842_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6751_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6765_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6771_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6702_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6782_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6855_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6794_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6802_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6695_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6812_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6825_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6673_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6833_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6683_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6723_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6714_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6733_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6863_chunk  ident  ts
ts_rr  _timescaledb_internal._hyper_8_6744_chunk  ident  ts

@antekresic
Contributor

If there aren't a lot of chunks, you could potentially fix this by recompressing the old chunks, but let's not do that until we get to the bottom of why this is happening in the first place.

I'm trying to reproduce this locally. Any chance you can share the schema of your hypertable?

@dx034
Author

dx034 commented Oct 23, 2024

Here's the schema. Please note that this seems to happen to all tables with a similar schema. In one of them, the ident column also has a different data type. I could try to recompress one chunk and see if manual compression fixes it?

As mentioned before, chunks compressed before the upgrade to pg17 (and timescale 2.17) work fine; it's just the newly compressed chunks.

CREATE TABLE public.ts_rr (
    ident int4 NOT NULL,
    ts timestamp(0) NOT NULL,
    cs text NULL,
    a int4 NULL,
    s int2 NULL,
    vs int2 NULL,
    hea int2 NULL,
    la int4 NULL,
    lo7 int4 NULL,
    CONSTRAINT ts_fr_pkey PRIMARY KEY (hex, ts)
) TABLESPACE spillover;

@antekresic
Contributor

Can you manually decompress the new chunks that are causing issues and run the query again to get the explain plan?

I don't think recompressing a single chunk would help; you would have to do it for all the chunks on the hypertable. Since this is happening on multiple tables with the same structure, you could try it and see if it helps.
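For a single chunk, that would look roughly like this (using one of the chunk names listed above purely as an example):

SELECT decompress_chunk('_timescaledb_internal._hyper_8_6863_chunk');
-- run EXPLAIN on the failing query here
SELECT compress_chunk('_timescaledb_internal._hyper_8_6863_chunk');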

@dx034
Author

dx034 commented Oct 23, 2024

Decompressing all affected compressed chunks of one table will take a while; I'd also have to check whether disk space is sufficient for that.

For now, I decompressed one of the affected chunks. After decompressing, the query worked (it doesn't return any results as the time is out of range for the chunk):
Index Scan Backward using _hyper_8_6863_chunk_ts_rr_ts_idx on _hyper_8_6863_chunk  (cost=0.58..2.80 rows=1 width=12)
  Index Cond: ((ts >= (now() - '04:00:00'::interval)) AND (ts <= now()))
  Filter: (ident = 66091)

I then compressed it again and ran the same query. The error occurred again: SQL Error [XX000]: ERROR: negative bitmapset member not allowed.

Prior to this, I performed VACUUM on the whole database. No errors were logged during compression.

So I guess decompressing would make the error go away, but that isn't feasible due to the available disk space.

@dx034
Author

dx034 commented Oct 23, 2024

One more observation: the error does not seem to occur when I don't filter by ident. Since this is the column the table is segmented by, I guess this filter causes the issue, not the filter on time (which I can remove and still get the same error).
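In other words, roughly:

-- fails with "negative bitmapset member not allowed":
SELECT ts, col1, col2, col3 FROM ts_rr WHERE ident = 66091 ORDER BY ts;
-- runs without the error:
SELECT ts, col1, col2, col3 FROM ts_rr ORDER BY ts;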

@antekresic
Contributor

antekresic commented Oct 23, 2024

Can you show what the plan looks like for older chunks which aren't causing this problem? Or actually, the whole plan would be good.

@dx034
Author

dx034 commented Oct 23, 2024

Can you show what the plan looks like for older chunks which aren't causing this problem? Or actually, the whole plan would be good.

Sure:
Custom Scan (DecompressChunk) on _hyper_8_6733_chunk  (cost=1.57..3.14 rows=2000 width=12)
  Vectorized Filter: ((ts <= "timestamp"(now())) AND (ts >= "timestamp"((now() - '04:00:00'::interval))))
  ->  Index Scan using compress_hyper_10_6768_chunk__compressed_hypertable_10_ident__ts_ on compress_hyper_10_6768_chunk  (cost=0.42..3.14 rows=2 width=60)
        Index Cond: (ident = 66091)

@antekresic
Contributor

Would you mind doing:

\d+ _timescaledb_internal.compress_hyper_10_6768_chunk

Sorry for the long back and forth, I'm still trying to figure out what's going on here.

Thanks!

@dx034
Author

dx034 commented Oct 23, 2024

No worries, thanks for your extensive help! Here's the output:

Table "_timescaledb_internal.compress_hyper_10_6768_chunk" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description -----------------------+---------------------------------------+-----------+----------+---------+----------+-------------+--------------+------------- ident | integer | | | | plain | | 1000 | ts | _timescaledb_internal.compressed_data | | | | external | | 0 | cs | _timescaledb_internal.compressed_data | | | | extended | | 0 | a | _timescaledb_internal.compressed_data | | | | external | | 0 | s | _timescaledb_internal.compressed_data | | | | external | | 0 | vs | _timescaledb_internal.compressed_data | | | | external | | 0 | hea | _timescaledb_internal.compressed_data | | | | external | | 0 | la | _timescaledb_internal.compressed_data | | | | external | | 0 | lo7 | _timescaledb_internal.compressed_data | | | | external | | 0 | _ts_meta_count | integer | | | | plain | | 1000 | _ts_meta_sequence_num | integer | | | | plain | | 1000 | _ts_meta_min_1 | timestamp without time zone | | | | plain | | 1000 | _ts_meta_max_1 | timestamp without time zone | | | | plain | | 1000 | Indexes: "compress_hyper_10_6768_chunk__compressed_hypertable_10_hex__ts_" btree (ident, _ts_meta_sequence_num), tablespace "spillover" Tablespace: "spillover" Access method: heap Options: toast_tuple_target=128

Some further stats to give you an impression of the size of each chunk: _hyper_8_6863_chunk contains ~123m entries with 63148 distinct values of ident.

@dx034
Author

dx034 commented Oct 23, 2024

The details you requested were for a chunk that worked. Was that intended? Here are the details for the (recompressed) chunk that doesn't work; it looks different:

\d+ _timescaledb_internal._hyper_8_6863_chunk
Table "_timescaledb_internal._hyper_8_6863_chunk"
 Column |              Type              | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+--------------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 ident  | integer                        |           | not null |         | plain    |             |              |
 ts     | timestamp(0) without time zone |           | not null |         | plain    |             |              |
 cs     | text                           |           |          |         | extended |             |              |
 a      | integer                        |           |          |         | plain    |             |              |
 s      | smallint                       |           |          |         | plain    |             |              |
 vs     | smallint                       |           |          |         | plain    |             |              |
 hea    | smallint                       |           |          |         | plain    |             |              |
 la     | integer                        |           |          |         | plain    |             |              |
 lo7    | integer                        |           |          |         | plain    |             |              |
Indexes:
    "6863_3553_ts_rr_pkey" PRIMARY KEY, btree (ident, ts)
    "_hyper_8_6863_chunk_ts_rr_ts_idx" btree (ts DESC), tablespace "spillover"
Check constraints:
    "constraint_3553" CHECK (ts >= '2024-10-20 00:00:00'::timestamp without time zone AND ts < '2024-10-21 00:00:00'::timestamp without time zone)
Inherits: ts_rr
Tablespace: "spillover"
Access method: heap

@antekresic
Contributor

Could you check the max OID in your database?

SELECT MAX(oid) FROM pg_class;

Thanks.

@dx034
Author

dx034 commented Oct 23, 2024

All the way at the top: 4294091789. I used to get messages about vacuuming to prevent wraparound, but since vacuuming more often, those have disappeared. That's also why I ran VACUUM on the database before reproducing it here; I hoped that this way it wouldn't play a role.

If it matters, the relation with the highest oid belongs to timescaledb compression: relname: compress_hyper_6_2664_chunk__compressed_hypertable_6_ident__ts_me
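(In case it's useful, a query along these lines shows which relation holds that maximum OID:)

SELECT oid, relname FROM pg_class ORDER BY oid DESC LIMIT 1;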

@antekresic
Contributor

Yeah, that's the problem. The bug is triggered by OIDs being used improperly in certain queries, which is why this happens.
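To make the arithmetic concrete: OIDs are unsigned 32-bit values, so once one exceeds 2^31 - 1 it becomes negative when treated as a signed int bitmapset member. For the maximum OID reported above, roughly:

-- 4294091789 reinterpreted as a signed 32-bit integer (2^32 = 4294967296):
SELECT 4294091789 - 4294967296 AS as_signed_int32;   -- -875507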

I'm gonna work on a bugfix for this, but in the meantime, as a workaround, you might want to consider migrating to a fresh instance to reset the OID values. I don't know how big of a problem that is for you.

Hope this helps.

@dx034
Author

dx034 commented Oct 23, 2024

Migrating to a fresh instance wouldn't really be feasible. But I've stopped sorting by ts and now sort in the calling application instead. That seems to fix it for now, and the performance penalty isn't too big.

Many thanks for all your help, looking forward to the fix!

@vitabaks

vitabaks commented Nov 2, 2024

We also encountered the error negative bitmapset member not allowed (SQLSTATE XX000) after upgrading to version 2.17.1. Unfortunately, we were unable to reproduce this error in staging environments and only experienced it in production.

@antekresic, I noticed that the PR has already been merged, but a new release has not yet been published. Could you please confirm whether building from the source code of the 2.17.x branch would be sufficient?

UPD: Yes, it helped.

git clone --branch 2.17.x --single-branch https://github.com/timescale/timescaledb.git
cd timescaledb
./bootstrap
cd build && make
make install
# restart postgres
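Afterwards, a quick way to confirm which extension version is installed (not part of the original steps, just a sanity check):

SELECT default_version, installed_version FROM pg_available_extensions WHERE name = 'timescaledb';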

@akuzm akuzm added this to the TimescaleDB 2.17.2 milestone Nov 5, 2024
akuzm added a commit to akuzm/timescaledb that referenced this issue Nov 6, 2024
This release contains performance improvements and bug fixes since
the 2.17.1 release. We recommend that you upgrade at the next
available opportunity.

**Features**

**Bugfixes**
* timescale#7384 Fix using OIDs with bitmapsets
* timescale#7388 Use-after-free in vectorized grouping by segmentby columns.

**Thanks**
* @dx034 for reporting an issue with negative bitmapset members due to large OIDs
akuzm added a commit that referenced this issue Nov 6, 2024
This release contains performance improvements and bug fixes since the
2.17.1 release. We recommend that you upgrade at the next available
opportunity.

**Features**

**Bugfixes**
* #7384 Fix using OIDs with bitmapsets
* #7388 Use-after-free in vectorized grouping by segmentby columns

**Thanks**
* @dx034 for reporting an issue with negative bitmapset members due to
large OIDs