[Bug]: negative bitmapset member not allowed on query with sort #7384
Comments
I'm afraid version 2.7.1 doesn't have support for pg17, which was only introduced with 2.17. Is this a typo, i.e. you wrote 2.7 instead of 2.17? If not, I would suggest upgrading to 2.17. Thanks.
Sorry, I meant 2.17.1. I initially upgraded to 2.17.0, but upgrading to 2.17.1 doesn't solve the problem.
Can you try this query on individual chunks? For instance, doing:
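For illustration, a sketch of what a per-chunk version of the query could look like; the chunk name _hyper_1_100_chunk is a placeholder, not taken from this thread:

-- Placeholder chunk name; real chunk names can be listed with show_chunks('ts_rr').
SELECT ts, col1, col2, col3
FROM _timescaledb_internal._hyper_1_100_chunk
WHERE ident = 66091
ORDER BY ts;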
If it succeeds, can you post the explain plan here? I suggest trying it on both a compressed and an uncompressed chunk. Thanks.
Querying the chunk directly doesn't cause the error and returns results as expected.
The query I performed uses a filter on the time column to restrict it to one chunk, but the same query fails on the main table. I'll try it on all chunks and report back.
I tested all chunks and the error occurs whenever I execute it on a compressed chunk. Executing on an uncompressed chunk directly does not lead to an error. EDIT: It only affects chunks that were compressed since the update. So I guess there might be data corruption in those chunks? Chunks compressed prior to upgrading are not affected.
Can you send the compression settings you are using on this hypertable? Thanks.
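For illustration, one way to pull the hypertable-level settings; the view and column names are an assumption (timescaledb_information.hypertable_compression_settings, available in recent 2.x releases), not output from this thread:

-- Assumed view: timescaledb_information.hypertable_compression_settings (TimescaleDB 2.14+).
SELECT *
FROM timescaledb_information.hypertable_compression_settings
WHERE hypertable = 'ts_rr'::regclass;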
Here's the compression settings output. Sorry, it doesn't fully align; please let me know if you need another output format or info from another table.
Have you changed these settings on the hypertable? What I'm wondering is whether you have compressed chunks with different compression settings and if that might be causing the problem. Can you check that? I understand you probably have a lot of chunks, just trying to get to the bottom of the problem. FWIW, I don't think this is data corruption, more of a planning bug, based on the fact that data is present when the chunk is queried directly.
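A hedged sketch of comparing settings across chunks, assuming the timescaledb_information.chunk_compression_settings view; differing segmentby/orderby values between chunks would be the thing to look for:

-- Assumed view: timescaledb_information.chunk_compression_settings.
SELECT *
FROM timescaledb_information.chunk_compression_settings
WHERE hypertable = 'ts_rr'::regclass;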
I haven't changed the settings in a long time. Most of the affected tables do not have that many chunks, as old chunks are deleted. Here's the output of the query. Again, it doesn't really align here, but the rows look the same for all chunks:
If there aren't a lot of chunks, you could potentially fix this by recompressing the old chunks, but let's not do that until we get to the bottom of why this is happening in the first place. I'm trying to reproduce this locally; any chance you can share the schema of your hypertable?
Here's the schema. Please note that this seems to happen to all tables with a similar schema. In one of them, the ident column also has a different data type. I could try to recompress one chunk and see if manual compression fixes it? As said before, chunks compressed before the upgrade to pg17 (and timescale 2.17) work fine, it's just the newly compressed chunks.
Can you manually uncompress the new chunks that are causing issues and run the query again to get the explain plan? I think recompressing a single chunk wouldn't help; you would have to do it for all the chunks on the hypertable. Since this is happening on multiple tables with the same structure, you can try doing it and see if it helps.
Uncompressing all affected compressed chunks of one table will take a while; I'd also have to check whether disk space is sufficient for that. For now, I decompressed one of the affected chunks. After decompressing, the query worked (it doesn't return any results, as the time is out of range for the chunk). I then compressed it again and ran the same query; the error occurred again. Prior to this, I performed: So I guess decompressing would make the error go away, but it isn't feasible due to the available disk space.
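For reference, a sketch of that decompress/recompress round trip on one chunk; the chunk name is a placeholder:

-- decompress_chunk()/compress_chunk() take the chunk's regclass; the name below is a placeholder.
SELECT decompress_chunk('_timescaledb_internal._hyper_1_100_chunk');
-- The query succeeds while the chunk is uncompressed.
SELECT compress_chunk('_timescaledb_internal._hyper_1_100_chunk');
-- After recompressing, the same query fails again with "negative bitmapset member not allowed".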
One more observation: the error does not seem to occur when I don't filter by
Can you show what the plan looks like for older chunks which aren't causing this problem? Actually, the whole plan would be good.
Sure:
Would you mind doing:
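One plausible way to gather per-chunk details like those posted below is the built-in chunk_compression_stats() function; whether this matches the exact request is an assumption:

-- Per-chunk compression status and before/after sizes for the hypertable.
SELECT chunk_name, compression_status,
       before_compression_total_bytes, after_compression_total_bytes
FROM chunk_compression_stats('ts_rr')
ORDER BY chunk_name;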
Sorry for the long back and forth, I'm still trying to figure out what's going on here. Thanks!
No worries, thanks for your extensive help! Here's the output:
Some further stats to give you an impression of the size of each chunk:
The details you requested were for a chunk that worked. Was that intended? Here are the details on the (recompressed) chunk that doesn't work. It looks different:
Could you check the max OID in your database?
Thanks.
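A sketch of one way to look at the highest assigned OIDs; checking pg_class is an assumption about what was meant:

-- Largest relation OIDs; anything above 2147483647 (2^31 - 1) is relevant to this bug.
SELECT relname, oid
FROM pg_class
ORDER BY oid DESC
LIMIT 5;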
All the way at the top. If it matters, the job with the highest OID is a TimescaleDB compression:
Yeah, that's the problem. The bug is triggered by OIDs being used improperly for certain queries, so this happens once OIDs grow large enough. I'm going to work on a bugfix for this, but in the meantime, as a workaround, you might want to consider migrating to a fresh instance to reset the OID values. I don't know how big of a problem that is for you. Hope this helps.
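Illustrative arithmetic only, not the planner code: an OID is an unsigned 32-bit value, while bitmapset members are signed ints, so any OID above 2^31 - 1 ends up negative when treated as signed:

-- Example: OID 3000000000 reinterpreted as a signed 32-bit integer becomes negative.
SELECT 3000000000 - 4294967296 AS as_signed_int32;  -- -1294967296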
Migrating over to a fresh instance wouldn't really be feasible. But I've stopped sorting by ts and will sort in the calling application. That seems to fix it for now, performance penalty also isn't too big. Many thanks for all your help, looking forward to the fix! |
We also encountered the error. @antekresic, I noticed that the MR has already been merged, but a new release has not yet been published. Could you please confirm whether building from the source code of the 2.17.x branch will include the fix?

UPD: Yes, it helped.

git clone --branch 2.17.x --single-branch https://github.com/timescale/timescaledb.git
cd timescaledb
./bootstrap
cd build && make
make install
# restart postgres
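After restarting PostgreSQL, a quick sanity check that the newly built extension is what the server sees (a sketch; whether an ALTER EXTENSION step is needed depends on the version string of the build):

-- Compare the packaged version with what the current database has installed.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'timescaledb';
-- If the installed version lags the new build, update it (run as the first command in a fresh session):
ALTER EXTENSION timescaledb UPDATE;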
This release contains performance improvements and bug fixes since the 2.17.1 release. We recommend that you upgrade at the next available opportunity.

**Features**

**Bugfixes**
* #7384 Fix using OIDs with bitmapsets
* #7388 Use-after-free in vectorized grouping by segmentby columns

**Thanks**
* @dx034 for reporting an issue with negative bitmapset members due to large OIDs
What type of bug is this?
Incorrect result
What subsystems and features are affected?
Query executor, Query planner
What happened?
Since upgrading to PostgreSQL 17 with timescaledb 2.17, I encounter the following error when performing or planning certain queries with sorting:
SQL Error [XX000]: ERROR: negative bitmapset member not allowed
Sample query:
SELECT ts, col1, col2, col3 FROM ts_rr WHERE ident=66091 order by ts
The query does not fail if the order by clause is missing.
ts is a TIMESTAMP column, ident is int. The table is partitioned by time with a chunk_time_interval of 24 hours. The error also occurs if a filter is applied to ts that would restrict the lookup to one chunk. This also happens if it only filters on uncompressed chunks. The table contains several billion rows, but with the filter the query shouldn't return more than 50 rows. VACUUM ANALYZE ts_rr was performed before executing the query. The error also occurs on other tables with similar structure. This might be related to the query planner, as the error already occurs when I execute the query with explain.
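For illustration, a minimal sketch of a hypertable with this shape; the col1..col3 types and the compression setup are assumptions, not the actual DDL:

-- Assumed column types; only ts (timestamp) and ident (int) are described above.
CREATE TABLE ts_rr (
    ts    timestamp NOT NULL,
    ident int,
    col1  double precision,
    col2  double precision,
    col3  double precision
);
SELECT create_hypertable('ts_rr', 'ts', chunk_time_interval => INTERVAL '24 hours');
-- Compression is enabled on the real table; the segmentby choice here is a guess.
ALTER TABLE ts_rr SET (timescaledb.compress, timescaledb.compress_segmentby = 'ident');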
TimescaleDB version affected
2.17.1
PostgreSQL version used
17.0
What operating system did you use?
debian bookworm
What installation method did you use?
Deb/Apt
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
How can we reproduce the bug?