Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock when attempting to materialize multiple continuous aggregates #2308

Closed
nrlulz opened this issue Sep 3, 2020 · 9 comments
Closed

Comments

@nrlulz
Copy link

nrlulz commented Sep 3, 2020

Relevant system information:

  • OS: Ubuntu Server 18.04
  • PostgreSQL version (output of postgres --version): 12.4
  • TimescaleDB version (output of \dx in psql): 1.7.3
  • Installation method: 1.7.1 AMI

Describe the bug
I am unable to materialize more than one continuous aggregate view at a time due to deadlock

To Reproduce

CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE TABLE readings (ts timestamptz NOT NULL, device int4 NOT NULL, value float8 NOT NULL);
SELECT * FROM create_hypertable('readings', 'ts');
INSERT INTO readings VALUES ('2020-01-01 00:00:00+00', 1, 1.234), ('2020-02-01 00:00:00+00', 1, 5.678);
CREATE VIEW readings_15m WITH (timescaledb.continuous) AS SELECT time_bucket('15m', ts) AS ts, device, AVG(value) AS value FROM readings GROUP BY device, time_bucket('15m', ts);
CREATE VIEW readings_60m WITH (timescaledb.continuous) AS SELECT time_bucket('60m', ts) AS ts, device, AVG(value) AS value FROM readings GROUP BY device, time_bucket('60m', ts);

Expected behavior
The two continuous aggregates would materialize concurrently. I feel like this used to be possible, but I am not sure if this is supposed to be a supported thing or if it's a bug.

Actual behavior
One begins materializing, the other fails with a deadlock detected error.

Additional context
Output of SELECT * FROM timescaledb_information.continuous_aggregate_stats;

view_name   |completed_threshold   |invalidation_threshold|job_id|last_run_started_at|last_successful_finish|last_run_status|job_status|last_run_duration|next_scheduled_run |total_runs|total_successes|total_failures|total_crashes|
------------|----------------------|----------------------|------|-------------------|----------------------|---------------|----------|-----------------|-------------------|----------|---------------|--------------|-------------|
readings_15m|2020-01-31 16:15:00-08|2020-01-31 16:15:00-08|  1008|2020-09-02 17:17:39|   2020-09-02 17:17:39|Success        |Scheduled |  00:00:00.019543|2020-09-02 17:47:39|       149|            149|             0|            0|
readings_60m|2020-01-01 12:00:00-08|2020-01-31 16:15:00-08|  1009|2020-09-02 17:17:36|   2020-09-02 17:17:36|Failed         |Scheduled |  00:00:01.005059|2020-09-02 19:15:45|         2|              1|             1|            0|

Relevant slice of postgresql log

2020-09-03 00:17:36.971 UTC [28395] LOG:  new materialization range for public.readings (time column ts) larger than allowed in one run, truncating 2020-02-01 00:15:00+00 to 2020-01-01 10:00:00+00
2020-09-03 00:17:36.971 UTC [28395] LOG:  materializing continuous aggregate public.readings_15m: nothing to invalidate, new range up to 2020-01-01 10:00:00+00
2020-09-03 00:17:37.972 UTC [28396] LOG:  job 1009 threw an error
2020-09-03 00:17:37.972 UTC [28396] ERROR:  deadlock detected
2020-09-03 00:17:37.972 UTC [28396] DETAIL:  Process 28396 waits for ShareLock on transaction 97084915; blocked by process 28395.
        Process 28395 waits for AccessExclusiveLock on relation 12398811 of database 12398506; blocked by process 28396.
        Process 28396: <command string not enabled>
        Process 28395: <command string not enabled>
2020-09-03 00:17:37.972 UTC [28396] HINT:  See server log for query details.
2020-09-03 00:17:37.972 UTC [28396] CONTEXT:  while locking tuple (0,42) in relation "continuous_aggs_invalidation_threshold"
2020-09-03 00:17:37.974 UTC [28395] LOG:  the materialize continuous aggregate job is scheduled to run again immediately
2020-09-03 00:17:37.974 UTC [19360] LOG:  background worker "Continuous Aggregate Background Job" (PID 28396) exited with exit code 1
@RedShift1
Copy link

Does the 2.0.0 milestone mean it won't be fixed in a next 1.7.x release? Seems like a pretty serious bug.

@erimatnor
Copy link
Contributor

@RedShift1 no it doesn't mean we won't fix it for the 1.7.x branch.

@erimatnor
Copy link
Contributor

@nrlulz is this issue happening continuously, i.e., does one of the background jobs always fail or does it eventually resolve itself?

@nrlulz
Copy link
Author

nrlulz commented Sep 15, 2020

It is no single job that fails. It eventually resolves itself, I just can only run one at a time.

@erimatnor erimatnor modified the milestones: 2.0.0, 1.7.5 Sep 21, 2020
@erimatnor
Copy link
Contributor

We have yet to encounter this issue on 2.0 RC, so retargeting this for next 1.7.x release.

@erimatnor erimatnor modified the milestones: 2.0.0, 1.7.5 Oct 21, 2020
@erimatnor erimatnor removed this from the 1.7.5 milestone Jan 18, 2021
@erimatnor
Copy link
Contributor

We're trying to verify whether this is still a problem in 2.0. If so, we'll try to target a fix for that branch instead.

@NunoFilipeSantos
Copy link
Contributor

@nrlulz can you please try against our latest TimescaleDB version (2.4.2) and get back to us?

@nrlulz
Copy link
Author

nrlulz commented Oct 1, 2021

@NunoFilipeSantos I have not seen this happen in any of our databases since upgrading to 2.0. They are all running 2.4.2 now.

@NunoFilipeSantos
Copy link
Contributor

That is wonderful news! 🥇
Closing this issue, if you encounter it again on 2.X, feel free to reopen it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants