fix(orchestrator): slightly improve the timeout query #2814

Merged: 1 commit, Oct 7, 2024

@TBonnin TBonnin commented Oct 3, 2024

Slightly simplifying the WHERE clause that selects tasks that have timed out, from

CREATED AND created_timeout < NOW
OR STARTED AND heartbeat_timeout < NOW
OR STARTED AND completed_timeout < NOW

to

CREATED AND created_timeout < NOW
OR (STARTED AND (heartbeat_timeout < NOW OR completed_timeout < NOW))
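
The two forms are equivalent by the distributive law, `(A AND B) OR (A AND C) = A AND (B OR C)`. As a minimal sketch, the check below enumerates every combination of inputs and compares both predicates. The `TaskFlags` shape, the function names, and the `'OTHER'` state are hypothetical and only stand in for the real columns; the sketch models plain two-valued booleans and does not model SQL NULL handling.

```typescript
// Hypothetical model of a task row: only the state and whether each timeout has elapsed.
type State = 'CREATED' | 'STARTED' | 'OTHER'; // 'OTHER' stands in for any other state

interface TaskFlags {
    state: State;
    createdTimedOut: boolean;
    heartbeatTimedOut: boolean;
    completedTimedOut: boolean;
}

// Original predicate: three OR-ed branches, two of which test STARTED.
function timedOutBefore(t: TaskFlags): boolean {
    return (
        (t.state === 'CREATED' && t.createdTimedOut) ||
        (t.state === 'STARTED' && t.heartbeatTimedOut) ||
        (t.state === 'STARTED' && t.completedTimedOut)
    );
}

// Simplified predicate: STARTED is tested once and the two timeouts are grouped.
function timedOutAfter(t: TaskFlags): boolean {
    return (
        (t.state === 'CREATED' && t.createdTimedOut) ||
        (t.state === 'STARTED' && (t.heartbeatTimedOut || t.completedTimedOut))
    );
}

// Exhaustively compare both predicates over all 3 * 2 * 2 * 2 input combinations.
function predicatesAgree(): boolean {
    const states: State[] = ['CREATED', 'STARTED', 'OTHER'];
    const bools = [false, true];
    for (const state of states) {
        for (const createdTimedOut of bools) {
            for (const heartbeatTimedOut of bools) {
                for (const completedTimedOut of bools) {
                    const t: TaskFlags = { state, createdTimedOut, heartbeatTimedOut, completedTimedOut };
                    if (timedOutBefore(t) !== timedOutAfter(t)) {
                        return false;
                    }
                }
            }
        }
    }
    return true;
}
```

Because the set of selected rows is unchanged, the only expected difference is in how the planner evaluates the condition.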

Query plan before

Update on tasks  (cost=17.97..26.04 rows=1 width=91)
  ->  Nested Loop  (cost=17.97..26.04 rows=1 width=91)
        ->  HashAggregate  (cost=17.41..17.42 rows=1 width=56)
              Group Key: "ANY_subquery".id
              ->  Subquery Scan on "ANY_subquery"  (cost=13.33..17.40 rows=1 width=56)
                    ->  LockRows  (cost=13.33..17.39 rows=1 width=22)
                          ->  Bitmap Heap Scan on tasks tasks_1  (cost=13.33..17.38 rows=1 width=22)
                                Recheck Cond: ((state = 'CREATED'::nango_scheduler.task_states) OR (state = 'STARTED'::nango_scheduler.task_states) OR (state = 'STARTED'::nango_scheduler.task_states))
                                Filter: (((state = 'CREATED'::nango_scheduler.task_states) AND ((starts_after + ((created_to_started_timeout_secs)::double precision * '00:00:01'::interval)) < CURRENT_TIMESTAMP)) OR ((state = 'STARTED'::nango_scheduler.task_states) AND ((last_heartbeat_at + ((heartbeat_timeout_secs)::double precision * '00:00:01'::interval)) < CURRENT_TIMESTAMP)) OR ((state = 'STARTED'::nango_scheduler.task_states) AND ((last_state_transition_at + ((started_to_completed_timeout_secs)::double precision * '00:00:01'::interval)) < CURRENT_TIMESTAMP)))
                                ->  BitmapOr  (cost=13.33..13.33 rows=1 width=0)
                                      ->  Bitmap Index Scan on idx_tasks_state  (cost=0.00..4.44 rows=1 width=0)
                                            Index Cond: (state = 'CREATED'::nango_scheduler.task_states)
                                      ->  Bitmap Index Scan on idx_tasks_state  (cost=0.00..4.44 rows=1 width=0)
                                            Index Cond: (state = 'STARTED'::nango_scheduler.task_states)
                                      ->  Bitmap Index Scan on idx_tasks_state  (cost=0.00..4.44 rows=1 width=0)
                                            Index Cond: (state = 'STARTED'::nango_scheduler.task_states)
        ->  Index Scan using tasks_pkey on tasks  (cost=0.56..8.58 rows=1 width=296)
              Index Cond: (id = "ANY_subquery".id)

Query plan after

Update on tasks t  (cost=13.55..21.59 rows=1 width=123)
  CTE eligible_tasks
    ->  LockRows  (cost=8.89..12.99 rows=1 width=292)
          ->  Bitmap Heap Scan on tasks  (cost=8.89..12.98 rows=1 width=292)
                Recheck Cond: ((state = 'CREATED'::nango_scheduler.task_states) OR (state = 'STARTED'::nango_scheduler.task_states))
                Filter: (((state = 'CREATED'::nango_scheduler.task_states) AND ((starts_after + ((created_to_started_timeout_secs)::double precision * '00:00:01'::interval)) < CURRENT_TIMESTAMP)) OR ((state = 'STARTED'::nango_scheduler.task_states) AND (((last_heartbeat_at + ((heartbeat_timeout_secs)::double precision * '00:00:01'::interval)) < CURRENT_TIMESTAMP) OR ((last_state_transition_at + ((started_to_completed_timeout_secs)::double precision * '00:00:01'::interval)) < CURRENT_TIMESTAMP))))
                ->  BitmapOr  (cost=8.89..8.89 rows=1 width=0)
                      ->  Bitmap Index Scan on idx_tasks_state  (cost=0.00..4.44 rows=1 width=0)
                            Index Cond: (state = 'CREATED'::nango_scheduler.task_states)
                      ->  Bitmap Index Scan on idx_tasks_state  (cost=0.00..4.44 rows=1 width=0)
                            Index Cond: (state = 'STARTED'::nango_scheduler.task_states)
  ->  Nested Loop  (cost=0.56..8.60 rows=1 width=123)
        ->  CTE Scan on eligible_tasks e  (cost=0.00..0.02 rows=1 width=120)
        ->  Index Scan using tasks_pkey on tasks t  (cost=0.56..8.58 rows=1 width=22)
              Index Cond: (id = e.id)

I switched to knex.raw because the knex builder syntax isn't shorter or easier to read. I am also using a CTE instead of an IN subquery to make the query slightly easier to read.
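
For readers without the diff at hand, here is a rough sketch of what a CTE-based query of this shape could look like. It is reconstructed from the "after" query plan, not copied from the commit: the table, column, and CTE names come from the plan, while `buildExpireTasksSql`, the `SET` clause, the `'EXPIRED'` value, the `RETURNING` list, and the exact locking clause are assumptions.

```typescript
// Hypothetical reconstruction based on the "after" plan; not the actual query from this PR.
export function buildExpireTasksSql(): string {
    return `
WITH eligible_tasks AS (
    SELECT id
    FROM tasks
    WHERE (state = 'CREATED'
           AND starts_after + created_to_started_timeout_secs * INTERVAL '1 second' < CURRENT_TIMESTAMP)
       OR (state = 'STARTED'
           AND (last_heartbeat_at + heartbeat_timeout_secs * INTERVAL '1 second' < CURRENT_TIMESTAMP
                OR last_state_transition_at + started_to_completed_timeout_secs * INTERVAL '1 second' < CURRENT_TIMESTAMP))
    FOR UPDATE -- the plan shows LockRows; the exact locking clause is an assumption
)
UPDATE tasks t
SET state = 'EXPIRED' -- assumption: the real SET clause is not shown in this PR description
FROM eligible_tasks e
WHERE t.id = e.id
RETURNING t.id`;
}
```

With a configured Knex instance (called `db` here as an assumption), the string could be executed with `await db.raw(buildExpireTasksSql())`. Keeping the SQL in one raw string keeps it close to what appears in `EXPLAIN` output, which makes plans like the ones above easier to map back to the source.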

@TBonnin TBonnin force-pushed the tbonnin/nan-1774/orch-timeout-query branch from e3ab019 to 70686f1 Compare October 3, 2024 16:49
@TBonnin TBonnin force-pushed the tbonnin/nan-1774/orch-timeout-query branch from 70686f1 to c4ad566 Compare October 7, 2024 11:16
@TBonnin TBonnin enabled auto-merge (squash) October 7, 2024 11:16
@TBonnin TBonnin force-pushed the tbonnin/nan-1774/orch-timeout-query branch from c4ad566 to 27e9937 Compare October 7, 2024 11:37
@TBonnin TBonnin merged commit 0937e98 into master Oct 7, 2024
21 checks passed
@TBonnin TBonnin deleted the tbonnin/nan-1774/orch-timeout-query branch October 7, 2024 11:45