Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grab watchdog lock before checking connection status #636

Merged
merged 3 commits into from
Jan 19, 2022
Merged

Grab watchdog lock before checking connection status #636

merged 3 commits into from
Jan 19, 2022

Conversation

pmrv
Copy link
Contributor

@pmrv pmrv commented Jan 17, 2022

Before we first check whether the connection is alive, then kick the
watchdog and only then grab the lock to make the SQL query. There's a
race condition in this as sketched below, where the watchdog acquires
the lock and closes the connection after we check that it's still alive,
but before we acquire the lock.

Main Thread                     Watchdog Thread
-----------                     ---------------

                                TIMER RUNS OUT
                                        |
                                        v
CHECK CONNECTION        ~       ACQUIRE LOCK
    |                                   |
    |                                   v
    |                           CLOSE CONNECTION
    |                                   |
    |                                   v
    |                           RELEASE LOCK
    |
    v
ACQUIRE LOCK
    |
    v
EXECUTE SQL QUERY
    |
    v
RELEASE LOCK

The solution is to protect also the checking of the connection with the
same lock, so that closing of the connection happens only before or
after we checked the connection and executed our query.

@prince-mathews Please check whether your issue from the meeting persist with this branch.

Before we first check whether the connection is alive, then kick the
watchdog and only then grab the lock to make the SQL query.  There's a
race condition in this as sketched below, where the watchdog acquires
the lock and closes the connection after we check that it's still alive,
but before we acquire the lock.

```
Main Thread                     Watchdog Thread
-----------                     ---------------

                                TIMER RUNS OUT
                                        |
                                        v
CHECK CONNECTION        ~       ACQUIRE LOCK
    |                                   |
    |                                   v
    |                           CLOSE CONNECTION
    |                                   |
    |                                   v
    |                           RELEASE LOCK
    |
    v
ACQUIRE LOCK
    |
    v
EXECUTE SQL QUERY
    |
    v
RELEASE LOCK
```

The solution is to protect also the checking of the connection with the
same lock, so that closing of the connection happens only before or
after we checked the connection *and* executed our query.
@pmrv pmrv added the bug Something isn't working label Jan 17, 2022
@jan-janssen
Copy link
Member

That might also fix the issue I worked around in #470

@prince-mathews
Copy link
Contributor

I used the branch and I don't get the error anymore. Thank you :)

@pmrv
Copy link
Contributor Author

pmrv commented Jan 18, 2022

I used the branch and I don't get the error anymore. Thank you :)

Great! This is ready to merge then imo.

@niklassiemer niklassiemer added the format_black reformat the code using the black standard label Jan 18, 2022
@niklassiemer niklassiemer removed the format_black reformat the code using the black standard label Jan 18, 2022
@pmrv pmrv merged commit 2445037 into master Jan 19, 2022
@delete-merged-branch delete-merged-branch bot deleted the watchdog branch January 19, 2022 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants