"Crypt Key missing" - Worker (pid:XX) was sent SIGKILL! #8065

Closed
ghost opened this issue Oct 24, 2024 · 2 comments

ghost commented Oct 24, 2024

Please note that security bugs or issues should be reported to [email protected].

Describe the bug

Whenever a query consumes a lot of resources, it produces an error that makes the pgAdmin container restart, with the worker receiving a SIGKILL first:

172.25.9.248 - - [24/Oct/2024:15:37:40 +0000] "GET /sqleditor/poll/8544899 HTTP/1.1" 200 453 "https://pgadmin-pgadmin-prod.apps.dev.ocp.domain.com/sqleditor/panel/8544899?is_query_tool=true&sgid=361&sid=1856&did=16413&database_name=eutras01-prod" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0"
172.25.9.248 - - [24/Oct/2024:15:37:41 +0000] "GET /sqleditor/poll/8544899 HTTP/1.1" 200 453 "https://pgadmin-pgadmin-prod.apps.dev.ocp.domain.com/sqleditor/panel/8544899?is_query_tool=true&sgid=361&sid=1856&did=16413&database_name=eutras01-prod" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36 Edg/130.0.0.0"
[2024-10-24 15:37:41 +0000] [1] [ERROR] **Worker (pid:21) was sent SIGKILL! Perhaps out of memory?**
[2024-10-24 15:37:41 +0000] [122] [INFO] Booting worker with pid: 122
2024-10-24 15:37:48,566: INFO	pgadmin:	########################################################
2024-10-24 15:37:48,566: INFO	pgadmin:	Starting pgAdmin 4 v8.12...
2024-10-24 15:37:48,566: INFO	pgadmin:	########################################################
2024-10-24 15:37:48,566: DEBUG	pgadmin:	Python syspath: ['/pgadmin4', '/venv/bin', '/pgadmin4', '/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload', '/venv/lib/python3.12/site-packages', '/usr/lib/python3.12/site-packages', '/venv/lib/python3.12/site-packages/setuptools/_vendor']
2024-10-24 15:37:50,774: INFO	pgadmin:	Registering blueprint module: <AboutModule 'about'>
2024-10-24 15:37:50,775: INFO	pgadmin:	Registering blueprint module: <AuthenticateModule 'authenticate'>
2024-10-24 15:37:50,776: INFO	pgadmin:	Registering blueprint module: <BrowserModule 'browser'>
2024-10-24 15:37:53,981: INFO	pgadmin:	Registering blueprint module: <DashboardModule 'dashboard'>
2024-10-24 15:37:54,044: INFO	pgadmin:	Registering blueprint module: <HelpModule 'help'>
2024-10-24 15:37:54,044: INFO	pgadmin:	Registering blueprint module: <MiscModule 'misc'>
2024-10-24 15:37:56,478: INFO	pgadmin:	Registering blueprint module: <PreferencesModule 'preferences'>
2024-10-24 15:37:56,482: INFO	pgadmin:	Registering blueprint module: <PgAdminModule 'redirects'>
2024-10-24 15:37:56,484: INFO	pgadmin:	Registering blueprint module: <SettingsModule 'settings'>
2024-10-24 15:37:56,488: INFO	pgadmin:	Registering blueprint module: <ToolsModule 'tools'>
2024-10-24 15:37:58,266: DEBUG	pgadmin:	Config server mode: True
2024-10-24 15:37:58,267: DEBUG	pgadmin:	Not running under the desktop runtime, port: 5050
2024-10-24 15:37:59,647: ERROR	pgadmin:	'pinged'
Traceback (most recent call last):
  File "/venv/lib/python3.12/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/misc/__init__.py", line 154, in cleanup
    driver.ping()
  File "/pgadmin4/pgadmin/utils/driver/__init__.py", line 34, in ping
    DriverRegistry._objects[type].gc_timeout()
  File "/pgadmin4/pgadmin/utils/driver/psycopg3/__init__.py", line 253, in gc_timeout
    if curr_time - sess_mgr['pinged'] >= session_idle_timeout:
                   ~~~~~~~~^^^^^^^^^^
KeyError: 'pinged' 
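
The `KeyError: 'pinged'` above comes from indexing a session-manager dict that has no `'pinged'` entry, possibly because the worker had just been restarted. A minimal sketch of a more defensive idle check, assuming a plain dict per session. The helper `is_idle`, the dict layout, and the timeout value are hypothetical; this is not pgAdmin's actual code or the fix that was applied:

```python
import time

SESSION_IDLE_TIMEOUT = 60.0  # seconds; hypothetical value for illustration


def is_idle(sess_mgr, now=None, timeout=SESSION_IDLE_TIMEOUT):
    """Return True if the session should be garbage-collected.

    A session without a 'pinged' timestamp (for example, state rebuilt
    after a worker restart) is treated as idle instead of raising KeyError.
    """
    now = time.time() if now is None else now
    last_ping = sess_mgr.get("pinged")  # None when the key is absent
    if last_ping is None:
        return True
    return now - last_ping >= timeout
```

Treating a missing timestamp as "idle" is one possible choice; treating it as "just pinged" would be equally easy and keeps the session alive instead.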

This is for the Crypt Key Missing part:

2024-10-24 15:39:53,980: INFO	pgadmin:	Released a lock.
2024-10-24 15:39:53,980: INFO	pgadmin:	Failed to connect to the database server(#1856) for connection (DB:postgres) with error message as below:connection failed: connection to server at "10.183.96.169", port 5444 failed: fe_sendauth: no password supplied
2024-10-24 15:39:53,980: ERROR	pgadmin:	'CONN:3930432'
Traceback (most recent call last):
  File "/venv/lib/python3.12/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/flask/views.py", line 110, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/browser/utils.py", line 309, in dispatch_request
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/flask_login/utils.py", line 290, in decorated_view
    return current_app.ensure_sync(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/authenticate/mfa/utils.py", line 304, in inner
    return mfa_enabled(
           ^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/authenticate/mfa/utils.py", line 169, in mfa_enabled
    return execute_if_enabled()
           ^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/authenticate/mfa/utils.py", line 301, in if_else_func_inner
    return _func(first, second)
           ^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/authenticate/mfa/utils.py", line 242, in mfa_session_authenticated
    return authenticated() if session.get('mfa_authenticated', False) is True \
           ^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/authenticate/mfa/utils.py", line 297, in execute_func
    return wrapped(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/user_login_check.py", line 22, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/browser/server_groups/servers/__init__.py", line 994, in properties
    manager = driver.connection_manager(sid)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pgadmin4/pgadmin/utils/driver/psycopg3/__init__.py", line 117, in connection_manager
    manager._restore_connections()
  File "/pgadmin4/pgadmin/utils/driver/psycopg3/server_manager.py", line 393, in _restore_connections
    conn = self.connections[conn_id]
           ~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'CONN:3930432'

The pod is then restarted immediately and the user receives a "Crypt Key Missing" error, because the pgAdmin pod doesn't recover gracefully from the SIGKILL and doesn't show the master password prompt again.

The pod is restarted so fast that pgAdmin still shows the query editor, but you have to refresh the whole page (F5) to make it work again. There's no auto-refresh and no sign of a disconnection.
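
For context on the SIGKILL itself: on POSIX systems a process cannot catch, block, or ignore SIGKILL, so a killed worker gets no chance to clean up or notify the client. A small Python sketch that shows this (Linux/macOS only, run from the main thread; `can_install_handler` is just an illustrative helper name):

```python
import signal


def can_install_handler(signum):
    """Try to install a no-op handler; restore the old one if that worked."""
    try:
        previous = signal.signal(signum, lambda s, frame: None)
    except OSError:
        # The kernel refuses handlers for SIGKILL (and SIGSTOP).
        return False
    if previous is not None:
        signal.signal(signum, previous)
    return True


print(can_install_handler(signal.SIGTERM))  # handler accepted
print(can_install_handler(signal.SIGKILL))  # handler rejected
```

So any recovery has to happen after the restart, or the kill has to be avoided in the first place.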

To Reproduce

Access a database through the pgAdmin container and run a query until it times out. We are running a query that returns 90M rows, with a 6 GB limit on the pod and 600m CPU.
The query is very bad, yes: SELECT * from schema.table; but we're trying to reproduce the error that some users have reported recently from different DBs and clusters.
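
As a rough sanity check on these numbers, and assuming (this is only an assumption) that "6Gb" is a 6 GB memory limit and that the whole result set would have to sit in the worker's memory at once, the per-row budget works out as follows:

```python
ROWS = 90_000_000               # rows returned by the query, from the report
MEMORY_LIMIT_BYTES = 6 * 10**9  # assumes "6Gb" means 6 GB (decimal) of memory

# Memory available per row if every row were buffered at the same time.
bytes_per_row = MEMORY_LIMIT_BYTES / ROWS
print(f"{bytes_per_row:.1f} bytes per row")  # 66.7 bytes per row
```

If rows average more than about 67 bytes in memory, a fully buffered result would not fit. Whether pgAdmin buffers the result that way is not established here.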

Expected behavior

I understand that predicting whether a query is going to time out is extremely complicated (if not impossible), so I would suggest showing a different error (such as a query timeout) instead of a SIGKILL that kills the app, because the container is then killed and reloaded. Also, the password prompt is not shown once pgAdmin has restarted; it shows the Crypt Key Missing error and you have to refresh the tool manually.

If there is a setting we can use to handle this from the pgAdmin side (how timeouts or wait times are handled), please advise on how to do this. If not, maybe the timeout could be handled so that the system at least shows a message such as "Query timed out, session disconnected" and kills the session, not the whole thing.

If you run the query directly against the database, it takes a long time, but it completes.

Error message

"Crypt Key is missing" from pgadmin. From the logs, I've attached the messages on the previous sections.

Screenshots

There's no OOM issue, no threshold has been surpassed.

image

Here's our CPU usage for the pod:
image

Here's the message:
image

Additional context

We're deploying the app with Helm on OpenShift; the pgAdmin 4 image version is REL-8_12-21-gff838e43d. Please let me know if there's more info you need.

Thank you!

@ghost ghost added the Bug label Oct 24, 2024
@adityatoshniwal
Contributor

Hi @andres-chavez-bi,
We'll need to investigate further why there was an exception (the reason behind the kill). I spent some time trying to figure it out but didn't find a cause. We could of course add a check to avoid pgAdmin being killed.
Regarding the Crypt Key Missing error: when a user logs in, the user's password is used as the crypt key and is stored in memory. When the pgAdmin process was killed, that in-memory data was lost along with the user's logged-in session, so the user has to log in again to start a new session.
This can be avoided by fixing the root cause of the process being killed, which will be taken care of before the next release.
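
To illustrate the point about the in-memory crypt key, here is a toy sketch of why a restarted process no longer has the key. This is not pgAdmin's code; `Worker`, `login`, `get_crypt_key`, and `CryptKeyMissing` are made-up names:

```python
class CryptKeyMissing(Exception):
    """Raised when no crypt key is held for the user."""


class Worker:
    """Stand-in for one server process; keys live only in this object."""

    def __init__(self):
        self._crypt_keys = {}  # user -> key, never written to disk

    def login(self, user, password):
        # Per the explanation above, the user's password serves as the crypt key.
        self._crypt_keys[user] = password

    def get_crypt_key(self, user):
        try:
            return self._crypt_keys[user]
        except KeyError:
            raise CryptKeyMissing(f"Crypt key missing for {user}") from None


worker = Worker()
worker.login("alice", "s3cret")
assert worker.get_crypt_key("alice") == "s3cret"

worker = Worker()  # simulates the process being killed and restarted
try:
    worker.get_crypt_key("alice")
except CryptKeyMissing as exc:
    print(exc)  # Crypt key missing for alice
```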

Thanks.

@pravesh-sharma pravesh-sharma self-assigned this Nov 14, 2024
@pravesh-sharma pravesh-sharma moved this from 🆕 New to 🏗 In Progress in Current Sprint (184) Nov 14, 2024
@akshay-joshi akshay-joshi added this to the 8.14 milestone Nov 21, 2024
@pravesh-sharma pravesh-sharma moved this from In Testing to 🏗 In Progress in Current Sprint (184) Nov 28, 2024
yogeshmahajan-1903 added a commit to yogeshmahajan-1903/pgadmin4 that referenced this issue Nov 28, 2024
yogeshmahajan-1903 added a commit to yogeshmahajan-1903/pgadmin4 that referenced this issue Nov 28, 2024
@yogeshmahajan-1903 yogeshmahajan-1903 moved this from 🏗 In Progress to In Review in Current Sprint (184) Nov 28, 2024
@pravesh-sharma pravesh-sharma moved this from In Review to In Testing in Current Sprint (184) Dec 2, 2024
@pravesh-sharma
Contributor

Issue fixed, verified on snapshot build
