auth: periodically clean up expired sessions in the web_sessions system table #51169

Closed
dhartunian opened this issue Jul 8, 2020 · 4 comments · Fixed by #67547
Labels
A-authentication: Pertains to authn subsystems
A-cc-enablement: Pertains to current CC production issues or short-term projects
A-security
A-webui: Triage label for DB Console (fka admin UI) issues. Add this if nothing else is clear.
A-webui-security
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-server-and-security: DB Server & Security
X-server-triaged-202105

Comments

@dhartunian
Collaborator

Is your feature request related to a problem? Please describe.
As it stands, the web_sessions table will keep growing and accumulating expired sessions indefinitely.

Describe the solution you'd like
The table should either delete a user's old sessions upon creating a new session for them, or periodically delete expired sessions from the table to keep it from growing indefinitely.

Describe alternatives you've considered
N/A

Additional context
N/A

@dhartunian dhartunian added A-webui-security A-webui Triage label for DB Console (fka admin UI) issues. Add this if nothing else is clear. labels Jul 8, 2020
@blathers-crl

blathers-crl bot commented Jul 8, 2020

Hi @dhartunian, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 8, 2020
@jlinder jlinder added the T-server-and-security DB Server & Security label Jun 16, 2021
@knz
Contributor

knz commented Jun 22, 2021

I have investigated this a little bit.

  • We want auto-expiration for the following conditions:

    • expiresAt is older than a configurable interval (new cluster setting server.web_session.purge_ttl), default 1 hour.
    • revokedAt is older than the same interval.
    • lastUsedAt is older than another configurable interval (new cluster setting server.web_session.auto_logout_timeout), with a default of 1 week.
  • We should also be careful not to require full table scans for these purges. Assuming an index on each of these columns, we should use 3 different DELETE statements, not just 1 with a combined condition. It's also probably necessary to spell out the index name explicitly, e.g. DELETE FROM system.web_sessions@"web_sessions_expiresAt_idx" .... An EXPLAIN statement should confirm that the index is used.

  • In reference to the index assumption above: we already have SQL indexes on expiresAt, but not on the two other columns. Therefore, we should add additional indexes.

  • Even with 3 separate DELETE statements, we should minimize the amount of back-and-forth between the authentication layer and SQL. Therefore, we should use a single multi-CTE statement (e.g. WITH a AS (DELETE ...), b AS (DELETE ...), ...)

  • It may also be worth bunching these DELETEs together with the existing INSERT statements to ensure that the inserts never catch up with deletes, e.g. WITH a AS (DELETE) INSERT ..., WITH a AS (DELETE) UPDATE.
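
As a rough illustration of the multi-CTE idea above, here is a minimal Go sketch that assembles such a statement as a string. Only web_sessions_expiresAt_idx is named in this discussion; the other two index names, the CTE names, the helper names, and the trailing SELECT 1 are assumptions made for the example. The real change would more likely attach the CTEs to the existing INSERT, and the resulting plan would still need to be checked with EXPLAIN.

```go
package main

import (
	"fmt"
	"strings"
)

// purgeClause describes one DELETE over system.web_sessions.
// All names used to fill it in main() are illustrative, except
// web_sessions_expiresAt_idx, which is mentioned in the issue.
type purgeClause struct {
	cte    string // name of the CTE (hypothetical)
	index  string // index forced via the @"..." hint
	column string // timestamp column compared against the cutoff
	arg    int    // placeholder number that carries the cutoff timestamp
}

// buildPurgeStmt joins one DELETE per indexed column into a single
// multi-CTE statement, so the purge costs one SQL round trip.
// final is the statement the CTEs are attached to.
func buildPurgeStmt(clauses []purgeClause, final string) string {
	ctes := make([]string, 0, len(clauses))
	for _, c := range clauses {
		// %q wraps the identifier in double quotes, which SQL needs
		// for mixed-case names such as expiresAt.
		ctes = append(ctes, fmt.Sprintf(
			`%s AS (DELETE FROM system.web_sessions@%q WHERE %q < $%d)`,
			c.cte, c.index, c.column, c.arg))
	}
	return "WITH " + strings.Join(ctes, ", ") + " " + final
}

func main() {
	// $1 = now - purge_ttl, $2 = now - auto_logout_timeout (assumed binding).
	stmt := buildPurgeStmt([]purgeClause{
		{"purge_expired", "web_sessions_expiresAt_idx", "expiresAt", 1},
		{"purge_revoked", "web_sessions_revokedAt_idx", "revokedAt", 1},
		{"purge_idle", "web_sessions_lastUsedAt_idx", "lastUsedAt", 2},
	}, "SELECT 1")
	fmt.Println(stmt)
}
```

Keeping one DELETE per column, rather than a single OR condition, is what lets each clause use its own index.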

So here's a possible strategy:

  1. add the 2 cluster settings.
  2. implement a new unit test TestPurgeSession in server/authentication_test.go that does the following:
    a. creates a test server.
    b. customizes the cluster settings to a lower value than defaults.
    c. inserts some seemingly-old entries in system.web_sessions using explicit SQL inserts. The age should be chosen so that it's older than the configured setting values, but younger than what would be expired assuming defaults. (This is important to check that the settings are effective.)
    d. calls (*authenticationServer).newAuthSession().
    e. verifies that newAuthSession() has deleted the test entries created in step c.

The code of TestCreateSession can be used for inspiration.
Of course at this point the new test is going to fail because the logic is not yet implemented.
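
The age constraint in step c can be modeled without a test server. The sketch below is only an in-memory approximation of the three purge conditions, not the server code: the type and function names are hypothetical, and the lowered setting values (1 minute each) and the 10-minute row age are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// purgeSettings mirrors the two proposed cluster settings.
type purgeSettings struct {
	purgeTTL          time.Duration // server.web_session.purge_ttl
	autoLogoutTimeout time.Duration // server.web_session.auto_logout_timeout
}

// session holds the three timestamps the purge inspects.
// A zero revokedAt means the session was never revoked.
type session struct {
	expiresAt, revokedAt, lastUsedAt time.Time
}

// shouldPurge reports whether any of the three conditions holds at time now.
func shouldPurge(s session, cfg purgeSettings, now time.Time) bool {
	if s.expiresAt.Before(now.Add(-cfg.purgeTTL)) {
		return true
	}
	if !s.revokedAt.IsZero() && s.revokedAt.Before(now.Add(-cfg.purgeTTL)) {
		return true
	}
	return s.lastUsedAt.Before(now.Add(-cfg.autoLogoutTimeout))
}

// demoPurge evaluates a row that expired and was last used 10 minutes ago,
// first under lowered settings and then under the proposed defaults.
func demoPurge() (underLowered, underDefaults bool) {
	now := time.Date(2021, time.June, 22, 12, 0, 0, 0, time.UTC)
	row := session{
		expiresAt:  now.Add(-10 * time.Minute),
		lastUsedAt: now.Add(-10 * time.Minute),
	}
	lowered := purgeSettings{purgeTTL: time.Minute, autoLogoutTimeout: time.Minute}
	defaults := purgeSettings{purgeTTL: time.Hour, autoLogoutTimeout: 7 * 24 * time.Hour}
	return shouldPurge(row, lowered, now), shouldPurge(row, defaults, now)
}

func main() {
	lowered, defaults := demoPurge()
	fmt.Println("purged under lowered settings:", lowered)
	fmt.Println("purged under defaults:", defaults)
}
```

A row that is purged only under the lowered settings is what shows that the settings, and not the defaults, drive the deletion.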

Then:

  1. create a new SQL migration to add the missing indexes on revokedAt and lastUsedAt. May need to use the new "long-running migrations" mechanism.
  2. modify (*authenticationServer).newAuthSession() to add the DELETE statements as CTEs for the existing INSERT statement.
  3. confirm that the new test passes.

@cameronnunez
Contributor

cameronnunez commented Jul 13, 2021

  • lastUsedAt is older than another configurable interval (new cluster setting server.web_session.auto_logout_timeout), with a default of 1 week.

@knz To be clear, the server.web_session_timeout cluster setting is used to mark an entry as expired (by default 1 week after being created), while the proposed server.web_session.auto_logout_timeout cluster setting is used to delete entries that have not been used for at least this duration, correct?

@knz
Contributor

knz commented Jul 13, 2021

the server.web_session_timeout cluster setting is used to mark an entry as expired (by default 1 week after being created)

Yes. You can think of the "expiration timeout" as "after this time, the user cannot log in any more".

the server.web_session.auto_logout_timeout proposed cluster setting is used to delete entries that have not been used for at least the length of this duration, correct?

yes. with emphasis on "have not been used".
You can think about "auto logout timeout" as "after this time, the user has not actually logged in any more" (assuming they could).
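
To make the distinction concrete, here is a small Go sketch under stated assumptions: the expiration timeout is measured from creation, while the auto-logout timeout is measured from last use. The function names and the 3-day auto-logout value are hypothetical and chosen only so that the two checks disagree; the 7-day session timeout matches the default mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// isExpired: the expiration timeout counts from session creation.
func isExpired(createdAt, now time.Time, sessionTimeout time.Duration) bool {
	return !now.Before(createdAt.Add(sessionTimeout))
}

// isIdle: the auto-logout timeout counts from the last use.
func isIdle(lastUsedAt, now time.Time, autoLogout time.Duration) bool {
	return now.Sub(lastUsedAt) > autoLogout
}

// example checks, on day 5, a session created on day 0 and last used on
// day 1, with a 7-day session timeout and a hypothetical 3-day auto-logout.
func example() (expired, idle bool) {
	created := time.Date(2021, time.July, 1, 0, 0, 0, 0, time.UTC)
	lastUsed := created.Add(1 * day)
	now := created.Add(5 * day)
	return isExpired(created, now, 7*day), isIdle(lastUsed, now, 3*day)
}

func main() {
	expired, idle := example()
	fmt.Println("expired:", expired, "idle:", idle)
}
```

In this scenario the session is not yet expired but has been idle long enough to be deleted, which is the case the second setting is meant to cover.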

@knz knz added A-authentication Pertains to authn subsystems A-cc-enablement Pertains to current CC production issues or short-term projects labels Jul 29, 2021
craig bot pushed a commit that referenced this issue Aug 12, 2021
67547: auth: set up periodic deletion of old sessions in the `web_sessions` system table r=knz,mberhault a=cameronnunez

Fixes [#51169](#51169).

Expired sessions are not cleaned up in the web_sessions system
table quickly enough. The table should be kept from growing
indefinitely in the long run. This patch sets up periodic
deletion of these expired sessions.

Release note (security update): Old authentication web session
rows in the system.web_sessions table no longer accumulate
indefinitely in the long run. These rows are periodically
deleted. Refer to the reference docs for details about the
new cluster settings for system.web_sessions.

68310: util/cgroups: method to read file-backed memory on inactive LRU list r=abarganier a=abarganier

util/cgroups: method to read file-backed memory on inactive LRU list

Currently we have methods that allow us to read the cgroup memory
limit, as well as the current memory usage, for processes running in
unix containers. However, to more accurately determine the current
memory usage in the eyes of the container provider, we must subtract
the "cache usage" from the total memory usage, which is represented
by the inactive file-backed memory stat.

From the Docker documentation:

"On Linux, the Docker CLI reports memory usage by subtracting cache
usage from the total memory usage. The API does not perform such a
calculation but rather provides the total memory usage and the amount
from the cache so that clients can use the data as needed. The cache
usage is defined as the value of total_inactive_file field in the
memory.stat file on cgroup v1 hosts...On cgroup v2 hosts, the cache
usage is defined as the value of inactive_file field."

https://docs.docker.com/engine/reference/commandline/stats/#extended-description

In an effort to gain better observability into current memory usage in
the eyes of the container provider for purposes of identifying whether
or not a CRDB node is on its way to OOM, we add the ability to read
these values from the memory subsystem. This heuristic can then be
used in debug tools such as periodic query dumps and eventually,
more generalized crash dump platforms.

Informs #66901

68337: importccl: only initialize progress fields on first import attempt r=pbardea a=pbardea

Fixes #68247.

Release note (bug fix): Fix a bug where IMPORT would incorrectly reset
its progress upon resumption.

68394: storage: elide nonexistent files from registry r=jbowens a=jbowens

When the encryption-at-rest registry is loaded, elide any file entries
corresponding to files that do not exist on the filesystem. These
entries may exist because files were manually deleted by an operator, an
operation to update the registry failed or because the files were
deleted through a codepath that failed to update the registry.

Release note (bug fix): Fixes a bug where encryption-at-rest registry
would accumulate nonexistent file entries forever, contributing to
filesystem operations' latency on the store.

----

I'd like to follow this up with a change that implements `RemoveAll`
on `encryptedFS`. It's not as straightforward of a change. This
patch at least ensures process restarts clear all accumulated
garbage.

68697: opt: add format=hide-hist option r=mgartner a=mgartner

The `format=hide-hist` option for optimizer tests has been added. It
allows stats to be shown in optimizer test output without histograms.

This is useful during debugging when you want to view stats like row
count, but do not want the clutter of histograms.

Additionally, using `format=show-stats` with `optstepsweb` creates
base-64 encoded URLs longer than the maximum URL length supported by
browsers (typically ~2000 characters). You can now use
`optstepsweb format=(show-stats,hide-hist)` to view high-level
statistics in `optstepsweb`.

Release note: None

68743: sql: prevent internal error when altering database placement r=mgartner a=mgartner

The parser was updated in #68068 to support a new syntax for altering
database placement:

    ALTER DATABASE d SET PLACEMENT DEFAULT
    ALTER DATABASE d SET PLACEMENT RESTRICTED

Running these statements causes internal errors because there is no
execution support for them yet. This commit prevents an internal error
when they are executed, and returns a user-friendly error instead.

Informs #65475

Release note: None

68796: sql: fix panic using Reset() when using sqlstats iterator r=knz,maryliag a=Azhng

Previously, if SQLStats.Reset() was called while iterating
through an iterator, it would cause a panic, since the iterator
would then return a nil pointer.

This commit changes the iterator to handle this situation gracefully
and prevent the panic.

Resolves #68785

Release note: None

Co-authored-by: Cameron Nunez
Co-authored-by: Alex Barganier
Co-authored-by: Paul Bardea
Co-authored-by: Jackson Owens
Co-authored-by: Marcus Gartner
Co-authored-by: Azhng
@craig craig bot closed this as completed in #67547 Aug 12, 2021
4 participants