67547: auth: set up periodic deletion of old sessions in the `web_sessions` system table r=knz,mberhault a=cameronnunez

Fixes [#51169](#51169).

Expired sessions are not cleaned up in the web_sessions system
table quickly enough. The table should be kept from growing
indefinitely in the long run. This patch sets up periodic
deletion of these expired sessions.

Release note (security update): Old authentication web session
rows in the system.web_sessions table no longer accumulate
indefinitely in the long run. These rows are periodically
deleted. Refer to the reference docs for details about the
new cluster settings for system.web_sessions.

68310: util/cgroups: method to read file-backed memory on inactive LRU list r=abarganier a=abarganier

util/cgroups: method to read file-backed memory on inactive LRU list

Currently we have methods that allow us to read the cgroup memory
limit, as well as the current memory usage, for processes running in
unix containers. However, to more accurately determine the current
memory usage in the eyes of the container provider, we must subtract
the "cache usage" from the total memory usage, which is represented
by the inactive file-backed memory stat.

From the Docker documentation:

"On Linux, the Docker CLI reports memory usage by subtracting cache
usage from the total memory usage. The API does not perform such a
calculation but rather provides the total memory usage and the amount
from the cache so that clients can use the data as needed. The cache
usage is defined as the value of total_inactive_file field in the
memory.stat file on cgroup v1 hosts...On cgroup v2 hosts, the cache
usage is defined as the value of inactive_file field."

https://docs.docker.com/engine/reference/commandline/stats/#extended-description

In an effort to gain better observability into current memory usage in
the eyes of the container provider for purposes of identifying whether
or not a CRDB node is on its way to OOM, we add the ability to read
these values from the memory subsystem. This heuristic can then be
used in debug tools such as periodic query dumps and, eventually,
more generalized crash dump platforms.
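
A minimal sketch of the calculation described above, assuming `memory.stat` content in the usual `key value` per-line layout. The helper names `readStat` and `usageExcludingCache` are illustrative only and are not the methods added to `util/cgroups`; the key is `total_inactive_file` on cgroup v1 and `inactive_file` on cgroup v2, per the Docker documentation quoted above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// readStat scans memory.stat-style content ("key value" per line) and
// returns the value recorded for key.
func readStat(content, key string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == key {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("key %q not found", key)
}

// usageExcludingCache subtracts inactive file-backed memory from the total
// usage, clamping at zero so the result never underflows.
func usageExcludingCache(usage, inactiveFile uint64) uint64 {
	if inactiveFile > usage {
		return 0
	}
	return usage - inactiveFile
}

func main() {
	// Made-up sample values in cgroup v1 format.
	stat := "cache 4096\ntotal_inactive_file 1024\ntotal_active_file 2048\n"
	inactive, err := readStat(stat, "total_inactive_file")
	if err != nil {
		panic(err)
	}
	fmt.Println(usageExcludingCache(8192, inactive))
}
```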

Informs #66901

68337: importccl: only initialize progress fields on first import attempt r=pbardea a=pbardea

Fixes #68247.

Release note (bug fix): Fix a bug where IMPORT would incorrectly reset
its progress upon resumption.
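
The general shape of such a fix can be sketched as follows. This is an assumption-laden illustration, not the importccl change itself: `importProgress`, `resumePos`, and `initProgress` are hypothetical names, and the point is only that initialization is skipped when progress from an earlier attempt already exists.

```go
package main

import "fmt"

// importProgress is a hypothetical stand-in for a job's persisted progress.
type importProgress struct {
	resumePos []int64 // per-file resume offsets; nil until the first attempt
}

// initProgress allocates the per-file fields only when they are unset, so a
// resumed job keeps the offsets recorded by earlier attempts.
func initProgress(p *importProgress, numFiles int) {
	if p.resumePos == nil {
		p.resumePos = make([]int64, numFiles)
	}
}

func main() {
	p := &importProgress{}
	initProgress(p, 2) // first attempt: fields are allocated
	p.resumePos[0] = 100
	initProgress(p, 2) // resumption: the recorded offset is preserved
	fmt.Println(p.resumePos)
}
```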

68394: storage: elide nonexistent files from registry r=jbowens a=jbowens

When the encryption-at-rest registry is loaded, elide any file entries
corresponding to files that do not exist on the filesystem. These
entries may exist because files were manually deleted by an operator,
because an operation to update the registry failed, or because the files
were deleted through a codepath that failed to update the registry.

Release note (bug fix): Fixes a bug where the encryption-at-rest registry
would accumulate nonexistent file entries forever, contributing to
filesystem operations' latency on the store.
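
The elision step can be sketched roughly as below. The registry is modeled as a plain map from file name to a settings string and the existence check is passed in as a function; both are simplifications chosen for the example and do not reflect the actual registry types in `storage`.

```go
package main

import (
	"fmt"
	"sort"
)

// elideMissing deletes every registry entry whose file does not exist
// according to exists, and returns the removed names in sorted order.
func elideMissing(registry map[string]string, exists func(string) bool) []string {
	var removed []string
	for name := range registry {
		if !exists(name) {
			removed = append(removed, name)
			delete(registry, name) // deleting during range is safe in Go
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	// Hypothetical registry contents and on-disk state.
	registry := map[string]string{
		"000001.sst": "AES128",
		"000002.sst": "AES128",
		"MANIFEST":   "plain",
	}
	onDisk := map[string]bool{"000002.sst": true, "MANIFEST": true}

	removed := elideMissing(registry, func(name string) bool { return onDisk[name] })
	fmt.Println(removed, len(registry))
}
```

In a real store the `exists` callback would stat the file on the underlying filesystem; injecting it keeps the sketch runnable without touching disk.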

----

I'd like to follow this up with a change that implements `RemoveAll`
on `encryptedFS`. It's not as straightforward a change. This
patch at least ensures process restarts clear all accumulated
garbage.

68697: opt: add format=hide-hist option r=mgartner a=mgartner

The `format=hide-hist` option for optimizer tests has been added. It
allows stats to be shown in optimizer test output without histograms.

This is useful during debugging when you want to view stats like row
count, but do not want the clutter of histograms.

Additionally, using `format=show-stats` with `optstepsweb` creates
base-64 encoded URLs longer than the maximum URL length supported by
browsers (typically ~2000 characters). You can now use
`optstepsweb format=(show-stats,hide-hist)` to view high-level
statistics in `optstepsweb`.

Release note: None

68743: sql: prevent internal error when altering database placement r=mgartner a=mgartner

The parser was updated in #68068 to support a new syntax for altering
database placement:

    ALTER DATABASE d SET PLACEMENT DEFAULT
    ALTER DATABASE d SET PLACEMENT RESTRICTED

Running these statements causes internal errors because there is no
execution support for them yet. This commit prevents an internal error
when they are executed, and returns a user-friendly error instead.

Informs #65475

Release note: None

68796: sql: fix panic using Reset() when using sqlstats iterator r=knz,maryliag a=Azhng

Previously, if SQLStats.Reset() was called while iterating
through an iterator, the iterator would return a nil pointer and
cause a panic.

This commit changes the iterator to handle this situation gracefully
and prevents the panic.
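
One way an iterator can tolerate a concurrent reset is sketched below. The `container` and `iterator` types are hypothetical and much simpler than the real sqlstats code: the iterator snapshots the keys up front and, on each step, skips keys whose entry has since disappeared instead of handing back a nil pointer.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type stats struct{ count int }

// container is a hypothetical stand-in for an in-memory stats store.
type container struct {
	mu    sync.Mutex
	stmts map[string]*stats
}

// reset drops all entries, as a Reset() call would.
func (c *container) reset() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.stmts = make(map[string]*stats)
}

// iterator walks a sorted snapshot of the keys taken at creation time.
type iterator struct {
	c    *container
	keys []string
	pos  int
	key  string
	cur  *stats
}

func newIterator(c *container) *iterator {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([]string, 0, len(c.stmts))
	for k := range c.stmts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return &iterator{c: c, keys: keys}
}

// next advances to the next key that still has an entry. Keys removed since
// the snapshot are skipped, so cur is never nil after next returns true.
func (it *iterator) next() bool {
	for it.pos < len(it.keys) {
		key := it.keys[it.pos]
		it.pos++
		it.c.mu.Lock()
		s, ok := it.c.stmts[key]
		it.c.mu.Unlock()
		if ok {
			it.key, it.cur = key, s
			return true
		}
	}
	return false
}

func main() {
	c := &container{stmts: map[string]*stats{"a": {1}, "b": {2}, "c": {3}}}
	it := newIterator(c)
	fmt.Println(it.next(), it.key) // first key from the snapshot
	c.reset()                      // every entry is dropped mid-iteration
	fmt.Println(it.next())         // remaining keys are skipped
}
```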

Resolves #68785

Release note: None

Co-authored-by: Cameron Nunez <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Paul Bardea <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Azhng <[email protected]>
7 people committed Aug 12, 2021
8 parents cef3142 + 2641423 + 6f3981f + 1e9e62c + 9e859bb + d1f15fb + 3be3634 + 18ea82d commit de520ba
Showing 29 changed files with 1,285 additions and 107 deletions.
6 changes: 5 additions & 1 deletion docs/generated/settings/settings-for-tenants.txt
@@ -66,6 +66,10 @@ server.shutdown.lease_transfer_wait duration 5s the amount of time a server wait
server.shutdown.query_wait duration 10s the server will wait for at least this amount of time for active queries to finish
server.time_until_store_dead duration 5m0s the time after which if there is no new gossiped information about a store, it is considered dead
server.user_login.timeout duration 10s timeout after which client authentication times out if some system range is unavailable (0 = no timeout)
server.web_session.auto_logout.timeout duration 168h0m0s the duration that web sessions will survive before being periodically purged, since they were last used
server.web_session.purge.max_deletions_per_cycle integer 10 the maximum number of old sessions to delete for each purge
server.web_session.purge.period duration 1h0m0s the time until old sessions are deleted
server.web_session.purge.ttl duration 1h0m0s if nonzero, entries in system.web_sessions older than this duration are periodically purged
server.web_session_timeout duration 168h0m0s the duration that a newly created web session will be valid
sql.cross_db_fks.enabled boolean false if true, creating foreign key references across databases is allowed
sql.cross_db_sequence_owners.enabled boolean false if true, creating sequences owned by tables from other databases is allowed
@@ -149,4 +153,4 @@ trace.datadog.project string CockroachDB the project under which traces will be
trace.debug.enable boolean false if set, traces for recent requests can be seen at https://<ui>/debug/requests
trace.lightstep.token string if set, traces go to Lightstep using this token
trace.zipkin.collector string if set, traces go to the given Zipkin instance (example: '127.0.0.1:9411'). Only one tracer can be configured at a time.
version version 21.1-124 set the active cluster version in the format '<major>.<minor>'
version version 21.1-126 set the active cluster version in the format '<major>.<minor>'
6 changes: 5 additions & 1 deletion docs/generated/settings/settings.html
@@ -70,6 +70,10 @@
<tr><td><code>server.shutdown.query_wait</code></td><td>duration</td><td><code>10s</code></td><td>the server will wait for at least this amount of time for active queries to finish</td></tr>
<tr><td><code>server.time_until_store_dead</code></td><td>duration</td><td><code>5m0s</code></td><td>the time after which if there is no new gossiped information about a store, it is considered dead</td></tr>
<tr><td><code>server.user_login.timeout</code></td><td>duration</td><td><code>10s</code></td><td>timeout after which client authentication times out if some system range is unavailable (0 = no timeout)</td></tr>
<tr><td><code>server.web_session.auto_logout.timeout</code></td><td>duration</td><td><code>168h0m0s</code></td><td>the duration that web sessions will survive before being periodically purged, since they were last used</td></tr>
<tr><td><code>server.web_session.purge.max_deletions_per_cycle</code></td><td>integer</td><td><code>10</code></td><td>the maximum number of old sessions to delete for each purge</td></tr>
<tr><td><code>server.web_session.purge.period</code></td><td>duration</td><td><code>1h0m0s</code></td><td>the time until old sessions are deleted</td></tr>
<tr><td><code>server.web_session.purge.ttl</code></td><td>duration</td><td><code>1h0m0s</code></td><td>if nonzero, entries in system.web_sessions older than this duration are periodically purged</td></tr>
<tr><td><code>server.web_session_timeout</code></td><td>duration</td><td><code>168h0m0s</code></td><td>the duration that a newly created web session will be valid</td></tr>
<tr><td><code>sql.cross_db_fks.enabled</code></td><td>boolean</td><td><code>false</code></td><td>if true, creating foreign key references across databases is allowed</td></tr>
<tr><td><code>sql.cross_db_sequence_owners.enabled</code></td><td>boolean</td><td><code>false</code></td><td>if true, creating sequences owned by tables from other databases is allowed</td></tr>
@@ -153,6 +157,6 @@
<tr><td><code>trace.debug.enable</code></td><td>boolean</td><td><code>false</code></td><td>if set, traces for recent requests can be seen at https://<ui>/debug/requests</td></tr>
<tr><td><code>trace.lightstep.token</code></td><td>string</td><td><code></code></td><td>if set, traces go to Lightstep using this token</td></tr>
<tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>if set, traces go to the given Zipkin instance (example: '127.0.0.1:9411'). Only one tracer can be configured at a time.</td></tr>
<tr><td><code>version</code></td><td>version</td><td><code>21.1-124</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
<tr><td><code>version</code></td><td>version</td><td><code>21.1-126</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
</tbody>
</table>
1 change: 1 addition & 0 deletions pkg/BUILD.bazel
@@ -298,6 +298,7 @@ ALL_TESTS = [
"//pkg/sql/sqlliveness/slstorage:slstorage_test",
"//pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil:sqlstatsutil_test",
"//pkg/sql/sqlstats/persistedsqlstats:persistedsqlstats_test",
"//pkg/sql/sqlstats/sslocal:sslocal_test",
"//pkg/sql/stats:stats_test",
"//pkg/sql/stmtdiagnostics:stmtdiagnostics_test",
"//pkg/sql/tests:tests_test",
7 changes: 7 additions & 0 deletions pkg/clusterversion/cockroach_versions.go
@@ -271,6 +271,9 @@ const (
	SQLInstancesTable
	// Can return new retryable rangefeed errors without crashing the client
	NewRetryableRangefeedErrors
	// AlterSystemWebSessionsCreateIndexes creates indexes on the columns revokedAt and
	// lastUsedAt for the system.web_sessions table.
	AlterSystemWebSessionsCreateIndexes

	// Step (1): Add new versions here.
)
@@ -440,6 +443,10 @@ var versionsSingleton = keyedVersions{
		Key:     NewRetryableRangefeedErrors,
		Version: roachpb.Version{Major: 21, Minor: 1, Internal: 124},
	},
	{
		Key:     AlterSystemWebSessionsCreateIndexes,
		Version: roachpb.Version{Major: 21, Minor: 1, Internal: 126},
	},

	// Step (2): Add new versions here.
}
5 changes: 3 additions & 2 deletions pkg/clusterversion/key_string.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions pkg/server/BUILD.bazel
@@ -28,6 +28,7 @@ go_library(
"node_tombstone_storage.go",
"pagination.go",
"problem_ranges.go",
"purge_auth_session.go",
"rlimit_bsd.go",
"rlimit_darwin.go",
"rlimit_unix.go",
@@ -275,6 +276,7 @@ go_test(
"node_test.go",
"node_tombstone_storage_test.go",
"pagination_test.go",
"purge_auth_session_test.go",
"servemode_test.go",
"server_import_ts_test.go",
"server_systemlog_gc_test.go",
@@ -333,6 +335,7 @@ go_test(
"//pkg/sql/execinfrapb",
"//pkg/sql/idxusage",
"//pkg/sql/sem/tree",
"//pkg/sql/sessiondata",
"//pkg/sql/sqlstats",
"//pkg/sql/tests",
"//pkg/startupmigrations",
160 changes: 160 additions & 0 deletions pkg/server/purge_auth_session.go
@@ -0,0 +1,160 @@
// Copyright 2021 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package server

import (
	"context"
	math_rand "math/rand"
	"time"

	"github.com/cockroachdb/cockroach/pkg/security"
	"github.com/cockroachdb/cockroach/pkg/settings"
	"github.com/cockroachdb/cockroach/pkg/sql/sessiondata"
	"github.com/cockroachdb/cockroach/pkg/util/log"
	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
)

var (
	webSessionPurgeTTL = settings.RegisterDurationSetting(
		"server.web_session.purge.ttl",
		"if nonzero, entries in system.web_sessions older than this duration are periodically purged",
		time.Hour,
	).WithPublic()

	webSessionAutoLogoutTimeout = settings.RegisterDurationSetting(
		"server.web_session.auto_logout.timeout",
		"the duration that web sessions will survive before being periodically purged, since they were last used",
		7*24*time.Hour,
		settings.NonNegativeDuration,
	).WithPublic()

	webSessionPurgePeriod = settings.RegisterDurationSetting(
		"server.web_session.purge.period",
		"the time until old sessions are deleted",
		time.Hour,
		settings.NonNegativeDuration,
	).WithPublic()

	webSessionPurgeLimit = settings.RegisterIntSetting(
		"server.web_session.purge.max_deletions_per_cycle",
		"the maximum number of old sessions to delete for each purge",
		10,
	).WithPublic()
)

// startPurgeOldSessions runs an infinite loop in a goroutine
// which regularly deletes old rows in the system.web_sessions table.
func startPurgeOldSessions(ctx context.Context, s *authenticationServer) error {
	return s.server.stopper.RunAsyncTask(ctx, "purge-old-sessions", func(context.Context) {
		settingsValues := &s.server.st.SV
		period := webSessionPurgePeriod.Get(settingsValues)

		timer := timeutil.NewTimer()
		defer timer.Stop()
		timer.Reset(jitteredInterval(period))

		for ; ; timer.Reset(webSessionPurgePeriod.Get(settingsValues)) {
			select {
			case <-timer.C:
				timer.Read = true
				s.purgeOldSessions(ctx)
			case <-s.server.stopper.ShouldQuiesce():
				return
			case <-ctx.Done():
				return
			}
		}
	},
	)
}

// purgeOldSessions deletes old web session records.
// Performs three purges: (1) one for sessions with expiration
// older than the purge TTL, (2) one for sessions with revocation
// older than the purge TTL, and (3) one for sessions that have
// timed out since they were last used.
func (s *authenticationServer) purgeOldSessions(ctx context.Context) {
	var (
		deleteOldExpiredSessionsStmt = `
DELETE FROM system.web_sessions
WHERE "expiresAt" < $1
ORDER BY random()
LIMIT $2
RETURNING 1
`
		deleteOldRevokedSessionsStmt = `
DELETE FROM system.web_sessions
WHERE "revokedAt" < $1
ORDER BY random()
LIMIT $2
RETURNING 1
`
		deleteSessionsAutoLogoutStmt = `
DELETE FROM system.web_sessions
WHERE "lastUsedAt" < $1
ORDER BY random()
LIMIT $2
RETURNING 1
`
		settingsValues = &s.server.st.SV
		internalExecutor = s.server.sqlServer.internalExecutor
		currTime = s.server.clock.PhysicalTime()

		purgeTTL = webSessionPurgeTTL.Get(settingsValues)
		autoLogoutTimeout = webSessionAutoLogoutTimeout.Get(settingsValues)
		limit = webSessionPurgeLimit.Get(settingsValues)

		purgeTime = currTime.Add(purgeTTL * time.Duration(-1))
		autoLogoutTime = currTime.Add(autoLogoutTimeout * time.Duration(-1))
	)

	if _, err := internalExecutor.QueryRowEx(
		ctx,
		"delete-old-expired-sessions",
		nil, /* txn */
		sessiondata.InternalExecutorOverride{User: security.RootUserName()},
		deleteOldExpiredSessionsStmt,
		purgeTime,
		limit,
	); err != nil {
		log.Errorf(ctx, "error while deleting old expired web sessions: %+v", err)
	}

	if _, err := internalExecutor.QueryRowEx(
		ctx,
		"delete-old-revoked-sessions",
		nil, /* txn */
		sessiondata.InternalExecutorOverride{User: security.RootUserName()},
		deleteOldRevokedSessionsStmt,
		purgeTime,
		limit,
	); err != nil {
		log.Errorf(ctx, "error while deleting old revoked web sessions: %+v", err)
	}

	if _, err := internalExecutor.QueryRowEx(
		ctx,
		"delete-sessions-timeout",
		nil, /* txn */
		sessiondata.InternalExecutorOverride{User: security.RootUserName()},
		deleteSessionsAutoLogoutStmt,
		autoLogoutTime,
		limit,
	); err != nil {
		log.Errorf(ctx, "error while deleting web sessions older than auto-logout timeout: %+v", err)
	}
}

// jitteredInterval returns a randomly jittered (+/-25%) duration
// from the interval.
func jitteredInterval(interval time.Duration) time.Duration {
	return time.Duration(float64(interval) * (0.75 + 0.5*math_rand.Float64()))
}