-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Manually update the spanner session approximate_last_used_time #1009
Conversation
src/db/spanner/manager/session.rs
Outdated
if age > max_life as i64 { | ||
metrics.incr("db.connection.max_life"); | ||
dbg!("### aging out", conn.session.get_name()); | ||
conn.session = create_session(&conn.client, database_name).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, forgot to get back around to this last week.
The last conversation we had about this I had a realization: the create_time/approximate_last_use_time
stats we've begun using are for the Spanner Session
but the GOAWAYs are likely more about the lower level GRPC connection/channel.
The case Err(e) => block below potentially recreates sessions on existing channels. In that case the Session
create_time
doesn't tell us about the GRPC channel but instead a Session
possibly created much later than the original channel was.
The point of all this:
- We really want to throw away the entire GRPC channel/connection (disconnecting it in the process) during
max_life/idle
events. Instead of recreating the session here, let'sreturn Err(DbErrorKind::Expired.into())
which should cause deadpool to drop ourSpannerSession
struct (which is a confusing name here, it's really a combination of both the SpannerClient with its underlying grpcio::Channel and the spanner Session on top of it). Drop of thegrpcio::Channel
will disconnect it. - We should probably manually tracking the grpc channel/connection create time ourselves and use it for this check because the Spanner
Session
start_time
could be different from the parent channel's (though I think its idle time would always be accurate?) I'm fine punting on this for now though (doing it later in a separate PR) because theSession
's version probably doesn't differ very often -- with thereturn Err(
change this should hopefully avoid a significant amount of our GOAWAYS as is. 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^
* return an error instead of trying to create new session
src/db/spanner/manager/session.rs
Outdated
@@ -23,6 +23,8 @@ pub struct SpannerSession { | |||
/// The underlying client (Connection/Channel) for interacting with spanner | |||
pub client: SpannerClient, | |||
pub(in crate::db::spanner) use_test_transactions: bool, | |||
pub(in crate::db::spanner) create_time: i64, | |||
pub(in crate::db::spanner) last_used_time: i64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed earlier, nothing updates this last_used_time. We either need to track it ourselves (not quite sure that's going to be easy) or checking the resulted version from get_session in recycle -- and also updating the value on conn there too (so the ConnectionInfo
used for sentry extras is accurate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going to push back on modifying the approximate_last_used
time for a few reasons.
- altering the last_used_time HEAVILY changes code in order to pass a
mut
reference down to theasync_exec()
function, where we actually use it. (We might be able to get away with anARC
wrapper for that, but again, things can get ugly faster than expected. - we have to presume that Protobuf actually works. It promises to update that whenever there's a transaction on that session, which relates to what we're interested in tracking.
- we can drop our
last_used_time
since it's not going to be as accurate.
As it stands, we're reporting three values via connection_info
:
- Our creation timestamp
- Protobuf's session creation timestamp
- Protobuf's session approximate last use timestamp.
While there may be some clock drift, it should be consistent and we can probably adjust. Plus, these are not super fine grained timers, they're seconds, some drift should be acceptable since we're monitoring on a per-minute basis.
Closes #1008
Description
Updates the
Session.approximate_last_used_time
value manually on recycleTesting
During multi-second, active use of a connection, the
connection_info.idle
should vary and remain less than theconnection_info.age
Issue(s)
Closes #1008