Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Manually update the spanner session approximate_last_used_time #1009

Merged
merged 18 commits into from
Mar 12, 2021

Conversation

jrconlin
Copy link
Member

Closes #1008

Description

Updates the Session.approximate_last_used_time value manually on recycle

Testing

During multi-second, active use of a connection, the connection_info.idle should vary and remain less than the connection_info.age

Issue(s)

Closes #1008

@jrconlin jrconlin requested a review from a team February 26, 2021 19:35
src/db/spanner/manager/session.rs Outdated Show resolved Hide resolved
@jrconlin jrconlin requested a review from pjenvey March 2, 2021 16:50
if age > max_life as i64 {
metrics.incr("db.connection.max_life");
dbg!("### aging out", conn.session.get_name());
conn.session = create_session(&conn.client, database_name).await?;
Copy link
Member

@pjenvey pjenvey Mar 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to get back around to this last week.

The last conversation we had about this I had a realization: the create_time/approximate_last_use_time stats we've begun using are for the Spanner Session but the GOAWAYs are likely more about the lower level GRPC connection/channel.

The case Err(e) => block below potentially recreates sessions on existing channels. In that case the Session create_time doesn't tell us about the GRPC channel but instead a Session possibly created much later than the original channel was.

The point of all this:

  • We really want to throw away the entire GRPC channel/connection (disconnecting it in the process) during max_life/idle events. Instead of recreating the session here, let's return Err(DbErrorKind::Expired.into()) which should cause deadpool to drop our SpannerSession struct (which is a confusing name here, it's really a combination of both the SpannerClient with its underlying grpcio::Channel and the spanner Session on top of it). Drop of the grpcio::Channel will disconnect it.
  • We should probably manually tracking the grpc channel/connection create time ourselves and use it for this check because the Spanner Session start_time could be different from the parent channel's (though I think its idle time would always be accurate?) I'm fine punting on this for now though (doing it later in a separate PR) because the Session's version probably doesn't differ very often -- with the return Err( change this should hopefully avoid a significant amount of our GOAWAYS as is. 🤷‍♂️

Copy link
Member

@pjenvey pjenvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

@jrconlin jrconlin requested a review from pjenvey March 10, 2021 22:58
@@ -23,6 +23,8 @@ pub struct SpannerSession {
/// The underlying client (Connection/Channel) for interacting with spanner
pub client: SpannerClient,
pub(in crate::db::spanner) use_test_transactions: bool,
pub(in crate::db::spanner) create_time: i64,
pub(in crate::db::spanner) last_used_time: i64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed earlier, nothing updates this last_used_time. We either need to track it ourselves (not quite sure that's going to be easy) or checking the resulted version from get_session in recycle -- and also updating the value on conn there too (so the ConnectionInfo used for sentry extras is accurate).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going to push back on modifying the approximate_last_used time for a few reasons.

  1. altering the last_used_time HEAVILY changes code in order to pass a mut reference down to the async_exec() function, where we actually use it. (We might be able to get away with an ARC wrapper for that, but again, things can get ugly faster than expected.
  2. we have to presume that Protobuf actually works. It promises to update that whenever there's a transaction on that session, which relates to what we're interested in tracking.
  3. we can drop our last_used_time since it's not going to be as accurate.

As it stands, we're reporting three values via connection_info:

  • Our creation timestamp
  • Protobuf's session creation timestamp
  • Protobuf's session approximate last use timestamp.

While there may be some clock drift, it should be consistent and we can probably adjust. Plus, these are not super fine grained timers, they're seconds, some drift should be acceptable since we're monitoring on a per-minute basis.

* use spanner's idle
* drop custom idle
* add comments to clarify various Session usages
* copy live session info to SpannerSession reference session
pjenvey
pjenvey previously approved these changes Mar 12, 2021
pjenvey
pjenvey previously approved these changes Mar 12, 2021
@pjenvey pjenvey merged commit f669b25 into master Mar 12, 2021
@pjenvey pjenvey deleted the bug/1008-lut branch March 12, 2021 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add last_use_time to recycler
2 participants