Add support for Google Cloud Spanner #3977

Merged
merged 7 commits into from
Feb 15, 2018

sethvargo
Contributor

@sethvargo sethvargo commented Feb 14, 2018

This PR adds support and documentation for a Google Cloud Spanner storage (physical) backend. The backend supports both HA and Transactional interfaces.

As requested, HA is not enabled by default. Users can opt in to HA by setting ha_enabled = "true" in configuration.

@sethvargo
Contributor Author

sethvargo commented Feb 14, 2018

/cc @jefferai @mayakacz @emilymye

if haEnabledStr == "" {
	haEnabledStr = c["ha_enabled"]
}
haEnabled := true
Member

Please default to false. We don't test this and we've had a lot of people with other HA backends that end up having some kind of problem. This isn't to say that this backend will, only that we would prefer people need to opt-in, at least in the near term.

Contributor Author

Done
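
For readers following the thread, the opt-in default requested above might look roughly like the sketch below. `parseHAEnabled` and its `envValue` parameter are hypothetical names, not taken from the PR; only the `ha_enabled` key and the `strconv.ParseBool` call appear in the diff context.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseHAEnabled is a hypothetical helper. envValue stands in for whatever
// override the backend reads before the config map (an assumption); conf is
// the storage configuration map.
func parseHAEnabled(envValue string, conf map[string]string) (bool, error) {
	haEnabledStr := envValue
	if haEnabledStr == "" {
		haEnabledStr = conf["ha_enabled"]
	}

	// Opt-in: HA stays off unless a value is explicitly supplied.
	haEnabled := false
	if haEnabledStr != "" {
		var err error
		haEnabled, err = strconv.ParseBool(haEnabledStr)
		if err != nil {
			return false, fmt.Errorf("failed to parse HA enabled: %v", err)
		}
	}
	return haEnabled, nil
}

func main() {
	for _, conf := range []map[string]string{
		{},                      // nothing set: HA stays disabled
		{"ha_enabled": "true"},  // explicit opt-in
		{"ha_enabled": "maybe"}, // not a boolean: returns an error
	} {
		enabled, err := parseHAEnabled("", conf)
		fmt.Println(enabled, err)
	}
}
```

With this shape, an empty or missing value never reaches `ParseBool`, so the default is decided in exactly one place.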

var err error
haEnabled, err = strconv.ParseBool(haEnabledStr)
if err != nil {
	return nil, errors.Wrap(err, "failed to parse HA enabled")
Member

Any reason not to use errwrap instead?

Contributor Author

I find it more verbose than necessary, and I routinely forget to put the {{err}}, which makes future debugging annoying. Nonetheless, I'll conform and update. I've just personally gravitated to pkg/errors more 😄

Member

Understandable; at this point we're just trying to get the whole Vault codebase to use something and since errwrap is the HC way...

I might look into pkg/errors at some point as a replacement, but first I want to get everything sane with one type of wrapping.

// indicating if we are stopped - it exists to prevent double closing the
// channel. stopLock is a mutex around the locks.
stopCh chan struct{}
stopped bool
Member

Rather than a bool and lock you may want to just use atomic.CompareAndSwap

Contributor Author

Hmm - with channels? Sorry - I've never used CompareAndSwap, so I'm not sure what you're suggesting here.

Member

In place of using stopped/stopLock. You can have a uint32 that defaults to zero; when stopping you atomic.CompareAndSwap and if it was swapped, you continue on, and if not, you're already running stop logic. Then in Lock you do the opposite (compare to 1 and swap to 0).

Member

This isn't a must btw, just a suggestion.

Contributor Author

Gotcha. I would prefer to leave it as-is, if that's okay with you.
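
For reference, the suggestion above (which the PR did not adopt) could be sketched as follows. The standard library has no bare `atomic.CompareAndSwap`; the sketch assumes the reviewer meant `atomic.CompareAndSwapUint32` from `sync/atomic`. The `lock` type, `newLock`, and `stop` are hypothetical stand-ins, not the PR's code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lock is a hypothetical, stripped-down stand-in for the HA lock type.
type lock struct {
	stopCh  chan struct{}
	stopped uint32 // 0 = running, 1 = stopped; replaces the bool + mutex pair
}

func newLock() *lock {
	return &lock{stopCh: make(chan struct{})}
}

// stop closes stopCh at most once and reports whether this call closed it.
// Only the caller that wins the 0 -> 1 swap proceeds to close the channel.
func (l *lock) stop() bool {
	if !atomic.CompareAndSwapUint32(&l.stopped, 0, 1) {
		return false // another caller is already running the stop logic
	}
	close(l.stopCh)
	return true
}

func main() {
	l := newLock()

	var wg sync.WaitGroup
	var closers uint32
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if l.stop() {
				atomic.AddUint32(&closers, 1)
			}
		}()
	}
	wg.Wait()

	// Exactly one goroutine wins the swap, so the channel is closed once.
	fmt.Println("goroutines that closed the channel:", atomic.LoadUint32(&closers))
}
```

The reverse swap (compare to 1, swap to 0) would re-arm the flag when the lock is taken again. Replacing `stopCh` at that point still needs its own coordination, which is one reason keeping the mutex, as the author chose, is a reasonable trade-off.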

}
}()

_, err := l.backend.client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
Member

This is pretty cool; I'm assuming that the entire block will happen atomically so you can't get two clients elevating themselves.

Contributor Author

@sethvargo sethvargo Feb 14, 2018

Correct. The function within the function may be called multiple times, but the transaction is retained during retries. It's a global lock basically.

@jefferai jefferai added this to the 0.9.4 milestone Feb 14, 2018
- **High Availability** – the Google Cloud Spanner storage backend supports high
  availability. Because the Google Cloud Spanner storage backend uses the system
  time on the Vault node to acquire sessions, clock skew across Vault servers
  can cause lock contention.

what is a user supposed to do if there is lock contention?

Contributor Author

Cry

Just kidding. Basically they need to resolve the skew using something like ntp and restarting services. None of the other backends documented how to fix skew, and, since it's not specific to Spanner, I don't know if it belongs here. What do you think?

Member

Most (all?) other HA backends aren't time-specific, they're session-specific, so skew isn't really an issue.

Contributor Author

@sethvargo sethvargo Feb 14, 2018

DynamoDB is 😄 , and they did not document the fix

Member

I think there is a way to do an exclusive lock-for-writing on a document. I don't know what the failure conditions are. It's possible they just don't account for clock skew at all.


- `ha_enabled` `(string: "true")` - Specifies if high availability mode is
  enabled. This is a boolean value, but it is specified as a string like "true"
  or "false".

please add what the default is (I think in another review it looks like the default will be "false")

Member

That "true" there is actually the default, although it's now wrong since I asked for it to be flipped :-)

@jefferai jefferai merged commit 7af2bdc into hashicorp:master Feb 15, 2018
@sethvargo sethvargo deleted the sethvargo/spanner branch February 15, 2018 14:36