-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
demo: cockroach demo attempts to obtain a temporary license upon startup and enables telemetry #40273
Conversation
RFAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, the number one and two rules we've had for 4 years is 1) "no phoning home without informing the user" and 2) "no phoning home without a way for the user to opt out".
Are you sure this implementation as-is is a good idea? What was the rationale? (tip: the rationale must be in the commit message)
Maybe @bdarnell has an opinion on this.
My opinion is that we should hardcode a license string that's good for just the demo session, with no phoning home whatsoever.
Reviewed 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt and @rohany)
pkg/cli/demo.go, line 224 at r2 (raw file):
return ``, ``, cleanup, err } defer db.Close()
You're leaking the db
object across the entire function. Was that intentional?
pkg/cli/demo.go, line 230 at r2 (raw file):
license, err := getLicense() if err == nil { if _, err := db.Exec(`SET CLUSTER SETTING cluster.organization = ` + lex.EscapeSQLString(demoOrg)); err != nil {
IIRC SET CLUSTER SETTING accepts placeholders, so you can pass a proper SQL argument here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation decision was based off of some offline discussions with David Taylor, Jordan and Roland -- the end goal was to provide an API endpoint specifically for the demo, and then record some telemetry using the licenses given out there.
This issue with a license string that is just for the demo is that what is stopping users from just taking that license and using it on real clusters?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @jordanlewis, and @knz)
pkg/cli/demo.go, line 236 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Probably would be a good idea to log here.
Done.
pkg/cli/demo.go, line 224 at r2 (raw file):
Previously, knz (kena) wrote…
You're leaking the
db
object across the entire function. Was that intentional?
Yes -- I don't see a reason in making another db object for just setting the cluster. Future changes I'm making will also be using this db object, and I don't want to keep creating db connections for each of the settings that might need the db.
pkg/cli/demo.go, line 230 at r2 (raw file):
Previously, knz (kena) wrote…
IIRC SET CLUSTER SETTING accepts placeholders, so you can pass a proper SQL argument here
Ok, will fix.
There's nothing that stops the user from querying the same URL manually and get a string they can reuse in their real cluster. There is no real security here!! Look the security of the license key is a complete red herring. What you want (= what seems to motivate this work) is telemetry, not key confidentiality. If that's indeed what you want, you can set up proper telemetry (I believe the test server used by This will in turn will give the user the opportunity to disable said telemetry using the standard cluster setting and/or environment variable. |
This reminds me, you must expand the release note and explain that this is really recording telemetry and explain (in the release note) how to opt out. |
Exactly. I'd rather not use a license key at all here - just enable enterprise features for |
We've been discussing this in the original issue at #40222 and @dt convinced me that 1) it makes the code simpler and 2) it's interesting to use a license key for other reasons (simplicity of managing what gets enabled etc). If you wish we can resume/continue the conversation about design and motivation on the original issue #40222 instead of this PR. Assuming this PR continues in the same direction, here are the remaining points for the work:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We enable enterprise features in unit tests without a license key - how does that work? It still seems like it ought to be simpler to do something similar here but maybe I'm just not familiar enough with how this part works.
If it's not feasible to make this work like unit tests, I'm OK with it as long as it's gated and documented similarly to other telemetry usage.
pkg/cli/demo.go
Outdated
return ``, ``, cleanup, err | ||
} | ||
} else { | ||
log.Warningf(ctx, "error when attempting to acquire demo license: %+v\n", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a message like "enterprise features are not enabled in this session".
pkg/cli/demo.go
Outdated
@@ -95,6 +103,25 @@ func init() { | |||
} | |||
} | |||
|
|||
func getLicense() (string, error) { | |||
client := &http.Client{ | |||
Timeout: time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should definitely have a fairly short timeout (or be asynchronous?) so that you don't get blocked if you're trying to use it offline, but I think 1 second is too short (consider the number of round trips in the TCP and TLS handshakes coming from Australia. This consideration would of course change if we made our reg service global :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd push against it being asynchronous -- I think it is better for the cluster to a little slow on starting up than a few partition commands failing at the start while the license is being requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't want to wait much more than 1, maybe 2s if I'm offline though.
I think we could actually get away with async as long as the window where behavior can change is <5s, just since the likelihood someone actually managing to enter and try to run an enterprise command that soon, while non-zero, is small enough that the expected value of confusion caused would be small too.
I wouldn't try to do anything fancier though -- 20.1 should have a more full-featured solution for this and stuff beyond demo
so whatever is simplest MVP should be our goal for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I updated the code to perform the request asynchronously.
I objected to using the same boolean that unit tests use for uniformity/maintainability. Unit tests typically test things in relative isolation -- testing just one feature -- so the fact that the special-case isn't the real thing is usually OK -- for the purpose of that test is is good enough. In That is why I'd prefer that we just use a real, normal license installed in the normal place, instead of a special-purpose bool. How we get that license I think it separable question that has significant discussion over on #40222 |
Thank you for the reviews and all the discussion. I've updated the PR and commit message based on all the comments. PTAL. Regarding telemetry, we check it using the 'COCKROACH_SKIP_UPDATE_CHECK' environement variable, but our docs mention something else -- https://www.cockroachlabs.com/docs/stable/diagnostics-reporting.html. What is the right thing? |
ce612c1
to
1291c5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Thanks for all the changes.
Please also add at least one or two tests to confirm that the telemetry and licensing are effectively disabled under the advertised conditions. |
Yeah, I ran into some problems with that and testing, so I moved around some code to separate this license acquisition into the cliccl package. |
Actually, I'm running into some problems with writing some tests for this. Is it possible to have the CLI tests run using the CCL binary instead of the OSS binary? |
The simplest way to test this may be to use the TCL interactive_test framework. This uses a CCL binary: https://github.com/cockroachdb/cockroach/blob/master/pkg/cli/interactive_tests/test_demo.tcl |
0a3b3c0
to
ef1b7b0
Compare
5f55b23
to
6a0d1d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @jordanlewis, and @rohany)
pkg/ccl/cliccl/demo.go, line 33 at r5 (raw file):
} // Send some extra information to the endpoint. params := fmt.Sprintf("?type=demo&version=%s&clusterid=%s", build.VersionPrefix(), clusterID.String())
You'll need a healthy dose of escaping on some of these strings.
pkg/ccl/cliccl/demo.go, line 51 at r5 (raw file):
func getAndApplyLicense(db *gosql.DB, clusterID uuid.UUID, org string) (bool, error) { license, err := getLicense(clusterID) if err != nil {
-
I may recommend to print out the error instead of forgetting it entirely.
-
just FYI (no change needed here) it is generally beneficial to further narrow down the case of ignored error to the things that are exactly symptomatic of a failed license request. A connection error in the http client would be an ignorable error, however if the connection succeeds but the server returns a non-OK http error code perhaps you want a more explicit error. But then if the HTTP code was OK but you get a short read (again suggesting a network problem) that can also be ignored. All this is subtle stuff, but generally in the long term it pays off to not fully drop errors on the floor - some are really indicative of a programming bug in CockroachDB (or one of the license servers).
pkg/cli/demo.go, line 49 at r4 (raw file):
Previously, knz (kena) wrote…
attempts to connect to a Cockroach Labs server to obtain a temporary enterprise license
(otherwise it's not clear there's a remote roundtrip available.)
Is it possible that there are some trailing spaces remaining in this string? (Not sure, the reviewable output confuses me)
pkg/cli/demo.go, line 55 at r5 (raw file):
temporary enterprise license for demoing enterprise features and enable telemetry back to Cockroach Labs. In order to disable this behavior, set the environment variable "COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING".
@dt was that the env var we're settling on for this release?
pkg/cli/demo.go, line 227 at r5 (raw file):
success, err := GetAndApplyLicense(db, s.ClusterID(), demoOrg) // TODO (rohany): How to report this error and exit when license // acquisition is performed asynchronously?
If this wasn't meant to be asynchronous, you'd use an error channel.
However for here it's a bit more subtle. I would recommend this:
- edit
cli.go: Main()
to extract the error handling under the call toRun()
up to and includingos.Exit()
into a separateexitWithError(...)
function. Call it there. - call that function here too.
pkg/cli/demo.go, line 232 at r5 (raw file):
} if !success { msg := "Unable to acquire demo license. Enterprise features are not enabled in this session.\n"
const msg =
pkg/cli/interactive_tests/test_demo_telemetry.tcl, line 15 at r5 (raw file):
send "alter table vehicles partition by list (city) (partition p1 values in ('nyc'));\n" # expect that it failed, as no license was requested. eexpect "use of partitions requires an enterprise license"
add:
interrupt
eexpect eof
(to clean up the process)
then end_test
at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @jordanlewis, and @rohany)
pkg/ccl/cliccl/demo.go, line 21 at r5 (raw file):
"github.com/cockroachdb/cockroach/pkg/cli" "github.com/cockroachdb/cockroach/pkg/util/uuid" "github.com/pkg/errors"
github.com/cockroachdb/errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell, @jordanlewis, @knz, and @rohany)
pkg/ccl/cliccl/demo.go, line 33 at r5 (raw file):
Previously, knz (kena) wrote…
You'll need a healthy dose of escaping on some of these strings.
oh, yeah, sorry, my comment was pseudo-code: to do this for real we should use the net/url.Values{}, .Set each of them and then .Encode()
for the actual string instead of doing any sprintf'ing/escaping ourselves.
There is a wider discussion/design going on to serve licenses in a more general way in the future, so this commit is not aiming for a future-proof design, but instead an MVP to allow users to demo enterprise features within `cockroach demo`. We are not concerned about offline usage of enterprise features as users can obtain a license and enable features manually using SET. Fixes cockroachdb#40222. Release note (cli change): cockroach demo attempts to contact a license server to obtain a temporary license. cockroach demo now enables telemetry for the demo cluster. This feature can be opted out of by setting the `COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING` environment variable (https://www.cockroachlabs.com/docs/stable/diagnostics-reporting.html).
RFAL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but please also get a greenlight from david
Reviewed 4 of 4 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell, @jordanlewis, and @rohany)
Thanks for all the reviews! bors r+ |
Build failed |
bors r+ |
40273: demo: cockroach demo attempts to obtain a temporary license upon startup and enables telemetry r=rohany a=rohany There is a wider discussion/design going on to serve licenses in a more general way in the future, so this commit is not aiming for a future-proof design, but instead an MVP to allow users to demo enterprise features within `cockroach demo`. We are not concerned about offline usage of enterprise features as users can obtain a license and enable features manually using SET. Fixes #40222. Release note (cli change): cockroach demo attempts to contact a license server to obtain a temporary license. cockroach demo now enables telemetry for the demo cluster. This feature can be opted out of by setting the `COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING` environment variable (https://www.cockroachlabs.com/docs/stable/diagnostics-reporting.html). Co-authored-by: Rohan Yadav <[email protected]>
Build succeeded |
There is a wider discussion/design going on to serve licenses in
a more general way in the future, so this commit is not aiming
for a future-proof design, but instead an MVP to allow users to
demo enterprise features within
cockroach demo
.We are not concerned about offline usage of enterprise features
as users can obtain a license and enable features manually using SET.
Fixes #40222.
Release note (cli change): cockroach demo attempts to contact a license
server to obtain a temporary license. cockroach demo now enables
telemetry for the demo cluster. This feature can be opted out of by
setting the
COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING
environment variable(https://www.cockroachlabs.com/docs/stable/diagnostics-reporting.html).