Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: create secure clusters by default #118504

Merged
merged 3 commits into from
Feb 9, 2024

Conversation

DarrylWong
Copy link
Contributor

@DarrylWong DarrylWong commented Jan 30, 2024

Previously, the default for roachtests was to create insecure clusters. This change now switches the default to create secure clusters, allowing tests to opt out as needed. To support this, the following changes were made:

  1. Refactor PGUrl functions to use PGURLOptions instead of tenant/SQLInstance, which were usually not specified anyway. This change allows up to add an authMode option, which we currently only use to specify root user authentication but in the future can use to allow for non root authentication.
  2. Add helpers to extract session ID and create an HTTP Client with the extracted session ID. This lets us create HTTP Requests to secure clusters.
  3. Explicitly specify a connection string with a root certificate or a certs directory when connecting to a cluster. If not specified, the connection will be rejected as the default behavior is to authenticate with root and no certificates. Many tests already did this through specifying {pgurl}.

Follow up work would be to start using non root user everywhere now that we are running on secure clusters and have the ability to easily generate non root urls.

Release note: None
Epic: None
Informs: #63145

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DarrylWong DarrylWong changed the title Secure clusters roachtest: create secure clusters by default Jan 30, 2024
@DarrylWong DarrylWong force-pushed the secure-clusters branch 5 times, most recently from 576977d to 8b08243 Compare February 1, 2024 15:57
@DarrylWong
Copy link
Contributor Author

GCE Run

Failures are unrelated or have been fixed and run on gceworker. tpchvec/smithcmp fails because it pulls the .toml file from master, which this PR updates.

@DarrylWong DarrylWong force-pushed the secure-clusters branch 2 times, most recently from 12e8025 to 84b8556 Compare February 1, 2024 16:06
@DarrylWong DarrylWong marked this pull request as ready for review February 1, 2024 16:07
@DarrylWong DarrylWong requested review from a team as code owners February 1, 2024 16:07
@DarrylWong DarrylWong requested review from herkolategan and srosenberg and removed request for a team February 1, 2024 16:07
Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 🎉

Two general comments:

  • I'm still not sure when to use cockroach --url {pgurl:1} vs cockroach --certs-dir "certs" --pgport {pgport:1}. Some tests use the former, others use the latter and I'm unsure if it's intentional or coincidental.
  • Would be really nice to not directly reference roach and system directly anywhere. It should be easy for us to change these values (especially in light of the unfinished virtualization support), and currently it seems like it will be painful to track down all hardcoded references to those two values.

Reviewed 16 of 16 files at r1, 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, and @srosenberg)


pkg/cmd/roachtest/clusterstats/helpers.go line 40 at r2 (raw file):

		return nil, err
	}
	httpclient, err := roachtestutil.DefaultHttpClientWithSessionCookie(ctx, c, l, c.Node(1), fmt.Sprintf("http://%s:9090/api/v1/query", prometheusNodeIP[0]))

If I'm reading it right, this is setting cockroachdb-specific cookies to connect to prometheus. 🤔


pkg/cmd/roachtest/roachtestutil/utils.go line 36 at r1 (raw file):

) (string, error) {
	opts := roachprod.PGURLOptions{}
	opts.Secure = c.IsSecure()

Nit: just merge these two lines?


pkg/cmd/roachtest/roachtestutil/utils.go line 102 at r2 (raw file):

		cookie, err = getSessionIDOnSingleNode(ctx, c, l, c.Node(node))
		if err == nil {
			break

Should we really be swallowing errors? Does this work around a specific issue?


pkg/cmd/roachtest/tests/build_info.go line 46 at r3 (raw file):

	}

	fmt.Printf("details: %+v", details)

Intentional?


pkg/cmd/roachtest/tests/status_server.go line 70 at r3 (raw file):

		t.Fatal(err)
	}
	name, value, found := strings.Cut(sessionID, "=")

Why do we need to do this manually in this test?


pkg/roachprod/install/cockroach.go line 495 at r1 (raw file):

const (
	AuthRootCert PGAuthMode = iota

Would be nice to add some comments here explaining the different auth options.


pkg/roachprod/install/cockroach.go line 515 at r1 (raw file):

		// TODO(DarrylWong): Support authentication for multitenant,
		// since they do not use roach:system.
		password := SystemInterfaceName

Can we define some const like DefaultPassword = SystemInterfaceName so that this won't break when we change the default password? This might all change when we add support for virtual clusters.


pkg/roachprod/install/cockroach.go line 525 at r1 (raw file):

		case AuthUserPassword:
			u.User = url.UserPassword(user, password)
			v.Add("sslmode", "allow")

Is there a reason why we use allow instead of verify-full in this case?


pkg/util/httputil/client.go line 45 at r2 (raw file):

			DialContext:       (&net.Dialer{Timeout: dialerTimeout}).DialContext,
			DisableKeepAlives: true,
			TLSClientConfig:   &tls.Config{InsecureSkipVerify: true},

Unclear (to me) why we needed to add this. The client also seems to be used in other parts of the DB, so I don't think we should change this setting.

Copy link

blathers-crl bot commented Feb 1, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link
Contributor Author

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure when to use cockroach --url {pgurl:1} vs cockroach --certs-dir "certs" --pgport {pgport:1}

Mostly intentional but I'm sure there's some that are unintentional. I'll be sure to double check again.

The difference comes from the fact that not every CLI lets you use a non root user. I'm not too well versed on the details but it looks like it has to due the fact some commands use RPC connections, which in turn require the root client certs. The main ones off the top of my head are cockroach node and cockroach debug. It looks like there was some talk to change this but as of right now those commands are stuck using root.

So in anticipation of using non root users as the next step, I think --url {pgurl:1} should be the default method if non root auth is possible. This will let switching to non root really easy (change the pgurl expander) while keeping the ones that can't be switched separate.

I guess an alternative would be to use --url along with the defaultPgUrl(..., AuthRootCert) helper for the root only commands. The tradeoff would be more boilerplate but it would maybe be more explicit that we meant to use root.

Would be really nice to not directly reference roach and system directly anywhere

Good point 👍

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/clusterstats/helpers.go line 40 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

If I'm reading it right, this is setting cockroachdb-specific cookies to connect to prometheus. 🤔

Oh good catch! Reverted that change, I think I was just on a spree of removing anything that remotely said client in roachtest.


pkg/cmd/roachtest/roachtestutil/utils.go line 36 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Nit: just merge these two lines?

Oops, rebase mistake. Done.


pkg/cmd/roachtest/roachtestutil/utils.go line 102 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Should we really be swallowing errors? Does this work around a specific issue?

The failover tests cause nodes to go down which makes the simple option of arbitrarily calling this on c.Node(1) not possible. I opted to just pass it all the nodes and have it try until it succeeded, versus keeping track of which nodes were alive and piping that information all way to this helper.

I added a l.Printf("%s auth session login failed on node %d: %v", test.DefaultCockroachPath, node, err) to mimic how we deal with test debug, would that be good enough? Or do you think its just bad in general to not error out there?


pkg/cmd/roachtest/tests/build_info.go line 46 at r3 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Intentional?

Nope, removed.


pkg/cmd/roachtest/tests/status_server.go line 70 at r3 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Why do we need to do this manually in this test?

I think to avoid getting the sessionId for each endpoint when it won't change anyway but it does look pretty messy now that you point it out. I rewrote the test slightly differently though to remove the need to do that.


pkg/roachprod/install/cockroach.go line 495 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Would be nice to add some comments here explaining the different auth options.

Done.


pkg/roachprod/install/cockroach.go line 525 at r1 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Is there a reason why we use allow instead of verify-full in this case?

If you want to use verify-full then you have to specify certs, in which case you should just use AuthUserCert. I guess a good question would be when would you ever want to not use verify-full? I think not very often, there were just a few tests that already used user+password auth.


pkg/util/httputil/client.go line 45 at r2 (raw file):
Without that bit, http requests fail with failed to verify certificate: x509: certificate signed by unknown authority. Going off of what Stan said in the original issue:

The other issue is the generated certs are not signed by a recognized root CA. Thus, by default http clients will fail to verify the node cert.

I'm still trying to fully understand myself why this is true for http but we can auth with certs through CLI. I see some stuff about strictTLS being false for the cli commands. Hmm, I'll have to look into this more.

For sanity, I put in fake values for the cookie and it fails, so its not like this setting is just completely skipping auth.

client also seems to be used in other parts of the DB

Yeah you're right, shouldn't be touched in that case. I moved the setting change to inside the helper inroachtestutil.

@DarrylWong DarrylWong force-pushed the secure-clusters branch 4 times, most recently from f2c9657 to c1ae680 Compare February 2, 2024 20:45
@DarrylWong
Copy link
Contributor Author

After digging some more, it looks like httpclient uses crypto/tls which mentions that:

The config cannot be nil: users must set either ServerName or InsecureSkipVerify in the config.

But since our certs are just generate through cockroach cert, there isn't a root CA server to put as ServerName and we have to use InsecureSkipVerify instead. Does this sound correct or am I still missing something?

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense to me. I think it's fine to skip verify for the purposes of our HTTP client in roachtest.

Reviewed 69 of 69 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, and @srosenberg)


pkg/cmd/roachtest/cluster.go line 2477 at r7 (raw file):

	ctx context.Context, l *logger.Logger, nodes option.NodeListOption, opts roachprod.PGURLOptions,
) ([]string, error) {
	opts.Secure = c.localCertsDir != ""

c.IsSecure().


pkg/cmd/roachtest/roachtestutil/utils.go line 102 at r2 (raw file):

Previously, DarrylWong wrote…

The failover tests cause nodes to go down which makes the simple option of arbitrarily calling this on c.Node(1) not possible. I opted to just pass it all the nodes and have it try until it succeeded, versus keeping track of which nodes were alive and piping that information all way to this helper.

I added a l.Printf("%s auth session login failed on node %d: %v", test.DefaultCockroachPath, node, err) to mimic how we deal with test debug, would that be good enough? Or do you think its just bad in general to not error out there?

Yeah, I think this is good enough for now.


pkg/cmd/roachtest/roachtestutil/utils.go line 140 at r7 (raw file):

		// much higher than on linux).
		DialContext:       (&net.Dialer{Timeout: httputil.StandardHTTPTimeout}).DialContext,
		DisableKeepAlives: true,

IMO, we can do away with these two options for the purposes of the roachtest client, unless they are somehow needed for correctness?


pkg/cmd/roachtest/roachtestutil/utils.go line 143 at r7 (raw file):

		TLSClientConfig:   &tls.Config{InsecureSkipVerify: true},
	}
	for _, cookieUrl := range cookieUrls {

API suggestion: it's good to allow for customization with cookieUrls, but I think we should provide a default of sending the cookies for every request to the cockroachdb admin endpoint. This seems to be how most (all?) roachtests use it, and would avoid having to enumerate endpoints on every call to this function.


pkg/cmd/roachtest/tests/build_info.go line 36 at r7 (raw file):

	}
	url := `http://` + adminUIAddrs[0] + `/_status/details/local`
	client, err := roachtestutil.DefaultHttpClientWithSessionCookie(ctx, c, t.L(), c.Node(1), url)

Do we still need the session cookies when hitting the http endpoint? Could we use https?


pkg/cmd/roachtest/tests/gorm.go line 128 at r7 (raw file):

			option.WithNodes(node),
			fmt.Sprintf(`cd %s && rm migrate_test.go &&
				GORM_DIALECT="postgres" GORM_DSN="user=roach password=sytem dbname=gorm host=localhost port={pgport:1} sslmode=require"

👀


pkg/cmd/roachtest/tests/knex.go line 41 at r7 (raw file):

		//if c.IsLocal() {
		//	t.Fatal("cannot be run in local mode")
		//}

👀


pkg/cmd/roachtest/tests/knex.go line 124 at r7 (raw file):

		// Write the knexfile test config into the test suite to use.
		// The default test config does not support ssl connections.
		testConfigFile := "./pkg/cmd/roachtest/tests/knexfile.js"

This will create a dependency on where the roachtest binary runs relative to project root. Could we embed this file and use PutString?


pkg/cmd/roachtest/tests/knexfile.js line 34 at r7 (raw file):

    return res;
  },
  {}

What is this doing? :derp:


pkg/cmd/roachtest/tests/pop.go line 37 at r7 (raw file):

		t.Status("setting up cockroach")
		startOpts := option.DefaultStartOptsInMemory()
		startOpts.RoachprodOpts.SQLPort = 26259

Why was this necessary?


pkg/cmd/roachtest/tests/tpce.go line 58 at r7 (raw file):

	defaultFixtureBucket = "gs://cockroach-fixtures-us-east1/tpce-csv"
	defaultUser          = "roach"
	defaultPassword      = "system"

We should probably reference the install constants.


pkg/cmd/roachtest/tests/typeorm.go line 308 at r7 (raw file):

      	"rejectUnauthorized": true
    	}
  	}

Nit: indentation seems off.


pkg/roachprod/install/cockroach.go line 525 at r1 (raw file):

Previously, DarrylWong wrote…

If you want to use verify-full then you have to specify certs, in which case you should just use AuthUserCert. I guess a good question would be when would you ever want to not use verify-full? I think not very often, there were just a few tests that already used user+password auth.

Got it, makes sense! At some point it would be nice for roachprod/roachtest to provide a wrapper to create new users and certs so that we could dynamically use AuthUserCert for users created in the test. Something for the future.


pkg/roachprod/install/cockroach.go line 178 at r7 (raw file):

	defaultInitTarget = Node(1)

	Username = "roach"

Remove?

@renatolabs
Copy link
Contributor

Optional: could you also remove this default setting? 🙏

defaultClusterSettings = []install.ClusterSettingOption{
install.SecureOption(true),
}

@DarrylWong DarrylWong force-pushed the secure-clusters branch 2 times, most recently from a1f6fd2 to 0d6e1b6 Compare February 5, 2024 20:44
Copy link
Contributor Author

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, also removed from other places in roachtest

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/cluster.go line 2477 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

c.IsSecure().

Done


pkg/cmd/roachtest/roachtestutil/utils.go line 140 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

IMO, we can do away with these two options for the purposes of the roachtest client, unless they are somehow needed for correctness?

SGTM, removed and confirmed the relevant tests still pass.


pkg/cmd/roachtest/roachtestutil/utils.go line 143 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

API suggestion: it's good to allow for customization with cookieUrls, but I think we should provide a default of sending the cookies for every request to the cockroachdb admin endpoint. This seems to be how most (all?) roachtests use it, and would avoid having to enumerate endpoints on every call to this function.

Yeah that sounds like a better idea. Still working on this but just publishing my other comments first.


pkg/cmd/roachtest/tests/build_info.go line 36 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Do we still need the session cookies when hitting the http endpoint? Could we use https?

Yeah it fails with status: 401 Unauthorized, content-type: text/plain; charset=utf-8, body: a valid authentication cookie is required without the cookie. I made it https to be explicit but I wonder if it's redirecting to https by it's own? Also changed it elsewhere when applicable.


pkg/cmd/roachtest/tests/gorm.go line 128 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

👀

Hmm, so I didn't catch this because I misspelled system, but it looks like it was passing despite that? At first I thought it was like django where it silently fails, but we get the same amount of passed/skipped/failed tests 🤔 . If I misspell the user as well, or give it a random port, it still passes.

Seems like this line isn't needed at all, but that doesn't make sense to me how it could connect to the db? I also see the gorm code definitely using that env variable too...

https://github.com/go-gorm/gorm/blob/master/tests/tests_test.go#L49-L66

I fixed in the meanwhile but still trying to figure out why it doesn't need it.


pkg/cmd/roachtest/tests/knex.go line 41 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

👀

🤦

Fixed, thanks!


pkg/cmd/roachtest/tests/knex.go line 124 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

This will create a dependency on where the roachtest binary runs relative to project root. Could we embed this file and use PutString?

Yeah we could and I have no preference myself here, but what do you think about cases like ruby-pg and tpchvec/smithcmp which are also using the c.Put approach?

They could also go with the embedded file approach but the files they need to copy are quite large, over 500+ lines. For something small like knexfile.js it might not be a big deal but it might be annoying trying to edit ruby_pg_helpers.rb if it's embedded as a string?

Another approach would be what tpchvec/smithcmp was doing before: using curl on the github repo. This worked but IMO was really annoying if you have to change the file since it was pulling from master. That's why I changed it to just c.Put the file but based on what you're saying, maybe I should revert it.


pkg/cmd/roachtest/tests/pop.go line 37 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Why was this necessary?

pop already has an option to use ssl connections, by using cockroach_ssl. For some reason (maybe because it runs some sort of CI on both secure and insecure concurrently), cockroach_ssluses port26259`.

See here: https://github.com/gobuffalo/pop/blob/main/database.yml#L26-L28

Also added a comment to clarify.


pkg/cmd/roachtest/tests/tpce.go line 58 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

We should probably reference the install constants.

Done


pkg/cmd/roachtest/tests/typeorm.go line 308 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Nit: indentation seems off.

👍

Was using tabs instead of spaces


pkg/roachprod/install/cockroach.go line 178 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Remove?

Done


pkg/cmd/roachtest/tests/knexfile.js line 34 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

What is this doing? :derp:

To be honest, I' don't really know either. I just copied it from the original knexfile.js.

I do know that without that block of code, the test was not able to find the cockroachdb adapter. So it seems like it's needed to actually export the test config.

@DarrylWong DarrylWong force-pushed the secure-clusters branch 5 times, most recently from 6bd92f5 to 8c0ad3b Compare February 6, 2024 19:22
@DarrylWong
Copy link
Contributor Author

Should be RFAL again. Reworked the HTTP Client so it should be easier for the caller to use. Ideally the client can just be created once and reused throughout the test was what I was aiming for.

I was running into trouble getting acceptance/status-server to pass CI, despite it passing locally on gceworker and on a TC GCE run. It turned out to be a cluster reuse problem where it was using the cookies from a previous acceptance/gossip test. This only affected that test because it uses the same exact url as a test in acceptance/gossip.

It's not clear to me why the go GC isn't destroying the cookie jar though. The only thing I can think of is we aren't properly closing the response, but it looks like it's already built in for GetJSON and we are for all instances of Get. My bandaid solution was to just always overwrite the cookie jar even if there's already a cookie for that url in there, but I'm not sure if this is cause for concern or just a case of Go GC not working.

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, and @srosenberg)


pkg/cmd/roachtest/tests/build_info.go line 36 at r7 (raw file):

Previously, DarrylWong wrote…

Yeah it fails with status: 401 Unauthorized, content-type: text/plain; charset=utf-8, body: a valid authentication cookie is required without the cookie. I made it https to be explicit but I wonder if it's redirecting to https by it's own? Also changed it elsewhere when applicable.

Right! I think it redirects.


pkg/cmd/roachtest/tests/knex.go line 124 at r7 (raw file):

Previously, DarrylWong wrote…

Yeah we could and I have no preference myself here, but what do you think about cases like ruby-pg and tpchvec/smithcmp which are also using the c.Put approach?

They could also go with the embedded file approach but the files they need to copy are quite large, over 500+ lines. For something small like knexfile.js it might not be a big deal but it might be annoying trying to edit ruby_pg_helpers.rb if it's embedded as a string?

Another approach would be what tpchvec/smithcmp was doing before: using curl on the github repo. This worked but IMO was really annoying if you have to change the file since it was pulling from master. That's why I changed it to just c.Put the file but based on what you're saying, maybe I should revert it.

Sorry, I should have been more specific. When I suggested embedding, I meant using the embed package [1] which is the best of both worlds (embeds into the binary while also allowing you to keep it as a separate file). This is an optional improvement, feel free to do it in follow up work if you want.

[1] https://pkg.go.dev/embed


pkg/cmd/roachtest/tests/pop.go line 37 at r7 (raw file):

Previously, DarrylWong wrote…

pop already has an option to use ssl connections, by using cockroach_ssl. For some reason (maybe because it runs some sort of CI on both secure and insecure concurrently), cockroach_ssluses port26259`.

See here: https://github.com/gobuffalo/pop/blob/main/database.yml#L26-L28

Also added a comment to clarify.

Nice, thank you.

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why the go GC isn't destroying the cookie jar though

The GC wouldn't delete the cookie jar because it shouldn't; the HTTP client lives in the test_runner and, as far as I can tell, we are keeping a reference to the client throughout the entire execution:

func DefaultHTTPClient(
	c cluster.Cluster, l *logger.Logger, opts ...func(options *RoachtestHTTPOptions),
) RoachtestHTTPClient {
	roachtestHTTP := RoachtestHTTPClient{
		client:  httputil.DefaultClient,
		cluster: c,
		l:       l,
	}

httputil.DefaultClient is a single in-memory *http.Client; when we call SetCookies, we are setting cookies in that one HTTP client. In fact, it seems we are reusing the same client across all tests in a run, no?

DefaultHTTPClient should return a new *http.Client instance each time it's called.

Happy to chat in more detail if that doesn't make sense, feel free to message me!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, and @srosenberg)


pkg/cmd/roachtest/roachtestutil/httpclient.go line 52 at r14 (raw file):

	// each time a new request is made. This is usually not required
	// as the sessionID does not change unless the cluster is restarted.
	clearSessionID bool

I'm curious about this API choice. IMO, tests that restart nodes and make authenticated API requests should be the ones clearing the session when it makes sense (i.e., after restart) so that we are more explicit about this behaviour. Something like client.ResetSession(). Optional, this is fine as-is.


pkg/cmd/roachtest/roachtestutil/httpclient.go line 103 at r14 (raw file):

func (r *RoachtestHTTPClient) Get(ctx context.Context, url string) (*http.Response, error) {
	if err := r.addCookie(ctx, url); err != nil {

FWIW, when I mentioned that we could default to not taking a list of cookieURLs, I meant that when we built an authenticated HTTP client, we could add, by default, the session ID cookie on the root (/) of the cluster's admin HTTP endpoint. Most tests are calling endpoints on the admin endpoints, so if we associate the cookie to that root path, it should be sent for every subsequent request. Does that make sense?

I should also have been more clear when I first made that suggestion. This can be done in follow up work if you want, as I feel it will simplify this implementation (no need for wrappers, we just give the test an http.Client instance ready to use).


pkg/util/httputil/client.go line 106 at r14 (raw file):

// SetCookies is a helper that checks if a client.CookieJar exists and creates
// one if it doesn't. It then sets the provided cookies through CookieJar.SetCookies.
func SetCookies(c *http.Client, u *url.URL, cookies []*http.Cookie) error {

It's not immediately clear that this function is not thread safe. We should either document it very explicitly, or make it thread safe.

Copy link
Contributor Author

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! I see now 😄 For some reason I was thinking var DefaultClient = NewClientWithTimeout(StandardHTTPTimeout) was just a convenience helper when it's not.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @srosenberg)


pkg/cmd/roachtest/roachtestutil/httpclient.go line 52 at r14 (raw file):

Previously, renatolabs (Renato Costa) wrote…

I'm curious about this API choice. IMO, tests that restart nodes and make authenticated API requests should be the ones clearing the session when it makes sense (i.e., after restart) so that we are more explicit about this behaviour. Something like client.ResetSession(). Optional, this is fine as-is.

Done. I originally considered that approach as well. I wasn't sure if it was better to make it explicit or to just hide the details from the caller all together. I was pretty on the fence though so if you think that's better I'm all for it.


pkg/cmd/roachtest/roachtestutil/httpclient.go line 103 at r14 (raw file):

we could add, by default, the session ID cookie on the root (/) of the cluster's admin HTTP endpoint

Oh I see! I think that would fix the need to pass a list of URLs for the same endpoint, but it doesn't fix the need to get a new HTTPClient if you wanted to make a request to a different node in the cluster (which some tests do, i.e acceptance/status-server). Creating a new HTTPClient means getting the sessionID again which means another run() round trip.

A solution I tried was to just set the cookie for all node endpoints in the cluster, but some tests shut down nodes, so we'd need to pass it which nodes are alive to get the addresses. I'm not convinced that doing that is simpler than the current solution.

That being said, maybe it's not a big deal to have to retrieve the sessionID again. I see your point about the wrapper adding more complexity, so I'm open to implementing it if you don't think the current complexity/efficiency trade off is worth it.

One thing I will note is that without the wrapper, I'm not sure how to easily make setCookies() thread safe.


pkg/cmd/roachtest/tests/knex.go line 124 at r7 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Sorry, I should have been more specific. When I suggested embedding, I meant using the embed package [1] which is the best of both worlds (embeds into the binary while also allowing you to keep it as a separate file). This is an optional improvement, feel free to do it in follow up work if you want.

[1] https://pkg.go.dev/embed

Ah, that makes more sense 😆

I think I'll add this as follow up work in the spirit of minimizing this PR size, as there's more than just this test that could benefit from this approach. Would be nice to switch them all over at once too.


pkg/util/httputil/client.go line 106 at r14 (raw file):

Previously, renatolabs (Renato Costa) wrote…

It's not immediately clear that this function is not thread safe. We should either document it very explicitly, or make it thread safe.

Added a mutex lock around it. I also moved it to the roachtestutil package since I don't think the actual DB code would want to use this function.

Copy link
Contributor

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Changes look good to me! If you think this is ready to go, :shipit: :YOLO:

Thanks for iterating on this and carefully updating every test. 🎉

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DarrylWong, @herkolategan, and @srosenberg)


pkg/cmd/roachtest/roachtestutil/httpclient.go line 103 at r14 (raw file):

Previously, DarrylWong wrote…

we could add, by default, the session ID cookie on the root (/) of the cluster's admin HTTP endpoint

Oh I see! I think that would fix the need to pass a list of URLs for the same endpoint, but it doesn't fix the need to get a new HTTPClient if you wanted to make a request to a different node in the cluster (which some tests do, i.e acceptance/status-server). Creating a new HTTPClient means getting the sessionID again which means another run() round trip.

A solution I tried was to just set the cookie for all node endpoints in the cluster, but some tests shut down nodes, so we'd need to pass it which nodes are alive to get the addresses. I'm not convinced that doing that is simpler than the current solution.

That being said, maybe it's not a big deal to have to retrieve the sessionID again. I see your point about the wrapper adding more complexity, so I'm open to implementing it if you don't think the current complexity/efficiency trade off is worth it.

One thing I will note is that without the wrapper, I'm not sure how to easily make setCookies() thread safe.

I think the current solution is good, we can always iterate if needed. Thanks for looking into it!

@DarrylWong
Copy link
Contributor Author

Reran the tests that were changed and all looks good.

TFTR!

bors r=renatolabs

@craig
Copy link
Contributor

craig bot commented Feb 9, 2024

Merge conflict.

The majority of calls to InternalPGUrl and ExternalPGUrl
did not specify a tenant or SQLInstance, leaving them as
"" and 0 respectively. This change refactors those functions
to instead use PGURLOptions which contains the above two
options.

An auth option is also added which specifies the mode of
authentication to be used, defaulting to the previous
behavior of root authentication if not specified.

Release note: None
…ient

These helpers are for roachtests to easily create a HTTPClient with
appropriate cookies. It also sets InsecureSkipVerify to true as any
certs we create won't be valid anyway.

This change is so that we can create http requests to secure clusters.

Release note: None
In order to run on secure clusters, many tests were
changed to explicity specify a pgurl or certs directory
to authenticate with. Currently most tests authenticate
with the root user, but in the future we want to use
a non root user when possible.

This change also fixes roachtests to use
the new defaultHTTPClient helper to disable cert
verification and automatically retrieve and use
sessionID for auth.

Release note: None
@DarrylWong
Copy link
Contributor Author

VM was preempted for FIPS local roachtest, reran it here but CI didn't recognize it like usually does for some reason.

bors r=renatolabs

@craig
Copy link
Contributor

craig bot commented Feb 9, 2024

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants