Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel: acceptance tests fail when run under Bazel #59446

Closed
rickystewart opened this issue Jan 26, 2021 · 2 comments · Fixed by #71770
Closed

bazel: acceptance tests fail when run under Bazel #59446

rickystewart opened this issue Jan 26, 2021 · 2 comments · Fixed by #71770
Assignees
Labels
A-build-system C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-dev-inf

Comments

@rickystewart
Copy link
Collaborator

rickystewart commented Jan 26, 2021

bazel test pkg/acceptance:acceptance_test fails immediately with:

panic: cannot find package "github.com/cockroachdb/cockroach" in any of:
	GOROOT/src/github.com/cockroachdb/cockroach (from $GOROOT)
	($GOPATH not set. For more details see: 'go help gopath')

goroutine 1 [running]:
github.com/cockroachdb/cockroach/pkg/acceptance/cluster.glob..func1(0xc000d23b60, 0xc000d23b68)
	pkg/acceptance/cluster/dockercluster.go:74 +0xf4
github.com/cockroachdb/cockroach/pkg/acceptance/cluster.init()
	pkg/acceptance/cluster/dockercluster.go:79 +0x36a

Not sure why this is happening. I assume the acceptance tests run the go binary from within the test somewhere?

Epic CRDB-8306

@rickystewart rickystewart added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-build-system labels Jan 26, 2021
@rickystewart
Copy link
Collaborator Author

I assume it's this.

@rickystewart
Copy link
Collaborator Author

See documentation for how to appropriately access binaries from a test. The gotags for the test also need to be updated. I didn't look to see whether there were other issues that would prevent the tests from running.

This issue is specifically about the acceptance tests, but it would be good to audit around and make sure many tests aren't affected by these issues.

@rail rail self-assigned this Aug 5, 2021
rail added a commit to rail/cockroach that referenced this issue Aug 7, 2021
Touches cockroachdb#59446

Previously, in order to run the acceptance test we were implicitly
identifying the `cockroach` binary to the acceptance test. This is a bit
problematic under bazel, where the binary location may be in different
places.

This patch explicitly passes the location if the cockroach binary to the
acceptance test.

Release note: None
rail added a commit to rail/cockroach that referenced this issue Aug 9, 2021
Touches cockroachdb#59446

Previously, in order to run the acceptance test we were implicitly
identifying the `cockroach` binary to the acceptance test. This is a bit
problematic under bazel, where the binary location may be in different
places.

This patch explicitly passes the location if the cockroach binary to the
acceptance test.

Release note: None
craig bot pushed a commit that referenced this issue Aug 9, 2021
68418: sqlproxy: add a timeout when dialing tenant pods r=JeffSwenson a=JeffSwenson

Dialing dead or non existent pods on GKS can black hole
the network connection. This leaves the connection process
stuck until the OS TCP timeout of 2 minutes.

The SQL proxy and the tenant pod are colocated. 5 seconds should
be more than enough to establish the TCP connection. Timeing out
will cause the retry loop in proxy_handler.go to refetch the tenant
IP address and retry the connection.

Release note: None

68483: release: update cockroach-cloud-images repository location r=DuskEagle a=DuskEagle

This commit updates the location of the cockroach-cloud-images Docker
repository. For reasons described in https://cockroachlabs.atlassian.net/browse/CC-4667,
we are switching our internal CockroachDB Docker repository to this location,
which is build on top of Google Artifact Registry.

Release note: None

68524: sql: change physical planning heuristics a bit to prefer local execution r=yuzefovich a=yuzefovich

This commit changes two parts of the physical planner heuristics:
- we now say that the lookup join "can be distributed" rather than
  "should be distributed"
- for top K sort we also say that it "can be" rather than "should be"
  distributed.

I'm not certain whether both of these changes are always beneficial, but
here is some justification.

The change to the lookup join heuristic will make it so that the
distribution of the join reader stage is determined by other stages of
the physical plan in `distsql=auto` mode. Consider an example when the
input to the lookup join is the table reader that scans only a handful
of rows. Previously, because of the "should distribute" heuristic, such
a plan would be "distributed" meaning we would plan a single table
reader on the leaseholder for the relevant range (most likely a remote
node from the perspective of the gateway node for the query); this, in
turn, would force the planning of the join reader on the same node, and
all consequent stages - if any - too. Such a decision can create
a hotspot if that particular range is queried often (think append-only
access pattern where the latest data is accessed most frequently).

With this change in such a scenario we will get more even compute
utilization across the cluster because the flow will be fully planned on
the gateway (which assumed to be chosen randomly by a load balancer),
and the lookup join will be performed from the gateway (we'll still need
to perform a remote read from the leaseholder of that single range).

The change to the top K sort heuristic seems less controversial to me,
yet I don't have a good justification. My feeling is that usually the
value of K is small, so it's ok if we don't "force" ourselves to
distribute the sort operation if the physical plan otherwise isn't
calling for it.

Overall, the choice of making changes to these two heuristics isn't very
principled and is driven by a single query from one of our largest
customers which happened to hit the hot spot scenario as described
above. In their case, they have append-like workload that is constantly
updating a single range. Eventually that range is split automatically,
but both new ranges stay on the same node. The latest data is accessed
far more frequently than any other data in the table, yet according to
the KV heuristics the ranges aren't being reallocated because the scans
hitting the hot ranges aren't exceeding the threshold. What isn't
accounted for is the fact that other parts of the flow are far more
compute-intensive, so this change attempts to alleviate such a hot node
scenario.

Release note (sql change): Some queries with lookup joins and/or top
K sorts are now more likely to be executed in "local" manner with
`distsql=auto` session variable.

68569: acceptance: explicitly specify binary path r=rickystewart a=rail

Touches #59446

Previously, in order to run the acceptance test we were implicitly
identifying the `cockroach` binary to the acceptance test. This is a bit
problematic under bazel, where the binary location may be in different
places.

This patch explicitly passes the location if the cockroach binary to the
acceptance test.

Release note: None

68573: sql: fix COPY CSV so it handles multiple records at a time r=otan a=rafiss

fixes #68484

This was broken since the golang csv reader reads the entire input all
at once, so the underlying buffer is consumed. This caused the loop that
reads each record to terminate early.

Release note (bug fix): Fixed the COPY CSV command so that it handles
multiple records separated by newline characters.

68605: sql: rework SHOW JOBS WHEN COMPLETE delegate r=arulajmani a=ajwerner

The previous pattern was fragile because the filter could be re-arranged.

Release note: None

68609: cloud: bump orchestrator version to 21.1.7 r=mgartner a=mgartner

Release note: None

68615: roachtest: bump predecessor to v21.1.7 r=mgartner a=mgartner

Release note: None

Co-authored-by: Jeff <[email protected]>
Co-authored-by: Joel Kenny <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rail Aliiev <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
sajjadrizvi pushed a commit to sajjadrizvi/cockroach that referenced this issue Aug 10, 2021
Touches cockroachdb#59446

Previously, in order to run the acceptance test we were implicitly
identifying the `cockroach` binary to the acceptance test. This is a bit
problematic under bazel, where the binary location may be in different
places.

This patch explicitly passes the location if the cockroach binary to the
acceptance test.

Release note: None
rail added a commit to rail/cockroach that referenced this issue Oct 25, 2021
This patch makes the acceptance test work under Bazel.

* Add `AbsCertsDir()` in order to keep track of certificate path for
  cases when tests change the working directory.
* docker-compose tests to use interpolation and environment variables in
  order to override `CERTS_DIR` and `COCKROACH_BINARY`.
* Add `copyRunfiles()` in order to copy Bazel-generated symlinked
  runfiles as regular files to make them available in docker mounted
  volumes.

Related: cockroachdb#71932, cockroachdb#71930
Fixes cockroachdb#59446

Release note: None
rail added a commit to rail/cockroach that referenced this issue Oct 25, 2021
This patch makes the acceptance test work under Bazel.

* Add `AbsCertsDir()` in order to keep track of certificate path for
  cases when tests change the working directory.
* docker-compose tests to use interpolation and environment variables in
  order to override `CERTS_DIR` and `COCKROACH_BINARY`.
* Add `copyRunfiles()` in order to copy Bazel-generated symlinked
  runfiles as regular files to make them available in docker mounted
  volumes.

Related: cockroachdb#71932, cockroachdb#71930
Fixes cockroachdb#59446

Release note: None
craig bot pushed a commit that referenced this issue Nov 23, 2021
71770: bazel: run acceptance tests under Bazel r=rickystewart a=rail

This patch makes the acceptance test work under Bazel.

* Add `AbsCertsDir()` in order to keep track of certificate path for
  cases when tests change the working directory.
* docker-compose tests to use interpolation and environment variables in
  order to override `CERTS_DIR` and `COCKROACH_BINARY`.
* Add `copyRunfiles()` in order to copy Bazel-generated symlinked
  runfiles as regular files to make them available in docker mounted
  volumes.

Related: #71932, #71930
Fixes: #59446

Release note: None

72574: ci: add bazel roachtest gce teamcity job r=rail a=rickystewart

Release note: None

73055: ui: prevent undefined axis label on custom chart r=zachlite a=zachlite

Release note (bug fix): Y-axis labels on custom charts no longer display 'undefined'.

@thtruo, this fix addresses the issue as described in #72115.

Now, when the user selects a new unit from the dropdown, the user will experience up to a 10 second delay before the axis label refreshes.  @nathanstilwell and I investigated this last week, and we'd need more time to think of a solution that doesn't create long term maintenance headaches.  The benefit of this PR as it stands is that the Y-axis label is noticeably less broken. FYI.





73080: bazel: don't shard `kvserver` test r=rail a=rickystewart

The `exclusive` tag here prevents the shards from running concurrently.
See #65407, #65582.

Release note: None

Co-authored-by: Rail Aliiev <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Zach Lite <[email protected]>
@craig craig bot closed this as completed in ccf0dd9 Nov 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-system C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-dev-inf
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants