Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachprod: remove default port assumption in start and start-sql #114097

Merged
merged 1 commit into from
Dec 13, 2023

Conversation

DarrylWong
Copy link
Contributor

@DarrylWong DarrylWong commented Nov 8, 2023

Previously, the default sql port and admin ui port were set to 26257
and 26258 respectively in start and start-sql. This change now removes
that default assignment and now randomly assigns an available open port.

Many roachtests rely on this default port assumption and fail after this
change. To fix this, we now explicity specify the port when applicable.
Some roachtests use a third party library, where the port cannot be easily
passed in and is hardcoded. In these cases, we retain the previous behaviour
by specifying StartOpts.RoachprodOpts.SQLPort.

Epic: None
Fixes: #111052
Release note: none

@DarrylWong DarrylWong added the do-not-merge bors won't merge a PR with this label. label Nov 8, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DarrylWong DarrylWong force-pushed the random-port branch 6 times, most recently from 2e2af7a to 5392f16 Compare November 16, 2023 19:32
@rafiss
Copy link
Collaborator

rafiss commented Nov 16, 2023

drive-by comment: a lot of the ORM tests rely on a known port. i do think there might be a way to refactor a lot of them to be aware of this.

@DarrylWong
Copy link
Contributor Author

Thanks for the heads up! And yeah, the ORM tests aren't alone in this. I got over 100 roachtest fails 😢 so refactoring them to be compatible will be the bulk of this PR.

@DarrylWong DarrylWong force-pushed the random-port branch 5 times, most recently from 5b7e957 to df4e698 Compare November 19, 2023 17:01
@renatolabs
Copy link
Contributor

drive-by comment: a lot of the ORM tests rely on a known port

Another drive-by comment: the intention is just to change the default port and remove assumptions on any particular assignment. Individual tests that rely on a specific configuration, such as ORM tests, should be able to choose a ports to bind but that is now made explicit by settingStartOpts.RoachprodOpts.SQLPort to the desired value.

@DarrylWong DarrylWong force-pushed the random-port branch 4 times, most recently from 8923adf to 822a673 Compare November 22, 2023 17:07
@DarrylWong DarrylWong changed the title roachtest: use a randomly available port by default in start and star… roachprod: remove default port assumption in start and start-sql Nov 22, 2023
@DarrylWong DarrylWong removed the do-not-merge bors won't merge a PR with this label. label Nov 22, 2023
@DarrylWong
Copy link
Contributor Author

DarrylWong commented Nov 22, 2023

Test runs:

AWS

GCE Lots of new failed tests due to rebasing off master and introducing a new DNS caching bug. Will rerun when fixed.

Debating if I should run the weeklies or not... Maybe I can just run the first hour or so of them since most fail fast?

@DarrylWong DarrylWong force-pushed the random-port branch 3 times, most recently from 2a7184b to a022097 Compare December 1, 2023 15:49
@DarrylWong DarrylWong force-pushed the random-port branch 4 times, most recently from 27617fc to a8f2747 Compare December 5, 2023 16:23
craig bot pushed a commit that referenced this pull request Dec 18, 2023
116218: roachtest: run consistency checks after tests r=erikgrinaker a=erikgrinaker

This patch runs replica consistency checks during post-test assertions. Previously, this only checked for stats inconsistencies, it now checks for actual data inconsistencies. This is useful to detect data corruption bugs, typically in e.g. Pebble or Raft snapshots.

The consistency check runs at full speed for up to 20 minutes before timing out, and will report any inconsistencies found during a partial run. This is sufficient to check about 200 GB of data. Tests can opt out by skipping `registry.PostValidationReplicaDivergence`.

Storage checkpoints are not yet taken when inconsistencies are detected, since it requires additional support in SQL builtins to take them at the same Raft log index across nodes. This will be addressed separately.

Touches #115770.
Epic: none
Release note: None

116614: sql: add telemetry for mixed DDL/DML transactions r=rafiss a=rafiss

This patch adds feature counter telemetry for explicit transactions that have schema changes. We track if a transaction has DDL only or a mixture of DDL and DML, and if it succeeded or failed.

fixes #115012
Release note: None

116672: sql: skip TestExperimentalRelocateNonVoters under duress r=rafiss a=rafiss

fixes #115935
Release note: None

116695: roachtest: specify port for post test timeseries collection r=renatolabs a=DarrylWong

After #114097, we need to specify ports for cockroach node connections. This one was missed and timeseries were not being collected in test failures.

Release note: none
Epic: none

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: DarrylWong <[email protected]>
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Dec 28, 2023
The change cockroachdb#114097 removed default port assumptions for SQLPort and AdminUIPort,
and instead finds open ports to assign. However, Prometheus scraping relies on
this hardcoded AdminUIPort as it can't discover the port itself.

This change temporarily reverts AdminUIPort back to the hardcoded port.

Release note: none
Epic: none
Informs: cockroachdb#117125
craig bot pushed a commit that referenced this pull request Dec 28, 2023
117141: roachprod: revert AdminUIPort back to DefaultAdminUIPort r=srosenberg a=DarrylWong

The change #114097 removed default port assumptions for SQLPort and AdminUIPort, and instead finds open ports to assign. However, Prometheus scraping relies on this hardcoded AdminUIPort as it can't discover the port itself.

This change temporarily reverts AdminUIPort back to the hardcoded port.

Release note: none
Epic: none
Informs: #117125

Co-authored-by: DarrylWong <[email protected]>
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Jan 8, 2024
This was originally removed in cockroachdb#115599 due to cockroachdb#114097 merging,
but adminui was reverted in cockroachdb#117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Jan 8, 2024
This was originally removed in cockroachdb#115599 due to cockroachdb#114097 merging,
but adminui was reverted in cockroachdb#117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
craig bot pushed a commit that referenced this pull request Jan 10, 2024
117505: roachtest: assign adminui ports dynamically for virtual clusters r=srosenberg,renatolabs a=DarrylWong

This was originally removed in #115599 due to #114097 merging, but adminui was reverted in #117141 and mistakenly did not revert the special case for virtual clusters. We unskip the multitenant/distsql tests as well.

Release note: None
Fixes: #117150
Fixes: #117149
Epic: None

117545: rpc: rm rangefeed RPC stream window special case r=erikgrinaker,miretskiy a=pav-kv

The rangefeed stream window size tuning was introduced to mitigate OOM in rangefeeds caused by the excessive number of streams (one per `Range`). Since we now use mux rangefeeds (which multiplexes all the rangefeed traffic into a single stream), this setting is no longer needed, so this commit removes it.

Part of #108992

Release note (ops change): `COCKROACH_RANGEFEED_RPC_INITIAL_WINDOW_SIZE` env variable has been removed, and rangefeed connection now uses the same window size as other RPC connections.

117554: sqlproxyccl: improve authentication throttle error r=JeffSwenson a=JeffSwenson

The sql proxy will throttle connection attempts if a (client IP, tenant cluster) pair has too many authentication failures. The error is usually caused by a misconfigured password in a connection pool. This change replaces the "connection attempt throttled" error message with "too many failed authentication attempts". There is a hint that includes this message but not all drivers are configured to log hints.

Fixes #117552

Co-authored-by: DarrylWong <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: Jeff <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Jan 10, 2024
This was originally removed in #115599 due to #114097 merging,
but adminui was reverted in #117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
DarrylWong added a commit that referenced this pull request Jan 12, 2024
This was originally removed in #115599 due to #114097 merging,
but adminui was reverted in #117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Jan 16, 2024
This was originally removed in cockroachdb#115599 due to cockroachdb#114097 merging,
but adminui was reverted in cockroachdb#117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Feb 6, 2024
This was originally removed in cockroachdb#115599 due to cockroachdb#114097 merging,
but adminui was reverted in cockroachdb#117141 and mistakenly did not
revert the special case for virtual clusters.

Release note: None
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Mar 7, 2024
After cockroachdb#114097, we need to specify ports for cockroach node
connections. This one was missed and timeseries were not being
collected in test failures.

Release note: none
Epic: none
DarrylWong added a commit to DarrylWong/fork that referenced this pull request Mar 7, 2024
The change cockroachdb#114097 removed default port assumptions for SQLPort and AdminUIPort,
and instead finds open ports to assign. However, Prometheus scraping relies on
this hardcoded AdminUIPort as it can't discover the port itself.

This change temporarily reverts AdminUIPort back to the hardcoded port.

Release note: none
Epic: none
Informs: cockroachdb#117125
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachprod: use a randomly available port by default in start and start-sql
5 participants