-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachprod: don't scp files via local host when doing treedist #99103
Comments
aliher1911
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
A-testing
Testing tools and infrastructure
T-testeng
TestEng Team
labels
Mar 21, 2023
cc @cockroachdb/test-eng |
Test start timing example:
|
I think we are missing -A -R flags to enable direct copy between hosts and agent forwarding to ensure authentication to target host could be performed regardless of hosts being correctly setup to talk to eachother. |
craig bot
pushed a commit
that referenced
this issue
Mar 23, 2023
99283: roachprod: change scp flags to avoid copy via local host r=tbg a=aliher1911 Previously default scp option made it to copy files via local host. This was causing unnecessary delays and network usage because files were first copied to one node, then distributed from that node using scp between remote locations. Unfortunately the default scp behaviour is to fetch from src and put to dst to avoid authenticating from src to dst. This commit adds scp options to do direct copy. Fixes: #99103 Release note: None Co-authored-by: Oleg Afanasyev <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Roachprod uses syncedCluster.UseTreeDist=true by default. This options allows doing tree fanout when copying files by seeding one node with file and then using
scp user@host1:file user@host2:file
. Unfortunately it is not doing what one might think and retrieves file from host1 and then sending it to host2. You can see that if you try doing put with UseTreeDist and comparing its execution time with put to single host first and then comparing withscp file user@host2
executed directly on host1. This behaviour is documented inman scp
.The immediate impact of this behaviour for example is inability to use
pkg/cmd/roachtest/roachstress.sh -c 10
without explicitly setting-p 1
to disable parallel test runs. If multiple tests are started in parallel, network bandwidth is saturated (for remote workers using ADSL) and only copying cockroach takes 10-15 minutes of allocated test run time which in most cases cause tests to timeout. The other negative effect is when doing roachtest failure investigation, each iteration takes at least 10 minutes to copy files. This effect became worse with introduction ofcockroach-default
.We can work that around by calling scp on the host1 directly via ssh. It should be fine to do so as we know that cluster hosts have certificate auth set up in dev clusters. Alternatively, we could disable TreeDist altogether as it is making things worse in my experience because we need to transfer data twice as much.
Jira issue: CRDB-25704
The text was updated successfully, but these errors were encountered: