jepsen: apt failures #31944

Closed
cockroach-teamcity opened this issue Oct 27, 2018 · 5 comments
Labels
C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. no-issue-activity O-robot Originated from a bot. X-stale

Comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/d07351b2d1a3e9b5519aa8bc662db0ceb7b7ef48

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=jepsen-batch3/register/majority-ring PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=988969&tab=buildLog

The test failed on 31663:
	test.go:639,cluster.go:1110,jepsen.go:87,jepsen.go:127,jepsen.go:313: /home/agent/work/.go/bin/roachprod run teamcity-988969-jepsen-batch3:1-6 -- sh -c "sudo apt-get -y update > logs/apt-upgrade.log 2>&1" returned:
		stderr:
		
		stdout:
		teamcity-988969-jepsen-batch3: sh -c "sudo apt-get -y upda...........
		   1: 
		   2: 
		   3: 
		exit status 100
		   4: 
		   5: 
		   6: 
		Error:  exit status 100
		: exit status 1
	test.go:639,cluster.go:1110,jepsen.go:74,asm_amd64.s:573,panic.go:377,test.go:640,cluster.go:1110,jepsen.go:87,jepsen.go:127,jepsen.go:313: test already failed

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Oct 27, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Oct 27, 2018
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5ef4d2c8621fc5465f73a96221b0bd0bc5cd27aa

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=jepsen-batch3/register/majority-ring PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=990073&tab=buildLog

The test failed on master:
	test.go:639,jepsen.go:243,jepsen.go:304: /home/agent/work/.go/bin/roachprod run teamcity-990073-jepsen-batch3:6 -- bash -e -c "\
		cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && \
		 ~/lein run test \
		   --tarball file://${PWD}/cockroach.tgz \
		   --username ${USER} \
		   --ssh-private-key ~/.ssh/id_rsa \
		   --os ubuntu \
		   --time-limit 300 \
		   --concurrency 30 \
		   --recovery-time 25 \
		   --test-count 1 \
		   -n 10.128.0.19 -n 10.128.0.36 -n 10.128.0.18 -n 10.128.0.24 -n 10.128.0.21 \
		   --test register --nemesis majority-ring \
		> invoke.log 2>&1 \
		" returned:
		stderr:
		
		stdout:
		Error:  exit status 255
		: exit status 1

@petermattis petermattis assigned bdarnell and unassigned petermattis Oct 28, 2018
@bdarnell
Contributor

These are two different apt failures. The first is a failure in apt-get update:

Get:19 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [869 kB]
Ign:19 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
Get:21 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [698 kB]
Get:19 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [1,123 kB]
Ign:19 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
Err:19 http://us-central1.gce.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
  Writing more data than expected (1123337 > 1123271) [IP: 35.184.213.5 80]
Get:31 http://security.ubuntu.com/ubuntu xenial-security InRelease [107 kB]
Get:32 http://security.ubuntu.com/ubuntu xenial-security/main Sources [136 kB]
Get:33 http://security.ubuntu.com/ubuntu xenial-security/restricted Sources [2,116 B]
Get:34 http://security.ubuntu.com/ubuntu xenial-security/universe Sources [78.8 kB]
Get:35 http://security.ubuntu.com/ubuntu xenial-security/multiverse Sources [2,088 B]
Get:36 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [573 kB]
Get:37 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [393 kB]
Get:38 http://security.ubuntu.com/ubuntu xenial-security/universe Translation-en [151 kB]
Get:39 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [3,460 B]
Get:40 http://security.ubuntu.com/ubuntu xenial-security/multiverse Translation-en [1,744 B]
Fetched 24.5 MB in 3s (6,669 kB/s)
Reading package lists...
E: Failed to fetch http://us-central1.gce.archive.ubuntu.com/ubuntu/dists/xenial-updates/main/binary-amd64/Packages  Writing more data than expected (1123337 > 1123271) [IP: 35.184.213.5 80]
E: Some index files failed to download. They have been ignored, or old ones used instead.

Second one is #31780: apt-get install failed, but the logs don't show any evidence of failure.
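
The "Writing more data than expected" error in the first failure is commonly seen when a mirror's index is updated while it is being fetched, so it may well be transient. A minimal sketch of retrying the update a few times before giving up, assuming a hypothetical `retry` helper (it is not something that exists in roachtest or the Jepsen scripts, and the attempt count and delay are arbitrary):

```shell
# Hypothetical helper (not part of roachtest): retry a command a few times.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2" i=1 status=0
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    status=$?
    echo "attempt $i/$attempts failed with exit status $status" >&2
    # Only sleep if another attempt is coming.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  # Report the exit status of the last failed attempt.
  return "$status"
}
```

For example, `retry 3 10 sudo apt-get -y update` would try the update up to three times, ten seconds apart, and return apt-get's last exit status if every attempt fails.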

@bdarnell bdarnell changed the title roachtest: jepsen-batch3/register/majority-ring failed jepsen: apt failures Oct 29, 2018
@tbg tbg added A-testing Testing tools and infrastructure C-test-failure Broken test (automatically or manually discovered). and removed C-test-failure Broken test (automatically or manually discovered). A-testing Testing tools and infrastructure labels Oct 30, 2018
@tbg
Member

tbg commented Dec 11, 2018

These are now "caught" by grepping them out in the jepsen nightly script. I'll leave this issue open but move it out of the test-failure label.

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. and removed C-test-failure Broken test (automatically or manually discovered). C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Dec 11, 2018
@awoods187 awoods187 removed this from the 19.1 milestone Mar 22, 2019
andreimatei added a commit to andreimatei/cockroach that referenced this issue May 9, 2019
Jepsen setup sometimes gets an exit code 100 from apt-get. This patch
skips the test instead of failing when that happens.
apt problems are tracked in cockroachdb#31944

Fixes cockroachdb#37375

Release note: None
@andreimatei
Contributor

> These are now "caught" by grepping them out in the jepsen nightly script.

That wasn't true, but I'm making it true in #37430

craig bot pushed a commit that referenced this issue May 9, 2019
37430: roachtest: ignore apt errors in jepsen r=andreimatei a=andreimatei

Jepsen setup sometimes gets an exit code 100 from apt-get. This patch
skips the test instead of failing when that happens.
apt problems are tracked in #31944

Fixes #37375

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
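
apt-get documents exit status 100 as its error code, which matches the "exit status 100" in the first failure report. A rough sketch of the idea in the patch (treat that status as a reason to skip rather than fail), written as a shell function. The real change lives in roachtest's Go code and may look quite different; `run_setup_step` and `APT_ERROR_STATUS` are names made up for illustration:

```shell
# Exit status apt-get uses to signal an error (per its man page).
APT_ERROR_STATUS=100

# Hypothetical wrapper: run a setup command with its output sent to a log
# file, then print "ok", "skip", or "fail" depending on the exit status.
# Usage: run_setup_step LOGFILE COMMAND [ARGS...]
run_setup_step() {
  local log="$1" status=0
  shift
  "$@" >"$log" 2>&1 || status=$?
  case "$status" in
    0) echo "ok" ;;
    # An apt error is treated as an infrastructure flake: skip the test.
    "$APT_ERROR_STATUS") echo "skip" ;;
    # Anything else is a genuine failure.
    *) echo "fail" ;;
  esac
}
```

A caller could then branch on the result, for instance `result=$(run_setup_step logs/apt-upgrade.log sudo apt-get -y update)`, and only mark the test as failed when the result is `fail`.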
@bdarnell bdarnell removed their assignment Jul 31, 2019
@github-actions

github-actions bot commented Jun 5, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!


6 participants