Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jepsen: failure due to unexpected retry error in Sets test #15733

Closed
knz opened this issue May 6, 2017 · 2 comments
Closed

jepsen: failure due to unexpected retry error in Sets test #15733

knz opened this issue May 6, 2017 · 2 comments
Assignees
Milestone

Comments

@knz
Copy link
Contributor

knz commented May 6, 2017

Found here:
https://teamcity.cockroachdb.com/viewLog.html?buildId=243251&buildTypeId=Cockroach_Nightlies_Jepsen&tab=buildLog#_state=147,1998,4680,4704&focus=4704

(Using the new scripts from #15717)

The error occurs very early during the test's initialization, presumably when creating the tables:

[...]
INFO [2017-05-06 02:21:41,705] jepsen node 35.190.137.205 - jepsen.cockroach.auto 35.190.137.205 Cockroach installed
INFO [2017-05-06 02:21:41,705] jepsen node 35.190.136.99 - jepsen.cockroach.auto 35.190.136.99 Cockroach installed
INFO [2017-05-06 02:21:41,705] jepsen node 35.190.133.148 - jepsen.cockroach.auto 35.190.133.148 Cockroach installed
INFO [2017-05-06 02:21:41,705] jepsen node 35.190.131.90 - jepsen.cockroach.auto 35.190.131.90 Cockroach installed
INFO [2017-05-06 02:21:41,746] jepsen node 35.185.94.39 - jepsen.cockroach.auto 35.185.94.39 Cockroach installed
INFO [2017-05-06 02:21:48,315] jepsen node 35.190.131.90 - jepsen.cockroach.auto 35.190.131.90 clock reset:  6 May 02:21:48 ntpdate[32069]: step time server 91.189.91.157 offset 0.000011 sec
INFO [2017-05-06 02:21:48,315] jepsen node 35.190.136.99 - jepsen.cockroach.auto 35.190.136.99 clock reset:  6 May 02:21:48 ntpdate[32578]: step time server 91.189.91.157 offset -0.000003 sec
INFO [2017-05-06 02:21:48,315] jepsen node 35.190.133.148 - jepsen.cockroach.auto 35.190.133.148 clock reset:  6 May 02:21:48 ntpdate[31391]: step time server 91.189.91.157 offset 0.000005 sec
INFO [2017-05-06 02:21:48,315] jepsen node 35.190.137.205 - jepsen.cockroach.auto 35.190.137.205 clock reset:  6 May 02:21:48 ntpdate[31997]: step time server 91.189.91.157 offset 0.000034 sec
INFO [2017-05-06 02:21:48,356] jepsen node 35.185.94.39 - jepsen.cockroach.auto 35.185.94.39 clock reset:  6 May 02:21:48 ntpdate[32302]: step time server 91.189.91.157 offset 0.000012 sec
INFO [2017-05-06 02:21:48,460] jepsen node 35.185.94.39 - jepsen.cockroach.auto 35.185.94.39 Starting CockroachDB...
INFO [2017-05-06 02:21:48,465] jepsen node 35.185.94.39 - jepsen.control {:cmd sudo -S -u cockroach bash -c "cd /; env COCKROACH_LINEARIZABLE=false COCKROACH_MAX_OFFSET=250ms start-stop-daemon --start --background --make-pidfile --remove-pidfile --pidfile /opt/cockroach/pid --no-close --chuid cockroach --chdir /opt/cockroach --exec /opt/cockroach/cockroach -- start --insecure  --logtostderr >> /opt/cockroach/logs/cockroach.stderr 2>&1", :in root
}
INFO [2017-05-06 02:21:48,566] jepsen node 35.185.94.39 - jepsen.cockroach.auto 35.185.94.39 Cockroach started
INFO [2017-05-06 02:21:53,669] jepsen node 35.190.131.90 - jepsen.cockroach.auto 35.190.131.90 Starting packet capture (filtering on 35.185.107.12 )...
INFO [2017-05-06 02:21:53,669] jepsen node 35.190.133.148 - jepsen.cockroach.auto 35.190.133.148 Starting packet capture (filtering on 35.185.107.12 )...
INFO [2017-05-06 02:21:53,669] jepsen node 35.185.94.39 - jepsen.cockroach.auto 35.185.94.39 Starting packet capture (filtering on 35.185.107.12 )...
INFO [2017-05-06 02:21:53,669] jepsen node 35.190.136.99 - jepsen.cockroach.auto 35.190.136.99 Starting packet capture (filtering on 35.185.107.12 )...
INFO [2017-05-06 02:21:53,669] jepsen node 35.190.137.205 - jepsen.cockroach.auto 35.190.137.205 Starting packet capture (filtering on 35.185.107.12 )...
INFO [2017-05-06 02:21:54,075] jepsen node 35.190.137.205 - jepsen.cockroach.auto 35.190.137.205 Starting CockroachDB...
INFO [2017-05-06 02:21:54,075] jepsen node 35.190.131.90 - jepsen.cockroach.auto 35.190.131.90 Starting CockroachDB...
INFO [2017-05-06 02:21:54,075] jepsen node 35.190.136.99 - jepsen.cockroach.auto 35.190.136.99 Starting CockroachDB...
INFO [2017-05-06 02:21:54,075] jepsen node 35.190.133.148 - jepsen.cockroach.auto 35.190.133.148 Starting CockroachDB...
INFO [2017-05-06 02:21:54,077] jepsen node 35.190.131.90 - jepsen.control {:cmd sudo -S -u cockroach bash -c "cd /; env COCKROACH_LINEARIZABLE=false COCKROACH_MAX_OFFSET=250ms start-stop-daemon --start --background --make-pidfile --remove-pidfile --pidfile /opt/cockroach/pid --no-close --chuid cockroach --chdir /opt/cockroach --exec /opt/cockroach/cockroach -- start --insecure --join=35.185.94.39,35.190.133.148,35.190.137.205,35.190.136.99 --logtostderr >> /opt/cockroach/logs/cockroach.stderr 2>&1", :in root
}
INFO [2017-05-06 02:21:54,077] jepsen node 35.190.136.99 - jepsen.control {:cmd sudo -S -u cockroach bash -c "cd /; env COCKROACH_LINEARIZABLE=false COCKROACH_MAX_OFFSET=250ms start-stop-daemon --start --background --make-pidfile --remove-pidfile --pidfile /opt/cockroach/pid --no-close --chuid cockroach --chdir /opt/cockroach --exec /opt/cockroach/cockroach -- start --insecure --join=35.185.94.39,35.190.133.148,35.190.137.205,35.190.131.90 --logtostderr >> /opt/cockroach/logs/cockroach.stderr 2>&1", :in root
}
INFO [2017-05-06 02:21:54,077] jepsen node 35.190.133.148 - jepsen.control {:cmd sudo -S -u cockroach bash -c "cd /; env COCKROACH_LINEARIZABLE=false COCKROACH_MAX_OFFSET=250ms start-stop-daemon --start --background --make-pidfile --remove-pidfile --pidfile /opt/cockroach/pid --no-close --chuid cockroach --chdir /opt/cockroach --exec /opt/cockroach/cockroach -- start --insecure --join=35.185.94.39,35.190.137.205,35.190.136.99,35.190.131.90 --logtostderr >> /opt/cockroach/logs/cockroach.stderr 2>&1", :in root
}
INFO [2017-05-06 02:21:54,077] jepsen node 35.190.137.205 - jepsen.control {:cmd sudo -S -u cockroach bash -c "cd /; env COCKROACH_LINEARIZABLE=false COCKROACH_MAX_OFFSET=250ms start-stop-daemon --start --background --make-pidfile --remove-pidfile --pidfile /opt/cockroach/pid --no-close --chuid cockroach --chdir /opt/cockroach --exec /opt/cockroach/cockroach -- start --insecure --join=35.185.94.39,35.190.133.148,35.190.136.99,35.190.131.90 --logtostderr >> /opt/cockroach/logs/cockroach.stderr 2>&1", :in root
}
INFO [2017-05-06 02:21:54,178] jepsen node 35.190.136.99 - jepsen.cockroach.auto 35.190.136.99 Cockroach started
INFO [2017-05-06 02:21:54,178] jepsen node 35.190.131.90 - jepsen.cockroach.auto 35.190.131.90 Cockroach started
INFO [2017-05-06 02:21:54,179] jepsen node 35.190.133.148 - jepsen.cockroach.auto 35.190.133.148 Cockroach started
INFO [2017-05-06 02:21:54,182] jepsen node 35.190.137.205 - jepsen.cockroach.auto 35.190.137.205 Cockroach started
INFO [2017-05-06 02:21:59,344] jepsen node 35.185.94.39 - jepsen.cockroach 35.185.94.39 Creating database...
INFO [2017-05-06 02:21:59,648] jepsen node 35.185.94.39 - jepsen.cockroach.auto 35.185.94.39 Cockroach killed.
INFO [2017-05-06 02:21:59,649] jepsen node 35.190.131.90 - jepsen.cockroach.auto 35.190.131.90 Cockroach killed.
INFO [2017-05-06 02:21:59,649] jepsen node 35.190.133.148 - jepsen.cockroach.auto 35.190.133.148 Cockroach killed.
INFO [2017-05-06 02:21:59,649] jepsen node 35.190.137.205 - jepsen.cockroach.auto 35.190.137.205 Cockroach killed.
INFO [2017-05-06 02:21:59,649] jepsen node 35.190.136.99 - jepsen.cockroach.auto 35.190.136.99 Cockroach killed.
INFO [2017-05-06 02:21:59,751] jepsen node 35.185.94.39 - jepsen.cockroach.auto 35.185.94.39 Starting CockroachDB...
INFO [2017-05-06 02:21:59,751] jepsen node 35.185.94.39 - jepsen.control {:cmd sudo -S -u cockroach bash -c "cd /; env COCKROACH_LINEARIZABLE=false COCKROACH_MAX_OFFSET=250ms start-stop-daemon --start --background --make-pidfile --remove-pidfile --pidfile /opt/cockroach/pid --no-close --chuid cockroach --chdir /opt/cockroach --exec /opt/cockroach/cockroach -- start --insecure  --logtostderr >> /opt/cockroach/logs/cockroach.stderr 2>&1", :in root
}
INFO [2017-05-06 02:21:59,853] jepsen node 35.185.94.39 - jepsen.cockroach.auto 35.185.94.39 Cockroach started
INFO [2017-05-06 02:22:04,853] jepsen node 35.185.94.39 - jepsen.cockroach 35.185.94.39 Restarted to work around balancing bug
INFO [2017-05-06 02:22:04,854] jepsen node 35.185.94.39 - jepsen.cockroach 35.185.94.39 Setup complete
INFO [2017-05-06 02:22:04,955] jepsen node 35.190.136.99 - jepsen.cockroach.auto 35.190.136.99 Starting CockroachDB...
INFO [2017-05-06 02:22:04,955] jepsen node 35.190.137.205 - jepsen.cockroach.auto 35.190.137.205 Starting CockroachDB...
INFO [2017-05-06 02:22:04,955] jepsen node 35.190.133.148 - jepsen.cockroach.auto 35.190.133.148 Starting CockroachDB...
INFO [2017-05-06 02:22:04,956] jepsen node 35.190.131.90 - jepsen.cockroach.auto 35.190.131.90 Starting CockroachDB...
INFO [2017-05-06 02:22:04,956] jepsen node 35.190.137.205 - jepsen.control {:cmd sudo -S -u cockroach bash -c "cd /; env COCKROACH_LINEARIZABLE=false COCKROACH_MAX_OFFSET=250ms start-stop-daemon --start --background --make-pidfile --remove-pidfile --pidfile /opt/cockroach/pid --no-close --chuid cockroach --chdir /opt/cockroach --exec /opt/cockroach/cockroach -- start --insecure --join=35.185.94.39,35.190.133.148,35.190.136.99,35.190.131.90 --logtostderr >> /opt/cockroach/logs/cockroach.stderr 2>&1", :in root
}
INFO [2017-05-06 02:22:04,956] jepsen node 35.190.136.99 - jepsen.control {:cmd sudo -S -u cockroach bash -c "cd /; env COCKROACH_LINEARIZABLE=false COCKROACH_MAX_OFFSET=250ms start-stop-daemon --start --background --make-pidfile --remove-pidfile --pidfile /opt/cockroach/pid --no-close --chuid cockroach --chdir /opt/cockroach --exec /opt/cockroach/cockroach -- start --insecure --join=35.185.94.39,35.190.133.148,35.190.137.205,35.190.131.90 --logtostderr >> /opt/cockroach/logs/cockroach.stderr 2>&1", :in root
}
INFO [2017-05-06 02:22:04,956] jepsen node 35.190.133.148 - jepsen.control {:cmd sudo -S -u cockroach bash -c "cd /; env COCKROACH_LINEARIZABLE=false COCKROACH_MAX_OFFSET=250ms start-stop-daemon --start --background --make-pidfile --remove-pidfile --pidfile /opt/cockroach/pid --no-close --chuid cockroach --chdir /opt/cockroach --exec /opt/cockroach/cockroach -- start --insecure --join=35.185.94.39,35.190.137.205,35.190.136.99,35.190.131.90 --logtostderr >> /opt/cockroach/logs/cockroach.stderr 2>&1", :in root
}
INFO [2017-05-06 02:22:04,956] jepsen node 35.190.131.90 - jepsen.control {:cmd sudo -S -u cockroach bash -c "cd /; env COCKROACH_LINEARIZABLE=false COCKROACH_MAX_OFFSET=250ms start-stop-daemon --start --background --make-pidfile --remove-pidfile --pidfile /opt/cockroach/pid --no-close --chuid cockroach --chdir /opt/cockroach --exec /opt/cockroach/cockroach -- start --insecure --join=35.185.94.39,35.190.133.148,35.190.137.205,35.190.136.99 --logtostderr >> /opt/cockroach/logs/cockroach.stderr 2>&1", :in root
}
INFO [2017-05-06 02:22:05,057] jepsen node 35.190.133.148 - jepsen.cockroach.auto 35.190.133.148 Cockroach started
INFO [2017-05-06 02:22:05,057] jepsen node 35.190.137.205 - jepsen.cockroach.auto 35.190.137.205 Cockroach started
INFO [2017-05-06 02:22:05,058] jepsen node 35.190.136.99 - jepsen.cockroach.auto 35.190.136.99 Cockroach started
INFO [2017-05-06 02:22:05,058] jepsen node 35.190.131.90 - jepsen.cockroach.auto 35.190.131.90 Cockroach started
INFO [2017-05-06 02:22:10,058] jepsen node 35.190.136.99 - jepsen.cockroach 35.190.136.99 Restarted to work around balancing bug
INFO [2017-05-06 02:22:10,058] jepsen node 35.190.137.205 - jepsen.cockroach 35.190.137.205 Restarted to work around balancing bug
INFO [2017-05-06 02:22:10,058] jepsen node 35.190.131.90 - jepsen.cockroach 35.190.131.90 Restarted to work around balancing bug
INFO [2017-05-06 02:22:10,058] jepsen node 35.190.133.148 - jepsen.cockroach 35.190.133.148 Restarted to work around balancing bug
INFO [2017-05-06 02:22:10,059] jepsen node 35.190.136.99 - jepsen.cockroach 35.190.136.99 Setup complete
INFO [2017-05-06 02:22:10,059] jepsen node 35.190.137.205 - jepsen.cockroach 35.190.137.205 Setup complete
INFO [2017-05-06 02:22:10,059] jepsen node 35.190.131.90 - jepsen.cockroach 35.190.131.90 Setup complete
INFO [2017-05-06 02:22:10,059] jepsen node 35.190.133.148 - jepsen.cockroach 35.190.133.148 Setup complete
INFO [2017-05-06 02:22:10,128] clojure-agent-send-off-pool-3 - jepsen.cockroach.sets 35.190.133.148 Connected
INFO [2017-05-06 02:22:10,128] clojure-agent-send-off-pool-2 - jepsen.cockroach.sets 35.190.136.99 Connected
INFO [2017-05-06 02:22:10,128] clojure-agent-send-off-pool-1 - jepsen.cockroach.sets 35.190.137.205 Connected
INFO [2017-05-06 02:22:10,128] clojure-agent-send-off-pool-4 - jepsen.cockroach.sets 35.185.94.39 Connected
INFO [2017-05-06 02:22:10,128] clojure-agent-send-off-pool-0 - jepsen.cockroach.sets 35.190.131.90 Connected
INFO [2017-05-06 02:22:12,150] clojure-agent-send-off-pool-3 - jepsen.cockroach.sets 35.190.133.148 Creating table
WARN [2017-05-06 02:22:14,614] clojure-agent-send-off-pool-3 - jepsen.cockroach.sets Encountered error with conn [:cockroach "35.190.133.148"]; reopening
WARN [2017-05-06 02:22:14,626] clojure-agent-send-off-pool-3 - jepsen.cockroach.sets Encountered error with conn [:cockroach "35.185.94.39"]; reopening
WARN [2017-05-06 02:22:14,626] clojure-agent-send-off-pool-8 - jepsen.cockroach.sets Encountered error with conn [:cockroach "35.190.137.205"]; reopening
WARN [2017-05-06 02:22:14,627] clojure-agent-send-off-pool-7 - jepsen.cockroach.sets Encountered error with conn [:cockroach "35.190.131.90"]; reopening
WARN [2017-05-06 02:22:14,627] clojure-agent-send-off-pool-9 - jepsen.cockroach.sets Encountered error with conn [:cockroach "35.190.136.99"]; reopening
INFO [2017-05-06 02:22:14,636] jepsen test runner - jepsen.core Snarfing log files
[...]
ERROR [2017-05-06 02:22:22,622] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
org.postgresql.util.PSQLException: ERROR: restart transaction: HandledRetryableTxnError: TransactionRetryError: retry txn "sql txn" id=f9637e48 key=/Table/0/0 rw=true pri=0.01474274 iso=SERIALIZABLE stat=PENDING epo=0 ts=1494037334.449734196,1 orig=1494037332.150969729,0 max=1494037332.151492593,0 wto=false rop=false seq=6
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2458) ~[postgresql-9.4.1211.jar:9.4.1211]
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2158) ~[postgresql-9.4.1211.jar:9.4.1211]
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:291) ~[postgresql-9.4.1211.jar:9.4.1211]
        at org.postgresql.jdbc.PgConnection.executeTransactionCommand(PgConnection.java:779) ~[postgresql-9.4.1211.jar:9.4.1211]
        at org.postgresql.jdbc.PgConnection.commit(PgConnection.java:800) ~[postgresql-9.4.1211.jar:9.4.1211]
        at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:616) ~[classes/:na]
        at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:585) ~[classes/:na]
        at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:598) ~[classes/:na]
        at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:585) ~[classes/:na]
        at clojure.java.jdbc$db_do_execute_prepared_statement.invokeStatic(jdbc.clj:783) ~[classes/:na]
        at clojure.java.jdbc$db_do_execute_prepared_statement.invoke(jdbc.clj:778) ~[classes/:na]
        at clojure.java.jdbc$db_do_prepared.invokeStatic(jdbc.clj:814) ~[classes/:na]
        at clojure.java.jdbc$db_do_prepared.invoke(jdbc.clj:795) ~[classes/:na]
        at clojure.java.jdbc$execute_BANG_$execute_helper__1854.invoke(jdbc.clj:961) ~[classes/:na]
        at clojure.java.jdbc$execute_BANG_.invokeStatic(jdbc.clj:963) ~[classes/:na]
        at clojure.java.jdbc$execute_BANG_.invoke(jdbc.clj:943) ~[classes/:na]
        at clojure.java.jdbc$execute_BANG_.invokeStatic(jdbc.clj:954) ~[classes/:na]
        at clojure.java.jdbc$execute_BANG_.invoke(jdbc.clj:943) ~[classes/:na]
        at jepsen.cockroach.sets.SetsClient$fn__3670.invoke(sets.clj:110) ~[classes/:na]
        at jepsen.cockroach.sets.SetsClient.setup_BANG_(sets.clj:103) ~[classes/:na]
        at jepsen.core$run_case_BANG_$fn__1245.invoke(core.clj:287) ~[classes/:na]
        at clojure.lang.AFn.applyToHelper(AFn.java:154) ~[clojure-1.8.0.jar:na]
        at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.8.0.jar:na]
        at clojure.core$apply.invokeStatic(core.clj:646) ~[clojure-1.8.0.jar:na]
        at clojure.core$apply.invoke(core.clj:641) ~[clojure-1.8.0.jar:na]
        at jepsen.util$fcatch$wrapper__66.doInvoke(util.clj:27) ~[classes/:na]
        at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.8.0.jar:na]
        at jepsen.util$real_pmap$launcher__71$fn__72.invoke(util.clj:47) ~[classes/:na]
        at clojure.core$binding_conveyor_fn$fn__4676.invoke(core.clj:1938) ~[clojure-1.8.0.jar:na]
        at clojure.lang.AFn.call(AFn.java:18) ~[clojure-1.8.0.jar:na]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
@knz knz assigned knz and unassigned knz May 6, 2017
@petermattis petermattis modified the milestone: 1.1 Jun 1, 2017
@bdarnell
Copy link
Contributor

Decoding this error: sets.clj:110 is when the test creates its table. This is done without a retry loop (and it shouldn't need one, since it's a single statement outside a transaction). If it fails, all the other test threads fail too.

Other tests (including comments.clj) are also seeing restart transaction errors in their table creations.

@bdarnell
Copy link
Contributor

Fixed by #16874

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants