Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

queries are A LOT slower after losing and re-establishing a connection to zookeeper #183

Open
bsara opened this issue Sep 2, 2022 · 20 comments

Comments

@bsara
Copy link
Contributor

bsara commented Sep 2, 2022

In the tests that I conducted on my local machine, I was seeing query speeds of ~50ms go to no better than 10 seconds for the same query after re-establishing a lost connection to zookeeper.

Setup

  • single rdf-delta (v1.1.2) server utilizes zookeeper and s3 bucket for storage
  • at least two rdf-delta-fuseki (v1.1.2) servers running and using the single rdf-delta server

Steps to Reproduce

  1. Start zookeeper, s3 bucket, rdf-delta, and all rdf-delta-fuseki servers.
  2. Run the following query against any of the fuseki servers:
    ASK WHERE { ?s ?p ?o }
    You should receive a response very, very quickly (mine usually returned in under 100 ms)
  3. Restart the zookeeper server so that the rdf-delta server looses its connection and has to re-establish it.
  4. After the connection is re-established, submit the query from step 2 to any fuseki server again. The query will take significantly longer to complete and will continue to take a long time to complete until the rdf-delta server is restarted (for me, I couldn't get anything better than a 10 second response time).

RDF Delta Logs

I've added labels to the below log to indicate when each step described above (except for step 1) was performed during the logging.

...3:30,056 DEBUG [main-1]                                                    D.Delta : Connection string: zookeeper:2181
...3:30,073 DEBUG [main-1]                                                    D.Delta : Delta initialization (level 0)
...3:30,074 DEBUG [main-1]                                                    D.Delta : Delta initialization (level 1)
...3:30,077 DEBUG [main-1]                                                    D.Delta : Provider: file
...3:30,077 DEBUG [main-1]                                                    D.Delta : Provider: rdb
...3:30,077 DEBUG [main-1]                                                    D.Delta : Provider: mem
...3:30,077 DEBUG [main-1]                                                    D.Delta : Provider: zk
...3:30,078 DEBUG [main-1]                                                    D.Delta : Provider: Local
...3:30,083 DEBUG [main-1]                                                   D.Config : Setup for provider: zookeeper+s3
...3:31,349  INFO [main-1]                             work.imps.CuratorFrameworkImpl : Starting
...3:31,355 DEBUG [main-1]                             curator.CuratorZookeeperClient : Starting
...3:31,355 DEBUG [main-1]                             apache.curator.ConnectionState : Starting
...3:31,356 DEBUG [main-1]                             apache.curator.ConnectionState : reset
...3:31,362  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:zookeeper.version=3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
...3:31,362  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:host.name=rdf-delta
...3:31,363  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:java.version=11.0.14
...3:31,364  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:java.vendor=Alpine
...3:31,365  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:java.home=/usr/lib/jvm/java-11-openjdk
...3:31,366  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:java.class.path=/opt/rdf-delta/delta-server.jar
...3:31,367  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:java.library.path=/usr/lib/jvm/java-11-openjdk/lib/server:/usr/lib/jvm/java-11-openjdk/lib:/usr/lib/jvm/java-11-openjdk/../lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
...3:31,367  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:java.io.tmpdir=/tmp
...3:31,368  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:java.compiler=<NA>
...3:31,369  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:os.name=Linux
...3:31,370  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:os.arch=amd64
...3:31,371  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:os.version=5.10.104-linuxkit
...3:31,372  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:user.name=root
...3:31,373  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:user.home=/root
...3:31,373  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:user.dir=/opt/rdf-delta
...3:31,374  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:os.memory.free=992MB
...3:31,374  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:os.memory.max=2048MB
...3:31,375  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Client environment:os.memory.total=1024MB
...3:31,384  INFO [main-1]                               o.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=zookeeper:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@771a7d53
...3:31,394  INFO [main-1]                             ache.zookeeper.common.X509Util : Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
...3:31,404  INFO [main-1]                             che.zookeeper.ClientCnxnSocket : jute.maxbuffer value is 1048575 Bytes
...3:31,414  INFO [main-1]                              o.apache.zookeeper.ClientCnxn : zookeeper.request.timeout value is 0. feature enabled=false
...3:31,438 DEBUG [main-SendThread(zookeeper:2181)-14] .zookeeper.SaslServerPrincipal : Canonicalized address to zookeeper.localdev
...3:31,440  INFO [main-1]                             work.imps.CuratorFrameworkImpl : Default schema
...3:31,441  INFO [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Opening socket connection to server zookeeper/172.18.0.2:2181.
...3:31,442  INFO [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : SASL config status: Will not attempt to authenticate using SASL (unknown error)
...3:31,460  INFO [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Socket connection established, initiating session, client: /172.18.0.6:48114, server: zookeeper/172.18.0.2:2181
...3:31,465 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Session establishment request sent on zookeeper/172.18.0.2:2181
...3:31,471  INFO [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x100006632d40003, negotiated timeout = 10000
...3:31,474 DEBUG [main-EventThread-15]                apache.curator.ConnectionState : Negotiated session timeout: 10000
...3:31,478  INFO [main-EventThread-15]                k.state.ConnectionStateManager : State change: CONNECTED
...3:31,479 DEBUG [main-EventThread-15]                work.imps.CuratorFrameworkImpl : Clearing sleep for 1 operations
...3:31,492 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 1,4  replyHeader:: 1,1148278,0  request:: '/zookeeper/config,T  response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
...3:31,496  INFO [main-EventThread-15]                framework.imps.EnsembleTracker : New config event received: {}
...3:31,496 DEBUG [main-EventThread-15]                framework.imps.EnsembleTracker : Ignoring new config as it is empty
...3:31,497 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 2,4  replyHeader:: 2,1148278,0  request:: '/zookeeper/config,T  response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
...3:31,498  INFO [main-EventThread-15]                framework.imps.EnsembleTracker : New config event received: {}
...3:31,499 DEBUG [main-EventThread-15]                framework.imps.EnsembleTracker : Ignoring new config as it is empty
...3:31,507  INFO [main-1]                             ocal.patchstor.zk.PatchStoreZk : Connecting to zookeeper
...3:31,513 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 3,3  replyHeader:: 3,1148278,0  request:: '/delta,F  response:: s{1148099,1148099,1661472301524,1661472301524,0,3,0,0,0,3,1148109}
...3:31,519 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 4,12  replyHeader:: 4,1148278,0  request:: '/delta/activeLogs,T  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,524 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 5,4  replyHeader:: 5,1148278,0  request:: '/delta/activeLogs/my-log,F  response:: #7b202022696422203a202269643a61613862623464392d373139662d346362342d616366622d38353561343532333664373722202c20226e616d6522203a2022636f6c6c6563746976652d6f6e746f6c6f677922202c202275726922203a202264656c74613a636f6c6c6563746976652d6f6e746f6c6f677922207d,s{1148121,1148257,1661472334619,1662141396906,2,0,0,0,124,0,1148121}
...3:31,544 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 6,3  replyHeader:: 6,1148278,0  request:: '/delta/logs/my-log,F  response:: s{1148114,1148114,1661472334570,1661472334570,0,6,0,0,0,6,1148120}
...3:31,552 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 7,12  replyHeader:: 7,1148278,0  request:: '/delta/logs/my-log/versions,F  response:: v{'00000001},s{1148117,1148117,1661472334599,1661472334599,0,1,0,0,0,1,1148205}
...3:31,555 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 8,4  replyHeader:: 8,1148278,0  request:: '/delta/logs/my-log/versions/00000001,F  response:: #fffffff164ffffffb224ffffffbaffffff804dffffffbbffffff9a443f43ffffffe314ffffff85ffffff90,s{1148205,1148205,1661472404031,1661472404031,0,0,0,0,16,0,1148205}
...3:31,558 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 9,4  replyHeader:: 9,1148278,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...3:31,576 DEBUG [main-1]                                                    D.Delta : Register datasource: id:aa8bb4
...3:31,578 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 10,4  replyHeader:: 10,1148278,0  request:: '/delta/activeLogs/other-log,F  response:: #7b202022696422203a202269643a65313338366533322d663236382d343465352d383761632d65326637396364376236303322202c20226e616d6522203a2022636d742d6465706c6f7922202c202275726922203a202264656c74613a636d742d6465706c6f7922207d,s{1148134,1148134,1661472335155,1661472335155,0,0,0,0,106,0,1148134}
...3:31,582 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 11,3  replyHeader:: 11,1148278,0  request:: '/delta/logs/other-log,F  response:: s{1148127,1148127,1661472335106,1661472335106,0,6,0,0,0,6,1148133}
...3:31,585 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 12,12  replyHeader:: 12,1148278,0  request:: '/delta/logs/other-log/versions,F  response:: v{},s{1148130,1148130,1661472335119,1661472335119,0,0,0,0,0,0,1148130}
...3:31,588 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 13,4  replyHeader:: 13,1148278,0  request:: '/delta/logs/other-log/state,T  response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...3:31,594 DEBUG [main-1]                                                    D.Delta : Register datasource: id:e1386e
...3:31,597 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 14,12  replyHeader:: 14,1148278,0  request:: '/delta/activeLogs,T  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,601 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 15,12  replyHeader:: 15,1148278,0  request:: '/delta/activeLogs,F  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,603 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 16,4  replyHeader:: 16,1148278,0  request:: '/delta/logs/my-log/dsd,F  response:: #7b202022696422203a202269643a61613862623464392d373139662d346362342d616366622d38353561343532333664373722202c20226e616d6522203a2022636f6c6c6563746976652d6f6e746f6c6f677922202c202275726922203a202264656c74613a636f6c6c6563746976652d6f6e746f6c6f677922207d,s{1148115,1148115,1661472334574,1661472334574,0,0,0,0,124,0,1148115}
...3:31,607 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 17,4  replyHeader:: 17,1148278,0  request:: '/delta/logs/other-log/dsd,F  response:: #7b202022696422203a202269643a65313338366533322d663236382d343465352d383761632d65326637396364376236303322202c20226e616d6522203a2022636d742d6465706c6f7922202c202275726922203a202264656c74613a636d742d6465706c6f7922207d,s{1148128,1148128,1661472335110,1661472335110,0,0,0,0,106,0,1148128}
...3:31,611 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 18,12  replyHeader:: 18,1148278,0  request:: '/delta/activeLogs,F  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,613 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 19,12  replyHeader:: 19,1148278,0  request:: '/delta/activeLogs,F  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,613  INFO [main-1]                                                    D.Delta : Provider: zks3
...3:31,615 DEBUG [main-1]                                                    D.Delta :   [id:aa8bb4, my-log, <delta:my-log>]
...3:31,615 DEBUG [main-1]                                                    D.Delta :   [id:e1386e, other-log, <delta:other-log>]
...3:31,616 DEBUG [main-1]                                                    D.Delta : DataSources: zks3 : [[id:aa8bb4, my-log, <delta:my-log>], [id:e1386e, other-log, <delta:other-log>]]
...3:31,619  INFO [main-1]                                                    D.Config : Delta Server jetty config=/opt/rdf-delta/jetty.xml
...3:31,635  INFO [main-1]                                                    D.Server : Jetty server config file = /opt/rdf-delta/jetty.xml
...3:31,919  INFO [main-1]                                                    D.Delta : RDF Delta 1.1.2 2022-08-17T12:45:57+0000
...3:31,919 DEBUG [main-1]                                                    D.Delta : Server start: port=1063
...3:31,921 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 20,12  replyHeader:: 20,1148278,0  request:: '/delta/activeLogs,F  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,927 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 21,4  replyHeader:: 21,1148278,0  request:: '/delta/logs/other-log/state,T  response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...3:31,929  INFO [main-1]                                                    D.Delta :   Data source: [id:e1386e, other-log, <delta:other-log>] version [0,0] type=zks3
...3:31,932 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 22,4  replyHeader:: 22,1148278,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...3:31,933  INFO [main-1]                                                    D.Delta :   Data source: [id:aa8bb4, my-log, <delta:my-log>] version [1,1] type=zks3
...3:35,270 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 1ms.
...3:38,608 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 1ms.
...3:41,946 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 2ms.
...3:44,362 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 23,4  replyHeader:: 23,1148278,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...3:44,363 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 2ms.

------------------------------------------------------------------------------------------------------------------------------------
  Step 2 query happened here
------------------------------------------------------------------------------------------------------------------------------------

...3:44,371 DEBUG [qtp1482087396-23-23]                                         D.RPC : [1] describe_log {"datasource" : "aa8bb4d9-719f-4cb4-acfb-855a45236d77" } => {  "id" : "id:aa8bb4d9-719f-4cb4-acfb-855a45236d77" , "name" : "my-log" , "uri" : "delta:my-log" , "min_version" : 1 , "max_version" : 1 , "latest" : "id:f164b224-ba80-4dbb-9a44-3f43e3148590" }
...3:47,701 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 2ms.
...3:51,037 DEBUG [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 1ms.

------------------------------------------------------------------------------------------------------------------------------------
  Step 3 zookeeper restart happened here
------------------------------------------------------------------------------------------------------------------------------------

...3:53,710  WARN [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Session 0x100006632d40003 for sever zookeeper/172.18.0.2:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x100006632d40003, likely server has closed socket
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) ~[delta-server.jar:?]
...3:54,998 DEBUG [main-SendThread(zookeeper:2181)-14] .zookeeper.SaslServerPrincipal : Canonicalized address to zookeeper.localdev
...3:54,998  INFO [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Opening socket connection to server zookeeper/172.18.0.2:2181.
...3:54,998  INFO [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : SASL config status: Will not attempt to authenticate using SASL (unknown error)
...3:56,011  WARN [main-SendThread(zookeeper:2181)-14]  o.apache.zookeeper.ClientCnxn : Session 0x100006632d40003 for sever zookeeper/172.18.0.2:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777) ~[?:?]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) ~[delta-server.jar:?]
...3:56,012 DEBUG [main-SendThread(zookeeper:2181)-14] .zookeeper.ClientCnxnSocketNIO : Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
	at sun.nio.ch.SocketChannelImpl.ensureOpen(SocketChannelImpl.java:160) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:913) ~[?:?]
	at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:398) ~[?:?]
	at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:193) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1390) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$SendThread.cleanAndNotifyState(ClientCnxn.java:1331) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1309) ~[delta-server.jar:?]
...3:56,013 DEBUG [main-SendThread(zookeeper:2181)-14] .zookeeper.ClientCnxnSocketNIO : Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
	at sun.nio.ch.SocketChannelImpl.ensureOpen(SocketChannelImpl.java:160) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:930) ~[?:?]
	at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:406) ~[?:?]
	at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1390) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$SendThread.cleanAndNotifyState(ClientCnxn.java:1331) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1309) ~[delta-server.jar:?]
...3:56,116 DEBUG [main-EventThread-15]                o.apache.curator.RetryLoopImpl : Retry-able exception received
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /delta/logs/my-log/state
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[delta-server.jar:?]
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[delta-server.jar:?]
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2358) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:327) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:316) ~[delta-server.jar:?]
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:313) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:304) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:35) ~[delta-server.jar:?]
	at org.seaborne.delta.zk.Zk.zkFetch(Zk.java:155) ~[delta-server.jar:?]
	at org.seaborne.delta.zk.Zk.zkFetchJson(Zk.java:138) ~[delta-server.jar:?]
	at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.getWatchedState(PatchLogIndexZk.java:288) ~[delta-server.jar:?]
	at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.syncState(PatchLogIndexZk.java:282) ~[delta-server.jar:?]
	at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.lambda$new$0(PatchLogIndexZk.java:112) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:77) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:588) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[delta-server.jar:?]
...3:57,714 DEBUG [main-SendThread(zookeeper:2181)-14]  .zookeeper.SaslServerPrincipal : Canonicalized address to zookeeper.localdev
...3:57,715  INFO [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Opening socket connection to server zookeeper/172.18.0.2:2181.
...3:57,716  INFO [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : SASL config status: Will not attempt to authenticate using SASL (unknown error)
...3:57,717  INFO [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Socket connection established, initiating session, client: /172.18.0.6:48126, server: zookeeper/172.18.0.2:2181
...3:57,718 DEBUG [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Session establishment request sent on zookeeper/172.18.0.2:2181
...3:57,739  INFO [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x100006632d40003, negotiated timeout = 10000
...3:57,748 DEBUG [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 24,101  replyHeader:: 24,1148278,0  request:: 1148278,v{'/zookeeper/config,'/delta/logs/my-log/state,'/delta/logs/other-log/state},v{},v{'/delta/activeLogs}  response:: null
...4:01,081 DEBUG [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 3ms.
...4:02,297 DEBUG [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 25,4  replyHeader:: 25,1148278,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}

------------------------------------------------------------------------------------------------------------------------------------
  Step 4 queries start happening here
------------------------------------------------------------------------------------------------------------------------------------

...4:02,299 DEBUG [qtp1482087396-18-18]                                          D.RPC : [2] describe_log {"datasource" : "aa8bb4d9-719f-4cb4-acfb-855a45236d77" } => {  "id" : "id:aa8bb4d9-719f-4cb4-acfb-855a45236d77" , "name" : "my-log" , "uri" : "delta:my-log" , "min_version" : 1 , "max_version" : 1 , "latest" : "id:f164b224-ba80-4dbb-9a44-3f43e3148590" }
...4:05,614 DEBUG [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 1ms.
...4:06,117 DEBUG [main-EventThread-15]                 o.apache.curator.RetryLoopImpl : Retrying operation
...4:06,119 DEBUG [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 26,4  replyHeader:: 26,1148278,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:06,131 DEBUG [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 27,4  replyHeader:: 27,1148278,0  request:: '/delta/logs/other-log/state,T  response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...4:06,132  INFO [main-EventThread-15]                 k.state.ConnectionStateManager : State change: SUSPENDED
...4:09,470 DEBUG [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 2ms.
...4:12,807 DEBUG [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 1ms.
...4:16,121 DEBUG [main-SendThread(zookeeper:2181)-14]   o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 2ms.
...4:16,132  WARN [Curator-ConnectionStateManager-0-13] k.state.ConnectionStateManager : Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 10000. Adjusted session timeout ms: 10000
...4:16,134  INFO [Curator-ConnectionStateManager-0-13] he.zookeeper.ZooKeeperTestable : injectSessionExpiration() called
...4:16,237 DEBUG [main-EventThread-15]                 o.apache.curator.RetryLoopImpl : Retry-able exception received
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /delta/activeLogs
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:134) ~[delta-server.jar:?]
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[delta-server.jar:?]
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2857) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231) ~[delta-server.jar:?]
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41) ~[delta-server.jar:?]
	at org.seaborne.delta.server.local.patchstores.zk.PatchStoreZk.getWatchLogs(PatchStoreZk.java:187) ~[delta-server.jar:?]
	at org.seaborne.delta.server.local.patchstores.zk.PatchStoreZk.lambda$new$1(PatchStoreZk.java:120) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:77) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:588) ~[delta-server.jar:?]
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[delta-server.jar:?]
...4:21,180 DEBUG [qtp1482087396-24-24]                 o.apache.curator.RetryLoopImpl : Retry-able exception received
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /delta/logs/my-log/state
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:134) ~[delta-server.jar:?]
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[delta-server.jar:?]
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2358) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:327) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:316) ~[delta-server.jar:?]
	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:313) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:304) ~[delta-server.jar:?]
	at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:35) ~[delta-server.jar:?]
	at org.seaborne.delta.zk.Zk.zkFetch(Zk.java:155) ~[delta-server.jar:?]
	at org.seaborne.delta.zk.Zk.zkFetchJson(Zk.java:138) ~[delta-server.jar:?]
	at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.getWatchedState(PatchLogIndexZk.java:288) ~[delta-server.jar:?]
	at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.syncState(PatchLogIndexZk.java:282) ~[delta-server.jar:?]
	at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.syncVersionInfo(PatchLogIndexZk.java:293) ~[delta-server.jar:?]
	at org.seaborne.delta.server.local.patchstores.PatchLogBase.getInfo(PatchLogBase.java:111) ~[delta-server.jar:?]
	at org.seaborne.delta.server.local.DeltaLinkLocal.getPatchLogInfo(DeltaLinkLocal.java:195) ~[delta-server.jar:?]
	at org.seaborne.delta.server.http.S_DRPC.describePatchLog(S_DRPC.java:325) ~[delta-server.jar:?]
	at org.seaborne.delta.server.http.S_DRPC.executeAction(S_DRPC.java:159) ~[delta-server.jar:?]
	at org.seaborne.delta.server.http.DeltaServlet.doCommon(DeltaServlet.java:147) ~[delta-server.jar:?]
	at org.seaborne.delta.server.http.S_DRPC.doPost(S_DRPC.java:60) ~[delta-server.jar:?]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:523) ~[delta-server.jar:?]
	at org.seaborne.delta.server.http.DeltaServlet.service(DeltaServlet.java:87) ~[delta-server.jar:?]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:590) ~[delta-server.jar:?]
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764) ~[delta-server.jar:?]
	at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1630) ~[delta-server.jar:?]
	at org.seaborne.delta.server.http.F_PatchFilter.doFilter(F_PatchFilter.java:77) ~[delta-server.jar:?]
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202) ~[delta-server.jar:?]
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1600) ~[delta-server.jar:?]
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:506) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1378) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176) ~[delta-server.jar:?]
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:463) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1300) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.Server.handle(Server.java:562) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$0(HttpChannel.java:505) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:762) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:497) ~[delta-server.jar:?]
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282) ~[delta-server.jar:?]
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:319) ~[delta-server.jar:?]
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100) ~[delta-server.jar:?]
	at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:530) ~[delta-server.jar:?]
	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:379) ~[delta-server.jar:?]
	at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:146) ~[delta-server.jar:?]
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100) ~[delta-server.jar:?]
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53) ~[delta-server.jar:?]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:412) ~[delta-server.jar:?]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:381) ~[delta-server.jar:?]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:268) ~[delta-server.jar:?]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138) ~[delta-server.jar:?]
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407) ~[delta-server.jar:?]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894) ~[delta-server.jar:?]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038) ~[delta-server.jar:?]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
...4:26,133  WARN [Curator-ConnectionStateManager-0-13]                            k.state.ConnectionStateManager : Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 10001. Adjusted session timeout ms: 10000
...4:26,134 DEBUG [Curator-ConnectionStateManager-0-13]                            apache.curator.ConnectionState : reset
...4:26,134 DEBUG [Curator-ConnectionStateManager-0-13]                              o.apache.zookeeper.ZooKeeper : Close called on already closed client
...4:26,134  INFO [Curator-ConnectionStateManager-0-13]                              o.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=zookeeper:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@771a7d53
...4:26,135  INFO [Curator-ConnectionStateManager-0-13]                            che.zookeeper.ClientCnxnSocket : jute.maxbuffer value is 1048575 Bytes
...4:26,136  INFO [Curator-ConnectionStateManager-0-13]                             o.apache.zookeeper.ClientCnxn : zookeeper.request.timeout value is 0. feature enabled=false
...4:26,139 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] .zookeeper.SaslServerPrincipal : Canonicalized address to zookeeper.localdev
...4:26,139  INFO [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Opening socket connection to server zookeeper/172.18.0.2:2181.
...4:26,140  INFO [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : SASL config status: Will not attempt to authenticate using SASL (unknown error)
...4:26,140  INFO [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Socket connection established, initiating session, client: /172.18.0.6:48140, server: zookeeper/172.18.0.2:2181
...4:26,141 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Session establishment request sent on zookeeper/172.18.0.2:2181
...4:26,147  INFO [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x1000069b68a0000, negotiated timeout = 10000
...4:26,147 DEBUG [Curator-ConnectionStateManager-0-EventThread-29]                apache.curator.ConnectionState : Negotiated session timeout: 10000
...4:26,147  INFO [Curator-ConnectionStateManager-0-EventThread-29]                k.state.ConnectionStateManager : State change: LOST
...4:26,147  INFO [Curator-ConnectionStateManager-0-EventThread-29]                k.state.ConnectionStateManager : State change: RECONNECTED
...4:26,147 DEBUG [Curator-ConnectionStateManager-0-EventThread-29]                work.imps.CuratorFrameworkImpl : Clearing sleep for 0 operations
...4:26,149 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 1,4  replyHeader:: 1,1148279,0  request:: '/zookeeper/config,T  response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
...4:26,150  INFO [Curator-ConnectionStateManager-0-EventThread-29]                framework.imps.EnsembleTracker : New config event received: {}
...4:26,150 DEBUG [Curator-ConnectionStateManager-0-EventThread-29]                framework.imps.EnsembleTracker : Ignoring new config as it is empty
...4:26,237 DEBUG [main-EventThread-15]                                            o.apache.curator.RetryLoopImpl : Retrying operation
...4:26,240 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 2,12  replyHeader:: 2,1148279,0  request:: '/delta/activeLogs,T  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...4:26,243 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 3,4  replyHeader:: 3,1148279,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:26,245 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 4,4  replyHeader:: 4,1148279,0  request:: '/delta/logs/other-log/state,T  response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...4:26,246 DEBUG [main-EventThread-15]                                            work.imps.CuratorFrameworkImpl : Clearing sleep for 0 operations
...4:26,248 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 5,12  replyHeader:: 5,1148279,0  request:: '/delta/activeLogs,T  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...4:26,250 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 6,4  replyHeader:: 6,1148279,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:26,252 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 7,4  replyHeader:: 7,1148279,0  request:: '/delta/logs/other-log/state,T  response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...4:26,253  WARN [main-EventThread-15]                                            apache.curator.ConnectionState : Session expired event received
...4:26,253 DEBUG [main-EventThread-15]                                            apache.curator.ConnectionState : reset
...4:26,253 DEBUG [main-EventThread-15]                                              o.apache.zookeeper.ZooKeeper : Closing session: 0x1000069b68a0000
...4:26,253 DEBUG [main-EventThread-15]                                             o.apache.zookeeper.ClientCnxn : Closing client for session: 0x1000069b68a0000
...4:26,255 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28]  o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 8,-11  replyHeader:: 8,1148280,0  request:: null response:: null
...4:26,256 DEBUG [main-EventThread-15]                                             o.apache.zookeeper.ClientCnxn : Disconnecting client for session: 0x1000069b68a0000
...4:26,358  INFO [main-EventThread-15]                                              o.apache.zookeeper.ZooKeeper : Session: 0x1000069b68a0000 closed
...4:26,358  INFO [main-EventThread-15]                                              o.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=zookeeper:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@771a7d53
...4:26,358  INFO [main-EventThread-15]                                            che.zookeeper.ClientCnxnSocket : jute.maxbuffer value is 1048575 Bytes
...4:26,360  INFO [main-EventThread-15]                                             o.apache.zookeeper.ClientCnxn : zookeeper.request.timeout value is 0. feature enabled=false
...4:26,361  INFO [main-EventThread-15]                                            k.state.ConnectionStateManager : State change: LOST
...4:26,362 DEBUG [main-SendThread(zookeeper:2181)-30]                             .zookeeper.SaslServerPrincipal : Canonicalized address to zookeeper.localdev
...4:26,362  INFO [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Opening socket connection to server zookeeper/172.18.0.2:2181.
...4:26,362  INFO [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : SASL config status: Will not attempt to authenticate using SASL (unknown error)
...4:26,362  INFO [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Socket connection established, initiating session, client: /172.18.0.6:48142, server: zookeeper/172.18.0.2:2181
...4:26,363 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Session establishment request sent on zookeeper/172.18.0.2:2181
...4:26,368  INFO [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x1000069b68a0001, negotiated timeout = 10000
...4:26,369 DEBUG [main-EventThread-31]                                            apache.curator.ConnectionState : Negotiated session timeout: 10000
...4:26,369  INFO [main-EventThread-31]                                            k.state.ConnectionStateManager : State change: RECONNECTED
...4:26,370 DEBUG [main-EventThread-31]                                            work.imps.CuratorFrameworkImpl : Clearing sleep for 0 operations
...4:26,374 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 1,12  replyHeader:: 1,1148281,0  request:: '/delta/activeLogs,T  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...4:26,375 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 2,4  replyHeader:: 2,1148281,0  request:: '/zookeeper/config,T  response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
...4:26,376 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 3,4  replyHeader:: 3,1148281,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:26,380  INFO [main-EventThread-31]                                            framework.imps.EnsembleTracker : New config event received: {}
...4:26,381 DEBUG [main-EventThread-31]                                            framework.imps.EnsembleTracker : Ignoring new config as it is empty
...4:26,381 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 4,4  replyHeader:: 4,1148281,0  request:: '/delta/logs/other-log/state,T  response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...4:26,383 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 5,4  replyHeader:: 5,1148281,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:26,386 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 6,12  replyHeader:: 6,1148281,0  request:: '/delta/activeLogs,T  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...4:26,386  INFO [Curator-ConnectionStateManager-0-EventThread-29]                 o.apache.zookeeper.ClientCnxn : EventThread shut down for session: 0x1000069b68a0000
...4:26,387 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 7,4  replyHeader:: 7,1148281,0  request:: '/delta/logs/other-log/state,T  response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...4:29,724 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:31,183 DEBUG [qtp1482087396-24-24]                                            o.apache.curator.RetryLoopImpl : Retrying operation
...4:31,186 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 8,4  replyHeader:: 8,1148282,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:31,188 DEBUG [qtp1482087396-24-24]                                                                     D.RPC : [3] describe_log {"datasource" : "aa8bb4d9-719f-4cb4-acfb-855a45236d77" } => {  "id" : "id:aa8bb4d9-719f-4cb4-acfb-855a45236d77" , "name" : "my-log" , "uri" : "delta:my-log" , "min_version" : 1 , "max_version" : 1 , "latest" : "id:f164b224-ba80-4dbb-9a44-3f43e3148590" }
...4:34,524 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:36,389 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 9,12  replyHeader:: 9,1148282,0  request:: '/delta/activeLogs,T  response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...4:36,389  INFO [main-EventThread-15]                                             o.apache.zookeeper.ClientCnxn : EventThread shut down for session: 0x100006632d40003
...4:39,725 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:43,061 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:44,602 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 10,4  replyHeader:: 10,1148282,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:44,604 DEBUG [qtp1482087396-21-21]                                                                     D.RPC : [4] describe_log {"datasource" : "aa8bb4d9-719f-4cb4-acfb-855a45236d77" } => {  "id" : "id:aa8bb4d9-719f-4cb4-acfb-855a45236d77" , "name" : "my-log" , "uri" : "delta:my-log" , "min_version" : 1 , "max_version" : 1 , "latest" : "id:f164b224-ba80-4dbb-9a44-3f43e3148590" }
...4:47,939 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:51,277 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 2ms.
...4:54,613 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:56,799 DEBUG [main-SendThread(zookeeper:2181)-30]                              o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 11,4  replyHeader:: 11,1148282,0  request:: '/delta/logs/my-log/state,T  response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:56,800 DEBUG [qtp1482087396-24-24]                                                                     D.RPC : [5] describe_log {"datasource" : "aa8bb4d9-719f-4cb4-acfb-855a45236d77" } => {  "id" : "id:aa8bb4d9-719f-4cb4-acfb-855a45236d77" , "name" : "my-log" , "uri" : "delta:my-log" , "min_version" : 1 , "max_version" : 1 , "latest" : "id:f164b224-ba80-4dbb-9a44-3f43e3148590" }

@afs
Copy link
Owner

afs commented Sep 6, 2022

Maybe what you are seeing is possibly due to the time it takes for zookeeper server to be ready to respond to it's client requests.

A ZK server can take a significantly long time to be ready. It is not just a matter of starting the server. But during this time, it's networking is running so at the low level, it clients can connect. They don't get a response though until the server is ready.

The time a ZK server it takes to come up to speed gets longer, the longer the cluster has been in operation - it replays the ZK log. There is also a leader election.

One approach I have seen is to add a server to the cluster, making N+1, then moving a server (e.g for maintenance). The cluster can still operate. Simpy restarting a server drops the cluster to below the critical threshold which causes knock-on effects.

@bsara
Copy link
Contributor Author

bsara commented Sep 6, 2022

Maybe what you are seeing is possibly due to the time it takes for zookeeper server to be ready to respond to it's client requests.

I've seen the slowness regardless of waiting several minutes or immediately after the zookeeper server is back up and running. Also, at initial startup, I don't see the slowness at all. It only occurs AFTER the rdf-delta server has connected to the zookeeper, then loses that connection and re-establishes it. No amount of time is changing the slowness of the responses from what I'm seeing.

@afs
Copy link
Owner

afs commented Sep 8, 2022

Why did the zookeeper server restart?

I'm not sure what is going on them - I'm don't know which part of the system is pausing.

Is there always a pause? Is it always the same length of time?
Is it always less than a certain amount (e.g. 2 minutes)? That suggest a TCP level timeout.

Maybe there is stale TCP connection (not properly shutdown). The patch server doesn't know the original zookeeper has gone until usage is attempted and then fails.

When the Fuseki server makes a query - is it a query or an update? Updates involve more ZK interaction.

@bsara
Copy link
Contributor Author

bsara commented Sep 8, 2022

Why did the zookeeper server restart?

I'm not exactly sure. Regardless, the main point is that a connection is lost for one reason or another to zookeeper.

Is there always a pause?

After a connection to zookeeper is re-established, yes. After that connection is re-established and then the entire rdf-delta server is restarted, no. A full restart of the rdf-delta server always fixes the issue.

Is it always the same length of time?

It seems to be. I was clocking the times to complete the simple ask query that I mention in the original post at 10 seconds very consistently.

That suggest a TCP level timeout.

There isn't a timeout on the rdf-delta side, it does complete eventually, it just takes a super long time to complete after the zookeeper connection is re-established.

When the Fuseki server makes a query - is it a query or an update?

It is a query. The only request sent to rdf-delta is the log description RPC call to determine if any new patches are available.


To me, all of the evidence I'm seeing points to the issue being inside the rdf-delta server somewhere (or an api that it is using in the curator libs); especially given the fact that the only way that I've found to resolve the issue is to restart the rdf-delta server itself. I wonder if there are connections that, when closed, aren't actually disposed of or something like that? I really have no idea though. My knowledge about interacting with zookeeper, let alone via curator, is extremely limited.

@afs
Copy link
Owner

afs commented Sep 8, 2022

Loss of a ZK server is supposed to be a rare event.

There are several 10s time related constants:

Zk.createCuratorClient:

        RetryPolicy policy = new ExponentialBackoffRetry(10000, 5);
        CuratorFramework client =
            CuratorFrameworkFactory.builder()
                .connectString(connectString)
                .sessionTimeoutMs(10000)
                .connectionTimeoutMs(10000)
...

@bsara
Copy link
Contributor Author

bsara commented Sep 8, 2022

Loss of a ZK server is supposed to be a rare event.

I can say that the loss of connection is definitely rare for us, but the few times where it has happened, it has created some serious issues.


Those 10s constants are a good point. That could definitely point us in the right direction.

@bsara
Copy link
Contributor Author

bsara commented Sep 8, 2022

Also, not all of the zookeeper instances have gone down when the loss of connection occurs, it seems that it will happen if only one of our more than two instances drops...I'll have to double check though.

@bsara
Copy link
Contributor Author

bsara commented Sep 8, 2022

speaking of those timeouts, it would be really nice to be able to customize them via dcmd.

@afs
Copy link
Owner

afs commented Sep 9, 2022

if only one of our more than two instances drops

I'm not sure what that means.

@bsara
Copy link
Contributor Author

bsara commented Sep 9, 2022

I'm not sure what that means.

Sorry, I mean that we have more than 2 zookeeper servers running and that if one of them is "lost" momentarily, then this seems to happen.

@afs
Copy link
Owner

afs commented Sep 10, 2022

I hope you have at least 3!

You do have a SPOF at the single patch log server. As these are locally stateless, you won't loose data but can have a service interruption for updates.

@afs
Copy link
Owner

afs commented Sep 10, 2022

So far:

Some sort of of pause when the ZK server changes is to be expected.
The pause seems to be of a length determined by the Curator setup.

When you say "and will continue to take a long time to complete", any queries in the switchover window will have a delay.

Are queries issued serially or overlapping?

If a sequence of queries is sent, waiting for each to complete before sending the next, so queries 2 onwards experience the delay? Even after several minutes? (this is probing the working of the Curator/Zookeeper libraries.)

@bsara
Copy link
Contributor Author

bsara commented Sep 13, 2022

I hope you have at least 3

Ha. Yes, we do.

You do have a SPOF at the single patch log server.

That is understood.

Are queries issued serially or overlapping?

In our test and production environments, they are overlapping. However, when I did my local testing, I did a single query at a time and waited for a response before submitting another and got the same results (I.E. the 10 second delay in the responses).

@oliverbrandt
Copy link

As we do run into the same problem, we did some tests and came to the following conclusions:

  • The retry policy does not kick in at all.
  • In most cases. the connection timeout is the one determining the time.
  • In rare cases, the session timeout is added on top of the connection timeout.
  • With the timeouts set to 10 secs, the system does not recover from the long answer times, not even after a couple of hours. However, when reducing the connection timeout to 100 ms, the system is able to recover in most cases. However, we are unsure about the implications of such a short timeout.

The tests were done with queries issued serially and some random times between the queries.

@afs
Copy link
Owner

afs commented Nov 2, 2022

Hi @oliverbrandt -- thank you very much for the information; it helps to have another environment reporting.

In what environment are you running? Is it a cloud platform?

@afs
Copy link
Owner

afs commented Nov 3, 2022

May be relevant:

Apache Curator 5.4.0 has just been released. It includes an upgrade of the ZooKeeper dependency from 3.6.3 (As in RDF Delta 1.1.2) to 3.7.1.

@oliverbrandt
Copy link

The tests to determine the effect of different timeouts were done locally.
The experiences with long running instances have been made on Azure Kubernetes Service.

@afs
Copy link
Owner

afs commented Nov 4, 2022

@bsara @oliverbrandt -- In #154, the use of load balancer confused ZooKeeper. Do either of you use load balancers in front of the ZooKeeper servers?

Azure question: What is being used for the object blob storage? #154 uses MinIO in front of the Azure Blog Store in a k8s environment. It's probably unrelated but it's always useful to know about the deployment setup.

@oliverbrandt
Copy link

We did configure the components to directly address the ZKs instead of going through a service.

For the blob storage, MinIO is used as a frontend to the Azure Blob Storage.

@michaelnielson-pcc
Copy link

@bsara's deployment does not have a load balancer in front of zookeeper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants