Patchlog server starts re-syncing #171
Comments
We just saw that after the patch log servers reached the latest patch, they start all over again.
@tomkxy - not sure what's going on here - it is not something I've seen myself, nor have I seen any reports. Do you have the log of a patch log server doing this? Is it happening when the patch log server starts/restarts? Were there any Zookeeper incidents around the time this happened? Given this and #154, could you run, as a test, a simpler setup (not production) with the file-backed patch log server?
@afs - Yes, we saw that after a restart. However, what I don't get is that it really starts from version 0, and we currently have 292234 patch versions. By the way, did you consider my request in #154? We are desperately looking forward to somebody having a look at our config and helping us figure out the root cause of the problems we are facing.
Some additional info:
Obviously, it started from version 0; we saw that the state file looked like:
Now, after a couple of hours, it has synced up to version 293230; four patches are missing.
However, the patch log server shows that it fetches starting from version 0, so it goes over all patches again.
@afs Today I was able to reproduce that behavior. It turns out that while a Fuseki server is syncing, any new query hitting the server will trigger a resync with the same initial and target versions as the currently running sync.
Hi @tomkxy, that sounds like something I can investigate. I'll look at setting up a test case. Could you attach the Fuseki configuration file for the server to this issue, please?
Hi @afs, I reproduced the problem with a rather simple setup: one patch log server and one Fuseki instance.

Fuseki log (see the two entries `INFO [id:ac5a53:ABC] Sync: Versions [, ver:268]`; the second one was triggered by the query while the first one was running). Fuseki was started with:

`./dcmd fuseki --port 3031 --conf fuseki-config1.ttl`

Patch server log:

Query:

Fuseki config:
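A minimal sketch of the kind of single query that triggers the overlapping sync while the server is still catching up, using Jena's SPARQL client. The port matches the `dcmd` command above, but the dataset name `ds` is an assumption; the actual query and Fuseki config from the comment are not reproduced here, so the identifiers below are hypothetical.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;

public class TriggerQueryDuringSync {
    public static void main(String[] args) {
        // Port 3031 matches the dcmd command above; the dataset name "ds" is assumed.
        String service = "http://localhost:3031/ds/query";
        String query = "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }";
        // Fire a single query while the Fuseki server is still doing its initial sync.
        try (QueryExecution qExec = QueryExecutionFactory.sparqlService(service, query)) {
            ResultSetFormatter.out(qExec.execSelect());
        }
    }
}
```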
Hi @tomkxy, thanks for that. I think I know where to look. Not directly related:
Hi @afs, the Fuseki servers are running as a StatefulSet in a Kubernetes cluster, with a PVC mounted where the database resides. The database and the ZONE files are usually already populated.

Anyhow, thinking about that: if a Fuseki node is in an initial sync it is not responsive (not sure how it behaves if it is just catching up on a couple of patches). In an HA setup with load balancers in front of the Fusekis that is really an issue, since the initial sync usually takes longer.

I think the Fuseki server should have an endpoint which reports whether it is able to process requests. If the Fuseki goes into an initial sync, the endpoint would indicate that, e.g. by returning a 5xx or 4xx. The implementation must be very lightweight since the endpoint would be called frequently. A load balancer could then easily use it to determine whether the node can process requests; in Kubernetes, for example, a readiness probe could be configured, and Kubernetes would not direct any traffic to a pod which is not in the ready state. What do you think about it?
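A minimal sketch of the kind of lightweight readiness endpoint proposed above, using only the JDK's built-in HTTP server; the `syncing` flag is a hypothetical stand-in for however Fuseki/RDF Delta would expose its sync state, not an existing API.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

public class ReadinessEndpoint {
    // Hypothetical flag: would be set to true while an (initial) sync is running.
    static final AtomicBoolean syncing = new AtomicBoolean(false);

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/ready", exchange -> {
            boolean busy = syncing.get();
            byte[] body = (busy ? "SYNCING" : "READY").getBytes(StandardCharsets.UTF_8);
            // 503 while syncing, so a load balancer or readiness probe skips this node.
            exchange.sendResponseHeaders(busy ? 503 : 200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```

In Kubernetes, a readiness probe doing an HTTP GET on such a `/ready` path would keep traffic away from the pod for as long as it responds with 503.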
Hi @afs, an update on the sync topic:
Hi @tomkxy I have found the place where overlapping syncs occur. With a long initial sync, that's going to cause overlap when the first requests come in.

Thought - if the problem of #154 is that patch servers are not in step, or close to it, then there may be large jumps, especially around restart, because a Fuseki server may be allocated to a different patch server.

The issue about being unresponsive to requests when starting with a very large sync - certainly something can be done. Please raise a separate issue for this. There are choices - whether to build it into Fuseki or have a separate subcommand of

Each Fuseki server should have its own database storage and zone. There is some process checking, but if PVCs are shared, the single-copy lock on a running TDB database can't be guaranteed.
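For illustration only, and not the actual RDF Delta code: a sketch of the kind of guard that stops a request-triggered sync from starting over while another sync is already running; `syncToLatest` and the class name are hypothetical placeholders.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SyncGuard {
    private final AtomicBoolean syncInProgress = new AtomicBoolean(false);

    /** Run a sync unless one is already in progress; returns false if skipped. */
    public boolean syncIfNotRunning(Runnable syncToLatest) {
        // compareAndSet ensures only one caller wins; a query arriving
        // mid-sync simply skips instead of restarting from version 0.
        if (!syncInProgress.compareAndSet(false, true))
            return false;
        try {
            syncToLatest.run();
            return true;
        } finally {
            syncInProgress.set(false);
        }
    }
}
```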
Hi @afs,
I raised a new issue. However, as long as we cannot prevent a re-sync from being started in the middle of a running sync, a separate sync command would not help us.
What happens if a Fuseki server is allocated to a different patch server? Why would that cause a problem? Because what we see with the initial sync is that it happens even though the latest version number in the state file says that the Fuseki is up to date; then, for whatever reason, it resets the version number and starts the re-sync.
The Fusekis each have a separate database and ZONE file.
Release 1.1.2 done - contains this fix.
In our setup (see issue #154) we now see, all of a sudden, the patch log servers trying to resync, applying the whole patch log. We didn't create a new Fuseki database or change anything else, and we don't understand what triggers the re-sync. Since we have quite a lot of patches, this takes very long.
Any thoughts?