Request for advice: how to set modifiedIndex of wait #2039

Closed
erictune opened this issue Jan 5, 2015 · 11 comments

erictune commented Jan 5, 2015

Over at Kubernetes, we just discovered coreos/fleet#408.

I see that fleet used coreos/fleet#411 as a fix.
If I understand that fix correctly, when the client tries to wait with a waitIndex that is too old and gets an EcodeEventIndexCleared error, it then tries idx = idx + 1 until it finds a suitable index.
I see that fix was subsequently replaced with a new implementation that doesn't use indexes (coreos/fleet@bdf5f72), though I don't quite follow how that works.
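
For what it's worth, here is how I picture that retry (purely illustrative; waitForChange and errIndexCleared are made-up stand-ins, not fleet's actual code):

    // Illustrative only: retry a wait whose index has fallen out of etcd's
    // event history by bumping the index until the server accepts it.
    idx := lastSeenIndex
    for {
        event, err := waitForChange(keyPrefix, idx)
        if err == errIndexCleared {
            idx++ // index rejected as too old; advance and retry
            continue
        }
        if err != nil {
            return err
        }
        handle(event)
        idx = event.ModifiedIndex + 1
    }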

What is the current best practice for a client that wants to maintain an eventually consistent copy of a subset of data stored in etcd, without excessive polling?

That is, I have a loop like this:

    while true {
        GET a bunch of keys I am interested in
        wait at some waitIndex
        handle event or error
    }

The tricky part is what waitIndex to use. It has to be large enough that it does not cause an EcodeEventIndexCleared error, but small enough that it does not miss changes that happened after the initial GET. As coreos/fleet#408 points out, the minimum correct waitIndex might be larger than any modifiedIndex in the GET response. Is there a way to get the maximum modifiedIndex? Obviously, I could GET every key, but if I am only interested in a few keys, that seems inefficient. Is there a way to just get the global maximum modifiedIndex? Is there a way to get it at the same time as I GET a subset of keys, so that I can be sure I am talking to the same replica?
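
For concreteness, the loop written against the v2 HTTP API looks roughly like this (just a sketch; the endpoint, port, and key are placeholders, and it uses the largest modifiedIndex from the GET response, which is exactly the part I am unsure about):

    // Sketch of the loop above against the etcd v2 HTTP API. Placeholder
    // endpoint/key; error handling is minimal.
    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
    )

    type node struct {
        Key           string `json:"key"`
        ModifiedIndex uint64 `json:"modifiedIndex"`
        Nodes         []node `json:"nodes"`
    }

    func maxIndex(n node) uint64 {
        max := n.ModifiedIndex
        for _, c := range n.Nodes {
            if m := maxIndex(c); m > max {
                max = m
            }
        }
        return max
    }

    func main() {
        base := "http://127.0.0.1:4001/v2/keys/registry"
        for {
            // GET the keys I am interested in.
            resp, err := http.Get(base + "?recursive=true")
            if err != nil {
                continue // real code would back off
            }
            var body struct {
                Node node `json:"node"`
            }
            json.NewDecoder(resp.Body).Decode(&body)
            resp.Body.Close()

            // Naive choice: one past the largest modifiedIndex we just saw.
            // Per coreos/fleet#408 this can lag etcd's current index by more
            // than the event history window and trigger EcodeEventIndexCleared.
            waitIndex := maxIndex(body.Node) + 1

            wresp, err := http.Get(fmt.Sprintf(
                "%s?wait=true&recursive=true&waitIndex=%d", base, waitIndex))
            if err != nil {
                continue
            }
            wresp.Body.Close()
            // handle the event or error, then loop around to GET again
        }
    }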

erictune changed the title from "Request for advice: how to set modifiedIndex of watch" to "Request for advice: how to set modifiedIndex of wait" on Jan 5, 2015
erictune (Author) commented Jan 5, 2015

@lavalamp

xiang90 (Contributor) commented Jan 5, 2015

@erictune @lavalamp I think the bottom half of this section should help explain the problem: https://github.com/coreos/etcd/blob/master/Documentation/2.0/api.md#waiting-for-a-change.

Also, we plan to make this a little easier to handle. Basically, etcd will keep all versions of the keys until the application asks etcd to compact them.
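
The pattern that section describes boils down to something like this (rough sketch; the endpoint and key are placeholders and error handling is simplified): take the X-Etcd-Index from the GET response, wait at X-Etcd-Index + 1, and when the wait fails with errorCode 401 (EcodeEventIndexCleared), re-GET and resume from the fresh index.

    import (
        "encoding/json"
        "fmt"
        "net/http"
        "strconv"
    )

    // Rough sketch: list, remember X-Etcd-Index, watch from there, and
    // re-list whenever the history window has been cleared under us.
    func watchLoop(base string, handle func(key string, index uint64)) error {
        for {
            // (Re)list to rebuild local state and learn the current etcd index.
            resp, err := http.Get(base + "?recursive=true")
            if err != nil {
                return err
            }
            etcdIndex, _ := strconv.ParseUint(resp.Header.Get("X-Etcd-Index"), 10, 64)
            resp.Body.Close() // real code would decode the body into local state

            waitIndex := etcdIndex + 1
            for {
                wresp, err := http.Get(fmt.Sprintf(
                    "%s?wait=true&recursive=true&waitIndex=%d", base, waitIndex))
                if err != nil {
                    return err
                }
                var ev struct {
                    ErrorCode int `json:"errorCode"`
                    Node      struct {
                        Key           string `json:"key"`
                        ModifiedIndex uint64 `json:"modifiedIndex"`
                    } `json:"node"`
                }
                err = json.NewDecoder(wresp.Body).Decode(&ev)
                wresp.Body.Close()
                if err != nil {
                    return err
                }
                if ev.ErrorCode == 401 { // EcodeEventIndexCleared
                    break // history was compacted past us; go re-list
                }
                handle(ev.Node.Key, ev.Node.ModifiedIndex)
                waitIndex = ev.Node.ModifiedIndex + 1
            }
        }
    }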

erictune (Author) commented Jan 5, 2015

Thanks, that explains it!

smarterclayton (Contributor) commented:

Even when applications can ask etcd to compact versions, the applications would still need to coordinate to ensure all watchers are reopened prior to that. It's possible that when the version compaction happens, existing watchers should be closed with a final event indicating what the latest etcdIndex is, and that's the value that those watchers should begin watching at.

I think it's extremely difficult for clients to solve this correctly without some sort of window notification coming from etcd. Even today, a watcher that has gone 1000 etcd events without receiving one of its own should really receive an update telling it that the window has moved and that it should resynchronize to the latest etcd index. That has to come in-band with the watch, since an out-of-band request can't ensure that the watcher's clock and the out-of-band requester's clock are aligned.

xiang90 (Contributor) commented Jan 5, 2015

Even when applications can ask etcd to compact versions, the applications would still need to coordinate to ensure all watchers are reopened prior to that.

As I said, we plan to make it easier to reason about. There is no way to completely solve the problem unless you have unlimited memory/disk. So you always need to prepare for a fresh restart.

It's possible that when the version compaction happens, existing watchers should be closed with a final event indicating what their latest update is, and that's the value that those watchers would begin watching at.

Compaction will not affect the current watcher. You would never compact into the future, or even near the current time.

Even today, a watcher that has gone 1000 etcd events without receiving one of its own should really receive an update telling it that the window has moved and that it should resynchronize to the latest etcd index.

Well, a simpler solution is to have an application-global timer to record the known progress of etcd. That can be shared among all your watchers. Per-watcher notification is not necessary, I think.
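
Something like this, for illustration (the names are mine, not an etcd API): every GET and every watcher records the highest X-Etcd-Index it has observed, and a watcher that needs to re-establish its wait can start from that shared value instead of its own stale one.

    import "sync"

    // Shared "known progress" of etcd for the whole application.
    type progress struct {
        mu    sync.Mutex
        index uint64 // highest etcd index observed anywhere in the app
    }

    // Observe records an index seen in any response header or event.
    func (p *progress) Observe(etcdIndex uint64) {
        p.mu.Lock()
        if etcdIndex > p.index {
            p.index = etcdIndex
        }
        p.mu.Unlock()
    }

    // Latest returns the best known progress, e.g. to pick a waitIndex
    // when re-establishing a watch.
    func (p *progress) Latest() uint64 {
        p.mu.Lock()
        defer p.mu.Unlock()
        return p.index
    }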

smarterclayton (Contributor) commented:

Good point, although what if the watcher lags on the server due to latency and GC? Without doing a single watch-all from the application, I don't know that individual calls to watch can reason about which events have been delivered.

We've talked about a single global watch on the master anyway, so maybe we can move to that and then provide the update mechanism over our own channel for our clients.
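
Roughly this shape, as a sketch (Event here is a made-up stand-in for a decoded etcd event, not an existing type):

    import "sync"

    // One goroutine owns the etcd watch and fans events out to in-process
    // subscribers over channels.
    type Event struct {
        Key           string
        ModifiedIndex uint64
    }

    type broadcaster struct {
        mu   sync.Mutex
        subs []chan Event
    }

    func (b *broadcaster) Subscribe() <-chan Event {
        ch := make(chan Event, 16)
        b.mu.Lock()
        b.subs = append(b.subs, ch)
        b.mu.Unlock()
        return ch
    }

    // publish is called by the single global watch loop for every event.
    func (b *broadcaster) publish(ev Event) {
        b.mu.Lock()
        defer b.mu.Unlock()
        for _, ch := range b.subs {
            select {
            case ch <- ev:
            default: // slow subscriber; real code would resync or drop it
            }
        }
    }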


xiang90 (Contributor) commented Jan 5, 2015

Good point, although what if the watcher lags on the server due to latency and GC? Without doing a single watch-all from the application, I don't know that individual calls to watch can reason about which events have been delivered.

I am talking about compacting history that was generated hours ago. The lag should be at most on the order of seconds, so you probably do not need to worry about it.

smarterclayton (Contributor) commented:

Ah, so as long as no watch in the app goes longer than the compaction window without waiting on an event (so the client detects hung connections before a compaction removes history), we can guarantee the window. I think that answers my other question about behavior in a changed store - thanks.

smarterclayton (Contributor) commented:

Would it be possible for the watch to heartbeat (potentially optionally) with an entry every X raft indices? Chosen well for that window, that would remove the need for more complex logic.

xiang90 (Contributor) commented Jan 6, 2015

@smarterclayton

Would it be possible for the watch to heartbeat (potentially optionally) with an entry every X raft indices?

This is doable. Can you please open an issue for this as a feature request? I am going to close this issue.

smarterclayton (Contributor) commented:

Opened #2048.

xiang90 closed this issue on Jan 6, 2015