Using the Statistics info in etcd to support using all nodes of the cluster #9

drusellers opened this issue May 1, 2014 · 27 comments


@drusellers
Owner

The goal here is that if server A is down and we know that there are also servers B and C, we can fall back to those.

@smalldave
Contributor

Just looking at this.
So get the members of the cluster from the statistics?
How / when does the node list get updated?

@drusellers
Owner Author

Thoughts right now.

  • Expose a method that will allow us to 'store' a list of these nodes - one that users and we ourselves can call when needed to update this internal 'registry'
  • work to get our code to connect 'based' on this 'registry'
  • use this for a bit, see how it plays

Later

If we can figure out where we are putting our calls to this new 'method', use that to inform us of where to put it in the codez. I'm guessing when we get a 'cannot connect' type exception and we start connecting to a new node; once we have, we re-up the list.

But I really think step one is just keeping the initial list of options and seeing how that goes. We can make it smarter once this is baked and we are happy.
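
Something like this is the rough shape I have in mind for the 'registry' (just a sketch - NodeRegistry and its members are made-up names, not actual Etcetera code):

```csharp
// Illustrative sketch only - NodeRegistry and its members are made-up names,
// not part of the actual Etcetera API.
using System;
using System.Collections.Generic;

public class NodeRegistry
{
    readonly object _sync = new object();
    List<Uri> _nodes;

    public NodeRegistry(IEnumerable<Uri> seedNodes)
    {
        _nodes = new List<Uri>(seedNodes);
    }

    // Users (or the client itself, e.g. after reading the statistics endpoint)
    // call this to refresh the known cluster members.
    public void Update(IEnumerable<Uri> nodes)
    {
        lock (_sync)
            _nodes = new List<Uri>(nodes);
    }

    // Connection code works off a snapshot of the current registry.
    public IList<Uri> Snapshot()
    {
        lock (_sync)
            return new List<Uri>(_nodes);
    }
}
```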

@smalldave
Contributor

Ok. So list of nodes in the constructor and the ability to adjust that list afterwards?
To start with should we just round robin through all nodes in list order regardless of previous success or failure?
This won't be particularly efficient if the first node in the list is out of the cluster.
There is also load balancing to consider but I guess we should leave that for now.
A simple implementation is likely to put all the load on one node though.

@drusellers
Owner Author

My initial concern is less about 'load' and more about when a failure is detected -> shift gears to another node. Keep it simple and wait for actual issues to come up before we get all fancy on it. :)

@smalldave
Contributor

Agreed. I'm some of the way to this already. Needs a bit of work.

@drusellers
Owner Author

:)

@smalldave
Contributor

Any thoughts on testing this?

@drusellers
Owner Author

Not yet. I will as I write the code, but the big tests would be something like:

  • given 2 servers A and B
  • start at A
  • should connect to A
  • make A fail
  • should try A - get error
  • should shift to B
  • connect to B
  • get success
  • future calls go to B
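
As a sketch (xunit-style, with made-up EtcdClient / StopNode helpers rather than the real fixtures), that scenario might look like:

```csharp
// Sketch of the failover scenario above. EtcdClient, StopNode and the
// key API used here are stand-ins, not the actual test fixtures.
using System;
using Xunit;

public class FailoverTests
{
    [Fact]
    public void Falls_back_to_node_B_when_node_A_goes_down()
    {
        var nodeA = new Uri("http://127.0.0.1:4001");
        var nodeB = new Uri("http://127.0.0.1:4002");
        var client = new EtcdClient(nodeA, nodeB);            // given 2 servers A and B

        client.Set("failover-test", "value");                 // start at A, should connect to A

        StopNode(nodeA);                                       // make A fail

        var value = client.Get("failover-test");              // should try A, get error, shift to B
        Assert.Equal("value", value);                          // connect to B, get success

        Assert.Equal("value", client.Get("failover-test"));   // future calls go to B
    }

    static void StopNode(Uri node)
    {
        // stand-in: however the test environment kills the etcd process at 'node'
    }
}
```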

@smalldave
Contributor

Current tests require one etcd node. Will this require multiple?

@smalldave
Contributor

or do something more mocky?

@smalldave
Contributor

The response back from a server that is down is slow. If that is first in the list then the response is very slow every time - worse if multiple servers are down.
Unless we can speed up that initial response I guess we need something more sophisticated than a round robin every time.
Maybe put failed nodes to the bottom of the list?

@drusellers
Owner Author

So we could have a 'composite' EtcdClient to start with - something like a 'FailOverEtcdClient'. You can seed it with multiple nodes, but it could keep an ordered list or queue. We can peek at the top one and use that. If it ever fails we keep popping and re-queueing until we find one that works. Then whenever we find new ones we just 'enqueue' them. Later we can add more sophisticated policies like 'slowness' and such. That keeps the core EtcdClient code nice and simple too.
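
A bare-bones sketch of that queue idea (illustrative names only, not the real client):

```csharp
// Illustrative sketch of the ordered-queue idea; names here are made up.
using System;
using System.Collections.Generic;

public class FailOverEtcdClient
{
    readonly Queue<Uri> _nodes;

    public FailOverEtcdClient(params Uri[] seedNodes)
    {
        _nodes = new Queue<Uri>(seedNodes);
    }

    // Newly discovered nodes (e.g. from the statistics endpoint) just get enqueued.
    public void AddNode(Uri node)
    {
        _nodes.Enqueue(node);
    }

    // Peek the head node and use it; on failure pop it, re-queue it at the back,
    // and try the next one until something works.
    public T Execute<T>(Func<Uri, T> call)
    {
        for (var attempts = 0; attempts < _nodes.Count; attempts++)
        {
            var node = _nodes.Peek();
            try
            {
                return call(node);
            }
            catch (Exception)   // e.g. a 'cannot connect' type failure
            {
                _nodes.Enqueue(_nodes.Dequeue());
            }
        }

        throw new InvalidOperationException("No etcd node responded successfully.");
    }
}
```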

@smalldave
Contributor

I've got something kind of working and I'm looking at unit testing.
I've got a cluster going. When all the nodes in the cluster are up the unit tests seem to pass.
However when I take the first node out of the cluster, tests start failing randomly (not consistently).
I've checked the node retry and that seems to work fine. In fact I've taken out the retry and hard coded the node and that had the same problem.
It looks like etcd is returning before actually having written to the node.
I realise writes to the cluster aren't consistent but I would have thought that writes and reads to the same node were consistent?
I guess I'm wrong?
Also the unit tests rely on this.
I'll have a read but do you know anything about this?

@drusellers
Owner Author

Nope. You are ahead of me here, but my guess is that the gossip protocol takes a second to sync everything. Might add a wait in the test to see how big it has to be to confirm. ???

@smalldave
Contributor

Yep. Thread.Sleep does it again. I'll commit that now :)
I'll have a read. Hope this is by design.
May need to take a different approach with the tests, or perhaps it's fine to assume it will appear consistent with a single node.
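
One possible different approach for the tests is to poll until the value shows up rather than sleep for a fixed time - a sketch, assuming a hypothetical client.Get:

```csharp
// Sketch only: poll for the expected value instead of a fixed Thread.Sleep.
// EtcdClient.Get is a stand-in for whatever read call the tests use.
using System;
using System.Threading;

static class EtcdTestWait
{
    public static void WaitForValue(EtcdClient client, string key, string expected, TimeSpan timeout)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            try
            {
                if (client.Get(key) == expected)
                    return;                 // the write has become visible on this node
            }
            catch (Exception)
            {
                // node may still be catching up or briefly unreachable
            }
            Thread.Sleep(100);              // short pause between polls
        }
        throw new TimeoutException("Key '" + key + "' never reached the expected value.");
    }
}
```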

My understanding is Raft isn't a gossip protocol?

@drusellers
Owner Author

Derp. Just that it's distributed, and it may need time to percolate. :)

@smalldave
Contributor

Answered by the coreos guys.
Writes to the leader are consistent, otherwise not. I wasn't writing to the leader, hence the inconsistent reads.
The current tests are fine as long as we are testing against a single node.
Still not really got my head around how to run tests against a cluster.
If you have some spare time this is worth watching
https://www.youtube.com/watch?v=XiXZOF6dZuE
I'm guessing this is why the leadership and lock modules have been deprecated.

@smalldave
Contributor

I'm looking at spinning up etcd instances as part of testing.
I need to know where the etcd binary is.
I'm wondering about creating a tools directory and putting the binary in there.
I could download it as part of the build but I'm not really sure how that would work (could do it easily with rake, but VS not so much).
Also I've just been looking at building in mono. It all builds nicely after I change the ToolsVersion in the proj files. I'd like to have the option to run tests from rake but again I need to know where the xunit console is.

@smalldave
Contributor

I forgot to reply about the composite EtcdClient. The approach I am taking is to have EtcdClient accept a params array of URIs in its constructor. I've got a Cluster class that EtcdClient initialises. This has a list of nodes to iterate over and a method to demote (needs a less loaded name) a node if you think it's iffy. makeXRequest then just loops over the nodes until it succeeds, demoting any nodes that fail.
If there are no successful responses it throws an error.
The makeXRequest methods would need consolidating but it seems fairly straightforward so far.
I'll commit to a branch of my fork shortly so you can have a look.
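
Roughly the shape of what I mean (a sketch with approximate names, not the code I'll actually commit):

```csharp
// Rough sketch of the Cluster idea described above; approximate names only.
using System;
using System.Collections.Generic;

public class Cluster
{
    readonly List<Uri> _nodes;

    public Cluster(params Uri[] uris)
    {
        _nodes = new List<Uri>(uris);
    }

    // 'Demote' (name needs work) pushes a suspect node to the back of the list.
    public void Demote(Uri node)
    {
        if (_nodes.Remove(node))
            _nodes.Add(node);
    }

    // makeXRequest-style calls loop over the nodes until one succeeds,
    // demoting any node that fails; if none succeed, throw.
    public T Execute<T>(Func<Uri, T> request)
    {
        foreach (var node in new List<Uri>(_nodes))
        {
            try
            {
                return request(node);
            }
            catch (Exception)
            {
                Demote(node);
            }
        }

        throw new InvalidOperationException("No successful response from any node in the cluster.");
    }
}
```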

@smalldave
Contributor

Here is what I have so far
https://github.com/smalldave/etcetera/commit/b038f219ce774a17bab4c18664f8a2b6cdab2d1b
Ideally there'd be something in EtcdClient very much like makeKeyRequest but with the key / lock path already appended. This could be shared between the EtcdClient and the EtcdLockModule. I'm not sure how to achieve that though without adding something public to EtcdClient.
I've chosen to force users of the stats module to specify the node they are talking about - does that seem reasonable?

@drusellers
Owner Author

re: etcd binary - I'd be ok with that. A tools dir seems appropriate.
re: rake 👍 (msbuild 👎 )

@drusellers
Owner Author

Put another interface on the EtcdClient and have it implement it 'explicitly', then have the 'modules' take the client in as that interface. That way they will see the method but nothing else will.
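
Something along these lines (sketch only - the interface and method names are made up):

```csharp
// Sketch of the explicit interface idea; names here are illustrative.
public interface IRawEtcdRequests
{
    string MakeRawRequest(string path);
}

public class EtcdClient : IRawEtcdRequests
{
    // Explicit implementation: only visible when the client is referenced
    // through IRawEtcdRequests, so it doesn't show up on EtcdClient itself.
    string IRawEtcdRequests.MakeRawRequest(string path)
    {
        // ... shared key / lock request plumbing would live here ...
        throw new System.NotImplementedException();
    }
}

public class EtcdLockModule
{
    readonly IRawEtcdRequests _client;

    // The module takes the client in as the narrow interface.
    public EtcdLockModule(IRawEtcdRequests client)
    {
        _client = client;
    }
}
```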

@smalldave
Contributor

nice. like that.

@smalldave
Contributor

Problem with the tools directory is that the binary needed is dependent on OS.

@drusellers
Owner Author

I find that Windows people tend to need more help than the Unix folks. Maybe just check in one for Windows? Or just don't worry about it.

@smalldave
Contributor

My new plan is to use vagrant to spin up 3 machines with known IP addresses and then manipulate them over ssh (where necessary) as part of the tests.
The Vagrantfile can go in source control and will just download the released version of etcd for now.
That is OS independent and allows me to test the cluster.

@drusellers
Owner Author

+1'd
