jepsen: (re-)enable the big-skew nemeses #15893

Closed
knz opened this issue May 12, 2017 · 4 comments

@knz
Contributor

knz commented May 12, 2017

Following up to #15717.

These nemeses are currently disabled because they assume particular network interface names.
This needs to be solved, e.g. by adding the appropriate command flags to align jepsen's expectations with the interfaces provided by the spawned VMs.
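Below is a minimal sketch of the kind of flag suggested above. It assumes the latency half of these nemeses is implemented with Linux `tc`/`netem`, which the thread does not confirm. The `--net-iface` flag and the helper names are hypothetical. The point is that the interface name becomes a parameter instead of a hard-coded default such as `eth0`.

```python
import argparse
import shlex


def netem_add_cmd(iface: str, delay_ms: int) -> list[str]:
    """Build a `tc` command that delays packets leaving `iface` by `delay_ms`."""
    return ["tc", "qdisc", "add", "dev", iface, "root", "netem", "delay", f"{delay_ms}ms"]


def netem_del_cmd(iface: str) -> list[str]:
    """Build the `tc` command that removes the root qdisc added above."""
    return ["tc", "qdisc", "del", "dev", iface, "root"]


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="latency nemesis (sketch)")
    # Hypothetical flag: the interface name the spawned VMs actually expose,
    # rather than an assumed name baked into the test.
    parser.add_argument("--net-iface", default="eth0")
    parser.add_argument("--delay-ms", type=int, default=100)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # Only print the commands; a real nemesis would presumably run them
    # on each node.
    print(shlex.join(netem_add_cmd(args.net_iface, args.delay_ms)))
    print(shlex.join(netem_del_cmd(args.net_iface)))
```

With this shape, a VM image that names its interface `ens4` only needs `--net-iface ens4`. The nemesis logic itself does not change.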

@dianasaur323 dianasaur323 added this to the 1.1 milestone May 12, 2017
@bdarnell bdarnell modified the milestones: 1.1, 1.2 Sep 20, 2017
@bdarnell
Contributor

@knz Do you remember why we have so many different variants of the skew nemesis? We have small-skews, subcritical-skews, and critical-skews, which just mess with the clock (so they don't have network interface dependencies), and big-skews and huge-skews, which adjust the clock and inject an equivalent amount of network latency.

It seems to me that we should at least be running subcritical-skews. The big-skews nemesis creates a larger offset than we allow, so I would expect it to fail in the absence of the network manipulation. Are the big and huge skew nemeses meant to show that high network latency somehow increases our tolerance for clock offsets?

(Also, all the numbers here need to be doubled to match our increased default max-offset.)
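One way to make that doubling automatic is to express each skew as a multiple of max-offset. The sketch below assumes that representation, which the thread does not say the nemeses use. It also records which variants inject matching network latency, following the description above. The nemesis names come from the thread. The multipliers are made up for illustration and are not jepsen's values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkewNemesis:
    name: str
    # Clock skew as a multiple of max-offset (hypothetical example values).
    fraction_of_max_offset: float
    # True if the nemesis also injects network latency equal to the skew.
    adds_latency: bool

    def skew_ms(self, max_offset_ms: float) -> float:
        """Absolute skew for a given max-offset setting."""
        return self.fraction_of_max_offset * max_offset_ms

    def exceeds_max_offset(self, max_offset_ms: float) -> bool:
        """True if the injected skew is larger than the configured bound."""
        return self.skew_ms(max_offset_ms) > max_offset_ms


# Names from the thread; the multipliers are illustrative only.
NEMESES = [
    SkewNemesis("small-skews", 0.1, adds_latency=False),
    SkewNemesis("subcritical-skews", 0.5, adds_latency=False),
    SkewNemesis("critical-skews", 1.0, adds_latency=False),
    SkewNemesis("big-skews", 2.0, adds_latency=True),
    SkewNemesis("huge-skews", 10.0, adds_latency=True),
]

if __name__ == "__main__":
    # Doubling max-offset doubles every absolute skew without editing NEMESES.
    for max_offset_ms in (100, 200):
        for n in NEMESES:
            print(max_offset_ms, n.name, n.skew_ms(max_offset_ms),
                  n.exceeds_max_offset(max_offset_ms))
```

Under this representation, the question above becomes explicit. Only the variants whose multiplier is above 1 exceed the bound, and in this thread those are exactly the ones paired with network latency.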

@knz
Contributor Author

knz commented Nov 1, 2017

So I wrote small-skews and big-skews initially. Both were implemented to check that our safeguards are indeed effective in the "common" case.

Kyle then added the other three: subcritical-skews and critical-skews to check edge behaviors, and huge-skews to check our HLC logic on top of what big-skews was meant to test.

I think we should deprecate big-skews in favor of huge-skews with a random factor on the network latency; I also think our prod/test clusters are already exercising small-skews, given the large observed drifts in practice. For the other two, I don't have an opinion.
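Here is a minimal sketch of the "random factor on the network latency" idea. It assumes each round picks a clock skew and then draws the injected latency as that skew times a uniformly random factor. The function name and the default range [0.5, 1.5] are hypothetical.

```python
import random


def huge_skew_plan(skew_ms: float, rng: random.Random,
                   lo: float = 0.5, hi: float = 1.5) -> tuple[float, float]:
    """Return (clock_skew_ms, latency_ms) for one nemesis round.

    The latency is the skew scaled by a random factor drawn from [lo, hi],
    instead of always equalling the skew. lo/hi are hypothetical defaults.
    """
    factor = rng.uniform(lo, hi)
    return skew_ms, skew_ms * factor


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a failing run can be replayed
    for _ in range(3):
        print(huge_skew_plan(1000.0, rng))
```

Passing in a seeded `random.Random` rather than using the module-level functions keeps the latency sequence reproducible when a run needs to be investigated.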

@bdarnell
Contributor

bdarnell commented Nov 1, 2017

My question was mainly about why some (but not all) of the skews are accompanied by network offsets (and why it's a single nemesis instead of separate network-latency and clock-offset nemeses that can be run together). But it looks like that was Kyle's change. I'll turn on the subcritical skews and ignore big-skews for now.
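The alternative raised here uses separate clock-offset and network-latency nemeses that can be run together. A minimal sketch follows. The class names are hypothetical, and each "action" is only appended to a log, so the composition logic can be shown without touching real clocks or interfaces.

```python
from typing import Protocol


class Nemesis(Protocol):
    def start(self, log: list[str]) -> None: ...
    def stop(self, log: list[str]) -> None: ...


class ClockOffset:
    """Would shift node clocks by offset_ms; here it only records the action."""

    def __init__(self, offset_ms: int) -> None:
        self.offset_ms = offset_ms

    def start(self, log: list[str]) -> None:
        log.append(f"clock +{self.offset_ms}ms")

    def stop(self, log: list[str]) -> None:
        log.append("clock reset")


class NetworkLatency:
    """Would add delay_ms of latency; here it only records the action."""

    def __init__(self, delay_ms: int) -> None:
        self.delay_ms = delay_ms

    def start(self, log: list[str]) -> None:
        log.append(f"latency {self.delay_ms}ms")

    def stop(self, log: list[str]) -> None:
        log.append("latency removed")


class Composed:
    """Runs several nemeses together; stops them in reverse start order."""

    def __init__(self, *parts: Nemesis) -> None:
        self.parts = parts

    def start(self, log: list[str]) -> None:
        for part in self.parts:
            part.start(log)

    def stop(self, log: list[str]) -> None:
        for part in reversed(self.parts):
            part.stop(log)


if __name__ == "__main__":
    events: list[str] = []
    # A big-skews-style scenario rebuilt from two independent pieces.
    combo = Composed(ClockOffset(500), NetworkLatency(500))
    combo.start(events)
    combo.stop(events)
    print(events)
```

With this split, subcritical skews would be `ClockOffset` alone, and latency could be varied independently of the clock offset instead of being tied to it inside one nemesis.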

@bdarnell
Contributor

bdarnell commented Nov 9, 2017

We're now running with subcritical skews. I don't think we really need to enable big skews or skews with network delays, so I'm closing this issue.

@bdarnell bdarnell closed this as completed Nov 9, 2017
3 participants