Skip to content
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

functional: improve TestReconfigureServer test #1554

Merged

Conversation

tixxdz
Copy link
Contributor

@tixxdz tixxdz commented Apr 15, 2016

Remove the sleep call that allowed us to produce reliable tests, it
turns out it may fail in rare cases. I suspect this due to journald
delays and how things are serialized and started when running the test.

Instead, start a normal unit then unload it to reproduce a real behaviour
then follow up with the SIGHUG on fleetd.

If the "Reloading configuration" message do not show up on the logs, then
we do not fail since the signal was received and processed, instead
continue with the "list-units" check, if it succeed then the test should have
passed at this point. However, we continue and check the logs that we ignored
previously since the test also checks for the
"Failed serving HTTP on listener" which informs us that fleetd failed.
So we check the logs again if we find them we proceed with that message
and check it, otherwise we skip the test at this point. We do not want
developers to have failing functional tests results due to obscure
delays. However the test succeed now and is more reliable. I have been running
it for more than 1 hour now (more than 100 times), it did not fail and it was
never skipped.

@dongsupark
Copy link
Contributor

Looks good. There are several typos like "sighug", but other than that it looks fine to me.

Remove the sleep call that allowed us to produce reliable tests, it
turns out it may fail in rare cases. I suspect this due to journald
delays and how things are serialized and started when running the test.

Instead, start a normal unit then unload it to reproduce a real behaviour
then follow up with the SIGHUP on fleetd.

If the "Reloading configuration" message do not show up on the logs, then
we do not fail since the signal was received and processed, instead
continue with the "list-units" check, if it succeed then the test should have
passed at this point. However, we continue and check the logs that we ignored
previously since the test also checks for the
"Failed serving HTTP on listener" which informs us that fleetd failed.
So we check the logs again if we find them we proceed with that message
and check it, otherwise we skip the test at this point. We do not want
developers to have failing functional tests results due to obscure
delays. However the test succeed now and is more reliable. I have been running
it for more than 1 hour now (more than 100 times), it did not fail and it was
never skipped.
@tixxdz tixxdz force-pushed the tixxdz/server-reconfigure-improvements branch from 60d51f7 to 9b173e1 Compare April 15, 2016 14:00
@tixxdz tixxdz merged commit 071230a into coreos:master Apr 15, 2016
// Without this sleep, the entire fleetd test always ends up succeeding
// no matter whether SIGHUP came or not.
_, _ = cluster.MemberCommand(m0, "sh", "-c", `'sleep 2'`)

err = waitForFleetdSocket(cluster, m0)
if err != nil {
t.Fatalf("Failed to get a list of fleetd sockets: %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that you now ensure fleetd is fully functional by executing the commands below, I don't think there is any point in keeping this check here?...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please submit another PR and make sure that it works, my change was to remove the sleep and make the test reliable so other PRs can be merged.

@antrik
Copy link
Contributor

antrik commented Apr 15, 2016

@tixxdz as a general remark on procedure, I don't think it's a good idea to merge major changes like this hastily without time for a detailed review. If a test is problematic because of the intermittent failures, it's probably better to disable it temporarily to take out the urgency, and follow up with a separate PR for fixing the test that can be reviewed without hurry :-)

@tixxdz
Copy link
Contributor Author

tixxdz commented Apr 15, 2016

@tixxdz as a general remark on procedure, I don't think it's a good idea to merge major changes like this hastily without time for a detailed review. If a test is problematic because of the intermittent failures, it's probably better to disable it temporarily to take out the urgency, and follow up with a separate PR for fixing the test that can be reviewed without hurry :-)

That's precisely what this PR does, it tries its best to test the behaviour, and at the end if it can't then it will skip it which is perfectly sane from my point of view, and this is not a major change this is improving functional test which was succeeding 90%, I tried to improve it then if we can't determine we skip which didn't happen in my tests, but better safe than sorry. Now it would be really much productive if you could please open a new PR and investigate this and probably remove that t.Skip() call.

Thank you!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants