figure out why test suite is randomly failing #413
There has been a rash of test failures at Travis in the past day:
Are they related? What's going on? |
312 is a not-unanticipated consequence of including a couple of integration tests that depend on the GitHub and Twitter APIs. Adding a skip_expensive_tests option is ticketed as #414. |
296 - 299 are part of the #410 clusterhuck. |
That leaves 302 and 295 unexplained. 302 https://travis-ci.org/whit537/www.gittip.com/builds/3536557 |
Both are in test_is_suspicious.py. They smell like intermittent failures I'm seeing even now as part of the latest, uncommitted work I'm doing on #406. I suspect it's a database interaction bug. |
These failures come several checkins and hours apart, and the failures/errors look very similar. I believe they're not caused by any immediate code change. |
I'm getting intermittent db interaction failures with the current test suite for the Participant.absorb method I'm working on, and they appear more often than not. I'm going to treat that under this ticket and debug it. |
Hypothesis: using a connection pool with more than one connection means that the state of the database is not reset after one test before a subsequent test hits the database using a different connection. |
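If that hypothesis holds, the failure mode would look something like the following. This is a minimal sketch of the idea only, using sqlite3 as a stand-in so it runs anywhere (Gittip uses Postgres, and the table and connection names here are made up for illustration): one connection's uncommitted state is invisible to a second connection, so a cleanup or check routed through a different pooled connection can see stale data.

```python
# Sketch of the hypothesis: two pooled connections are two independent
# sessions, so work done on one connection isn't visible to the other
# until it commits. sqlite3 stands in for Postgres for runnability.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pool.db")

conn_a = sqlite3.connect(path, isolation_level=None)  # autocommit
conn_b = sqlite3.connect(path, isolation_level=None)  # explicit BEGIN/COMMIT

conn_a.execute("CREATE TABLE tips (amount INTEGER)")

# Connection B inserts inside an open (uncommitted) transaction ...
conn_b.execute("BEGIN")
conn_b.execute("INSERT INTO tips VALUES (1)")

# ... so a check through connection A still sees an empty table.
rows_seen_by_a = conn_a.execute("SELECT count(*) FROM tips").fetchall()[0][0]

conn_b.execute("COMMIT")
rows_after_commit = conn_a.execute("SELECT count(*) FROM tips").fetchall()[0][0]

print(rows_seen_by_a, rows_after_commit)  # 0 1
```

The same shape of race could explain why a test sometimes sees leftover rows and sometimes doesn't, depending on which pooled connection it draws.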
Hmmm ... I inspected context.db.pool._pool after taking out the maxconn argument, and the pool still had but one connection in it. I seem to recall now that it is indeed filled dynamically. Perhaps the failure comes from reusing the connection? Perhaps the occasional success in the current case is correlated to a second connection existing in the pool? Perhaps the failures in builds 302 and 295 above correlate with the opposite? |
More info: If I drop to pdb in the failing test, and immediately continue, the test passes. A timing bug, indeed. |
More info: Now it's failing consistently again. Does Postgres still have a connection open? |
I deleted all logs so that I could capture clean logs for a case. Now it's working again, but the logfile hasn't been recreated. |
I've captured logfiles of working and non-working runs but an initial inspection is not revealing the issue to me. |
More info: If I'm not mistaken, it's always the same test that fails, when it fails, no matter what other tests are run with it. I'm testing a single test script with five tests. |
Hypothesis: The test suite starts failing a definite number of test runs after it starts working again. |
Hypothesis: Restarting the postgres server makes it start working again. |
More info: I've been kind of watching. I really think it's the same test that fails every time. |
Hypothesis: The bug is in the new get_transaction method I added to postgres.py. |
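The actual get_transaction in postgres.py isn't reproduced in this thread, so here is only a hedged sketch of the general pattern such a method would follow (commit on success, roll back on exception), with sqlite3 standing in for Postgres so the sketch is self-contained and runnable:

```python
# Hypothetical sketch of a get_transaction-style context manager. The
# real method lives in gittip's postgres.py and may differ; sqlite3
# stands in here so the example runs standalone.
import sqlite3
from contextlib import contextmanager

@contextmanager
def get_transaction(conn):
    """Yield a cursor; commit if the block succeeds, roll back if it raises."""
    cursor = conn.cursor()
    try:
        yield cursor
        conn.commit()
    except Exception:
        conn.rollback()
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tips (tipper TEXT, tippee TEXT, amount INTEGER)")
conn.commit()

# Success case: the insert is committed.
with get_transaction(conn) as cur:
    cur.execute("INSERT INTO tips VALUES ('alice', 'bob', 1)")

# Failure case: the insert is rolled back when the block raises.
try:
    with get_transaction(conn) as cur:
        cur.execute("INSERT INTO tips VALUES ('alice', 'carl', 1)")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

count = conn.execute("SELECT count(*) FROM tips").fetchone()[0]
print(count)  # 1 -- only the committed row survives
```

If the real method gets the rollback path wrong, a failing test could leave half a transaction visible to the next test, which would match the symptoms here.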
More info: I trimmed up the tests to the failing test and the one after it. It started failing again on the fifth run. |
More info: I trimmed up the tests to the single failing test. It started failing again on the second run! |
More info: It seems to alternate between success and failure with successive runs. Still on single failing test. |
Hypothesis: When the failing test is the only one in the test suite, it alternately succeeds and fails. |
Hunch: The problem is with the way data is cleared between tests. |
More info: Oh wow. I added some print statements to the |
I modified hammer.py to restart the postgres server before running the tests. |
Now I've suppressed the restart. It's failing again ... |
All tests are on. It appears to be periodic again! |
I turned the restart back on, and it's still failing periodically.
It's failing every eighth test run. |
I trimmed down to one test in the suite instead of six and it just passed 50 times. |
Okay, it looks like I can trigger this by toggling the test that fails in the middle of the transaction in Participant.absorb: |
Restarting postgres seems to have no effect: |
Here are the two tests that, together, trigger this bug:

```python
def test_alice_ends_up_tipping_bob_two_dollars():
    tips = [ ('alice', 'bob', 1)
           , ('alice', 'carl', 1)
            ]
    with tip_graph(*tips) as context:
        Participant('bob').absorb('carl')
        expected = Decimal('2.00')
        actual = context.diff()['tips']['inserts'][0]['amount']
        assert actual == expected, actual

def test_absorbing_unknown_fails():
    tips = [ ('alice', 'bob', 1)
           , ('alice', 'carl', 1)
            ]
    with tip_graph(*tips) as context:
        assert_raises(IntegrityError, Participant('bob').absorb, 'jim')
        actual = context.diff()
        assert actual == {}, actual
```

Either of those tests by itself is fine. When they're both present, the test run fails every eighth run, and always on the two_dollars test. (Why was it every other run above?) |
So this is good, actually. :) I wrote the get_transaction method for this Participant.absorb method, and the unknown_fails test is in fact designed to test this kind of interaction. |
Hypothesis: The transaction is not getting rolled back as desired for the exception case. |
Hypothesis: The sort order of context.diff() is varying. |
The errors in builds 302 and 295 smell a lot like this, too. |
I'm pretty sure the periodicity is coming from Python's dict ordering, not from Postgres. I'm not sure whether Postgres sequences are absolutely guaranteed to be monotonically increasing (as this StackOverflow comment suggests), but I certainly don't expect it to vary on the eights. However, for Python dict ordering to vary in some multiple of eight sounds much more plausible, given that it's affected by byte padding and whatnot blah blah blah. |
Which is to say that simply sorting the dicts that we iterate over in Context._diff to produce the lists of inserts, updates, and deletes should be quite sufficient for our test infrastructure. |
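A sketch of that fix, under the assumption that Context._diff produces lists of row dicts keyed by an autoincrementing id (the helper name, table shape, and key are hypothetical, not Gittip's actual code): sorting each list by primary key makes the diff order deterministic regardless of dict iteration order.

```python
# Sketch of the proposed fix: sort rows before building the diff lists,
# so their order no longer depends on dict iteration order. The helper
# name and row shapes here are hypothetical.
def sorted_rows(rows, key="id"):
    """Return row dicts in a deterministic, primary-key order."""
    return sorted(rows, key=lambda row: row[key])

# Rows as they might come back in arbitrary dict/hash order:
inserts = [ {"id": 7, "tipper": "alice", "tippee": "carl", "amount": 1}
          , {"id": 3, "tipper": "alice", "tippee": "bob", "amount": 2}
           ]

deterministic = sorted_rows(inserts)
print([row["id"] for row in deterministic])  # [3, 7]
```

With that in place, `context.diff()['tips']['inserts'][0]` would always refer to the same row, instead of whichever row the hash ordering happened to put first.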