figure out why test suite is randomly failing #413
There has been a rash of test failures at Travis in the past day:
Are they related? What's going on? |
312 is a not-unanticipated consequence of including a couple of integration tests that depend on the GitHub and Twitter APIs. Adding a skip_expensive_tests option is ticketed as #414. |
296 - 299 are part of the #410 clusterhuck. |
That leaves 302 and 295 unexplained. 302 https://travis-ci.org/whit537/www.gittip.com/builds/3536557 |
Both are in test_is_suspicious.py. They smell like intermittent failures I'm seeing even now as part of the latest, uncommitted work I'm doing on #406. I suspect it's a database interaction bug. |
These failures come several checkins and hours apart, and the failures/errors look very similar. I believe they're not caused by any immediate code change. |
I'm getting intermittent db interaction failures with the current test suite for the Participant.absorb method I'm working on, and they appear more often than not. I'm going to treat that under this ticket and debug it. |
Hypothesis: using a connection pool with more than one connection means that the state of the database is not reset after one test before a subsequent test hits the database using a different connection. |
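If that hypothesis holds, the failure mode would look something like the following. This is a minimal sketch of the idea only, using sqlite3 as a stand-in so it runs anywhere (Gittip uses Postgres, and the table and connection names here are made up for illustration): one connection's uncommitted state is invisible to a second connection, so a cleanup or check routed through a different pooled connection can see stale data.

```python
# Sketch of the hypothesis: two pooled connections are two independent
# sessions, so work done on one connection isn't visible to the other
# until it commits. sqlite3 stands in for Postgres for runnability.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pool.db")

conn_a = sqlite3.connect(path, isolation_level=None)  # autocommit
conn_b = sqlite3.connect(path, isolation_level=None)  # explicit BEGIN/COMMIT

conn_a.execute("CREATE TABLE tips (amount INTEGER)")

# Connection B inserts inside an open (uncommitted) transaction ...
conn_b.execute("BEGIN")
conn_b.execute("INSERT INTO tips VALUES (1)")

# ... so a check through connection A still sees an empty table.
rows_seen_by_a = conn_a.execute("SELECT count(*) FROM tips").fetchall()[0][0]

conn_b.execute("COMMIT")
rows_after_commit = conn_a.execute("SELECT count(*) FROM tips").fetchall()[0][0]

print(rows_seen_by_a, rows_after_commit)  # 0 1
```

The same shape of race could explain why a test sometimes sees leftover rows and sometimes doesn't, depending on which pooled connection it draws.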
Hmmm ... I inspected context.db.pool._pool after taking out the maxconn argument, and the pool still had but one connection in it. I seem to recall now that it is indeed filled dynamically. Perhaps the failure comes from reusing the connection? Perhaps the occasional success in the current case is correlated to a second connection existing in the pool? Perhaps the failures in builds 302 and 295 above correlate with the opposite? |
More info: If I drop to pdb in the failing test, and immediately continue, the test passes. A timing bug, indeed. |
More info: Now it's failing consistently again. Does Postgres still have a connection open? |
I deleted all logs so that I could capture clean logs for a case. Now it's working again, but the logfile hasn't been recreated. |
I've captured logfiles of working and non-working runs but an initial inspection is not revealing the issue to me. |
More info: If I'm not mistaken, it's always the same test that fails, when it fails, no matter what other tests are run with it. I'm testing a single test script with five tests. |
Hypothesis: The test suite starts failing a definite number of test runs after it starts working again. |
Hypothesis: Restarting the postgres server makes it start working again. |
More info: I've been kind of watching. I really think it's the same test that fails every time. |
Hypothesis: The bug is in the new get_transaction method I added to postgres.py. |
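The actual get_transaction in postgres.py isn't reproduced in this thread, so here is only a hedged sketch of the general pattern such a method would follow (commit on success, roll back on exception), with sqlite3 standing in for Postgres so the sketch is self-contained and runnable:

```python
# Hypothetical sketch of a get_transaction-style context manager. The
# real method lives in gittip's postgres.py and may differ; sqlite3
# stands in here so the example runs standalone.
import sqlite3
from contextlib import contextmanager

@contextmanager
def get_transaction(conn):
    """Yield a cursor; commit if the block succeeds, roll back if it raises."""
    cursor = conn.cursor()
    try:
        yield cursor
        conn.commit()
    except Exception:
        conn.rollback()
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tips (tipper TEXT, tippee TEXT, amount INTEGER)")
conn.commit()

# Success case: the insert is committed.
with get_transaction(conn) as cur:
    cur.execute("INSERT INTO tips VALUES ('alice', 'bob', 1)")

# Failure case: the insert is rolled back when the block raises.
try:
    with get_transaction(conn) as cur:
        cur.execute("INSERT INTO tips VALUES ('alice', 'carl', 1)")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

count = conn.execute("SELECT count(*) FROM tips").fetchone()[0]
print(count)  # 1 -- only the committed row survives
```

If the real method gets the rollback path wrong, a failing test could leave half a transaction visible to the next test, which would match the symptoms here.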
More info: I trimmed up the tests to the failing test and the one after it. It started failing again on the fifth run. |
More info: I trimmed up the tests to the single failing test. It started failing again on the second run! |
More info: It seems to alternate between success and failure with successive runs. Still on single failing test. |
Hypothesis: When the failing test is the only one in the test suite, it alternately succeeds and fails. |
Hunch: The problem is with the way data is cleared between tests. |
More info: Oh wow. I added some print statements to the |
I modified hammer.py to restart the postgres server before running the tests. |
Now I've suppressed the restart. It's failing again ... |
All tests are on. It appears to be periodic again! |
I turned the restart back on, and it's still failing periodically.
It's failing every eighth test run. |
I trimmed down to one test in the suite instead of six and it just passed 50 times. |
Okay, it looks like I can trigger this by toggling the test that fails in the middle of the transaction in Participant.absorb: |
Restarting postgres seems to have no effect: |
Here are the two tests that, together, trigger this bug:

```python
def test_alice_ends_up_tipping_bob_two_dollars():
    tips = [ ('alice', 'bob', 1)
           , ('alice', 'carl', 1)
            ]
    with tip_graph(*tips) as context:
        Participant('bob').absorb('carl')
        expected = Decimal('2.00')
        actual = context.diff()['tips']['inserts'][0]['amount']
        assert actual == expected, actual

def test_absorbing_unknown_fails():
    tips = [ ('alice', 'bob', 1)
           , ('alice', 'carl', 1)
            ]
    with tip_graph(*tips) as context:
        assert_raises(IntegrityError, Participant('bob').absorb, 'jim')
        actual = context.diff()
        assert actual == {}, actual
```

Either of those tests by itself is fine. When they're both present, the test run fails every eighth run, and always on the two_dollars test. (Why was it every other run above?) |
So this is good, actually. :) I wrote the get_transaction method for this Participant.absorb method, and the unknown_fails test is in fact designed to test this kind of interaction. |
Hypothesis: The transaction is not getting rolled back as desired for the exception case. |
Hypothesis: The sort order of context.diff() is varying. |
The errors in builds 302 and 295 smell a lot like this, too. |
I'm pretty sure the periodicity is coming from Python's dict ordering, not from Postgres. I'm not sure whether Postgres sequences are absolutely guaranteed to be monotonically increasing (as this StackOverflow comment suggests), but I certainly don't expect it to vary on the eights. However, for Python dict ordering to vary in some multiple of eight sounds much more plausible, given that it's affected by byte padding and whatnot blah blah blah. |
Which is to say that simply sorting the dicts that we iterate over in Context._diff to produce the lists of inserts, updates, and deletes should be quite sufficient for our test infrastructure. |
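A sketch of that fix, under the assumption that Context._diff produces lists of row dicts keyed by an autoincrementing id (the helper name, table shape, and key are hypothetical, not Gittip's actual code): sorting each list by primary key makes the diff order deterministic regardless of dict iteration order.

```python
# Sketch of the proposed fix: sort rows before building the diff lists,
# so their order no longer depends on dict iteration order. The helper
# name and row shapes here are hypothetical.
def sorted_rows(rows, key="id"):
    """Return row dicts in a deterministic, primary-key order."""
    return sorted(rows, key=lambda row: row[key])

# Rows as they might come back in arbitrary dict/hash order:
inserts = [ {"id": 7, "tipper": "alice", "tippee": "carl", "amount": 1}
          , {"id": 3, "tipper": "alice", "tippee": "bob", "amount": 2}
           ]

deterministic = sorted_rows(inserts)
print([row["id"] for row in deterministic])  # [3, 7]
```

With that in place, `context.diff()['tips']['inserts'][0]` would always refer to the same row, instead of whichever row the hash ordering happened to put first.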