Handling flaky tests #489

Closed
glatterf42 opened this issue Aug 1, 2023 · 1 comment · Fixed by #490
Labels
ci (Continuous integration), enh (New features & functionality)

Comments

@glatterf42
Member

For roughly the past month, I have collected data on flaky CI tests. The initial idea was to mark them as flaky, but according to pytest's docs on flaky tests, that should never be a long-term solution. Instead, tests should be (randomly) re-ordered, rewritten with more atomic assertions, or split into different groups to find the root cause of the flaky behavior and eliminate it. We will do this when time permits. For now, we could mark them as flaky to save us from re-running them manually.
Here are the flaky tests of this repository that I have gathered so far:

Flaky tests

Auxiliary

Some error messages are common and so long that they would make the table even more complex than it already is. They are included here and referenced by name in the table below.

Notebook cell timeout reticulate {#notebook-cell-timeout-reticulate}

nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 10 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
# Load reticulate, used to access the Python API from R
library(reticulate)

# Import ixmp and message_ix, just as in Python
ixmp <- import("ixmp")
-------------------

Notebook cell timeout import packages {#notebook-cell-timeout-import-packages}

nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 10 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
# load required packages
import pandas as pd
import ixmp
-------------------

Notebook cell timeout platform {#notebook-cell-timeout-platform}

nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 10 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
# launch the ix modeling platform using the local default database
mp <- ixmp$Platform()
-------------------

Runtime error DB connection {#runtime-error-db-connection}

RuntimeError: unhandled Java exception: 
Unable to obtain connection from database (jdbc:hsqldb:file:/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_multi_db_run0/mp2) for user 'ixmp': Database lock acquisition failure: lockFile: org.hsqldb.persist.LockFile@6486fbbd[file =/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_multi_db_run0/mp2.lck, exists=true, locked=false, valid=false, ] method: checkHeartbeat read: 2023-07-09 05:22:39 heartbeat - read: -862 ms.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL State  : S1000
Error Code : -451
Message    : Database lock acquisition failure: lockFile: org.hsqldb.persist.LockFile@6486fbbd[file =/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/test_multi_db_run0/mp2.lck, exists=true, locked=false, valid=false, ] method: checkHeartbeat read: 2023-07-09 05:22:39 heartbeat - read: -862 ms.

DB connection cannot be closed {#db-connection-cannot-be-closed}

AssertionError: assert 'Database connection could not be closed or was already closed' in ''
 +  where '' = CaptureResult(out='', err='').out

DB connection message wrong {#db-connection-message-wrong}

assert "connected to database 'jdbc:hsqldb:mem://ixmptest' (user: ixmp)..." in ''
 +  where '' = CaptureResult(out='', err='').out

Note that windows-latest-py3.10 had this additional information once:

where '' = CaptureResult(out='', err="2023-07-03 12:58:22,174  INFO at.ac.iiasa.ixmp.Platform:182 - closed the connection to database 'jdbc:hsqldb:mem://ixmptest'\r\n2023-07-03 12:58:22,180  INFO at.ac.iiasa.ixmp.Platform:165 - Welcome to the IX modeling platform!\r\n2023-07-03 12:58:22,180  INFO at.ac.iiasa.ixmp.Platform:166 -  connected to database 'jdbc:hsqldb:mem://ixmptest' (user: ixmp)...\r\n").out

Test names are shortened to start at the ixmp/tests directory. Grouping the tests by name shows commonalities between them. All notebook cell timeouts originate on macOS, while Ubuntu and Windows only struggle with DB connections. Windows in particular seems to close the DB connection either too early or not at all.

| Test name | Error message | Runners (# of occurrences if > 1) |
| --- | --- | --- |
| test_tutorials.py::test_R_transport_scenario | Notebook cell timeout reticulate | macos-latest-py3.7 (3) |
| test_tutorials.py::test_R_transport | Notebook cell timeout reticulate | macos-latest-py3.7 (5), macos-latest-py3.11 |
| test_tutorials.py::test_py_transport | Notebook cell timeout import packages | macos-latest-py3.7 |
| test_tutorials.py::test_py_transport_scenario | Notebook cell timeout import packages | macos-latest-py3.7 |
| test_tutorials.py::test_R_transport_scenario | Notebook cell timeout platform | macos-latest-py3.11 |
| test_access.py::test_check_single_model_access | ConnectionRefusedError: [Errno 61] Connection refused | macos-latest-py3.7 |
| test_integration.py::test_multi_db_run | Runtime error DB connection | macos-latest-py3.8, macos-latest-py3.10 |
| backend/test_jdbc.py::test_jvm_warn | `AssertionError: ResourceWarning("unclosed file <_io.BufferedReader name='/tmp/pytest-of-runner/pytest-0/test_read_excel_big0/output.xlsx'>") assert 1 == 0 where 1 = len(WarningsRecorder(record=True))` | ubuntu-latest-py3.8 |
| backend/test_jdbc.py::test_close | DB connection cannot be closed | windows-latest-py3.7 (3), windows-latest-py3.8, windows-latest-py3.10 (2), windows-latest-py3.11 (2) |
| backend/test_jdbc.py::test_connect_message | DB connection message wrong | windows-latest-py3.7 (3), windows-latest-py3.8, windows-latest-py3.10 (2), windows-latest-py3.11 (2) |
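The grouping by runner can be double-checked with a short script. This is only a sketch: the `FLAKY` list is transcribed by hand from the table above, and the names `FLAKY`, `RUNNER`, and `failures_by_os` are made up for this illustration. A runner without a count in parentheses is assumed to have failed once.

```python
import re
from collections import Counter

# (test name, runners) pairs transcribed from the table above.
FLAKY = [
    ("test_tutorials.py::test_R_transport_scenario", "macos-latest-py3.7 (3)"),
    ("test_tutorials.py::test_R_transport", "macos-latest-py3.7 (5), macos-latest-py3.11"),
    ("test_tutorials.py::test_py_transport", "macos-latest-py3.7"),
    ("test_tutorials.py::test_py_transport_scenario", "macos-latest-py3.7"),
    ("test_tutorials.py::test_R_transport_scenario", "macos-latest-py3.11"),
    ("test_access.py::test_check_single_model_access", "macos-latest-py3.7"),
    ("test_integration.py::test_multi_db_run", "macos-latest-py3.8, macos-latest-py3.10"),
    ("backend/test_jdbc.py::test_jvm_warn", "ubuntu-latest-py3.8"),
    (
        "backend/test_jdbc.py::test_close",
        "windows-latest-py3.7 (3), windows-latest-py3.8, "
        "windows-latest-py3.10 (2), windows-latest-py3.11 (2)",
    ),
    (
        "backend/test_jdbc.py::test_connect_message",
        "windows-latest-py3.7 (3), windows-latest-py3.8, "
        "windows-latest-py3.10 (2), windows-latest-py3.11 (2)",
    ),
]

# Matches e.g. "macos-latest-py3.7 (3)" or "ubuntu-latest-py3.8" (count optional).
RUNNER = re.compile(r"(?P<os>[a-z]+)-latest-py(?P<py>[\d.]+)(?: \((?P<n>\d+)\))?")


def failures_by_os(rows):
    """Sum the recorded failures per operating system."""
    totals = Counter()
    for _test, runners in rows:
        for match in RUNNER.finditer(runners):
            # No count in parentheses means a single occurrence.
            totals[match["os"]] += int(match["n"] or 1)
    return totals


print(dict(failures_by_os(FLAKY)))
# -> {'macos': 15, 'ubuntu': 1, 'windows': 16}
```

The same loop could key on the test file or the error name instead of the OS to see which group of tests deserves attention first.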
@glatterf42
Member Author

For future reference: the flaky tests on the backend seem to be related to pytest's capfd not capturing stdout and stderr reliably on Windows. See also pytest-dev/pytest#10843.
