
Address intermittently failing tests #9122

Closed
2 tasks done
stinodego opened this issue May 30, 2023 · 11 comments · Fixed by #10469
Labels
accepted (Ready for implementation) · ci (Related to the continuous integration setup) · internal (An internal refactor or improvement) · test (Related to the test suite)

Comments

@stinodego
Contributor

stinodego commented May 30, 2023

There are some tests that fail occasionally:

We should look into these to fix them, or skip them temporarily.

@stinodego stinodego added the ci (Related to the continuous integration setup), test (Related to the test suite), and development labels May 30, 2023
@mcrumiller
Contributor

Ok, I looked into test_polars_import and I've found the issue:

        if_err = f"Possible import speed regression; took {total_import_time//1_000}ms"
        assert total_import_time < 200_000, f"{if_err}\n{df_import}"

The total import time was taking too long. The test runs the following:

python -X importtime -c "import polars"

And looking at the output, we have this:

FAILED tests/unit/test_polars_import.py::test_polars_import - AssertionError: Possible import speed regression; took 205ms
  ┌───────────────────────────────────────────────────────────────┬──────────┬─────────────────┐
  │ import                                                        ┆ own_time ┆ cumulative_time │
  │ ---                                                           ┆ ---      ┆ ---             │
  │ str                                                           ┆ i32      ┆ i32             │
  ╞═══════════════════════════════════════════════════════════════╪══════════╪═════════════════╡
  │  polars                                                       ┆ 1298     ┆ 123721          │
  |                                      ...                                                   |
  │        email                                                  ┆ 469      ┆ 469             │
  │  site                                                         ┆ 64801    ┆ 205091          │
  │    sitecustomize                                              ┆ 232      ┆ 232             │
  │    _sysconfigdata_m_linux_x86_64-linux-gnu                    ┆ 734      ┆ 734             │
  │    coverage                                                   ┆ 341      ┆ 122808          │
  │      coverage.control                                         ┆ 2058     ┆ 122007          │
                   ...

The site import is separate from polars. Looking at the docs for the site module, we have this:

This module is automatically imported during initialization. The automatic import can be suppressed using the interpreter’s -S option.

Importing this module will append site-specific paths to the module search path and add a few builtins, unless -S was used. In that case, this module can be safely imported with no automatic modifications to the module search path or additions to the builtins. To explicitly trigger the usual site-specific additions, call the site.main() function.

So, for our import test, I believe we should use the -S option, since we're only concerned with how long polars takes to load, not the site module, which is outside of our control.
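
A minimal sketch of what that could look like (the subprocess call below is illustrative, not the actual test code; note that -S also means site-packages is not added to sys.path automatically, so the sketch passes it via PYTHONPATH):

    # Sketch: time the polars import with the site module suppressed via -S, so
    # only the cost of "import polars" shows up in the -X importtime report.
    import os
    import subprocess
    import sys

    import polars

    # -S skips the site-packages setup, so point the child process at the
    # directory that contains polars explicitly.
    site_packages = os.path.dirname(os.path.dirname(polars.__file__))
    env = dict(os.environ, PYTHONPATH=site_packages)

    result = subprocess.run(
        [sys.executable, "-S", "-X", "importtime", "-c", "import polars"],
        capture_output=True,
        text=True,
        env=env,
    )
    # -X importtime writes its per-module report to stderr.
    print(result.stderr)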

@mcrumiller
Contributor

I'm going to make a separate issue for the first test since I have a PR to fix it and don't want to close this whole issue.

@ritchie46
Member

It is allowed to fail. It serves as a safeguard against import time regressions and there will be volatility.

@mcrumiller
Contributor

mcrumiller commented May 31, 2023

@ritchie46 the import time test is also testing the time it takes to import additional python modules; we should probably only look at the polars import time, no? It looks like we're getting false positives due to other modules taking longer than expected.

@stinodego
Contributor Author

I think the solution @mcrumiller is working on reduces volatility, which would be welcome.

Right now the false positive rate is too high - I just mindlessly rerun it without being worried about import times. So it's just noise right now.

@mcrumiller
Contributor

Yeah, this may take a little bit more research. I *think* the results from python -X importtime return contiguous entries for each individual package, but I can't find that guarantee, and it's also hard to isolate polars alone. It also seems machine-dependent: when I run the command on my machine here, I get import times for io and zipimport. So one of the questions is whether these were imported by polars (I know io is used but zipimport is not); if so, we should probably include their load time.

Let me look a bit into this more and hopefully something will resolve.
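
For reference, isolating the polars-owned entries from the report could look roughly like this (a sketch over the stderr output of python -X importtime -c "import polars", not the actual test code):

    # Sketch: sum the self-times (in microseconds) of all modules whose name
    # starts with "polars" in the report that -X importtime writes to stderr.
    def polars_import_time_us(importtime_stderr: str) -> int:
        total = 0
        for line in importtime_stderr.splitlines():
            if not line.startswith("import time:"):
                continue
            parts = line[len("import time:"):].split("|")
            if len(parts) != 3 or not parts[0].strip().isdigit():
                continue  # skip the header line of the report
            self_us, _cumulative_us, module = parts
            if module.strip().startswith("polars"):
                total += int(self_us.strip())
        return total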

@alexander-beedie
Collaborator

alexander-beedie commented Jun 1, 2023

I can address test_polars_import tonight, since that's one I added a little while ago (as an aside, one of the reasons it may fail locally a lot more often than in CI is because things can be a bit slower when running a debug build). Got back home last night, so am available once again... ;)

@mcrumiller
Contributor

@alexander-beedie I think the issue is that it's failing a lot more in CI than locally, at least that's what I've noticed. It might also be that having a blazingly fast machine helps...

@stinodego
Contributor Author

I think I know how to address the groupby one. It looks like the culprits are those temporary directories. Let's try to use pytest's tmp_path and see if that resolves some of our issues. I will pick this up later.
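
For example, a test that currently creates its own temporary directory could be rewritten along these lines (an illustrative test, not one of the actual failing ones):

    # Sketch: let pytest manage the temporary directory via the tmp_path fixture,
    # so each test gets an isolated directory that is cleaned up automatically.
    from pathlib import Path

    import polars as pl
    from polars.testing import assert_frame_equal


    def test_write_read_roundtrip(tmp_path: Path) -> None:
        file = tmp_path / "data.parquet"
        df = pl.DataFrame({"a": [1, 2, 3]})
        df.write_parquet(file)
        assert_frame_equal(pl.read_parquet(file), df)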

@stinodego
Contributor Author

Ah, looks like the test_streaming_groupby_ooc test actually doesn't use a temporary directory. So my idea does not affect this test.

That begs the question: where does this test write its data? Can we control where this is written somehow? The error states OSError: No such file or directory, so it must be related to this somehow.

@ritchie46 could you point me in the right direction here?

@stinodego stinodego removed their assignment Jun 4, 2023
@ritchie46
Member

That begs the question: where does this test write its data?

Because it tests out-of-core behavior. The algorithm spills to disk when we run out of memory. Here we force it to take that code path.
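
A rough sketch of what forcing that path looks like in a test (the POLARS_FORCE_OOC environment variable and the toy data below are assumptions for illustration, not the actual test):

    # Sketch: push the streaming groupby onto the out-of-core (spill-to-disk)
    # code path; assumes the POLARS_FORCE_OOC environment variable is honored.
    import polars as pl


    def test_streaming_groupby_spills(monkeypatch) -> None:
        monkeypatch.setenv("POLARS_FORCE_OOC", "1")
        df = pl.DataFrame({"k": [1, 1, 2], "v": [10, 20, 30]})
        out = df.lazy().groupby("k").agg(pl.col("v").sum()).collect(streaming=True)
        assert out.sort("k")["v"].to_list() == [30, 30]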

@stinodego stinodego moved this from Untriaged to In progress in Backlog Jul 12, 2023
@stinodego stinodego self-assigned this Jul 12, 2023
@stinodego stinodego added accepted Ready for implementation internal An internal refactor or improvement and removed development labels Jul 15, 2023