[Bug]: apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses is flaky #22115

Abacn · 2022-06-30T16:56:26Z

What happened?

Timeout exception:

self = <apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocessesAndMultiWorkers testMethod=test_assert_that>

    def test_assert_that(self):
      # TODO: figure out a way for fn_api_runner to parse and raise the
      # underlying exception.
      with self.assertRaisesRegex(Exception, 'Failed assert'):
        with self.create_pipeline() as p:
>         assert_that(p | beam.Create(['a', 'b']), equal_to(['a']))
E         AssertionError: "Failed assert" does not match "Pipeline timed out waiting for job service subprocess."
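The assertion inside the pipeline is not what fails here. The job service timed out, and `assertRaisesRegex` reports the mismatched message as an `AssertionError`. The following is a minimal, stdlib-only reproduction of that failure mode. `TimeoutMaskingCase` and `start_pipeline` are hypothetical stand-ins: `start_pipeline` just raises the timeout message and does not start any Beam pipeline.

```python
import unittest


class TimeoutMaskingCase(unittest.TestCase):
    """Hypothetical stand-in for the test in the traceback above."""

    def start_pipeline(self):
        # Assumption: startup fails before the pipeline runs, so the error the
        # test sees is the job service timeout, not the expected assert failure.
        raise RuntimeError(
            "Pipeline timed out waiting for job service subprocess.")

    def check_assert_that(self):
        # assertRaisesRegex does catch the RuntimeError (it is an Exception),
        # but its message does not match 'Failed assert', so the context
        # manager turns it into an AssertionError, i.e. a test failure.
        with self.assertRaisesRegex(Exception, 'Failed assert'):
            self.start_pipeline()


def run_reproduction():
    """Run the case programmatically and return the unittest result."""
    result = unittest.TestResult()
    TimeoutMaskingCase("check_assert_that").run(result)
    return result


if __name__ == "__main__":
    for _, traceback_text in run_reproduction().failures:
        # Print only the final line of each failure's traceback.
        print(traceback_text.strip().splitlines()[-1])
```

The method deliberately does not start with `test`, so a test collector will not pick up this intentionally failing case.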

This was previously reported as BEAM-9118 and resolved in #12633. It is happening again, and it seems to occur more frequently when Jenkins is busy.

Issue Priority

Priority: 2

Issue Component

Component: runner-py-direct

Abacn · 2022-06-30T16:56:57Z

e.g. https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/5815/

ryanthompson591 · 2022-10-17T17:38:01Z

.take-issue

ryanthompson591 · 2022-10-18T16:52:23Z

I've hopefully fixed this in #23696.

However, I had two other theories about what could be going on, which I'm not sure about.

I'm going to note them here in case they turn out to be true:

It's possible that _pick_unused_ports is running into a race condition, picking a port that is already in use, and then timing out. Ideally this would raise a different exception.
It's possible that multiple processes running on the same machine are colliding when they start subprocesses on the same ports.
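The first theory is a classic time-of-check/time-of-use race. The sketch below assumes the helper follows the common pattern of binding to port 0 and closing the socket before handing the port number to the subprocess. `pick_unused_ports` and `demonstrate_race` are hypothetical names, not Beam's code.

```python
import errno
import socket


def pick_unused_ports(num_ports):
    """Hypothetical helper: ask the OS for free ports by binding to port 0.

    All sockets are held open while picking, so the returned ports are
    distinct, but they are closed before returning, so nothing reserves the
    ports until a subprocess binds them.
    """
    sockets = []
    ports = []
    try:
        for _ in range(num_ports):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(s)
            s.bind(("localhost", 0))  # port 0 lets the OS choose a free port
            ports.append(s.getsockname()[1])
        return ports
    finally:
        for s in sockets:
            s.close()  # the ports are released here, opening the race window


def demonstrate_race():
    """Simulate another process grabbing the port before the service binds it.

    Returns the errno of the failed bind, or None if the bind succeeded.
    """
    (port,) = pick_unused_ports(1)
    intruder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    intruder.bind(("localhost", port))  # another process wins the race
    intruder.listen()
    service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        service.bind(("localhost", port))  # the service tries its assigned port
        return None
    except OSError as e:
        return e.errno
    finally:
        service.close()
        intruder.close()


if __name__ == "__main__":
    print(demonstrate_race() == errno.EADDRINUSE)
```

Holding the sockets open until the subprocess is ready, or letting the subprocess bind to port 0 itself and report the chosen port back, would narrow or close this window.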

I think both of these are unlikely. The most likely explanation is that these tests are slow (some unit tests took as long as 15 seconds on my local machine). The PR above is likely the solution.
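Regardless of the timeout value, the first theory's point that a port clash would ideally raise a different exception can be addressed with a readiness loop that also checks whether the subprocess is still alive. This is a sketch under stated assumptions: the job service is a `subprocess.Popen` child that signals readiness by accepting TCP connections. `wait_for_job_service` and both exception classes are hypothetical names, not part of Beam.

```python
import socket
import subprocess
import sys
import time


class JobServiceExitedError(RuntimeError):
    """Hypothetical: the subprocess exited before it started serving."""


class JobServiceTimeoutError(RuntimeError):
    """Hypothetical: the subprocess is alive but never became reachable."""


def wait_for_job_service(process, host, port, timeout_secs, poll_secs=0.1):
    """Poll until (host, port) accepts TCP connections or the deadline passes.

    An early exit (for example, because the port was already taken) raises a
    different error than a subprocess that is merely slow to start.
    """
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        if process.poll() is not None:
            raise JobServiceExitedError(
                "job service subprocess exited with code %d" % process.returncode)
        try:
            with socket.create_connection((host, port), timeout=poll_secs):
                return  # something is accepting connections on the port
        except OSError:
            time.sleep(poll_secs)  # not ready yet; try again shortly
    raise JobServiceTimeoutError(
        "timed out after %.1fs waiting for job service subprocess" % timeout_secs)


if __name__ == "__main__":
    # Usage: a child that exits immediately is reported as an early exit,
    # not as a timeout.
    placeholder = socket.socket()
    placeholder.bind(("127.0.0.1", 0))  # bound but not listening: connects are refused
    port = placeholder.getsockname()[1]
    child = subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"])
    try:
        wait_for_job_service(child, "127.0.0.1", port, timeout_secs=10)
    except JobServiceExitedError as e:
        print(e)
    finally:
        placeholder.close()
```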

Abacn · 2022-10-18T16:56:14Z

Thanks @ryanthompson591. Case 1 is likely. I observe that in most cases the time needed for setup is small, much less than 30 seconds. If it were just a performance issue, the setup time would be spread across a distribution, with some runs approaching the limit.
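The reasoning in this comment can be phrased as a simple check over recorded setup times. This is a hypothetical heuristic. It assumes durations are collected per run, with `None` marking a run that hit the limit. The 30-second default comes from the figure mentioned above, and the 80% "near the limit" band is an arbitrary assumption. `classify_setup_times` is an illustrative name, not an existing tool.

```python
def classify_setup_times(durations, timeout_secs=30.0, near_fraction=0.8):
    """Hypothetical heuristic: does the timing pattern look like slowness or a hang?

    durations: setup times in seconds; None marks a run that hit the timeout.
    If the timeouts were caused by general slowness, some completed runs
    should land close to the limit. If the completed runs are all far below
    it while a few runs time out, the pattern looks more like an occasional hang.
    """
    completed = [d for d in durations if d is not None]
    timed_out = len(durations) - len(completed)
    near_limit = [d for d in completed if d >= near_fraction * timeout_secs]
    if timed_out == 0:
        verdict = "no timeouts"
    elif near_limit:
        verdict = "consistent with slowness"
    else:
        verdict = "bimodal: looks like an occasional hang"
    return {
        "completed": len(completed),
        "timed_out": timed_out,
        "near_limit": len(near_limit),
        "verdict": verdict,
    }
```

A single cutoff is crude. With enough runs, a histogram of the completed durations would show the same contrast more convincingly.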

kennknowles · 2022-12-07T16:38:04Z

@ryanthompson591 are you actively working on this?

kennknowles · 2023-03-23T20:12:48Z

Haven't seen this flake in a while. Is it disabled or is it green now?

Abacn · 2023-03-23T20:19:25Z

It is green now. It has been migrated to https://ci-beam.apache.org/job/beam_PreCommit_Python_Runners_Cron/, and I checked that the test is still running. We can close this now.

Abacn added awaiting triage bug labels Jun 30, 2022

github-actions bot added direct P2 python runners labels Jun 30, 2022

Abacn mentioned this issue Jul 6, 2022

Fix pydoc rendering for annotated classes #22121

Merged

Abacn mentioned this issue Jul 14, 2022

[Bug]: Python Lots of fn runner test items cost exactly 5 seconds to run #22283

Open

tvalentyn added flake and removed awaiting triage labels Aug 4, 2022

damccorm added P1 and removed P2 labels Oct 4, 2022

github-actions bot assigned ryanthompson591 Oct 17, 2022

ryanthompson591 mentioned this issue Oct 18, 2022

Update portable runner test timeout #23696

Merged

tvalentyn mentioned this issue Oct 27, 2022

Reduce log spam of Py37PostCommit #23829

Merged

Abacn mentioned this issue Oct 27, 2022

Migrate BINARY, VARBINARY, CHAR, VARCHAR jdbc logical types to portable #23548

Merged

Abacn mentioned this issue Nov 9, 2022

Python TextIO Performance Test #23951

Merged

kennknowles unassigned ryanthompson591 Feb 13, 2023

Abacn closed this as completed Mar 23, 2023

github-actions bot added this to the 2.47.0 Release milestone Mar 23, 2023

tvalentyn added the "done & done" label (Issue has been reviewed after it was closed for verification, followups, etc.) Mar 28, 2023
