filebeat: Fix flaky test case on macOS #39860

Merged (5 commits) on Jun 13, 2024

@VihasMakwana VihasMakwana commented Jun 11, 2024

Proposed commit message

This PR fixes the following test cases, which are flaky on macOS:

  • tests/system/test_harvester.py::Test::test_debug_reader
  • tests/system/test_crawler.py::Test::test_tail_files
  • tests/system/test_crawler.py::Test::test_encodings
  • tests/system/test_registrar.py::Test::test_restart_state_reset_ttl_no_clean_inactive
  • tests/system/test_registrar.py::Test::test_restart_state_reset
  • tests/system/test_registrar.py::Test::test_restart_state
  • tests/system/test_registrar.py::Test::test_registry_file_update_permissions
  • tests/system/test_shutdown.py::Test::test_shutdown

On macOS, the FQDN lookup takes some time (~5 seconds) before signal handlers are set up.

  • the lingering FQDN lookup code is here

To make sure the tests pass, we need to increase the timeout by a few seconds in each of the failing cases.
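
As a rough illustration of why a longer timeout helps, here is a minimal sketch of a polling wait. The `wait_until` helper, its parameters, and the simulated startup delay are all hypothetical stand-ins written for this example; they are not taken from the Beats test framework or from this PR's diff.

```python
import time


def wait_until(cond, max_timeout=10.0, poll_interval=0.1):
    """Poll cond() until it returns True or max_timeout seconds elapse.

    Hypothetical helper for illustration only; not the Beats test code.
    """
    deadline = time.monotonic() + max_timeout
    while not cond():
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {max_timeout}s")
        time.sleep(poll_interval)


# Simulate a process that only becomes ready after a startup delay
# (standing in for the ~5 second FQDN lookup, scaled down here).
STARTUP_DELAY = 0.3
started = time.monotonic()


def process_ready():
    return time.monotonic() - started >= STARTUP_DELAY


# A timeout shorter than the startup delay would raise TimeoutError.
# With some headroom the wait succeeds, and it returns as soon as the
# condition holds rather than sleeping for the full timeout.
wait_until(process_ready, max_timeout=2.0, poll_interval=0.05)
elapsed = time.monotonic() - started
```

Because the loop exits as soon as the condition is true, raising the maximum timeout in a wait like this should only lengthen runs that would otherwise have failed.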

You can view the full discussion at https://elastic.slack.com/archives/C047NDNGUMU/p1717091056290869.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Fixes #39613

@VihasMakwana VihasMakwana added flaky-test Unstable or unreliable test cases. Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Jun 11, 2024
@VihasMakwana VihasMakwana requested a review from a team as a code owner June 11, 2024 14:48
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jun 11, 2024
Contributor

mergify bot commented Jun 11, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @VihasMakwana? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@VihasMakwana VihasMakwana requested a review from rdner June 11, 2024 14:49
@VihasMakwana VihasMakwana changed the title filebeatL Fix flaky test case on macOS filebeat: Fix flaky test case on macOS Jun 11, 2024
Contributor

@belimawr belimawr left a comment

A general question: Do we need those tests using the FQDN? Could we just disable it, as those tests are not testing the FQDN, and have the suite run faster?

filebeat/tests/system/test_registrar.py (outdated, resolved)
filebeat/tests/system/test_registrar.py (outdated, resolved)
@VihasMakwana
Contributor Author

A general question: Do we need those tests using the FQDN? Could we just disable it, as those tests are not testing the FQDN, and have the suite run faster?

Hmm, I'm not sure if we can selectively disable FQDN. These integration test cases try to run the filebeat binary and test for output.
Also, this problem is specific to Darwin-based systems.
On other systems, the wait calls should resolve quickly.

@VihasMakwana VihasMakwana requested a review from belimawr June 12, 2024 11:54
@belimawr
Contributor

A general question: Do we need those tests using the FQDN? Could we just disable it, as those tests are not testing the FQDN, and have the suite run faster?

Hmm, I'm not sure if we can selectively disable FQDN. These integration test cases try to run the filebeat binary and test for output. Also, this problem is specific to Darwin-based systems. On other systems, the wait calls should resolve quickly.

The Elastic-Agent has a flag to select whether we're using FQDN or not. I don't fully recall how that works for a standalone Beat, but it might be possible.

But if the OS is slow to respond regardless of how we query it, then there is not much we can do.

Anyway, the PR looks good now!

@pierrehilbert pierrehilbert merged commit f9fec1e into elastic:main Jun 13, 2024
18 checks passed
@pierrehilbert pierrehilbert added the backport-v8.14.0 Automated backport with mergify label Jun 13, 2024
mergify bot pushed a commit that referenced this pull request Jun 13, 2024
* fix: fix a flaky test on macos

* fix: fix more such test cases

* fix: only update ignore_older

* fix: also fix flaky test_restart_state

* fix: fix CI

(cherry picked from commit f9fec1e)
pierrehilbert pushed a commit that referenced this pull request Jun 14, 2024
* fix: fix a flaky test on macos

* fix: fix more such test cases

* fix: only update ignore_older

* fix: also fix flaky test_restart_state

* fix: fix CI

(cherry picked from commit f9fec1e)

Co-authored-by: VihasMakwana <[email protected]>
Labels
backport-v8.14.0 Automated backport with mergify flaky-test Unstable or unreliable test cases. Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Development

Successfully merging this pull request may close these issues.

Filebeat (Python) Unit Tests are flaky on Buildkite macOS x86_64 agents especially on 8.13 branch
5 participants