[BUG] Integration tests are failing intermittently #268

Closed
dblock opened this issue Jul 27, 2022 · 4 comments · Fixed by #304

Comments

@dblock
Member

dblock commented Jul 27, 2022

What is the bug?

See https://github.com/opensearch-project/opensearch-js/runs/7454554035?check_suite_focus=true, currently failing on main.

What is the expected behavior?

Tests to pass.

Do you have any additional context?

Red herring in #266 (comment)

@kavilla
Member

kavilla commented Aug 5, 2022

Re-ran it and it succeeded, which indicates that these tests are flaky, though they've seemed pretty consistent so far. So it might have been a fluke or some availability issue.

dblock changed the title from "[BUG] Integration tests are failing" to "[BUG] Integration tests are failing intermittently" on Aug 7, 2022

@nhtruong
Collaborator

nhtruong commented Oct 3, 2022

I've successfully recreated this error in my local env!
Since all flaky workflows run integration tests, my focus was on the 4 integration helper tests. All of these tests read from stackoverflow.ndjson as a stream, which is similar to this issue in the node-tap repo. Intentionally making the stream fail (for example, by giving it a wrong path) results in an error message identical to the one we've seen on the failed jobs:

theotr@88665a51fc3f opensearch-js % tap test/integration/helpers-secure/search.test.js -C
TAP version 13
(node:9350) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
(Use `node --trace-warnings ...` to show where the warning was created)
# Subtest: test/integration/helpers-secure/search.test.js
    1..0 # no tests found
not ok 1 - test/integration/helpers-secure/search.test.js # time=30010.195ms
  ---
  env: {}
  file: test/integration/helpers-secure/search.test.js
  timeout: 30000
  command: /usr/local/Cellar/node/18.9.1/bin/node
  args:
    - test/integration/helpers-secure/search.test.js
  stdio:
    - 0
    - pipe
    - 2
  cwd: /Users/theotr/IdeaProjects/opensearch-js
  exitCode: null
  signal: SIGTERM
  ...

1..1
# failed 1 test
# time=30043.437ms

@nhtruong
Collaborator

nhtruong commented Oct 3, 2022

It is actually the OpenSearch (OS) instance in the container that caused this issue, even though an error in the file stream throws the exact same timeout/no-tests error. Configurations that failed had the following printout in the container's logs:

docker logs opensearch_opensearch_1
Killing opensearch process 10
Killing performance analyzer process 11

This happened even though the container still appeared to be running:

docker ps -a;
CONTAINER ID   IMAGE                   COMMAND                  CREATED          STATUS          PORTS                                                                     NAMES
ad8261a2bc82   opensearch_opensearch   "./opensearch-docker…"   41 seconds ago   Up 40 seconds   9300/tcp, 9600/tcp, 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9650/tcp   opensearch_opensearch_1
curl localhost:9200/_cat/health
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (56) Recv failure: Connection reset by peer
Error: Process completed with exit code 56.

@nhtruong
Collaborator

nhtruong commented Oct 4, 2022

Those "Killing process" messages came from a script in the opensearch-build repo. Most likely that script encountered an error, which triggered the trap and killed these 2 processes. Others have encountered this bug in opensearch-build as well.

For now, I think the best course of action is to restart the container whenever we see those messages in the logs (I've tested it, and it works great!).
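
A rough sketch of that workaround as a Node script, in case it helps. The function names are made up for illustration, the container name is the one from the logs above, and the patterns only match the two messages quoted earlier. It assumes `docker` is on the PATH.

```javascript
'use strict'

const { spawnSync } = require('node:child_process')

// Log lines quoted in this thread that show the OpenSearch process was killed.
const KILL_MARKERS = [
  /Killing opensearch process \d+/,
  /Killing performance analyzer process \d+/
]

// Hypothetical helper: true if the log text contains any of the kill messages.
function wasKilled (logs) {
  return KILL_MARKERS.some(re => re.test(logs))
}

// Hypothetical helper: reads the container logs and restarts the container if
// the kill messages are present. Meant to run once after startup, since the
// old log lines may still be there after a restart.
function restartIfKilled (container) {
  const res = spawnSync('docker', ['logs', container], { encoding: 'utf8' })
  if (res.error) throw res.error

  // `docker logs` replays the container's stdout and stderr separately.
  const logs = `${res.stdout}${res.stderr}`
  if (!wasKilled(logs)) return false

  const restart = spawnSync('docker', ['restart', container], { stdio: 'inherit' })
  if (restart.error) throw restart.error
  return restart.status === 0
}

// Usage:
// if (restartIfKilled('opensearch_opensearch_1')) {
//   console.log('OpenSearch was killed inside the container; restarted it')
// }
```

After a restart it is probably still worth waiting for `curl localhost:9200/_cat/health` to succeed before starting the tests.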
