MonoVertex pods often unhealthy in e2e test #314

Closed
juliev0 opened this issue Oct 2, 2024 · 4 comments · Fixed by #336
Labels: bug (Something isn't working)

juliev0 (Collaborator) commented Oct 2, 2024

Describe the bug
I'm not sure this is an issue on our side, but it's worth investigating. Ultimately, it could be something to hand over to the Numaflow team to look at after some analysis on our side.

I was seeing that the MonoVertex pod was in a crash loop at the very end of the e2e test. I'm not sure whether it's consistent, but I've seen it more than once. (Perhaps it's okay and it eventually fixes itself?)

This is the CI log from the test I ran locally:
ci.log.txt
These are the outputs from the tests/e2e/outputs directory:
output.zip

If you look at the outputs/resources/monovertexrollouts/pods directory, you can see many Pods in there, which suggests the Pods restarted a lot.

To Reproduce
Steps to reproduce the behavior:

  1. DATA_LOSS_PREVENTION=true make start
  2. DATA_LOSS_PREVENTION=true make test-e2e

I assume this also happens for DATA_LOSS_PREVENTION=false, but I didn't try it.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

juliev0 added the bug label Oct 2, 2024
juliev0 added this to the 0.5 Enhance CRUD milestone Oct 2, 2024
juliev0 (Collaborator, Author) commented Oct 15, 2024

Hey @chandankumar4 - I unassigned this from you. Instead, I'll try running it again, and since Sidhant has now run our e2e himself locally, he could be the one to look at it if it's still occurring.

juliev0 (Collaborator, Author) commented Oct 15, 2024

Just re-ran this locally. It's after we update the MonoVertexRollout that the MonoVertex goes into a crash loop with this error:

```
jvogelman@macos-VF3V14X2QJ controller % k logs test-monovertex-rollout-mv-0-x4p8j  
2024-10-15T16:03:04.621585Z  INFO monovertex::server_info: Server info file: ServerInfo { protocol: "uds", language: "java", minimum_numaflow_version: "", version: "0.6.0", metadata: Some({}) }
2024-10-15T16:03:04.623577Z  INFO monovertex::server_info: Version_info: VersionInfo { version: "latest+unknown", build_date: "1970-01-01T00:00:00Z", git_commit: "", git_tag: "", git_tree_state: "", go_version: "unknown", compiler: "", platform: "linux/x86_64" }
2024-10-15T16:03:04.623761Z  WARN monovertex::server_info: Failed to get the minimum numaflow version, skipping numaflow version compatibility check
2024-10-15T16:03:04.625997Z  WARN monovertex::startup: Error waiting for source server info file: ServerInfoError("SDK version 0.6.0 must be upgraded to at least 0.8.0, in order to work with the current numaflow version")
2024-10-15T16:03:04.626288Z ERROR monovertex: Application error: ForwarderError("Error waiting for server info file")
2024-10-15T16:03:04.626458Z  INFO monovertex: Gracefully Exiting...
```

juliev0 (Collaborator, Author) commented Oct 15, 2024

Hey @dpadhiar - not super high priority, but it would be good to fix the e2e test so that after updating the MonoVertexRollout, the MonoVertex Pod is not in a crash loop (see log above).
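
For illustration, here is a minimal sketch of the kind of post-update check the e2e test could make, so a crash loop fails the test directly rather than only showing up as many pod dumps under outputs/resources/. The helper name, label selector, and client wiring are hypothetical, not existing code in this repo:

```go
// Minimal sketch (hypothetical helper), assuming client-go access to the test
// cluster; the label selector used by the caller is illustrative.
package e2e

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// verifyNoCrashLoop lists the MonoVertex pods and returns an error if any
// container is waiting in CrashLoopBackOff, so the restart loop surfaces
// as a test failure immediately after the MonoVertexRollout update.
func verifyNoCrashLoop(ctx context.Context, client kubernetes.Interface, namespace, labelSelector string) error {
	pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: labelSelector})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		for _, cs := range pod.Status.ContainerStatuses {
			if cs.State.Waiting != nil && cs.State.Waiting.Reason == "CrashLoopBackOff" {
				return fmt.Errorf("pod %s container %s is in CrashLoopBackOff (restarts=%d)",
					pod.Name, cs.Name, cs.RestartCount)
			}
		}
	}
	return nil
}
```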

dpadhiar (Contributor) commented:

> Hey @dpadhiar - not super high priority, but it would be good to fix the e2e test so that after updating the MonoVertexRollout, the MonoVertex Pod is not in a crash loop (see log above).

I see - looks like the version I changed the upgrade to (from stable to 0.6.0) causes an issue. I'll change that soon.
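
For illustration, a minimal sketch of the direction of that fix: have the e2e upgrade target an image whose SDK is at least 0.8.0. The API group/version/resource, the field path into the rollout spec, and the way the image is passed in are assumptions for the sketch, not the repository's actual test code:

```go
// Minimal sketch (illustrative only), assuming a dynamic client against the
// test cluster and that the rollout embeds the MonoVertex source image at the
// field path shown below.
package e2e

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// Assumed GVR for MonoVertexRollout; verify against the installed CRD.
var monoVertexRolloutGVR = schema.GroupVersionResource{
	Group:    "numaplane.numaproj.io",
	Version:  "v1alpha1",
	Resource: "monovertexrollouts",
}

// updateSourceImage rewrites the source container image embedded in the
// MonoVertexRollout so the upgraded MonoVertex runs an SDK version the
// installed Numaflow accepts (>= 0.8.0 per the error above).
func updateSourceImage(ctx context.Context, client dynamic.Interface, namespace, name, image string) error {
	rollouts := client.Resource(monoVertexRolloutGVR).Namespace(namespace)
	obj, err := rollouts.Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// The field path is an assumption about where the rollout embeds the
	// MonoVertex source image; adjust to the actual spec layout.
	if err := unstructured.SetNestedField(obj.Object, image,
		"spec", "monoVertex", "spec", "source", "udsource", "container", "image"); err != nil {
		return err
	}
	_, err = rollouts.Update(ctx, obj, metav1.UpdateOptions{})
	return err
}
```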
