Logs not printed when there is an error during shutdown #1558

Closed
skunert opened this issue Nov 28, 2023 · 11 comments · Fixed by #1566
Comments

@skunert
Contributor

skunert commented Nov 28, 2023

Issue Description

In some runs the links to the logs in loki are not printed.
See here: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4513620

It looks like this is due to a shutdown error (one pod could not be found).
This makes debugging these jobs impossible; the logs are also not under the artifacts folder.

Steps to reproduce the issue

Describe the results you received

Logs not available

Describe the results you expected

Logs available

Zombienet version

1.3.83

Provider

Kubernetes

Provider version

CI

Upstream Latest Release

Yes

Additional environment details

CI

Additional information

Only occasionally

Screenshots

No response

@skunert skunert added the bug Something isn't working label Nov 28, 2023
@pepoviola
Collaborator

Hi @skunert, thanks for reporting. Yes, in this case there was an unhandled exception and the process crashed. I will fix it asap. As a workaround until it is fixed, you can see the logs in Loki by namespace:
https://grafana.teleport.parity.io/goto/434r-pNSg?orgId=1

Thanks!!

@mrcnski
Contributor

mrcnski commented Dec 10, 2023

@pepoviola I might be getting something similar. Zombienet never errors for me, even when the test is clearly wrong.

For example, I change some value in 0001-parachains-pvf.zndsl, let's say I set alice: reports node_roles is 3. AFAICT zombienet never exits, it just keeps running as if everything is fine. I tried changing various lines in various tests.

Reopening since I think this ticket is related.

 $ zombienet version
1.3.86

@mrcnski mrcnski reopened this Dec 10, 2023
@pepoviola
Collaborator

Hi @mrcnski, thanks for reopening. Are you trying this locally with the native provider, or are those changes in CI? I will try to reproduce and fix asap.

Thanks!

@mrcnski
Contributor

mrcnski commented Dec 11, 2023

Thanks @pepoviola, appreciate it! I'm trying locally with native. I wonder, is there a way to get logs out of zombienet to see what it's doing?

@mrcnski
Contributor

mrcnski commented Dec 11, 2023

@pepoviola Could this be the issue? (changed from Promise.all in tag 1.3.85):

await Promise.allSettled(dumpsPromises);

I checked out 1.3.84 and I see errors now, but unfortunately I get stuck on the dreaded Error fetching metrics from loop.
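For readers unfamiliar with the difference between the two calls mentioned above, here is a minimal TypeScript sketch. It is not zombienet's code: `dumpLogs`, `dumpWithAll`, and `dumpWithAllSettled` are hypothetical names, and the failing node name is made up. It only illustrates that `Promise.all` rejects on the first failure, while `Promise.allSettled` always resolves, so any failures have to be inspected explicitly.

```typescript
// Hypothetical stand-in for a per-node log dump; not zombienet's actual function.
async function dumpLogs(node: string): Promise<string> {
  if (node === "missing-pod") {
    throw new Error(`pod not found: ${node}`);
  }
  return `logs for ${node}`;
}

// Promise.all rejects as soon as any dump rejects, so the caller never
// receives the results of the dumps that succeeded.
async function dumpWithAll(nodes: string[]): Promise<string[]> {
  return Promise.all(nodes.map(dumpLogs));
}

// Promise.allSettled never rejects. Every outcome is reported, so failures
// must be collected explicitly or they go unnoticed.
async function dumpWithAllSettled(
  nodes: string[],
): Promise<{ dumped: string[]; errors: string[] }> {
  const results = await Promise.allSettled(nodes.map(dumpLogs));
  const dumped: string[] = [];
  const errors: string[] = [];
  for (const result of results) {
    if (result.status === "fulfilled") {
      dumped.push(result.value);
    } else {
      errors.push(String(result.reason));
    }
  }
  return { dumped, errors };
}
```

With `allSettled`, a caller that wants the process to fail still has to check `errors` and throw or set an exit code itself.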

@pepoviola
Collaborator

Hey @mrcnski, yes, I made that change to prevent the rejection of all dumps if one fails. I would check why it doesn't resolve as expected, or handle the rejection internally in the dump fn to prevent the issue when using all.

Error fetching metrics. This error mostly occurs when the node didn't start as expected. Can you check the logs of the node? We just make a GET request to the Prometheus endpoint to check readiness.

Thanks!!
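The readiness check described above (a GET against the node's Prometheus endpoint, retried until it answers) can be sketched roughly as follows. This is an illustration under stated assumptions, not zombienet's implementation: `waitUntilReady`, the `Probe` type, the retry policy, and the URL in the comment are all hypothetical.

```typescript
// Hypothetical readiness helper; the name, parameters, and retry policy are
// assumptions for illustration, not zombienet's implementation.
type Probe = () => Promise<boolean>;

async function waitUntilReady(
  probe: Probe,
  maxAttempts: number,
  delayMs: number,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      if (await probe()) {
        return attempt; // how many attempts it took
      }
    } catch {
      // A connection error usually means the node is not up yet; retry.
    }
    if (attempt < maxAttempts) {
      // Wait before the next attempt.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`node not ready after ${maxAttempts} attempts`);
}

// Example probe: a GET against a metrics URL (the URL is an assumption).
// const probe: Probe = async () =>
//   (await fetch("http://127.0.0.1:9615/metrics")).ok;
```

If the node never starts (for example because of a missing flag), every probe fails and the helper gives up with an error, which would match the kind of "Error fetching metrics" loop reported in this thread.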

@mrcnski
Contributor

mrcnski commented Dec 11, 2023

Thanks @pepoviola! Looks like the metrics error was caused by the nodes missing the insecure flag I added - oops!

I'm back to not seeing any errors anymore. I mean, I added a ridiculous line like alice: parachain 2000 block height is at least 1000 within 2 seconds, shouldn't zombienet exit with an error? I can't seem to make it fail. I'm still on version 1.3.84.

@pepoviola
Collaborator

Hey @mrcnski, great that the metrics error is gone now. Zombienet will continue working until the end of the last defined assertion and then produce a report in the console. Do you have more assertions besides that one? Can you send me the config and zndsl so I can reproduce?

Thanks!!

@mrcnski
Contributor

mrcnski commented Dec 11, 2023

@pepoviola For a really simple case I've changed 0001-parachains-smoke-test.zndsl to something crazy like

Description: Smoke Test
Network: ./0001-parachains-smoke-test.toml
Creds: config

alice: parachain 100 is registered within 2 seconds
alice: parachain 100 block height is at least 1000000 within 3 seconds

and I run with

zombienet --provider=native spawn zombienet_tests/smoke/0001-parachains-smoke-test.toml 

Zombienet doesn't fail on this. I checked the logs of Alice and she's still alive doing stuff.

@pepoviola
Collaborator

Hi @mrcnski, I can reproduce locally. On my end, changing the value produces this output:

image

Also, you are using the spawn command in your comment. That command only spawns the network and doesn't make any assertions. To run the test, you should run something like

zombienet --provider=native test zombienet_tests/smoke/0001-parachains-smoke-test.zndsl

Thanks!!

@mrcnski
Contributor

mrcnski commented Dec 13, 2023

Let's pretend this never happened.

image

@mrcnski mrcnski closed this as completed Dec 13, 2023