AllowedOutgoingLinksHostnames changes not always causing link status to change #762

Closed
hash-d opened this issue May 12, 2022 · 3 comments
@hash-d
Member

hash-d commented May 12, 2022

How to reproduce

TBD

The error can be reliably reproduced using the statusTestTable test on my fork at dh-policy-testing-2/test/integration/acceptance/custom/hello_policy/hostnames_test.go.

However, I have not yet been able to create a minimal reproducer.

Description

In a test where the list of allowed hostnames alternates between including and excluding the inter-router host, an existing link is expected to come up when the host is included and go down when it is not.

With the test mentioned above, this does not happen every time. Sometimes the system does not react to a changed policy. Both failure modes occur: links not going down when the host is removed, and links not coming up when it is added.

Notes

  • When checking the policies with get policies outgoinglink while the error is happening, the output reflects the installed policy (i.e., even if the link status did not change, the policy engine reports the correct value).
@fgiorgetti fgiorgetti self-assigned this May 12, 2022
@hash-d
Member Author

hash-d commented May 12, 2022

I have tried to reproduce the issue with a bash script, but I was not able to. You can check the code here:

https://gist.github.com/hash-d/151fbc9fc7df56c76c499e88b439052f

I can't tell what's special about the test where it is failing, as the workings are the same: change the policy, then run link status and wait for the status to flip.

@hash-d
Member Author

hash-d commented May 13, 2022

When I discussed the issue with the developer, he noticed that the test was giving access only to the inter-router host, and not to the edge host as well. Adding the edge host to the policy made the test pass: the link went up and down as expected, depending on the policy that was defined.

However, that does not explain why, in the previous setup, the link would sometimes be up, even when the allowedOutgoingLinksHostnames clearly matched neither the inter-router nor the edge host.
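One way to read the observation above is that a link is only expected to come up when every host it uses (inter-router and edge) matches some entry in the list. The sketch below illustrates that reading only. It assumes entries are evaluated as regular expressions, and allHostsAllowed and the hostnames are hypothetical, not taken from Skupper or from the test.

```go
package main

import (
	"fmt"
	"regexp"
)

// allHostsAllowed reports whether every host matches at least one pattern.
// An invalid pattern is reported as an error.
func allHostsAllowed(patterns, hosts []string) (bool, error) {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return false, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		compiled = append(compiled, re)
	}
	for _, h := range hosts {
		matched := false
		for _, re := range compiled {
			if re.MatchString(h) {
				matched = true
				break
			}
		}
		if !matched {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Hypothetical hostnames for illustration.
	hosts := []string{"inter-router.example.com", "edge.example.com"}
	onlyInterRouter := []string{`^inter-router\.example\.com$`}
	both := []string{`^inter-router\.example\.com$`, `^edge\.example\.com$`}

	ok1, _ := allHostsAllowed(onlyInterRouter, hosts)
	ok2, _ := allHostsAllowed(both, hosts)
	fmt.Println(ok1, ok2)
}
```

Under this reading, a policy listing only the inter-router host would keep the link down, which is why a link that stays up in that setup is still unexplained.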

However, I'm still unable to reproduce that situation outside of the Go test code.

@hash-d
Member Author

hash-d commented Jul 6, 2022

I have retested this as follows:

  • Check out my fork/branch at commit dc39730, which was the one active at the time of opening this ticket
  • Change the code to skip the createTestTable (to save time) and activate the statusTestTable (where the test was failing)
  • Run the test against the most recent skupper binary and images

With that, the following tests failed:

  • first-dot-left-anchor
  • replaced-by-soft-dots
  • lots-of-dots
  • anchored
  • hardify-dots

These are all link-is-up tests, and as such they were expected to fail. The reason the test was not originally working is that it was getting only the inter-router host, and not the edge host as well. That is how the test works at this point in the commit history, so it is effectively putting the wrong hostname in the policy.

All of the link-is-down type of tests worked fine. The only two link-is-up tests that did not fail as expected were as follows:

  • same
    • On a second run, with ENV_POST_POLICY_CHANGE_SLEEP set to 10 seconds, the test failed as expected. At that point during test development, there were no GET checks, so the first run simply reported the current link status before the policy system was able to detect the change and bring the link down.
  • last-dot-right-anchor
    • This was expected. The way this test works is that it takes the hostnames from a secret and tries different transformations on them to generate the different 'hostnames' being tested. In this specific case, the hostname is formed by the characters to the right of the last dot. In an FQDN, that string would match both hostnames.

With that, I conclude that I could not reproduce a situation where the link would be up when it was truly expected to be down, or the other way around. Whether that's because of changes to the code fixing the original issue, I cannot tell at this point. I believe so, but I could not point to any specific commits that fixed the issue.

I have checked the logs for the last test I ran before opening this ticket, and there were failures on link-is-down type of tests, which was my concern at the time. Those are not happening at the moment, so I'm closing this ticket.

@hash-d hash-d closed this as completed Jul 6, 2022