Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lower library interface timeout for remote config tests #2805

Merged
merged 8 commits into from
Jul 29, 2024

Conversation

rochdev
Copy link
Member

@rochdev rochdev commented Jul 23, 2024

Motivation

Waiting 100 seconds seems like a lot, especially since it should be possible to get the remote config from client libraries in milliseconds.

Changes

Lower library interface timeout for remote config tests.

Workflow

  1. ⚠️ Create your PR as draft ⚠️
  2. Work on you PR until the CI passes (if something not related to your task is failing, you can ignore it)
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from RFC owner. We're working on refining the codeowners file quickly.
    • Framework is modified, or non obvious usage of it -> get a review from R&P team

🚀 Once your PR is reviewed, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • If PR title starts with [<language>], double-check that only <language> is impacted by the change
  • No system-tests internal is modified. Otherwise, I have the approval from R&P team
  • CI is green, or failing jobs are not related to this change (and you are 100% sure about this statement)
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added (or removed)?

@rochdev
Copy link
Member Author

rochdev commented Jul 24, 2024

I'm not sure what this is actually waiting for. It seems that waiting for 20s doesn't work with a poll interval of 2s, but it works with a polling interval of 1s, and the previous values were timeout of 100s with a polling interval of 5s. Does the timeout need to be 20x the polling interval? Is this checking for 20 remote config requests? It's very unclear to me why it needs to wait for so long given that we can configure the polling interval all the way down to 1ms if needed.

cc @cbeauchesne

@rochdev rochdev changed the title lower library interface timeout lower library interface timeout for remote config tests Jul 24, 2024
@rochdev
Copy link
Member Author

rochdev commented Jul 24, 2024

Failures are on tests that are not affected by this change.

@rochdev rochdev marked this pull request as ready for review July 24, 2024 16:04
@rochdev rochdev requested a review from a team as a code owner July 24, 2024 16:04
Copy link
Collaborator

@cbeauchesne cbeauchesne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you compute the time gain with the change of DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS ?

Because it's far away from what we have in real condition, and the purpose of end-to-end is to be as close as possible from real conditions. We can deviate a little bit from this ideal line, but it must be worthy.

I've updated your branch to fix failures

Copy link
Collaborator

@cbeauchesne cbeauchesne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, the CI seems happy.

But I'd really like to know what is the gain of DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS=0.25 before approving.

The change on library_interface_timeout is totally legit IMO.

@rochdev
Copy link
Member Author

rochdev commented Jul 25, 2024

Could you compute the time gain with the change of DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS ?

Time gain is pretty easy to calculate as it's literally the timeout, so it goes from 100 seconds to 5 seconds.

the purpose of end-to-end is to be as close as possible from real conditions

It's still real condition, just faster. The fact that we're waiting 5 seconds every time in real environments is only so that we don't overload the backend, but in an isolated test there is no reason not to do this faster.

@cbeauchesne
Copy link
Collaborator

Time gain is pretty easy to calculate as it's literally the timeout, so it goes from 100 seconds to 5 seconds.

I was speaking about DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS, not library_interface_timeout

@rochdev
Copy link
Member Author

rochdev commented Jul 25, 2024

I was speaking about DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS, not library_interface_timeout

Oh, well without lowering the poll interval, the remote config won't have been fetched before the timeout. As to why it needs to be ~20x lower than the timeout, I have no idea and the commit history was lost for that file so I don't know who could answer it. At the end of the day, the ratio is the same as before and everything is passing 🤷‍♂️

Copy link
Member

@simon-id simon-id left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but not I'm not part of system-tests-core team

@pawelchcki
Copy link
Collaborator

Charles has been mostly OK with this change and Simon approved it.
I can merge it - but.... If nightly tests blow up - I will come back here ;)

@pawelchcki pawelchcki self-requested a review July 29, 2024 14:34
@rochdev rochdev merged commit 0a85fb0 into main Jul 29, 2024
311 of 328 checks passed
@rochdev rochdev deleted the library-interface-timeout-15 branch July 29, 2024 17:26
@cbeauchesne
Copy link
Collaborator

cbeauchesne commented Aug 5, 2024

I'm not ok with this merge. I've explicitly asked here and here a vision of the time gained with this change.

It's still real condition, just faster

So it's not the real condition. As explained many times, being as close as possible from real conditions is a golden rules. We can slighlty deviate from it, but only with a good reason. Faster CI time is definitly a good reason, but iif the gain is consequent. So far, we don't know if the gain in consequent.

=> I'm rverting the change on DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS. If you want to change it, provide visibility on the gain.

cbeauchesne added a commit that referenced this pull request Aug 5, 2024
cbeauchesne added a commit that referenced this pull request Aug 19, 2024
Fix interaction between interface timeout and RC POLL INTERVAL + Partial revert #2805
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants