fix(iroh-net): Fix a hot-loop when the probes time out #2699

Merged: 1 commit, Sep 5, 2024

@flub flub commented Sep 5, 2024

Description

When all the probes time out, the reportgen actor goes into a hot-loop,
because the expired timer fires again on every pass through the actor loop.
We slow this down by resetting the timer to a point in the future.
This way the actor loop can finish normally from the probes without
entering a hot-loop. The exact timeout value doesn't matter much:
normally it should not fire twice, as the actor should be finished by
the time another PROBES_TIMEOUT expires.
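
To illustrate the idea, here is a minimal standalone sketch, not the actual reportgen code. It models the actor loop with a std channel and `recv_timeout` in place of the real async loop. `Message::Finished`, `run_actor`, `count_timeouts`, the `reset_timer` flag, the wakeup cap and the concrete durations are all hypothetical names and values chosen for the example.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the messages the actor receives.
enum Message {
    /// All outstanding work has wound down; the loop may exit.
    Finished,
}

/// Runs a simplified actor loop and returns how often the timeout branch fired.
/// `max_wakeups` caps the loop so the unfixed variant terminates in this sketch.
fn run_actor(
    rx: mpsc::Receiver<Message>,
    probes_timeout: Duration,
    reset_timer: bool,
    max_wakeups: usize,
) -> usize {
    let mut deadline = Instant::now() + probes_timeout;
    let mut timeouts = 0;
    while timeouts < max_wakeups {
        // Wait for a message or the deadline, whichever comes first.
        // Once the deadline has passed, `remaining` is zero and the wait
        // returns immediately.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(Message::Finished) => break,
            Err(RecvTimeoutError::Timeout) => {
                timeouts += 1; // a real actor would log its warning here
                if reset_timer {
                    // The fix: move the deadline into the future again so the
                    // next iteration blocks instead of spinning.
                    deadline = Instant::now() + probes_timeout;
                }
            }
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    timeouts
}

/// Runs the loop while a second thread reports completion after 100 ms.
fn count_timeouts(reset_timer: bool) -> usize {
    let (tx, rx) = mpsc::channel();
    let finisher = thread::spawn(move || {
        thread::sleep(Duration::from_millis(100));
        // The receiver may already be gone if the loop hit its cap.
        let _ = tx.send(Message::Finished);
    });
    let n = run_actor(rx, Duration::from_millis(30), reset_timer, 1_000);
    finisher.join().unwrap();
    n
}
```

In this sketch, `count_timeouts(true)` should see only a handful of timeouts, roughly one per timeout period, while `count_timeouts(false)` keeps spinning on the expired deadline until it runs into the cap.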

Breaking Changes

None

Notes & open questions

Fixes #2684.

I've never been very happy with the main loop of the reportgen actor.
There is too much manual state-tracking, including the way
self.outstanding_tasks is structured. The existence of
Message::ProbeWouldHelp is also not nice; the actor should be able
to abort any probes on its own when they are no longer needed. Maybe
I should sit down again sometime and think more carefully about how to
structure this. Nevertheless, this is a good fix for a real problem.

I'm not sure it's possible to write tests for this in a reasonable way
without manipulating the state entirely artificially. That is another
reason this logic would be better expressed with more help from the
type system.

Change checklist

  • [x] Self-review.
  • [ ] Documentation updates following the style guide, if relevant.
  • [x] Tests if relevant.
  • [ ] All breaking changes documented.

github-actions bot commented Sep 5, 2024

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2699/docs/iroh/

Last updated: 2024-09-05T11:02:54Z

@flub flub added this pull request to the merge queue Sep 5, 2024
Merged via the queue into main with commit 874030a Sep 5, 2024
28 checks passed
@flub flub deleted the flub/probe-timeout-flood branch September 5, 2024 12:35
Successfully merging this pull request may close these issues.

weird occurrence: WARN flood at startup: probes timed out