Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weekly cleanups #4284

Merged
merged 8 commits into from
Jan 4, 2021
Merged

Weekly cleanups #4284

merged 8 commits into from
Jan 4, 2021

Conversation

cdecker
Copy link
Member

@cdecker cdecker commented Dec 18, 2020

Just the usual bunch of drive-by fixes, mostly while trying to get Cirrus-CI to pass.

Changelog-None

We sometimes have very specific sequences of tx broadcasts and blocks
being generated to confirm them. If the confirmation is missed the
test can completely get out of sync. Make debugging this easier by
logging what we confirmed.
Useful if we want to debug a bit better
Both my machine and apparently the CI tester machines regularly run
into issues with load on the system, causing timeouts (and
unresponsiveness). The throttler throttles the speed with which new
instances of c-lightning get started to avoid overloading. Since the
plugin used for parallelism when testing spawns multiple processes we
need to lock on the fs. Since we have that file open already, we'll
also write a couple of performance metics to it.
I stumbled over this in a test run and it seems benign.
We were getting bad gossip because some nodes discarded the channel
announcement for being in the future. This is because the node was, at
that time, below the confirmation height. It'd then discard the
followup messages because not preceded by an announcement, and getting
upset about that.
This was causing the following error

```
Exception in thread Thread-553:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/cirrus-ci-build/contrib/pyln-testing/pyln/testing/utils.py", line 232, in tail
    self.err_logs.append(line.rstrip().decode('UTF-8', 'replace')).rstrip()
AttributeError: 'NoneType' object has no attribute 'rstrip'

[gw5] [ 33%] FAILED tests/test_misc.py::test_bitcoin_failure
```

Notice the second call to `.rstrip()` on the return value of `.append()`
The CI regularly trips up on this very taxing test (100 nodes) so only
run it if we have the required horsepower.
Copy link
Collaborator

@darosior darosior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a question about the throttler

delay = time.time() - start_time
with open("/tmp/ltest-throttler.csv", "a") as f:
f.write("{}, {}, {}, {}\n".format(time.time(), self.load(), self.target, delay))
self.current_load = 100 # Back off slightly to avoid triggering right away
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read this comment a few times, but i still don't get it 😅 . What would a < 75 load would trigger ? And why setting it to 100 after getting each node would avoid the trigger ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that if we leave the current_load at 75 percent a followup call, e.g., another lightningd wanting to start, could end up seeing 75% and decide to immediately schedule its startup (maybe the first one hasn't had time to start up and push the load up yet), thus we might end up clearing the queue despite only the first one getting the all-clear.

Setting this to 100% means the ewma smoothing needs a couple of rounds before allowing the next one to go through, avoiding the flooding scenario above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, i had it the other way around, thanks!

Copy link
Collaborator

@darosior darosior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 336e972

@cdecker cdecker merged commit 5b3af00 into ElementsProject:master Jan 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants