Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always print connectivity report #3677

Merged
merged 1 commit into from
Mar 13, 2024
Merged

Conversation

alexggh
Copy link
Contributor

@alexggh alexggh commented Mar 13, 2024

This is printed every 10 minutes, I see no reason why it shouldn't be in all the logs, it would give us valuable information about what is going on with node connectivity when validators come-back to us to report issues.

This is printed every 10 minutes, I see no reason why it shouldn't be in
all the logs, it would give us valuable information about what is going
on with node connectivity when validators come-back to us to report
issues.

Signed-off-by: Alexandru Gheorghe <[email protected]>
@sandreim
Copy link
Contributor

This can get pretty big if there are many unconnected authorities. Are we interested in exactly these? Because otherwise the metrics are always available per protocol - polkadot_parachain_peer_count

@eskimor
Copy link
Member

eskimor commented Mar 13, 2024

This can get pretty big if there are many unconnected authorities. Are we interested in exactly these? Because otherwise the metrics are always available per protocol - polkadot_parachain_peer_count

Right, but then there is also some network issue which would justify a bit of verbosity every 10 minutes. If we assume that these logs would be 1Meg in the worst case, then it would still take us 7 days to fill a GB. Sounds sensible for today's hardware.

@alexggh alexggh added the R0-silent Changes should not be mentioned in any release notes label Mar 13, 2024
@sandreim
Copy link
Contributor

This can get pretty big if there are many unconnected authorities. Are we interested in exactly these? Because otherwise the metrics are always available per protocol - polkadot_parachain_peer_count

Right, but then there is also some network issue which would justify a bit of verbosity every 10 minutes. If we assume that these logs would be 1Meg in the worst case, then it would still take us 7 days to fill a GB. Sounds sensible for today's hardware.

Your estimations might be right, but my question is mostly if we do this specifically to know the unconnected authorities or we are interested mainly in the connectivity ratio.

Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end it shouldn't hurt.

@alexggh
Copy link
Contributor Author

alexggh commented Mar 13, 2024

This can get pretty big if there are many unconnected authorities. Are we interested in exactly these? Because otherwise the metrics are always available per protocol

Those metrics are good to tell you something went wrong, but they will fail to tell which nodes are not connecting or some other stuff.

The flow I'm actually trying to optimise here, is people coming to us observing high level stuff is not working and providing us the log files they have, but there is literally nothing in there to tell you what went wrong. So at minimum we need to understand how well connected that node was by default.

Right, but then there is also some network issue which would justify a bit of verbosity every 10 minutes. If we assume that these logs would be 1Meg in the worst case, then it would still take us 7 days to fill a GB. Sounds sensible for today's hardware.

My thoughts as well given how rarely we print this, I don't thing there is any danger there.

@eskimor eskimor added this pull request to the merge queue Mar 13, 2024
@alexggh
Copy link
Contributor Author

alexggh commented Mar 13, 2024

Your estimations might be right, but my question is mostly if we do this specifically to know the unconnected authorities or we are interested mainly in the connectivity ratio.

The unnconnected list proved useful here #3314 (comment), so yeah I think we want it.

@alexggh
Copy link
Contributor Author

alexggh commented Mar 13, 2024

In the end it shouldn't hurt.

famous last words :D

Merged via the queue into master with commit 878b5dd Mar 13, 2024
133 of 136 checks passed
@eskimor eskimor deleted the alexaggh/dump_connectivity branch March 13, 2024 14:52
@burdges
Copy link

burdges commented Mar 13, 2024

Always possible to make a runlength encoded bitfield and then 7-bit or hex encode that, if youre worried about size.

ordian added a commit that referenced this pull request Mar 16, 2024
* master: (65 commits)
  collator protocol changes for elastic scaling (validator side) (#3302)
  Contracts use polkavm workspace deps (#3715)
  Add elastic scaling support in ParaInherent BenchBuilder  (#3690)
  Removes `as [disambiguation_path]` from `derive_impl` usage (#3652)
  fix(paseo-spec): New Paseo Bootnodes (#3674)
  Improve Penpal runtime + emulated tests (#3543)
  Staking ledger bonding fixes (#3639)
  DescribeAllTerminal for HashedDescription (#3349)
  Increase timeout for assertions (#3680)
  Add subsystems regression tests to CI (#3527)
  Always print connectivity report (#3677)
  Revert "FRAME: Create `TransactionExtension` as a replacement for `SignedExtension` (#2280)" (#3665)
  authority-discovery: Add log for debugging DHT authority records (#3668)
  Construct Runtime v2  (#1378)
  Support for `keyring` in runtimes (#2044)
  Add api-name in `cannot query the runtime API version` warning (#3653)
  Add a PolkaVM-based executor (#3458)
  Adds default config for assets pallet (#3637)
  Bump handlebars from 4.3.7 to 5.1.0 (#3248)
  [Collator Selection] Fix weight refund for `set_candidacy_bond` (#3643)
  ...
dharjeezy pushed a commit to dharjeezy/polkadot-sdk that referenced this pull request Mar 24, 2024
This is printed every 10 minutes, I see no reason why it shouldn't be in
all the logs, it would give us valuable information about what is going
on with node connectivity when validators come-back to us to report
issues.

Signed-off-by: Alexandru Gheorghe <[email protected]>
@bkchr
Copy link
Member

bkchr commented Mar 30, 2024

Debug output is not to be printed using info logs. If people come to you complaining, you should either tell them which cli flags to change or come up for example with a dedicated RPC endpoint that generates debug information.

This said, the change in this pr should be reverted.

@bkchr
Copy link
Member

bkchr commented Mar 30, 2024

#3913

@alexggh alexggh mentioned this pull request Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
R0-silent Changes should not be mentioned in any release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants