fix(share/eds): print error for stuck register shard #2516

Merged
merged 3 commits into from
Aug 2, 2023

Conversation

walldiss
Member

Sometimes registering a shard can take longer than the provided context timeout. This PR logs an error if shard registration is stuck.

@codecov-commenter

Codecov Report

Merging #2516 (1ffe024) into main (e354bb5) will decrease coverage by 0.28%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #2516      +/-   ##
==========================================
- Coverage   52.60%   52.33%   -0.28%     
==========================================
  Files         156      156              
  Lines        9995    10013      +18     
==========================================
- Hits         5258     5240      -18     
- Misses       4272     4305      +33     
- Partials      465      468       +3     
| Files Changed | Coverage Δ |
|---|---|
| share/eds/store.go | 56.58% <0.00%> (-3.88%) ⬇️ |

... and 7 files with indirect coverage changes

Member

@Wondertan left a comment


Is this still the case after all the hangups are removed?

@walldiss
Member Author

The latest tests showed no hangups. Still, we don't want to lose a hangup event, or any of the related info, if one does happen.

@Wondertan
Member

Agreed, yet it sounds like this would be better as a metric.

Member

@renaynay left a comment


I agree with @Wondertan; this is better recorded as a metric.

@walldiss
Member Author

A metric would need the logic implemented in this PR anyway. There are 3 cases of hangups, so I would rather have specific log lines for each. Also, metrics are optional, but it will be important to be able to find these errors in the logs in case this problem happens again.

@walldiss walldiss added the kind:fix Attached to bug-fixing PRs label Aug 1, 2023
@walldiss walldiss requested a review from renaynay August 1, 2023 12:30
Collaborator

@distractedm1nd left a comment


Have we discussed making Put async?

@walldiss
Member Author

walldiss commented Aug 2, 2023

Async Put would just kick the can down the road.

@walldiss walldiss enabled auto-merge (squash) August 2, 2023 10:34
@walldiss walldiss merged commit 165f81a into celestiaorg:main Aug 2, 2023
11 of 13 checks passed
walldiss added a commit to walldiss/celestia-node that referenced this pull request Aug 4, 2023
Sometimes registering a shard can take longer than the provided context timeout.
This PR logs an error if shard registration is stuck.
Labels
area:shares Shares and samples area:storage kind:fix Attached to bug-fixing PRs
Projects
None yet

5 participants