Adding Health endpoint #836

Merged
merged 22 commits into from
Dec 9, 2024

Conversation

otherview
Member

Description

A health check service that lives inside the node as an API endpoint.

I think it's a good illustrative start for a PR. A few technical choices were made that I think are worth discussing, such as:

  • Should this endpoint live in the Admin API?
  • Should this be a singleton service?
  • Is the naming on point?

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • New and existing E2E tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have not added any vulnerable dependencies to my code

@codecov-commenter

codecov-commenter commented Oct 29, 2024

Codecov Report

Attention: Patch coverage is 53.33333% with 77 lines in your changes missing coverage. Please review.

Project coverage is 60.65%. Comparing base (49d9704) to head (05e7dbf).
Report is 10 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cmd/thor/main.go | 0.00% | 31 Missing ⚠️ |
| api/admin/health/health_api.go | 63.15% | 10 Missing and 4 partials ⚠️ |
| api/admin/loglevel/log_level.go | 69.76% | 12 Missing and 1 partial ⚠️ |
| api/admin/admin.go | 0.00% | 10 Missing ⚠️ |
| api/admin/health/health.go | 91.89% | 1 Missing and 2 partials ⚠️ |
| api/admin_server.go | 0.00% | 3 Missing ⚠️ |
| comm/communicator.go | 0.00% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #836      +/-   ##
==========================================
+ Coverage   60.62%   60.65%   +0.03%     
==========================================
  Files         215      218       +3     
  Lines       23099    23229     +130     
==========================================
+ Hits        14003    14089      +86     
- Misses       7947     7984      +37     
- Partials     1149     1156       +7     


@otherview otherview marked this pull request as ready for review October 29, 2024 10:49
@otherview otherview requested a review from a team as a code owner October 29, 2024 10:49
@leszek-vechain
Contributor

Do we have any tests for this new endpoint?

@otherview otherview force-pushed the pedro/health_endpoint branch from b124448 to 710e6c6 Compare October 30, 2024 13:10
@otherview otherview force-pushed the pedro/health_endpoint branch from 710e6c6 to ed0097c Compare October 30, 2024 14:41
Member Author


The admin endpoints were bundled into this package and reformatted to follow the same pattern as the other endpoints.

Member Author


Potentially, this could live in the init functions of thor and thor-solo?

@darrenvechain
Member

darrenvechain commented Nov 4, 2024

I got this when syncing a new node. Is it expected?

Edit: I just saw the previous comment. IMO it is healthy if it's syncing like this.

curl http://localhost:2113/admin/health
{"healthy":false,"blockIngestion":{"bestBlock":"0x00014fe5a9e48a02268655e8206bb4cbcc95a1ae0d1f5f4360f697f02077c727","bestBlockIngestionTimestamp":"2024-11-04T10:49:16.06031Z"},"chainSync":false}
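
For readers who want to consume this response programmatically, here is a minimal Go sketch that decodes the body shown above. Only the JSON keys come from the response; the type names (`HealthStatus`, `BlockIngestion`) and the helper `parseHealth` are hypothetical and are not taken from the thor codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// BlockIngestion mirrors the "blockIngestion" object in the response.
// The Go names are assumptions; only the JSON keys are from the output above.
type BlockIngestion struct {
	BestBlock                   string    `json:"bestBlock"`
	BestBlockIngestionTimestamp time.Time `json:"bestBlockIngestionTimestamp"`
}

// HealthStatus mirrors the top-level response object.
type HealthStatus struct {
	Healthy        bool           `json:"healthy"`
	BlockIngestion BlockIngestion `json:"blockIngestion"`
	ChainSync      bool           `json:"chainSync"`
}

// parseHealth decodes a raw response body into a HealthStatus.
func parseHealth(body []byte) (HealthStatus, error) {
	var s HealthStatus
	err := json.Unmarshal(body, &s)
	return s, err
}

func main() {
	// Body copied from the curl output above.
	body := []byte(`{"healthy":false,"blockIngestion":{"bestBlock":"0x00014fe5a9e48a02268655e8206bb4cbcc95a1ae0d1f5f4360f697f02077c727","bestBlockIngestionTimestamp":"2024-11-04T10:49:16.06031Z"},"chainSync":false}`)

	s, err := parseHealth(body)
	if err != nil {
		panic(err)
	}
	fmt.Println("healthy:", s.Healthy, "chainSync:", s.ChainSync)
	fmt.Println("last ingestion:", s.BlockIngestion.BestBlockIngestionTimestamp.Format(time.RFC3339))
}
```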

@darrenvechain
Member

darrenvechain commented Nov 4, 2024

Is the response always 200? From my (biased) experience with Spring Actuator, it returns 503 if some aspect of the application is unhealthy. This makes health checks easier for @kgapos, as he doesn't have to fetch the response body, parse it, and check it. He can just check the response code, which AWS supports natively.

FYI, we can still return a response body with a 503, so clients can grab the response and do what they want with it.

Edit: nvm: looks like it has 503, wasn't seeing it in testing locally

@kgapos
Member

kgapos commented Nov 4, 2024

Yeah, @darrenvechain is right: it's going to be useless as far as the AWS ALB is concerned if it doesn't return a 5XX status code when the node is unhealthy. I understand this is unfortunate and counterintuitive, but unless you build it like that, I have to implement some sort of wrapper, which is the exact thing we're trying to replace. Tagging @otherview for visibility.

Note that I mentioned this constraint in the related issue.

@otherview
Member Author

> I got this when syncing a new node. Is it expected?
> Edit: I just saw the previous comment. IMO it is healthy if it's syncing like this.

I understand what you mean; syncing to the latest block is regular, expected behaviour.
Perhaps another name would fit better ("ready" comes to mind), but the idea is that a node is not healthy enough to serve node operations while it is still in the sync period.

This is the standard I've seen in other nodes, as it helps node operators to know when the node is 100% ready to process blockchain operations.

@otherview
Member Author

> Yeah, @darrenvechain is right: it's going to be useless as far as the AWS ALB is concerned if it doesn't return a 5XX status code when the node is unhealthy. I understand this is unfortunate and counterintuitive, but unless you build it like that, I have to implement some sort of wrapper, which is the exact thing we're trying to replace. Tagging @otherview for visibility.

Yeah I got you covered :) It's returning a 503 when "healthy":false

@otherview otherview force-pushed the pedro/health_endpoint branch from 885ddd5 to 17c2ce9 Compare November 4, 2024 18:34
@libotony
Member

libotony commented Nov 5, 2024

I would suggest leveraging only the best block's timestamp in this function.

health/health.go Outdated
Comment on lines 45 to 51
func New(repo *chain.Repository, p2p *comm.Communicator, timeBetweenBlocks time.Duration) *Health {
	return &Health{
		repo:              repo,
		timeBetweenBlocks: timeBetweenBlocks + delayBuffer,
		p2p:               p2p,
	}
}
Member Author


I refactored the health service to accept components and work in a pull fashion.
I think it looks better, thanks guys 🙏

darrenvechain previously approved these changes Nov 21, 2024
darrenvechain previously approved these changes Nov 21, 2024
darrenvechain previously approved these changes Nov 29, 2024
@otherview otherview requested a review from libotony December 2, 2024 11:05
libotony previously approved these changes Dec 6, 2024
@otherview otherview force-pushed the pedro/health_endpoint branch from e14ff2f to 05e7dbf Compare December 9, 2024 10:44
@otherview otherview merged commit 7579db4 into master Dec 9, 2024
15 checks passed
@otherview otherview deleted the pedro/health_endpoint branch December 9, 2024 10:55
otherview added a commit that referenced this pull request Dec 9, 2024
* fix(documentation): use absolute links in markdown (#889)

* Add benchmark test to node block process (#892)

* Add benchmark test to node block process

* added file-based storage

* use tempdir

* update dependency go-ethereum (#895)

* chore: update API metrics bucket and endpoint names (#893)

* chore: update API metrics bucket and endpoint names

* fix: typo & tests

* fix: lint

* chore: add websocket total counter

* fix: txs endpoints names & ws subject

* fix: unit tests

* chore: standardise naming convention

* chore: add websocke duration & http code

* chore: add websocke duration & http code

* fix: lint issues

* fix: sync issues with metrics

* chore: update websocket durations bucket

* fix: PR comments - use sync.Once

* chore: update builtin generation (#896)

* chore: update builtin generation

* fix: update GHA

* getreceipts metrics + lint (#902)

* chore: add flag to enable/disable deprecated APIs (#897)

* chore: add flag to enable/disable deprecated APIs

* chore: update for PR comments

* chore: update for PR comments

* fix: update e2e commit sha

* fix: update e2e commit sha

* fix: update flag name

* fix: solo start flags (#906)

* chore: make thorclient configurable + fix type error (#908)

* chore: make thorclient configurable

* fix: subscriptions block type

* fix: compile errors

* fix: remove test with lint error

* add 'raw' query parameter to the blocks (#899)

* add 'raw' query parameter to the blocks

* summary -> summary.Header

Co-authored-by: libotony <[email protected]>

* change variable name

* make expanded and raw mutually exclusive

* add unit tests

* fix linting

---------

Co-authored-by: libotony <[email protected]>

* Adding Health endpoint (#836)

* Adding Health endpoint

* pr comments + 503 if not healthy

* refactored admin server and api + health endpoint tests

* fix health condition

* fix admin routing

* added comments + changed from ChainSync to ChainBootstrapStatus

* Adding healthcheck for solo mode

* adding solo + tests

* fix log_level handler funcs

* refactor health package + add p2p count

* remove solo methods

* moving health service to api pkg

* added defaults + api health query

* pr comments

* pr comments

* pr comments

* Update cmd/thor/main.go

* Darren/admin api log toggler (#877)

* Adding Health endpoint

* pr comments + 503 if not healthy

* refactored admin server and api + health endpoint tests

* fix health condition

* fix admin routing

* added comments + changed from ChainSync to ChainBootstrapStatus

* Adding healthcheck for solo mode

* adding solo + tests

* fix log_level handler funcs

* feat(admin): toggle api logs via admin API

* feat(admin): add license headers

* refactor health package + add p2p count

* remove solo methods

* moving health service to api pkg

* added defaults + api health query

* pr comments

* pr comments

---------

Co-authored-by: otherview <[email protected]>

* Darren/chore/backport metrics (#909)

* chore(muxdb): backport muxdb cache metrics

* chore(muxdb): backport muxdb cache metrics

* chore(metrics): backport disk IO

* chore(metrics): fix lint

* chore(chain): add repo cache metrics

* fix(chain): fix cache return value

* refactor(chain): cache hit miss

* chore(thor): update version (#912)

* chore(thor): update version

* chore(openapi): version

* feat(api/debug): support debug trace without blockId (#905)

* api/debug: support debug with txhash

Signed-off-by: jsvisa <[email protected]>

api/debug: blockId should use tx's instead

Signed-off-by: jsvisa <[email protected]>

fix tests

Signed-off-by: jsvisa <[email protected]>

* debug: add test

Signed-off-by: jsvisa <[email protected]>

* improve parseTarget

Signed-off-by: jsvisa <[email protected]>

* update doc

Signed-off-by: jsvisa <[email protected]>

* fix tests

Signed-off-by: jsvisa <[email protected]>

---------

Signed-off-by: jsvisa <[email protected]>
Co-authored-by: tony <[email protected]>

* version

---------

Signed-off-by: jsvisa <[email protected]>
Co-authored-by: Darren Kelly <[email protected]>
Co-authored-by: libotony <[email protected]>
Co-authored-by: YeahNotSewerSide <[email protected]>
Co-authored-by: Delweng <[email protected]>
7 participants