fix(server/v2/api/telemetry): enable global metrics #22571
📝 Walkthrough
The pull request introduces modifications to the telemetry server's initialization process.
b4303d5 to ff7807d
@tac0turtle your pull request is missing a changelog!
telemetry/metrics.go
Outdated
	return globalTelemetryEnabled
}

// SetTelemetryEnabled allows for the global telemetry enabled state to be set.
func EnableTelemetry() {
	globalTelemetryEnabled = true
}
Change potentially affects state.
Call sequence:
github.com/cosmos/cosmos-sdk/telemetry.IsTelemetryEnabled (telemetry/metrics.go:23)
github.com/cosmos/cosmos-sdk/telemetry.IncrCounter (telemetry/metrics.go:53)
(*github.com/cosmos/cosmos-sdk/baseapp.BaseApp).deliverTx (telemetry/metrics.go:761)
(*github.com/cosmos/cosmos-sdk/baseapp.BaseApp).internalFinalizeBlock (telemetry/metrics.go:747)
(*github.com/cosmos/cosmos-sdk/baseapp.BaseApp).FinalizeBlock (telemetry/metrics.go:902)
ACK, just some nits
Adding backport label because of simappv2 and sdk changes.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
server/v2/api/telemetry/server.go (1)
42-48: Enhance panic message for better debugging
While the validation and conditional invocation logic is correct, the panic message could be more descriptive.
Consider this improvement:
- panic("enableTelemetry must be provided")
+ panic("telemetry: enableTelemetry callback function must be provided")
server/v2/api/telemetry/metrics.go (1)
Line range hint 153-154: Enhance error message consistency and specificity.
Consider standardizing error messages for better debugging and consistency:
- return GatherResponse{}, errors.New("prometheus metrics are not enabled")
+ return GatherResponse{}, fmt.Errorf("prometheus metrics: not enabled")
- return GatherResponse{}, errors.New("non in-memory metrics sink does not support generic format")
+ return GatherResponse{}, fmt.Errorf("metrics sink: generic format not supported for non in-memory sink")
- return GatherResponse{}, fmt.Errorf("failed to gather in-memory metrics: %w", err)
+ return GatherResponse{}, fmt.Errorf("metrics gather: failed to collect in-memory metrics: %w", err)
Also applies to: 166-167, 182-183
telemetry/metrics.go (1)
27-30: Consider refactoring telemetry state management
The current design splits telemetry state management between New() and EnableTelemetry(), which could lead to confusion about the proper way to manage telemetry state. Consider adopting a more structured approach:
- Move state management into a dedicated type
- Make the initialization path clear and consistent
- Document the expected usage pattern
Consider this alternative design:
type TelemetryManager struct {
	enabled atomic.Bool
	mu      sync.RWMutex
	metrics *Metrics
}

func NewTelemetryManager(cfg Config) (*TelemetryManager, error) {
	tm := &TelemetryManager{}
	if cfg.Enabled {
		if err := tm.Enable(cfg); err != nil {
			return nil, err
		}
	}
	return tm, nil
}

func (tm *TelemetryManager) Enable(cfg Config) error {
	tm.mu.Lock()
	defer tm.mu.Unlock()
	if tm.enabled.Load() {
		return nil
	}
	metrics, err := initializeMetrics(cfg)
	if err != nil {
		return err
	}
	tm.metrics = metrics
	tm.enabled.Store(true)
	return nil
}
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
📒 Files selected for processing (4)
- server/v2/api/telemetry/metrics.go (1 hunks)
- server/v2/api/telemetry/server.go (2 hunks)
- simapp/v2/simdv2/cmd/commands.go (2 hunks)
- telemetry/metrics.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
server/v2/api/telemetry/metrics.go (1)
Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
server/v2/api/telemetry/server.go (1)
Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
simapp/v2/simdv2/cmd/commands.go (1)
Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
telemetry/metrics.go (1)
Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
🔇 Additional comments (5)
server/v2/api/telemetry/server.go (2)
31-31: LGTM: Function signature change aligns with PR objectives
The addition of the enableTelemetry func() parameter is a clean way to implement the callback pattern and avoid unnecessary imports.
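The callback pattern praised in this comment can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical and simplified for illustration: `Config`, `Server`, `NewServer`, `globalEnabled` and `enableGlobal` are stand-ins, not the actual cosmos-sdk types, and the nil check returns an error (as a later review comment proposes) rather than panicking as the PR under review does.

```go
package main

import (
	"errors"
	"fmt"
)

// globalEnabled stands in for a package-level flag owned by another package
// (hypothetical; in a real layout it would live in a separate package).
var globalEnabled bool

// enableGlobal is the hypothetical hook the caller passes in.
func enableGlobal() { globalEnabled = true }

// Config is a hypothetical, trimmed-down telemetry configuration.
type Config struct{ Enable bool }

// Server is a hypothetical stand-in for a telemetry server.
type Server struct{ cfg Config }

// NewServer receives the enable hook as a plain func(), so the server code
// only depends on a function value, not on the package that owns the flag.
func NewServer(cfg Config, enableTelemetry func()) (*Server, error) {
	if enableTelemetry == nil {
		return nil, errors.New("enableTelemetry callback cannot be nil")
	}
	if cfg.Enable {
		// Only flip the global flag when telemetry is configured on.
		enableTelemetry()
	}
	return &Server{cfg: cfg}, nil
}

func main() {
	_, err := NewServer(Config{Enable: true}, enableGlobal)
	fmt.Println(globalEnabled, err)
}
```

The point of the design is that the dependency direction is inverted: the caller that already imports both packages wires them together, and the server only sees a `func()`.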
Line range hint 50-54: Verify metrics initialization behavior
The metrics initialization happens after the telemetry enabling. Let's verify that NewMetrics properly handles the telemetry state.
✅ Verification successful
Metrics initialization properly handles telemetry state
The NewMetrics implementation in server/v2/api/telemetry/metrics.go correctly handles the telemetry state by:
- Checking the Enable flag in the config
- Returning nil, nil when telemetry is disabled
- Only proceeding with metrics initialization when telemetry is enabled
The implementation is safe and behaves as expected with respect to the telemetry state.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for NewMetrics implementation to verify state handling
ast-grep --pattern 'func NewMetrics($config $_) ($_, $_) {
$$$
}'
Length of output: 69
Script:
#!/bin/bash
# Let's try a broader search for NewMetrics function
rg "func NewMetrics" -A 10
# Also search for any metrics-related type definitions and initialization
rg "type.*Metrics" -A 5
# Search for metrics configuration handling
rg "metrics.*config" -A 5
Length of output: 14565
server/v2/api/telemetry/metrics.go (1)
62-64: LGTM! Early return pattern improves control flow.
The early return when metrics are disabled is a good practice that prevents unnecessary resource allocation. However, let's verify the impact on existing callers.
✅ Verification successful
Let me check how the callers handle the nil return value.
Let me check the server implementation to see how metrics are used after initialization.
Early return pattern is safe and correctly handled
The nil return from NewMetrics is properly handled in the codebase:
- The server stores metrics in the Server.metrics field, which can be nil
- The metrics handler is only registered when config.Enable is true
- The server's Start method has an early return when telemetry is disabled
- No unsafe nil pointer dereferences are possible as the metrics handler is not registered when disabled
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for NewMetrics calls to ensure callers handle nil return
rg -A 3 "NewMetrics\("
Length of output: 999
Script:
#!/bin/bash
# Check how the server handles nil metrics
ast-grep --pattern 'metrics, err := NewMetrics($$$)
$$$
if metrics != nil {
$$$
}'
Length of output: 94
Script:
#!/bin/bash
# Get the server implementation to see metrics usage
cat server/v2/api/telemetry/server.go
Length of output: 3223
simapp/v2/simdv2/cmd/commands.go (2)
28-28: LGTM! Import added correctly.
The new import for telemetry is properly aliased and follows Go conventions.
121-124: LGTM! Telemetry initialization updated correctly.
The addition of the sdktelemetry.EnableTelemetry callback aligns with the PR objective of enabling metrics within the v2 telemetry framework.
Let's verify the telemetry initialization:
✅ Verification successful
Telemetry initialization is correctly implemented
The telemetry initialization in the code is properly done:
- The telemetry server is initialized with the correct parameters:
  - Global configuration from deps.GlobalConfig
  - Logger instance
  - EnableTelemetry callback function from sdktelemetry package
- The telemetry package provides proper metrics collection and measurement capabilities:
  - Module-specific measurements
  - Transaction metrics
  - Query metrics
  - Server info metrics
- The telemetry configuration is properly integrated into the server config structure and can be enabled/disabled through configuration.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the telemetry initialization and usage
# Check if EnableTelemetry is properly defined and exported
ast-grep --pattern 'func EnableTelemetry() bool { $$$'
# Check for any direct telemetry.New calls to ensure consistent usage
rg 'telemetry\.New\(' -A 2
Length of output: 367
Script:
#!/bin/bash
# Let's check the telemetry package to understand the New function signature and usage
rg -t go 'func New.*\(' --context 3
# Also check for any other telemetry initialization patterns
rg -t go 'type.*[Tt]elemetry' --context 2
# Check imports of telemetry package
rg -t go '^import.*telemetry'
Length of output: 452779
Script:
#!/bin/bash
# Let's check the telemetry package and its usage in the codebase
rg -t go 'telemetry\.' --context 3
# Also check for any telemetry configuration
rg -t go 'EnableTelemetry' --context 3
Length of output: 17069
telemetry/metrics.go
Outdated
// SetTelemetryEnabled allows for the global telemetry enabled state to be set.
func EnableTelemetry() {
	globalTelemetryEnabled = true
}
Consider thread safety and design implications of EnableTelemetry
The current implementation has several concerns:
- Thread safety: Access to globalTelemetryEnabled isn't synchronized, which could lead to race conditions in concurrent environments.
- The function contradicts the documented behavior that states the variable "does not change for the lifetime of the program"
- One-way operation: Consider whether disabling telemetry should also be supported
Consider this thread-safe implementation:
+var telemetryMu sync.RWMutex
func EnableTelemetry() {
+ telemetryMu.Lock()
+ defer telemetryMu.Unlock()
globalTelemetryEnabled = true
}
func IsTelemetryEnabled() bool {
+ telemetryMu.RLock()
+ defer telemetryMu.RUnlock()
return globalTelemetryEnabled
}
Committable suggestion skipped: line range outside the PR's diff.
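As an alternative to the mutex-based suggestion above, a single boolean flag can also be guarded with `sync/atomic`. The following is a minimal sketch, assuming Go 1.19 or newer (for `atomic.Bool`); the variable name `telemetryEnabled` is hypothetical, and this is not the code that was merged in the PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// telemetryEnabled is a hypothetical replacement for a plain bool flag.
// atomic.Bool makes concurrent Store/Load calls free of data races.
var telemetryEnabled atomic.Bool

// EnableTelemetry switches the flag on; it is safe to call from any goroutine.
func EnableTelemetry() { telemetryEnabled.Store(true) }

// IsTelemetryEnabled reads the flag without taking a lock.
func IsTelemetryEnabled() bool { return telemetryEnabled.Load() }

func main() {
	var wg sync.WaitGroup
	// Hammer the flag from several goroutines to mimic concurrent callers.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			EnableTelemetry()
			_ = IsTelemetryEnabled()
		}()
	}
	wg.Wait()
	fmt.Println(IsTelemetryEnabled())
}
```

Compared with an `RWMutex`, the atomic flag keeps the read path cheap, which may matter if the check sits on a hot path such as a per-transaction counter increment.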
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
server/v2/api/telemetry/server.go (1)
42-44: Consider using error return instead of panic
While panicking in constructors is a common Go pattern, this validation case might be better handled by returning an error. This would allow callers to handle the invalid state more gracefully.
Consider this alternative:
- if enableTelemetry == nil {
- 	panic("enableTelemetry must be provided")
- }
+ if enableTelemetry == nil {
+ 	return nil, fmt.Errorf("enableTelemetry callback cannot be nil")
+ }
simapp/v2/simdv2/cmd/commands.go (1)
121-124: LGTM: Clean implementation of telemetry enablement
The addition of the EnableTelemetry callback is a good architectural choice as it:
- Implements the PR objective of enabling metrics
- Avoids unnecessary imports in the telemetry package
- Maintains proper error handling and initialization order
🔇 Additional comments (5)
server/v2/api/telemetry/server.go (2)
31-31: LGTM: Function signature change is well-designed
The addition of the enableTelemetry func() parameter is clean and follows Go idioms for callback passing.
46-48: Consider handling potential callback errors
The enableTelemetry callback is executed without any error handling. While it's currently defined as func(), consider whether it should return an error that could be propagated to the caller.
Let's check how this callback is used across the codebase:
Consider updating the signature to handle potential errors:
-func New[T transaction.Tx](cfg server.ConfigMap, logger log.Logger, enableTelemetry func()) (*Server[T], error) {
+func New[T transaction.Tx](cfg server.ConfigMap, logger log.Logger, enableTelemetry func() error) (*Server[T], error) {
// ...
if srv.config.Enable {
- enableTelemetry()
+ if err := enableTelemetry(); err != nil {
+ return nil, fmt.Errorf("failed to enable telemetry: %w", err)
+ }
}
server/v2/api/telemetry/metrics.go (1)
62-64: Verify nil handling in callers of NewMetrics
The early return of nil, nil when metrics are disabled is a good practice. However, we should ensure all callers properly handle the nil case to prevent potential panics.
✅ Verification successful
Early return with nil, nil is safe and properly handled
Based on the codebase analysis:
- The only direct caller of NewMetrics is in server/v2/api/telemetry/server.go
- The caller properly checks for error: if err != nil { return nil, fmt.Errorf(...) }
- The metrics field is optional in the Server struct, indicating it's designed to handle nil metrics
- The early return pattern aligns with Go's idiomatic approach for optional features
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for direct calls to NewMetrics and verify nil checks
ast-grep --pattern 'metrics, err := telemetry.NewMetrics($_)
$$$
if err != nil {
$$$
}
if metrics != nil {
$$$
}'
Length of output: 119
Script:
#!/bin/bash
# Let's try a broader search for NewMetrics usage
rg "NewMetrics" -A 5 -B 2
Length of output: 2155
Script:
#!/bin/bash
# Let's check how srv.metrics is used in the server code
rg "srv\.metrics" -A 2 -B 2
Length of output: 310
Script:
#!/bin/bash
# Let's examine the Server struct and any methods that might use metrics
rg -A 10 -B 2 "type Server struct"
Length of output: 790
simapp/v2/simdv2/cmd/commands.go (1)
28-28: LGTM: Import added for telemetry enablement
The addition of the SDK telemetry package aligns with the PR objective and follows Go import conventions.
telemetry/metrics.go (1)
27-30: Critical: Address concurrent access and state management concerns
The current implementation has several issues:
- Race Condition: The globalTelemetryEnabled variable is not protected against concurrent access, which could lead to race conditions in a concurrent environment.
- Immutability Violation: The comment on line 19 states that globalTelemetryEnabled "does not change for the lifetime of the program", but this function violates that contract.
- Documentation Mismatch: The comment suggests setting the state, but the function only enables it.
Consider this safer implementation:
- // SetTelemetryEnabled allows for the global telemetry enabled state to be set.
+ // EnableTelemetry enables telemetry for the application.
+ // Note: This operation is not thread-safe and should only be called during initialization.
func EnableTelemetry() {
+ if !globalTelemetryEnabled {
globalTelemetryEnabled = true
+ }
}
Let's verify the usage pattern to ensure it's only called during initialization:
server/v2/api/telemetry/metrics.go
Outdated
if !cfg.Enable {
	return nil, nil
}
🛠️ Refactor suggestion
Document nil return and consider adding IsEnabled method
To improve safety and usability:
- Document that NewMetrics returns nil when metrics are disabled
- Consider adding an IsEnabled() bool method to safely check metrics state
Add documentation and helper method:
+ // NewMetrics creates a new instance of Metrics.
+ // Returns nil if metrics are disabled in the configuration.
func NewMetrics(cfg *Config) (*Metrics, error) {
if !cfg.Enable {
return nil, nil
}
// Add this helper method to the Metrics struct
func (m *Metrics) IsEnabled() bool {
return m != nil
}
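The helper suggested above relies on the fact that Go allows a pointer-receiver method to be called on a nil pointer, as long as the method does not dereference the receiver. A minimal sketch of that behavior follows; `Config` and `Metrics` are simplified, hypothetical stand-ins for the real types, and the `name` field exists only for illustration.

```go
package main

import "fmt"

// Config is a hypothetical, trimmed-down metrics configuration.
type Config struct{ Enable bool }

// Metrics is a hypothetical stand-in for the real metrics type.
type Metrics struct{ name string }

// NewMetrics creates a new instance of Metrics.
// It returns nil (and a nil error) when metrics are disabled.
func NewMetrics(cfg *Config) (*Metrics, error) {
	if !cfg.Enable {
		return nil, nil
	}
	return &Metrics{name: "in-memory"}, nil
}

// IsEnabled is safe on a nil receiver because it never dereferences m.
func (m *Metrics) IsEnabled() bool { return m != nil }

func main() {
	m, err := NewMetrics(&Config{Enable: false})
	// Calling the method on the nil *Metrics does not panic.
	fmt.Println(m.IsEnabled(), err)
}
```

Any other method that touches fields of `m` would still need its own nil guard, so callers should check `IsEnabled()` (or `m != nil`) before using the value.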
Agree with the nits pointed out by Julien 👌
	return globalTelemetryEnabled
}

// EnableTelemetry allows for the global telemetry enabled state to be set.
func EnableTelemetry() {
	globalTelemetryEnabled = true
}
Change potentially affects state.
Call sequence:
github.com/cosmos/cosmos-sdk/telemetry.IsTelemetryEnabled (telemetry/metrics.go:23)
github.com/cosmos/cosmos-sdk/telemetry.IncrCounter (telemetry/metrics.go:53)
(*github.com/cosmos/cosmos-sdk/baseapp.BaseApp).deliverTx (telemetry/metrics.go:761)
(*github.com/cosmos/cosmos-sdk/baseapp.BaseApp).internalFinalizeBlock (telemetry/metrics.go:747)
(*github.com/cosmos/cosmos-sdk/baseapp.BaseApp).FinalizeBlock (telemetry/metrics.go:902)
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
math/dec_test.go (1)
453-454: Fix indentation for consistency.
The indentation uses tabs but appears misaligned with other test cases in the file.
- {math.LegacyNewDecWithPrec(4, 2), 0, math.LegacyOneDec()}, // 0.04 ^ 0 => 1.0
- {math.LegacyNewDec(0), 1, math.LegacyNewDec(0)}, // 0 ^ 1 => 0
+ {math.LegacyNewDecWithPrec(4, 2), 0, math.LegacyOneDec()}, // 0.04 ^ 0 => 1.0
+ {math.LegacyNewDec(0), 1, math.LegacyNewDec(0)}, // 0 ^ 1 => 0
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
📒 Files selected for processing (2)
- math/dec_test.go (1 hunks)
- server/v2/cometbft/abci.go (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
math/dec_test.go (2)
Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
Pattern **/*_test.go: "Assess the unit test code assessing sufficient code coverage for the changes associated in the pull request"
server/v2/cometbft/abci.go (1)
Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.
🔇 Additional comments (3)
server/v2/cometbft/abci.go (2)
292-292: Improved error wrapping in simulation response
The change from the %v verb to the %w verb in fmt.Errorf is a good improvement as it properly wraps the original error, preserving the error chain for better debugging and error handling upstream.
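The difference between the two verbs can be shown with a small standalone sketch. The sentinel `ErrSimulation`, the message text, and the helpers `wrapV`, `wrapW` and `chainPreserved` are hypothetical names chosen for illustration, not code from abci.go.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSimulation is a hypothetical sentinel error used only for this sketch.
var ErrSimulation = errors.New("simulation failed")

// wrapV formats the error with %v: the text is kept, the chain is lost.
func wrapV(err error) error { return fmt.Errorf("failed to simulate tx: %v", err) }

// wrapW formats the error with %w: the original error stays unwrappable.
func wrapW(err error) error { return fmt.Errorf("failed to simulate tx: %w", err) }

// chainPreserved reports whether target is still reachable via errors.Is.
func chainPreserved(err, target error) bool { return errors.Is(err, target) }

func main() {
	fmt.Println(chainPreserved(wrapV(ErrSimulation), ErrSimulation)) // false
	fmt.Println(chainPreserved(wrapW(ErrSimulation), ErrSimulation)) // true
}
```

Both variants produce the same message string; only the `%w` version lets upstream callers match the underlying error with `errors.Is` or `errors.As`.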
37-40: LGTM: Import changes are necessary
The added imports are required for handling codec types, SDK types, and transaction types, which are used in the error handling improvements.
math/dec_test.go (1)
453-454: LGTM! Good test coverage for edge cases.
The test cases appropriately cover important edge cases:
- Testing 0.04^0 which should equal 1.0 (any number raised to power 0)
- Testing 0^1 which should equal 0 (zero raised to any positive power)
(cherry picked from commit b45cf75) # Conflicts: # server/v2/api/telemetry/server.go
…#22579) Co-authored-by: Marko <[email protected]> Co-authored-by: Julien Robert <[email protected]>
* main: fix(server/v2/api/telemetry): enable global metrics (#22571)
* main:
  build(deps): Bump cosmossdk.io/math from 1.3.0 to 1.4.0 (#22580)
  fix(server/v2/api/telemetry): enable global metrics (#22571)
  refactor(server/v2/cometbft): add `codec.Codec` and clean-up APIs (#22566)
  feat(core/coretesting): make memDB satisfy db.Db interface (#22570)
  Merge commit from fork
  fix(server(/v2)): fix fallback genesis path (#22564)
  fix: only overwrite context chainID when necessary (#22568)
  docs(client): Update setFeeGranter and setFeePayer comments (#22526)
  fix(baseapp): populate header info in `NewUncachedContext` (#22557)
  build(deps): Bump buf.build/gen/go/cometbft/cometbft/protocolbuffers/go from 1.35.1-20240701160653-fedbb9acfd2f.1 to 1.35.2-20240701160653-fedbb9acfd2f.1 in /api (#22551)
  build(deps): Bump github.com/creachadair/atomicfile from 0.3.5 to 0.3.6 in /tools/confix (#22552)
  docs: Update reference of Approximation (#22550)
  fix(server/v2/comebft): wire missing services + fix simulation (#21964)
  ci: fix permissions for GITHUB_TOKEN on dependabot workflows (#22547)
  ci: fix permissions for GITHUB_TOKEN in spell check workflow (#22545)
  build(deps): Bump google.golang.org/protobuf from 1.35.1 to 1.35.2 (#22537)
  fix(cosmovisor): premature upgrade on restart (#22528)
  fix(store/v2/pebble): handle version 0 in keys (#22524)
  refactor(server/v2/telemetry): swap redirects (#22520)
  docs: Update content in CODE_OF_CONDUCT.md (#22518)
Description
This enables global metrics by passing a callback to the v2 telemetry server, so that the v2 telemetry package does not need to import the SDK telemetry package.