
CBG-3563 Implement automatic memory profiling #6904

Merged
merged 9 commits into from
Jul 2, 2024


@torcolvin (Collaborator) commented Jun 20, 2024

This should write memory profiles, named like pprof_heap_high_{timestamp}.pb.gz, to the log file directory so that sgcollect can pick them up.

  • Collect memory profiles at most once every 5 minutes.
  • Keep at most 10 memory profiles.
  • api.heap_profile_collection_threshold is configurable, and collection can be disabled with api.disable_heap_profile_collection.

https://docs.google.com/document/d/1Z8pqW-CEGpAdxSSQCakBf4tSKbmzZJGz2Ojlm2JmQ8o

Before merging this code, I want to test it in Capella using their cgroup setup. Once this is implemented, the configuration should be documented (via a DOC ticket).

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

@gregns1 (Contributor) left a comment

Generally looks good, just a few small things I noticed. I tested locally by running the branch, and it seems to work well!

rest/config_startup.go (outdated)
rest/stats_context_test.go (outdated)
rest/stats_context_test.go (outdated)
rest/config_flags.go (outdated)
rest/config_startup.go (outdated)

@adamcfraser (Collaborator) left a comment

A few minor suggestions.

@@ -73,6 +76,20 @@ func DefaultStartupConfig(defaultLogFilePath string) StartupConfig {
},
MaxFileDescriptors: DefaultMaxFileDescriptors,
}

Collaborator

I feel like we shouldn't be doing the memlimit/mem calls when populating the config. Instead, this should be done in NewServerContext when the server context is created (when config.API.HeapProfileCollectionThreshold is not specified). That also avoids logging here before logging is initialized, and avoids this work altogether if the user's bootstrap config explicitly sets HeapProfileCollectionThreshold.

CORS *auth.CORSConfig `json:"cors,omitempty"`
HTTPS HTTPSConfig `json:"https,omitempty"`
CORS *auth.CORSConfig `json:"cors,omitempty"`
HeapProfileCollectionThreshold *uint64 `json:"heap_profile_collection_threshold,omitempty" help:"Threshold in bytes for collecting heap profiles automatically. If set, Sync Gateway will collect a memory profile when it exceeds this value. The default value will be set to 85% of the lesser of cgroup or system memory."`
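A sketch of the default described in the help text (85% of the lesser of cgroup and system memory), computed only when no threshold is configured, as suggested in the review. The two memory sizes are plain parameters here; in practice they would come from something like memlimit and gopsutil, but no calls into those libraries are shown, and both helper names are hypothetical.

```go
package main

import "fmt"

// defaultHeapProfileThreshold returns 85% of the lesser of the cgroup memory
// limit and total system memory. Hypothetical helper: cgroupLimit == 0 is
// treated as "no cgroup limit".
func defaultHeapProfileThreshold(cgroupLimit, systemTotal uint64) uint64 {
	limit := systemTotal
	if cgroupLimit != 0 && cgroupLimit < limit {
		limit = cgroupLimit
	}
	// floor(limit*85/100) == floor(limit*17/20), computed as 17*q + 17*r/20
	// with limit = 20*q + r so the multiplication cannot overflow uint64.
	q, r := limit/20, limit%20
	return 17*q + 17*r/20
}

// resolveHeapProfileThreshold returns the configured value when present and
// only computes the default when the user did not set one.
func resolveHeapProfileThreshold(configured *uint64, cgroupLimit, systemTotal uint64) uint64 {
	if configured != nil {
		return *configured
	}
	return defaultHeapProfileThreshold(cgroupLimit, systemTotal)
}

func main() {
	const gib = uint64(1) << 30
	explicit := 2 * gib
	fmt.Println(resolveHeapProfileThreshold(nil, 4*gib, 16*gib))       // default from the 4 GiB cgroup limit
	fmt.Println(resolveHeapProfileThreshold(&explicit, 4*gib, 16*gib)) // explicit setting wins
}
```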

Collaborator

I'm not sure these belong in APIConfig (which has to do with the REST APIs). Was there a reason to put it here, as opposed to (say) BootstrapConfig?

Collaborator (author)

I don't think it belongs in the bootstrap config, because it has nothing to do with Couchbase Server. I moved it to the top level.

I don't love that either, but I agree it doesn't belong in API.
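To illustrate the top-level placement, here is a hypothetical trimmed-down config struct (not the real StartupConfig). Using *uint64 lets the server tell an absent heap_profile_collection_threshold apart from an explicit value, which is what allows the default to be computed lazily.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// startupConfig is a hypothetical stand-in for StartupConfig with the
// threshold as a top-level field. nil means "unset: compute a default".
type startupConfig struct {
	HeapProfileCollectionThreshold *uint64 `json:"heap_profile_collection_threshold,omitempty"`
}

func parseStartupConfig(data []byte) (startupConfig, error) {
	var cfg startupConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	inputs := []string{
		`{}`,
		`{"heap_profile_collection_threshold": 1073741824}`,
	}
	for _, in := range inputs {
		cfg, err := parseStartupConfig([]byte(in))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if cfg.HeapProfileCollectionThreshold == nil {
			fmt.Println("threshold unset: use the computed default")
		} else {
			fmt.Println("threshold:", *cfg.HeapProfileCollectionThreshold)
		}
	}
}
```

A side benefit of an unsigned type is that encoding/json rejects negative thresholds at parse time.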

@@ -1666,7 +1667,18 @@ func (sc *ServerContext) logStats(ctx context.Context) error {
// Marshal expvar map w/ timestamp to string and write to logs
base.RecordStats(string(marshalled))

return nil
if sc.Config.API.HeapProfileDisableCollection {
Collaborator

The early returns here (1671 and 1677) feel out of place: someone adding something to logStats in the future could easily put it at the end of the function, where these returns would skip it. Maybe move all of this into an sc.collectMemoryProfile() function to encapsulate it?

@@ -178,6 +181,18 @@ func NewServerContext(ctx context.Context, config *StartupConfig, persistentConf
sc.DatabaseInitManager = &DatabaseInitManager{}
}

if config.HeapProfileCollectionThreshold != nil {
sc.statsContext.heapProfileCollectionThreshold = int64(*config.HeapProfileCollectionThreshold)
Collaborator

It feels like we should define this as either int64 or uint64 in both places, to avoid the risk of an unsafe cast when a user specifies a very large value in the config.

Collaborator (author)

I like using uint64 because it matches the values returned from gopsutil or memlimit. The reason for not using it everywhere is that our stats are int64, so GoMemStatsHeapInUse is int64.

@gregns1 (Contributor) left a comment

Mostly looks good; just one thing I noticed, and then it should be ready to go.

}
base.InfofCtx(ctx, base.KeyAll, "Memory usage %d exceeds threshold %d, collecting memory profile", currentMemory, profileCollectionThreshold)

return sc.statsContext.collectMemoryProfile(ctx, sc.Config.Logging.LogFilePath, timestamp)

Contributor

Can these lines (1685 to 1695) be removed? These checks are already done in collectMemoryProfile.

@torcolvin enabled auto-merge (squash) June 27, 2024 15:47
@torcolvin merged commit 80c9ca6 into main Jul 2, 2024
34 checks passed
@torcolvin deleted the CBG-3563 branch July 2, 2024 16:19
3 participants