-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add logging capabilities for /vsicurl, /vsis3 and the like #2742
Conversation
303a131
to
7fb29b1
Compare
@rouault I can't tell from the first comment how we would use this or understand the data. Does it record all requests between setting and unsetting the config option? |
It records everything when CPL_VSIL_NETWORK_STATS_ENABLED is set to YES before the first network access (at that point, the value is cached), or if it set before the next network access following a call to VSINetworkStatsReset() (which resets the caching of the env variable). This is to avoid testing the env variable too frequently. |
At the risk of bike shedding, what would you think about a more raw recording of network requests instead of this aggregated output? For example, one record per request with keys like "handler" ("vsicurl" etc), "uri", "method", "status", and "bytes"? Users could choose their favorite tools to aggregate these records. And what would you think about pouring these records into the CPL log stream? |
The need expressed was to have some form of aggregation to have an assessment of the amount of requests & bytes for cost evaluation. Digging into logs isn't super convenient. We have already some low level logging if enabling CPL_CURL_VERBOSE |
@sgillies Do you think my approach would still be acceptable / useful ? The issue I can see with raw logs is that everybody will have to write their parser to build similar high level stats to the ones of this PR |
@rouault would the report output be useful in combination with e.g. https://aws.amazon.com/s3/pricing/? Yes, I think it would. My sense is that the report format lacks some structure that would make it easier to consume. At every node in the graph, there are many items of different kinds and that will make it hard to iterate over handlers or request methods or filenames without knowing which are which, if you see what I mean. If request methods were collected in a "methods" object, handlers were collected in a handlers object, files collected in a files object, I think that could help. Something like
|
Sounds good ! |
* Add VSINetworkStatsGetAsSerializedJSON() to return collected statistics as a JSON serialized string e.g. ```json { "methods": { "PUT": { "count": 1, "uploaded_bytes": 6 } }, "handlers": { "vsis3": { "files": { "/vsis3/s3_fake_bucket3/another_file.bin": { "methods": { "PUT": { "count": 1, "uploaded_bytes": 6 } }, "actions": { "Write": { "methods": { "PUT": { "count": 1, "uploaded_bytes": 6 } } } } } }, "methods": { "PUT": { "count": 1, "uploaded_bytes": 6 } } } } } ``` Statistics collecting should be enabled with the CPL_VSIL_NETWORK_STATS_ENABLED configuration option set to YES before any network activity starts (for efficiency, reading it is cached on first access, until VSINetworkStatsReset() is called) Statistics can also be emitted on standard output at process termination if the CPL_VSIL_SHOW_NETWORK_STATS configuration option is set to YES. * Add VSINetworkStatsReset() to clear statistics * Map both functions to SWIG
Add VSINetworkStatsGetAsSerializedJSON() to return collected
statistics as a JSON serialized string
e.g.
Statistics collecting should be enabled with the CPL_VSIL_NETWORK_STATS_ENABLED
configuration option set to YES before any network activity starts
(for efficiency, reading it is cached on first access, until VSINetworkStatsReset() is called)
Statistics can also be emitted on standard output at process termination if
the CPL_VSIL_SHOW_NETWORK_STATS configuration option is set to YES.
Add VSINetworkStatsReset() to clear statistics
Map both functions to SWIG