Add logging capabilities for /vsicurl, /vsis3 and the like #2742

rouault · 2020-07-02T21:36:42Z

Add VSINetworkStatsGetAsSerializedJSON() to return collected
statistics as a JSON serialized string

e.g.

       {
            "methods": {
                "PUT": {
                    "count": 1,
                    "uploaded_bytes": 6
                }
            },
            "handlers": {
                "vsis3": {
                    "files": {
                        "/vsis3/s3_fake_bucket3/another_file.bin": {
                            "methods": {
                                "PUT": {
                                    "count": 1,
                                    "uploaded_bytes": 6
                                }
                            },
                            "actions": {
                                "Write": {
                                    "methods": {
                                        "PUT": {
                                            "count": 1,
                                            "uploaded_bytes": 6
                                        }
                                    }
                                }
                            }
                        }
                    },
                    "methods": {
                        "PUT": {
                            "count": 1,
                            "uploaded_bytes": 6
                        }
                    }
                }
            }
        }

Statistics collecting should be enabled with the CPL_VSIL_NETWORK_STATS_ENABLED
configuration option set to YES before any network activity starts
(for efficiency, reading it is cached on first access, until VSINetworkStatsReset() is called)

Statistics can also be emitted on standard output at process termination if
the CPL_VSIL_SHOW_NETWORK_STATS configuration option is set to YES.

Add VSINetworkStatsReset() to clear statistics
Map both functions to SWIG

sgillies · 2020-07-06T17:58:41Z

@rouault I can't tell from the first comment how we would use this or understand the data. Does it record all requests between setting and unsetting the config option?

rouault · 2020-07-06T18:25:49Z

Does it record all requests between setting and unsetting the config option?

It records everything when CPL_VSIL_NETWORK_STATS_ENABLED is set to YES before the first network access (at that point, the value is cached), or if it set before the next network access following a call to VSINetworkStatsReset() (which resets the caching of the env variable). This is to avoid testing the env variable too frequently.
Another option I considered was to make statistics collecting conditionned by a #define, but I feel the way it is done in this PR should make the overhead when not enabling the env variable acceptable for regular production builds.

sgillies · 2020-07-06T19:48:41Z

At the risk of bike shedding, what would you think about a more raw recording of network requests instead of this aggregated output? For example, one record per request with keys like "handler" ("vsicurl" etc), "uri", "method", "status", and "bytes"? Users could choose their favorite tools to aggregate these records. And what would you think about pouring these records into the CPL log stream?

rouault · 2020-07-06T19:55:23Z

what would you think about a more raw recording of network requests instead of this aggregated output?

The need expressed was to have some form of aggregation to have an assessment of the amount of requests & bytes for cost evaluation. Digging into logs isn't super convenient. We have already some low level logging if enabling CPL_CURL_VERBOSE

rouault · 2020-07-07T20:53:59Z

@sgillies Do you think my approach would still be acceptable / useful ? The issue I can see with raw logs is that everybody will have to write their parser to build similar high level stats to the ones of this PR

sgillies · 2020-07-07T21:50:58Z

@rouault would the report output be useful in combination with e.g. https://aws.amazon.com/s3/pricing/? Yes, I think it would.

My sense is that the report format lacks some structure that would make it easier to consume. At every node in the graph, there are many items of different kinds and that will make it hard to iterate over handlers or request methods or filenames without knowing which are which, if you see what I mean. If request methods were collected in a "methods" object, handlers were collected in a handlers object, files collected in a files object, I think that could help. Something like

{"methods": {"GET": {"count": M, "bytes": N}, "PUT": {"count": M, "bytes": N}}, "handlers": {"vsis3": {"methods": {"GET": {"count": M, "bytes": N}, "PUT": {"count": M, "bytes": N}}, "files": ...}

rouault · 2020-07-07T22:00:47Z

Something like

{"methods": {"GET": {"count": M, "bytes": N}, "PUT": {"count": M, "bytes": N}}, "handlers": {"vsis3": {"methods": {"GET": {"count": M, "bytes": N}, "PUT": {"count": M, "bytes": N}}, "files": ...}

Sounds good !

* Add VSINetworkStatsGetAsSerializedJSON() to return collected statistics as a JSON serialized string e.g. ```json { "methods": { "PUT": { "count": 1, "uploaded_bytes": 6 } }, "handlers": { "vsis3": { "files": { "/vsis3/s3_fake_bucket3/another_file.bin": { "methods": { "PUT": { "count": 1, "uploaded_bytes": 6 } }, "actions": { "Write": { "methods": { "PUT": { "count": 1, "uploaded_bytes": 6 } } } } } }, "methods": { "PUT": { "count": 1, "uploaded_bytes": 6 } } } } } ``` Statistics collecting should be enabled with the CPL_VSIL_NETWORK_STATS_ENABLED configuration option set to YES before any network activity starts (for efficiency, reading it is cached on first access, until VSINetworkStatsReset() is called) Statistics can also be emitted on standard output at process termination if the CPL_VSIL_SHOW_NETWORK_STATS configuration option is set to YES. * Add VSINetworkStatsReset() to clear statistics * Map both functions to SWIG

rouault force-pushed the network_stats branch 5 times, most recently from 303a131 to 7fb29b1 Compare July 2, 2020 22:50

rouault closed this Jul 4, 2020

rouault reopened this Jul 4, 2020

rouault force-pushed the network_stats branch from 7fb29b1 to caa6ac1 Compare July 8, 2020 12:43

rouault merged commit 6a90456 into OSGeo:master Jul 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add logging capabilities for /vsicurl, /vsis3 and the like #2742

Add logging capabilities for /vsicurl, /vsis3 and the like #2742

rouault commented Jul 2, 2020 •

edited

Loading

sgillies commented Jul 6, 2020

rouault commented Jul 6, 2020

sgillies commented Jul 6, 2020

rouault commented Jul 6, 2020

rouault commented Jul 7, 2020

sgillies commented Jul 7, 2020

rouault commented Jul 7, 2020

Add logging capabilities for /vsicurl, /vsis3 and the like #2742

Add logging capabilities for /vsicurl, /vsis3 and the like #2742

Conversation

rouault commented Jul 2, 2020 • edited Loading

sgillies commented Jul 6, 2020

rouault commented Jul 6, 2020

sgillies commented Jul 6, 2020

rouault commented Jul 6, 2020

rouault commented Jul 7, 2020

sgillies commented Jul 7, 2020

rouault commented Jul 7, 2020

rouault commented Jul 2, 2020 •

edited

Loading