Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging capabilities for /vsicurl, /vsis3 and the like #2742

Merged
merged 1 commit into from
Jul 9, 2020

Conversation

rouault
Copy link
Member

@rouault rouault commented Jul 2, 2020

  • Add VSINetworkStatsGetAsSerializedJSON() to return collected
    statistics as a JSON serialized string

    e.g.

       {
            "methods": {
                "PUT": {
                    "count": 1,
                    "uploaded_bytes": 6
                }
            },
            "handlers": {
                "vsis3": {
                    "files": {
                        "/vsis3/s3_fake_bucket3/another_file.bin": {
                            "methods": {
                                "PUT": {
                                    "count": 1,
                                    "uploaded_bytes": 6
                                }
                            },
                            "actions": {
                                "Write": {
                                    "methods": {
                                        "PUT": {
                                            "count": 1,
                                            "uploaded_bytes": 6
                                        }
                                    }
                                }
                            }
                        }
                    },
                    "methods": {
                        "PUT": {
                            "count": 1,
                            "uploaded_bytes": 6
                        }
                    }
                }
            }
        }

Statistics collecting should be enabled with the CPL_VSIL_NETWORK_STATS_ENABLED
configuration option set to YES before any network activity starts
(for efficiency, reading it is cached on first access, until VSINetworkStatsReset() is called)

Statistics can also be emitted on standard output at process termination if
the CPL_VSIL_SHOW_NETWORK_STATS configuration option is set to YES.

  • Add VSINetworkStatsReset() to clear statistics

  • Map both functions to SWIG

@rouault rouault force-pushed the network_stats branch 5 times, most recently from 303a131 to 7fb29b1 Compare July 2, 2020 22:50
@rouault rouault closed this Jul 4, 2020
@rouault rouault reopened this Jul 4, 2020
@sgillies
Copy link
Contributor

sgillies commented Jul 6, 2020

@rouault I can't tell from the first comment how we would use this or understand the data. Does it record all requests between setting and unsetting the config option?

@rouault
Copy link
Member Author

rouault commented Jul 6, 2020

Does it record all requests between setting and unsetting the config option?

It records everything when CPL_VSIL_NETWORK_STATS_ENABLED is set to YES before the first network access (at that point, the value is cached), or if it set before the next network access following a call to VSINetworkStatsReset() (which resets the caching of the env variable). This is to avoid testing the env variable too frequently.
Another option I considered was to make statistics collecting conditionned by a #define, but I feel the way it is done in this PR should make the overhead when not enabling the env variable acceptable for regular production builds.

@sgillies
Copy link
Contributor

sgillies commented Jul 6, 2020

At the risk of bike shedding, what would you think about a more raw recording of network requests instead of this aggregated output? For example, one record per request with keys like "handler" ("vsicurl" etc), "uri", "method", "status", and "bytes"? Users could choose their favorite tools to aggregate these records. And what would you think about pouring these records into the CPL log stream?

@rouault
Copy link
Member Author

rouault commented Jul 6, 2020

what would you think about a more raw recording of network requests instead of this aggregated output?

The need expressed was to have some form of aggregation to have an assessment of the amount of requests & bytes for cost evaluation. Digging into logs isn't super convenient. We have already some low level logging if enabling CPL_CURL_VERBOSE

@rouault
Copy link
Member Author

rouault commented Jul 7, 2020

@sgillies Do you think my approach would still be acceptable / useful ? The issue I can see with raw logs is that everybody will have to write their parser to build similar high level stats to the ones of this PR

@sgillies
Copy link
Contributor

sgillies commented Jul 7, 2020

@rouault would the report output be useful in combination with e.g. https://aws.amazon.com/s3/pricing/? Yes, I think it would.

My sense is that the report format lacks some structure that would make it easier to consume. At every node in the graph, there are many items of different kinds and that will make it hard to iterate over handlers or request methods or filenames without knowing which are which, if you see what I mean. If request methods were collected in a "methods" object, handlers were collected in a handlers object, files collected in a files object, I think that could help. Something like

{"methods": {"GET": {"count": M, "bytes": N}, "PUT": {"count": M, "bytes": N}}, "handlers": {"vsis3": {"methods": {"GET": {"count": M, "bytes": N}, "PUT": {"count": M, "bytes": N}}, "files": ...}

@rouault
Copy link
Member Author

rouault commented Jul 7, 2020

Something like

{"methods": {"GET": {"count": M, "bytes": N}, "PUT": {"count": M, "bytes": N}}, "handlers": {"vsis3": {"methods": {"GET": {"count": M, "bytes": N}, "PUT": {"count": M, "bytes": N}}, "files": ...}

Sounds good !

* Add VSINetworkStatsGetAsSerializedJSON() to return collected
  statistics as a JSON serialized string

  e.g.
```json
   {
        "methods": {
            "PUT": {
                "count": 1,
                "uploaded_bytes": 6
            }
        },
        "handlers": {
            "vsis3": {
                "files": {
                    "/vsis3/s3_fake_bucket3/another_file.bin": {
                        "methods": {
                            "PUT": {
                                "count": 1,
                                "uploaded_bytes": 6
                            }
                        },
                        "actions": {
                            "Write": {
                                "methods": {
                                    "PUT": {
                                        "count": 1,
                                        "uploaded_bytes": 6
                                    }
                                }
                            }
                        }
                    }
                },
                "methods": {
                    "PUT": {
                        "count": 1,
                        "uploaded_bytes": 6
                    }
                }
            }
        }
    }
```

  Statistics collecting should be enabled with the CPL_VSIL_NETWORK_STATS_ENABLED
  configuration option set to YES before any network activity starts
  (for efficiency, reading it is cached on first access, until VSINetworkStatsReset() is called)

  Statistics can also be emitted on standard output at process termination if
  the CPL_VSIL_SHOW_NETWORK_STATS configuration option is set to YES.

* Add VSINetworkStatsReset() to clear statistics

* Map both functions to SWIG
@rouault rouault merged commit 6a90456 into OSGeo:master Jul 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants