-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Improvement to Logging #702
Comments
+1
I've seen and appreciated those entries with other servers too, so I'd prefer not to remove them. In fact, it might make sense to add two more standard entries:
That would be too much of a cleanup IMO. There might very well be different HTTP endpoints served from the same binary in the future (perhaps even today? - haven't checked), e.g. a v2, or additional helper services.
https://brouter.de/privacypolicy.html (as linked from https://brouter.de/brouter-web) returns 404. For GDPR compliance it should be clearly stated for what reasons and for how long that data is retained, though. See https://bikerouter.de for how it could be done properly. Before that is fixed, storing the IP should be disabled immediately. IOW, logging the IP should be off by default, or it should be replaced by a static placeholder. Logging the real IP should be an option (config or env), to be enabled only by admins who know what they are doing (see paragraph above). Is logging the IP even needed, or would a hash be enough? |
Hallo marcus+msb48, restored the https://brouter.de/privacypolicy.html ... difficult for me to comment the RFC if you do not say what you plan to do with the logs. As I sayd, I am doing just some access statistics and quality assurance for the quality of service for the brouter.de instance. Here, parallel sessions summer peak is about 200 (so capping at 999 theoretical so far) and requests-per-day summer peak is about 400.000 Yesterday, however, requests per day was 900.000 and looking at the log shows what happened: 18.05.24 11:47 127 ip=147.45.47.133 ms=1 -> GET /brouter/suspects/Austria/Vienna/confirmed/843514405793336318/fixed?ndays=30-1))%20OR%20938=(SELECT%20938%20FROM%20PG_SLEEP(15))-- HTTP/1.1 Lessons learned her: 1) more then just standard routing requests in that log, 2) intrusion detection needs unformatted infoprmation, 3) IP adress needed to add it on the blacklist However, for context you need to know there's the nginx access/error logs in addition, that do contain the IP and the URL as well. On brouter.de, they are setup for daily rotate/gzip and 2 weeks archive brouter-logging up to now is rotated manually (which does no work well) So depends, if you want to have a long-time archive of access statistics (and I would like to have..) that should be some hybrid that works in conjunction with the nginx-log, having the brouter-log free of IPs for GDPR compliance. Hard to believe for me that you will get happy here with structured JSON when talking about 100 Mio Requests per year. Other Aspect here is that if the JSON comes with a library dependency then it comes with a price... ((Geo)JSON up to now we create low-level) So maybe Marcus you can comment on the intended usage of the log? regards, Arndt |
Hi @abrensch, my motivation for introducing structured logging is to analyze more efficiently. The structured logs can be directly imported into modern log management systems like Elasticsearch and then searched, analyzed, aggregated, or visualized with frontends like Kibana, Graylog, Grafana, etc. Examples:
All of this is possible with plain text logs too, but it requires more effort to parse the logs and extract the relevant information. I'm using Elasticsearch, Graylog and Kibana for other projects. Depending on the time of day, hundreds to thousands of messages are ingested every second on some small virtual servers. So performance is definitely not an issue. 100M messages would require only a few gigabytes of storage, which is negligible nowadays. As the log format is relatively simple, building the messages manually would not be a hard task if no further external dependencies are allowed. Of course it'd be easily possible to pick up the existing log messages and convert them to structured logs by an external tool or script. Integration into BRouter is not necessarily important to me, but I thought it might also be useful for other server operators. Re IP addresses/GDPR: Logging, identifying, and blocking rogue clients are better done at least one level above BRouter, for example, in the reverse proxy, before the request reaches the BRouter server. This way, the IP address can easily be printed in a hashed form to BRouter's log messages (even the session pool could work with the hashes instead of plain IP addresses). In my development setup, I've already implemented logging hashes: SHA256(IP address + User-Agent), then I extract the first 10 characters. This way, each client remains unique but is not traceable to an IP address:
@msb48, thank you for your remarks regarding the logged HTTP information. It makes total sense. I'll update the table in the first post accordingly. |
Following #699 I'd like to discuss some more improvements to the log output of BRouter server.
There are three things I'd like to address:
These are just some ideas scribbled down. Before going any deeper, I'd like to hear your thoughts on this.
Separation between stdout and stderr outputs
Currently BRouter emits messages to stdout and stderr.
Sometimes messages related to each other are printed to different streams (e.g. error message printed to stdout, followed by a stacktrace printed to stderr).
A clear separation between regular log messages (i.e. routing requests or profile uploads) and informational/error messages (e.g. startup messages, usage information, errors, stacktraces, debug output if enabled) would be beneficial.
Structure of the log messages
I've tried to describe the structure of the log messages in the README.
I think there's some room for improvement here, mainly removing redundant information which has to be filtered out in later processing anyway.
I'd like to propose a structure that's very close to the current format, with some small changes:
timestamp
session count
new
for newly detected client or counter (positive integer)remove padding
client ip
ip=
prefixprocessing time
ms=
prefixrequest method
request URL
http version
HTTP/1.1
response status
200
,500
reponse size
An example of a log message would look like this:
All fields are separated by a single space character. The encoding of the query string depends on the client, so we should ensure that the values logged are always in URL-encoded (percent-encoding, RFC3986) form. This way, no additional spaces have to be considered when parsing log messages.
Introduction of optional structured logging
For easier processing of log messages, I'd like to propose introducing structured logging, starting with regular log output at stdout. This would allow to easily parse the log messages and extract any relevant information.
To keep backward compatibility, this should be an optional feature.
The switch to structured logging from the legacy format would be controlled by an environment variable.
Another way would be to introduce a new command line option, but as BRouter's argument parsing is somewhat restricted (exact order is required, new arguments can only be appended), I'd prefer the environment variable approach.
JSON is being used as de facto standard format for structured logging nowadays.
I'd like to propose a format following this example:
Each log message is a JSON string. Line breaks are shown only for readability; server will emit the entire log record in one line.
More fields can be added as needed without breaking existing log processing.
The
schema
field is used to indicate the version of the structured logging format (helpful if breaking changes need to be introduced).All other fields are just a different representation of the existing log message structure described above.
lonlats
andnogos
arrays contain the points in the same order as in the request query string.The exact format would be described in a JSON schema file.
The text was updated successfully, but these errors were encountered: