Figure out the server logging situation #89
Comments
DigitalOcean itself doesn't store any access logs AFAICT, or at least I can't find anything in their dashboard. For nginx (marquee) I thought we didn't have any logs at all, since I never included logging in any configuration file and in fact removed it from examples, but it seems there's default logging. It includes the IP, date, UA string and the HTTP request line, like "GET / HTTP/2.0". For Apache (multicol) we also get access and error logs. For nginx on noembed (the node server) we also get the nginx logs. @domenic, is there any extra logging at the node level?
No access logs. pm2 maintains error logs and when-did-we-restart-the-server logs.
The Steering Group is working on a privacy policy for WHATWG.org. It would be really useful to know in more detail what the various web servers log, and how long that information is retained. Can anyone provide samples of the logs for the various web servers? If posting them publicly is not good, then privately emailing them to me would be OK, and I'll show them to the folks drafting the privacy policy.
For blog.whatwg.org and wiki.whatwg.org, it looks like the default Debian Apache2 logging is being used. That seems to amount to 14 days of log files. Logs older than 14 days are removed. So as of today (March 20), the oldest log file is for March 7. The logs are just in the standard Apache log format: IP, date, HTTP request method and URL path, HTTP response code, UA string.

For all the other domains, it looks like the default Debian nginx logging is being used. As with Apache, that seems to amount to 14 days of log files. Logs older than 14 days are removed. So as of today (March 20), the oldest log file is for March 7. The logs are in the same format as Apache logs: IP, date, HTTP request method and URL path, HTTP response code, UA string.
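For anyone summarizing what these logs contain, here is a minimal Python sketch that splits one access log line into the fields listed above. It assumes the servers use the common "combined" layout, which also carries a response size and a referrer that the list above doesn't mention; that has not been checked against the actual server configuration. The sample line is invented for illustration (documentation-range IP, made-up date and UA) and is not taken from any WHATWG server, and the names `COMBINED`, `parse_line` and `SAMPLE` are hypothetical.

```python
import re
from typing import Optional

# Regex for the "combined" access log layout (assumed, not confirmed
# against the real nginx/Apache configuration on these servers).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '                                   # client IP, ident, user
    r'\[(?P<date>[^\]]+)\] '                                  # timestamp in brackets
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '   # request line
    r'(?P<status>\d{3}) (?P<size>\S+) '                       # response code, body size
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'          # referrer, UA string
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one log line, or None if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


# Invented sample line, for illustration only.
SAMPLE = (
    '203.0.113.7 - - [01/Jan/2000:00:00:00 +0000] "GET / HTTP/2.0" '
    '200 5120 "-" "ExampleBrowser/1.0"'
)

if __name__ == "__main__":
    record = parse_line(SAMPLE)
    for key in ("ip", "date", "method", "path", "status", "user_agent"):
        print(f"{key}: {record[key]}")
```

Running it prints only the six fields named in the comment above, which makes it easy to see which parts of a line are personal data (the IP and UA string in particular).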
For whatpr.org, https://aws.amazon.com/compliance/data-privacy-faq/ might help (though from a quick skim I couldn't find the information we're looking for). Also note that we currently do not have access ourselves; @tobie still has the keys for the backing S3 instance.
How about we take that as an opportunity to transfer the AWS account? FYI: PR Preview runs on Heroku and logs a number of things to https://papertrailapp.com/. I think those logs are retained for a week only.
I can confirm what @sideshowbarker says for marquee, which serves whatwg.org itself and all specs, everything static really. The oldest current log entry is March 8. Here's a sample of the access logs with IPs changed:
[Access log sample not preserved in this copy.]
@othermaciej are there other logs you would like samples of as well? They'd all be very similar to this.
I think that sample is sufficient to cover all the Apache and nginx servers. I'll share that sample, a summary of what it contains, and the 14-day retention window with the people drafting the privacy policy. It sounds like the only remaining case where we don't have a definitive answer yet is whatpr.org.
Until #75 is fixed there's also lists.whatwg.org, which isn't really accessible due to HSTS and is still on DreamHost.
I hope this helps.
And here's what those PR Preview logs look like:
And it turns out I can't successfully spell "successfully" in a log.
Is there anything left to do here? Should the answer be documented and kept up-to-date somewhere, or was this a one-time audit?
This has now been figured out and is covered by https://whatwg.org/privacy-policy.
It'd be good to know the facts on what DigitalOcean ends up storing from visitors and what we store ourselves (presumably whatever Nginx and Apache default to).
And Amazon S3 for whatpr.org.