Figure out the server logging situation #89

Closed
annevk opened this issue Apr 13, 2018 · 14 comments

Comments

@annevk
Member

annevk commented Apr 13, 2018

It'd be good to know the facts on what DigitalOcean ends up storing from visitors and what we store ourselves (presumably whatever Nginx and Apache default to).

And Amazon S3 for whatpr.org.

@foolip
Member

foolip commented Apr 26, 2018

DigitalOcean itself doesn't store any access logs AFAICT, or at least I can't find anything in their dashboard.

For nginx (marquee) I thought we didn't have any logs at all, since I never included logging in any configuration file and in fact removed it from the examples, but it seems there is default logging. It includes the IP, date, UA string and the HTTP request line, like "GET / HTTP/2.0".

For Apache (multicol) we also get access and error logs.

For nginx on noembed (the node server) we also get the nginx logs. @domenic, is there any extra logging at the node level?

@domenic
Member

domenic commented Apr 26, 2018

No access logs. pm2 maintains error logs and when-did-we-restart-the-server logs.

@othermaciej

The Steering Group is working on a privacy policy for WHATWG.org. It would be really useful to know in more detail what the various web servers log, and how long that information is retained. Can anyone provide samples of the logs for the various web servers? If posting them publicly is a problem, emailing them to me privately would be fine, and I'll show them to the folks drafting the privacy policy.

@sideshowbarker
Member

For blog.whatwg.org and wiki.whatwg.org, it looks like the default Debian Apache2 logging is being used. That seems to amount to 14 days of log files. Logs older than 14 days are removed. So as of today (March 20), the oldest log file is for March 7. The logs are just in the standard Apache log format: IP, date, HTTP request method and URL path, HTTP response code, UA string.

For all the other domains, it looks like the default Debian nginx logging is being used. As with Apache, that seems to amount to 14 days of log files. Logs older than 14 days are removed. So as of today (March 20), the oldest log file is for March 7. The logs are in the same format as Apache logs: IP, date, HTTP request method and URL path, HTTP response code, UA string.
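The retention arithmetic above can be sketched as follows. This is only an illustration, not how logrotate decides anything: the function names are made up, the year 2019 is assumed from the neighbouring comments, and counting today as the first of the 14 retained days is an assumption chosen to match the March 20 / March 7 figures.

```python
from datetime import date, timedelta

RETENTION_DAYS = 14  # retention window reported above


def oldest_kept(today, retention_days=RETENTION_DAYS):
    """Oldest log date still kept, assuming today counts as the first retained day."""
    return today - timedelta(days=retention_days - 1)


def is_expired(log_date, today, retention_days=RETENTION_DAYS):
    """True if a log file for log_date falls outside the retention window."""
    return log_date < oldest_kept(today, retention_days)


today = date(2019, 3, 20)  # year assumed; the comment only says "March 20"
print(oldest_kept(today))  # 2019-03-07
```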

@annevk
Member Author

annevk commented Mar 20, 2019

For whatpr.org, https://aws.amazon.com/compliance/data-privacy-faq/ might help, although on a quick skim I couldn't find the information we're looking for. Also note that we currently do not have access ourselves: @tobie still has the keys for the backing S3 instance.

@tobie

tobie commented Mar 20, 2019

How about we take that as an opportunity to transfer the AWS account?

FYI: PR Preview runs on Heroku and logs a number of things on https://papertrailapp.com/. I think those logs are retained for a week only.

@foolip
Member

foolip commented Mar 22, 2019

I can confirm what @sideshowbarker says for marquee, which serves whatwg.org itself and all specs, everything static really. The oldest current log entry is March 8. Here's a sample of the access logs with IPs changed:

1.2.3.4 - - [08/Mar/2019:06:25:15 +0000] "HEAD /specs/web-apps/current-work/ HTTP/1.1" 301 0 "-" "Java/1.7.0_80"
1.2.3.5 - - [08/Mar/2019:06:25:16 +0000] "GET /standard-shared-with-dev.css HTTP/2.0" 200 2922 "https://encoding.spec.whatwg.org/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"
1.2.3.6 - - [08/Mar/2019:06:25:16 +0000] "GET /file-issue.js HTTP/2.0" 200 4981 "https://encoding.spec.whatwg.org/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0"
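For anyone who wants to see exactly which fields each line carries, here is a minimal sketch that splits a line into named parts. It assumes the lines follow the usual "combined" access log layout (which is what the sample looks like); the pattern and the `parse_access_line` name are illustrative, not something running on the servers.

```python
import re

# Pattern for the "combined" access log layout seen in the sample above (assumed).
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_access_line(line):
    """Return the fields of one access log line as a dict, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The request is "METHOD PATH PROTOCOL"; split it only when it has exactly three parts.
    parts = fields["request"].split(" ")
    if len(parts) == 3:
        fields["method"], fields["path"], fields["protocol"] = parts
    return fields


sample = (
    '1.2.3.4 - - [08/Mar/2019:06:25:15 +0000] '
    '"HEAD /specs/web-apps/current-work/ HTTP/1.1" 301 0 "-" "Java/1.7.0_80"'
)
entry = parse_access_line(sample)
print(entry["ip"], entry["method"], entry["path"], entry["status"], entry["user_agent"])
```

The personally identifying parts are the `ip`, `referer` and `user_agent` fields; the rest describes the request itself.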

@foolip
Member

foolip commented Mar 22, 2019

@othermaciej are there other logs you would like samples of as well? They'd all be very similar to this.

@othermaciej

I think that sample is sufficient to cover all the Apache and nginx servers. I'll share that sample, a summary of what it contains, and the 14-day retention window with the people drafting the privacy policy.

It sounds like the only remaining case where we don't have a definitive answer yet is whatpr.org.

@annevk
Member Author

annevk commented Mar 22, 2019

Until #75 is fixed there's also lists.whatwg.org, which isn't really accessible due to HSTS and is still on DreamHost.

@tobie

tobie commented Mar 24, 2019

  1. PR Preview relies on Heroku (for hosting the application), Papertrail (for logs), and GitHub's API. The application is stateless beyond that.

    1. Papertrail logs:
    2. Heroku:
  2. whatpr.org relies on the following AWS solutions:

    1. AWS S3 (two S3 buckets hosted in Northern Virginia; logging is disabled on both).
    2. AWS Route 53
    3. AWS CloudFront

I hope this helps.

@tobie

tobie commented Mar 24, 2019

And here's what those PR Preview logs look like:

Mar 23 09:34:08 pr-preview heroku/router: at=info method=POST path="/github-hook" host=pr-preview.herokuapp.com request_id=36393b5e-346c-4e99-a5fe-1ff9f703bd56 fwd="192.30.252.39" dyno=web.1 connect=0ms service=3ms status=200 bytes=219 protocol=https 
Mar 23 09:34:08 pr-preview app/web.1: Currently running: [] 
Mar 23 09:34:08 pr-preview app/web.1: Found repo config file { src_file: 'index.bs', 
Mar 23 09:34:08 pr-preview app/web.1:   type: 'bikeshed', 
Mar 23 09:34:08 pr-preview app/web.1:   params:  
Mar 23 09:34:08 pr-preview app/web.1:    { 'md-status': 'LS-COMMIT', 
Mar 23 09:34:08 pr-preview app/web.1:      'md-warning': 'Commit {{ sha }} {{ pull_request.head.repo.html_url }}/commit/{{ sha }} replaced by {{ config.ls_url }}', 
Mar 23 09:34:08 pr-preview app/web.1:      'md-title': '{{ config.title }} (Pull Request Snapshot #{{ pull_request.number }})', 
Mar 23 09:34:08 pr-preview app/web.1:      'md-Text-Macro': 'SNAPSHOT-LINK {{ config.back_to_ls_link }}' }, 
Mar 23 09:34:08 pr-preview app/web.1:   ls_url: 'https://streams.spec.whatwg.org/', 
Mar 23 09:34:08 pr-preview app/web.1:   title: 'Streams Standard', 
Mar 23 09:34:08 pr-preview app/web.1:   back_to_ls_link: '<a href="https://streams.spec.whatwg.org/" id="commit-snapshot-link">Go to the living standard</a>', 
Mar 23 09:34:08 pr-preview app/web.1:   post_processing: { name: 'emu-algify', options: { throwingIndicators: true } } } 
Mar 23 09:34:09 pr-preview app/web.1: s3: Bucket name: whatpr.org. 
Mar 23 09:34:09 pr-preview app/web.1: Fetch: https://api.csswg.org/bikeshed/?url=https%3A%2F%2Fraw.githubusercontent.com%2Fsurma-dump%2Fstreams%2Feafd8637479cad13bb1f3bdec917efc762131b1e%2Findex.bs&md-status=LS-COMMIT&md-warning=Commit%20eafd8637479cad13bb1f3bdec917efc762131b1e%20https%3A%2F%2Fgithub.com%2Fsurma-dump%2Fstreams%2Fcommit%2Feafd8637479cad13bb1f3bdec917efc762131b1e%20replaced%20by%20https%3A%2F%2Fstreams.spec.whatwg.org%2F&md-title=Streams%20Standard%20(Pull%20Request%20Snapshot%20%23999)&md-Text-Macro=SNAPSHOT-LINK%20%3Ca%20href%3D%22https%3A%2F%2Fstreams.spec.whatwg.org%2F%22%20id%3D%22commit-snapshot-link%22%3EGo%20to%20the%20living%20standard%3C%2Fa%3E 
Mar 23 09:34:09 pr-preview app/web.1: s3: Bucket name: whatpr.org. 
Mar 23 09:34:09 pr-preview app/web.1: Fetch: https://api.csswg.org/bikeshed/?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwhatwg%2Fstreams%2Fa7f62107f12d223f093f6bb64a197c7489f25765%2Findex.bs&md-status=LS-COMMIT&md-warning=Commit%20a7f62107f12d223f093f6bb64a197c7489f25765%20https%3A%2F%2Fgithub.com%2Fsurma-dump%2Fstreams%2Fcommit%2Fa7f62107f12d223f093f6bb64a197c7489f25765%20replaced%20by%20https%3A%2F%2Fstreams.spec.whatwg.org%2F&md-title=Streams%20Standard%20(Pull%20Request%20Snapshot%20%23999)&md-Text-Macro=SNAPSHOT-LINK%20%3Ca%20href%3D%22https%3A%2F%2Fstreams.spec.whatwg.org%2F%22%20id%3D%22commit-snapshot-link%22%3EGo%20to%20the%20living%20standard%3C%2Fa%3E 
Mar 23 09:34:22 pr-preview app/web.1: s3: Attempting to cache streams/999/a7f6210.html. 
Mar 23 09:34:30 pr-preview app/web.1: s3: Attempting to cache streams/999.html. 
Mar 23 09:34:30 pr-preview app/web.1: s3: Succesfully cached streams/999/a7f6210.html. 
Mar 23 09:34:30 pr-preview app/web.1: s3: Available at https://whatpr.org/streams/999/a7f6210.html. 
Mar 23 09:34:30 pr-preview app/web.1: s3: Succesfully cached streams/999.html. 
Mar 23 09:34:30 pr-preview app/web.1: s3: Available at https://whatpr.org/streams/999.html. 
Mar 23 09:34:30 pr-preview app/web.1: s3: Bucket name: whatpr.org. 
Mar 23 09:34:30 pr-preview app/web.1: Fetch: https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fwhatpr.org%2Fstreams%2F999%2Fa7f6210.html&doc2=https%3A%2F%2Fwhatpr.org%2Fstreams%2F999.html 
Mar 23 09:34:38 pr-preview app/web.1: s3: Attempting to cache streams/999/a7f6210...eafd863.html. 
Mar 23 09:34:38 pr-preview app/web.1: s3: Succesfully cached streams/999/a7f6210...eafd863.html. 
Mar 23 09:34:38 pr-preview app/web.1: s3: Available at https://whatpr.org/streams/999/a7f6210...eafd863.html. 

And it turns out I can't successfully spell "successfully" in a log.
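The first line of that sample (the `heroku/router` one) is a run of key=value pairs, so it is easy to pull apart. A minimal sketch, assuming that layout holds for all router lines; `parse_router_line` is a made-up name and not part of PR Preview:

```python
import shlex


def parse_router_line(line):
    """Split a router log line into its syslog-style prefix and its key=value fields."""
    prefix, _, payload = line.partition(": ")
    fields = {}
    # shlex.split honours the double quotes around values such as path="/github-hook".
    for token in shlex.split(payload):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return prefix, fields


sample = (
    'Mar 23 09:34:08 pr-preview heroku/router: at=info method=POST '
    'path="/github-hook" host=pr-preview.herokuapp.com '
    'request_id=36393b5e-346c-4e99-a5fe-1ff9f703bd56 fwd="192.30.252.39" '
    'dyno=web.1 connect=0ms service=3ms status=200 bytes=219 protocol=https'
)
prefix, fields = parse_router_line(sample)
print(fields["method"], fields["path"], fields["fwd"], fields["status"])
```

As far as I can tell, `fwd` is the only field here that holds a client address, and in this sample it belongs to the GitHub webhook call rather than to a visitor.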

@foolip
Member

foolip commented Nov 29, 2019

Is there anything left to do here? Should the answer be documented and kept up to date somewhere, or was this a one-time audit?

@foolip
Member

foolip commented Jun 10, 2020

This has now been figured out and is covered by https://whatwg.org/privacy-policy.

foolip closed this as completed Jun 10, 2020