Node.js download metrics #794

Closed
mcollina opened this issue Dec 16, 2019 · 14 comments

Comments

@mcollina
Member

It seems the Node.js download metrics have dropped in December. Have we put a CDN in place in front of our server by any chance?

https://nodejs.org/metrics/

If that's the case, our download numbers might not be a good metric anymore and we might want to collect them differently.

@richardlau
Member

Have we put a CDN in place in front of our server by any chance?

Yes, there have been changes. @rvagg and @jbergstroem can probably describe them better than I can -- @rvagg gave an overview to the Build WG in the 2019-11-19 meeting. nodejs/build#2025 is somewhat related.

@mcollina
Member Author

We might want to revamp our metrics data, as it’s useful for making informed decisions. Is there a tracking issue?

@MylesBorins
Contributor

I was under the impression that we were keeping logs from the CDN level, maybe that wasn't wired up properly?

@rvagg
Member

rvagg commented Dec 18, 2019

Ongoing work since we dropped the CDN bypass: nodejs/build#2025, just needs more of my time but I have enough folks nagging me that it's fairly high on the priority list. We have the logs, we have an intermediate plan and the resources to process them (and even most of the scripts and code), it just needs to be all wired up properly. I'm on holiday till after Christmas so unless I'm super bored and that work seems like a good distraction then it's going to have to wait until at least after then.

@mcollina
Member Author

Thanks for the update

@mhdawson
Member

mhdawson commented Mar 6, 2020

@rvagg wondering if there is an update or if there is something we might get Ash to do in order to help move this forward?

@rvagg
Member

rvagg commented Mar 9, 2020

@AshCripps how strong is your Bash, Awk & JS? It would be good to get someone else up to speed on how this stuff works. I currently have most of the pieces running on a separate server that I was going to use for metrics calculation, so we could isolate this piece of the permissions puzzle, as long as there are no objections to sharing raw log data with Ash. It's really just user data, but you can pinpoint usage by some larger companies, and if you put in enough effort you might even be able to isolate smaller users by location and usage patterns, which is why we sanitise it before publishing publicly. The challenge is our timezone overlap, so we'd probably be doing knowledge sharing async.

@AshCripps
Member

@rvagg I have some experience with Bash & JS, but I've never really used Awk aside from the odd Stack Overflow copy and paste. What is the goal here? Is it to sanitise the input of any sensitive information, to collate the metrics into summaries, or both?

@sxa
Member

sxa commented Mar 18, 2020

@rvagg I use awk quite a lot - is there much of it in the current solution?

@rvagg
Member

rvagg commented Mar 19, 2020

@sxa555 there is, but there's also a rewrite to JavaScript for some of the most AWKward code. It's mainly about grokking what's going on. The problem here is a matter of access, we have to be selective with who we get to handle our raw logs and we don't have good partitioning in place, nor a test environment where just anyone can contribute. Would be nice of course, but we're not there.

Old code @ https://github.com/nodejs/build/tree/master/ansible/www-standalone/tools/metrics
Newer code @ https://github.com/nodejs/build/pull/2025/files

It's just not fully wired up, and now the server I set up to run this has a full disk because it's pulling down all logs from GCP storage when it should be discarding the useless bits, so there's even more work to do on that ...
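The "discarding useless bits" step can be made concrete with a filter that keeps only successful release downloads and drops everything else before anything is stored. The Python sketch below is a minimal example of that idea. It assumes a hypothetical JSON-lines log layout with `ip`, `path`, `status` and `country` keys. The real CDN log schema, and the scripts in nodejs/build#2025, may look quite different.

```python
import json
import re
from typing import Iterable, Iterator

# Hypothetical log layout: one JSON object per line with "ip", "path",
# "status" and "country" keys. The real CDN log schema may differ.
# Release downloads are served from /dist/vX.Y.Z/ or /download/release/vX.Y.Z/.
DOWNLOAD_PATH = re.compile(r"^/(?:download/release|dist)/v\d+\.\d+\.\d+/")


def keep_downloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield successful release-download records; discard everything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the batch
        if not isinstance(rec, dict):
            continue
        # Assumption: only HTTP 200 counts as a download (206 range requests ignored).
        if rec.get("status") == 200 and DOWNLOAD_PATH.match(str(rec.get("path", ""))):
            yield rec
```

Because it is a generator, a file object can be passed in directly. Only the matching records then need to be written back to disk.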

@mhdawson
Member

I don't have any concern with sharing the raw data with Ash or @sxa555, as they both either currently work on the same team as me or have in the past. As I understand it, the sensitive data is just the IP address rather than any data that would directly identify a person, right?
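If the IP address is the main identifying field, one way to sanitise records before publishing is to copy across an explicit allowlist of fields rather than delete the known-sensitive ones. The sketch below shows that approach. The field names (`path`, `status`, `country`) are hypothetical and are not taken from the actual pipeline.

```python
from typing import Any

# Hypothetical allowlist: only these fields survive into published data.
# Using an allowlist (rather than deleting known-sensitive keys such as "ip")
# means any new field that appears in the raw logs is dropped by default.
PUBLIC_FIELDS = ("path", "status", "country")


def sanitise(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a raw log record containing only non-identifying fields."""
    return {key: record[key] for key in PUBLIC_FIELDS if key in record}
```

The allowlist trades convenience for safety. Publishing a new field requires a deliberate change to `PUBLIC_FIELDS` instead of happening by accident.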

@AshCripps
Member

@rvagg This data keeps cropping up as important, so I think we should start a knowledge transfer for this. Our timezones are a little better with the latest clock changes, but async is probably still best.

@AshCripps
Member

I've updated the top of https://nodejs.org/metrics/, which now points to https://storage.googleapis.com/access-logs-summaries-nodejs/index.html. That is a simple website that updates every day with the previous day's logs, so people can now access the metrics stats and do what they wish with them. What's missing is backfilling and a redesign of the metrics site, if anyone is interested.
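A daily summary of this kind could be built by counting one day's download paths by Node.js version and by a rough OS bucket. The sketch below infers the OS from the artifact filename. The `/dist/vX.Y.Z/` and `/download/release/vX.Y.Z/` prefixes match the public download URLs. The OS heuristic and the output shape are assumptions, not the format the summaries site actually publishes.

```python
import re
from collections import Counter
from typing import Iterable

# Release artifacts live under /dist/vX.Y.Z/ or /download/release/vX.Y.Z/.
RELEASE = re.compile(r"^/(?:download/release|dist)/v(\d+\.\d+\.\d+)/([^/]+)$")


def os_of(filename: str) -> str:
    """Rough OS bucket from an artifact filename (a heuristic, not exhaustive)."""
    if "-linux-" in filename:
        return "linux"
    if "-darwin-" in filename or filename.endswith(".pkg"):
        return "osx"
    if "-win-" in filename or filename.endswith(".msi"):
        return "win"
    if "-headers" in filename:
        return "headers"
    if filename.endswith((".tar.gz", ".tar.xz")):
        return "src"
    return "other"


def daily_summary(paths: Iterable[str]) -> dict[str, Counter]:
    """Count one day's (already filtered) download paths per version and per OS."""
    by_version: Counter = Counter()
    by_os: Counter = Counter()
    for path in paths:
        m = RELEASE.match(path)
        if not m:
            continue  # not a release artifact path
        version, filename = m.groups()
        if not filename.startswith("node-v"):
            continue  # skip SHASUMS256.txt and other non-binary files
        by_version[version] += 1
        by_os[os_of(filename)] += 1
    return {"version": by_version, "os": by_os}
```

Each day's counters can be serialised (for example with `json.dumps`) and uploaded as a small summary file. Backfilling would then mean running the same function over older logs.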

@mhdawson
Member

I think we can close this out since the data is now available again. @mcollina, should I go ahead and close it?

7 participants