Track (anonymized) counts of repos using Rover #313
Comments
Do you think this should replace the current working directory hash that we currently send, or be an additional piece of data?
I think an additional one would be good. (lazy question) Do we document the telemetry we send, so people can feel confident we're not trying to be sneaky? Might be worth rationalizing the purpose of the working directory hash as part of this, as it's not as useful but may end up being useful to de-dupe telemetry, for instance.
Sounds good to me! Shouldn't be a heavy lift.
hey @ndintenfass - should we just special case |
@EverlastingBugstopper yes, I think we can special-case |
Our telemetry today sends anonymized usage info, so we can track which commands are used, with an opaque ID for each install of Rover. However, given that Rover often runs in CI environments, the number of installs isn't an accurate representation of how many projects are using Rover. A per-project count is useful for Apollo because we can then track adoption rates and see how often Rover is used.
The proposed addition is to create an anonymized hash of the URL for the repo as part of the telemetry payload, storing this data in our data warehouse associated with each invocation.
A working assumption is that we could use a hash of the `origin` URL of the git repo local to the invocation of `rover`. We'll likely want to add some kind of extra characters to it to avoid being able to use brute force approaches to identify the specific repo. One concern raised is that some CI systems generate special `origin` URLs that may contain credentials, though if we're creating an opaque hash we should be safe to use even such URLs.
We presume this won't be an exact science, in that sometimes the same git repo will end up having differently shaped `origin` URLs (and, if we can think of a better way to recognize when a given repo is the same repo even when it's running in many CI runs and on many developers' local environments that would be fine too). Part of the design needed here is to vet that our approach will be a reasonably good approximation, not a flawless enumeration.
Per our existing telemetry, users would be able to opt out by not sending any telemetry.
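To make the idea of a salted hash over the `origin` URL concrete, here is a rough sketch in Python (Rover itself is not written in Python; this is only for illustration). The salt value, the function names `normalize_origin` and `repo_hash`, and the normalization rules (drop credentials, scheme, and a trailing `.git`; lowercase host and path) are all assumptions, not anything Rover is confirmed to do. Lowercasing the path in particular suits case-insensitive hosts and may merge distinct repos elsewhere.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Hypothetical fixed salt. It must be the same for every install so that the
# same repo always produces the same hash.
SALT = "example-telemetry-salt"


def normalize_origin(url: str) -> str:
    """Reduce differently shaped origin URLs to a 'host/path' form."""
    url = url.strip()
    # scp-like syntax, e.g. git@github.com:owner/repo.git
    scp = re.match(r"^(?:[^@/]+@)?([^:/]+):(?!/)(.+)$", url)
    if "://" not in url and scp:
        host, path = scp.group(1), scp.group(2)
    else:
        parts = urlsplit(url)
        # .hostname excludes any user:password@ prefix and any port.
        host = parts.hostname or ""
        path = parts.path
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host.lower()}/{path.lower()}"


def repo_hash(origin_url: str, salt: str = SALT) -> str:
    """Return an opaque hex digest for the normalized origin URL."""
    normalized = normalize_origin(origin_url)
    return hashlib.sha256(f"{salt}:{normalized}".encode("utf-8")).hexdigest()
```

With these assumed rules, `git@github.com:apollographql/rover.git` and `https://user:token@github.com/apollographql/rover.git` normalize to the same string, so they hash to the same value, and any credentials are dropped before hashing. Note that a salt shipped in an open-source client only raises the cost of guessing a repo; it does not rule it out.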
The result of this work should be that we can reason about how many distinct projects are using Rover, even if the invocations for that project are taking place both locally on many developers' machines and in various automated pipelines. As a side effect, we should also be able to make reasonable estimations of how many devs are using Rover locally per project because we'll have the anonymized ID, an indication of whether the invocation is in CI, and the anonymized representation of the repo.
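As a sketch of the kind of analysis this would enable, the Python below counts distinct projects and estimates local developers per project from a list of telemetry events. The field names `machine_id`, `repo_hash`, and `is_ci` are hypothetical stand-ins for the anonymized install ID, the repo hash, and the CI indicator; the actual payload shape may differ.

```python
from collections import defaultdict


def summarize(events):
    """Return (distinct project count, {repo_hash: local developer count}).

    Each event is a dict with the hypothetical keys 'machine_id',
    'repo_hash', and 'is_ci'. Events without a repo hash are skipped.
    """
    projects = set()
    local_devs = defaultdict(set)
    for event in events:
        repo = event.get("repo_hash")
        if repo is None:
            continue
        projects.add(repo)
        # Only non-CI invocations count toward the developer estimate,
        # since CI runners tend to produce a fresh install ID per run.
        if not event["is_ci"]:
            local_devs[repo].add(event["machine_id"])
    devs_per_project = {repo: len(local_devs.get(repo, ())) for repo in projects}
    return len(projects), devs_per_project
```

This remains an approximation: a developer with two machines counts twice, and a repo with differently shaped `origin` URLs that fail to normalize to the same value counts as two projects.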