Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server side analytics #4131

Merged
merged 11 commits into from
Jun 7, 2018

Conversation

davidfischer
Copy link
Contributor

@davidfischer davidfischer commented May 23, 2018

This PR sends an event to Google Analytics from the server side when the Footer API is called.

This is one of our identified possible alternatives to client side GA. Sending data from the server side has many advantages such as being able to anonymize data before it is sent and not having clients directly connect to a site they may not want to. It comes with the downside of needing to have many web servers to handle all our analytics calls.

I am seeking feedback on the structure especially around the structure in terms of using signals/hooks to find when footer calls are made and whether this seems like a sane approach.

@davidfischer
Copy link
Contributor Author

I'm expecting a lint error here around the naming of the log variable but I believe that means our linting is a bit overzealous.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks great with some gating implemented, unless I'm missing it.

For this use case, I think it might actually be more useful to have a sample that isn't project specific, since our initial goal is to compare ad rates across all projects. However, we don't have a way to easily handle that from the DB, so I don't want scaling to require a deploy to change the % of traffic to send.

@@ -0,0 +1 @@
"""Intentionally blank"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary with the AppConfig.

'ua': request.META.get('HTTP_USER_AGENT'),
'uip': get_client_ip(request),
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like this is gated by the feature flag, so it would start sending all footer data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, it is not yet.

@davidfischer
Copy link
Contributor Author

For this use case, I think it might actually be more useful to have a sample that isn't project specific

How about sending ad clicks to GA server side rather than client side? That would be hundreds (but not thousands) of clicks per day.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I just mentioned a couple small nits.

'ec': None, # Event category (required)
'ea': None, # Event action (required)
'el': None, # Event label
'ev': None, # Event value (numeric)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be setting these to None by default? It seems like we should be a bit more defensive here, and make sure the incoming event_data contains them.

log = logging.getLogger(__name__) # noqa

# Used to anonymize an IP by zero-ing out the last 2 bytes
MASK = int('0xFFFFFFFFFFFFFFFFFFFFFFFFFFFF0000', 16)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this just live in the anonymize_ip_address function? Not sure if we need it elsewhere.

'https://www.google-analytics.com/collect',
data=data,
)
except requests.Timeout:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the default timeout here? We should probably set it to something really low.

['rtfd._trackEvent', 'Promo', 'Click', data.id]
);
}
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder about leaving this logic in, until we can properly confirm the server-side stuff is working. Not sure how important it is, if it's better to possible double count or to just lose the data.

@davidfischer
Copy link
Contributor Author

I made the updates based on the feedback including making it more obvious what is required in terms of analytics.

@ericholscher ericholscher merged commit 732b37c into readthedocs:master Jun 7, 2018
@ericholscher
Copy link
Member

💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants