Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve Sending Google Analytics Data for Tenant and Proprietor #910

Closed
12 of 16 tasks
Tracked by #903
jillpe opened this issue Nov 30, 2023 · 17 comments
Closed
12 of 16 tasks
Tracked by #903

Resolve Sending Google Analytics Data for Tenant and Proprietor #910

jillpe opened this issue Nov 30, 2023 · 17 comments

Comments

@jillpe
Copy link

jillpe commented Nov 30, 2023

Summary

(Updated and revised 2023-12-08)

Goal: For each page to send data to Google Analytics; either the proprietor only or the proprietor and tenant (see Use Cases).

We want:

  • The proprietor provided analytics id as well as OAuth credentials for the entire Hyku instance's tenants.
  • Tenant to provide analytics id (but not OAuth credentials)

Use Cases

Use Cases/Acceptance Criteria
Given I'm on a proprietor only page
When I inspect the source
Then I will see a google analytics section for the proprietor's analytics id
Given I'm on a tenant page
And the tenant has configured their analytics ID
When I inspect the source
Then I will see a google analytics section for the proprietor's analytics id
And I will see the google analytics section for the tenant's analytics id
Given I'm on a tenant page
And the tenant has NOT configured their analytics ID
When I inspect the source
Then I will see a google analytics section for the proprietor's analytics id
And I will not see the google analytics section for the tenant's analytics id
Given I'm on a tenant page
And the tenant has configured their analytics ID
And the tenant's analytics ID is the same as the proprietors
When I inspect the source
Then I will see only one google analytics section for that singular analytics id

The above handles not needing to worry too much about the data migration to rollback changes to the AccountSettings.

Tasks

A cautionary note: at present, the tenant's have Analytics ID along with secrets. It is highly likely that the Analytics ID and secrets are duplicated (and are in fact the Proprietor's information).

Breakdown

We want Hyrax::Analytics.config.analytics_id to always be the proprietor analytics id. In doing this, we'll ensure that future work regarding integration with Google Analytics version 4 will be the least disruptive. Further this better aligns with Hyrax assumptions.

At the proprietor-level, we need to be able to provide:

  • Proprietor configuration, as defined in baseline Hyku

    • analytics_id :: via ENV['GOOGLE_ANALYTICS_ID']
    • app_name :: via ENV['GOOGLE_OAUTH_APP_NAME']
    • app_version :: via ENV['GOOGLE_OAUTH_APP_VERSION']
    • privkey_path :: via ENV['GOOGLE_OAUTH_PRIVATE_KEY_PATH']
    • privkey_secret :: via ENV['GOOGLE_OAUTH_PRIVATE_KEY_SECRET']
    • client_email :: via ENV['GOOGLE_OAUTH_CLIENT_EMAIL']
  • Ensure that Pals production deploy has the above environment variables for the proprietor

At the tenant-level we should only expose google_analytics_id.

  • Remove tenant based configuration for the following as currently defined in Pals(note we're keeping google_analytics_id)
    • Remove google_oauth_app_name
    • Remove google_oauth_app_version
    • Remove google_oauth_client_email
    • Remove google_oauth_private_key_path
    • Remove google_oauth_private_key_secret
    • Remove google_oauth_private_key_value

We will almost certainly want to undo most of the work from PR #479; in particular the switching logic for amending the Hyrax.config and Hyrax::Analytics.config:

Acceptance Criteria

  • Individual tenants have the ability to add an analytics code specific to their organization that can be reported to GA
  • On a proprietor page, the HEAD of the HTML document will have the proprietor analytics code
  • On a tenant page, the HEAD of the HTML document will have the proprietor analytics code and, if set, the tenant's analytics code

Testing Instructions

See above "Use Case/Acceptance Criteria" for testing instructions.

User Documentation

The proprietor analytics are configured on the server via Environment variables. To change these requires coordination with SoftServ.

To send data to Google Analytics requires a Google Analytics ID.

To fetch data from Google Analytics requires creating a service account with Google and adding the service account keys to the server.

See Hyku README’s Google Section for details regarding the service accounts.

With credentials set, Hyku will send analytics data to Google. And when the community completes Hyrax Google 4 Analytics, it will be able to pull Google analytics data into the Hyku dashboard for all tenants.

The tenant analytics are set via a tenant administrator in the Configuration > Settings > Account form. Set the “Google Analytics” field to the tenant provided Google Analytics ID (e.g. G-10CHARLONG).

@jillpe jillpe converted this from a draft issue Nov 30, 2023
@jillpe
Copy link
Author

jillpe commented Nov 30, 2023

Is having it report to GA sufficient, or are you wanting the analytics summary to display in Hyrax? Only getting it to report to GA is the easier lift (about a 5). If you are wanting to be able to look at the analytics summary in Hyrax that will be a bigger lift (multiple sprints). I know we'd talked about setting aside additional analytics works for a separate sprint, so the bigger lift may be what we set aside

@aahurford
Copy link
Collaborator

We are ok with getting just the GA working for now; My understanding is that we are waiting for in-app work in Hyrax to be completed

@aahurford
Copy link
Collaborator

to clarify -- We have google analytics for the whole repo currently. We would additionally like our tenants to be able to create and manage their own per-tenant GA

@jeremyf
Copy link
Contributor

jeremyf commented Dec 1, 2023

@aahurford To clarify there is presently a G-3TWN24N1XN on the proprietor (e.g. https://hykucommons.org) and all of the tenants. It sounds like you would like each tenant to be able to add another analytics code that is specific to their organization (e.g. Goshen College has an analytics ID that they use for their website and want to use for their hykucommons.org tenant).

Am I understanding that correctly?

Also, if the tenant adds their GA code, it does not clobber/replace the proprietor GA code for that tenant.

@aahurford
Copy link
Collaborator

@jeremyf Yep! It's my understanding that British Library has something similar implemented. The way we are doing it currently makes it difficult to retrieve stats at the tenant level.

@laritakr laritakr self-assigned this Dec 5, 2023
@ShanaLMoore
Copy link
Contributor

ShanaLMoore commented Dec 5, 2023

cc @laritakr not sure if this will help but I captured this screenshot during yesterday's meeting. It may be relevant to this work?. I asked Rob why this project had its own hyrax branch and if we could/should switch them back:

Image

@jeremyf
Copy link
Contributor

jeremyf commented Dec 5, 2023

tl;dr The Hyrax branch is a bit of a red herring.

Given that the request is for a per tenant Google Analytics
And Hyrax has no concept of tenants
Then this is not a problem that different branches/versions of Hyrax would solve.

@jeremyf
Copy link
Contributor

jeremyf commented Dec 5, 2023

In discussion with @laritakr, there's #479 to consider. It appears to implement per tenant google analytics but at the cost of proprietor level analytics.

However, it does inform how we might go about doing this.

@jeremyf
Copy link
Contributor

jeremyf commented Dec 5, 2023

ATTN @aahurford, please read the following to see if it makes sense.

There are two functions for Google Analytics in Hyrax

  1. Sending analytic data to Google, via the google analytics ID. This data is then visible in https://analytics.google.com/analytics/web/
  2. Fetching analytic data from Google, vai the google analytics auth keys. When this is working, the analytics data sent to Google is visible in Hyrax (that is a pending bug fix upstream).

For Issue #910 we are looking to have functionality where:

  1. We send Google analytics data for both the proprietor and the tenant; that requires two separate google analytics IDs.
  2. Eventually, when the fetching function is working, we would like to provide only 1 set of credentials across the whole Hyku system (tenants and prorpietor); as those credentials are a bit more complicate to generate and manage. Something we don’t envison most tenants doing; or frankly needing to do. Especially given that step 1 is to be in place.

In our investigation, we’ve discovered that there is presently a per-tenant configuration of Google Analytics (but no longer a proprietor level configuration). That was brought about by PR#479 addressing Issue #408.

It appears that the Google Analytics ID of the tenants (as well as the google analytics auth keys) are copied to each of the tenants; that is each tenant has the same Google Analytics ID.

The proposed solution is:

  1. Restore proprietor based analytics that sends data to Google and likewise fetches.
  2. Add a tenant setting for tenant_google_analytic_id; that id would only be used for sending analytic data to Google.

The above proposal would then ensure that when GA4-fetch integration is completed, all tenants can fetch data from google (using the proprietor’s provide auth keys). And both proprietor and tenant would have their respective information shown in the Google Analytics. This would alleviate the need for each tenant to provide an OAUth configuration for fetching data.

When a tenant chose not to specify a google analytics ID, then they would not be able to go to https://analytics.google.com/analytics/web/ to see information about their tenant. However, once the fetch integration worked, they would piggyback retrieving information from Google to show in the Hyrax native analytics environment.

Note: This assumes that tenants are not able to opt out of the proprietor sending analytics to Google.

@jeremyf jeremyf added the question Further information is requested label Dec 5, 2023
@laritakr laritakr removed the question Further information is requested label Dec 5, 2023
@jeremyf jeremyf added the question Further information is requested label Dec 5, 2023
@laritakr laritakr added question Further information is requested and removed question Further information is requested labels Dec 5, 2023
@aahurford
Copy link
Collaborator

aahurford commented Dec 6, 2023

I think this plan makes sense, but I wanted to clarify that we are currently gathering data for all tenants in the main account (GA-4 285666830) -- the problem is that there is no easy way to pull data for a single tenant.

We previously explored the option of adding sub accounts under the main one to achieve this but abandoned that plan due to lack of documentation for how exactly to do it. We are now pursuing this direction after hearing from Jenny at British Library that they have something similar set up.

So, it sounds like we are on the same page, but I want to make sure we are pursuing the least complicated path at achieving both overall and tenant level analytics, and however will be of best use to the community.

Best case scenario is that tenant admins can access their stats on their own -- but not entirely necessary if that makes the work terribly more complex. I also want to make sure that British Library's setup is reviewed to avoid reinventing the wheel.

@jeremyf
Copy link
Contributor

jeremyf commented Dec 6, 2023

@aahurford I'm asking @j-basford for some clarification. What I'm seeing is that they have a per tenant analytics configuration; they also send (through a different mechanism) data to Iris (at a proprietor level, but likely with tenant level segmentation).

Which would mean that:

  1. In Google Analytics, the proprietor cannot see traffic of any of the tenants
  2. They likely can see analytics in Iris for all tenants; as the configuration is on a proprietor basis

This would mean then deviate from future Hyrax Google Analytics 4 work; because we'd lose the ability for the proprietor to fetch analytics of all tenants (requiring each tenant to configure OAuth keys to fetch that information into Hyrax).

It is possible that Iris would be suitable, and with the COUNTER metrics work for Pitt, however we introduce a mix of analytics down the road that will require notable refactoring; Hyrax assumes that it's sending analytics to the same place that it will later fetch from.

So, I'm still thinking through this and will bring it to the team for further discussion/consideration.

@jeremyf
Copy link
Contributor

jeremyf commented Dec 6, 2023

From @j-basford:

Hi Jeremy, we have a GA tag that we can look at for everyone's tenant, which I think is in-built into our page setup. then we can also go in the superadmin view and add a tag that each tenant supplies, and they have to look at that themselves.

The BL uses a company called Crafted to do all our GA monitoring and they create us nice dashboards to monitor; they simply added the repositories to the list. I (and Graham and Nora) can only see the repository figures, but the BL web team can see them too, plus all the other BL websites that they monitor. My understanding is that our partner tenants have a similar set up and their web teams give them the tag to use and they monitor it in-house.

Happy to show you and Amanda what it looks like in our GA dashboard if that helps. We do also use IRUS,
Graham is much more au fait with those figures than I am as he tested those and is requesting some tweaks on it with the IRUS-UK team right now

@jeremyf
Copy link
Contributor

jeremyf commented Dec 6, 2023

@aahurford to summarize, the pathway forward is to have two analytics codes on each page. One proprietor and one tenant. However, there are still issues to resolve (see Discussion)

Discussion

This resolves sending the data to Google. The next issue is getting the data out of Google; in particular with a focus on tenant level segmentation. There are two places and two "users" that the data:

The places are:

  1. Google Analytics
  2. Querying Google Analytics to fetch that data into Hyrax; a function that is not yet working.

The users are:

  1. Properietor admins (e.g. who provided the GA token for the proprietor)
  2. Tenant admins (e.g. who provided the GA token for the tenant)

With the above, I'm wondering if you can add hostName to Google Analytics dashboard. I found that while working on the Sushi counter metrics (here's the related code documentation regarding hostName dimension).

The Querying Google Analytics is a known issue for Hyrax; and would be part of Hyrax 5 (with a possible backport to Hyrax 4). Put another way, this is further down the horizon.

The Iris pathway that BL took, is a different consideration as well. Namely, it hooks into another service and sends that data. And we wouldn't be able to back fill that.

Since the proprietor's analytics is used on each tenant; that means the tenant does not have access (at this time) to Google Analytics for their tenant (unless you share the properietor analytics information; and provide a dimenion query with hostName).

Put another way, due to the configuration reality, the best way to ensure we have historic data prior to any changes (which we should make) is to continue with sending properietor analytics; but move that information back to the proprietor level then allow each tenant to add their own analytics. This would mean the historical data is available, but would only be visible to the tenant when one of three things becomes true:

  1. The proprietor Google analytics are shared with/reported to the tenant; this assumes that the hostName dimension can be added.
  2. The aforementioned issue in Hyrax is resolved
  3. We complete the Sushi metrics work to populate that information; the queries and approach for populating Sushi counter-metrics are different from the aforementioned Hyrax issue.
    a. The querying of data from GA for Sushi is done on a date range basis.
    b. The querying of data from GA for Hyrax analytics is done on a work basis.

@laritakr laritakr removed their assignment Dec 6, 2023
@aahurford
Copy link
Collaborator

Thank you for your thoughtful examination of this, Jeremy!

@ShanaLMoore
Copy link
Contributor

TODO: @jeremyf to do a writeup of tasks to accomplish this

@jeremyf jeremyf removed the question Further information is requested label Dec 7, 2023
@kirkkwang kirkkwang self-assigned this Dec 7, 2023
@jeremyf jeremyf changed the title implement GA per tenant with rollup account (like BL) Resolve Sending Google Analytics Data for Tenant and Proprietor Dec 8, 2023
kirkkwang added a commit that referenced this issue Dec 8, 2023
This commit will overrite Hyrax's ga4 partial to handle Hyku's
multitenancy.

Ref:
  - #910
kirkkwang added a commit that referenced this issue Dec 8, 2023
This commit will overrite Hyrax's ga4 partial to handle Hyku's
multitenancy.

Ref:
  - #910
kirkkwang added a commit that referenced this issue Dec 11, 2023
This commit will overwrite Hyrax's ga4 partial to handle Hyku's
multitenancy.

Ref:
  - #910
kirkkwang pushed a commit that referenced this issue Dec 11, 2023
kirkkwang added a commit that referenced this issue Dec 11, 2023
This commit will overwrite Hyrax's ga4 partial to handle Hyku's
multitenancy.

Ref:
  - #910
kirkkwang pushed a commit that referenced this issue Dec 11, 2023
kirkkwang added a commit that referenced this issue Dec 11, 2023
This commit will overwrite Hyrax's ga4 partial to handle Hyku's
multitenancy.

Ref:
  - #910
kirkkwang pushed a commit that referenced this issue Dec 11, 2023
@ShanaLMoore ShanaLMoore moved this from Ready for Development to SoftServ QA in palni-palci Dec 11, 2023
@laritakr
Copy link
Contributor

This does not pass QA. The tenant only showed their own ID, and not the proprietor's ID.

@laritakr laritakr moved this from SoftServ QA to Ready for Development in palni-palci Dec 11, 2023
kirkkwang added a commit that referenced this issue Dec 12, 2023
The `#current_account` method returns a tenant instance when on the
proprietor page (not in a tenant subdomain). This then looks for a
`HYKU_GOOGLE_ANALYTICS_ID` instead of the `GOOGLE_ANALYTICS_ID` env
variable.  We want to only use the `GOOGLE_ANALYTICS_ID`.

Ref:
  - #910
kirkkwang added a commit that referenced this issue Dec 12, 2023
The `#current_account` method returns a tenant instance when on the
proprietor page (not in a tenant subdomain). This then looks for a
`HYKU_GOOGLE_ANALYTICS_ID` instead of the `GOOGLE_ANALYTICS_ID` env
variable.  We want to only use the `GOOGLE_ANALYTICS_ID`.

Ref:
  - #910
kirkkwang added a commit that referenced this issue Dec 12, 2023
This commit removes some Hyrax config trampling done in a previous
commit that was causing GA to some times appear and some times not.

Ref:
  - #910
@laritakr laritakr moved this from Ready for Development to Code Review in palni-palci Dec 12, 2023
laritakr added a commit that referenced this issue Dec 12, 2023
@ShanaLMoore ShanaLMoore moved this from Code Review to Deploy to Staging in palni-palci Dec 12, 2023
@ShanaLMoore
Copy link
Contributor

ShanaLMoore commented Dec 12, 2023

QA RESULTS: Pass ✅

cc @ndroark ready for your re-review

Tested on: Staging

Use Cases

1.

Given I'm on a proprietor only page
When I inspect the source
Then I will see a google analytics section for the proprietor's analytics id

Image

2.

Given I'm on a tenant page
And the tenant has configured their analytics ID
When I inspect the source
Then I will see a google analytics section for the proprietor's analytics id
And I will see the google analytics section for the tenant's analytics id

Image

3.

Given I'm on a tenant page
And the tenant has NOT configured their analytics ID
When I inspect the source
Then I will see a google analytics section for the proprietor's analytics id
And I will not see the google analytics section for the tenant's analytics id

Image

4.

Given I'm on a tenant page
And the tenant has configured their analytics ID
And the tenant's analytics ID is the same as the proprietors
When I inspect the source
Then I will see only one google analytics section for that singular analytics id

Image

@ShanaLMoore ShanaLMoore moved this from Deploy to Staging to PALs QA in palni-palci Dec 12, 2023
@ndroark ndroark moved this from PALs QA to Deploy to Production in palni-palci Dec 12, 2023
@jeremyf jeremyf moved this from Deploy to Production to Client Verification in palni-palci Dec 13, 2023
@ndroark ndroark moved this from Client Verification to Done in palni-palci Jan 5, 2024
@jillpe jillpe closed this as completed Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

6 participants