[Issue #1507] BI Tool ADR #1677

Merged
merged 7 commits into from
Apr 16, 2024

Conversation

Collaborator

@coilysiren coilysiren commented Apr 11, 2024

Summary

Fixes #1507

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 11, 2024
@coilysiren coilysiren changed the title from "analytics ADR" to "analytics tool ADR" Apr 11, 2024
@coilysiren coilysiren changed the title from "analytics tool ADR" to "[Issue #1507] BI Tool ADR" Apr 11, 2024
@coilysiren coilysiren marked this pull request as ready for review April 11, 2024 23:49
@coilysiren coilysiren requested review from ebuwa-evbuoma-fike and removed request for andycochran, SammySteiner and sumiat April 11, 2024 23:49
- ✅ Ability to connect to common data sources (S3, Redshift, Postgres) - [Metabase supports common data sources](https://www.metabase.com/data_sources/)
- ✅ Allows technical users to create ad hoc queries to create graphs - [Metabase supports creating a variety of visual types](https://www.metabase.com/learn/visualization/)
- ✅ Easy-to-use UI for non-coders - Subjectively, the Metabase UI was found to be easy to use.
- ✅ Replicable for users outside of the project - Metabase is open-source and
Contributor

Was there more you wanted to add here?
It seems like Metabase will allow our open source contributors to replicate? That’s great that we can do that with Metabase. I’m curious what that looks like. For example, would giving users outside of the project access slow down querying at all? Do we need to set anything up to prevent that? You don’t necessarily have to add it to the ADR, but I’m just curious about it.

Collaborator Author

> Was there more you wanted to add here?

Yes I just finished the sentence 👍🏼

> I’m curious what that looks like.

For the record, I don't think we should invest in doing this at all. It seems like a non-ideal use of time. Specifically, the time cost to set it up, plus the risk of accidental data leakage, compares poorly to the benefit.

That said, if we were to support people outside the project replicating our Metabase data, we shouldn't do it by giving them direct access to the database. That creates a massive security risk. We should do it by giving them a pre-cleaned dump of our analytics SQL database.
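
As a rough illustration of what a "pre-cleaned" dump could mean, here is a minimal sketch that copies only an allow-list of tables and columns into a separate database. Everything in it is an assumption for illustration: SQLite stands in for the analytics database, and the `issues` table, its columns, and the `internal_notes` table are hypothetical names, not the project's schema.

```python
import sqlite3

# Hypothetical allow-list: only these tables and columns reach the public copy.
# Identifiers come from this constant, never from user input, so building the
# SQL strings with f-strings is acceptable for this sketch.
PUBLIC_COLUMNS = {
    "issues": ["issue_id", "status", "points"],
}


def export_clean_copy(source: sqlite3.Connection, dest: sqlite3.Connection) -> None:
    """Copy only allow-listed tables and columns from source into dest."""
    for table, columns in PUBLIC_COLUMNS.items():
        col_list = ", ".join(columns)
        rows = source.execute(f"SELECT {col_list} FROM {table}").fetchall()
        dest.execute(f"CREATE TABLE {table} ({col_list})")
        placeholders = ", ".join("?" for _ in columns)
        dest.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    dest.commit()


# Demo: the source has a sensitive column and an internal-only table.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE issues (issue_id INTEGER, status TEXT, points INTEGER, assignee_email TEXT)"
)
source.execute("INSERT INTO issues VALUES (1, 'closed', 3, 'person@example.com')")
source.execute("CREATE TABLE internal_notes (note TEXT)")

dest = sqlite3.connect(":memory:")
export_clean_copy(source, dest)

# Inspect what actually ended up in the public copy.
exported_tables = [r[0] for r in dest.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
exported_columns = [r[1] for r in dest.execute("PRAGMA table_info(issues)")]
```

An allow-list is used rather than a deny-list so that newly added columns stay private until someone deliberately publishes them.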

@coilysiren coilysiren requested a review from sumiat April 12, 2024 14:26
Collaborator

The mental model I'm working with is Tableau. You can share the underlying data and allow users to duplicate a visualization. Can you say more about what we're trying to achieve by giving external users access to a copy of our analytics database? Do we want external users to be able to:

  • duplicate the viz,
  • use the viz's data source to make their own unique vizzes, share, collaborate, discuss etc
  • use all data sources (index viz and others, including for instance, data we haven't visualized yet) to make their own unique vizzes, share, collaborate, discuss etc

Collaborator Author

@coilysiren coilysiren Apr 12, 2024

@ebuwa-evbuoma-fike it seems you may have missed the comment I made to Sumi about this same subject? It's here: #1677 (comment)

I personally don't think we should be investing time in allowing external users to duplicate our visualizations. I was not the one who wrote that requirement, so I don't understand its motivations.

@acouch or @widal001 might know more.

Collaborator

Thanks for linking the thread with @sumiat, @coilysiren.

Duplicating visualizations is a very common practice. It encourages users to explore, learn, critique and collaborate, which is germane to the open-source community we're trying to build. I do agree that it does not need to be the primary motivation of a BI tool. Granting users access to an approved copy of the data to explore (and share?) seems more important.

Collaborator

@widal001 widal001 Apr 15, 2024

@ebuwa-evbuoma-fike I can add some more context to this question:

> Can you say more about what we're trying to achieve by giving external users access to a copy of our analytics database? Do we want external users to be able to:
>
>   • duplicate the viz,
>   • use the viz's data source to make their own unique vizzes, share, collaborate, discuss etc
>   • use all data sources (index viz and others, including for instance, data we haven't visualized yet) to make their own unique vizzes, share, collaborate, discuss etc

Yeah these use cases definitely touch on the core goals related to this!

One example of why an open source tool paired with open data is super helpful: the current grants.gov project uses Tableau, and right now there isn't a way for other project collaborators to understand how some of the current metrics are calculated. The project also has a fixed set of Tableau licenses, making it difficult to give access even to internal maintainers.

Using an open source tool and making the underlying datasets available via API or csv files in a public S3 bucket would enable us to say to open source contributors:

  1. Access or download this data via API/S3
  2. Run this command to start up the same BI tool we use via Docker
  3. Run these queries (ideally committed to our repo) to reproduce the same charts/reports
  4. Experiment with some new metrics or analysis and then open up a PR with your SQL queries to reproduce them.
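
Steps 1, 3 and 4 above can be sketched in a few lines of Python. This is only an illustration, not the project's tooling: the CSV contents, the `issues` table, its columns, and the query are all made up, and an in-memory SQLite database stands in for whatever SQL database the BI tool would actually point at.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a CSV downloaded from a public bucket or API (step 1).
SAMPLE_CSV = """issue_id,status,points
1,closed,3
2,open,8
3,closed,2
"""

# Hypothetical stand-in for a query committed to the repo (step 3).
POINTS_BY_STATUS_SQL = """
SELECT status, SUM(points) AS total_points
FROM issues
GROUP BY status
ORDER BY status
"""


def load_csv(conn: sqlite3.Connection, csv_text: str) -> None:
    """Load the CSV rows into a table named `issues`."""
    conn.execute("CREATE TABLE issues (issue_id INTEGER, status TEXT, points INTEGER)")
    rows = csv.DictReader(io.StringIO(csv_text))
    # Named placeholders are filled from each row dict; INTEGER column
    # affinity converts the numeric strings from the CSV to integers.
    conn.executemany("INSERT INTO issues VALUES (:issue_id, :status, :points)", rows)


def run_report(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Run a committed query and return its rows."""
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
load_csv(conn, SAMPLE_CSV)
report = run_report(conn, POINTS_BY_STATUS_SQL)
print(report)
```

A contributor experimenting with a new metric (step 4) would add another SQL string next to `POINTS_BY_STATUS_SQL` and open a PR with it, so reviewers can rerun the same query against the same public data.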

- ✅ Easy-to-use UI for non-coders - Subjectively, the Metabase UI was found to be easy to use.
- ✅ Replicable for users outside of the project - Metabase is open-source and could be replicated by people outside the project by giving them access to a copy of our analytics database.
- Cost of ownership - The cost of running Metabase is the cost of running an appropriately sized AWS Fargate task 24/7. That works out to roughly $100/month.
- ✅ Ease of deployment - [Metabase provides an official docker image that we can run on AWS ECS](https://www.metabase.com/docs/latest/installation-and-operation/running-metabase-on-docker)
Collaborator

Suggest mentioning that it's not a managed service, so we would need to manage deployment (upgrades, security patches, etc.)

In comparison, AWS manages those aspects for QuickSight.

Collaborator Author

Since it's Dockerized, I would recommend we use the container as-is, without trying to apply security patches to it.

That means we just need to manage upgrades to the image version.

Collaborator Author

Updated: 9290916

Contributor

@sumiat sumiat left a comment

Left comments, thanks for responding, Kai! Metabase seems like the right approach for us right now. Others, like @widal001 and @acouch, may have more thoughts though! Thanks for putting this together.

@coilysiren coilysiren merged commit ab1fd97 into main Apr 16, 2024
1 check passed
@coilysiren coilysiren deleted the analytics-adr branch April 16, 2024 18:00
Successfully merging this pull request may close these issues.

[ADR]: Business Intelligence Tool for Dashboards