[Issue #1507] BI Tool ADR #1677
- ✅ Ability to connect to common data sources (S3, Redshift, Postgres) - [Metabase supports common data sources](https://www.metabase.com/data_sources/)
- ✅ Allows technical users to create ad hoc queries to create graphs - [Metabase supports creating a variety of visual types](https://www.metabase.com/learn/visualization/)
- ✅ Easy-to-use UI for non-coders - Subjectively, the Metabase UI was found to be easy to use.
- ✅ Replicable for users outside of the project - Metabase is open-source and
Was there more you wanted to add here?
It seems like Metabase will allow our open source contributors to replicate this? That’s great that we can do that with Metabase. I’m curious what that looks like. For example, would users outside of the project getting access slow down the querying at all? Do we need to set anything up to protect that from happening? You don’t necessarily have to add it to the ADR, but I’m just curious about it.
> Was there more you wanted to add here?
Yes I just finished the sentence 👍🏼
> I’m curious what that looks like.
For the record, I don't think we should invest in doing this at all. It seems like a non-ideal use of time. Specifically, the setup time and the risk of accidental data leakage outweigh the benefit.
That said, if we were to support people outside the project replicating our Metabase data, we shouldn't do it by giving them direct access to the database. That creates a massive security risk. We should do it by giving them a pre-cleaned dump of our analytics SQL database.
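To make "pre-cleaned" concrete, here is a minimal sketch of one way to do it: export only an explicit allowlist of columns, so anything not reviewed is dropped by default. The column names (`issue_id`, `status`, `email`) and the helper names are hypothetical; nothing in this PR specifies which fields exist or which are sensitive.

```python
import csv
import io

# Hypothetical allowlist: only columns reviewed as safe to publish.
# The real analytics schema is not described in this thread.
PUBLIC_COLUMNS = ["issue_id", "status"]


def scrub_rows(rows, public_columns=PUBLIC_COLUMNS):
    """Keep only allowlisted columns from each row (a dict per record)."""
    return [
        {col: row[col] for col in public_columns if col in row}
        for row in rows
    ]


def to_csv(rows, columns=PUBLIC_COLUMNS):
    """Serialize already-scrubbed rows to CSV text for a public dump."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    raw = [{"issue_id": 1, "status": "open", "email": "a@example.com"}]
    print(to_csv(scrub_rows(raw)))
```

An allowlist is used instead of a denylist so that a newly added column stays private until someone explicitly approves it.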
The mental model I'm working with is Tableau. You can share the underlying data and allow users to duplicate a visualization. Can you say more about what we're trying to achieve by giving external users access to a copy of our analytics database? Do we want external users to be able to:
- duplicate the viz,
- use the viz's data source to make their own unique vizzes, share, collaborate, discuss etc
- use all data sources (index viz and others, including for instance, data we haven't visualized yet) to make their own unique vizzes, share, collaborate, discuss etc
@ebuwa-evbuoma-fike it seems you may have missed the comment I made to Sumi about this same subject? It's here: #1677 (comment)
I personally don't think we should be investing time in allowing external users to duplicate our visualizations. I was not the one who wrote that requirement, so I don't understand its motivations.
Thanks for linking the thread with @sumiat, @coilysiren.
Duplicating visualizations is a very common practice - it encourages users to explore, learn, critique and collaborate, which is germane to the open-source community we're trying to build. I do agree that it does not need to be the primary motivation of a BI tool. Granting users access to an approved copy of the data to explore (and share?) seems more important.
@ebuwa-evbuoma-fike I can add some more context to this question:
> Can you say more about what we're trying to achieve by giving external users access to a copy of our analytics database? Do we want external users to be able to:
> - duplicate the viz,
> - use the viz's data source to make their own unique vizzes, share, collaborate, discuss etc
> - use all data sources (index viz and others, including for instance, data we haven't visualized yet) to make their own unique vizzes, share, collaborate, discuss etc
Yeah these use cases definitely touch on the core goals related to this!
One example of why an open source tool paired with open data is so helpful: the current grants.gov project uses Tableau, and right now there isn't a way for other project collaborators to understand how some of the current metrics are calculated. The project also has a fixed set of Tableau licenses, which makes it difficult to give access even to internal maintainers.
Using an open source tool and making the underlying datasets available via API or CSV files in a public S3 bucket would enable us to say to open source contributors:
- Access or download this data via API/S3
- Run this command to start up the same BI tool we use via Docker
- Run these queries (ideally committed to our repo) to reproduce the same charts/reports
- Experiment with some new metrics or analysis and then open up a PR with your SQL queries to reproduce them.
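As a rough sketch of the "run this command" step, the snippet below assembles a `docker run` invocation for the official `metabase/metabase` image, which listens on port 3000 inside the container. The function name, the container name, and the default `latest` tag are illustrative assumptions; this PR does not define an actual command or pin a version.

```python
from __future__ import annotations

import shlex


def metabase_docker_command(
    tag: str = "latest",      # assumption: a real setup would likely pin a version
    host_port: int = 3000,    # local port to expose Metabase on
    name: str = "metabase",   # hypothetical container name
) -> list[str]:
    """Build the argument list for starting Metabase locally via Docker."""
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:3000",  # map host port to the container's port 3000
        "--name", name,
        f"metabase/metabase:{tag}",
    ]


if __name__ == "__main__":
    # Print the command so a contributor can copy it into a shell.
    print(shlex.join(metabase_docker_command()))
```

Keeping the tag as a parameter means the same helper could document whichever image version the project deploys, so contributors reproduce charts against a matching Metabase release.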
- ✅ Easy-to-use UI for non-coders - Subjectively, the Metabase UI was found to be easy to use.
- ✅ Replicable for users outside of the project - Metabase is open-source and could be replicated by people outside the project by giving them access to a copy of our analytics database.
- Cost of ownership - The cost of running Metabase is the cost of running an appropriately sized AWS Fargate task 24/7. That cost works out to about ~$100/month.
- ✅ Ease of deployment - [Metabase provides an official docker image that we can run on AWS ECS](https://www.metabase.com/docs/latest/installation-and-operation/running-metabase-on-docker)
Suggest mentioning that it's not a managed service, so we would need to manage deployment (upgrades, security patches, etc.).
In comparison, AWS manages those aspects for QuickSight.
Since it's Dockerized, I would recommend we use the container as-is, without trying to apply security patches to it.
That means we just need to manage upgrades to the image version.
Updated: 9290916
Summary
Fixes #1507