[Issue #1507] BI Tool ADR #1677

Merged: 7 commits merged on Apr 16, 2024
`documentation/decisions/adr/2024-04-10-dashboard-tool.md`: 84 changes (84 additions, 0 deletions)
Collaborator

The mental model I'm working with is Tableau. You can share the underlying data and allow users to duplicate a visualization. Can you say more about what we're trying to achieve by giving external users access to a copy of our analytics database? Do we want external users to be able to:

  • duplicate the viz,
  • use the viz's data source to make their own unique vizzes, share, collaborate, discuss etc
  • use all data sources (index viz and others, including for instance, data we haven't visualized yet) to make their own unique vizzes, share, collaborate, discuss etc

Collaborator Author: @coilysiren, Apr 12, 2024

@ebuwa-evbuoma-fike it seems you may have missed the comment I made to Sumi about this same subject? It's here: #1677 (comment)

I personally don't think we should be investing time in allowing external users to duplicate our visualizations. I was not the one who wrote that requirement, so I don't understand its motivations.

@acouch or @widal001 might know more.

Collaborator

Thanks for linking the thread with @sumiat, @coilysiren.

Duplicating visualizations is a very common practice: it encourages users to explore, learn, critique, and collaborate, which is germane to the open-source community we're trying to build. I do agree that it does not need to be the primary motivation for a BI tool. Granting users access to an approved copy of the data to explore (and possibly share) seems more important.

Collaborator: @widal001, Apr 15, 2024

@ebuwa-evbuoma-fike I can add some more context to this question:

> Can you say more about what we're trying to achieve by giving external users access to a copy of our analytics database? Do we want external users to be able to:
>
>   • duplicate the viz,
>   • use the viz's data source to make their own unique vizzes, share, collaborate, discuss etc
>   • use all data sources (index viz and others, including for instance, data we haven't visualized yet) to make their own unique vizzes, share, collaborate, discuss etc

Yeah these use cases definitely touch on the core goals related to this!

To give an example of why an open source tool paired with open data is super helpful: the current grants.gov project uses Tableau, and right now there isn't a way for other project collaborators to understand how some of the current metrics are calculated. The project also has a fixed set of Tableau licenses, which makes it difficult to give access even to internal maintainers.

Using an open source tool and making the underlying datasets available via API or CSV files in a public S3 bucket would enable us to say to open source contributors:

  1. Access or download this data via API/S3
  2. Run this command to start up the same BI tool we use via Docker
  3. Run these queries (ideally committed to our repo) to reproduce the same charts/reports
  4. Experiment with some new metrics or analysis and then open up a PR with your SQL queries to reproduce them.
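A minimal sketch of steps 1 and 3, assuming the published data is a CSV file and the committed query is plain SQL. The CSV contents, the `issues` table, its columns (`sprint`, `points`, `status`), and the query are all hypothetical, and Python's standard `sqlite3` module stands in for whatever database the BI tool would actually connect to.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a CSV downloaded from the public S3 bucket (step 1).
SAMPLE_CSV = """issue_id,sprint,points,status
1,Sprint 1,3,done
2,Sprint 1,5,open
3,Sprint 2,2,done
4,Sprint 2,8,done
"""

# Hypothetical query of the kind that would be committed to the repo (step 3).
COMPLETED_POINTS_SQL = """
SELECT sprint, SUM(points) AS completed_points
FROM issues
WHERE status = 'done'
GROUP BY sprint
ORDER BY sprint
"""


def load_issues(conn: sqlite3.Connection, csv_text: str) -> None:
    """Load the CSV rows into an `issues` table on the given connection."""
    conn.execute(
        "CREATE TABLE issues (issue_id INTEGER, sprint TEXT, points INTEGER, status TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    # Each row is a dict, so it binds directly to the named placeholders;
    # SQLite's INTEGER affinity converts the numeric strings to integers.
    conn.executemany(
        "INSERT INTO issues VALUES (:issue_id, :sprint, :points, :status)",
        rows,
    )


def reproduce_metric(csv_text: str, sql: str) -> list[tuple]:
    """Run a committed SQL query against the published data and return its rows."""
    conn = sqlite3.connect(":memory:")
    try:
        load_issues(conn, csv_text)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for sprint, points in reproduce_metric(SAMPLE_CSV, COMPLETED_POINTS_SQL):
        print(sprint, points)
```

Because the query lives in a file rather than inside a licensed dashboard, a contributor can rerun it, tweak it, and propose the change in a PR (step 4).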

@@ -0,0 +1,84 @@
# Dashboard Tool

- **Status:** Active
- **Last Modified:** 2024-04-10
- **Related Issue:** [#1507](https://github.com/HHS/simpler-grants-gov/issues/1507)
- **Deciders:** Aaron Couch, Billy Daly

## Context and Problem Statement

We are looking to implement a BI (Business Intelligence) tool for Simpler. The BI tool will be the centerpiece of our "Delivery Dashboard" work.
A BI tool is software designed to analyze, process, and visualize large volumes of data to help organizations make informed decisions. These tools gather data from various sources, including databases, spreadsheets, and cloud services, and transform it into actionable insights through reports, dashboards, and interactive visualizations. BI tools often include features such as data querying, data mining, statistical analysis, and predictive modeling to uncover trends, patterns, and correlations within the data.

Adopting a BI tool will be instrumental in optimizing decision-making processes and enhancing our delivery practices. BI tools enable agencies to analyze vast amounts of data efficiently, helping to identify trends, patterns, and areas for improvement. By harnessing the power of BI, we can improve resource allocation, monitor program effectiveness, and ensure transparency and accountability in our operations. Furthermore, BI tools facilitate evidence-based decision making by providing us with timely and accurate insights into our needs and trends. Leveraging BI will empower Simpler to better serve citizens, drive efficiencies, and achieve our project goals.

## Desired Solution

We will evaluate the BI tool based on the following capabilities and attributes:

- Ability to share public dashboards
- Ability to show private dashboards to selected users
- Ability to connect to common data sources (S3, Redshift, Postgres)
- Allows technical users to create ad hoc queries to create graphs
- Easy-to-use UI for non-coders
- Replicable for users outside of the project
- Cost of ownership
- Ease of deployment
- Simple account creation

## Solution Options

The possible solution space here is quite large, but we narrowed it down to 5 options, only 2 of which are thoroughly evaluated in this ADR in the interest of time. All 5 options we considered are listed below.

- AWS QuickSight - evaluated below
- Metabase - evaluated below
- Tableau
- Redash
- Apache Superset

### AWS QuickSight

> AWS QuickSight is a cloud-based Business Intelligence (BI) service provided by Amazon Web Services (AWS). It enables users to easily create and share interactive dashboards and visualizations from various data sources, including AWS services, databases, and third-party applications. QuickSight offers features such as ad-hoc analysis, machine learning-powered insights, and seamless integration with AWS services like Amazon Redshift, Amazon RDS, and Amazon S3. It provides users with the ability to explore data through drag-and-drop interfaces, create custom visualizations, and perform advanced analytics without requiring extensive technical expertise. With pay-as-you-go pricing and scalability, QuickSight offers an accessible and cost-effective solution for organizations looking to harness the power of BI in the cloud.

Here's how QuickSight evaluates against our criteria:

- ✅ Ability to share public dashboards - [AWS QuickSight supports public dashboards](https://docs.aws.amazon.com/quicksight/latest/user/embedded-analytics-1-click-public.html)
- ✅ Ability to show private dashboards to selected users - [AWS QuickSight supports access controlled dashboards](https://docs.aws.amazon.com/quicksight/latest/user/sharing-a-dashboard.html)
- ✅ Ability to connect to common data sources (S3, Redshift, Postgres) - [AWS QuickSight supports common data sources](https://docs.aws.amazon.com/quicksight/latest/user/supported-data-sources.html)
- ✅ Allows technical users to create ad hoc queries to create graphs - [AWS QuickSight supports creating a variety of visual types](https://docs.aws.amazon.com/quicksight/latest/user/working-with-visual-types.html)
- ✅ Easy-to-use UI for non-coders - Subjectively, the AWS QuickSight UI was found to be easy to use.
- ❌ Replicable for users outside of the project - AWS QuickSight is not open source, so its results can only be replicated by people with access to our AWS account.
- Cost of ownership - A rough estimate puts AWS QuickSight at ~$300/month for our number of users. [Pricing page.](https://aws.amazon.com/quicksight/pricing/)
- ✅✅ Ease of deployment - [AWS QuickSight can be deployed via Terraform](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/quicksight_account_subscription). The entire deployment would be AWS managed, so we would not need to manage it in any way.
- ❌ Simple account creation - [AWS QuickSight users must be deployed via Terraform or the AWS console. These users require an associated IAM user to be created.](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/quicksight_user)

### Metabase

> Metabase is an open-source Business Intelligence (BI) tool that enables users to easily query, visualize, and share insights from their data. It offers a user-friendly interface that allows users to create and customize dashboards and visualizations without the need for advanced technical skills. Metabase supports various data sources, including SQL databases like MySQL and PostgreSQL, NoSQL databases like MongoDB, and cloud data warehouses like Google BigQuery and Amazon Redshift. With features such as SQL querying, interactive dashboards, and natural language querying, Metabase empowers users to explore and understand their data in a flexible and intuitive way. Additionally, being open-source, Metabase allows for community contributions and customization, making it a popular choice for organizations seeking a cost-effective and customizable BI solution.

Here's how Metabase evaluates against our criteria:

- ✅ Ability to share public dashboards - [Metabase supports public dashboards](https://www.metabase.com/docs/latest/questions/sharing/public-links)
- ✅ Ability to show private dashboards to selected users - [Metabase supports access controlled dashboards](https://www.metabase.com/learn/administration/guide-to-sharing-data)
- ✅ Ability to connect to common data sources (S3, Redshift, Postgres) - [Metabase supports common data sources](https://www.metabase.com/data_sources/)
- ✅ Allows technical users to create ad hoc queries to create graphs - [Metabase supports creating a variety of visual types](https://www.metabase.com/learn/visualization/)
- ✅ Easy-to-use UI for non-coders - Subjectively, the Metabase UI was found to be easy to use.
- ✅ Replicable for users outside of the project - Metabase is open-source, so people outside the project could replicate our dashboards if given access to a copy of our analytics database.
- Cost of ownership - The cost of running Metabase is the cost of running an appropriately sized AWS Fargate task 24/7, which works out to ~$100/month.
- ✅ Ease of deployment - [Metabase provides an official docker image that we can run on AWS ECS](https://www.metabase.com/docs/latest/installation-and-operation/running-metabase-on-docker). This ECS service would be managed by us, so we would be responsible for managing upgrades to the service.
- ✅ Simple account creation - [Metabase uses simple username and password accounts](https://www.metabase.com/docs/latest/configuring-metabase/setting-up-metabase)
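To make "run the official Docker image on ECS" more concrete, here is a minimal sketch of a Fargate task definition for Metabase, expressed as the keyword arguments that boto3's `ecs` client `register_task_definition` accepts. It assumes the official `metabase/metabase` image on its default port 3000 and uses Metabase's `MB_DB_*` environment variables to keep Metabase's own application data in Postgres. The family name, database name and user, CPU/memory sizing, and all ARNs are hypothetical placeholders, not values from this ADR.

```python
# Sketch only: builds arguments for boto3's ecs register_task_definition.
# All names, sizes, and ARNs passed in or defaulted here are hypothetical.


def metabase_task_definition(
    db_host: str,
    db_password_secret_arn: str,
    execution_role_arn: str,
    image_tag: str,
    cpu: str = "1024",  # 1 vCPU
    memory: str = "2048",  # 2 GiB, a valid Fargate pairing for 1 vCPU
) -> dict:
    """Return a Fargate task definition for the official Metabase image."""
    return {
        "family": "metabase",  # hypothetical task family name
        "networkMode": "awsvpc",  # required for Fargate tasks
        "requiresCompatibilities": ["FARGATE"],
        "cpu": cpu,
        "memory": memory,
        # Needed so ECS can pull the image and read the secret below.
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": "metabase",
                # Pin a specific tag so upgrades are deliberate.
                "image": f"metabase/metabase:{image_tag}",
                "essential": True,
                # Metabase serves its web UI on port 3000 by default.
                "portMappings": [{"containerPort": 3000, "protocol": "tcp"}],
                # Store Metabase's application data in Postgres instead of
                # the default embedded H2 database file.
                "environment": [
                    {"name": "MB_DB_TYPE", "value": "postgres"},
                    {"name": "MB_DB_HOST", "value": db_host},
                    {"name": "MB_DB_PORT", "value": "5432"},
                    {"name": "MB_DB_DBNAME", "value": "metabase"},
                    {"name": "MB_DB_USER", "value": "metabase"},
                ],
                # Inject the password from Secrets Manager, not plain text.
                "secrets": [
                    {"name": "MB_DB_PASS", "valueFrom": db_password_secret_arn}
                ],
            }
        ],
    }
```

With AWS credentials configured, the result could be passed as `boto3.client("ecs").register_task_definition(**metabase_task_definition(...))`. Upgrading Metabase then means registering a new revision with a newer `image_tag`, which is the upgrade responsibility the bullet above describes.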

### QuickSight and Metabase compared

Metabase's simpler account creation and open-source nature give it a slight edge over AWS QuickSight. That said, both tools clearly satisfy the majority of our decision criteria, and either would be a good choice to implement.
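As a rough cross-check of this comparison, the sketch below tallies the criteria each tool is marked as meeting in the two evaluation lists (counting ✅✅ once, and leaving out "Cost of ownership", which neither list marks) and annualizes the two rough monthly cost estimates. It only restates this ADR's own ratings; it is not a scoring model the deciders used.

```python
# Ratings copied from the evaluation lists in this ADR:
# True = marked ✅ (or ✅✅), False = marked ❌.
RATINGS = {
    "AWS QuickSight": {
        "public dashboards": True,
        "private dashboards": True,
        "common data sources": True,
        "ad hoc queries": True,
        "easy UI for non-coders": True,
        "replicable outside the project": False,
        "ease of deployment": True,
        "simple account creation": False,
    },
    "Metabase": {
        "public dashboards": True,
        "private dashboards": True,
        "common data sources": True,
        "ad hoc queries": True,
        "easy UI for non-coders": True,
        "replicable outside the project": True,
        "ease of deployment": True,
        "simple account creation": True,
    },
}

# Rough monthly cost estimates from this ADR, in USD.
MONTHLY_COST_USD = {"AWS QuickSight": 300, "Metabase": 100}


def criteria_met(tool: str) -> int:
    """Count the marked criteria a tool satisfies (True sums as 1)."""
    return sum(RATINGS[tool].values())


def annual_cost(tool: str) -> int:
    """Annualize the rough monthly cost estimate."""
    return MONTHLY_COST_USD[tool] * 12


if __name__ == "__main__":
    for tool, ratings in RATINGS.items():
        print(
            f"{tool}: {criteria_met(tool)}/{len(ratings)} criteria, "
            f"~${annual_cost(tool):,}/year"
        )
```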

## Decision

This ADR supports Metabase as our chosen BI tool.

## Links

- [Best BI tools for startups: How to choose a BI tool](https://www.airops.com/blog/best-bi-tools-for-startups-how-to-choose-a-bi-tool)
- [Metabase vs QuickSight Comparison](https://www.restack.io/docs/metabase-knowledge-metabase-vs-quicksight-comparison)
- [Amazon Quicksight vs Metabase](https://stackshare.io/stackups/amazon-quicksight-vs-metabase)