v2 Reports Dashboard (First Draft) #1197

EchoProject · 2022-05-13T01:07:20Z

Overview

We need to create a first-draft Reports Dashboard with summary statistics and basic analytics so that we can build an active version using Plotly Dash.

Additional note:
Then we will show users how they can derive insights to create Service Request-based initiatives. Finally, we can get feedback on improving the dashboard feature set.

Action Items

Draft a "wireframe" model using BI Tool
Gather feedback from 311-Data Team
Gather feedback from Data Science Community of practice
Gather feedback from Seymour Liao
Implement actual v2 Dashboard with 1 month Data for sample purposes
Beta-test dashboard with Neighborhood Council

Features

Filter by NC, Districts, Request Types
Summary statistics and visualizations by NC
Comparison statistics and visualizations by NC (e.g. request share, time distributions by NC, completed vs. closed request counts, etc.)
Provide background context for all statistics, visualizations, and data involved.

Resources/Instructions

Dev Version Dashboard: https://dev.311-data.org/reports/dashboards/overview
[EPIC] Data Science Analytics Dashboard (Reports) H1 2023 #1378

nichhk · 2022-05-13T03:12:37Z

Thanks for working on this Josh!

I think it would be interesting to show the distribution of time-to-close for each request type. E.g., if the distribution is bimodal, this can mean that a certain request type may have two subtypes, where one is easier to fix than the other. It might also capture that issue Bonnie mentioned where certain issues are closed after a while without actually being resolved (?). It might also be informative to overlap distributions per NC, to see how quickly issues are resolved in different NCs.
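As a rough illustration of the per-type distribution idea, here is a minimal sketch that bins time-to-close values by request type using only the standard library. The function name, the one-day bin width, and the sample records are assumptions made for illustration; they are not taken from the 311 Data code base.

```python
from collections import Counter, defaultdict


def time_to_close_histograms(requests, bin_width_days=1.0):
    """Return {request_type: Counter({bin_start_day: count})}."""
    histograms = defaultdict(Counter)
    for request_type, days_to_close in requests:
        # Floor each duration to the start of its bin, e.g. 2.7 days -> 2.0.
        bin_start = (days_to_close // bin_width_days) * bin_width_days
        histograms[request_type][bin_start] += 1
    return dict(histograms)


# Hypothetical sample records: (request type, time-to-close in days).
sample = [
    ("Graffiti Removal", 0.5), ("Graffiti Removal", 0.8),
    ("Graffiti Removal", 9.2), ("Graffiti Removal", 9.9),
    ("Bulky Items", 3.1),
]
hist = time_to_close_histograms(sample)
```

In this made-up sample, the first type has two separate clusters of bins, which is the kind of bimodal shape that might hint at two subtypes. The same grouping could be keyed by NC instead of request type to overlay distributions per NC.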

joshuayhwu · 2022-05-13T04:27:43Z

Thanks for the suggestion Nich!

I think adding comparison would be extremely helpful - will put that as a feature to incorporate!

joshuayhwu · 2022-05-20T00:37:17Z

"Wireframe" with Power BI to gather feedback

Neighborhood Council Summary

Filter by name, date-range, and request type
Indicator visuals for total number of requests, average time-to-close requests (days), maximum time-to-close requests (days)
Pie chart for share of request type
Distribution of request throughout the day by hour

Data by Police Precinct

Filter by police precinct and request create date
Indicator visual for total number of requests, average time-to-close requests (days)
Bar chart for request sources
Request type frequency

joshuayhwu · 2022-05-20T00:46:54Z

Feedback from 311-Data Team

Incorporate distribution of time-to-close for each request type
Flag requests that have absurd time-to-close (Ignored Requests?)
Include page / tabs with summary and comparison between NC

joshuayhwu · 2022-05-20T00:54:02Z

Feedback from DS Community of Practice

Incorporate text-based descriptions (i.e. context) for each statistic
As pandas won't scale, use dask instead
Use monthly data only to prototype what the dashboard would look like

joshuayhwu · 2022-05-20T01:01:10Z

Plotly MVP Dashboard v1.1

Filter by NC
Time series of 311 requests over time
Pie chart for share of request type
Histogram for distribution of request time-to-close

Filter by NCs (Side-by-side comparison)
Indicator visual for total number of requests and ignored requests (requests completed < 1 day)
Bar chart for Number of requests by sources
Overlaid Time series for time-to-close (unfinished)

Todo:
High Priority:

Add exclusion filter (exclude particular filter types)
Try to add filter that excludes data with data quality issues
Clean up the user interface of the Plotly dashboard

Low Priority

Add Request Type filter
Adding more Divs to make sure visuals are not as stretched horizontally
Add external style sheet to style titles and other html components
Complete overlay time series for time-to-close, or work on alternative visual
Incorporate text / annotation of visual to provide context
Consult 311-Data team and Data Science CoP for additional features / comments, revise for new version

nichhk · 2022-05-23T20:08:13Z

The team took a look at Josh's updates on Thursday, here's what I remember discussing for the record:

being able to select arbitrary sets of request types, like you can on the site, would be useful (e.g., "bulky items" is almost half of the data, but "bulky items" is generally not a quality-of-life issue, so NC members might want to filter those out)
"Ignored Requests" might not be worth surfacing to users. It's hard to find an accurate name for this; "ignored" kind of suggests that the city just ignored these requests, and I can't really think of anything better. It might be better to just have a small question mark on the "Total number of requests" box where we can show this info to very curious users.
There are two dimensions in which we can analyze this data: 1) use it to understand quality-of-life issues in different NCs; 2) use it to identify issues in bookkeeping and data management by the city teams that are handling the requests. For 1, it's not particularly useful to see requests that have data quality issues (i.e., time-to-close is super short or super long). For 2, it is. So we can implement a toggle that filters out requests with data quality issues (it would default to "on").

joshuayhwu · 2022-05-27T00:11:19Z

Plotly MVP Dashboard v1.2

Summary Dashboard

Visuals:

Line Chart: Total number of 311 requests over the time range defined by the earliest and latest request create dates. This shows which time range has the most requests.
Pie Chart: Share of each request type based on the data available. This shows which kinds of request have the highest/lowest demand in a particular neighborhood council.
Histogram: Distribution of request time-to-close. This shows how long it takes for each request to complete (proxied by the request being closed) as a distribution.

Features:

Selecting an individual neighborhood council
Removing particular request types
Data Quality Toggle to filter data with quality issues (where the time to close is less than 1 day or longer than 100 days)

Comparison Dashboard

Visuals:

Indicator Visuals: Total number of requests and the number of days of the data available
Bar Chart: Number of requests by source. This shows the variety of mediums through which individuals make requests.
Line Chart: Comparison of the total number of 311 requests

Changes from before:

Added exclusion filter that achieves the following functionalities:

Remove one or more request types for the summary dashboard
Exclusion filter request type options depend on the NC selected; if no NC is selected, all request types in the entire dataset are offered
The exclusion filter "freezes" at the last remaining request type, i.e. the dashboard prevents the user from removing every request type from the display
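The "freeze" behavior described above could be sketched roughly as follows. This is an illustrative assumption about the logic, not the dashboard's actual code; `apply_exclusions` and the placeholder type names are hypothetical.

```python
def apply_exclusions(available_types, excluded_types):
    """Return the request types left to display after applying exclusions.

    Exclusions are applied in order, and an exclusion is ignored once only
    one type remains, so the result is never empty for a non-empty input.
    """
    remaining = list(available_types)
    for request_type in excluded_types:
        if request_type in remaining and len(remaining) > 1:
            remaining.remove(request_type)
    return remaining


# Placeholder type names for illustration only.
partial = apply_exclusions(["A", "B", "C"], ["A"])
frozen = apply_exclusions(["A", "B"], ["A", "B"])
```

Here `partial` keeps the two types that were not excluded, while `frozen` keeps the last remaining type even though the user tried to exclude both.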

UI

Chose 'Open Sans' as default font to stay consistent with plotly visuals
Adjusted font size to accommodate the text-to-div ratio
Added spaces between different divs

Comparison plots

Added overlapping line charts for the number of requests throughout the day

Data Quality Toggle

Added data quality toggle to filter out data that are considered "bad" (request timeToClose less than 1 day or longer than 100 days)
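A minimal sketch of the toggle logic, assuming time-to-close is expressed in days and that values exactly at the 1-day and 100-day limits are kept (the comment does not say how boundary values are treated). The function and constant names are hypothetical.

```python
MIN_DAYS = 1    # below this, time-to-close is treated as "bad"
MAX_DAYS = 100  # above this, time-to-close is treated as "bad"


def filter_by_quality(times_to_close, toggle_on=True):
    """Drop values outside [MIN_DAYS, MAX_DAYS] when the toggle is on."""
    if not toggle_on:
        return list(times_to_close)
    return [t for t in times_to_close if MIN_DAYS <= t <= MAX_DAYS]


sample_days = [0.2, 1, 50, 100, 250]
kept = filter_by_quality(sample_days)
unfiltered = filter_by_quality(sample_days, toggle_on=False)
```

Defaulting `toggle_on` to `True` follows the earlier suggestion in this thread that the filter should be on by default.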

nichhk · 2022-06-02T05:21:01Z

Thanks for these updates Josh!

Re: remove one or more request types: I think it might be more intuitive to make this the opposite, i.e., select one or more request types. This will better align with the map functionality as well.

Re: Data Quality Toggle: This looks super useful! May I ask how you chose the thresholds for "bad"? This might be a situation where we might have to combine some statistical analysis and also get input from City folks.

In terms of statistical analysis, I think there are several ways to detect outliers. One way is applying something like a z-score range.
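For reference, a z-score range check might look like the sketch below. The threshold of 3 standard deviations is a common convention chosen here as an assumption, not a value proposed in this thread, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical time-to-close values in days: twenty typical, one extreme.
flagged = zscore_outliers([2.0] * 20 + [100.0])
```

Z-scores assume the data is roughly symmetric, so for heavily skewed time-to-close values a transform or a quantile-based rule may behave better.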

But I think we also need help from the City to understand what acceptable timeToCloses are. It might be perfectly ok for a timeToClose to be like, 10min, for example, if it's a duplicate of another request.

joshuayhwu · 2022-06-02T16:57:28Z

Thanks for the feedback Nich!

I have implemented the selection by request type functionality, but unfortunately I discovered another bug. After implementing the dependent dropdown (the type dropdown only shows the types available in a particular NC), the visuals wouldn't update when only the NC dropdown is selected. Surprisingly, visuals are updated when only the type dropdown is selected. This part is still under investigation.
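The dependent dropdown boils down to computing the type options from the currently selected NC. A minimal sketch of that step, assuming records are available as `(nc_name, request_type)` pairs; the function name, the record shape, and the NC names are hypothetical.

```python
def request_type_options(records, selected_nc=None):
    """Return sorted request types for the selected NC, or all types if none."""
    types = {
        request_type
        for nc_name, request_type in records
        if selected_nc is None or nc_name == selected_nc
    }
    return sorted(types)


# Hypothetical records for illustration.
records = [
    ("NC A", "Bulky Items"),
    ("NC A", "Graffiti Removal"),
    ("NC B", "Bulky Items"),
]
options_for_b = request_type_options(records, "NC B")
all_options = request_type_options(records)
```

In a Dash app, a function like this would typically sit inside a callback that takes the NC dropdown value as input and writes the type dropdown's options as output.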

EDIT: fixed this bug - it was sloppy logic on my part. But another error, about minimum series length, occurred.
EDIT2: Figured out what was happening. Essentially, my filtering logic filters rows by selection, and in some cases the filtering removes all rows from the dataset, causing an error to show, as I didn't specify what should happen to the visuals when there is no data.
EDIT3: Raise PreventUpdate() exception
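A sketch of the empty-data guard, assuming rows are dictionaries with hypothetical `nc` and `type` keys. `dash.exceptions.PreventUpdate` is the real Dash exception; the fallback class exists only so the sketch runs where Dash is not installed.

```python
try:
    from dash.exceptions import PreventUpdate
except ImportError:
    # Fallback so this sketch still runs without Dash installed.
    class PreventUpdate(Exception):
        pass


def filter_rows(rows, selected_nc, selected_types):
    """Filter rows by NC and request type; refuse to return an empty result."""
    filtered = [
        row for row in rows
        if row["nc"] == selected_nc and row["type"] in selected_types
    ]
    if not filtered:
        # Inside a Dash callback this leaves the current figures untouched
        # instead of trying to draw a chart from zero rows.
        raise PreventUpdate
    return filtered


# Hypothetical rows for illustration.
rows = [
    {"nc": "NC A", "type": "Bulky Items"},
    {"nc": "NC B", "type": "Graffiti Removal"},
]
matched = filter_rows(rows, "NC A", ["Bulky Items"])
```

An alternative design would be to return an empty placeholder figure with a "no data" annotation, which gives the user visible feedback rather than silently keeping the old visuals.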

In terms of data quality, I essentially eyeballed the values based on the visualization. I have now defined outliers by first applying a log transform and then taking the median ± 1.5*IQR, since the data is skewed (before and after the log transform).
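The rule as stated (log transform, then median ± 1.5*IQR) could be sketched like this using only the standard library. Dropping non-positive values before the log, the "inclusive" quantile method, and the function names are all assumptions; the original implementation may differ.

```python
import math
from statistics import quantiles


def log_iqr_bounds(times_to_close):
    """Return (low, high) bounds in original units from the log-scale rule."""
    # The log is undefined for zero or negative durations, so skip them.
    logs = [math.log(t) for t in times_to_close if t > 0]
    q1, median, q3 = quantiles(logs, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    return math.exp(median - spread), math.exp(median + spread)


def remove_outliers(times_to_close):
    """Keep only positive values that fall inside the log-scale bounds."""
    low, high = log_iqr_bounds(times_to_close)
    return [t for t in times_to_close if t > 0 and low <= t <= high]


# Hypothetical time-to-close values in days.
cleaned = remove_outliers([1, 2, 2, 3, 3, 3, 4, 4, 5, 1000])
```

Note that centering the band on the median gives tighter bounds than the more common Tukey fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR), so in this sample both the smallest and the largest value are dropped.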

Raw Distribution

After removing outliers based on rule above

I agree we need to talk to the City if possible. One thing I noticed is that some rows have a missing createDate / closeDate, causing timeToClose to be empty (which I replace with 0). Some rows also have a negative timeToClose; these are definitely data quality issues we need to investigate and constrain upstream.
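One possible way to surface these cases explicitly, instead of folding missing values into 0, is to return an issue label alongside the computed duration. This is a sketch of an alternative, not what the dashboard currently does; the function name and the issue labels are hypothetical.

```python
from datetime import datetime


def time_to_close_days(created, closed):
    """Return (days, issue): issue is None when the row looks valid."""
    if created is None or closed is None:
        # Leave the duration empty rather than substituting 0.
        return None, "missing_date"
    days = (closed - created).total_seconds() / 86400
    if days < 0:
        return days, "negative_duration"
    return days, None


valid = time_to_close_days(datetime(2022, 5, 1), datetime(2022, 5, 3))
missing = time_to_close_days(None, datetime(2022, 5, 3))
negative = time_to_close_days(datetime(2022, 5, 3), datetime(2022, 5, 1))
```

Keeping missing and negative durations distinct from genuine zero-day closures would also make it easier to count each kind of issue when raising them with the City.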

joshuayhwu · 2022-06-09T07:44:08Z

@ExperimentsInHonesty thanks for clarifying the context for the dashboards MVP last week. Would love some feedback from you on this version. Please note the following:

Plotly Dash doesn't support pre-defined groups in the dropdown lists (i.e. you cannot select one region, but must select the individual NCs in the region). It is possible to select multiple NCs at the same time, but for now I'm keeping things simple.
Descriptions on the visualizations will be a later feature. Visualizations are designed to be as simple as possible, and I don't want to assume data illiteracy.
You mentioned some requests being closed ridiculously early (i.e. in less than 10 minutes or a day). Nich and I discussed this and thought it best to treat it as a data quality issue rather than instantly flag it as problematic; we need to talk to the people generating this data before drawing a conclusion.

NC Summary

Visuals:

Line Chart: Total number of 311 requests over the time range defined by the earliest and latest request create dates. This shows which time range has the most requests.
Pie Chart: Share of each request type based on the data available. This shows which kinds of request have the highest/lowest demand in a particular neighborhood council.
Histogram: Distribution of request time-to-close. This shows how long it takes for each request to complete (proxied by the request being closed) as a distribution.

Features:

Selecting an individual neighborhood council
Selecting one or more request types
Data Quality Toggle to filter data with quality issues (where the time-to-close is an outlier)

NC Comparison

Visuals:

Indicator Visuals: Total number of requests and the number of days of the data available
Bar Chart: Number of requests by source. This shows the variety of mediums through which individuals make requests.
Line Chart: Comparison of the total number of 311 requests

Features:

Compare the total number of requests and date range between NCs
Compare how individuals make 311 requests between the two NCs
Compare the number of requests throughout the day for both NCs

joshuayhwu · 2022-06-12T04:35:48Z

See my public repo for the integrated dashboard file and instructions for running it locally. I only used the new versions of Dash and Docker.

I am currently looking to integrate the code into the 311 Data code base. There is an issue with the newer versions of the Dash / gunicorn / Docker interaction: none of the callback functions work, i.e. there is no response to any interaction on the dashboards. Will try to figure this out in the next few days.

nichhk · 2022-06-14T20:05:18Z

Thanks Josh! In your repo, can you put in the unzipped files instead of the zip so that people can browse the code without downloading? Let me know if you need help with debugging the interaction issue.

joshuayhwu · 2022-06-15T00:32:24Z

Thanks, I put the unzipped files in the public repo. Would appreciate some help whenever you're available, but I'll continue working on it and see if I can replicate the error.

EDIT: Turns out the problem resolves just by adding flask. Seems like Gunicorn does not go well with Dash

joshuayhwu · 2022-06-20T02:37:33Z

I have summarized some of the feedback I received on the current version of the dashboard:

Regarding the data quality issue, Bonnie had a wonderful insight for one possible decision rule. When the requestSource is driver self report and the time-to-close of the request is 0, it is likely the driver simply closed the request instantly and then proceeded to work on the request (or not). This is one potential decision rule that we could implement.
The current colors of the Plotly Dash dashboards are unfavorable for neighborhood councils to use in formal publications (e.g. newsletters). Will use the default Dash colors for the Plotly dashboards from now on.
The current Plotly dashboards don't take into account how our end users will use them, i.e. downloading individual visualizations and printing the dashboard page as a whole. Ideally, each visualization should have a title, corresponding axis labels, and a correct scale. Each dashboard should also be optimized for the "printed" layout.
Will confirm the following again in the next meeting: consistent with Nich's comments on combining dashboards, I propose to combine the recent dashboards with the overall dashboards. More specifically, the neighborhood dashboard could be combined with neighborhood_recent; the overview and recent dashboards could be deprecated due to redundancy with the current prototype; types_map could be combined with another dashboard (perhaps the one Piero is working on?); and the current prototype will be the final one. That means there would be 3 dashboards in total: neighborhood, current prototype, and types_map.
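The decision rule from Bonnie's feedback in the first item above could be expressed as a simple predicate. The exact string stored in requestSource is an assumption here, as is the function name; the real field values should be checked against the data.

```python
# Assumed spelling of the source value; verify against the actual dataset.
DRIVER_SELF_REPORT = "Driver Self Report"


def is_instant_self_report(request_source, time_to_close_days):
    """Flag requests a driver opened and closed with zero elapsed time."""
    return request_source == DRIVER_SELF_REPORT and time_to_close_days == 0


flagged = is_instant_self_report("Driver Self Report", 0)
not_flagged = is_instant_self_report("Mobile App", 0)
```

Requests flagged this way could feed the data quality toggle instead of, or in addition to, a purely statistical outlier rule.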

joshuayhwu · 2022-07-03T20:42:05Z

Updated Overview Dashboard Pt 1

Updated Overview Dashboard Pt 2

EchoProject added the labels Role: Data Science (Data management, loading, or analysis), Size: 8pt (Can be done in 31-48 hours), P-feature: Reports, and P-feature: Analytics on May 13, 2022

EchoProject added this to the v2.1 Launch milestone May 13, 2022

EchoProject assigned joshuayhwu May 13, 2022

EchoProject mentioned this issue May 20, 2022

311: PM Team Meeting Agenda and Notes #1136

Closed

3 tasks

joshuayhwu linked a pull request Jul 22, 2022 that will close this issue

Update plotly dash overview dashboards (combined old dashboards) #1288

Merged

4 tasks

nichhk mentioned this issue Jul 22, 2022

Update plotly dash overview dashboards (combined old dashboards) #1288

Merged

4 tasks

joshuayhwu mentioned this issue Jul 24, 2022

Add plotly dash NC summary & comparison dashboard #1295

Merged

4 tasks

joshuayhwu closed this as completed in #1288 Aug 4, 2022

joshuayhwu mentioned this issue Oct 9, 2022

[EPIC] Data Science Analytics Dashboard (Reports) H1 2023 #1378

Closed

6 tasks

ExperimentsInHonesty added this to P: 311: Project Board Jun 7, 2024

ExperimentsInHonesty moved this to Done (without merge) in P: 311: Project Board Jun 7, 2024
