v2 Reports Dashboard (First Draft) #1197

Closed
3 of 6 tasks
EchoProject opened this issue May 13, 2022 · 16 comments · Fixed by #1288 or #1295
Assignees
Labels
P-feature: Analytics P-feature: Reports Role: Data Science Data management, loading, or analysis Size: 8pt Can be done in 31-48 hours

Comments

@EchoProject
Contributor

EchoProject commented May 13, 2022

Overview

We need to create a first-draft Reports Dashboard with summary statistics and basic analytics so that we can create an active version using Plotly Dash.

Additional note:
Then we will show users how they can derive insights to create Service Request-based initiatives. Finally, we can get feedback on improving the dashboard feature set.

Action Items

  • Draft a "wireframe" model using BI Tool
  • Gather feedback from 311-Data Team
  • Gather feedback from Data Science Community of practice
  • Gather feedback from Seymour Liao
  • Implement actual v2 Dashboard with 1 month Data for sample purposes
  • Beta-test dashboard with Neighborhood Council

Features

  • Filter by NC, Districts, Request Types
  • Summary statistics and visualizations by NC
  • Comparison statistics and visualizations by NC (e.g. request share, time distributions by NC, completed vs. closed request counts, etc.)
  • Provide background context for all statistics, visualizations, and data involved.

Resources/Instructions

@EchoProject EchoProject added Role: Data Science Data management, loading, or analysis Size: 8pt Can be done in 31-48 hours P-feature: Reports P-feature: Analytics labels May 13, 2022
@EchoProject EchoProject added this to the v2.1 Launch milestone May 13, 2022
@nichhk
Member

nichhk commented May 13, 2022

Thanks for working on this Josh!

I think it would be interesting to show the distribution of time-to-close for each request type. E.g., if the distribution is bimodal, this can mean that a certain request type may have two subtypes, where one is easier to fix than the other. It might also capture that issue Bonnie mentioned where certain issues are closed after a while without actually being resolved (?). It might also be informative to overlap distributions per NC, to see how quickly issues are resolved in different NCs.
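
As a rough illustration of the per-type distribution idea above, here is a minimal, dependency-free sketch that bins time-to-close (in days) separately for each request type. The record layout and the field name `requestType` are assumptions made for the example (`timeToClose` is the name used later in this thread); the helper `time_to_close_histograms` is hypothetical and not part of the 311-Data code base.

```python
from collections import Counter, defaultdict


def time_to_close_histograms(requests, bin_width_days=1.0):
    """Return {request_type: Counter({bin_start_day: count})}.

    `requests` is an iterable of dicts with the assumed keys
    "requestType" and "timeToClose" (days).
    """
    histograms = defaultdict(Counter)
    for req in requests:
        days = req["timeToClose"]
        if days is None or days < 0:
            continue  # skip rows with missing or negative durations
        # Left edge of the bin this request falls into.
        bin_start = int(days // bin_width_days) * bin_width_days
        histograms[req["requestType"]][bin_start] += 1
    return dict(histograms)


# Made-up sample rows, for illustration only.
sample_requests = [
    {"requestType": "Graffiti", "timeToClose": 0.5},
    {"requestType": "Graffiti", "timeToClose": 0.8},
    {"requestType": "Graffiti", "timeToClose": 6.2},
    {"requestType": "Graffiti", "timeToClose": 6.9},
    {"requestType": "Bulky Items", "timeToClose": 2.1},
]
hist = time_to_close_histograms(sample_requests)
```

In the made-up sample, the "Graffiti" counts cluster in two separate bins, which is the kind of two-peaked shape that might hint at subtypes. The same per-type counts could be computed per NC to overlay distributions.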

@joshuayhwu
Contributor

Thanks for the suggestion Nich!

I think adding comparison would be extremely helpful - will put that as a feature to incorporate!

@joshuayhwu
Contributor

  • "Wireframe" with Power BI to gather feedback
  1. Neighborhood Council Summary
    plotly_mvp1
  • Filter by name, date-range, and request type
  • Indicator visuals for total number of requests, average time-to-close requests (days), maximum time-to-close requests (days)
  • Pie chart for share of request type
  • Distribution of request throughout the day by hour
  2. Data by Police Precinct
    plotly_mvp2
  • Filter by police precinct and request create date
  • Indicator visual for total number of requests, average time-to-close requests (days)
  • Bar chart for request sources
  • Request type frequency

@joshuayhwu
Contributor

joshuayhwu commented May 20, 2022

  • Feedback from 311-Data Team
  1. Incorporate distribution of time-to-close for each request type
  2. Flag requests that have absurd time-to-close (Ignored Requests?)
  3. Include page / tabs with summary and comparison between NC

@joshuayhwu
Contributor

  • Feedback from DS Community of Practice
  1. Incorporate text-based descriptions (i.e. context) for each statistic
  2. Since pandas won't scale, use Dask instead
  3. Use only one month of data to prototype what the dashboard would look like

@joshuayhwu
Contributor

joshuayhwu commented May 20, 2022

  • Plotly MVP Dashboard v1.1

plotly_mvp3

  • Filter by NC
  • Time series of 311 requests over time
  • Pie chart for share of request type
  • Histogram for distribution of request time-to-close

plotly_mvp4

  • Filter by NCs (Side-by-side comparison)
  • Indicator visual for total number of requests and ignored requests (requests completed < 1 day)
  • Bar chart for Number of requests by sources
  • Overlaid Time series for time-to-close (unfinished)

Todo:
High Priority:

  • Add an exclusion filter (exclude particular request types)
  • Try to add a filter that excludes data with data quality issues
  • Clean up the user interface of the Plotly dashboard

Low Priority

  • Add a Request Type filter
  • Add more divs so that visuals are not stretched as far horizontally
  • Add an external style sheet to style titles and other HTML components
  • Complete the overlaid time series for time-to-close, or work on an alternative visual
  • Incorporate text / annotation of visual to provide context
  • Consult 311-Data team and Data Science CoP for additional features / comments, revise for new version

@nichhk
Member

nichhk commented May 23, 2022

The team took a look at Josh's updates on Thursday. For the record, here's what I remember discussing:

  • being able to select arbitrary sets of request types, like you can on the site, would be useful (e.g., "bulky items" is almost half of the data, but "bulky items" is generally not a quality-of-life issue, so NC members might want to filter those out)
  • "Ignored Requests" might not be worth surfacing to users. It's hard to find an accurate name for this; "ignored" kind of suggests that the city just ignored these requests, and I can't really think of anything better. It might be better to just have a small question mark on the "Total number of requests" box where we can show this info to very curious users.
  • There are two dimensions in which we can analyze this data: 1) use it to understand quality-of-life issues in different NCs; 2) use it to identify issues in bookkeeping and data management by the city teams that are handling the requests. For 1, it's not particularly useful to see requests that have data quality issues (i.e., time-to-close is super short or super long). For 2, it is. So we can implement a toggle that filters out requests with data quality issues (it would default to "on").

@joshuayhwu
Contributor

joshuayhwu commented May 27, 2022

Plotly MVP Dashboard v1.2

  • Summary Dashboard
    plotly_mvp_v1 2

Visuals:

  1. Line Chart: Total number of 311 requests over the time range defined by the earliest and latest request create dates. This shows which specific time range has the most requests.
  2. Pie Chart: Share of each request type based on the data available. This shows which kinds of request have the highest/lowest demand in a particular neighborhood council.
  3. Histogram: Distribution of request time-to-close. This shows how long it takes for each request to complete (proxied by the request being closed) as a distribution.

Features:

  1. Selecting an individual neighborhood council
  2. Removing particular request types
  3. Data Quality Toggle to filter out data with quality issues (where the time-to-close is less than 1 day or longer than 100 days)
  • Comparison Dashboard
    plotly_mvp_v1 2_2

Visuals:

  1. Indicator Visuals: Total number of requests and the number of days of the data available
  2. Bar Chart: Number of requests by source. This indicator shows the variety of mediums through which individuals make requests.
  3. Line Chart: Comparison of the total number of 311 requests

Changes from before:

  1. Added an exclusion filter with the following functionality:
  • Remove one or more request types from the summary dashboard
  • Exclusion filter options depend on the NC selected; if no NC is selected, all request types in the entire dataset are offered
  • The exclusion filter "freezes" at the last remaining request type, i.e. the dashboard prevents the user from removing every request type from the display
  2. UI
  • Chose 'Open Sans' as the default font to stay consistent with the Plotly visuals
  • Adjusted the font size to accommodate the text-to-div ratio
  • Added spacing between divs
  3. Comparison plots
  • Added overlapping line charts for the number of requests throughout the day
  4. Data Quality Toggle
  • Added a data quality toggle to filter out data considered "bad" (request timeToClose less than 1 day or longer than 100 days)
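
The data quality toggle described above could be sketched roughly as follows. This is a minimal, dependency-free illustration and not the dashboard's actual code: the record layout and the helper name `apply_data_quality_toggle` are assumptions, and only the 1-day and 100-day thresholds come from this comment.

```python
MIN_DAYS = 1    # timeToClose below this is treated as "bad" (from the comment above)
MAX_DAYS = 100  # timeToClose above this is treated as "bad" (from the comment above)


def apply_data_quality_toggle(requests, toggle_on=True):
    """Drop requests with suspicious timeToClose when the toggle is on.

    `requests` is a list of dicts with an assumed "timeToClose" key (days).
    With the toggle off, every row is returned unchanged.
    """
    if not toggle_on:
        return list(requests)
    return [
        r for r in requests
        if r["timeToClose"] is not None
        and MIN_DAYS <= r["timeToClose"] <= MAX_DAYS
    ]


# Made-up rows for illustration only.
rows = [
    {"id": 1, "timeToClose": 0.2},   # closed in under a day
    {"id": 2, "timeToClose": 3.0},
    {"id": 3, "timeToClose": 250.0}, # open far longer than 100 days
    {"id": 4, "timeToClose": None},  # missing dates
]
filtered = apply_data_quality_toggle(rows, toggle_on=True)
unfiltered = apply_data_quality_toggle(rows, toggle_on=False)
```

Keeping the thresholds as named constants makes them easy to replace if a statistical rule or input from the City suggests different cut-offs.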

@nichhk
Member

nichhk commented Jun 2, 2022

Thanks for these updates Josh!

Re: remove one or more request types: I think it might be more intuitive to make this the opposite, i.e., select one or more request types. This will better align with the map functionality as well.

Re: Data Quality Toggle: This looks super useful! May I ask how you chose the thresholds for "bad"? This might be a situation where we might have to combine some statistical analysis and also get input from City folks.

In terms of statistical analysis, I think there are several ways to detect outliers. One way is applying something like a z-score range.
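
For reference, a z-score check of the kind mentioned above might look like the sketch below. It uses only the standard library; the threshold of 3 is a common convention and an assumption here, not a value agreed on in this thread, and the function name is hypothetical.

```python
import statistics


def zscore_outlier_mask(values, threshold=3.0):
    """Return a list of booleans: True where |z-score| exceeds `threshold`."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return [False] * len(values)   # all values identical: no outliers
    return [abs((v - mean) / stdev) > threshold for v in values]


# Made-up time-to-close values (days): twenty ordinary ones and one extreme.
days = [2, 3, 2, 4, 3] * 4 + [200]
mask = zscore_outlier_mask(days)
```

Note that z-scores assume a roughly symmetric distribution, so on heavily skewed time-to-close data a transform or a quantile-based rule may behave better.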

But I think we also need help from the City to understand what acceptable timeToCloses are. It might be perfectly ok for a timeToClose to be like, 10min, for example, if it's a duplicate of another request.

@joshuayhwu
Contributor

joshuayhwu commented Jun 2, 2022

Thanks for the feedback Nich!

  • I have implemented selection by request type, but unfortunately I discovered another bug. After implementing the dependent dropdown (the type dropdown only shows the types available in a particular NC), the visuals don't update when only the NC dropdown is selected. Surprisingly, the visuals do update when only the type dropdown is selected. This part is still under investigation.

EDIT: Fixed this bug; it was sloppy logic on my part. But another error, about minimum series length, occurred.
EDIT2: Figured out what was happening. My filtering logic filters rows by selection, and in some cases it removes every row from the dataset. That caused an error because I hadn't specified what should happen to the visuals when there is no data.
EDIT3: Now raising a PreventUpdate() exception in that case.
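
A rough sketch of the two pieces discussed above: type options that depend on the selected NC, and a guard that raises `PreventUpdate` when filtering leaves no rows. `dash.exceptions.PreventUpdate` is a real Dash exception that leaves a callback's outputs unchanged when raised inside it. Everything else here is an assumption for illustration: the field names `councilName` and `requestType`, both helper functions, and the fallback class, which exists only so the sketch runs without Dash installed.

```python
try:
    from dash.exceptions import PreventUpdate
except ImportError:
    class PreventUpdate(Exception):
        """Stand-in so this sketch runs where Dash is not installed."""


def type_options_for_nc(requests, selected_nc):
    """Request types present for the chosen NC, or all types if no NC is chosen."""
    rows = requests if selected_nc is None else [
        r for r in requests if r["councilName"] == selected_nc
    ]
    return sorted({r["requestType"] for r in rows})


def filter_for_visuals(requests, selected_nc, selected_types):
    """Filter rows by NC and request types; refuse to update on an empty result."""
    rows = [
        r for r in requests
        if (selected_nc is None or r["councilName"] == selected_nc)
        and (not selected_types or r["requestType"] in selected_types)
    ]
    if not rows:
        # Inside a Dash callback this keeps the previous figures on screen
        # instead of failing while building a chart from an empty dataset.
        raise PreventUpdate
    return rows


# Made-up rows for illustration only.
sample_rows = [
    {"councilName": "NC A", "requestType": "Graffiti"},
    {"councilName": "NC A", "requestType": "Bulky Items"},
    {"councilName": "NC B", "requestType": "Illegal Dumping"},
]
options_a = type_options_for_nc(sample_rows, "NC A")
```

In a real callback, `filter_for_visuals` would be called at the top of the function body, before any figure is built.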

  • In terms of data quality, I had essentially eyeballed the thresholds from the visualization. I have now defined outliers by first applying a log transform and then taking median ± 1.5*IQR, since the data is skewed (before and after the log transform).

Raw Distribution
raw_timeToClose

After removing outliers based on rule above
filter_timeToClose

  • I agree we need to talk to the City if possible. One thing I noticed is that some rows have a missing createDate / closeDate, which leaves timeToClose empty (I replace it with 0). Some rows also have a negative timeToClose. These are definitely data quality issues that we need to investigate and constrain upstream.
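
The log-transform rule described above (median ± 1.5*IQR on the log scale) can be sketched as follows. This is an illustrative reading of that description, not the code actually used. Dropping missing, zero, and negative values before taking the log is an assumption made here, because the logarithm is undefined for them; the function names are hypothetical.

```python
import math
import statistics


def log_iqr_bounds(days, k=1.5):
    """Return (low, high) bounds in days from median ± k*IQR on log(days)."""
    # Assumption: rows with missing, zero, or negative timeToClose are excluded.
    logs = [math.log(d) for d in days if d is not None and d > 0]
    q1, median, q3 = statistics.quantiles(logs, n=4)
    iqr = q3 - q1
    return math.exp(median - k * iqr), math.exp(median + k * iqr)


def keep_within_bounds(days, k=1.5):
    """Keep only positive values that fall inside the log-scale bounds."""
    low, high = log_iqr_bounds(days, k)
    return [d for d in days if d is not None and d > 0 and low <= d <= high]


# Made-up values (days), including the problem cases mentioned above.
sample_days = [1, 2, 2, 3, 3, 3, 4, 4, 5, 0, -2, None, 400]
kept = keep_within_bounds(sample_days)
```

The more common Tukey rule places the fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR rather than around the median; either variant fits the same structure.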

@joshuayhwu
Contributor

@ExperimentsInHonesty thanks for clarifying the context for the dashboards MVP last week. Would love some feedback from you on this version. Please note the following:

  • Plotly Dash doesn't support pre-defined groups in dropdown lists (i.e. you cannot select one region, but must select the individual NCs in that region). It is possible to select multiple NCs at the same time, but for now I'm keeping things simple.
  • Descriptions for the visualizations will be a later feature. The visualizations are designed to be as simple as possible, and I don't want to assume data illiteracy.
  • You mentioned some requests being closed ridiculously early (i.e. in less than 10 minutes or a day). Nich and I discussed this, and we think it is best to treat it as a data quality issue rather than immediately flag it as problematic. We need to talk to the people generating this data before drawing a conclusion.

NC Summary
plotly_mvp_v1 3_1

Visuals:

  1. Line Chart: Total number of 311 requests over the time range defined by the earliest and latest request create dates. This shows which specific time range has the most requests.
  2. Pie Chart: Share of each request type based on the data available. This shows which kinds of request have the highest/lowest demand in a particular neighborhood council.
  3. Histogram: Distribution of request time-to-close. This shows how long it takes for each request to complete (proxied by the request being closed) as a distribution.

Features:

  1. Selecting an individual neighborhood council
  2. Selecting one or more request types
  3. Data Quality Toggle to filter out data with quality issues (where the time-to-close is an outlier)

NC Comparison
plotly_mvp_v1 3_2

Visuals:

  1. Indicator Visuals: Total number of requests and the number of days of the data available
  2. Bar Chart: Number of requests by source. This indicator shows the variety of mediums through which individuals make requests.
  3. Line Chart: Comparison of the total number of 311 requests

Features:

  1. Compare the total number of requests and date range between NCs
  2. Compare how individuals make 311 requests in the two NCs
  3. Compare the number of requests throughout the day for both NCs

@joshuayhwu
Contributor

joshuayhwu commented Jun 12, 2022

See my public repo for the integrated dashboard file and instructions for running it locally. I only used the new versions of Dash and Docker.

I am currently looking to integrate the code into the 311 Data code base. There is an issue with the newer Dash / gunicorn / Docker combination: none of the callback functions work, i.e. the dashboards do not respond to any interaction. Will try to figure this out in the next few days.

@nichhk
Member

nichhk commented Jun 14, 2022

Thanks Josh! In your repo, can you put in the unzipped files instead of the zip so that people can browse the code without downloading? Let me know if you need help with debugging the interaction issue.

@joshuayhwu
Contributor

joshuayhwu commented Jun 15, 2022

Thanks, I put the unzipped files in the public repo. Would appreciate some help whenever you're available, but I'll continue working on it and see if I can replicate the error.

EDIT: Turns out the problem is resolved just by adding Flask. It seems Gunicorn does not go well with Dash.

@joshuayhwu
Contributor

joshuayhwu commented Jun 20, 2022

I have summarized some of the feedback I received on the current version of the dashboard:

  1. Regarding the data quality issue, Bonnie had a wonderful insight for one possible decision rule. When the requestSource is a driver self-report and the time-to-close of the request is 0, it is likely that the driver simply closed the request instantly and then proceeded to work on it (or not). This could be one decision rule that we implement.

  2. The current colors of the Plotly Dash dashboards are unfavorable for neighborhood councils to use in formal publications (e.g. newsletters). Will use the default Dash colors for the Plotly dashboards from now on.

  3. The current Plotly dashboards don't take into account how our end users will use them, i.e. downloading individual visualizations and printing the dashboard page as a whole. Ideally, each visualization should have a title, corresponding axis labels, and a correct scale. Each dashboard should also be optimized for the printed layout.

  4. Will confirm the following again in the next meeting: consistent with Nich's comments on combining dashboards, I propose to combine the recent dashboards with the overall dashboards. More specifically, the neighborhood dashboard could be combined with neighborhood_recent; the overview and recent dashboards could be deprecated due to redundancy with the current prototype; types_map could be combined with another dashboard (perhaps the one Piero is working on?); and the current prototype will be the final one. That means there would be 3 dashboards in total: neighborhood, current prototype, and types_map.
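
The decision rule from item 1 above could be sketched like this. It is only an illustration: the source label string "Driver Self Report" and the output key `likelyInstantClose` are assumptions, since the exact values in the dataset are not given in this thread, and the helper name is hypothetical.

```python
def flag_instant_self_reports(requests, source_label="Driver Self Report"):
    """Return copies of the rows with a boolean "likelyInstantClose" flag.

    A row is flagged when its requestSource matches `source_label`
    (assumed label) and its timeToClose is exactly 0.
    """
    return [
        {
            **r,
            "likelyInstantClose": (
                r["requestSource"] == source_label and r["timeToClose"] == 0
            ),
        }
        for r in requests
    ]


# Made-up rows for illustration only.
self_report_rows = [
    {"id": 1, "requestSource": "Driver Self Report", "timeToClose": 0},
    {"id": 2, "requestSource": "Driver Self Report", "timeToClose": 2.5},
    {"id": 3, "requestSource": "Mobile App", "timeToClose": 0},
]
flagged = flag_instant_self_reports(self_report_rows)
```

Flagging rather than dropping keeps the rows available for the bookkeeping-focused view while letting the quality-of-life view exclude them.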

@joshuayhwu
Contributor

Updated Overview Dashboard Pt 1

PlotlyCombined1

Updated Overview Dashboard Pt 2

PlotlyCombined2

Projects
Status: Done (without merge)
3 participants