
RFC: Re-implement the admin dashboard in plotly #802

Open
shankari opened this issue Sep 28, 2022 · 11 comments

@shankari
Contributor

@asiripanich @lgharib @TTalex @jf87 (and @PatGendre in case you are interested), this is the first of two high priority RFCs around fairly fundamental changes to the NREL OpenPATH technology stack.

We haven't had much community involvement so far around big decisions, and I wanted to see if we could change that 😄

As part of the new vision for NREL OpenPATH as a customizable, easy-to-use system for other programs or agencies to use, we want to enable an admin dashboard similar to https://github.com/asiripanich/emdash

This should significantly lower the support burden on the platform sysadmin (NREL in this case), since the program admins can perform many management tasks themselves. It should also make it easy for the program admins since they can see the results in real time, and don't need to email the platform admin to send out push notifications etc.

However, I am thinking of switching from R Shiny to Plotly. The main reasons are:

  • Given that the full stack already requires knowledge of Java, Objective-C, Angular (+ hopefully React) and Python, I am really reluctant to add another language to the mix.
  • I don't know R, and I am the developer of last resort for the platform - the person who can go in and get things done by the deadline.
  • The NREL viz team appears to be standardizing on Plotly, and I want to be consistent with them if possible, just to make it easier to get the cybersecurity approvals.
  • R Shiny is super heavyweight; Python Plotly seems to be lighter-weight.
  • Using Python allows us to reuse the core timeseries objects to access the database, which, in turn, will make it easier to support other databases in the future if we choose to migrate.
  • We want to move beyond just displaying tables to actually supporting operations, notably:
    • creating and downloading tokens for programs
    • sending out push notifications
    • ...
    • and that seems to be much easier if we can just make calls to an internal flask server
  • The admin dashboard needs authentication, and the authentication should really have 2FA enabled. NREL IT recommends/requires us to use AWS Cognito for authentication.

On the other hand:

  • there are other people at the lab who know R Shiny. In fact, one of the other teams has chosen R Shiny for their dashboard and is hiring a Shiny-focused intern whom I could tap for this too.
    • Although it is still a real problem that I can't just step in and finish up the code if I need to. Interns have classes during the school year and are not always great at hitting deadlines.
  • There is also an R library for working with AWS Cognito (https://www.rdocumentation.org/packages/cognitoR/versions/1.0.2)

@asiripanich this is really a question for you since you created the original dashboard. But others, especially @TTalex since you work with the data science folks at Cozy, might also weigh in.

Should we continue developing the existing R/Shiny admin dashboard or switch to python-based plotly dash?

@asiripanich
Member

@shankari thanks for including me in this discussion. I agree with you that it would benefit the community the most if there is an active maintainer of the admin dashboard project. Reusing the existing e-mission-server modules also sounds like an extremely good idea. I also don't see why we should add a third (or is it a fifth?) programming language to the e-mission stack. :)

The admin dashboard needs authentication, and the authentication should really have 2FA enabled. NREL IT recommends/requires us to use AWS Cognito for authentication.

This is also a big plus for me.

@PatGendre
Contributor

@shankari an admin dashboard is definitely useful; but more than a dashboard, a kind of "back-office" is needed. As you write, "we want to move beyond just displaying tables to actually supporting operations". This inclines me to favour Python over R. But it depends foremost on the people and community who are going to actually code the dashboard!

@TTalex

TTalex commented Sep 30, 2022

I also agree with what was mentioned by @asiripanich

To me, it seems very smart to build tools on the same SDK for the future of the project. Otherwise, changes like edits to the database structure will require many duplicated adaptations, instead of just one to the corresponding Python method.

Now, as far as an admin dashboard goes, I'm seeing it resolving three kinds of problems:

  1. Help with business intelligence, where data is used to identify trends and visualize globally how the system is used. As far as I understand, this is what emdash is doing. We could look at the Apache Superset project for inspiration on these kinds of dashboards. To me, this dashboard should be usable by non-developers.
  2. Help with diagnosis of erroneous data for a specific user. Here the target is a support role, and graphs are less important, the need is automatic checks and debugging tools.
  3. Help with error detection, either in the pipeline process, in the server itself, or directly from user devices. Tools like Sentry or Grafana fit this category. The target user is a kind of sysadmin.

I believe the distinction is important in two cases:

  1. Relation to user consent
    By default, we should minimize how much user data is consulted, especially GPS data.
    There is no issue with the second point, since diagnosis will most likely come from a user request. However, I don't think users of a business intelligence tool should have access to all GPS data (like the map in emdash offers) by default. Anonymity should come first where possible.
  2. Relation to deployers
    I have yet to talk with Cozy about this RFC, but these kinds of actors most likely already have tools in their stack to monitor the errors described in point 3. And their job includes creating custom interfaces for end users, which could fill part of point 1.
    So if we want to fill a need for this specific kind of community member (industry reusing OpenPATH), the second point should be the most useful, with a technology as close as possible to the rest of the stack (so Python in our case).

@shankari
Contributor Author

shankari commented Nov 9, 2022

@TTalex Thanks for the insightful comments. I have been busy with the label screen improvements and the "count every trip" project and random program communications and have not had the chance to respond. But I have been thinking about it in the background.

First, to respond to your comments about the use cases:

  • Actually, trend analysis or visualizing program impact is not what emdash is designed for. That is the goal of the public dashboard (https://github.com/e-mission/em-public-dashboard). Please see the deployed examples at: https://durham-openpath.nrel.gov/public/ or https://open-access-openpath.nrel.gov/public/. As you can see, this shows no spatio-temporal data, and does not even show individual user data.
  • The goal of emdash (at least in my experience, @asiripanich may weigh in) is to allow program admins to monitor the data collection in real time. Do all participants have the app still installed? Are they still sending data? Are they labeling their trips? If the answer to any of those is "no", they may have to contact the participants with a reminder 😄
  • In terms of technical capabilities, I agree that both the public dashboard and emdash should be targeted at non-developers. I would go further and say that enhancing and modifying them should be possible for transportation and urban planning majors without a strong CS background, so it should be possible to create analyses and visualizations using RStudio/jupyter notebooks (not SQL Lab) that we can then trivially integrate into the dashboards.

Second, your idea about minimizing the use of GPS traces in emdash was very interesting. We have now been working with program admins both with and without emdash and their requests for location data are all across the map[1].

  • There was at least one program admin in the CEO e-bike project who used individual location traces to suggest safe routes/parking locations to participants. When one of the participants stopped returning messages, she found the location where they appeared to have abandoned the bike and was able to recover it.
  • Most of the other CEO program admins, however, did not care much about the trajectories; they just really needed to know which participants to ping to ensure that the data was coming in.
  • The Durham program has an agreement with UNC to analyse the data and generate a custom report. They originally planned to use the full traces (exported as geojson); however, I stopped providing them after a month or so and they didn't complain 😄 They do use the trip tables (a list of all trips for an individual user), exported as csv, for their analyses[2]. When their report comes out in December, we will be able to see whether they actually used the trip start/end locations and times, or only the duration/distance etc., for their analysis.
  • There is a program I spoke to this morning that wants to understand pedestrian crossing behavior - e.g. do pedestrians cross in the middle of the street instead of going to the intersection? They definitely need access to fine-grained trajectory information because, if possible, they want to automatically determine whether the user crossed the street mid-block. This is going to be hard for narrow neighborhood streets, but they are optimistic about figuring it out at least for wide streets like the grand boulevards.

Given these differing requirements, the obvious solution seems to be customization!
Maybe we can set up emdash to support 3 different flavors:

  • anonymized: only supports (1) back office operations, (2) participant table, (3) trip table with spatio-temporal data stripped out or aggregated to 6-hour/zip code level
  • includes aggregate spatio-temporal: includes maps with aggregate O-D information and, after we support map matching, mode-specific counts along roadways. But the trips for an individual user are not linked.
  • includes all data: including trip tables with spatio-temporal data, including trip start/end and/or trajectory information

Since the NREL hosted version of OpenPATH is envisioned as a customizable tool for programs to collect data, each program will specify the flavor of the admin dashboard that they would like to see. This would allow us to get a real world sense of which flavors are really needed and potentially drop support for the less anonymous versions later.
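As a hedged sketch of how the flavor idea might work in code, a single trip-table endpoint could filter fields according to the flavor a program configured. The flavor names follow this comment; the field names and filtering logic are illustrative assumptions, not the e-mission-server implementation.

```python
# Hypothetical sketch: filtering a trip entry per dashboard flavor.
# Field names are illustrative; a real version would also handle the
# 6-hour / zip-code aggregation mentioned above.
SPATIOTEMPORAL_FIELDS = {"start_loc", "end_loc", "start_fmt_time",
                         "end_fmt_time", "start_ts", "end_ts"}

def filter_trip(trip: dict, flavor: str) -> dict:
    """Return a copy of a trip entry with fields removed per dashboard flavor."""
    if flavor == "all_data":
        return dict(trip)
    if flavor in ("anonymized", "aggregate_spatiotemporal"):
        # Both flavors strip individual-level spatio-temporal detail;
        # the aggregate flavor adds unlinked O-D maps elsewhere instead.
        return {k: v for k, v in trip.items() if k not in SPATIOTEMPORAL_FIELDS}
    raise ValueError(f"unknown dashboard flavor: {flavor}")

trip = {"user_id": "u1", "distance": 1200.0, "duration": 600.0,
        "start_loc": [7.75, 48.58], "end_loc": [7.76, 48.59]}
print(sorted(filter_trip(trip, "anonymized")))  # metrics remain, locations gone
```

One design consequence of funneling every flavor through one filter function: dropping support for the less anonymous flavors later becomes a one-line config change per program rather than a code fork.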

Thoughts? @TTalex @PatGendre @asiripanich

[1] The CEO program admins originally had access to emdash, including the participant and trip tables. However, we have now turned it off due to resource constraints. In the absence of an admin dashboard, I have been emailing requested data to the NREL-hosted programs such as Durham.
[2] The full list of columns they use is

Dump week | Row in Dump | source | end_ts | end_fmt_time | end_loc | raw_trip | start_ts | start_fmt_time | start_loc | duration | distance | start_place | end_place | cleaned_trip | inferred_labels | inferred_trip | expectation | confidence_threshold | expected_trip | user_input | start_local_dt_year | start_local_dt_month | start_local_dt_day | start_local_dt_hour | start_local_dt_minute | start_local_dt_second | start_local_dt_weekday | start_local_dt_timezone | end_local_dt_year | end_local_dt_month | end_local_dt_day | end_local_dt_hour | end_local_dt_minute | end_local_dt_second | end_local_dt_weekday | end_local_dt_timezone | _id | user_id | metadata_write_ts | sensed_mode | purpose_confirm | mode_confirm

@shankari
Contributor Author

shankari commented Nov 9, 2022

@asiripanich indicated that the community might be able to build out this dashboard, leaving me free to focus on the core functionality, including API updates, sensing improvements, pipeline debugging/scalability, UI improvements etc.

Is this something that the community is willing to commit to?

You would need to commit to it, since it is a deliverable that the NREL OpenPATH team has committed to our DOE sponsor.
We will need to have this coded and tested by Feb 2023, so if nobody else steps up, it will be my next main priority after I finish the minimal API upgrades, which, in turn, means that the other changes above will have to wait until this is done.

@shankari
Contributor Author

shankari commented Nov 15, 2022

Quick updates:

  • we have decided to proceed with python
  • @asiripanich has committed to making the dashboard changes, as a swap for adding a time-use tab option to master
  • @TTalex any thoughts about the configurable options listed above? We can start with the anonymized option with a focus on monitoring the data collection and back office operations such as sending push notifications and generating tokens for programs. But I suspect we will need to include "aggregate spatio-temporal" and "all data" fairly soon as well.

@TTalex

TTalex commented Nov 21, 2022

Sounds good to me!

I also agree with the order, starting with anonymized and especially back office operations. After all, you need a working service before you can compute pretty stats 😄 .

In “aggregate spatio-temporal”, you mentioned “mode-specific counts along roadways”, which is a great idea. This is a feature that has been requested on multiple occasions by public actors here in France. So I am looking forward to seeing how this is implemented, because it could be a technological feat to do it well.

I'll ask around to see if I can gather more input from the "field". But it shouldn't differ too much from what was said in this thread.

@shankari
Contributor Author

shankari commented Nov 21, 2022

@TTalex the NREL team has been working on map matching of the TSDC data, primarily for drivetrain analysis, for a while and has recently open-sourced their work: https://github.com/NREL/mappymatch

It has currently been tested only on car trips and with limited ground truth. They are excited to expand it to support other modes as well and to evaluate it against MobilityNet. Not sure what the exact ETA is, but we have set aside some budget for this fiscal year, which will end in Sept 2023.

@jf87
Contributor

jf87 commented Nov 22, 2022

@shankari
Just seeing this conversation now...
We have been using plotly dash for some visualization for the mobility toolkit we built for an EU project:
https://github.com/nec-research/mobility-toolkit

Also, over the last couple of weeks a colleague has been busy building a more complete dashboard based on plotly that integrates with the NGSI-LD interface. It should be ready in the next couple of days. Both are open source, so this might provide some starting points for the OpenPATH dashboard.

@shankari
Contributor Author

shankari commented Jan 6, 2023

Mapping-oriented use case from a deployer:

The City is applying for a competitive infrastructure grant to improve the bike, pedestrian, and transit facilities in a low SES area of XXX. If there is a way to see any anonymized trip point/route data that may be beneficial

Sounds like this is a request for flavor (2) "includes aggregate spatio-temporal" from
#802 (comment)

@shankari
Contributor Author

I have generated some sample visualizations, and I actually think these "aggregate spatio-temporal" metrics can even be fully public. I will share them at the meeting on Monday to get feedback and then potentially incorporate them as part of the "public" dashboard.

They will be non-zoomable, and will only be displayed if we have > k number of users, so the admin dashboard can still support additional functionality. Note that, with appropriately small marker sizes, we can do "map matching" even without implementing a formal map matching algorithm.
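The "> k number of users" gate above can be sketched in a few lines; the threshold value and function name here are illustrative choices, not the deployed NREL implementation.

```python
# Minimal sketch of a k-anonymity display gate: an aggregate map layer is
# rendered only when more than k distinct users contribute data to it.
K_THRESHOLD = 5  # illustrative value of k

def should_display_layer(contributing_user_ids) -> bool:
    """Show an aggregate layer only if more than k distinct users contribute."""
    return len(set(contributing_user_ids)) > K_THRESHOLD

# A layer backed by only two distinct users stays hidden...
print(should_display_layer(["u1", "u2", "u1"]))            # False
# ...while one backed by ten distinct users is shown.
print(should_display_layer([f"u{i}" for i in range(10)]))  # True
```

Counting distinct user ids (rather than raw points) matters here: one prolific user could otherwise push a sparse cohort past the threshold.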
