Improve the charts on the admin interface #342

Closed
manaswisaha opened this issue Oct 13, 2016 · 48 comments

Comments

@manaswisaha
Member

manaswisaha commented Oct 13, 2016

Currently, the charts aren't very clear to the user and aren't useful.
Things that can be improved:

  • improve the x-axis of each
  • create better histograms with proper bin sizes, and don't sort the bars in descending order (sorting hides the distribution)
  • each chart should have a clear goal (e.g. the onboarding completion time isn't providing any useful information)

Screenshots from my local system:
screen shot 2016-10-12 at 8 47 56 pm
screen shot 2016-10-12 at 8 48 05 pm

More improvements can be added when we get around to working on this.

@jonfroehlich
Member

This is going to be @jihyukbae's initial task to contribute to Project Sidewalk.

@misaugstad misaugstad assigned misaugstad and unassigned jihyukbae Jun 5, 2017
@misaugstad
Member

misaugstad commented Jun 6, 2017

I'll be taking this on so that we have some more informative graphs and metrics to look at to assess how the relaunch is going. Here is my current plan (looking for any feedback/suggestions):

Changes to current graphs (see screenshots in the original issue comment):

  1. All graphs: move the axis titles outside the graph DONE
  2. Coverage Rate per Neighbourhood % graph: switch the x and y axes and keep the axis labels, as suggested in Admin Interface: Coverage Rate per Neighborhood Graph #412; give the option to switch between ordering by neighbourhood name or coverage. DONE
  3. Coverage (m) graph: I'm not sure I see the value in this graph, and am thinking of removing it. I think that we care about completion percentage, not the miles covered per neighbourhood. If others see value in this graph, the same changes that are made to the coverage rate graph could be made to it. REMOVED
  4. Completed Missions graph: I think that this graph is not very clear or useful... Again I think that we care more about coverage percentage than this. The x-axis has coverage distance for a neighbourhood, and the y-axis is counts of missions completed that resulted in that distance having been covered in that neighbourhood... I am not seeing this as useful, am I missing something here? REMOVED
  5. Onboarding Completion Time graph: change x-axis labels to be in minutes instead of seconds, change bin size(s) to show more of a spread, with the last bin being something like (10+ minutes) DONE
  6. Daily Label Counts graph (shown below): data is only shown for the past month; should we make this longer, set it to be day of relaunch to present, or some other time window? DONE
  7. Daily Audit Counts graph (shown below): should treat this the same way that we treat the Daily Label Counts Graph, also should change to either Daily Missions or Daily Total Distance. Daily Missions could also be Weekly Missions instead... DONE
    sidewalk-audit-label-counts
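
For the Onboarding Completion Time change in item 5, the binning could look roughly like the sketch below. It converts seconds to minutes and puts everything at or above a cutoff into a single open-ended last bin. The function name, the 1-minute bin width, and the sample values are assumptions for illustration, not the code used in the admin interface.

```python
def bin_completion_times(seconds, bin_minutes=1, overflow_minutes=10):
    """Count completion times per fixed-width minute bin, plus one open-ended last bin."""
    # Build the bin labels in axis order, e.g. "0-1", "1-2", ..., "10+".
    labels = [f"{m}-{m + bin_minutes}" for m in range(0, overflow_minutes, bin_minutes)]
    overflow_label = f"{overflow_minutes}+"
    labels.append(overflow_label)
    counts = {label: 0 for label in labels}

    for s in seconds:
        minutes = s / 60
        if minutes >= overflow_minutes:
            counts[overflow_label] += 1
        else:
            start = int(minutes // bin_minutes) * bin_minutes
            counts[f"{start}-{start + bin_minutes}"] += 1

    # Keep the bins in axis order (not sorted by count) so the distribution stays visible.
    return [(label, counts[label]) for label in labels]


if __name__ == "__main__":
    # Made-up completion times in seconds.
    for label, count in bin_completion_times([30, 90, 95, 700, 7320]):
        print(label, count)
```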

New graphs (some suggestions came from #351):

  1. Line graph showing percentage of DC audited over time. This will give us an idea of trends in use of the tool and an at-a-glance sense of how on track we are to reaching 100% coverage. I want to be able to pan and zoom at least along the x axis (time) so that you can see the big trends over time (maybe seeing two big spikes, one for initial launch and another for relaunch) and have the ability to drill into more recent trends (maybe the server went down and the curve flattened out for a day or two). DONE
  2. Histograms of severity rating counts (one graph for each label type, and one summed across all types). We don't currently know if it needs to be on a scale of 1-5. This could tell us whether users actually use all 5 ratings, or if we should maybe just bring it down to a scale from 1-3. DONE
  3. Something like a bar chart that shows drop out starting from visiting website -> clicked start mapping -> finished tutorial -> finished 1 mission -> etc. I am thinking y-axis is from 0-100%, where 100% visited website, and a progressively smaller percentage went through the next steps. This will help to see where the biggest bottlenecks are in getting people started; when do they give up? (We also want to do something different for existing users or people who sign up in the middle etc).
  4. Histogram of time spent using Project Sidewalk with median and average marked. Also, perhaps include a table that also includes this info but differentiates between signed in and anonymous users (Jon's suggestions). This graph (and the following three) give us an idea of how engaging our tool is; how long can we keep users motivated and engaged? Do they all give up after their 3rd 0.5 mile mission, so we should change things up by the 2nd 0.5 mile mission? Do they give up after their 1st 0.5 mile mission, so we should make them shorter?
  5. Histogram of number of labels per mission with median and average marked. Again, would like summary table comparing signed in and anonymous users (Jon's suggestions)
  6. Histogram of number of missions per person with median and average marked. Again, would like summary table comparing signed in and anonymous users (Jon's suggestions) DONE
  7. Histogram of number of logins per registered user with median and average marked (Jon's suggestions) DONE
  8. Histogram of neighbourhood completion percentages. Over time, this will show us the trend towards 100% coverage. It also gives us an idea of how well new users are distributed to neighbourhoods. DONE
  9. Add choropleth of completion % for neighbourhoods, as suggested in Admin Choropleth Completion Map #622. This should definitely be done, but will take me a bit more time than the other changes, so I plan to do the easier changes first. DONE
  10. In Daily Label Count graph, we could plot a line for each label type on the same graph, with their associated colour.
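
The drop-out chart in item 3 comes down to expressing each step's count as a percentage of the first step. Here is a minimal sketch, assuming the per-step user counts are already available; the function name and the counts are made up for illustration.

```python
def funnel_percentages(step_counts):
    """Express each step's user count as a percentage of the first step."""
    if not step_counts:
        return []
    total = step_counts[0][1]
    if total <= 0:
        raise ValueError("the first step must have at least one user")
    return [(name, round(100.0 * count / total, 1)) for name, count in step_counts]


if __name__ == "__main__":
    # Hypothetical counts; the step names follow the list above.
    steps = [
        ("visited website", 2000),
        ("clicked start mapping", 900),
        ("finished tutorial", 600),
        ("finished 1 mission", 300),
    ]
    for name, pct in funnel_percentages(steps):
        print(f"{name}: {pct}%")
```

The largest difference between two neighbouring bars then marks the step where most people give up.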

@jonfroehlich
Member

I skimmed this list and it seems reasonable on a first take. I also appreciated that you went through all open Issues looking for those that were admin dashboard related. Thanks for doing this.

One comment about "Coverage Rate per Neighbourhood % graph." I think you should actually do as suggested in #412.

Also, @misaugstad, can you go back through your list and add in an expected value proposition for each graph that you plan to create.

@misaugstad
Member

For the DC coverage percentage chart, do we prefer an area or line graph?

coverage-area
coverage-line

I will also reduce the number of ticks on the y-axis by widening the interval between them from 10 percentage points to 20.

Any other feedback on this?

@jonfroehlich
Member

jonfroehlich commented Jun 17, 2017 via email

@misaugstad
Member

I agree, more horizontal!

And thoughts on these histograms of severity rating by label type?

severity-ratings

@manaswisaha
Member Author

manaswisaha commented Jun 17, 2017

Few comments:

  • X-axis ticks should be rotated
  • The y-axis scale should be the same across all the graphs imo.

@misaugstad
Member

Will do for the x-axis ticks, didn't catch that before, thanks!

And for the y-axis scale, ideally I would keep the scale across all of them, but there are so many more curb ramps than the other labels, that the other histograms would be hard to read if we did counts. But then if we went with a proportion scale (from 0 to 1), we would be losing the counts information.

So that is my current rationale, but can still be swayed!
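
To make the trade-off concrete, here is a small sketch of the proportion option: each label type is normalized to its own total, so all histograms share a 0 to 1 axis, but the raw counts are no longer visible. The label type names and counts are made up.

```python
def severity_proportions(counts_by_type):
    """Convert per-severity counts into proportions within each label type."""
    result = {}
    for label_type, counts in counts_by_type.items():
        total = sum(counts.values())
        result[label_type] = {
            severity: (n / total if total else 0.0) for severity, n in counts.items()
        }
    return result


if __name__ == "__main__":
    # Hypothetical counts: one label type dwarfs the other.
    counts = {
        "CurbRamp": {1: 800, 2: 100, 3: 50, 4: 30, 5: 20},
        "Obstacle": {1: 10, 2: 10, 3: 10, 4: 10, 5: 10},
    }
    for label_type, props in severity_proportions(counts).items():
        print(label_type, props)
```

Both label types become readable on the same axis, but nothing in the output shows that one has 1000 labels and the other 50.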

@jonfroehlich
Member

jonfroehlich commented Jun 17, 2017 via email

@misaugstad
Member

They have been shrunk! Next up for review... I extended the daily label counts graph to go back to the end of 2015, now again do you prefer line or area..?
daily-label-counts-area
daily-label-counts-line

Again, I think area looks nicer.

@manaswisaha
Member Author

manaswisaha commented Jun 18, 2017 via email

@jonfroehlich
Member

jonfroehlich commented Jun 18, 2017 via email

@misaugstad
Member

Okay how do they look now?

But first, some of my notes:

  • For the daily audit and label count graphs, since basically a third of the data are 0's (due to this being a dump from December), the medians are 0 so you can't see them
  • On a related note, the distributions in the histograms are thus overly skewed towards 0. So right now the histograms don't look very good, but I'd like to see how it looks on real data before messing with the binning or anything like that.
  • I hope you like pink!
  • The standard deviation for the onboarding completion time is really high because inevitably some people leave their computer and come back. The 5 longest completion times are 122, 87, 84, 67, and 64 minutes. Do you want to filter anything out? For example, can we assume that onboarding does not take an hour, and filter out times over 45 minutes?
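
As a rough illustration of the filtering question in the last note, the sketch below compares summary statistics with and without a 45-minute cutoff. The five long times are the ones quoted above; the short times are made up, and the function name is only illustrative.

```python
from statistics import mean, median, pstdev


def summarize(minutes, cutoff=None):
    """Summary statistics for completion times, optionally dropping values above a cutoff."""
    kept = [m for m in minutes if cutoff is None or m <= cutoff]
    if not kept:
        return {"n": 0, "mean": None, "median": None, "std": None}
    return {"n": len(kept), "mean": mean(kept), "median": median(kept), "std": pstdev(kept)}


if __name__ == "__main__":
    typical = [3, 4, 4, 5, 5, 6, 7]      # made-up typical completion times (minutes)
    longest = [122, 87, 84, 67, 64]      # the five longest times quoted above
    print("all data:    ", summarize(typical + longest))
    print("<= 45 minutes:", summarize(typical + longest, cutoff=45))
```

The median shifts much less than the mean and standard deviation do when the long times are dropped.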

maybedone1
maybedone2
maybedone3
maybedone5
maybedone6

@jonfroehlich
Member

Thanks for making these updates so quickly. We can work on the color scheme. :)

One thing I forgot to mention, I'd really like it if you could label the mean and median lines on the graph.

@jonfroehlich
Member

@misaugstad: I was thinking you would do log scale on y-axis, which seems simple enough imo.
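
For reference, a log-scaled y-axis amounts to plotting the logarithm of each bar's count. Empty bins need care because log(0) is undefined. The sketch below uses log10(1 + count), which is one common way around that; it is an assumption here, not necessarily how the charting library in use handles it.

```python
import math


def log_scale_heights(counts):
    """Map bar counts to log10(1 + count) so empty bins get height 0 instead of an error."""
    return [math.log10(1 + c) for c in counts]


if __name__ == "__main__":
    # Made-up counts spanning several orders of magnitude.
    print(log_scale_heights([0, 9, 99, 999]))
```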

Re: table. Yes.

@misaugstad
Member

@ myself: in answer to my question about defining an anonymous user, it seems there was discussion about this before I arrived; I noticed it when reading through #323

@misaugstad
Member

With a lot of the analytics that we want to look at, there are 5 groups of users that we may want to see a graph for: all users, all users minus researchers, registered users, registered users minus researchers, and anonymous users. To try and get all that information without taking up a huge amount of space, I figure that we could have a button (or something like that) that would toggle whether we include the researchers in our histograms. I coded this up for one set of histograms, pictured below:

Defaults to including researchers...
including_researchers

Then upon clicking "exclude researchers", the viz updates...
excluding_researchers
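
The five groups reduce to two independent choices: which kind of user to show, and whether researchers are included. A minimal sketch of that filter is below; the `User` fields and the function name are hypothetical and do not reflect the project's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    registered: bool
    researcher: bool


def select_users(users, group="all", include_researchers=True):
    """Filter users by group ("all", "registered", "anonymous") and a researcher toggle."""
    if group not in ("all", "registered", "anonymous"):
        raise ValueError(f"unknown group: {group}")
    selected = []
    for user in users:
        if group == "registered" and not user.registered:
            continue
        if group == "anonymous" and user.registered:
            continue
        if not include_researchers and user.researcher:
            continue
        selected.append(user)
    return selected


if __name__ == "__main__":
    users = [
        User("u1", registered=True, researcher=True),
        User("u2", registered=True, researcher=False),
        User("u3", registered=False, researcher=False),
    ]
    print(len(select_users(users, "registered", include_researchers=False)))
```

A toggle button then only needs to flip `include_researchers` and redraw the histograms from the filtered list.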

Thoughts?

@jonfroehlich
Member

jonfroehlich commented Jul 3, 2017 via email

@misaugstad
Member

So here are a few new graphs for the admin interface. Any comments before I submit a PR for them so they can be included in the stress testing today?

new-graphs

Right now, the anonymous user labels are incorrect, in the same way as issue #791. I have an idea for how to fix it, but if that doesn't work, the fix may not happen today. The way to fix it is to use a select distinct, but Slick doesn't have select distinct directly built in (well, maybe they do in the newest version, but that isn't well documented yet anyway). So my idea is for a workaround.
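
One common way to get distinct rows without a DISTINCT keyword is to group by every selected column. The thread does not say which workaround was actually used, so the sketch below only illustrates that equivalence in plain SQL, run through Python's `sqlite3` module. The table and column names are made up.

```python
import sqlite3


def distinct_and_grouped(rows):
    """Deduplicate the same rows two ways: SELECT DISTINCT and GROUP BY on all columns."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per label placed, keyed by the placer's IP address.
    conn.execute("CREATE TABLE label (ip_address TEXT, label_type TEXT)")
    conn.executemany("INSERT INTO label VALUES (?, ?)", rows)
    distinct = conn.execute(
        "SELECT DISTINCT ip_address, label_type FROM label ORDER BY 1, 2"
    ).fetchall()
    grouped = conn.execute(
        "SELECT ip_address, label_type FROM label "
        "GROUP BY ip_address, label_type ORDER BY 1, 2"
    ).fetchall()
    conn.close()
    return distinct, grouped


if __name__ == "__main__":
    rows = [("10.0.0.1", "CurbRamp"), ("10.0.0.1", "CurbRamp"), ("10.0.0.2", "Obstacle")]
    distinct, grouped = distinct_and_grouped(rows)
    print(distinct == grouped)
```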

Also, the following graphs are now bar graphs instead of area graphs. The area versions can be seen at this comment. (Note that the histograms next to these graphs are not pictured, and that is where the legends are)

daily-bar-graphs

@misaugstad
Member

@r-holland is going to get started on making a graph of time spent using Project Sidewalk, where we count 5+ minutes of inactivity as not using the tool. This should be a good intro to our backend, while providing something very useful for the dashboard!
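
The inactivity rule can be sketched as follows: sort a user's event timestamps, sum the gaps between consecutive events, and skip any gap of 5 minutes or more. This is only an illustration of the rule as described; the function name and the use of timestamps in seconds are assumptions.

```python
def active_seconds(timestamps, max_gap=300):
    """Sum the gaps between consecutive events, skipping gaps of max_gap seconds or more."""
    ordered = sorted(timestamps)
    total = 0
    for previous, current in zip(ordered, ordered[1:]):
        gap = current - previous
        if gap < max_gap:
            total += gap
    return total


if __name__ == "__main__":
    # Made-up event times in seconds; the 880-second pause is not counted.
    print(active_seconds([0, 60, 120, 1000, 1030]))
```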

@jonfroehlich
Member

jonfroehlich commented Jul 3, 2017 via email

@r-holland
Collaborator

Looks like I have most everything working:

image

Unfortunately, @misaugstad explained to me that a query similar to the one I am conducting caused the server to crash a while ago. I will work on implementing more intermediate calculations on the back end to hopefully reduce the query size.

@jonfroehlich
Member

Thanks @r-holland.

  1. Can you add in actual values for the mean & median
  2. Does the final column actually represent all auditing times > 180 minutes?
  3. The y-axis is counts of people?

@r-holland
Collaborator

image

@jonfroehlich Does this updated graph address your concerns?

@jonfroehlich
Member

jonfroehlich commented Jul 11, 2017 via email

@r-holland
Collaborator

audit_time_histogram_updated
audit_time_histogram_updated_2
audit_time_histogram_updated_3

I am still in the process of switching the computation function to the back end, been busy with other issues. Plan is to get this done today.

@jonfroehlich
Member

jonfroehlich commented Jul 12, 2017 via email

@r-holland
Collaborator

Thanks to the discovery of lag(), my new favorite SQL function, I have dropped the total query time to about 2 seconds. All of the calculations are back end now. @misaugstad also suggested limiting the query to only ModeSwitch events...I will let him explain his own reasoning:

"ModeSwitch_Walk is logged during both panning and "walking", and the other mode switches are for the different label types. So if no one pans, changes pano, or switches to a labeling mode for 5 min, they probably aren't doing anything really."

Here is the updated histogram:

image
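
For readers unfamiliar with `lag()`: it is a SQL window function that returns a value from the previous row in a partition, which makes per-user gaps between consecutive events a single-pass computation. The sketch below shows the idea against an in-memory SQLite database (window functions need SQLite 3.25 or newer). The table, column names, and all action names other than `ModeSwitch_Walk` are assumptions, and this is not the project's actual query.

```python
import sqlite3

QUERY = """
SELECT user_id, SUM(gap) AS active_seconds
FROM (
    SELECT user_id,
           ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) AS gap
    FROM interaction
    WHERE action LIKE 'ModeSwitch%'
) AS gaps
WHERE gap < 300   -- also drops each user's first row, whose gap is NULL
GROUP BY user_id
ORDER BY user_id
"""


def active_time_by_user(events):
    """Return (user_id, active_seconds) using only ModeSwitch events and a 5-minute gap limit."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per logged interaction, ts in seconds.
    conn.execute("CREATE TABLE interaction (user_id TEXT, ts INTEGER, action TEXT)")
    conn.executemany("INSERT INTO interaction VALUES (?, ?, ?)", events)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    events = [
        ("a", 0, "ModeSwitch_Walk"),
        ("a", 100, "ModeSwitch_CurbRamp"),  # hypothetical action name
        ("a", 150, "MouseMove"),            # hypothetical non-ModeSwitch event, ignored
        ("a", 200, "ModeSwitch_Walk"),
        ("a", 900, "ModeSwitch_Walk"),      # 700-second gap, not counted
        ("b", 0, "ModeSwitch_Walk"),
        ("b", 50, "ModeSwitch_Walk"),
    ]
    print(active_time_by_user(events))
```

Note that a user with no qualifying gap does not appear in the result at all, which matters when averaging across users.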

@misaugstad
Member

@r-holland how is it possible that the average amount of time spent actually went up after removing some of the interactions? Shouldn't removing interactions result in, at most, the same amount of time spent as with all interactions?

@misaugstad
Member

You should run your current implementation, but for all interactions, and make sure that it looks the same as your original graphs. And if it doesn't, we need to find out why.
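
A sketch of that kind of check is below. Under the gap rule, dropping events can never increase any single user's time, so the per-user comparison is a safe invariant to assert. The mean across users can still rise if users with little activity drop out of the averaged population once their events are filtered. That is one possible explanation only; the thread does not confirm the cause. All names and data here are made up.

```python
def active_seconds(events, max_gap=300, keep=lambda action: True):
    """Active time for one user's (timestamp, action) events, counting only kept actions."""
    ts = sorted(t for t, action in events if keep(action))
    return sum(b - a for a, b in zip(ts, ts[1:]) if b - a < max_gap)


def mean_over_active_users(events_by_user, keep=lambda action: True):
    """Mean active time over users who still have any counted time after filtering."""
    totals = [active_seconds(events, keep=keep) for events in events_by_user.values()]
    nonzero = [t for t in totals if t > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


def is_mode_switch(action):
    return action.startswith("ModeSwitch")


if __name__ == "__main__":
    events_by_user = {
        "a": [(0, "ModeSwitch_Walk"), (200, "ModeSwitch_Walk"),
              (400, "ModeSwitch_Walk"), (600, "ModeSwitch_Walk")],
        "b": [(0, "MouseMove"), (30, "MouseMove")],  # hypothetical non-ModeSwitch events
    }
    # Per-user invariant: filtering never increases a user's time.
    for events in events_by_user.values():
        assert active_seconds(events, keep=is_mode_switch) <= active_seconds(events)
    print(mean_over_active_users(events_by_user))                       # all interactions
    print(mean_over_active_users(events_by_user, keep=is_mode_switch))  # ModeSwitch only
```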

@misaugstad
Member

Also, this only looks at registered users, which should be mentioned. We should do this for anonymous users as well, which should not be hard at all: you just group by IP address instead of user id.

@jonfroehlich
Member

Is this Issue still active or should we close it out?

@misaugstad
Member

Yep, there were just a couple remaining charts that had not been created, so I made separate issues for each of them. Closing this one now. Nice work, team!

7 participants