Special groups aggregate metrics #569
Comments
Some notes from WG meeting on Aug 27:
|
@schnuerle can we have a R.e:
Yes—there are significant privacy concerns with adding any additional PII to MDS. In my opinion, this is something that feels beyond the scope of what MDS should ever support. Can we not leave it up to agencies to use their own data—presumably collected responsibly (e.g. through surveys or the US Census)—to conduct this kind of analysis? If providers are already holding this kind of information: why? can you stop? |
@johnclary the original intention with this idea was to ensure that no personally identifiable information of specific user groups would flow through MDS, nor would their individual trip data, but rather aggregated information (such as overall number of low-income program users or overall number of trips), which would be calculated by the provider before sharing with a city. This is data that almost every city already asks for, and is mostly facilitated via custom reports, so there would be benefit to both providers and cities in standardizing. |
Maybe a clarification from the call that could be added to the main description here: the 'low income' group from providers means any riders using an equity plan, eg Bird Access, Spin Access, Bolt Forward, etc. So this is known by the providers per rider as part of operations and billing for discounts. Note the k-anonymity part which says if a query returns too few results based on filters/time/geography, no data is returned for that slice. |
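To make the k-anonymity note concrete, here is a minimal Python sketch of how a provider might blank out slices that cover too few riders before sharing them. The field names (`geography`, `rider_count`, `trip_count`), the function name, and the threshold of 10 are assumptions for illustration only, not part of MDS.

```python
K_MIN = 10  # assumed minimum group size; the real value is still under discussion


def redact_small_slices(slices, k=K_MIN):
    """Return a copy of the slices with values blanked where fewer than k riders are covered."""
    redacted = []
    for s in slices:
        visible = s["rider_count"] >= k
        redacted.append({
            "geography": s["geography"],
            # None signals "suppressed", which is different from a true zero
            "rider_count": s["rider_count"] if visible else None,
            "trip_count": s["trip_count"] if visible else None,
        })
    return redacted


# Hypothetical monthly counts for riders on an equity plan, per zone
example = [
    {"geography": "zone-a", "rider_count": 42, "trip_count": 310},
    {"geography": "zone-b", "rider_count": 3, "trip_count": 7},
]
result = redact_small_slices(example)
```

In this sketch the small slice is kept in the output with empty values rather than dropped, so a consumer can tell "suppressed" apart from "no activity". Dropping the slice entirely would be an equally valid reading of the note.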
If I'm reading these materials correctly, it is being proposed that this API make it possible to report on What's missing is a specific explanation of why an agency would want to collect this data and what meaningful action they would expect to take as a result of it. Assuming we all agree that the "special groups" being discussed here are people who have a disproportionately high risk of harm if their PII were to be leaked, who generally have fewer mobility choices, and who would almost definitely prefer that service providers did not hold this kind of data about them, this warrants an extremely well-defined use-case. Regardless of whether or not providers are currently collecting this data, do we as agencies really want to reinforce that practice based on the case that there might be some useful information for an as-yet unproven technocratic purpose? |
What really bothers me about this proposal is that the development of MDS continues to default to collecting as much data as possible on the premise that it will be theoretically useful at some point. There remain very few stories to tell of cities doing right by their publics as a result of this casting of an extremely wide collection net. |
That's not how I read Dirk's proposal. It is from an employee of a mobility provider. Spin's assertion is that many cities have specific purposes for this data, they are already making these requests of mobility providers, and that this would be an (optional!) way to standardize the requests that are already being made. Also I am not sure I agree with the characterization that MDS "defaults to collecting as much data as possible". The Metrics API is clearly less data, explicitly de-identified. The OMF privacy committee is also publishing standards for privacy protection including data deletion principles. I agree that some city representatives should join Dirk/Spin to describe the use cases and public-policy aims for this information. |
For the sake of documenting what cities are currently asking for in terms of monthly reporting, I went through every city permit/policy application I could find from our Cities Using MDS list and summarized very broadly the relevant, documented asks. These are mostly in the terms of monthly reports, and can include MDS derived data (most cities are asking for aggregated MDS data in addition to the API feeds). I'm sure there are more examples, but I could not find them online. If anyone has more cities please add them in the comments. The mobility service providers like Spin and Bird may have a more comprehensive list of the things cities are requiring, and could provide an aggregated (not broken down by city) list.
List Notes
I put special user groups in bold, and equity/zone area info in italics. Note some cities are asking for aggregated demographics. Almost all cities also ask for complaints, safety, collision, injury, and vandalism numbers, and these seem like something missing in the proposals so far. Outside of the scope of this are system alerts, pricing plans, and hours of operation, which gets back to a brief conversation we had in the last Provider WG call around a kind of 'Policy for Providers' API idea.
City List
Austin
Denver
Detroit
Kelowna, Canada
Long Beach
Louisville
San Jose
Santa Monica
Washington DC
Cities with low income participation count requirements: Cities with equity distribution count requirements: |
to be clear, I support the metrics API and really appreciate @dirkdk's work on it. but are there member agencies pushing for these "special groups" metrics as they're currently being proposed? would like to hear from them. |
Cities generally have equity as a key goal and basic mandate, working to ensure that all their residents have the opportunity to be on equal footing. We as an operator have this same goal, which is the reason we established our Access program, providing low-income pricing and a means to use the service without a smartphone or credit card. We track these metrics internally to gauge the effectiveness of our Access program, and inform decisions on any needed adjustments to the program. Cities essentially do the same thing, working to ensure any pilots or programs they deliver are working to achieve equitable outcomes, and evaluating operators to that end. Pretty much all cities where we operate require the metrics included in this proposal as part of our data sharing terms, although the data is generally shared in aggregated form via a custom report in a spreadsheet or PowerPoint. As stated in the original issue and on the working group call when this was discussed, Spin feels strongly about the need to protect the privacy of users, and that this proposal strikes a good balance of sharing data with cities while maintaining privacy. Use cases should be driving changes to MDS, and we do see a clear use case here for sharing aggregated data on equity programs. Maybe a solution could be that this issue be solely focused on that equity program data, and the other groups or categories of data mentioned in the replies to this issue should be tabled until a clear and meaningful use case is presented. It would also be good to hear another operator's perspective such as @bhandzo from Bird. @dirkdk and @joshuaandrewjohnson1, Spin |
Updating the list of metrics from San Francisco's scooter program that are reported monthly via an Excel file and calling out the special groups:
There is a more exhaustive description (although this list may not exactly be what is reported every month). It's great that our permittees offer low-income plans, but the goal of knowing how many members and trips are taken on that plan is to get a sense of how well those low-income plans are actually being used. Similarly, we have a requirement for an adaptive scooter pilot, and we'd like to understand how those devices are being utilized relative to the rest of the program. These aggregate metrics are not the only methods of evaluating their respective programs, but they are certainly helpful. |
@johnclary this is information that cities require to regulate. DC had been receiving this in quarterly reports for fleet increases and then with a newer regulation in weekly reports as operators requested fleet increases. Cities are asking for this in a non-standard way, so adding it to MDS would in theory make things easier for the operators, too, as @dirkdk noted in the initial PR. Our equity plans are both geography- and user group- based, same as several other cities. For geography-based equity, we require deployment in specific areas and this can already be monitored through MDS. For the user group-based equity, we require that companies have low-income customer plans that give free unlimited trips for those at 200% or less of the federal poverty level. Without some level of reporting on usage, we only know that the plans are offered, but not whether there is any uptake (which can speak to the success of the providers’ marketing of and support for said plans). There are distinctions between user-based programs and geographic programs - you use both to enable usage by groups that might generally have lower access. However, we don't necessarily expect that the low-income plan usage will be concentrated in the geographic equity areas. Per @alexdemisch knowing the level of program usage would allow for better regulation. We might want to require that a certain percent of trips come from low-income plans as a condition of operating (e.g. operators must demonstrate that at least 1% of all trips were from low-income plan users). Being able to see overall usage levels would be critical for tracking that. The information about the origins and destinations and waypoint movements of low-income plan users is not something that DC would like to have. At a high level, we’d like to know a little bit more about the characteristics of the geography where the trip is occurring. Being able to query an API, with a minimum K-anonymity value, would be a good scenario. |
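As a rough illustration of the kind of condition described in the comment above (a minimum share of trips coming from low-income plans), the check could be computed from two aggregate counts alone. This is only a sketch: the function names are hypothetical, and the 1% default mirrors the example figure given in the comment, not an adopted rule.

```python
def low_income_trip_share(low_income_trips, total_trips):
    """Fraction of all trips that were taken on a low-income plan."""
    if total_trips == 0:
        return 0.0  # no trips at all, so no share to report
    return low_income_trips / total_trips


def meets_share_requirement(low_income_trips, total_trips, min_share=0.01):
    """True if the low-income share reaches the (assumed) minimum share."""
    return low_income_trip_share(low_income_trips, total_trips) >= min_share


# Hypothetical reporting period: 150 low-income plan trips out of 10,000 total
print(meets_share_requirement(150, 10_000))
```

Only the two totals are needed, so no origin, destination, or waypoint data for individual riders has to leave the provider.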
The City of Chicago has requirements that mobility providers offer low-income and equity programs to ensure that new mobility options are available for all of our residents. We have a program for the City's own system (Divvy for Everyone) as well as program requirements as a condition of permitting for other public mobility providers. We also engage in evidence-based policy evaluation using the mobility data we collect (e.g. ridehail congestion study and scooter pilot evaluation). A core government function is to enroll individuals in programs and evaluate the policies that govern those programs to check and improve their effectiveness. Collecting data related to low-income and equity mobility programs is necessary in order to measure trends and determine whether the policies and requirements are having their intended effect, and whether changes need to be made. Our experience with the study and evaluation linked above is that pre-aggregated data limits the insights that can be generated in a study, and therefore limits the effectiveness of policy actions. The proposal in this issue would help MDS provide a better value when it comes to measuring the effectiveness of low income/equity policies, but would not allow for the level of analysis presented in the linked documents. Trip-level data would be the best way to measure and study the reach and equity of mobility services as well as the effect of policies. |
We reviewed this issue as part of the second OMF Working Group Steering Committee release Checkpoint. Both WGSCs had some feedback and I'm documenting it here for discussion. How can cities trust aggregated (non MDS derived/special groups) data? It might not be possible, but wanted to ask for ideas since it is a concern. |
Appreciate the cities chiming in. @nicklucius r.e.
Can you expand on why a pre-calculated geographic aggregate is not sufficient for your purposes? Do you see some trade off in terms of the value of whatever level of analysis you're trying to conduct versus the risks involved with collecting and storing personal data? I don't know how to measure this kind of trade off, but it seems to me that the most sound approach to mobility data collection for planning purposes is to start with the bare minimum (e.g. aggregates at some geo resolution) and see how far that gets you. If you're not certain about the insights you'll derive from raw data, even less certain about the policy decisions that follow, and even less certain about the impacts of those policies, it becomes increasingly hard to justify the privacy risks involved with collecting raw data in case it's useful. |
@johnclary Sure, I'm happy to. These reports are good examples of what I'm talking about: our ridehail congestion study and scooter pilot evaluation. We have aggregated, privacy-protected datasets for the underlying data made available to the public on our data portal here and here. If you tried to replicate the analysis in the reports using the pre-aggregated data, you would not be able to calculate many of the metrics or recreate many of the maps. That is because the pre-aggregated data removes the granularity necessary to run the queries and analytics that produced the findings and allowed us to fully evaluate the programs, recommend policies, and share it all with the public. We only want to collect the data that is necessary for our purposes and will produce important insights. We are always monitoring our mobility data collection standards and we do refrain from collecting what we do not need, and have stopped collecting data we previously collected once we realized that it was not producing a needed benefit. For example, see how our scooter data collection rules changed from 2019 to 2020. |
Note that for 1.1.0 we have merged with #582 the new Geography API to the 'dev' branch. Please update this pull request with the latest code, resolve any conflicts, and make references to the Geography API where appropriate, e.g. with UUIDs. We will be discussing Special Groups at this week's Working Group meeting, so if available please come prepared to talk about your latest updates and ideas. |
Would love to see a PR for this before our Thursday Working Group call this week, so we can discuss it on the call @dirkdk @joshuaandrewjohnson1. I can help if needed. I've made a feature branch called 'feature-metrics' to start pulling all the Metrics related work together and do PRs against. |
ok I will work on that |
The new 'feature-metrics' is ready and has #486 and #487 incorporated into it. Per the WG call, I will incorporate the proposed ideas here into that branch, then report back. Here are the relevant meeting notes from the call last Thursday.
Special Groups
|
I believe I've captured all of the metrics mentioned here in the Metrics branch, so please review. There is a new dimension and filter for special_group_type which should meet all of your requirements too. There is also the start of a data redaction section which talks about k-values (which I've set as 10 across the board for now) and needs some thought behind it (what should the value be, how should it be calculated, how can it differ across Metrics?). The k-value also comes back in the query response. |
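For reviewers, a small Python sketch of how the `special_group_type` filter and the k-value in the response might fit together. Only `special_group_type` and the k-value of 10 come from the comment above; the other field names, the group values, and the response shape are assumptions and may not match the branch.

```python
K_VALUE = 10  # the across-the-board value mentioned above


def query_by_group(rows, special_group_type, k=K_VALUE):
    """Filter pre-aggregated rows by group and return totals along with the k-value used."""
    matching = [r for r in rows if r["special_group_type"] == special_group_type]
    # Assumes each row counts a disjoint set of riders, so the counts can be summed
    riders = sum(r["rider_count"] for r in matching)
    trips = sum(r["trip_count"] for r in matching)
    redacted = riders < k
    return {
        "special_group_type": special_group_type,
        "k_value": k,  # echoed back so the consumer knows the threshold applied
        "rider_count": None if redacted else riders,
        "trip_count": None if redacted else trips,
    }


# Hypothetical rows; the group names are placeholders
rows = [
    {"special_group_type": "low_income", "rider_count": 25, "trip_count": 140},
    {"special_group_type": "low_income", "rider_count": 12, "trip_count": 60},
    {"special_group_type": "adaptive", "rider_count": 4, "trip_count": 9},
]
print(query_by_group(rows, "low_income"))
```

Returning the k-value with every response leaves room for it to differ per metric later, which is one of the open questions listed above.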
We will be aligning this back to the original proposal intent and pull it out of Metrics. Look for a PR soon. |
This has a solution now for 1.1.0 as a beta feature serving up a relatively simple static CSV file with PR #607. I think in the next release we should gather feedback and ideas on how to expand this, either in a more dynamic API way and/or with more fields/options to align more to original issue description and meet more existing provider/agency monthly report use cases. |
Is your feature request related to a problem? Please describe.
Cities often request data on how many lower income users we reach with our vehicle service, and how many trips such users take. MDS does not currently support user segmentation. We would oppose attaching any user data to the Trips endpoint as it would involve information close to Personally Identifiable Information (PII) and would make it fairly easy to identify individuals by trip route and user segment. Instead, we propose the solution of providing aggregate trip data by user segment.
Describe the solution you'd like
We propose that Providers provide aggregated data on trips by special user segments, using the new Metrics API. By using aggregation, it should be impossible to trace back this data to individuals. This does mean we need to set meaningful minimums for certain metrics so that the aggregated data has k-anonymity.
The new Metrics API specifies the parameters name, since, interval, and dimensions; we assume that these metrics will support them.
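A minimal Python sketch of the provider-side aggregation we have in mind: trips are counted per time bucket and user segment, and only the counts would be shared. The record fields (`start_time`, `segment`), the segment labels, and the function name are hypothetical; `since` and `interval` are meant to echo the Metrics API parameters of the same name, but their exact types here are an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def aggregate_trips(trips, since, interval):
    """Count trips per (bucket start, segment), ignoring trips that start before `since`."""
    counts = Counter()
    for trip in trips:
        if trip["start_time"] < since:
            continue
        # Whole number of intervals elapsed since the start of the query window
        bucket_index = (trip["start_time"] - since) // interval
        bucket_start = since + bucket_index * interval
        counts[(bucket_start, trip["segment"])] += 1
    return dict(counts)


since = datetime(2020, 9, 1, tzinfo=timezone.utc)
interval = timedelta(days=1)
trips = [  # hypothetical internal records; these never leave the provider
    {"start_time": datetime(2020, 9, 1, 8, 30, tzinfo=timezone.utc), "segment": "low_income"},
    {"start_time": datetime(2020, 9, 1, 17, 0, tzinfo=timezone.utc), "segment": "low_income"},
    {"start_time": datetime(2020, 9, 2, 9, 0, tzinfo=timezone.utc), "segment": "standard"},
    {"start_time": datetime(2020, 8, 31, 23, 0, tzinfo=timezone.utc), "segment": "standard"},
]
daily = aggregate_trips(trips, since, interval)
```

A k-anonymity minimum would still need to be applied to these counts before they are shared.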
Proposed metrics for special groups:
Overall aggregate statistics
For overall usage we can do what is listed below. Please note that this data should be derivable from MDS trip data, minus active users.
Is this a breaking change
A breaking change would require consumers or implementors of the API to modify their code for it to continue to function (ex: renaming of a required field or the change in data type of an existing field). A non-breaking change would allow existing code to continue to function (ex: addition of an optional field or the creation of a new optional endpoint).
Impacted Spec
For which spec is this feature being requested?
metrics
but only served by Providers. Only Providers will have the raw data
Describe alternatives you've considered
Alternatives would be to add user segments to individual trips in the Trips API. We oppose this method as it would make user identification extremely easy. We currently send this data to cities via manually compiled Excel sheets, and it would be better to have an official API.