Trip confidence can be artificially high for single trip clusters #663
@GabrielKS this makes sense to me, but I am concerned that it is not very explainable. The program admins do ask us questions about how the algorithm works, and the current explanation is fairly simple: we look at your prior trips and use them for the probability. At our Friday meeting, Jeanne actually asked me if she could see the probabilities for the labels from the UI 😄 I think an explanation along the lines of "we need 'k' trips before the cluster counts" would also be understandable, but I'm afraid the full formula will make people's eyes glaze over. Is there a way to explain it in simple terms for somebody like Jeanne to understand? If not, I would prefer a simpler version even if it is not sufficiently mathy.
Here's an attempt at explaining the above (I changed
The last sentence may be omitted for simplicity, and much of the first paragraph may be omitted if the user is familiar with how the status quo works.
For now, I will implement the algorithm as planned; if we need to simplify the formula later, it will be a simple one-line change.
+ Implements the algorithm described in e-mission/e-mission-docs#663 (comment)
+ Uses constant values `A=0.01`, `B=0.75`, `C=0.25`
+ Changes `eacilp.primary_algorithms` to use the new algorithm

No unit tests yet (working on a tight timeline), but tested as follows:
1. Run `[eacili.n_to_confidence_coeff(n) for n in [1,2,3,4,5,7,10,15,20,30,1000]]`, check for reasonableness, and compare to results from plugging the formula into a calculator
2. Run the modeling and intake pipeline with `eacilp.primary_algorithms` set to the old algorithm
3. Run the first few cells of the "Explore label inference confidence" notebook for a user with many inferrable trips to get a list of unique probabilities and counts for that user
4. Set `eacilp.primary_algorithms` to the new algorithm and rerun the intake pipeline (the modeling pipeline hasn't changed)
5. Rerun the notebook as above and examine how the list of probabilities and counts has changed
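For readers who want to reproduce test step 1 without a server checkout, here is a standalone sketch of the dampening coefficient. The real implementation is `eacili.n_to_confidence_coeff` in e-mission-server and its exact formula may differ; the form below is an assumption chosen so that `coeff(1) == C` and `coeff(n)` approaches `1 - A` as `n` grows, which matches the constants listed above.

```python
# Hypothetical sketch (not the e-mission-server source). Assumed form:
#   coeff(n) = (1 - A) - (1 - A - C) * B ** (n - 1)
# so coeff(1) == C and coeff(n) -> 1 - A as n grows.
A = 0.01  # ceiling gap: confidence never exceeds 1 - A = 0.99
B = 0.75  # decay rate: how fast we approach the ceiling
C = 0.25  # coefficient for a single-trip cluster

def n_to_confidence_coeff(n, a=A, b=B, c=C):
    """Dampening coefficient for a cluster with n labeled trips."""
    return (1 - a) - (1 - a - c) * b ** (n - 1)

# Reasonableness check analogous to test step 1 above
print([round(n_to_confidence_coeff(n), 3)
       for n in [1, 2, 3, 4, 5, 7, 10, 15, 20, 30, 1000]])
```

The key property is that a single-trip cluster is dampened all the way down to `C`, while long-standing consistent clusters stay close to (but never at) full confidence.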
My reading of that result is that the metric is not very sensitive to `A`, but is sensitive to both `B` and `C`. Given the difference between the naive approach and the default, it seems like bumping up `B` and `C` (e.g. `B = 0.9`, `C = 0.33`) would help improve the upper quantiles. Or is there a domain-specific reason for picking those values of `B` and `C`?
On the bright side, though, it looks like the lower quantiles are not really affected by any of the settings, which makes sense given the change. So I don't think there will be a visible impact on the number of yellow labels, only on the trigger for inclusion in "To Label".
We may want to drop the trigger to 0.90, or have a second, relaxed round in which we drop it to something like 0.75, or both. Let's see what the higher `B` and `C` values show...
I think I like `high_b_and_c` better. Basically, the 1.0 probability has just moved down a bit, to around 0.9, and the rest is unchanged. I like it a lot better than the default, where the quantile moves down to below 0.8. I saw your note about empirically determining …
That makes sense. We should just be mindful of how it interacts with the upper threshold. Under …
Another way to do this would be to decide how many occurrences of a common trip we want the user to have labeled or confirmed before we take it out of their hands completely, and then work backwards. |
I like the approach of deciding the occurrences and working backwards because, as I indicated in #663 (comment), that is easier to explain. I think that "number of trips that need to be labeled = 3" is a reasonable starting point.
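The "work backwards" idea can be sketched concretely. This again assumes a hypothetical coefficient of the form `coeff(n) = (1 - A) - (1 - A - C) * B**(n - 1)`; the constants and thresholds below are illustrative, not the deployed values.

```python
# Work forwards: with fixed constants, how many consistently labeled trips
# does a cluster need before the dampened confidence crosses a threshold?
def n_required(threshold, a=0.01, b=0.75, c=0.25, n_max=1000):
    for n in range(1, n_max + 1):
        coeff = (1 - a) - (1 - a - c) * b ** (n - 1)
        if coeff >= threshold:
            return n
    return None  # unreachable threshold (e.g. anything above 1 - a)

# Work backwards: if we want the threshold crossed after exactly k labeled
# trips, what is the largest decay rate B we can use?
def b_for_k(k, threshold, a=0.01, c=0.25):
    return ((1 - a - threshold) / (1 - a - c)) ** (1 / (k - 1))

print(n_required(0.90))            # trips needed at the default constants
print(round(b_for_k(3, 0.90), 3))  # B that makes k = 3 hit a 0.90 threshold
```

Picking k first and solving for `B` keeps the user-facing explanation simple ("after k labeled trips we stop asking") while the formula stays smooth in between.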
Here are said gory details. I also propose configuration values:
@GabrielKS all that looks good, except that I don't understand this inconsistency. Relaxed: why don't we need 4 occurrences in R4, similar to R2? R3 has the same value as R1 (1 occurrence before a yellow trip).
My thinking was that four occurrences of a common trip, all with the same labels, was enough for us to relaxedly assume that the labels would always be the same; but if we have an example of the labels being different, we need more data to convince ourselves that the labels have gone back to being all the same. (Also, I3 != I1.)
Ok, makes sense. Let's go ahead and deploy with these settings, and maybe spend some time this afternoon going over these assumptions with @andyduvall
+ See comments in e-mission/e-mission-docs#663
corinne-hcr/e-mission-server#4, incorporated into e-mission/e-mission-server#829, closes this issue for now.
From e-mission/e-mission-eval-private-data#28 (comment)
the current confidence is essentially the same as the h-score. But that has a problem with our expectation design. If a user goes to a new location for the first time by car and the second time by bike, then on the second trip we will have a match with a confidence of 1.0, so we won't even show it to the user, and we will never learn that this trip is sometimes taken by bike. So we need to change the confidence calculation to wait until we have k labels before we are confident.
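The failure mode can be seen in a toy calculation. This is a minimal sketch; `label_probabilities` is an illustrative helper, not an e-mission function.

```python
from collections import Counter

def label_probabilities(labels):
    """Empirical probability of each label within a single trip cluster."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# After one car trip, the cluster already reports full confidence, so the
# second trip would be auto-labeled "car" and never shown to the user --
# we would never find out the trip is sometimes taken by bike.
print(label_probabilities(["car"]))          # {'car': 1.0}
print(label_probabilities(["car", "bike"]))  # what we'd see if we had asked
```

Dampening the confidence until k trips have been labeled keeps the second trip in the "ask the user" bucket long enough to surface the bike alternative.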
@GabrielKS if I work on #662, can you take this one? It is a lot more straightforward, and it is a correctness issue, so I think we should fix it before a larger scale deployment. And I think you already had an idea of how to implement it?