Evaluate the current clustering based on start/end place alone #605
Shankari, I have many questions about the tasks; I didn't fully understand them during the meeting.

1. As for querying the user about novel trips, does that happen right after clustering the common places to form common trips? Though I might not be able to get to that point, I am curious about it.
2. Are these two tasks similar to what Naomi did in her research? The steps in the tasks look similar to the ones in her paper. Hers lead to a tour graph; mine lead to common trips, which looks like a step before the tour graph.
3. According to our internship plan, 4/15 is the deadline for the poster, so it sounds like most of the work has to be done before that day. A feasible schedule is fine. I hope I can finish the two tasks, but based on the steps I don't have a clear idea of what I can and cannot do, and I still have questions about them. In other words, for now I don't know what I am going to face; I might need to get to the specific step to figure it out. Due to lack of experience, I need more guidance.
4. For the first task:
   (1) "Move the existing evaluation that does not use labels (uses silhouette score) out of computation code": where should I begin? I always feel confused about which file to look at. Should I look at all the files in
   (2) I don't understand this step. Can you expand on it?
   (3) What do you mean by user intervention here?
   (5) What do you mean by tradeoffs here?

The steps are vague to me. I need more information to see what to do and how I can close the gap.
No, the plan is that we will cluster common places, and then we will cluster the trips between those places to pre-generate common trip "buckets". A common trip may be "going from home to work on the #32 bus"; another may be "going from home to work on the F light rail". As new trips come in, we will check them against the existing buckets, and if they don't fit into a bucket, we will query the user. You will only work on the first part (creating the buckets), which is the ML/analysis-heavy part. I (or maybe a summer intern) will implement the part where we match the incoming trips. That is integrating the ML algorithm into a system, and I don't think you will be able to finish it this semester.
Naomi did the first task, but she put all trips between a pair of common places into the same bucket. The second task is differentiating between different ways of traveling between the same pair, i.e. putting the trips into more fine-grained buckets wrt 3. I structured this as two separate and largely independent tasks so that we would have a backup plan if you are not able to finish both. If you only finish the first task, the project will be "evaluation of clustering algorithm to determine common places using labeled and unlabeled metrics". If you finish both tasks, the project will be "Using machine learning to classify common and novel trips from longitudinal labeled travel patterns".
You should move the score calculation out into its own file, since the score is not required while featurizing. I will move the scoring out from either
All of them evaluate the machine learning technique of "clustering". sklearn (which is what they use) is a python machine learning library with a lot of machine learning algorithms (e.g. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html)
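For instance, here is a minimal sketch of computing the silhouette score with sklearn. The trip coordinates below are made up for illustration; in our pipeline the inputs would be the (start, end) features of each trip.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy feature matrix: one row per trip, columns are start/end lat/lon
X = np.array([
    [37.77, -122.41, 37.79, -122.40],
    [37.77, -122.42, 37.79, -122.40],
    [37.33, -121.89, 37.40, -121.94],
    [37.33, -121.88, 37.40, -121.95],
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Silhouette score ranges from -1 (bad) to 1 (dense, well-separated clusters)
print(silhouette_score(X, labels))
```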
We can visualize the evaluation along a couple of axes:

- This will show us what value of the settings works well for the existing metrics.
- How many novel places there are. Assuming we need to ask the user for every trip to a novel place, how often would we interrupt the user?
- The tradeoff between accuracy and user attention. If we ask the user for every trip, we are guaranteed to be 100% accurate, but with a large user burden. If we ask the user for no trips and only use the automated inference that Ishrat is working on, we would have no user burden but would not get all the data we want. We want to find a nice balance between them; a rough sketch of this tradeoff follows below.
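Here is a hypothetical sketch of that tradeoff: if we treat trips that fall into sufficiently large bins as "common" (no query) and everything else as "novel" (query the user), then the user burden is just the fraction of novel trips. The bin sizes and cutoff values below are made-up numbers.

```python
# Hypothetical sketch of the accuracy vs. user-attention tradeoff.
# A trip in a bin with at least `cutoff` members is treated as "common"
# (no user query needed); everything else triggers a query.

def query_rate(bin_sizes, cutoff):
    """Fraction of trips we would have to ask the user about."""
    total = sum(bin_sizes)
    novel = sum(size for size in bin_sizes if size < cutoff)
    return novel / total

bin_sizes = [40, 25, 10, 3, 2, 1, 1]  # made-up bin populations
for cutoff in [1, 2, 5, 10]:
    print(f"cutoff={cutoff}: query rate {query_rate(bin_sizes, cutoff):.2f}")
```

Raising the cutoff improves accuracy (more trips confirmed by the user) but also raises the interruption rate; the sweet spot is somewhere in between.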
If we check the cluster pipeline,
@corinne-hcr We're going to move the viz code into the e-mission-eval-private-data repo to avoid cluttering up the server.

- First, we fix the setup on that repo (e-mission/e-mission-eval-private-data#12, e-mission/e-mission-eval-private-data#13).
- Next, we pull out the similarity viz code (e-mission/e-mission-eval-private-data#14) from (e-mission/e-mission-server#791).
Note that my recollection of the prior work was a little faulty. Naomi did not cluster places directly. Instead, she clustered trips, but only based on the start and end points. So the two stages are now:
I have fixed the title of this bug to match.
@corinne-hcr from the visualization, the current cutoff point seems a bit off to me. Can you change this to plot graphs for the other users as well, and see if this is a common theme? If so, we may need to fiddle with the cutoff detection code. Thanks! Also, feel free to post questions if there is anything about the PR that you don't understand.
plot graphs for all users
Great. So now let's plot the cutoffs for multiple users from the CanBikeCO dataset and see visually whether the cutoff point seems reasonable. If not, we may want to see why the cutoff point is not reasonable and tweak the settings/implementation accordingly.
After this, we will use featurization and then representatives. That is pretty much all.
Here is the updated graph for multiple users.
Next, we should evaluate these bins and see whether they make sense. Before we go down the path of tweaking the cluster radius, though, I want to understand the difference between the binning output and the clustering output. If we have already binned trips based on start and end, what does the clustering do? I think we should also be able to plot the various bins and clusters for a specific UUID. @corinne-hcr how long will it take you to:
Note that there is already an implementation of
@corinne-hcr do you have an estimate of how long it will take you to complete these tasks?
Before Tuesday?
@corinne-hcr I would actually like you to time bound each of the tasks and, at the end of the time bound, post a PR of whatever you have for me to finish up. I do want you to get familiar with the code, but I also need to make sure that we are actually making progress. So how about one day each:
That sounds good. I will try my best to finish it. If I run into problems, it might take a bit longer than expected, but I will let you know what I have by then.
@corinne-hcr is waiting for @shankari to run the clustering code from a notebook so we can evaluate it. You are encouraged to add additional logging statements to the server code. In order to see log debug statements in the notebook, you need to enable logging, similar to
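(A minimal sketch of enabling debug logging in a notebook, assuming the server code uses Python's standard logging module; the exact setup the comment refers to may differ.)

```python
import logging

# Route DEBUG-level messages to the notebook output; force=True replaces any
# handlers the notebook environment may have already installed (Python >= 3.8)
logging.basicConfig(level=logging.DEBUG, force=True)

logging.debug("debug statements from the server code are now visible")
```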
Here is the link to the visualization of bins and clusters.
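As a reference for what such a visualization can look like, here is a minimal sketch with made-up coordinates and assignments; the actual notebook plots the real bin/cluster output.

```python
import matplotlib.pyplot as plt
import numpy as np

# One row per trip (start_lon, start_lat); `assignments` are bin or cluster ids
starts = np.array([[-122.41, 37.77], [-122.42, 37.77],
                   [-121.89, 37.33], [-121.88, 37.33]])
assignments = np.array([0, 0, 1, 1])

plt.scatter(starts[:, 0], starts[:, 1], c=assignments, cmap="tab10")
plt.xlabel("longitude")
plt.ylabel("latitude")
plt.title("Trip starts colored by bin/cluster")
plt.show()
```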
Next set of changes:
We can consider this problem through both a clustering and a classification lens. I believe that the reason this doesn't work is the featurization of the trips. For classification ML methods, we typically need a feature matrix with one entry per data point and feature. For trip start/end, though, it is not clear that putting the raw lat/lon into the feature matrix will work, since it doesn't take noise into account. What we really need is the distance from other trips, but that is a 2-D (trip x trip) matrix in itself, which is why we are using clustering now. If we did want a classification approach, we could follow a two-step approach in which we first cluster/bin the trips, and then put the bin or cluster number into the feature matrix. This should work as long as the bin numbers or cluster numbers are stable; if they are not, we may need to use the centroid or a geo region, or just use clustering after all.
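A minimal sketch of that two-step approach, with a density-based clusterer like DBSCAN standing in for our binning step (toy coordinates; a real version would use a proper geodesic distance rather than raw degrees):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# One row per trip: start_lat, start_lon, end_lat, end_lon (toy values)
trips = np.array([
    [37.77, -122.41, 37.79, -122.40],
    [37.77, -122.42, 37.79, -122.40],
    [37.33, -121.89, 37.40, -121.94],
])

# Step 1: cluster/bin the trips on start/end; eps is in degrees here, which
# is crude -- a real version would use haversine distances in meters
cluster_ids = DBSCAN(eps=0.05, min_samples=1).fit_predict(trips)

# Step 2: the cluster id becomes a categorical feature for a classifier,
# alongside whatever other per-trip features we have
feature_matrix = np.column_stack([cluster_ids, trips])
print(feature_matrix)
```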
While waiting for the F-score determination, let us continue exploring the data and tuning it based on the current precision/accuracy metric. While tuning, we typically experiment with different featurization options and different input tuning parameters to the ML algorithms. In our case:

- We only have one featurization, which is the distance between (start, end) pairs.

As part of tuning, you will want to run all combinations of the tuning parameters and see if there are any paths that are noticeably better than others; a sketch of such a sweep is below. Please note that these results will go into the final paper, at least in summary form. Note that scikit-learn has built-in techniques for evaluation/validation (e.g. its model selection module). Our evaluation may not fit as well into their examples, but it might if you try hard enough. Alternatively, we can implement a parallel method. But evaluation/validation is a key component of model selection.
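For instance, a parameter sweep could look like the following sketch; the parameter names and value ranges are assumptions, and `ParameterGrid` is just one convenient way to enumerate the combinations.

```python
from sklearn.model_selection import ParameterGrid

# Assumed tuning parameters: similarity radius (meters) and bin-size cutoff
grid = ParameterGrid({"radius": [100, 300, 500], "cutoff": [2, 3, 5]})

for params in grid:
    # Here we would run the binning/clustering pipeline with these settings
    # and record the resulting score for comparison across combinations
    print(params)
```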
Since we are using clusters, but want to calculate precision and recall, we should look at algorithms and evaluation metrics for labeled clusters. That is likely to be closest to what we want.
Although we could manually compute the precision, recall, and F-score for labeled clusters, sklearn includes built-in metrics that work with labeled clusters (https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation); these include metrics 2.3.10.1 to 2.3.10.4. In general, we should use existing metrics rather than re-implementing new ones; there is no need to reinvent the wheel. So I think we should use the existing labeled cluster metrics instead of implementing F-score from scratch. The V-score seems to be pretty close to what we want, and it is already implemented! F-score is much better known than the labeled cluster metrics, so that's what came to my mind first; I apologize for any confusion. As I pointed out earlier, you could see if we can also re-use the model evaluation methods from scikit-learn for the tuning. A short sketch using these metrics follows.
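A minimal sketch of these built-in metrics; the label values below are made up for illustration, with `labels_true` standing in for user-confirmed trip labels and `labels_pred` for our cluster assignments.

```python
from sklearn.metrics import homogeneity_score, completeness_score, v_measure_score

labels_true = [0, 0, 1, 1, 2, 2]   # ground-truth labels (e.g. user-confirmed)
labels_pred = [0, 0, 1, 2, 2, 2]   # cluster assignments from the pipeline

print("homogeneity: ", homogeneity_score(labels_true, labels_pred))
print("completeness:", completeness_score(labels_true, labels_pred))
print("v-measure:   ", v_measure_score(labels_true, labels_pred))
```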
The precision on all data is here.