-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Content recommendation bugs #4675
Content recommendation bugs #4675
Conversation
Handles three errors. 1. In get_exercise_prereqs, if an unknown exercise id is passed in, it now contributes nothing to the prereqs instead of raising an error. 2. Avoid a divide-by-zero error in coachreports.models. 3. Finally, in get_explore_recommendations, use of `next` built-in could result in an unhandled StopIteration exception. Refactor to use for..in block.
Line profiler output for the endpoint: Timer unit: 4.27654e-07 s
Total time: 180.254 s
File: kalite\main\api_views.py
Function: content_recommender at line 22
Line # Hits Time Per Hit % Time Line Contents
==============================================================
22 @api_handle_error_with_json
23 @profile
24 def content_recommender(request):
25 """Populate response with recommendation(s)"""
26
27 1 50 50.0 0.0 user_id = request.GET.get('user', None)
28 1 20 20.0 0.0 user = request.session.get('facility_user')
29
30 1 5 5.0 0.0 if not user:
31 if request.user.is_authenticated() and request.user.is_superuser:
32 user = get_object_or_404(FacilityUser, pk=user_id)
33 else:
34 return JsonResponseMessageError("You are not authorized to view these recommendations.", status=401)
35
36 1 28 28.0 0.0 resume = request.GET.get('resume', None)
37 1 24 24.0 0.0 next = request.GET.get('next', None)
38 1 25 25.0 0.0 explore = request.GET.get('explore', None)
39
40 1 5 5.0 0.0 def set_bool_flag(flag_name, rec_dict):
41 rec_dict[flag_name] = True
42 return rec_dict
43
44 # retrieve resume recommendation(s) and set resume boolean flag
45 2 255808 127904.0 0.1 resume_recommendations = [set_bool_flag("resume", rec) for rec in get_resume_recommendations(user, request)] if resume else []
46
47 # retrieve next_steps recommendations, set next_steps boolean flag, and flatten results for api response
48 5 421123206 84224641.2 99.9 next_recommendations = [set_bool_flag("next", rec) for rec in get_next_recommendations(user, request)] if next else []
49
50 # retrieve explore recommendations, set explore boolean flag, and flatten results for api response
51 5 111215 22243.0 0.0 explore_recommendations = [set_bool_flag("explore", rec) for rec in get_explore_recommendations(user, request)] if explore else []
52
53 1 3525 3525.0 0.0 return JsonResponse(resume_recommendations + next_recommendations + explore_recommendations) |
Yeah, this needs some optimizations... while splitting it into three endpoints could work, I think this shows that the way we are doing the 'next' recommendations currently is not helpful. Most of this is because of the way that it builds a mapping of 'cousin' exercises, which then gets stored in a memory cache. I think we could do this more efficiently using path "contains" selectors to query exercises that are in the same broader topic. Could you open an issue for this for develop? |
I got a similar test failure to your behave test failure here on my PR on develop also. |
I profiled Function: get_next_recommendations at line 53
Line # Hits Time Per Hit % Time Line Contents
==============================================================
53 @profile
54 def get_next_recommendations(user, request):
55 """Get the recommendations for the Next section, and return them as a list.
56
57 Logic:
58 Next recommendations are currently comprised of 3 main subgroups: group recommendations,
59 struggling exercises, and topic tree based data. Group recommendations consist of
60 finding the most common item tackled immediately after the most recent item, struggling
61 is determined by the "struggling" model attribute, and topic tree data is based off
62 the graphical distance between neighboring exercise/topic nodes.
63 Args:
64 user -- The current user as a facility user model object.
65
66 """
67
68 1 1792433 1792433.0 0.3 exercise_parents_table = get_exercise_parents_lookup_table()
69
70 1 5511 5511.0 0.0 most_recent = get_most_recent_exercises(user)
71
72 1 114403 114403.0 0.0 complete_exercises = set(get_completed_exercises(user))
73
74 1 10 10.0 0.0 def filter_complete(ex):
75 return ex not in complete_exercises
76
77 1 103816 103816.0 0.0 if len(most_recent) > 0 and most_recent[0] in exercise_parents_table:
78 1 20 20.0 0.0 current_subtopic = exercise_parents_table[most_recent[0]]['subtopic_id']
79 else:
80 current_subtopic = None
81
82 #logic for recommendations based off of the topic tree structure
83 1 3 3.0 0.0 if current_subtopic:
84 1 1713957 1713957.0 0.3 topic_tree_based_data = generate_recommendation_data()[current_subtopic]['related_subtopics'][:settings.TOPIC_RECOMMENDATION_DEPTH]
85 1 64643 64643.0 0.0 topic_tree_based_data = get_exercises_from_topics(topic_tree_based_data)
86 else:
87 topic_tree_based_data = []
88
89 #for checking that only exercises that have not been accessed are returned
90 16 929 58.1 0.0 topic_tree_based_data = [ex for ex in topic_tree_based_data if ex not in most_recent or filter_complete(ex)]
91
92 #logic to generate recommendations based on exercises student is struggling with
93 1 103541 103541.0 0.0 struggling = filter(filter_complete, get_exercise_prereqs(get_struggling_exercises(user)))
94
95 #logic to get recommendations based on group patterns, if applicable
96 1 514234562 514234562.0 99.2 group = filter(filter_complete, get_group_recommendations(user))
97
98 #now append titles and other metadata to each exercise id
99 1 9 9.0 0.0 final = [] # final data to return
100 6 20 3.3 0.0 for exercise_id in (group[:2] + struggling[:2] + topic_tree_based_data[:1]): #notice the concatenation
101
102 5 19 3.8 0.0 if exercise_id in exercise_parents_table:
103 4 15 3.8 0.0 subtopic_id = exercise_parents_table[exercise_id]['subtopic_id']
104 4 98 24.5 0.0 exercise = get_exercise_data(request, exercise_id)
105 4 46 11.5 0.0 exercise["topic"] = get_topic_data(request, subtopic_id)
106 4 11 2.8 0.0 final.append(exercise)
107
108
109 #final recommendations are a combination of struggling, group filtering, and topic_tree filtering
110 1 2 2.0 0.0 return final |
I'll take a look at it. |
Oh, I am slightly surprised by the location of the hotspot, but this may actually be an easier fix. Would you like me to look at fixing this on 0.15? Looking at the function, I think I can see what is going on, and where it could be optimized. |
@MCGallaspy if this is indeed the culprit:
then I think it could be collapsed into a single query without having to loop?
is the next line, which could be modified to include the exercise_id filter as well:
Or, it's possible that just querying the database like this takes forever. |
I was on a roll, so I did one last profile run: Timer unit: 4.27654e-07 s
Total time: 171.358 s
File: kalite\topic_tools\content_recommendation.py
Function: get_group_recommendations at line 111
Line # Hits Time Per Hit % Time Line Contents
==============================================================
111 @profile
112 def get_group_recommendations(user):
113 """Returns a list of exercises immediately tackled by other individuals in the same group."""
114
115 1 5287 5287.0 0.0 recent_exercises = get_most_recent_exercises(user)
116
117 1 2469 2469.0 0.0 user_list = FacilityUser.objects.filter(group=user.group)
118
119 1 98899 98899.0 0.0 if recent_exercises:
120
121 #If the user has recently engaged with exercises, use these exercises as the basis for filtering
122 1 19 19.0 0.0 user_exercises = ExerciseLog.objects\
123 1 5583 5583.0 0.0 .filter(user__in=user_list).order_by("-latest_activity_timestamp")\
124 1 6 6.0 0.0 .extra(select={'null_complete': "completion_timestamp is null"},
125 1 2627 2627.0 0.0 order_by=["-null_complete", "-completion_timestamp"])
126
127 1 9 9.0 0.0 exercise_counts = collections.defaultdict(lambda :0)
128
129 22 7033 319.7 0.0 for user in user_list:
130 21 98796 4704.6 0.0 user_logs = user_exercises.filter(user=user)
131 2681 4221168 1574.5 1.1 for i, log in enumerate(user_logs[1:]):
132 2660 396087243 148905.0 98.9 prev_log = user_logs[i]
133 2660 142120 53.4 0.0 if log.exercise_id in recent_exercises:
134 2427 20079 8.3 0.0 exercise_counts[prev_log.exercise_id] += 1
135
136 333 1168 3.5 0.0 exercise_counts = [{"exercise_id": key, "count": value} for key, value in exercise_counts.iteritems()]
137
138 else:
139 #If not, only look at the group data
140 exercise_counts = ExerciseLog.objects.filter(user__in=user_list).values("exercise_id").annotate(count=Count("exercise_id"))
141
142 #sort the results in order of highest counts to smallest
143 1 537 537.0 0.0 sorted_counts = sorted(exercise_counts, key=lambda k:k['count'], reverse=False)
144
145 #the final list of recommendations to return, WITHOUT counts
146 333 1002 3.0 0.0 group_rec = [c['exercise_id'] for c in sorted_counts]
147
148 1 3 3.0 0.0 return group_rec
Total time: 0.00688266 s
File: kalite\topic_tools\content_recommendation.py
Function: get_most_recent_exercises at line 292
Line # Hits Time Per Hit % Time Line Contents
==============================================================
292 @profile
293 def get_most_recent_exercises(user):
294 """Return a list of the most recent exercises (ids) accessed by the user."""
295
296 3 16085 5361.7 99.9 exercises_by_user = ExerciseLog.objects.filter(user=user).order_by("-latest_activity_timestamp").values_list("exercise_id", flat=True)
297
298 3 9 3.0 0.1 return exercises_by_user |
Nice, looks like the looping through each log is what is doing it, so hopefully the suggestion I made above should help? |
Yeah, I'll see if I can fiddle with the QuerySet. |
This is faster. Though I'll check also if evaluating the big queryset once is faster than evaluating several smaller querysets (as it is now). |
Tried a few variations, this is still the fastest. |
Here's the result for posterity, though perhaps as you say this could be reduced to one QuerySet with annotation: Timer unit: 4.27654e-07 s
Total time: 4.00492 s
File: kalite\topic_tools\content_recommendation.py
Function: get_group_recommendations at line 112
Line # Hits Time Per Hit % Time Line Contents
==============================================================
112 @profile
113 def get_group_recommendations(user):
114 """Returns a list of exercises immediately tackled by other individuals in the same group."""
115
116 1 23242 23242.0 0.2 recent_exercises = get_most_recent_exercises(user)
117
118 1 26296 26296.0 0.3 user_list = FacilityUser.objects.filter(group=user.group)
119
120 1 369336 369336.0 3.9 if recent_exercises:
121
122 #If the user has recently engaged with exercises, use these exercises as the basis for filtering
123 1 64 64.0 0.0 user_exercises = ExerciseLog.objects\
124 1 20651 20651.0 0.2 .filter(user__in=user_list).order_by("-latest_activity_timestamp")\
125 1 21 21.0 0.0 .extra(select={'null_complete': "completion_timestamp is null"},
126 1 9851 9851.0 0.1 order_by=["-null_complete", "-completion_timestamp"])
127
128 1 38 38.0 0.0 exercise_counts = collections.defaultdict(lambda: 0)
129
130 22 26712 1214.2 0.3 for user in user_list:
131 21 8680521 413358.1 92.7 logs = list(user_exercises.filter(user=user))
132 2681 28819 10.7 0.3 for log, prev_log in itertools.izip(logs[1:], logs):
133 2660 140679 52.9 1.5 if log.exercise_id in recent_exercises:
134 2427 28127 11.6 0.3 exercise_counts[prev_log.exercise_id] += 1
135
136 333 4691 14.1 0.1 exercise_counts = [{"exercise_id": key, "count": value} for key, value in exercise_counts.iteritems()]
137
138 else:
139 #If not, only look at the group data
140 exercise_counts = ExerciseLog.objects.filter(user__in=user_list).values("exercise_id").annotate(count=Count("exercise_id"))
141
142 #sort the results in order of highest counts to smallest
143 1 1752 1752.0 0.0 sorted_counts = sorted(exercise_counts, key=lambda k:k['count'], reverse=False)
144
145 #the final list of recommendations to return, WITHOUT counts
146 333 4055 12.2 0.0 group_rec = [c['exercise_id'] for c in sorted_counts]
147
148 1 11 11.0 0.0 return group_rec |
With the test failures, looks like the given elements have zero area -- my guess is that throws off the click action. |
This might be an exciting opportunity to write a test case for selenium! |
The tests select and try to click a zero-area element -- it doesn't do what's intended, notably load another page. Instead, the intent is to click one of that element's children with a non-zero area.
This works for me. |
Plan for merging into develop? |
Merged into develop already :) |
Fixed #4603. |
commit message:
@rtibbles inspite of the fact that no errors are now raised, the content_recommender api endpoint takes a really long time to return (one run took me 3.7 minutes). I think we can do two things (perhaps for 0.16), let me know if you agree: