Figures 0.4 learner progress overview performance boost #301

johnbaldwin · 2020-12-09T17:31:59Z

Sorry this is big, I'm in a rush and need to make sure the parts work, but I've layered the PR into individual commits focused on functional areas
The PR commit messages should be pretty detailed

The core idea is that we have a new model, figures.models.EnrollmentData that can be retrieved with a prefetch_related call so there is only one additional query than querying the users (which is the first query). I still have query tweaking to do, but want to see
A. How this performs on staging
B. Check the data are correct (by comparing to the current endpoint which I moved to /figures/api/learner-metrics-v1/

I'm testing this PR's branch on staging (well, about to).

This is mostly complete. What's left

Backfilling EnrollmentData for existing CourseEnrollment and LearnerCourseGradeMetrics (LCGM) records

I'm probably going to do a Django management command to do a data migration. This way it gets triggered automatically on installation. Right now manual work.

The EnrollmentData records need to get updated when there is a new CourseEnrollment or a new LCGM record. I'm probably going to implement this with Django Signals to trigger the EnrollmentData record update

https://appsembler.atlassian.net/browse/RED-1689

This is some groundwork to improving the "learner-metrics" API endpoint. Added new function, 'course_grade_from_course_id' to figures.compat to simplify and abstract retrieving the course grade structure from CoureGradeFactory. Also started with moving edx-platform imports into compat from other modules to centralize the dependency on the platform to a single module. The goal is for figures.compat to serve as an adapter to direct access to platform code

Added a new package, 'figures.progress' instead of adding to 'figures.metrics' in order to prevent cyclical dependencies causing errors in figures.models This is a stripped down version of figures.metrics.LearnerCourseGrades class. The purpose is to minimize queries and object construction in order to get course structure data 0.4 LPO performance update - new EnrollmentProgress class Added a new package, 'figures.progress' instead of adding to 'figures.metrics' in order to prevent cyclical dependencies causing errors in figures.models This is a stripped down version of figures.metrics.LearnerCourseGrades class. The purpose is to minimize queries and object construction in order to get course structure data 0.4 LPO performance update - new EnrollmentProgress class Added a new package, 'figures.progress' instead of adding to 'figures.metrics' in order to prevent cyclical dependencies causing errors in figures.models This is a stripped down version of figures.metrics.LearnerCourseGrades class. The purpose is to minimize queries and object construction in order to get course structure data 0.4 LPO performance update - new EnrollmentProgress class Added a new package, 'figures.progress' instead of adding to 'figures.metrics' in order to prevent cyclical dependencies causing errors in figures.models This is a stripped down version of figures.metrics.LearnerCourseGrades class. The purpose is to minimize queries and object construction in order to get course structure data

* renamed a function from `_add_<x>` to `_new_<x>` to make it clearer that we are creating new data * Make a couple of the lines shorter for better viewing in dual column mode w/out word wrap

* This commit adds a new model to Figures, "EnrollmentData" * This commit is a foundational piece to improve the learner progress overview page performance The purpose of this is to allow learner metrics data to be retrieved with a minimum of queries. This model collects a snapshot of a learner's progress for a given course (an "enrollment"). It forms the foundation to speed up the 'learner-metrics' API endpoint by reducing the queries needed to collect data. Prior to this work, 'learner-metrics' needed to query the platform's CourseEnrollment model and find the most recent instance of Figures LearnerCourseGradeMetrics model (if it exists: The learner needs to make course progress for it to exist) for the course. All this is quite expensive. Included in this commit is the model, migration, admin update, EnrollmentDataFactory for tests and the most barebones of test cases specifically for the model. Testing needs to be expanded. Read the module docstring in 'test_enrollment_data_model' for specific tests we should construct

Added new viewset and new serializers for version 2 of the 'learner-metrics' endpoint We keep the old viewset and serializers (for now) to test data correctness with a remote client (which I will write) and to benchmark relative performance differences. The old viewset is moved to endpoint `/figures/api/learner-metrics-v1` The new viewset uses the existing endpoint. This is so we can test the front end against the new endpoint Renamed the test module for v1 of the viewset Added a new test module for the v2 of the viewset, but we need to update the test data to add mock EnrollmentData objects as the tests now fail, so the test module has 'pytest.skip' decoration.

Added convenience method to figures.backfill to create EnrollmentData records for the specified site a new deployment of this feature. This means this method needs to be called for every site to fully update the Open edX installation that uses this Added devsite.seed method to backfill all sites in devsite l

Changed ProgressOverview.js to use the key 'enrollmentdata_set' instead of 'enrollments' because of some fussiness with Django REST Framework nested serializers and Django Queries 'prefetch_related' So to simplify the implementation under a tight deadline, we just change the key the front end uses for now See the code changes to figures/serializers.py for a code view explanation

Adds exception handling to figures.compat.course_grade_from_course_id. See the method docstring for details

In the figures.backfill.backfill_enrollment_data_for_site function: If an edx-platform `course` structure cannot be found from the course id, the error will be trapped and added to the 'errors' key in the results. The purpose of this is to not fail the whole site processing if there are defects for a single course. This can happen if there is an "unlinked CourseEnrollment" where the course content was deleted or the course key changed. This should not happen in production, but does happen on staging, so we handle it

Added Celery task and Django management command to update EnrollmentData * The Celery task adds updating EnrollmentData with the daily metrics pipeline task * The Django management command triggers the celery task

Changed update_figures_enrollment_data Django management command to use the figures.task.update_enrollment_data so it can be run in celery

Issue with this particular use of `CourseEnrollmentFactory(course=X,...) in Ginkgo MOX. Did some basic investigation. Interesting thing is that this construct passes in other places, so it my be some Pytest fixture issue. For now just skipping, we'll need to test though before deploying on Ginkgo

OmarIthawi

Thanks @johnbaldwin. It's pretty big for me to review it. I think we can merge once tests passes and depend on staging tests this time instead of an extensive code review.

OmarIthawi · 2020-12-10T12:18:35Z

figures/migrations/0015_add_enrollment_data_model.py

+    else:  # Assuming 1.11+
+        dependencies = [
+            migrations.swappable_dependency(settings.AUTH_USER_MODEL),
+            ('sites', '0002_alter_domain_unique'),


This should work, no need to specify the latest migration from sites, so if you'd like it's possible to remove this if statement

Suggested change

('sites', '0002_alter_domain_unique'),

('sites', '0001_initial'),

OmarIthawi · 2020-12-10T12:30:10Z

figures/management/commands/update_figures_enrollment_data.py

+            # Would be great to be able to filter out dead sites
+            # Would be really great to be able to filter out dead sites
+            # Would be really Really great to be able to filter out dead sites
+            # Would be really Really REALLY great to be able to filter out dead sites


I get you John. One idea is to soft delete: https://appsembler.atlassian.net/wiki/spaces/RT/blog/2020/12/10/1160708308/Soft+deleting+dead+sites

cc: @AhmedAljazzar

Changed to use 'site_id' instead of 'site' in the celery task

OmarIthawi

Thanks again @johnbaldwin. My other comment had some random characters, I don't know why.

It's pretty big for me to review it, which is very understandable given the performance issues we're having. I think we can merge once tests passes and depend on staging tests this time instead of an extensive code review.

Looks like all we needed to do was add `.distinct()` to the queryset results

johnbaldwin added 4 commits December 9, 2020 17:57

Light refactoring of pipeline 'enrollment_metrics' module

7d27ab9

* renamed a function from `_add_<x>` to `_new_<x>` to make it clearer that we are creating new data * Make a couple of the lines shorter for better viewing in dual column mode w/out word wrap

johnbaldwin requested review from OmarIthawi and melvinsoft December 9, 2020 17:32

johnbaldwin force-pushed the john/0.4-lpo-performance-boost branch from 115ed66 to 96ada6d Compare December 9, 2020 17:44

johnbaldwin added 13 commits December 9, 2020 18:45

0.4 LPO performance update - figures.models - comment string

2d089ad

0.4 LPO performance update - devsite flake8 fix

ff5e375

0.4 LPO performance update - Add exception handling to compat

1edae0b

Adds exception handling to figures.compat.course_grade_from_course_id. See the method docstring for details

0.4 LPO performance update - EnrollmentData update task and command

37c3990

Added Celery task and Django management command to update EnrollmentData * The Celery task adds updating EnrollmentData with the daily metrics pipeline task * The Django management command triggers the celery task

0.4 - performance update - Fix 0015 migration Ginkgo compat issue

a018f98

0.4 LPO performance update - figures.compat suppress pylint error

9f29ea6

0.4 LPO performance improvement figures.serializers fix pylint error

fb90299

0.4 LPO performance update - Fixed enrollment data management command

5ec48d5

Changed update_figures_enrollment_data Django management command to use the figures.task.update_enrollment_data so it can be run in celery

Merge branch 'master' into john/0.4-lpo-performance-boost

8bf3bf9

johnbaldwin marked this pull request as ready for review December 10, 2020 11:57

johnbaldwin mentioned this pull request Dec 10, 2020

Bump release version to 0.4.dev3 #303

Merged

johnbaldwin added 2 commits December 10, 2020 13:14

Merge branch 'master' into john/0.4-lpo-performance-boost

071e13a

Merge branch 'master' into john/0.4-lpo-performance-boost

ac7fb39

OmarIthawi approved these changes Dec 10, 2020

View reviewed changes

johnbaldwin force-pushed the john/0.4-lpo-performance-boost branch 2 times, most recently from 3b6230f to ac7fb39 Compare December 10, 2020 13:11

johnbaldwin added 2 commits December 10, 2020 14:36

0.4 LPO performance update - fix command update_figures_enrollment_data

5ed449f

Changed to use 'site_id' instead of 'site' in the celery task

Added missing 'sqlite' schema to devsite.settings

4f63c45

johnbaldwin added 3 commits December 10, 2020 17:39

0.4 LP performance update - pylint fix on figures.compat

f7bfa4a

0.4 performance update - Fixed missing 'id' keyword arg figures.tasks

cc7ab5e

.gitignore update

a73fcbd

OmarIthawi approved these changes Dec 11, 2020

View reviewed changes

johnbaldwin added 2 commits December 11, 2020 10:09

0.4 LPO progress update - fixed task typo

82a9f42

0.4 LPO performance update - Fixed dupe learners

65174d3

Looks like all we needed to do was add `.distinct()` to the queryset results

johnbaldwin merged commit 159668a into master Dec 11, 2020

johnbaldwin deleted the john/0.4-lpo-performance-boost branch December 11, 2020 10:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Figures 0.4 learner progress overview performance boost #301

Figures 0.4 learner progress overview performance boost #301

johnbaldwin commented Dec 9, 2020 •

edited

Loading

OmarIthawi left a comment •

edited

Loading

OmarIthawi Dec 10, 2020

OmarIthawi Dec 10, 2020

OmarIthawi left a comment

	('sites', '0002_alter_domain_unique'),
	('sites', '0001_initial'),

Figures 0.4 learner progress overview performance boost #301

Figures 0.4 learner progress overview performance boost #301

Conversation

johnbaldwin commented Dec 9, 2020 • edited Loading

OmarIthawi left a comment • edited Loading

Choose a reason for hiding this comment

OmarIthawi Dec 10, 2020

Choose a reason for hiding this comment

OmarIthawi Dec 10, 2020

Choose a reason for hiding this comment

OmarIthawi left a comment

Choose a reason for hiding this comment

johnbaldwin commented Dec 9, 2020 •

edited

Loading

OmarIthawi left a comment •

edited

Loading