-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Figures 0.4 learner progress overview performance boost #301
Conversation
This is some groundwork to improving the "learner-metrics" API endpoint. Added new function, 'course_grade_from_course_id' to figures.compat to simplify and abstract retrieving the course grade structure from CoureGradeFactory. Also started with moving edx-platform imports into compat from other modules to centralize the dependency on the platform to a single module. The goal is for figures.compat to serve as an adapter to direct access to platform code
Added a new package, 'figures.progress' instead of adding to 'figures.metrics' in order to prevent cyclical dependencies causing errors in figures.models This is a stripped down version of figures.metrics.LearnerCourseGrades class. The purpose is to minimize queries and object construction in order to get course structure data 0.4 LPO performance update - new EnrollmentProgress class Added a new package, 'figures.progress' instead of adding to 'figures.metrics' in order to prevent cyclical dependencies causing errors in figures.models This is a stripped down version of figures.metrics.LearnerCourseGrades class. The purpose is to minimize queries and object construction in order to get course structure data 0.4 LPO performance update - new EnrollmentProgress class Added a new package, 'figures.progress' instead of adding to 'figures.metrics' in order to prevent cyclical dependencies causing errors in figures.models This is a stripped down version of figures.metrics.LearnerCourseGrades class. The purpose is to minimize queries and object construction in order to get course structure data 0.4 LPO performance update - new EnrollmentProgress class Added a new package, 'figures.progress' instead of adding to 'figures.metrics' in order to prevent cyclical dependencies causing errors in figures.models This is a stripped down version of figures.metrics.LearnerCourseGrades class. The purpose is to minimize queries and object construction in order to get course structure data
* renamed a function from `_add_<x>` to `_new_<x>` to make it clearer that we are creating new data * Make a couple of the lines shorter for better viewing in dual column mode w/out word wrap
* This commit adds a new model to Figures, "EnrollmentData" * This commit is a foundational piece to improve the learner progress overview page performance The purpose of this is to allow learner metrics data to be retrieved with a minimum of queries. This model collects a snapshot of a learner's progress for a given course (an "enrollment"). It forms the foundation to speed up the 'learner-metrics' API endpoint by reducing the queries needed to collect data. Prior to this work, 'learner-metrics' needed to query the platform's CourseEnrollment model and find the most recent instance of Figures LearnerCourseGradeMetrics model (if it exists: The learner needs to make course progress for it to exist) for the course. All this is quite expensive. Included in this commit is the model, migration, admin update, EnrollmentDataFactory for tests and the most barebones of test cases specifically for the model. Testing needs to be expanded. Read the module docstring in 'test_enrollment_data_model' for specific tests we should construct
Added new viewset and new serializers for version 2 of the 'learner-metrics' endpoint We keep the old viewset and serializers (for now) to test data correctness with a remote client (which I will write) and to benchmark relative performance differences. The old viewset is moved to endpoint `/figures/api/learner-metrics-v1` The new viewset uses the existing endpoint. This is so we can test the front end against the new endpoint Renamed the test module for v1 of the viewset Added a new test module for the v2 of the viewset, but we need to update the test data to add mock EnrollmentData objects as the tests now fail, so the test module has 'pytest.skip' decoration.
115ed66
to
96ada6d
Compare
Added convenience method to figures.backfill to create EnrollmentData records for the specified site a new deployment of this feature. This means this method needs to be called for every site to fully update the Open edX installation that uses this Added devsite.seed method to backfill all sites in devsite l
Changed ProgressOverview.js to use the key 'enrollmentdata_set' instead of 'enrollments' because of some fussiness with Django REST Framework nested serializers and Django Queries 'prefetch_related' So to simplify the implementation under a tight deadline, we just change the key the front end uses for now See the code changes to figures/serializers.py for a code view explanation
Adds exception handling to figures.compat.course_grade_from_course_id. See the method docstring for details
In the figures.backfill.backfill_enrollment_data_for_site function: If an edx-platform `course` structure cannot be found from the course id, the error will be trapped and added to the 'errors' key in the results. The purpose of this is to not fail the whole site processing if there are defects for a single course. This can happen if there is an "unlinked CourseEnrollment" where the course content was deleted or the course key changed. This should not happen in production, but does happen on staging, so we handle it
Added Celery task and Django management command to update EnrollmentData * The Celery task adds updating EnrollmentData with the daily metrics pipeline task * The Django management command triggers the celery task
Changed update_figures_enrollment_data Django management command to use the figures.task.update_enrollment_data so it can be run in celery
Issue with this particular use of `CourseEnrollmentFactory(course=X,...) in Ginkgo MOX. Did some basic investigation. Interesting thing is that this construct passes in other places, so it my be some Pytest fixture issue. For now just skipping, we'll need to test though before deploying on Ginkgo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @johnbaldwin. It's pretty big for me to review it. I think we can merge once tests passes and depend on staging tests this time instead of an extensive code review.
else: # Assuming 1.11+ | ||
dependencies = [ | ||
migrations.swappable_dependency(settings.AUTH_USER_MODEL), | ||
('sites', '0002_alter_domain_unique'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work, no need to specify the latest migration from sites
, so if you'd like it's possible to remove this if
statement
('sites', '0002_alter_domain_unique'), | |
('sites', '0001_initial'), |
# Would be great to be able to filter out dead sites | ||
# Would be really great to be able to filter out dead sites | ||
# Would be really Really great to be able to filter out dead sites | ||
# Would be really Really REALLY great to be able to filter out dead sites |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get you John. One idea is to soft delete: https://appsembler.atlassian.net/wiki/spaces/RT/blog/2020/12/10/1160708308/Soft+deleting+dead+sites
cc: @AhmedAljazzar
3b6230f
to
ac7fb39
Compare
Changed to use 'site_id' instead of 'site' in the celery task
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @johnbaldwin. My other comment had some random characters, I don't know why.
It's pretty big for me to review it, which is very understandable given the performance issues we're having. I think we can merge once tests passes and depend on staging tests this time instead of an extensive code review.
Looks like all we needed to do was add `.distinct()` to the queryset results
The core idea is that we have a new model,
figures.models.EnrollmentData
that can be retrieved with aprefetch_related
call so there is only one additional query than querying the users (which is the first query). I still have query tweaking to do, but want to seeA. How this performs on staging
B. Check the data are correct (by comparing to the current endpoint which I moved to
/figures/api/learner-metrics-v1/
I'm testing this PR's branch on staging (well, about to).
This is mostly complete. What's left
I'm probably going to do a Django management command to do a data migration. This way it gets triggered automatically on installation. Right now manual work.
https://appsembler.atlassian.net/browse/RED-1689