Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figures 0.4 learner progress overview performance boost #301

Merged
merged 27 commits into from
Dec 11, 2020

Conversation

johnbaldwin
Copy link
Contributor

@johnbaldwin johnbaldwin commented Dec 9, 2020

  • Sorry this is big, I'm in a rush and need to make sure the parts work, but I've layered the PR into individual commits focused on functional areas
  • The PR commit messages should be pretty detailed

The core idea is that we have a new model, figures.models.EnrollmentData that can be retrieved with a prefetch_related call so there is only one additional query than querying the users (which is the first query). I still have query tweaking to do, but want to see
A. How this performs on staging
B. Check the data are correct (by comparing to the current endpoint which I moved to /figures/api/learner-metrics-v1/

I'm testing this PR's branch on staging (well, about to).

This is mostly complete. What's left

  1. Backfilling EnrollmentData for existing CourseEnrollment and LearnerCourseGradeMetrics (LCGM) records

I'm probably going to do a Django management command to do a data migration. This way it gets triggered automatically on installation. Right now manual work.

  1. The EnrollmentData records need to get updated when there is a new CourseEnrollment or a new LCGM record. I'm probably going to implement this with Django Signals to trigger the EnrollmentData record update

https://appsembler.atlassian.net/browse/RED-1689

This is some groundwork to improving the "learner-metrics" API endpoint.

Added new function, 'course_grade_from_course_id' to figures.compat to
simplify and abstract retrieving the course grade structure from
CoureGradeFactory.

Also started with moving edx-platform imports into compat from other
modules to centralize the dependency on the platform to a single module.
The goal is for figures.compat to serve as an adapter to direct access
to platform code
Added a new package, 'figures.progress' instead of adding to
'figures.metrics' in order to prevent cyclical dependencies causing
errors in figures.models

This is a stripped down version of figures.metrics.LearnerCourseGrades
class. The purpose is to minimize queries and object construction in
order to get course structure data
0.4 LPO performance update - new EnrollmentProgress class

Added a new package, 'figures.progress' instead of adding to
'figures.metrics' in order to prevent cyclical dependencies causing
errors in figures.models

This is a stripped down version of figures.metrics.LearnerCourseGrades
class. The purpose is to minimize queries and object construction in
order to get course structure data
0.4 LPO performance update - new EnrollmentProgress class

Added a new package, 'figures.progress' instead of adding to
'figures.metrics' in order to prevent cyclical dependencies causing
errors in figures.models

This is a stripped down version of figures.metrics.LearnerCourseGrades
class. The purpose is to minimize queries and object construction in
order to get course structure data
0.4 LPO performance update - new EnrollmentProgress class

Added a new package, 'figures.progress' instead of adding to
'figures.metrics' in order to prevent cyclical dependencies causing
errors in figures.models

This is a stripped down version of figures.metrics.LearnerCourseGrades
class. The purpose is to minimize queries and object construction in
order to get course structure data
* renamed a function from `_add_<x>` to `_new_<x>` to make it clearer
that we are creating new data
* Make a couple of the lines shorter for better viewing in dual column
mode w/out word wrap
* This commit adds a new model to Figures, "EnrollmentData"
* This commit is a foundational piece to improve the learner progress
overview page performance

The purpose of this is to allow learner metrics data to be retrieved
with a minimum of queries. This model collects a snapshot of a learner's
progress for a given course (an "enrollment"). It forms the foundation
to speed up the 'learner-metrics' API endpoint by reducing the queries
needed to collect data. Prior to this work, 'learner-metrics' needed to
query the platform's CourseEnrollment model and find the most recent
instance of Figures LearnerCourseGradeMetrics model (if it exists: The
learner needs to make course progress for it to exist) for the course.
All this is quite expensive.

Included in this commit is the model, migration, admin update,
EnrollmentDataFactory for tests and the most barebones of test cases
specifically for the model.

Testing needs to be expanded. Read the module docstring in
'test_enrollment_data_model' for specific tests we should construct
Added new viewset and new serializers for version 2 of the
'learner-metrics' endpoint

We keep the old viewset and serializers (for now) to test data
correctness with a remote client (which I will write) and to benchmark
relative performance differences.

The old viewset is moved to endpoint `/figures/api/learner-metrics-v1`

The new viewset uses the existing endpoint. This is so we can test the
front end against the new endpoint

Renamed the test module for v1 of the viewset

Added a new test module for the v2 of the viewset, but we need to update
the test data to add mock EnrollmentData objects as the tests now fail,
so the test module has 'pytest.skip' decoration.
@johnbaldwin johnbaldwin force-pushed the john/0.4-lpo-performance-boost branch from 115ed66 to 96ada6d Compare December 9, 2020 17:44
Added convenience method to figures.backfill to create EnrollmentData
records for the specified site a new deployment of this feature. This
means this method needs to be called for every site to fully update the
Open edX installation that uses this

Added devsite.seed method to backfill all sites in devsite
l
Changed ProgressOverview.js to use the key 'enrollmentdata_set' instead
of 'enrollments' because of some fussiness with Django REST Framework
nested serializers and Django Queries 'prefetch_related'

So to simplify the implementation under a tight deadline, we just change
the key the front end uses for now

See the code changes to figures/serializers.py for a code view
explanation
Adds exception handling to figures.compat.course_grade_from_course_id.

See the method docstring for details
In the figures.backfill.backfill_enrollment_data_for_site function:

If an edx-platform `course` structure cannot be found from the course
id, the error will be trapped and added to the 'errors' key in the
results. The purpose of this is to not fail the whole site processing if
there are defects for a single course.

This can happen if there is an "unlinked CourseEnrollment" where the
course content was deleted or the course key changed. This should not
happen in production, but does happen on staging, so we handle it
Added Celery task and Django management command to update EnrollmentData

* The Celery task adds updating EnrollmentData with the daily metrics
pipeline task
* The Django management command triggers the celery task
Changed update_figures_enrollment_data Django management command to use
the figures.task.update_enrollment_data so it can be run in celery
Issue with this particular use of `CourseEnrollmentFactory(course=X,...)
in Ginkgo MOX. Did some basic investigation. Interesting thing is that
this construct passes in other places, so it my be some Pytest fixture
issue. For now just skipping, we'll need to test though before deploying
on Ginkgo
@johnbaldwin johnbaldwin marked this pull request as ready for review December 10, 2020 11:57
Copy link
Contributor

@OmarIthawi OmarIthawi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @johnbaldwin. It's pretty big for me to review it. I think we can merge once tests passes and depend on staging tests this time instead of an extensive code review.

else: # Assuming 1.11+
dependencies = [
migrations.swappable_dependency(settings.AUTH_USER_MODEL),
('sites', '0002_alter_domain_unique'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work, no need to specify the latest migration from sites, so if you'd like it's possible to remove this if statement

Suggested change
('sites', '0002_alter_domain_unique'),
('sites', '0001_initial'),

Comment on lines +46 to +49
# Would be great to be able to filter out dead sites
# Would be really great to be able to filter out dead sites
# Would be really Really great to be able to filter out dead sites
# Would be really Really REALLY great to be able to filter out dead sites
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnbaldwin johnbaldwin force-pushed the john/0.4-lpo-performance-boost branch 2 times, most recently from 3b6230f to ac7fb39 Compare December 10, 2020 13:11
Copy link
Contributor

@OmarIthawi OmarIthawi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @johnbaldwin. My other comment had some random characters, I don't know why.

It's pretty big for me to review it, which is very understandable given the performance issues we're having. I think we can merge once tests passes and depend on staging tests this time instead of an extensive code review.

Looks like all we needed to do was add `.distinct()` to the queryset
results
@johnbaldwin johnbaldwin merged commit 159668a into master Dec 11, 2020
@johnbaldwin johnbaldwin deleted the john/0.4-lpo-performance-boost branch December 11, 2020 10:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants