-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pipeline improvement prerequisites #427
Conversation
This commit adds ENV_TOKENS['FIGURES']['DAILY_TASK'] override to define which Celery task is called by CeleryBeat to run the daily data update
figures.sites.default_site() is returning the Site object matching `settings.SITE_ID` if we are running in single site mode and that is exactly what is called in single site mode for `get_site_for_course(course_id)`, so just making it that the source of truth is the same code
This is a convenience function to get the previous calendar day from th ecurrent UTC time. This is the date for which the pipeline uses to collect, save data and aggregate metrics for the daily jobs
Fixes: * Apparently devsite/settings.py broke with the Celery routing addition. This didn't impact `test_settings` but would fail to run devsite `manage.py` which relies on `devsite/settings.py`. This commit fixes that * Added `update_celery_routes` to `devsite/settings.py` as part of the Celery fix to make it to prepare for a better devsite Celery story to make Figures Celery task development a better experience Improvements: * Updated devsite's default Open edX version to Juniper
This commit contains changes to prepare for adding new daily metrics workflow to speed up daily job execution * Added new EnrollmentData model manager method to handle and encapsulate EnrollmentData/LCGM update logic given a `CourseEnrollment` record as parameter. Added unit tests for this new method. * Adds `collect_elapsed` field to EnrollmentData and LCGM to track how long it takes to collect the progress data for that enrollment. The big cost here is currently to `CourseGradeFactory()` * Added convenience method to StudentModuleFactory to create one form a `CourseEnrollment` record instead of having to pass in two variables (user and course_id) to construct a test StudentModule record to match an already constructed CourseEnrollment record
Thanks John! I'll review it today. |
We primarily use this for the daily metrics collection. However, it proves | ||
handy as a convenience function for exploring data in the Django shell | ||
""" | ||
return prev_day(datetime.datetime.utcnow().date()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My standard advice anytime I see datetime.now()
or an equivalent, is that you should allow a now
parameter to be passed in in place of it. Ie,
def utc_yesterday(now=None):
if not now:
now = datetime.datetime.utcnow()
return prev_day(now.date())
This makes it considerably easier to test, particularly for the weird edge cases that come up around dates (daylight savings, leap years, etc.) and eliminate the flakey tests that can come up when tests potentially behave differently depending on when they are run (what happens when this code is run exactly at midnight?).
Code that calls utc_yesterday()
also ought to allow their "now" to be overridden for the same reasons, though they would then only need to pass that along to utc_yesterday()
. That also tends to make it simpler to implement things like backfill operations, since you can run some code as if it were run at some different time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You raise an excellent point make a good argument. I started to draft a reply, but then the reply started to grow longer. So I moved that off for later follow up so I can focus on getting part 2 in PR and come back to this topic. I just wanted to thank you now for reviewing so soon after I posted the PR before the weekend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @johnbaldwin! Looks good to me.
I've added one question about timeit
and tests for utc_yesterday
but none are blockers.
figures/models.py
Outdated
@@ -5,6 +5,7 @@ | |||
|
|||
from __future__ import absolute_import | |||
from datetime import date | |||
import timeit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you timeit
safe for production use? I don't know. I've checked https://docs.python.org/3/library/timeit.html and it shows a GCP related note:
Note By default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured. If so, GC can be re-enabled as the first statement in the setup string. For example
I think it's only related to the following usage timeit.Timer('for i in range(10): oct(i)', 'gc.enable()').timeit()
.
I think it's safe, but want to double check if we've missed something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Omar! To be on the safe side, I'll just capture start and end times without timeit
@@ -188,6 +194,15 @@ def prev_day(val): | |||
return days_from(val, -1) | |||
|
|||
|
|||
def utc_yesterday(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding a test case for this function? One happy scenario should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
* Replaced `timeit` with `time` to capture EnrollmentData collection timing * Added basic test for `figures.helpers.utc_yesterday`
This PR contains updates to Figures as prerequisites to adding the second switchable workflow
Each PR has it's own description and context. So please read the changes in calendar order so you see the details to the story.
Why this PR exists and high level story
What's the scope of this PR
What's IN this PR?
ENV_TOKENS
flag to choose an override task to kick of the daily Celery top level task or use the default one: 926cd7cfigures.helpers
: 42ce02dIt's the last PR that is most important (I repeat myself because it is the most important. There, I've done it again). Anyway, with the new workflow, we will invert the relationship of
LearnerCourseGradeMetrics
(LCGM) andEnrollmentData
where we write first toEnrollmentData
(let's call him "ed") and use ed as the primary for data retrieval in the API (we started that already when we introduced ed as a model in figures). LCGM gets pushed back to be a historical record to track the time series per-enrollment performanceBut we're not there yet, lots of refactoring to do to get there. What no. 5 commit above does is abstract this relationship from the rest of Figures and isolate the progress retrieval to make Figures safer, faster and more fun to program on