-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Monitor cron jobs with Sentry #4421
Conversation
903fcc6
to
4dcbf22
Compare
dc61bd5
to
2d9795f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, Jon. I've made a couple of tactical comments, which I hope you'll find useful.
Strategically, I'm unsure about the rationale for the Cron
class. If you discussed Cron
when incorporating a similar change into Metrics, then I apologise. Setting that caveat aside, when I was reading about Sentry Crons, I wondered why we weren't doing something like the following in services/sentry.py
:
def monitor(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
*_, schedule, module_name = func.__module__.split(".")
monitor_config = {"schedule": {"type": "crontab", "value": f"@{schedule}"}}
sentry_sdk.crons.monitor(
monitor_slug=module_name, monitor_config=monitor_config
)(func(*args, **kwargs))
return wrapper
And then something like the following when creating a job, such as in jobserver/jobs/daily/clearsessions.py
:
class Job(DailyJob):
@sentry.monitor
def execute(self):
...
What does the Cron
class provide that sentry_sdk.crons.monitor
doesn't?
@monitor | ||
class Job(DailyJob): # pragma: no cover | ||
|
||
def execute(self): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this live in tests/unit/jobserver/jobs/test_decorators.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to mimic the path structure of the real jobs in order to test the derivation of the schedule from the module path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a bad code smell! How about something like...
from jobserver.jobs.daily.clearsessions import Job
Job.__base__.__name__.removesuffix("Job").lower() # "daily"
I can't find __base__
documented, which is slightly unnerving. However, __bases__
is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a bad code smell!
I get you, but AFAICT the path structure is what drives the django job engine (or maybe the Job
subtype, in which case we could parse that instead) so I felt OK mimicking that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the following decorator
def monitor(func):
# wrapper for a Django Job's execute method to add Sentry Cron monitoring
@functools.wraps(func)
def wrapper(*args, **kwargs):
*_, schedule , module_name = func.__module__.split(".")
monitor_config = {
"schedule": {"type": "crontab", "value": f"@{schedule}"},
"timezone": "Etc/UTC",
# If an expected check-in doesn't come in `checkin_margin`
# minutes, it'll be considered missed
"checkin_margin": 30,
# The check-in is allowed to run for `max_runtime` minutes
# before it's considered failed
"max_runtime": 10,
# It'll take `failure_issue_threshold` consecutive failed
# check-ins to create an issue
"failure_issue_threshold": 1,
# It'll take `recovery_threshold` OK check-ins to resolve
# an issue
"recovery_threshold": 1,
}
sentry_sdk.crons.monitor(
monitor_slug=module_name, monitor_config=monitor_config
)(func(*args, **kwargs))
return wrapper
and the following Job
class
from django_extensions.management.jobs import DailyJob
from services.sentry import monitor
class Job(DailyJob):
@monitor
def execute(self):
...
AFAICT it's not possible to get the Job
class (and therefore it's __base__
) from func
within the wrapper
method.
func
has __qualname__
which in the case of declaring the class in test_sentry.test_monitor()
is test_mnonbitor.<locals>.Job.execute
which is a bit of the puzzle but I can't seem to get the actual Job
type out of any of the dunder variables.
despite being smelly, pulling the schedule out of the module path works
In this context, actually nothing. With the way that tasks are executed in Metrics it was easier to use manual check-ins, but that's not true here. Using the sentry-provided |
The one issue that strikes me with putting the decorator in |
b340ca0
to
90bfbb1
Compare
@iaindillingham I've rewritten this based on your comments. The smell of deriving the schedule from the module path persists, but I can't find a better way to do this and have explained/justified this in the commit message. I'd appreciate you taking another look :) |
90bfbb1
to
615bccc
Compare
Thinking some more about it, the assumptions stated in the commit message for d425665 could be asserted in the decorator, which might be a bit more defensive - do you think this is a good idea? Thinking about this even more, that would further complicate the test because as it stands the test |
Thanks, @Jongmassey. Before going any further, could you... python manage.py runjob jobserver clearsessions The The result tells us that decorators are beguilingly simple. In other words, there's a big difference between the following: @my_decorator
def my_function():
...
@my_decorator()
def my_function():
... Check the parentheses! If you're interested in learning more, then see "Defining a Decorator That Takes Arguments" in the Python Cookbook. The result also tells us that the test is insufficient. But rather than digging deeper, let's take several steps back. Ultimately, we want to call from django.core.management import call_command
from django_extensions.management.jobs import DailyJob
from sentry_sdk.crons import monitor
DEFAULT_MONITOR_CONFIG = {"monitor_config": {"type": "crontab", "value": None}}
daily_monitor_config = dict(DEFAULT_MONITOR_CONFIG)
daily_monitor_config["monitor_config"]["value"] = "@daily"
# etc. etc.
class Job(DailyJob):
help = "Do something every day"
@monitor(monitor_slug="do_something", monitor_config=daily_monitor_config)
def execute(self):
call_command("do_something") We could put the configuration in You may object to having four (four!) places where we define the schedule: |
Thanks for the spot - VS Code's autocomplete seems desparate to put the parens in when I don't want them |
615bccc
to
9134d09
Compare
Running python manage.py runjob jobserver clearsessions produces no output |
Could you expand on that? I was mistaken it is being called. |
I can confirm after a bit of digging that when calling the management command you stated, it does indeed make the correct API call to Sentry, it's just that we have hit the limits of our plan |
bade1fd
to
7c08f14
Compare
@iaindillingham f2dd640 now checks that each Job's execute method has been correctly annotated in about a robust a way as I can muster. I can't promise it'll catch every possible mis-decoration but would certainly catch the one that VS Code inflicted upon me! |
7c08f14
to
f2dd640
Compare
Thanks, @Jongmassey. I'm afraid I don't understand why we're not taking several steps back. I mentioned this in a previous comment. We seem to be adding a lot of accidental complexity to save us from repeating a small amount of information, some of which we already repeat elsewhere (the schedule) and some of which may end up being different anyway (the slug). Part of the issue relates to the design of django-extensions "Jobs Scheduling" functionality (spoiler: it doesn't actually schedule jobs). However, we shouldn't let that design add accidental complexity into either our application or our test code. Perhaps I've missed something, though. We have two approaches:
What does the first approach offer that the second approach doesn't? |
As you've already identified there are three definitions for when a given scheduled task will run: dokku cron configuration$ dokku cron:list job-server
ID Schedule Command
33ffvw0nlyj7jjyndqjbk8ft6cr0h8z3q93h6n39qbre4qxdc4w480d4evshvg2tnlx2yyv61tycs905au7t @hourly python manage.py runjobs hourly
27bsuol4zuiy5vbz4yodedbjbvd6mtrnuqxq9ejg69fdyo3ankx4oyuptft8waug3tasridzrgtfh8v3t @daily python manage.py runjobs daily
33ffvw0nlyj7jjyndqjbk8ft6cr0h8z3q93h6n39qbre4qxdc4w480d4evsirbyh1til3xpbynj6qpezy8vt @weekly python manage.py runjobs weekly
4ciq5z8vdwnmhvxwlgvol43n227fc89eo4vjubeblfaa2mbf13gpvqtzkydv78lv60f384x0beybkc47fipkcc9 @monthly python manage.py runjobs monthly task module path~/opensafely-core/job-server$ tree jobserver/jobs/
jobserver/jobs/
├── __init__.py
├── daily
│ ├── __init__.py
│ ├── clearsessions.py
│ └── update_repo_has_github_outputs.py
├── hourly
│ ├── __init__.py
├── monthly
│ ├── __init__.py
├── weekly
│ ├── __init__.py
│ │ └── notify_impending_copilot_support_windows.cpython-312.pyc
│ └── notify_impending_copilot_support_windows.py
└── yearly
├── __init__.py task superclass~/opensafely-core/job-server/jobserver/jobs$ grep class **/*[!\_].py
daily/clearsessions.py:class Job(DailyJob):
daily/update_repo_has_github_outputs.py:class Job(DailyJob):
hourly/dump_db.py:class Job(HourlyJob):
weekly/notify_impending_copilot_support_windows.py:class Job(WeeklyJob): However, if one were to declare a if when and not (job.when == when or job.when is None):
raise JobError("Job %s is not a %s job." % (jobmodule, when)) so we can assert that a well-defined Django Job management Job's superclass-inferred and path-inferred schedules to be the same. Whether dokku has been correctly configured wrt the cron jobs needed for django job management is a systems administration question. I'm loathe to add another definition of when a scheduled task should run (i.e in the sentry monitor configuration), along with a redefinition of the task's name. This feels like a potential footgun on both of these fronts, when both of these values are obtainable from their existing definitions. |
This is a lot of code to avoid moving from three places where we define the schedule to four places where we define the schedule (or from two to three, if you prefer). I understand that you don't like it; I don't like it either. But the cost of the additional code isn't worth the benefit. Take a look at #4470 for an alternative approach. |
We believe that all scheduled jobs should have the same monitor config save for the schedule on which they are expected to execute. We may have to revisit this in future based on observation of job execution times and their reported check-in times with Sentry.
f2dd640
to
9467ff4
Compare
Sentry cron check-in events (#4421) contain float-typed fields which the parse() method would try to iterate whilst searching for sensitive tokens, causing an error. The tokens listed in this method are all strings so safe to exclude floats for now.
fixes #4418