Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitor cron jobs with Sentry #4421

Merged
merged 2 commits into from
Jul 25, 2024

Conversation

Jongmassey
Copy link
Contributor

@Jongmassey Jongmassey commented Jul 4, 2024

fixes #4418

@Jongmassey Jongmassey force-pushed the Jongmassey/montior-cron-jobs-with-sentry branch 6 times, most recently from 903fcc6 to 4dcbf22 Compare July 4, 2024 16:42
@Jongmassey Jongmassey changed the title Jongmassey/montior cron jobs with sentry Monitor cron jobs with Sentry Jul 5, 2024
@Jongmassey Jongmassey force-pushed the Jongmassey/montior-cron-jobs-with-sentry branch 3 times, most recently from dc61bd5 to 2d9795f Compare July 5, 2024 10:05
Copy link
Member

@iaindillingham iaindillingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, Jon. I've made a couple of tactical comments, which I hope you'll find useful.

Strategically, I'm unsure about the rationale for the Cron class. If you discussed Cron when incorporating a similar change into Metrics, then I apologise. Setting that caveat aside, when I was reading about Sentry Crons, I wondered why we weren't doing something like the following in services/sentry.py:

def monitor(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        *_, schedule, module_name = func.__module__.split(".")
        monitor_config = {"schedule": {"type": "crontab", "value": f"@{schedule}"}}
        sentry_sdk.crons.monitor(
            monitor_slug=module_name, monitor_config=monitor_config
        )(func(*args, **kwargs))

    return wrapper

And then something like the following when creating a job, such as in jobserver/jobs/daily/clearsessions.py:

class Job(DailyJob):
    @sentry.monitor
    def execute(self):
        ...

What does the Cron class provide that sentry_sdk.crons.monitor doesn't?

jobserver/jobs/decorators.py Outdated Show resolved Hide resolved
services/sentry.py Outdated Show resolved Hide resolved
services/sentry.py Outdated Show resolved Hide resolved
Comment on lines 6 to 10
@monitor
class Job(DailyJob): # pragma: no cover

def execute(self):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this live in tests/unit/jobserver/jobs/test_decorators.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to mimic the path structure of the real jobs in order to test the derivation of the schedule from the module path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a bad code smell! How about something like...

from jobserver.jobs.daily.clearsessions import Job

Job.__base__.__name__.removesuffix("Job").lower()  # "daily"

I can't find __base__ documented, which is slightly unnerving. However, __bases__ is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a bad code smell!

I get you, but AFAICT the path structure is what drives the django job engine (or maybe the Job subtype, in which case we could parse that instead) so I felt OK mimicking that

Copy link
Contributor Author

@Jongmassey Jongmassey Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the following decorator

def monitor(func):
    # wrapper for a Django Job's execute method to add Sentry Cron monitoring
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        *_, schedule , module_name = func.__module__.split(".")
        monitor_config = {
            "schedule": {"type": "crontab", "value": f"@{schedule}"},
            "timezone": "Etc/UTC",
            # If an expected check-in doesn't come in `checkin_margin`
            # minutes, it'll be considered missed
            "checkin_margin": 30,
            # The check-in is allowed to run for `max_runtime` minutes
            # before it's considered failed
            "max_runtime": 10,
            # It'll take `failure_issue_threshold` consecutive failed
            # check-ins to create an issue
            "failure_issue_threshold": 1,
            # It'll take `recovery_threshold` OK check-ins to resolve
            # an issue
            "recovery_threshold": 1,
        }
        sentry_sdk.crons.monitor(
            monitor_slug=module_name, monitor_config=monitor_config
        )(func(*args, **kwargs))

    return wrapper

and the following Job class

from django_extensions.management.jobs import DailyJob
from services.sentry import monitor

class Job(DailyJob):
    @monitor
    def execute(self):
        ...

AFAICT it's not possible to get the Job class (and therefore it's __base__) from func within the wrapper method.

func has __qualname__ which in the case of declaring the class in test_sentry.test_monitor() is test_mnonbitor.<locals>.Job.execute which is a bit of the puzzle but I can't seem to get the actual Job type out of any of the dunder variables.

despite being smelly, pulling the schedule out of the module path works

@Jongmassey
Copy link
Contributor Author

What does the Cron class provide that sentry_sdk.crons.monitor doesn't?

In this context, actually nothing. With the way that tasks are executed in Metrics it was easier to use manual check-ins, but that's not true here. Using the sentry-provided monitor decorator is totally doable and much simpler in this instance

@Jongmassey
Copy link
Contributor Author

The one issue that strikes me with putting the decorator in services/sentry.py is that the schedule derivation is inherently tied to the path structure of django Jobs. My thinking of putting it in its own module within the jobs path was that it made this clear.

@Jongmassey Jongmassey force-pushed the Jongmassey/montior-cron-jobs-with-sentry branch 2 times, most recently from b340ca0 to 90bfbb1 Compare July 16, 2024 22:42
@Jongmassey
Copy link
Contributor Author

@iaindillingham I've rewritten this based on your comments.

The smell of deriving the schedule from the module path persists, but I can't find a better way to do this and have explained/justified this in the commit message.

I'd appreciate you taking another look :)

@Jongmassey Jongmassey force-pushed the Jongmassey/montior-cron-jobs-with-sentry branch from 90bfbb1 to 615bccc Compare July 16, 2024 22:50
@Jongmassey
Copy link
Contributor Author

Jongmassey commented Jul 16, 2024

Thinking some more about it, the assumptions stated in the commit message for d425665 could be asserted in the decorator, which might be a bit more defensive - do you think this is a good idea?

Thinking about this even more, that would further complicate the test because as it stands the test Job class doesn't follow the expected module path pattern.

@Jongmassey Jongmassey marked this pull request as ready for review July 17, 2024 08:38
@iaindillingham
Copy link
Member

Thanks, @Jongmassey. Before going any further, could you...

python manage.py runjob jobserver clearsessions

The runjob management command runs a job. The runjobs management command does the same, but on a schedule. (runjobs is called from app.json.)

The result tells us that decorators are beguilingly simple. In other words, there's a big difference between the following:

@my_decorator
def my_function():
    ...

@my_decorator()
def my_function():
    ...

Check the parentheses! If you're interested in learning more, then see "Defining a Decorator That Takes Arguments" in the Python Cookbook.

The result also tells us that the test is insufficient. But rather than digging deeper, let's take several steps back. Ultimately, we want to call sentry_sdk.crons.monitor. We're using a decorator to avoid repeating the monitor_slug and to set the correct schedule in the monitor_config. Let's make our lives easier and our code shorter.

from django.core.management import call_command
from django_extensions.management.jobs import DailyJob
from sentry_sdk.crons import monitor

DEFAULT_MONITOR_CONFIG = {"monitor_config": {"type": "crontab", "value": None}}

daily_monitor_config = dict(DEFAULT_MONITOR_CONFIG)
daily_monitor_config["monitor_config"]["value"] = "@daily"

# etc. etc.

class Job(DailyJob):
    help = "Do something every day"

    @monitor(monitor_slug="do_something", monitor_config=daily_monitor_config)
    def execute(self):
        call_command("do_something")

We could put the configuration in services/sentry.py. That's it!

You may object to having four (four!) places where we define the schedule: app.json, the subdirectories of jobserver/jobs, the parent class of each Job (e.g. DailyJob), and the configuration in services/sentry.py. Three, however, are out of scope: we didn't design Dokku, nor did we design django-extensions. In that context, the fourth doesn't matter.

@Jongmassey
Copy link
Contributor Author

Check the parentheses!

Thanks for the spot - VS Code's autocomplete seems desparate to put the parens in when I don't want them

@Jongmassey Jongmassey force-pushed the Jongmassey/montior-cron-jobs-with-sentry branch from 615bccc to 9134d09 Compare July 17, 2024 14:39
@Jongmassey
Copy link
Contributor Author

Running

python manage.py runjob jobserver clearsessions

produces no output

@Jongmassey
Copy link
Contributor Author

Jongmassey commented Jul 17, 2024

The result also tells us that the test is insufficient.

Could you expand on that? AFAICT the issue seems to be that the sentry initialisation in settings.py is not called when a management command is called. Does that align with your understanding of Django @iaindillingham ?

I was mistaken it is being called.

@Jongmassey
Copy link
Contributor Author

I can confirm after a bit of digging that when calling the management command you stated, it does indeed make the correct API call to Sentry, it's just that we have hit the limits of our plan

@Jongmassey Jongmassey force-pushed the Jongmassey/montior-cron-jobs-with-sentry branch 2 times, most recently from bade1fd to 7c08f14 Compare July 17, 2024 18:00
@Jongmassey
Copy link
Contributor Author

Jongmassey commented Jul 17, 2024

@iaindillingham f2dd640 now checks that each Job's execute method has been correctly annotated in about a robust a way as I can muster. I can't promise it'll catch every possible mis-decoration but would certainly catch the one that VS Code inflicted upon me!

@Jongmassey Jongmassey force-pushed the Jongmassey/montior-cron-jobs-with-sentry branch from 7c08f14 to f2dd640 Compare July 17, 2024 18:04
@iaindillingham
Copy link
Member

Thanks, @Jongmassey. I'm afraid I don't understand why we're not taking several steps back. I mentioned this in a previous comment. We seem to be adding a lot of accidental complexity to save us from repeating a small amount of information, some of which we already repeat elsewhere (the schedule) and some of which may end up being different anyway (the slug).

Part of the issue relates to the design of django-extensions "Jobs Scheduling" functionality (spoiler: it doesn't actually schedule jobs). However, we shouldn't let that design add accidental complexity into either our application or our test code.

Perhaps I've missed something, though. We have two approaches:

  1. The approach set out in this PR
  2. Wrap Job.execute with sentry_sdk.crons.monitor

What does the first approach offer that the second approach doesn't?

@Jongmassey
Copy link
Contributor Author

What does the first approach offer that the second approach doesn't?

As you've already identified there are three definitions for when a given scheduled task will run:

dokku cron configuration

$ dokku cron:list job-server
ID                                                                                       Schedule  Command
33ffvw0nlyj7jjyndqjbk8ft6cr0h8z3q93h6n39qbre4qxdc4w480d4evshvg2tnlx2yyv61tycs905au7t     @hourly   python manage.py runjobs hourly
27bsuol4zuiy5vbz4yodedbjbvd6mtrnuqxq9ejg69fdyo3ankx4oyuptft8waug3tasridzrgtfh8v3t        @daily    python manage.py runjobs daily
33ffvw0nlyj7jjyndqjbk8ft6cr0h8z3q93h6n39qbre4qxdc4w480d4evsirbyh1til3xpbynj6qpezy8vt     @weekly   python manage.py runjobs weekly
4ciq5z8vdwnmhvxwlgvol43n227fc89eo4vjubeblfaa2mbf13gpvqtzkydv78lv60f384x0beybkc47fipkcc9  @monthly  python manage.py runjobs monthly

task module path

~/opensafely-core/job-server$ tree jobserver/jobs/
jobserver/jobs/
├── __init__.py
├── daily
│   ├── __init__.py
│   ├── clearsessions.py
│   └── update_repo_has_github_outputs.py
├── hourly
│   ├── __init__.py
├── monthly
│   ├── __init__.py
├── weekly
│   ├── __init__.py
│   │   └── notify_impending_copilot_support_windows.cpython-312.pyc
│   └── notify_impending_copilot_support_windows.py
└── yearly
    ├── __init__.py

task superclass

~/opensafely-core/job-server/jobserver/jobs$ grep class **/*[!\_].py
daily/clearsessions.py:class Job(DailyJob):
daily/update_repo_has_github_outputs.py:class Job(DailyJob):
hourly/dump_db.py:class Job(HourlyJob):
weekly/notify_impending_copilot_support_windows.py:class Job(WeeklyJob):

However, if one were to declare a Job which extends DailyJob and put it in the weekly directory, the django job management extension would fail to import it:

if when and not (job.when == when or job.when is None):
    raise JobError("Job %s is not a %s job." % (jobmodule, when))

so we can assert that a well-defined Django Job management Job's superclass-inferred and path-inferred schedules to be the same.

Whether dokku has been correctly configured wrt the cron jobs needed for django job management is a systems administration question.

I'm loathe to add another definition of when a scheduled task should run (i.e in the sentry monitor configuration), along with a redefinition of the task's name. This feels like a potential footgun on both of these fronts, when both of these values are obtainable from their existing definitions.

@iaindillingham
Copy link
Member

This is a lot of code to avoid moving from three places where we define the schedule to four places where we define the schedule (or from two to three, if you prefer). I understand that you don't like it; I don't like it either. But the cost of the additional code isn't worth the benefit.

Take a look at #4470 for an alternative approach.

We believe that all scheduled jobs should have the same monitor config
save for the schedule on which they are expected to execute.

We may have to revisit this in future based on observation of job
execution times and their reported check-in times with Sentry.
@Jongmassey Jongmassey force-pushed the Jongmassey/montior-cron-jobs-with-sentry branch from f2dd640 to 9467ff4 Compare July 24, 2024 15:55
@Jongmassey Jongmassey merged commit d42d7f9 into main Jul 25, 2024
8 checks passed
@Jongmassey Jongmassey deleted the Jongmassey/montior-cron-jobs-with-sentry branch July 25, 2024 11:01
Jongmassey added a commit that referenced this pull request Jul 25, 2024
Sentry cron check-in events (#4421) contain float-typed fields which
the parse() method would try to iterate whilst searching for sensitive
tokens, causing an error.

The tokens listed in this method are all strings so safe to exclude
floats for now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Monitor scheduled jobs with Sentry
2 participants