Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Designate a queue for running the pipeline tasks #346

Merged
merged 3 commits into from
Jun 9, 2021

Conversation

melvinsoft
Copy link
Contributor

@melvinsoft melvinsoft commented May 21, 2021

Moving Figures pipeline execution away from the default worker is becoming more and more a priority. This is a first pass to try to run it on a specific queue.

If it works, next step are going to be to create a figures pipeline dedicated queue, and ultimately, move that queue to a dedicated server.

TODO:

  • Add test to new settings function
  • Change default queue

figures/tasks.py Outdated
@@ -380,4 +380,7 @@ def run_figures_monthly_metrics():
"""
logger.info('Starting figures.tasks.run_figures_monthly_metrics...')
for site in get_sites():
populate_monthly_metrics_for_site.delay(site_id=site.id)
populate_monthly_metrics_for_site.apply_async(
kwargs={'site_id': 'edx.lms.core.high'}, # TODO: put in settings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @melvinsoft! Please keep in mind that Figures community compatibility needs to be maintained. I think this line breaks it.

Also site_id == edx.lms.core.high seems incorrect.

Copy link
Contributor Author

@melvinsoft melvinsoft May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmarIthawi That error is resolved. Yes, I know we'll need to think about community as well. This is one of the default Tahoe workers, but still, it should be controlled by settings, I just need some directions from John to know how to load Figures settings in the tasks.py file.
I'll wait for John approval and directions before merging of course.

Copy link
Contributor

@johnbaldwin johnbaldwin May 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is also our enterprise servers that we need to support too. Question @melvinsoft or @OmarIthawi , is edx.lms.core.high a queue that exists in upstream Open edX? I think it does, but wanted to check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnbaldwin Yes, people can change it when they deploy, but those are the default ones: https://github.com/edx/configuration/blob/open-release/juniper.master/playbooks/roles/edxapp/defaults/main.yml#L653

Question, how can I include Figures settings in this file? I'd like to use the same settings I'm using in the lms_settings.py file.

Copy link
Contributor

@johnbaldwin johnbaldwin May 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@melvinsoft If I understand your question right, you are asking about how to have figures.tasks read the value 'edx.lms.core.high' into the apply_async function?

If so, there isn't a "best practices" way now, but I'll give you a solution, and interested in comments from @OmarIthawi. So, first some history is important.

Way back in Figures early days, Figures had values injected into figures.settings. This was the old way: https://github.com/appsembler/figures/blob/0.2.0/figures/settings.py

But @OmarIthawi didn't like and I saw his point, so when he added Hawthorn plugin support, this value injection from django.conf.settings.ENV_TOKENS['FIGURES'] into figures.settings` went away. See here: #84

After this update, we didn't need Figures to even read from `ENV_TOKENS['FIGURES'] until Omar recently added site filtering on active sites. See these two:

  1. get_sites: configurable backend to filter active sites #321
  2. https://github.com/appsembler/figures/blob/master/figures/sites.py#L312

So there is where we stand today. Here's one option:

  1. Set a default value for FIGURES_MONTHLY_METRICS_QUEUE in figures/settings/__init__.py or figures/settings/lms_production.py (I don't have a really strong opinion here. There is no option for NOT 'lms_production' module, so everything in figures settings is 'lms_production')

  2. in figures.tasks do something similar to what Omar did in figures.sites. Here's one option

from figures.settings.lms_production import FIGURES_MONTHLY_METRICS_QUEUE

Then in the task function that calls an apply_async method with the queue option:

queue = settings.ENV_TOKENS['FIGURES'].get('FIGURES_MONTHLY_METRICS_QUEUE',
    FIGURES_MONTHLY_METRICS_QUEUE)
populate_monthly_metrics_for_site.apply_async(site_id=site.id, queue=queue)

@johnbaldwin
Copy link
Contributor

@melvinsoft
I ran tox locally. I believe @estherjsuh has it running locally too on her MBP, so it should work for you too.

I found a couple of tests failing for the Ginko env and the other envs (which were cancelled in this PR's test exedutions).

The failing tests look straightforward to fix, should simply be two lines of code to change in the tests.

For the settings test, there's an extra key in expected keys:

TestDailyMauPipelineSettings.test_daily_mau_pipeline_flag_enabled 

self = <tests.test_settings.TestDailyMauPipelineSettings object at 0x7fd0c3bafe90>

    def test_daily_mau_pipeline_flag_enabled(self):
        self.settings.ENV_TOKENS['FIGURES'] = { 'ENABLE_DAILY_MAU_IMPORT': True }
        plugin_settings(self.settings)
        assert self.TASK_NAME in self.settings.CELERYBEAT_SCHEDULE
>       assert set(['task', 'schedule']) == set(
            self.settings.CELERYBEAT_SCHEDULE[self.TASK_NAME].keys())
E       AssertionError: assert set(['schedule', 'task']) == set(['options', 'schedule', 'task'])
E         Extra items in the right set:
E         'options'
E         Use -v to get the full diff

tests/test_settings.py:121: AssertionError

The other test failing is because this PR changed the method used from .delay to .apply_async. See the the monkeypatch.setattr below:

test_run_figures_monthly_metrics_with_faked_subtask 

transactional_db = None, monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f9a305fcfd0>

    def test_run_figures_monthly_metrics_with_faked_subtask(transactional_db, monkeypatch):
        """Verify we visit the site in the subtask
    
        Faking the subtask for the function under test
        """
        expected_sites = Site.objects.all()
        assert expected_sites.count()
        sites_visited = []
    
        def fake_populate_monthly_metrics_for_site(site_id):
            sites_visited.append(site_id)
    
        monkeypatch.setattr('figures.tasks.populate_monthly_metrics_for_site.delay',
                            fake_populate_monthly_metrics_for_site)
    
        run_figures_monthly_metrics()
    
>       assert set(sites_visited) == set([rec.id for rec in expected_sites])
E       assert set([]) == set([1])
E         Extra items in the right set:
E         1
E         Use -v to get the full diff

tests/tasks/test_monthly_tasks.py:93: AssertionError

@melvinsoft melvinsoft force-pushed the maxi/designate-queue-pipeline branch from 3747b4e to 8ed85ee Compare May 24, 2021 14:43
@melvinsoft
Copy link
Contributor Author

@johnbaldwin Thanks, I fixed the two tests you pointed me to, but I'm getting a new one, and I'm a bit confused. If you can point me in the right direction would be great. I'm not understanding why it's failing.

=================================== FAILURES ===================================
____________ test_run_figures_monthly_metrics_with_unfaked_subtask _____________

transactional_db = None
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f0b6167fe50>

    @pytest.mark.skipif(OPENEDX_RELEASE == GINKGO,
                        reason='Broken test. Apparent Django 1.8 incompatibility')
    def test_run_figures_monthly_metrics_with_unfaked_subtask(transactional_db, monkeypatch):
        """Verify we visit the function our subtasks calls
    
        Faking the function called by the subtask our function under test calls.
        Basically, we're faking two levels below our function under test instead of
        one level below
        """
        expected_sites = Site.objects.all()
        assert expected_sites.count()
        sites_visited = []
    
        def fake_fill_last_smm_month(site):
            # assert site == expected_site
            sites_visited.append(site)
    
        monkeypatch.setattr('figures.tasks.fill_last_smm_month',
                            fake_fill_last_smm_month)
    
        run_figures_monthly_metrics()
    
>       assert set(sites_visited) == set(expected_sites)
E       assert set([]) == set([<Site: example.com>])
E         Extra items in the right set:
E         <Site: example.com>
E         Use -v to get the full diff

tests/tasks/test_monthly_tasks.py:118: AssertionError
======== 1 failed, 380 passed, 160 skipped, 8 xfailed in 36.22 seconds =========
ERROR: InvocationError for command /home/runner/work/figures/figures/.tox/py27-hawthorn/bin/pytest -c pytest-hawthorn.ini (exited with code 1)
___________________________________ summary ____________________________________
ERROR:   py27-hawthorn: commands failed
Error: Process completed with exit code 1.

Thanks in advance!

@johnbaldwin
Copy link
Contributor

@melvinsoft this could be a similar issue to one you fixed in the other task by fixing the monkeypatched path in the test where the expected set of sites visited is empty

>       assert set(sites_visited) == set(expected_sites)
E       assert set([]) == set([<Site: example.com>])
E         Extra items in the right set:
E         <Site: example.com>
E         Use -v to get the full diff

Also, you'll need to merge master branch into your PR branch. Since there's no file overlap, you should be able to rebase master into your branch instead of merge

@melvinsoft
Copy link
Contributor Author

@johnbaldwin I'm sorry, but I'm a bit lost, can you elaborate a bit more.

In the previous part it was more clear to me, I changed a function, then I fixed the monkeypatched function, but here is not so clear to me, since I'm not touching figures.tasks.fill_last_smm_month.

Thanks in advance.

Copy link
Contributor

@bryanlandia bryanlandia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does the Daily metrics take to run on a large install like Tahoe? I'm wondering whether Figures tasks may need their own dedicated custom queue. The edx.lms.core.high (High priority) queue is also used by bulk_email, so if an instructor wants to invite students or communicate to a course those tasks would be waiting for a while, I think, if they try to invite at the same time. But of course that would involve some Ansible setup.

If it works, next step are going to be to create a figures pipeline dedicated queue, and ultimately, move that queue to a dedicated server.

☝️ Oh. I should have read more carefully.

figures/tasks.py Outdated
@@ -380,4 +380,7 @@ def run_figures_monthly_metrics():
"""
logger.info('Starting figures.tasks.run_figures_monthly_metrics...')
for site in get_sites():
populate_monthly_metrics_for_site.delay(site_id=site.id)
populate_monthly_metrics_for_site.apply_async(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@melvinsoft Woudl you please add a comment as to why this code is explicitly setting the queue. Maybe also as a module level docstring for future Figures developers who will rewrite the daily tasks. Would be good to warn them that they need to be explicit in setting the queue for any apply_async calls so they don't go to any old queue.

@johnbaldwin
Copy link
Contributor

@johnbaldwin I'm sorry, but I'm a bit lost, can you elaborate a bit more.

In the previous part it was more clear to me, I changed a function, then I fixed the monkeypatched function, but here is not so clear to me, since I'm not touching figures.tasks.fill_last_smm_month.

Thanks in advance.

@melvinsoft I did some initial investigation and there's a couple of issues I would not expect you to see. I had to set breakpoints and inspect test execution running tox locally

  1. Changing .delay to .apply_async in the populate_monthly_metrics_for_site task function appears to break the test as it looks like this apply_async is not being called when the test_run_figures_monthly_metrics_with_unfaked_subtask test function is executed. I'm not yet sure why this is since the docs identify delay as a shortcut to apply_async but obviously there is something different in how they are executed, at least in the test environment

  2. The juniper tests were being skipped because of a naming mismatch in the tox.ini file. This PR fixes where juniper tests were not being run: Reorganize the requirements files for Juniper tox tests #345

I'm happy to rebase master into your PR branch

So before this PR can proceed, we need to rebase master into it and then address why apply_async is not being executed in Pytest

@melvinsoft melvinsoft force-pushed the maxi/designate-queue-pipeline branch 2 times, most recently from 57bc76b to 34f43cd Compare May 27, 2021 01:40
@melvinsoft melvinsoft closed this Jun 2, 2021
@melvinsoft melvinsoft reopened this Jun 2, 2021
@melvinsoft melvinsoft force-pushed the maxi/designate-queue-pipeline branch 3 times, most recently from 2bb4646 to a68545f Compare June 4, 2021 20:48
Fix syntax error and PEP8

add space

hardcode worker

fixup! hardcode worker

fix test

fix another test

fix test funtino

fixup! fix test funtino

fix monkeypatch

fixup! fix monkeypatch

fixup! fix monkeypatch

fix test again

fix test again

fix test again

fix test again

remove settings

fix routing

more fixes

more fixes

more fixes

more fixes on task

more fixes and new way of routing

re add ,

remove unused param

remove unused param 2

allow figures to route tasks to specific queues

undo task changes

undo settings changes

undo settings changes

fix settings name

fix settings name second time

test task

more concrete definition of tasks

fix method

fix method

try apply async

test order

use group

fix

fix

fix settings

try our router

fix char

fix char

try our router

try our router

remove comments

fix flake8

fix flake8 v2

remove unused args

add settings test

add default None
@melvinsoft melvinsoft force-pushed the maxi/designate-queue-pipeline branch from a68545f to 0cfdcad Compare June 4, 2021 20:50
@melvinsoft
Copy link
Contributor Author

@johnbaldwin @OmarIthawi @estherjsuh @thraxil @bryanlandia

Can I get another another round of review from you? I've done a lot of changes in the approach here. Thanks in advance!

for site in get_sites():
populate_monthly_metrics_for_site.delay(site_id=site.id)
all_sites_jobs = group(populate_monthly_metrics_for_site.s(site.id) for site in get_sites())
all_sites_jobs.delay()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@melvinsoft These two lines I'm really interested to know if they will work. I had tried doing grouping before in early Figures development, but the tasks seemed to have not been executed or died silently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnbaldwin We definitely need to try it out on staging, but I pushed the branch directly on Staging and run the Django management command and I was able to generate data.

I'm not sure how did you tried in the past, but if you look at the first line added, I'm creating a group of signatures. So it's Tasks -> Signature -> Group.

We've a design problem that I'm trying to solve with group, we've a celery task run_figures_monthly_metrics that inside does a loop and trigger a new celery tasks for each site populate_monthly_metrics_for_site. This is not recommended by Celery, and actually, after scratching my head for a couple of days, I found out that the tasks we trigger inside the tasks, do not respecting the routing. So I'd say let's try this approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not recommended by Celery, and actually, after scratching my head for a couple of days, I found out that the tasks we trigger inside the tasks, do not respecting the routing. So I'd say let's try this approach.

Interesting find! Let's try it and see if it works.

Copy link
Contributor

@johnbaldwin johnbaldwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@melvinsoft I've looked over PR changes and didn't see anything jump out at me as an issue. Looks good! Please update the branch and merge. I look forward to seeing how it will work!

@@ -149,6 +150,13 @@ def root(*args):
'FIGURES': {}, # This variable is patched by the Figures' `lms_production.py` settings module.
}

PRJ_SETTINGS = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this variable mean?

for site in get_sites():
populate_monthly_metrics_for_site.delay(site_id=site.id)
all_sites_jobs = group(populate_monthly_metrics_for_site.s(site.id) for site in get_sites())
all_sites_jobs.delay()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not recommended by Celery, and actually, after scratching my head for a couple of days, I found out that the tasks we trigger inside the tasks, do not respecting the routing. So I'd say let's try this approach.

Interesting find! Let's try it and see if it works.

@melvinsoft melvinsoft merged commit 703a0a0 into master Jun 9, 2021
@OmarIthawi OmarIthawi deleted the maxi/designate-queue-pipeline branch June 12, 2021 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants