Improve general performance for a variety of high-load job launch use cases #8403

Conversation

chrismeyersfsu
Member

reduce per-job database query count

Do not query the database, once per job, for the set of Instances that belong to the group we are trying to fit the job onto. Instead, cache the set of instances per instance group.
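The caching idea above can be sketched in plain Python. This is a hypothetical illustration, not AWX's actual task-manager code: `fetch_instances` stands in for the per-group database query that previously ran once per job.

```python
class InstanceGroupCache:
    """Fetch each instance group's instance set at most once per scheduling pass.

    Sketch only: `fetch_instances` is a callable mapping a group name to its
    list of instances (the expensive database query in the real code).
    """

    def __init__(self, fetch_instances):
        self._fetch = fetch_instances
        self._cache = {}

    def instances_for(self, group_name):
        # Hit the database only on the first lookup for each group;
        # every subsequent job scheduled into the group reuses the result.
        if group_name not in self._cache:
            self._cache[group_name] = self._fetch(group_name)
        return self._cache[group_name]
```

With N jobs targeting the same group, this turns N identical queries into one per scheduling pass.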
reduce parent->child lock contention

We update the parent unified job template to point at newly created jobs, and we update a similar foreign key when a job finishes running. This causes lock contention when the job template has allow_simultaneous set and many jobs from that template are running in parallel; I've seen a finishing job wait as long as 5 minutes for the lock.
This change moves the parent->child update OUTSIDE of the transaction if the job is allow_simultaneous (inherited from the parent unified job template). We sacrifice a bit of correctness for performance. The logic is: if you are launching 1,000 parallel jobs, do you really care that the job template points at the last one you launched? Probably not. If you do, you can always query the jobs related to the job template, sorted by creation time.
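The deferral described above uses Django's connection.on_commit hook to run the parent-pointer update only after the transaction commits. A minimal stand-in (not Django; ToyConnection and finish_job are hypothetical names) shows the mechanics:

```python
class ToyConnection:
    """Minimal stand-in for a database connection's on_commit hook."""

    def __init__(self):
        self._callbacks = []

    def on_commit(self, func):
        # Defer `func` until after the surrounding transaction commits, so
        # it does not hold the parent row's lock for the transaction's span.
        self._callbacks.append(func)

    def commit(self):
        callbacks, self._callbacks = self._callbacks, []
        for cb in callbacks:
            cb()


def finish_job(job, connection, update_parent):
    # allow_simultaneous jobs defer the parent-pointer update to dodge
    # lock contention; other jobs update inside the transaction as before.
    if getattr(job, "allow_simultaneous", False):
        connection.on_commit(update_parent)
    else:
        update_parent()
```

The trade-off is visible here: between commit and the deferred callback, the parent's foreign key briefly lags behind reality, which the PR accepts for simultaneous jobs.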

@softwarefactory-project-zuul
Contributor

Build succeeded.

# This dodges lock contention at the expense of the foreign key not being
# completely correct.
if getattr(self, 'allow_simultaneous', False):
connection.on_commit(self._update_parent_instance)
Contributor

Is this necessary if self.status == status_before?

Would this be better?

if self.status != status_before:
    if getattr(self, 'allow_simultaneous', False):
        connection.on_commit(self._update_parent_instance)
    else:
        self._update_parent_instance()

@chrismeyersfsu chrismeyersfsu force-pushed the fix-same_jt_abuse_devel branch from 7b0ce7a to 2eac5a8 Compare October 19, 2020 14:56
@softwarefactory-project-zuul
Contributor

Build succeeded.

@ryanpetrello ryanpetrello changed the title Fix same jt abuse devel Improve performance for a variety of high-load job launch use cases Oct 19, 2020
@ryanpetrello ryanpetrello changed the title Improve performance for a variety of high-load job launch use cases Improve general performance for a variety of high-load job launch use cases Oct 19, 2020
@ryanpetrello
Contributor

regate

@softwarefactory-project-zuul
Contributor

Build succeeded (gate pipeline).
