Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow performance instantiating large problems #749

Open
3 of 4 tasks
MBradbury opened this issue May 3, 2024 · 2 comments
Open
3 of 4 tasks

Slow performance instantiating large problems #749

MBradbury opened this issue May 3, 2024 · 2 comments

Comments

@MBradbury
Copy link

Details for the issue

What did you do?

Creating large problems in PULP is slow, there are a two issues:

  1. Creating expressions via pulp.lpSum is slower than directly creating them via pulp.LpAffineExpression
  2. Making constraints requires allocating and filling an additional 2 copies of an LpAffineExpression
import pulp, numpy as np, itertools

# Some reward matrix
reward = np.full((300, 200, 50), 10)

prob = pulp.LpProblem("Problem", pulp.constants.LpMaximize)
dv = pulp.LpVariable.dicts("dv",
       itertools.product(range(300), range(200), range(50)),
       cat=pulp.constants.LpBinary)

Slow objective function:

prob += pulp.lpSum([
            dv[a, b, c] * reward[a, b, c]
            for a in range(300)
            for b in range(200)
            for c in range(50)
        ])

Faster objective function:

prob += pulp.LpAffineExpression(
            (dv[a, b, c], reward[a, b, c])
            for a in range(300)
            for b in range(200)
            for c in range(50)
        )

Slow constraint:

for c in range(50):
  prob += pulp.lpSum([
              dv[a, b, c]
              for a in range(300)
              for b in range(200)
          ]) == 1

Faster constraint:

for c in range(50):
  prob += pulp.LpAffineExpression(
              (dv[a, b, c], 1)
              for a in range(300)
              for b in range(200)
          ) == 1

Making constraints is slow as self - other creates the first LpAffineExpression copy at https://github.com/coin-or/pulp/blob/master/pulp/pulp.py#L980, https://github.com/coin-or/pulp/blob/master/pulp/pulp.py#L983 or https://github.com/coin-or/pulp/blob/master/pulp/pulp.py#L986. A second copy is then made when LpConstraint's constructor is called (https://github.com/coin-or/pulp/blob/master/pulp/pulp.py#L1012).

It seems as if one of these copies can be avoided if the rhs is set when other is int/float. So in LpAffineExpression.__eq__ this would be:

def __eq__(self, other):
        if isinstance(other, (float, int)):
            return LpConstraint(self, const.LpConstraintEQ, rhs=other)
        else:
            return LpConstraint(self - other, const.LpConstraintEQ)

Avoiding the second copy of LpAffineExpression would require that LpConstraint does not inherit from LpAffineExpression, which is a more significant change.

Some performance numbers using the attached script:

Slow Objective: 16.692 seconds
Slow Constraint: 22.000 seconds
Fast Objective: 2.945 seconds
Fast Constraint: 20.278 seconds
Faster Constraint: 15.407 seconds
  • Happy to send a PR with the change to use rhs if this looks correct.
  • Unsure about the right way forward with lpSum, as LpAffineExpression's are constructed when doing the internal multiplication, ideally something more lightweight (e.g., tuple) would be used as the result of the multiplication.

Useful extra information

The info below often helps, please fill it out if you're able to. :)

What operating system are you using?

  • Windows: Microsoft Windows [Version 10.0.19045.4291]

I'm using python version: Python 3.10.12

I installed PuLP via:

  • pypi (python -m pip install pulp)

Did you also

@pchtsp
Copy link
Collaborator

pchtsp commented May 3, 2024 via email

@MBradbury
Copy link
Author

Some initial work here: https://github.com/MBradbury/pulp/tree/perf

My test file now shows these results:

Slow Objective: 16.397 seconds
Slow Constraint: 7.267 seconds
Fast Objective: 2.675 seconds
Fast Constraint: 6.413 seconds

Faster Constraint is no longer present as the change has been incorporated.

No change in performance of building objectives, improvement for constraints. There are still some performance wins when using tuples over lpSum, but constraints are faster now as they avoid copying LpAffineExpression content and just use the object that was provided to them.

The tests I am able to run are fine:

Ran 1327 tests in 40.831s

OK (skipped=1264)

fgvieira added a commit to snakemake/snakemake that referenced this issue Oct 20, 2024
<!--Add a description of your PR here-->
I am running a workflow with ~700k jobs and, at each given time, there
are around 230k jobs ready to be run. The initial building of the DAG is
quite slow (~2h, but I'll leave that for another PR 😄), but the
main issue is that the scheduler takes a lot of time deciding the next
jobs to be submitted.

In my case, all jobs are quite fast and similar in terms of resources,
so the cluster is idle most of the time. The greedy scheduler is
considerably faster, but still too slow.

The ILP should switch to the greedy after 10s, but it sometimes ignores
the timeout (coin-or/Cbc#487) and it has been reported being quite slow
instantiating large problems (coin-or/pulp#749). In my case, the ILP
runs for 60s (the pulp file is 100Mb) before switching to greedy. Apart
from that, and specially on slow file systems, the scheduler can still
be quite slow checking all temp and input files.

Here, I propose sampling ready jobs, so that only a subset of jobs
(instead of all ready jobs) are evaluated by the scheduler. In my tests,
this greatly reduces the scheduler time:

|  | ILP | greedy |
|---|---|---|
| Native |15 - 20 mins |30s - 1 min  |
| Sampling 1000 jobs |  | 1 - 2 s |



### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **New Features**
- Introduced a new argument `--scheduler-subsample` to optimize job
scheduling by limiting the number of jobs considered for execution.
- Added a method for inferring resource requirements, enhancing user
experience with better error handling.
- Updated settings to include a new attribute for job subsampling,
improving scheduling flexibility.

- **Bug Fixes**
- Improved error handling and logging for resource evaluation and
parsing, providing clearer guidance for users.
- Enhanced job selection process with a subsampling mechanism to
optimize scheduling efficiency.

- **Refactor**
- Enhanced structure and organization of job scheduling logic for better
integration with existing functionality.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants