Pylint slow when run on script with pandas #2198

Closed
parantapa opened this issue Jun 17, 2018 · 97 comments
Labels
Maintenance (discussion or action around maintaining pylint or the dev workflow), performance

Comments

@parantapa

Sample script

> cat hello.py
"""
Hello.
"""

import pandas as pd

def hello():
    """
    Hello.
    """

    test_pdf = pd.DataFrame([[1, 2, 3]])

Running pylint

> /usr/bin/time pylint hello.py
No config file found, using default configuration
************* Module hello
W: 12, 4: Unused variable 'test_pdf' (unused-variable)

------------------------------------------------------------------
Your code has been rated at 6.67/10 (previous run: 6.67/10, +0.00)

Command exited with non-zero status 4
48.05user 0.15system 0:44.80elapsed 107%CPU (0avgtext+0avgdata 193132maxresident)k
0inputs+8outputs (0major+72586minor)pagefaults 0swaps

pylint --version output

> pylint --version
No config file found, using default configuration
pylint 1.9.1,
astroid 1.6.4
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0]

Q1. Is this expected behaviour?
Q1a. If so is there a way to make pylint ignore pandas?

@PCManticore
Contributor

No, this is not the expected behaviour. There is probably a check that triggers a deep pandas inference, leading to this result. You can try to ignore it with --ignored-modules=pandas, but that doesn't seem to work, possibly because the slow check doesn't consider this option.

@kodonnell

I'm definitely seeing this on Python 3.6.5 :: Anaconda, Inc. with

astroid==1.6.5
atomicwrites==1.1.5
attrs==18.1.0
certifi==2018.4.16
coverage==4.5.1
isort==4.3.4
lazy-object-proxy==1.3.1
mccabe==0.6.1
mkl-fft==1.0.0
mkl-random==1.0.1
more-itertools==4.2.0
numpy==1.14.5
pandas==0.23.1
pluggy==0.6.0
py==1.5.4
pylint==1.9.2
pytest==3.6.2
pytest-cov==2.5.1
pytest-pylint==0.9.0
python-dateutil==2.7.3
pytz==2018.4
scikit-learn==0.19.1
scipy==1.1.0
six==1.11.0
wrapt==1.10.11

It's fine if I disable all checks. I tried finding the 'culprit' check ... but a variety of checks cause the issue, and I stopped after ten or so.

@parantapa parantapa reopened this Jun 28, 2018
@parantapa
Author

Sorry about that. I hit the close issue button by mistake.

@parantapa
Author

Does this mean there are more than 10 checks that ignore the ignored-modules=pandas directive?

@kodonnell

As mentioned by @PCManticore, that directive seems to have no impact.

Some tests using the following:

checkers.txt

$ cat test.py
import pandas
pandas.DataFrame()

$ cat time.py
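from datetime import datetime
import os

# Time one pylint run per entry in checkers.txt (one message name per line).
times = []
with open("checkers.txt") as f:
    for i in f:
        i = i.strip()
        print(i, end="\r")  # progress indicator, overwritten on each iteration
        t0 = datetime.now()
        # Enable only this one check and discard pylint's own output.
        _ = os.system("pylint --disable=all --enable=%s test.py > /dev/null 2>&1" % i)
        times.append({"checker": i, "time": datetime.now() - t0})
print()
# Report the slowest checks first.
for i in sorted(times, key=lambda x: -x["time"].total_seconds()):
    print(i["checker"].ljust(80), i["time"])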

Results with above code (i.e. no ignored-modules=pandas) - showing only the long ones (all the rest were < 1 second).

missing-kwoa                                                                     0:00:43.373460
redundant-keyword-arg                                                            0:00:42.945230
unexpected-keyword-arg                                                           0:00:41.520050
consider-iterating-dictionary                                                    0:00:40.463160
abstract-class-instantiated                                                      0:00:40.146653
invalid-metaclass                                                                0:00:38.494744
logging-too-few-args                                                             0:00:38.419155
unsupported-binary-operation                                                     0:00:37.847943
invalid-slice-index                                                              0:00:37.419736
truncated-format-string                                                          0:00:37.097308
missing-format-string-key                                                        0:00:36.772325
invalid-sequence-index                                                           0:00:36.551161
unsupported-delete-operation                                                     0:00:36.546976
bad-open-mode                                                                    0:00:36.512247
bad-format-character                                                             0:00:36.458570
no-member                                                                        0:00:35.928044
format-needs-mapping                                                             0:00:35.898208
bad-format-string                                                                0:00:35.860376
invalid-unary-operand-type                                                       0:00:35.613437
unsupported-membership-test                                                      0:00:35.588740
not-context-manager                                                              0:00:35.584344
unused-format-string-key                                                         0:00:35.542302
logging-unsupported-format                                                       0:00:35.339732
mixed-format-string                                                              0:00:35.305616
assignment-from-none                                                             0:00:35.211583
repeated-keyword                                                                 0:00:35.161800
logging-too-many-args                                                            0:00:35.054962
logging-format-truncated                                                         0:00:34.871881
too-many-function-args                                                           0:00:34.859482
not-callable                                                                     0:00:34.792770
unsubscriptable-object                                                           0:00:34.770419
bad-str-strip-call                                                               0:00:34.757161
assignment-from-no-return                                                        0:00:34.727277
no-value-for-parameter                                                           0:00:34.664874
logging-not-lazy                                                                 0:00:34.529719
too-few-format-args                                                              0:00:34.459010
unsupported-assignment-operation                                                 0:00:34.373915
too-many-format-args                                                             0:00:34.195957
bad-format-string-key                                                            0:00:33.899370
unused-format-string-argument                                                    0:00:33.887841
missing-format-argument-key                                                      0:00:33.785160
redundant-unittest-assert                                                        0:00:33.740884
missing-format-attribute                                                         0:00:33.455067
logging-format-interpolation                                                     0:00:33.449355
bad-thread-instantiation                                                         0:00:33.306790
format-combined-specification                                                    0:00:33.294820
c-extension-no-member                                                            0:00:33.291890
keyword-arg-before-vararg                                                        0:00:33.234156
stop-iteration-return                                                            0:00:33.176356
invalid-format-index                                                             0:00:33.068345
deprecated-method                                                                0:00:33.047958
shallow-copy-environ                                                             0:00:32.959599

with ignored-modules=pandas

assignment-from-no-return                                                        0:00:47.875978
missing-kwoa                                                                     0:00:42.181332
invalid-sequence-index                                                           0:00:42.111520
logging-too-many-args                                                            0:00:41.893597
invalid-slice-index                                                              0:00:41.541294
no-value-for-parameter                                                           0:00:41.099715
not-callable                                                                     0:00:39.675871
bad-format-character                                                             0:00:39.205697
unsupported-binary-operation                                                     0:00:38.976457
repeated-keyword                                                                 0:00:38.825842
logging-too-few-args                                                             0:00:38.704934
no-member                                                                        0:00:38.503515
unexpected-keyword-arg                                                           0:00:38.427282
bad-open-mode                                                                    0:00:38.288543
unsubscriptable-object                                                           0:00:38.138226
redundant-keyword-arg                                                            0:00:38.018166
redundant-unittest-assert                                                        0:00:37.861007
logging-unsupported-format                                                       0:00:37.601585
too-many-function-args                                                           0:00:37.465114
unsupported-assignment-operation                                                 0:00:37.417673
logging-format-truncated                                                         0:00:37.258374
deprecated-method                                                                0:00:37.193201
bad-thread-instantiation                                                         0:00:37.192946
not-context-manager                                                              0:00:37.158083
assignment-from-none                                                             0:00:37.151823
unsupported-delete-operation                                                     0:00:37.124939
invalid-metaclass                                                                0:00:36.988846
truncated-format-string                                                          0:00:36.970254
shallow-copy-environ                                                             0:00:36.970162
mixed-format-string                                                              0:00:36.951497
invalid-unary-operand-type                                                       0:00:36.896878
unused-format-string-argument                                                    0:00:36.888768
unsupported-membership-test                                                      0:00:36.557271
format-needs-mapping                                                             0:00:36.517421
too-few-format-args                                                              0:00:36.205480
bad-format-string-key                                                            0:00:35.988614
missing-format-attribute                                                         0:00:35.296619
format-combined-specification                                                    0:00:34.861939
consider-iterating-dictionary                                                    0:00:34.782166
unused-format-string-key                                                         0:00:34.754750
abstract-class-instantiated                                                      0:00:34.620736
bad-format-string                                                                0:00:34.261011
invalid-format-index                                                             0:00:34.254017
c-extension-no-member                                                            0:00:33.711384
too-many-format-args                                                             0:00:33.553213
missing-format-string-key                                                        0:00:33.182854
stop-iteration-return                                                            0:00:33.017166
missing-format-argument-key                                                      0:00:32.750723
bad-str-strip-call                                                               0:00:32.468865
keyword-arg-before-vararg                                                        0:00:32.018734
logging-not-lazy                                                                 0:00:31.984399
logging-format-interpolation                                                     0:00:31.922702

Note: i7-7700HQ, so reasonable CPU.

@PCManticore
Contributor

As mentioned earlier, ignored-modules is in fact used only by a handful of checks, as per its description:

List of module names for which member attributes should not be checked (useful for modules/projects where namespaces are manipulated during runtime and thus existing member attributes cannot be deduced by static analysis). It supports qualified module names, as well as Unix pattern matching.

It's not intended to ignore all the errors that happen to involve a given module.
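
To make "Unix pattern matching" against qualified module names concrete, here is a minimal sketch using Python's standard fnmatch module. It only illustrates shell-style matching; that pylint compares names in exactly this way is an assumption, and module_is_ignored is a hypothetical helper, not a pylint function.

```python
from fnmatch import fnmatchcase


def module_is_ignored(qualified_name, patterns):
    """Hypothetical helper: True if the name matches any shell-style pattern."""
    return any(fnmatchcase(qualified_name, pattern) for pattern in patterns)


# Assumed example patterns: the bare package plus a wildcard for its submodules.
patterns = ["pandas", "pandas.*"]
print(module_is_ignored("pandas", patterns))             # exact name
print(module_is_ignored("pandas.core.frame", patterns))  # submodule via wildcard
print(module_is_ignored("numpy.linalg", patterns))       # unrelated module
```

Even when a module matches, the option as described only suppresses member-attribute checks, so a match does not by itself mean the module is skipped during analysis.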

Also, if anyone wants to investigate which checks contribute to the slowness of pylint, a better way to do it is to use yappi or a different profiler. Here is an example of a PR where @nickdrozd used that profiler to find some hotspots in astroid. That approach should be far more reliable than running pylint once per check, since every pylint run already incurs overhead from both the subprocess start and from instantiating all the scaffolding needed for running the checks.
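
As a rough illustration of the in-process profiling idea, the sketch below uses the standard library's cProfile rather than yappi (a swapped-in profiler, chosen only because it needs no extra install). profile_call and lint_in_process are hypothetical helper names. The SystemExit handling is there because pylint.lint.Run exits the interpreter by default once it finishes.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def lint_in_process(path):
    """Hypothetical helper: run pylint on one file inside the current process."""
    # Imported lazily so profile_call stays usable without pylint installed.
    from pylint.lint import Run

    try:
        Run([path])
    except SystemExit:
        # Run() calls sys.exit() when done; swallow it so the profiler can report.
        pass
```

Calling profile_call(lint_in_process, "test.py") and printing the returned report lists the functions with the highest cumulative time, without paying the interpreter start-up cost once per check.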

@SeppMe

SeppMe commented Jul 20, 2018

While I was as of yet unable to find the root cause, I nevertheless traced this problem down to the 1.6.2 release of astroid.

I took a simple test program like the one above and varied the installed versions of Pandas, Numpy, Pylint, and astroid. The Pandas, Numpy, and Pylint versions do not matter at all; I tested several versions of each from summer 2017 up until today. But astroid <= 1.6.1 took only about 20-30 seconds, whereas anything >= 1.6.2 took 8-10 minutes! This also applies to the 2.x.y releases of Pylint and astroid, which take forever to analyse the simple test program.

@PCManticore
Contributor

Thanks @SeppMe. I don't know off the top of my head what features shipped with astroid 1.6, but most likely there's something odd going on with the inference, which triggers these abnormal running times.

@co-dh

co-dh commented Jul 20, 2018

It's astroid.
strace pylint foo.py
Press Ctrl-C while you see a lot of mmap and munmap calls, and you'll get a KeyboardInterrupt traceback with about 1000 stack frames.

@dickreuter
Contributor

Is there any update on this?

@kapsh
Contributor

kapsh commented Aug 11, 2018

Did someone run git bisect for this? For me it looks like the regression came in with pylint-dev/astroid@206d8a2 (hope it helps).

bisect.log
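
For anyone wanting to repeat such a bisect, here is a minimal sketch of a check script that could be handed to git bisect run. It treats a lint run that exceeds a time limit as "bad". The command, the file name hello.py, and the 120-second limit are assumptions picked to sit between the fast and slow timings reported in this thread; classify and main are hypothetical names.

```python
import subprocess
import sys

# Assumed values: the command to time and a limit between the fast and slow runs.
LINT_COMMAND = ["pylint", "hello.py"]
LIMIT_SECONDS = 120


def classify(cmd, limit_seconds):
    """Return 0 (good) if cmd finishes within the limit, 1 (bad) if it times out."""
    try:
        # pylint's own exit status is ignored; only the running time matters here.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit_seconds,
        )
    except subprocess.TimeoutExpired:
        return 1
    return 0


def main():
    """Entry point for git bisect run: exit status 0 marks good, 1 marks bad."""
    return classify(LINT_COMMAND, LIMIT_SECONDS)
```

Saved as, say, bisect_check.py with a final line sys.exit(main()), it could be run as git bisect run python bisect_check.py inside an astroid checkout that is installed in development mode, so that each bisect step lints with the checked-out revision.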

@dickreuter
Contributor

@kapsh Could we just revert that commit?

@PCManticore
Contributor

@kapsh and @dickreuter No one has gotten to work on this ticket just yet. We're still trying to fix the issues created after the 2.0 launch, so we haven't had the time to investigate this issue or any other reported performance issues. Bear with us while we work our way through the backlog to get to this issue, or investigate the root cause yourselves and send a PR to fix the problem.

@dickreuter
Contributor

Thanks for letting me know. But please note that the whole package is completely unusable at the moment after version 1.6.2. So not sure what other errors you're looking at, but most likely this problem deserves a higher priority.

@kodonnell

FYI @dickreuter , it seems to be 5-10x faster since I last tried it:

$ time pylint hello.py
************* Module hello
hello.py:12:0: C0304: Final newline missing (missing-final-newline)
hello.py:12:4: W0612: Unused variable 'test_pdf' (unused-variable)

------------------------------------------------------------------
Your code has been rated at 3.33/10 (previous run: 3.33/10, +0.00)


real    0m7.925s
user    0m6.016s
sys     0m1.859s

pylint: 7c103cd
astroid: 5b5cd7acbecaa9b587b07de27a3334a2ec4f2a79

$ pylint --version
pylint 2.2.0
astroid 2.0.4
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 17:14:51)
[GCC 7.2.0]

@PCManticore
Contributor

@dickreuter That's a bit of an exaggeration that the package is completely unusable. As @kodonnell mentioned, do make sure to test with the latest version.

@kapsh
Contributor

kapsh commented Aug 15, 2018

@PCManticore thanks for the feedback! I understand that your hands are full and wasn't trying to blame anyone. Unfortunately I barely understand what that commit is doing: I have only seen pylint's codebase and don't know anything about astroid.

@dickreuter personally I wouldn't rush into it. There should be reasons for that commit, and reverting it can break more things. I didn't check this though.

@kodonnell that's interesting. Confirmed with pylint 2.1.1 & astroid 2.0.3 (5 seconds vs 35). It doesn't help much in my case (my project using pandas is still stuck on Python 2), but generally it's good news.

@co-dh

co-dh commented Aug 16, 2018

Installing an old version of pylint and astroid can help.

@klahnen

klahnen commented Aug 29, 2018

Have you tried increasing the number of jobs? Does it really work?

@SeppMe

SeppMe commented Aug 30, 2018

I most definitely cannot observe anything getting better with the most current versions.

Simple testcases, just as above, one with an import pandas, another without.
Latest pylint commit (66cb321), astroid 2.0.4, without import pandas: 5 seconds
Latest pylint commit (66cb321), astroid 2.0.4, with import pandas: I gave up after about 5 minutes
Pylint 1.9.3, astroid 1.6.1, without import pandas: 5 seconds
Pylint 1.9.3, astroid 1.6.1, with import pandas: 25 seconds

@PCManticore PCManticore added this to the Next minor release milestone Aug 31, 2018
@PCManticore
Contributor

@SeppMe Thanks for letting us know, we'll get to it. This issue is now part of the Faster pylint project, which is going to be my main focus this autumn, so we'll definitely get to see why pylint and astroid are slow and how we can improve that experience across the board.

@kodonnell

Can you provide more detail about the Faster pylint project @PCManticore (or a link to the announcement)? I'm interested (especially given the recent cython work I looked into).

@PCManticore
Contributor

@kodonnell There's nothing formal per se, just a GitHub project to track all the issues that are related to performance: https://github.com/PyCQA/pylint/projects/3. This doesn't include the issues on astroid, but you might be interested in pylint-dev/astroid#610 (we probably need to do a project on astroid as well for easier tracking of planning)

@PCManticore PCManticore unpinned this issue Dec 30, 2018
@rafalp

rafalp commented Dec 30, 2018

Great news, can't wait to try it out! 👍

Thanks!

@DannyNemer

@PCManticore Thank you very much! When do you plan to release Astroid v2.2.0 with this fix? I am debating whether to update my organization's requirements.txt files to include git+https://github.com/PyCQA/astroid.git@master, or whether I should wait for the v2.2.0 release. I appreciate your work!

@PCManticore
Contributor

For now installing from master should do it. I'm planning to release the next version in a couple of weeks, but might take a bit.

@sp-daniel-pinyol

Hi,
I tried linting a single file with these 2 lines with master versions of astroid & pylint:

from flask import Flask
app = Flask(__name__)

It still takes quite some time on a macOS i7:

$ pylint --version
pylint 2.3.0-dev0
astroid 2.2.0-dev
Python 3.7.2 (default, Jan 13 2019, 12:50:01) 
[Clang 10.0.0 (clang-1000.11.45.5)]

time pylint app/flask.py
real    0m1.428s
user    0m1.315s
sys     0m0.088s

With pylint release 2.2.2 it takes roughly the same time

@PCManticore
Contributor

@sp-daniel-pinyol Please report a separate issue. From a quick look, both 1.9 and 2.2 seem to exhibit the same behaviour, so it doesn't seem to be caused by the regression behind this particular issue with pandas.

@dickreuter
Contributor

Any estimate of when we'll get this released? Many thanks.

@PCManticore
Contributor

@dickreuter I'll release 2.3 sometime in February. In the meantime you can use the dev release.

@gnu-user

gnu-user commented Feb 8, 2019

@dickreuter We've just switched to using flake8 for the time being until the new version is released.

@dickreuter
Contributor

Yes, it’s really odd that it takes months for a release that important. It should be done ASAP. Currently pylint is totally unusable.

The only way is to take it directly from GitHub.

@PCManticore
Contributor

@dickreuter you can install the dev package if you want.

@dickreuter
Contributor

It’s tricky for large corporations as we can only use packages from anaconda.

@DannyNemer

DannyNemer commented Feb 8, 2019

My company switched to using the dev release, and it reduced the duration of our entire CI/CD from 25+ minutes to 4 minutes.

@PCManticore
Contributor

@dickreuter Cool, quick question: can those large corporations pay for support for one of the tools they're using, like pylint for instance? This is a genuine question. I find that I feel burned out working as a volunteer for pylint, especially since I can't focus my time on improving the capabilities of the tool, and money could provide an additional incentive to make that work.

@DannyNemer

I believe Microsoft has sponsored engineers who work solely on open source work not tied to the company (e.g., Lodash). Also, I know Stripe has sponsored several open source developers before.

However, @dickreuter, if you more publicly state your case just as you have here, I feel it is reasonable for companies - even small startups like mine - to donate.

@dickreuter
Contributor

dickreuter commented Feb 8, 2019 via email

@James-Quigley

Pylint taking ~7 minutes on 1500 files. Tried upgrading, tried ignoring pandas, and not seeing any improvements. Has anybody found a solution with pylint + pandas that doesn't take minutes to run? We already separated various rules into different pylintrc files and run those in parallel in an attempt to speed things up

@dickreuter
Contributor

dickreuter commented Apr 24, 2019 via email

@James-Quigley

Actually, I'm realizing it's prospector causing the slowness. Pylint run on its own takes no time at all.

d-fence added a commit to odoo-dev/odoo that referenced this issue May 7, 2019
According to this pylint-dev/pylint#2198
The latest version fixed performance issues.
Let's test it on runbot, if it works, the runbot Dockerfile can be
adapted accordingly.

It could fix things like that:
http://runbot.odoo.com/runbot/build/513891
@rvanlaar

rvanlaar commented Dec 11, 2019

I'm experiencing major slowdowns for checking pandas/numpy files.

A 366 line file takes 128 seconds to check.
Other files don't have this problem.

pylint --version:

pylint 2.4.4
astroid 2.3.0
Python 3.6.9 (default, Oct 24 2019, 17:02:38) 
[GCC 8.3.0]

@michael-lefkowitz-techlabs

pylint is still taking over a minute to lint a short .py file that contains a very long string (65510 chars). I'm not sure if it's related to this issue, or if I should open a new issue.

pylint 2.6.0
astroid 2.4.2
Python 3.8.0 (default, Feb 25 2021, 22:10:10)
[GCC 8.4.0]

@Pierre-Sassoulas
Member

Probably due to #4062 instead.

@Pierre-Sassoulas Pierre-Sassoulas added Maintenance Discussion or action around maintaining pylint or the dev workflow and removed task labels Jun 23, 2022