Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] serious performance issues with the file.tidied module #63231

Closed
nicholasmhughes opened this issue Dec 6, 2022 · 1 comment · Fixed by #63238
Closed

[BUG] serious performance issues with the file.tidied module #63231

nicholasmhughes opened this issue Dec 6, 2022 · 1 comment · Fixed by #63238
Assignees
Labels
Bug broken, incorrect, or confusing behavior needs-triage Performance State-Module

Comments

@nicholasmhughes
Copy link
Collaborator

Description
We ran into some serious performance issues with the file.tidied module. We have a temp directory with 500k files under it. A pure python os.walk with a stat of each file + dir takes 2.3s. However salt runs on that host are taking almost an hour with tons of CPU time spent by Salt. strace shows it doing about 3 stats per file and only progressing about 1 file per every 10ms.

Setup

tidy.sls:

tidy_temp:
  file.tidied:
    - name: /tmp
    - rmdirs: false
    - rmlinks: false
    - full_path_match: true
    - time_comparison: mtime
    - age_size_only: age
    - exclude:
      - .*/crash/.*
    - age: 30

Steps to Reproduce the behavior
Run the file.tidied state from the current master branch on a directory with over 500k files in it.

Expected behavior
I'd expect the time for the code to run to be much closer to a raw os.walk + os.stat test.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3006.0+0na.5b18e86
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.4
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.16.0
        pygit2: Not Installed
        Python: 3.7.3 (default, Oct 31 2022, 14:04:00)
  python-gnupg: Not Installed
        PyYAML: 6.0
         PyZMQ: 24.0.1
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: debian 10 buster
        locale: UTF-8
       machine: x86_64
       release: 4.19.0-22-amd64
        system: Linux
       version: Debian GNU/Linux 10 buster
@nicholasmhughes nicholasmhughes added State-Module Bug broken, incorrect, or confusing behavior labels Dec 6, 2022
@nicholasmhughes
Copy link
Collaborator Author

Ended up tracking the issue down to this line:

if elem in dirs:

One directory had 61k directories in it, so performing a "is this string in this list" check was very costly. PR incoming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior needs-triage Performance State-Module
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants