Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ui: dvc startup became slow #2445

Closed
Suor opened this issue Aug 27, 2019 · 12 comments
Closed

ui: dvc startup became slow #2445

Suor opened this issue Aug 27, 2019 · 12 comments
Assignees
Labels
enhancement Enhances DVC p1-important Important, aka current backlog of things to do performance improvement over resource / time consuming tasks research ui user interface / interaction

Comments

@Suor
Copy link
Contributor

Suor commented Aug 27, 2019

$ time dvc --version
0.57.0+e3f6c7

real	0m2.186s
user	0m1.834s
sys	0m0.288s
@efiop
Copy link
Contributor

efiop commented Aug 27, 2019

Might be imports again. Need to check cprofile report, it will make it clear.

@efiop efiop added enhancement Enhances DVC performance improvement over resource / time consuming tasks p2-medium Medium priority, should be done, but less important labels Aug 27, 2019
@efiop efiop added p1-important Important, aka current backlog of things to do ui user interface / interaction and removed p2-medium Medium priority, should be done, but less important labels Sep 9, 2019
@Suor Suor self-assigned this Sep 10, 2019
@Suor
Copy link
Contributor Author

Suor commented Sep 10, 2019

So it's imports:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
4903/2532    0.004    0.000    0.471    0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
     1802    0.003    0.000    0.004    0.000 <frozen importlib._bootstrap>:103(release)
     1330    0.001    0.000    0.001    0.000 <frozen importlib._bootstrap>:143(__init__)
     1330    0.001    0.000    0.018    0.000 <frozen importlib._bootstrap>:147(__enter__)
     1330    0.001    0.000    0.004    0.000 <frozen importlib._bootstrap>:151(__exit__)
     1802    0.012    0.000    0.015    0.000 <frozen importlib._bootstrap>:157(_get_module_lock)
     1317    0.002    0.000    0.002    0.000 <frozen importlib._bootstrap>:176(cb)
      472    0.001    0.000    0.003    0.000 <frozen importlib._bootstrap>:194(_lock_unlock_module)
   1597/4    0.001    0.000    1.011    0.253 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
    17493    0.004    0.000    0.004    0.000 <frozen importlib._bootstrap>:222(_verbose_message)
       23    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:232(_requires_builtin_wrapper)
      169    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:269(_module_repr)
     1242    0.001    0.000    0.001    0.000 <frozen importlib._bootstrap>:307(__init__)
     1242    0.001    0.000    0.001    0.000 <frozen importlib._bootstrap>:311(__enter__)
     1242    0.002    0.000    0.005    0.000 <frozen importlib._bootstrap>:318(__exit__)
     4954    0.001    0.000    0.001    0.000 <frozen importlib._bootstrap>:321(<genexpr>)
     1190    0.001    0.000    0.001    0.000 <frozen importlib._bootstrap>:35(_new_module)
     2461    0.002    0.000    0.002    0.000 <frozen importlib._bootstrap>:369(__init__)
     2405    0.001    0.000    0.012    0.000 <frozen importlib._bootstrap>:403(cached)
     2554    0.001    0.000    0.002    0.000 <frozen importlib._bootstrap>:416(parent)

@efiop
Copy link
Contributor

efiop commented Sep 10, 2019

Need to research which modules are causing us the most delay (e.g. boto, paramiko and other remote libs or something else?). And how to solve it? (e.g. simple dynamic imports or writing our own lazy_import implementation or what else?)

@Suor
Copy link
Contributor Author

Suor commented Sep 11, 2019

Timed imports:

553.1 ms dvc.cli
534.1 ms dvc.repo
161.7 ms dvc.remote.oss
123.8 ms dvc.remote.hdfs
92.6 ms oss2.api
90.3 ms oss2.crypto
88.6 ms dvc.remote.azure
84.4 ms azure.storage.blob
83.5 ms dvc.remote.gs
83.2 ms azure.storage.blob.appendblobservice
66.7 ms pyarrow
65.8 ms oss2.utils
62.8 ms azure.storage.common.tokencredential
54.5 ms pyarrow.lib
44.4 ms google.cloud.storage.client
44.1 ms google.cloud.client
42.4 ms urllib3.connectionpool
38.0 ms dvc.remote.s3
37.4 ms boto3.session
35.5 ms Crypto.IO
31.8 ms google.cloud.storage.batch
25.2 ms google.cloud
24.0 ms Crypto.Cipher._mode_ecb
23.4 ms urllib3.packages.rfc3986.normalizers
23.2 ms urllib3.packages.rfc3986.api
22.5 ms urllib3.packages.rfc3986.iri
21.8 ms Crypto.Util._raw_api
21.7 ms Crypto.Math.Numbers
21.3 ms Crypto.Math._IntegerGMP
18.1 ms cryptography.hazmat.backends.openssl.backend
17.0 ms cryptography.x509.base
16.6 ms dvc.remote.ssh
16.4 ms cryptography.x509.extensions
15.8 ms Crypto.IO._PBES
15.3 ms urllib3.connection
13.8 ms paramiko.transport
12.5 ms urllib3.util.ssl_
12.0 ms botocore.compat
11.6 ms asn1crypto.keys
10.7 ms Crypto.Cipher._mode_siv
10.6 ms botocore.exceptions
10.5 ms Crypto.Protocol.KDF

@ghost
Copy link

ghost commented Sep 12, 2019

@Suor , how did you timed the imports?

Suor added a commit to Suor/dvc that referenced this issue Sep 13, 2019
@Suor
Copy link
Contributor Author

Suor commented Sep 13, 2019

$ PYTHONPROFILEIMPORTTIME=1 dvc --version 2> imports.txt

imports.txt

From what I can see dvc.utils take some time to load (~50ms). We might want to restructure it a bit. There are some unneeded pieces like:

  • http.server - 27 ms - used only in tests
  • configparser - 2 ms - used only in innosetup

Another things, which can be delayed:

  • tqdm - 10 ms
  • ruamel.yaml - 10 ms
  • dvc.remote.base - 8 ms
  • psutil - 6 ms

Loading dvc.repo takes 25 ms now and is triggered by importing dvc.command.get.

@Suor
Copy link
Contributor Author

Suor commented Sep 13, 2019

P.S. dvc --version takes about 200 ms for me after #2494 and #2495.

efiop added a commit to efiop/dvc that referenced this issue Sep 23, 2019
Decreases `time dvc --version` from ~1.4 sec to ~0.3 sec.

Related iterative#2495
Related iterative#2445

Signed-off-by: Ruslan Kuprieiev <[email protected]>
@efiop
Copy link
Contributor

efiop commented Sep 24, 2019

@Suor Could you please run your checks once more on master after #2526 got merged? How is everything looking now?

@efiop efiop added p2-medium Medium priority, should be done, but less important and removed p1-important Important, aka current backlog of things to do labels Sep 24, 2019
@efiop efiop unassigned Suor Sep 24, 2019
@efiop efiop added p1-important Important, aka current backlog of things to do and removed p2-medium Medium priority, should be done, but less important labels Sep 24, 2019
@Suor
Copy link
Contributor Author

Suor commented Sep 27, 2019

It's about 200 ms for dvc --version and 210 ms for dvc --help for me now.

Here are latest imports profiling: imports.txt

So that leaves us:

  • tqdm - 10 ms
  • ruamel.yaml - 10 ms
  • dvc.remote.base - 8 ms
  • psutil - 7 ms
  • inflect + humanize - 6 ms
  • configobj - 5 ms
  • configparser - 2 ms - used only in innosetup

Our things:

  • dvc.remote.* - 13 ms
  • dvc.utils.compat - 31 ms - too many things combined

Some of this things intersect so numbers can't be summed up. It feels like to reach under 100 ms for simpler commands we will need more than just making more imports dynamic, some code restructuring might be needed. E.g.:

  • separate exception declarations from the bulk of the code
  • move things out of __init__.py files,
  • maybe return to lazy imports for things used in too many functions along the single file,
  • use lazy shortcuts like Tqdm(disable=False, ...) not loading tqdm,
  • something else I am unaware about right now.

There are also some extreme approaches like short-circuiting particular commands, like those version and help ones. Ideally nothing should be loaded until cmdline args are parsed and passed.

@Suor
Copy link
Contributor Author

Suor commented Sep 27, 2019

So we need to decide how deep we need to go into this hole. If we decide to follow the rabbit then I suggest starting from dvc.main this time and investigating from there.

@efiop
Copy link
Contributor

efiop commented Sep 27, 2019

@Suor Thank you for the investigation! 200 ms is pretty good for now, we can leave the rest for the next iteration in the following sprints.

@Suor
Copy link
Contributor Author

Suor commented Dec 4, 2019

Closing this for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC p1-important Important, aka current backlog of things to do performance improvement over resource / time consuming tasks research ui user interface / interaction
Projects
None yet
Development

No branches or pull requests

2 participants