
Small refactor of Spreadsheet.from_dict to make it faster. #213

Merged: 1 commit, Jun 18, 2019


@doconix (Contributor) commented Jun 14, 2019

After profiling the code to see why `from_dict` was slow, it was obvious that
the inner function `find_cell` was the culprit. This refactor removes the
inner function and instead looks cells up in a temporary dictionary, which is
much faster.
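To illustrate why the change helps, here is a minimal sketch of the two lookup strategies. It does not reproduce koala's code: the `Cell` class, `find_cell_linear`, `build_index` and the `Sheet1!A1`-style addresses are hypothetical stand-ins, and it assumes cell addresses are unique. The linear version calls `address()` on each cell until it finds a match, which appears to be the pattern behind the millions of `Cell.address` calls in the profile; the indexed version pays for one pass to build a dict and then does average O(1) lookups.

```python
import timeit


class Cell:
    """Hypothetical stand-in for koala's Cell; only the address matters here."""

    def __init__(self, address, value=None):
        self._address = address
        self.value = value

    def address(self):
        # In the linear-scan version this runs once per cell visited per lookup.
        return self._address


def find_cell_linear(cells, address):
    """O(n) per lookup: scan cells until one has the requested address."""
    for cell in cells:
        if cell.address() == address:
            return cell
    return None


def build_index(cells):
    """Build a temporary address -> cell dict once, in O(n).

    Assumes addresses are unique; with duplicates the dict keeps the last
    cell, while find_cell_linear would return the first.
    """
    return {cell.address(): cell for cell in cells}


def resolve_all_linear(cells, wanted):
    return [find_cell_linear(cells, a) for a in wanted]


def resolve_all_indexed(cells, wanted):
    index = build_index(cells)
    # Each lookup is now an average O(1) dict access.
    return [index.get(a) for a in wanted]


if __name__ == "__main__":
    cells = [Cell("Sheet1!A%d" % i, i) for i in range(1, 2001)]
    wanted = ["Sheet1!A%d" % i for i in range(1, 2001, 2)]

    # Both strategies must return the same Cell objects.
    assert resolve_all_linear(cells, wanted) == resolve_all_indexed(cells, wanted)

    t_linear = timeit.timeit(lambda: resolve_all_linear(cells, wanted), number=1)
    t_indexed = timeit.timeit(lambda: resolve_all_indexed(cells, wanted), number=1)
    print("linear: %.4fs  indexed: %.4fs" % (t_linear, t_indexed))
```

The dict costs one extra pass and some memory, but that is paid once per load instead of once per lookup.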

The two profile outputs below show the total time for five `from_dict` calls
dropping from 2.42 to 0.23 seconds (about 10x faster).


Code used to test (this uses a fairly large, complex spreadsheet):

    import cProfile
    import pstats

    from koala import Spreadsheet

    # Load the workbook once and serialize it, so only from_dict is profiled.
    file_name = '/Users/conan/Downloads/logicandreference-a9-Conan_Albrecht-1.xlsm'
    sp1 = Spreadsheet(file_name)
    data = sp1.asdict()

    # Profile five rebuilds from the serialized dict.
    pr = cProfile.Profile()
    pr.enable()
    for _ in range(5):
        sp2 = Spreadsheet.from_dict(data)
    pr.disable()

    # 'time' sorts by internal time (tottime); print the top 50 entries.
    ps = pstats.Stats(pr).sort_stats('time')
    ps.print_stats(50)

Before the change:
         7158776 function calls (7158276 primitive calls) in 2.418 seconds

   Ordered by: internal time
   List reduced from 83 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     5520    1.408    0.000    2.202    0.000 /Users/conan/Documents/data/programming/koala/koala/Spreadsheet.py:1073(find_cell)
  6889145    0.797    0.000    0.797    0.000 /Users/conan/Documents/data/programming/koala/koala/Cell.py:172(address)
     2135    0.050    0.000    0.050    0.000 {built-in method builtins.compile}
    10975    0.047    0.000    0.067    0.000 /Users/conan/Documents/data/programming/koala/koala/Cell.py:21(__init__)

After the change:
         297041 function calls (296541 primitive calls) in 0.230 seconds

   Ordered by: internal time
   List reduced from 83 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10975    0.063    0.000    0.083    0.000 /Users/conan/Documents/data/programming/koala/koala/Cell.py:21(__init__)
     2135    0.041    0.000    0.041    0.000 {built-in method builtins.compile}
        5    0.024    0.005    0.053    0.011 /Users/conan/.pyenv/versions/me3.6/lib/python3.6/site-packages/networkx/readwrite/json_graph/node_link.py:104(node_link_graph)
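As a rough cross-check, the numbers in the two profiles can be related with a little arithmetic. This sketch only reuses figures copied from the output above; it assumes each `Cell.__init__` call creates one cell and that the five loads in the loop are identical. Note that `find_cell`'s cumtime minus its tottime covers all of its callees, so it is an upper bound on the `Cell.address` time spent under `find_cell`.

```python
# Figures copied from the "Before the change" / "After the change" profiles.
LOADS = 5                    # from_dict was called 5 times in the test loop
before_total = 2.418         # seconds, all 5 loads, before the change
after_total = 0.230          # seconds, all 5 loads, after the change
find_cell_calls = 5520
find_cell_tottime = 1.408    # time in find_cell's own body
find_cell_cumtime = 2.202    # also includes callees such as Cell.address
address_calls = 6889145
address_tottime = 0.797
cell_inits = 10975           # Cell.__init__ calls (assumed: one per cell)

speedup = before_total / after_total
cells_per_load = cell_inits // LOADS
address_calls_per_lookup = address_calls / find_cell_calls
# Time find_cell spent in its callees, as a fraction of all Cell.address time.
address_share = (find_cell_cumtime - find_cell_tottime) / address_tottime

print("speedup: %.1fx" % speedup)
print("cells per load: %d" % cells_per_load)
print("address() calls per find_cell: %.0f" % address_calls_per_lookup)
print("address() time under find_cell: %.1f%%" % (100 * address_share))
# Prints:
# speedup: 10.5x
# cells per load: 2195
# address() calls per find_cell: 1248
# address() time under find_cell: 99.6%
```

Read together, these suggest that essentially all `Cell.address` time was spent under `find_cell`, and that each lookup called `address()` about 1,248 times against roughly 2,195 cells per load, which is consistent with a scan over the cell list on every lookup.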

@danielsjf (Collaborator) commented:

This looks good to me.

@danielsjf danielsjf merged commit a9f06e6 into vallettea:master Jun 18, 2019
@danielsjf danielsjf mentioned this pull request Jun 18, 2019
@vallettea (Owner) commented:

this is awesome, thanks
