Make retrievecontentpack faster #4863

Closed
aronasorman opened this issue Feb 18, 2016 · 5 comments

Comments

@aronasorman
Collaborator

Benchmark results here: #4827 (comment)

@MCGallaspy
Contributor

Tentatively calling this a release blocker.

@MCGallaspy
Contributor

Would it be faster to extract all and then rename files and directories?
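
A minimal sketch of that idea, assuming the content pack is a zip archive whose entries must end up under different names than they have inside the archive. The function name `extract_then_rename`, the `rename_map` parameter, and every path are hypothetical and not taken from ka-lite.

```python
import os
import zipfile


def extract_then_rename(zip_path, dest_dir, rename_map):
    """Extract everything in one call, then move entries into place.

    rename_map is a hypothetical {archive_relative_path: final_relative_path}
    mapping; both sides are interpreted relative to dest_dir.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # One bulk call instead of opening and copying each member by hand.
        zf.extractall(dest_dir)

    for src_rel, dst_rel in rename_map.items():
        src = os.path.join(dest_dir, src_rel)
        dst = os.path.join(dest_dir, dst_rel)
        dst_parent = os.path.dirname(dst)
        if not os.path.isdir(dst_parent):
            os.makedirs(dst_parent)
        # Renaming a directory moves everything below it in one operation,
        # provided source and destination are on the same filesystem.
        os.rename(src, dst)
```

Renaming a whole directory avoids touching each extracted file a second time, which is where the saving would come from if the archive layout allows it.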

@aronasorman
Collaborator Author

My baseline for extract_assessment_items:

*** PROFILER RESULTS ***
extract_assessment_items (/home/aron/src/github.com/aronasorman/ka-lite/kalite/distributed/management/commands/retrievecontentpack.py:170)
function called 1 times

         2511087 function calls in 26.822 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 43 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    4.612    4.612   26.822   26.822 retrievecontentpack.py:170(extract_assessment_items)
    59502    8.154    0.000    8.154    0.000 {open}
    29751    0.328    0.000    5.525    0.000 shutil.py:46(copyfileobj)
    78358    0.521    0.000    4.155    0.000 zipfile.py:621(read)
    29752    0.817    0.000    3.906    0.000 general.py:136(ensure_dir)
   108081    1.117    0.000    3.542    0.000 zipfile.py:649(read1)
    29751    0.821    0.000    2.746    0.000 zipfile.py:937(open)
    29752    0.411    0.000    2.582    0.000 os.py:136(makedirs)
   108109    1.409    0.000    1.409    0.000 {method 'read' of 'file' objects}
    48607    1.042    0.000    1.042    0.000 {method 'write' of 'file' objects}
    29751    0.323    0.000    1.017    0.000 zipfile.py:701(close)
    29752    0.251    0.000    0.971    0.000 genericpath.py:15(exists)
    59504    0.948    0.000    0.948    0.000 {posix.stat}
    48607    0.199    0.000    0.932    0.000 zipfile.py:639(_update_crc)
    78358    0.773    0.000    0.773    0.000 {zlib.crc32}
    59503    0.568    0.000    0.721    0.000 posixpath.py:68(join)
    29752    0.695    0.000    0.695    0.000 {posix.mkdir}
    29752    0.155    0.000    0.507    0.000 genericpath.py:38(isdir)
    29752    0.384    0.000    0.505    0.000 posixpath.py:89(split)
    29751    0.491    0.000    0.491    0.000 {method 'close' of 'file' objects}
    29751    0.403    0.000    0.473    0.000 zipfile.py:515(__init__)
    29751    0.445    0.000    0.445    0.000 {method 'seek' of 'file' objects}
    29751    0.244    0.000    0.380    0.000 posixpath.py:119(basename)
   872694    0.244    0.000    0.244    0.000 {len}
    29752    0.224    0.000    0.224    0.000 retrievecontentpack.py:180(<genexpr>)
    29751    0.203    0.000    0.203    0.000 {function close at 0x7fcfbda9b5a0}
    59503    0.175    0.000    0.175    0.000 {method 'rfind' of 'str' objects}
    29751    0.104    0.000    0.136    0.000 zipfile.py:904(getinfo)
    29752    0.080    0.000    0.123    0.000 stat.py:40(S_ISDIR)
    29751    0.103    0.000    0.103    0.000 {_struct.unpack}
    48607    0.100    0.000    0.100    0.000 {max}
    59503    0.099    0.000    0.099    0.000 {method 'startswith' of 'str' objects}
    29751    0.081    0.000    0.081    0.000 {isinstance}
    29752    0.062    0.000    0.062    0.000 {method 'rstrip' of 'str' objects}
    59503    0.054    0.000    0.054    0.000 {method 'endswith' of 'str' objects}
    48607    0.050    0.000    0.050    0.000 {min}
    29752    0.043    0.000    0.043    0.000 stat.py:24(S_IFMT)
    29751    0.031    0.000    0.031    0.000 {method 'get' of 'dict' objects}
    29751    0.030    0.000    0.030    0.000 {hasattr}
        1    0.018    0.018    0.023    0.023 zipfile.py:872(namelist)
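
For anyone who wants to reproduce a comparable report, here is a rough sketch using only the standard library (Python 3). The helper name `profile_call` is hypothetical; the thread does not say which tool produced the output above, so this is one possible way rather than the method actually used.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time, then internal time, then call count,
    # and print at most 40 rows.
    stats.sort_stats("cumulative", "time", "calls").print_stats(40)
    return result, buf.getvalue()
```

Calling `profile_call(extract_assessment_items, ...)` with whatever arguments that function takes would return its result together with the report text.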

@aronasorman
Collaborator Author

I went with what you suggested @MCGallaspy. New profile:

*** PROFILER RESULTS ***
extract_assessment_items (/home/aron/src/github.com/aronasorman/ka-lite/kalite/distributed/management/commands/retrievecontentpack.py:170)
function called 1 times

         261427 function calls (257338 primitive calls) in 2.774 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.774    2.774 retrievecontentpack.py:170(extract_assessment_items)
        1    0.001    0.001    2.774    2.774 retrievecontentpack.py:199(_remove_old_assessment_resources)
   4090/1    0.283    0.000    2.773    2.773 shutil.py:210(rmtree)
    29751    0.986    0.000    0.986    0.000 {posix.remove}
     4090    0.858    0.000    0.858    0.000 {posix.listdir}
    37930    0.233    0.000    0.233    0.000 {posix.lstat}
    33841    0.153    0.000    0.199    0.000 posixpath.py:68(join)
     4090    0.111    0.000    0.111    0.000 {posix.rmdir}
    33840    0.052    0.000    0.082    0.000 stat.py:40(S_ISDIR)
     4090    0.012    0.000    0.038    0.000 posixpath.py:139(islink)
    37930    0.032    0.000    0.032    0.000 stat.py:24(S_IFMT)
    33841    0.030    0.000    0.030    0.000 {method 'startswith' of 'str' objects}
    33841    0.016    0.000    0.016    0.000 {method 'endswith' of 'str' objects}
     4090    0.006    0.000    0.009    0.000 stat.py:55(S_ISLNK)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        0    0.000             0.000          profile:0(profiler)


[398592 refs]

26 seconds vs. 2 seconds. 👍

@aronasorman
Collaborator Author

Never mind, the last version had a bug. New profile:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   15.781   15.781 retrievecontentpack.py:170(extract_assessment_items)
        1    0.000    0.000   15.781   15.781 retrievecontentpack.py:205(_extract_assessment_resources)
        1    0.114    0.114   15.725   15.725 zipfile.py:1030(extractall)
    29751    0.353    0.000   15.525    0.001 zipfile.py:1016(extract)
    29751    3.980    0.000   15.004    0.001 zipfile.py:1042(_extract_member)
    29751    0.322    0.000    5.494    0.000 shutil.py:46(copyfileobj)
    78358    0.527    0.000    4.104    0.000 zipfile.py:621(read)
   108081    1.076    0.000    3.484    0.000 zipfile.py:649(read1)
    29751    0.628    0.000    2.096    0.000 zipfile.py:937(open)
   108109    1.430    0.000    1.430    0.000 {method 'read' of 'file' objects}
    48607    1.069    0.000    1.069    0.000 {method 'write' of 'file' objects}
    48607    0.194    0.000    0.930    0.000 zipfile.py:639(_update_crc)
    29751    0.336    0.000    0.865    0.000 zipfile.py:701(close)
    78358    0.767    0.000    0.767    0.000 {zlib.crc32}
    29752    0.423    0.000    0.645    0.000 posixpath.py:336(normpath)
    33841    0.180    0.000    0.587    0.000 genericpath.py:15(exists)
    59505    0.281    0.000    0.494    0.000 {method 'join' of 'str' objects}
    29751    0.427    0.000    0.427    0.000 {open}
    33842    0.407    0.000    0.407    0.000 {posix.stat}
    29751    0.315    0.000    0.370    0.000 zipfile.py:515(__init__)
    29751    0.339    0.000    0.339    0.000 {method 'seek' of 'file' objects}
    29751    0.289    0.000    0.289    0.000 {method 'close' of 'file' objects}
4090/4089    0.026    0.000    0.270    0.000 os.py:136(makedirs)
    29751    0.178    0.000    0.253    0.000 posixpath.py:127(dirname)
   876795    0.243    0.000    0.243    0.000 {len}
    29751    0.241    0.000    0.241    0.000 {function close at 0x7fea346775a0}
    29755    0.177    0.000    0.235    0.000 posixpath.py:68(join)
   119004    0.213    0.000    0.213    0.000 zipfile.py:1055(<genexpr>)
     4091    0.168    0.000    0.168    0.000 {posix.mkdir}
    59503    0.128    0.000    0.128    0.000 {method 'split' of 'str' objects}
    89254    0.119    0.000    0.119    0.000 {isinstance}
    29751    0.075    0.000    0.102    0.000 zipfile.py:904(getinfo)
    48607    0.095    0.000    0.095    0.000 {max}
    29752    0.086    0.000    0.086    0.000 retrievecontentpack.py:208(<genexpr>)
    29751    0.079    0.000    0.079    0.000 {_struct.unpack}
   178512    0.071    0.000    0.071    0.000 {method 'append' of 'list' objects}
    89260    0.069    0.000    0.069    0.000 {method 'startswith' of 'str' objects}
    29751    0.056    0.000    0.056    0.000 {method 'replace' of 'str' objects}
    48607    0.054    0.000    0.054    0.000 {min}
    33841    0.040    0.000    0.040    0.000 {method 'rfind' of 'str' objects}
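
Putting the baseline and this final profile side by side: `{posix.mkdir}` calls drop from 29752 to 4091 and `{open}` calls from 59502 to 29751, while the per-file read, CRC, and write work is essentially unchanged. The earlier 2.774-second profile shows only the `rmtree` cleanup and no extraction calls, so it is left out of the comparison. The arithmetic below uses only the cumulative times reported for `extract_assessment_items`; the variable names are illustrative.

```python
# Cumulative seconds for extract_assessment_items, copied from the profiles above.
baseline_s = 26.822
extractall_s = 15.781

speedup = baseline_s / extractall_s
saved_fraction = (baseline_s - extractall_s) / baseline_s

print("speedup: %.2fx" % speedup)                     # speedup: 1.70x
print("time saved: %.1f%%" % (saved_fraction * 100))  # time saved: 41.2%
```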

