
[WIP] Cythonizes GROParser & GROReader very minimally #2227

Closed
wants to merge 9 commits

Conversation

@fenilsuchak (Member) commented Mar 27, 2019

Changes made in this Pull Request:

  • Cythonized version of GRO class is called when parsing ascii file. Not sure if this is the best way to do it. Please suggest changes.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@orbeckst (Member)

In your email you said

I have made changes in the GROParser and created a cythonized version(very basic changes). It gives an improvement of 0.05 seconds on average on a 2mb gro file. This is not much given that cythonization is not extensively done.

The code does not seem to run on the CI (AppVeyor fails) but taking for a moment that you got your changes to build locally, it is interesting that the simplest approach of just wrapping the code in cython does not change much.

Do you know where the Python code spends most of its time, i.e., what is the performance bottleneck? Did you profile the Python GRO parser (e.g. with line_profiler) to find out? This will give you a better idea of what needs optimizing.

(Also, eventually the PR needs to build on the CI and we need to figure out if we can move the cython code to the lib module - might be tricky for parsers because code in lib is supposed to be independent of MDAnalysis things such as topologies and AtomGroups. But first explore and see if there are some obvious ways to speed up the parser.)
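Profiling the parser before optimizing, as suggested above, can be sketched with the stdlib profiler (line_profiler gives per-line numbers, but cProfile already shows which function dominates). Note that `parse_gro_lines` below is a made-up stand-in for the real GROParser.parse, not MDAnalysis code:

```python
# Sketch: find the hot spots in a toy fixed-width parser with cProfile.
# `parse_gro_lines` is a hypothetical stand-in for GROParser.parse.
import cProfile
import io
import pstats

def parse_gro_lines(lines):
    """Toy parser: slice fixed-width fields the way the GRO parser does."""
    resids, names = [], []
    for line in lines:
        resids.append(int(line[0:5]))     # resid field, columns 0-5
        names.append(line[10:15].strip()) # atom name field, columns 10-15
    return resids, names

# synthetic GRO-like lines: resid, resname, atom name, atom number
lines = ["%5d%-5s%5s%5d" % (i, "SOL", "OW", i) for i in range(1, 1001)]

profiler = cProfile.Profile()
profiler.enable()
parse_gro_lines(lines)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
# the report in `stream` now shows where the time went
print(stream.getvalue())
```

The same pattern applies to the real parser: wrap the `parse()` call, sort by cumulative time, and read off the bottleneck.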

@orbeckst (Member)

@Fenil3510, I remembered that a while ago PR #1862 was started with the goal to leverage the chemfiles Python interface for some trajectory formats. They don't support all formats, but a fair number, at least as coordinate "Readers/Writers". Not sure if they can be used as topology "parsers", too.

chemfiles contains a GRO reader/writer (written in C/C++ but apparently based on VMD's molfile plugin). I don't know how seamlessly chemfiles can interact with MDAnalysis.

We didn't have this in our project description, partly because the PR stalled, and partly because it is complicated enough to understand MDAnalysis and we don't necessarily want you to have to spend time learning another library that we ourselves know little about. However, I feel that this should be mentioned so that you have a better idea of what else is going on in MDAnalysis, as potentially related to your project.

If you or anyone else has opinions, please discuss them here.

cc @richardjgowers @jbarnoud

@fenilsuchak (Member Author) commented Mar 28, 2019

The code does not seem to run on the CI (AppVeyor fails) but taking for a moment that you got your changes to build locally, it is interesting that the simplest approach of just wrapping the code in cython does not change much.

Do you know where the Python code spends most of its time, i.e., what is the performance bottleneck? Did you profile the Python GRO parser (e.g. with line_profiler) to find out? This will give you a better idea of what needs optimizing.

Thanks for the suggestions @orbeckst. Yes, I just wrapped the code in Cython. I'll see why tests are failing. I'll look into line_profiler to find out what's to be done to speed it better.

@richardjgowers (Member)

@orbeckst I think the problem with adopting C++/C for reading ascii is it then becomes very hard (for python devs) to patch. Currently our stuff bends over backwards to read everything because we can patch it.

@Fenil3510 I think this might be a good place to get ideas for how to do quick ascii i/o:

http://wesmckinney.com/blog/speeding-up-pandass-file-parsers-with-cython/

Then obviously finding the corresponding cython code in the pandas repo (if they still use Cython).

@Luthaf (Contributor) commented Mar 28, 2019

Hey, chemfiles dev here! So PR #1862 has stalled a bit, but I just finished the release of version 0.9 of chemfiles, so I'll try to get back to it. It was basically working, just needed a bit of documentation. It might take a bit of time as I am currently writing my thesis ...

Not sure if they can be used as topology "parsers", too.

It should be easy to convert from the chemfiles Topology class to the MDAnalysis version. The current version of #1862 only uses the coordinates and throws away the topology, but I could update it to include chemfiles as a topology reader.

chemfiles contains a GRO reader/writer (written in C/C++ but apparently based on VMD's molfile plugin). I don't know how seamlessly chemfiles can interact with MDAnalysis.

We actually have our own implementation of GRO format, not based on VMD molfiles. It lives here: https://github.com/chemfiles/chemfiles/blob/master/src/formats/GRO.cpp.

I think the problem with adopting C++/C for reading ascii is it then becomes very hard (for python devs) to patch. Currently our stuff bends over backwards to read everything because we can patch it.

We try to write readable and easy to modify C++ code in chemfiles, and modern C++ is closer to Python than old school C. But there is always a barrier to overcome when contributing to a project in a different language.

@tylerjereddy (Member) left a comment

The other obvious thing that can be done is cython -a c_GROParser.pyx and look for the yellow lines in the html output for expensive Python overhead calls.

One of us will probably do that eventually when reviewing the PR since it is so straightforward.

package/MDAnalysis/topology/GROParser.py (outdated)
@tylerjereddy (Member)

finding the corresponding cython code in the pandas repo

+1 to at least check this; pandas goes to great lengths to optimize some of its file loaders, and trying to optimize text file reading can be a pretty deep rabbit hole.

@codecov (bot) commented Mar 29, 2019

Codecov Report

Merging #2227 into develop will increase coverage by 0.34%.
The diff coverage is 100%.


@@             Coverage Diff             @@
##           develop    #2227      +/-   ##
===========================================
+ Coverage     89.3%   89.65%   +0.34%     
===========================================
  Files          156      172      +16     
  Lines        19407    21414    +2007     
  Branches      2783     2779       -4     
===========================================
+ Hits         17332    19198    +1866     
- Misses        1473     1620     +147     
+ Partials       602      596       -6
Impacted Files Coverage Δ
package/MDAnalysis/topology/GROParser.pyx 100% <100%> (ø)
package/MDAnalysis/coordinates/MOL2.py 93.33% <100%> (+0.05%) ⬆️
package/MDAnalysis/analysis/encore/utils.py 83.45% <100%> (ø) ⬆️
package/MDAnalysis/topology/MOL2Parser.py 95% <100%> (+0.06%) ⬆️
package/MDAnalysis/coordinates/TRJ.py 90.6% <0%> (-0.91%) ⬇️
...age/MDAnalysis/analysis/hbonds/hbond_autocorrel.py 87.8% <0%> (-0.54%) ⬇️
core/util.py 100% <0%> (ø)
transformations/__init__.py 100% <0%> (ø)
auxiliary/base.py 89% <0%> (ø)
util.py 88.15% <0%> (ø)
... and 26 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c7c394e...62e3f04.

@fenilsuchak (Member Author)

@tylerjereddy @orbeckst: I profiled the GROParser code line by line and checked what takes the most time. It was obviously the looping part, so I decided to cythonize the for loop. I read here that Cython's access to NumPy arrays is much faster for indexing than regular array access. I made the changes accordingly, but I get an improvement of only 40 ms on average, i.e. the actual average time drops from 710 ms to 640 ms. I still have to dig into the pandas parsers as suggested. Is there any other obvious cythonization I'm missing?

cdef np.ndarray[np.int32_t] resids
cdef np.ndarray[object] resnames
cdef np.ndarray[object] names
cdef np.ndarray[np.int32_t] indices
Review comment (Member):

better to use typed memoryviews here vs. the old buffer syntax

# 1) find large changes in index, (ie non sequential blocks)
diff = np.diff(wraps) != 1
# 2) make array of where 0-blocks start
starts = np.hstack([wraps[0], wraps[1:][diff]])
Review comment (Member):

starts is initialized here without prior type declaration
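The quoted snippet can be exercised on a toy array to see exactly what it computes; the `wraps` values below are made up for illustration:

```python
# Sketch of the trick quoted above: find where runs of consecutive
# indices begin. `wraps` is a sorted array of indices (toy data).
import numpy as np

wraps = np.array([2, 3, 4, 10, 11, 20])

# 1) a gap is anywhere the step between neighbours is not 1
diff = np.diff(wraps) != 1          # [False, False, True, False, True]

# 2) every run starts either at wraps[0] or just after a gap
starts = np.hstack([wraps[0], wraps[1:][diff]])

print(starts)  # -> [ 2 10 20]
```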

@fenilsuchak (Member Author)

@tylerjereddy the NumPy functions np.any and np.where don't seem to work with Cython typed memoryviews. I'd have to write custom functions if those are to be used.

@tylerjereddy (Member)

You can use constructs like np.asarray(memoryview) where needed, but if there's an opportunity for creative speedup with custom loops you may want to look into it. Memoryviews can usually work without the GIL, unlike the buffer interface, so it may require some careful design to really take advantage.
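In pure-Python terms, the np.asarray trick mentioned above looks like this (a rough analogue of the Cython situation, using Python's built-in memoryview as a stand-in for a typed memoryview):

```python
# NumPy functions don't operate on raw memoryviews directly, but
# np.asarray gives a zero-copy ndarray view to call them on.
import numpy as np

buf = memoryview(np.arange(6, dtype=np.int32))
arr = np.asarray(buf)               # zero-copy: shares the same data

print(np.any(arr > 4))              # -> True
print(np.where(arr > 4)[0])         # -> [5]
```

Because the conversion is zero-copy, wrapping a memoryview this way inside a hot loop costs little; the expensive part remains any per-element Python-level work.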

@tylerjereddy (Member)

NumPy also has a C API

@fenilsuchak (Member Author) commented Apr 2, 2019

@tylerjereddy, I see. I will change it to memoryviews and see if the custom function gives any improvement. I was focusing more on the for loop (for line in inf), since that's where most of the time goes (according to line_profiler). So I was thinking of:

  1. Faster indexing of the NumPy arrays (which I expected to speed up after using the buffer interface, but it didn't)
  2. A custom type-conversion function (no Python interaction) to replace int(line[0:5]), but it ended up being rather costly
  3. Can string slicing be improved in any way?
  4. Using pandas directly?

I'd be grateful for any suggestions.
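One cheap way to answer points 1 and 2 above is a micro-benchmark: compare converting the resid field line by line against collecting the slices and letting NumPy cast the whole batch at once. This is a hedged sketch on synthetic GRO-like lines; actual numbers will vary by machine and Python version:

```python
# Micro-benchmark: per-line int() conversion vs one batched NumPy cast.
import timeit
import numpy as np

# synthetic GRO-like lines: resid, resname, atom name, atom number
lines = ["%5d%-5s%5s%5d" % (i, "SOL", "OW", i) for i in range(1, 10001)]

def per_line():
    # int() call per line, the approach int(line[0:5]) uses
    return [int(l[0:5]) for l in lines]

def batched():
    # collect the raw slices, cast all of them in one NumPy call
    return np.array([l[0:5] for l in lines], dtype=np.int32)

t1 = timeit.timeit(per_line, number=20)
t2 = timeit.timeit(batched, number=20)
print(f"per-line int(): {t1:.3f}s   batched np.array(): {t2:.3f}s")
```

Whichever wins, the harness makes the cost of the string slicing itself visible separately from the conversion, which is the distinction line_profiler alone doesn't show.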

@fenilsuchak (Member Author) commented Apr 4, 2019

The current cythonization is in the GROParser, i.e. the parse function specifically. Profiling the parse function separately shows an improvement of 2.5-3x on any file (files tested: 2 MB to 17 MB). But overall reading via mda.Universe() shows a negligible gain, because parsing takes only 6% of the total time, so it is clearly not the bottleneck. I chose GROParser to get started. I will look into GROReader for some time and probably find the bottleneck.
Any suggestions are welcome.

@fenilsuchak fenilsuchak changed the title Cythonizes GROParser very minimally [WIP] Cythonizes GROParser very minimally Apr 17, 2019
@fenilsuchak (Member Author) commented Apr 26, 2019

@orbeckst as you said

Also, eventually the PR needs to build on the CI and we need to figure out if we can move the cython code to the lib module - might be tricky for parsers because code in lib is supposed to be independent of MDAnalysis things such as topologies and AtomGroups.

I was wondering: what is the reason to move Cython code to the lib module? Sorry if the question is very basic.

@orbeckst (Member)

When you look at lib.formats you see that these readers are written in a way that is independent from core MDAnalysis code (i.e., no Universes, AtomGroups, Topology, etc.). In #1194 we aimed at making more of our readers low-level and MDAnalysis-agnostic. When I said

we need to figure out if we can move the cython code to the lib module - might be tricky for parsers because code in lib is supposed to be independent of MDAnalysis things such as topologies and AtomGroups.

I was expressing the desire to do the same thing for parsers but also expressed that I kind of doubted that this could be done easily.

In short: Moving GROParser to lib.formats is not high priority. I was interested in knowing if it was even a possibility.

@fenilsuchak (Member Author) commented Apr 26, 2019

@orbeckst thank you for the information. Yes, the intermediate Python file can be removed, I believe; I had kept it to play safe. Now that I've familiarized myself with Cython modules, it looks like it shouldn't cause an issue. I'll make the changes soon.

@fenilsuchak (Member Author)

Had some tests failing in test_nuclinfo.py, so had to merge develop into this.

@fenilsuchak (Member Author)

Hmm, test fails due to a numpy merge here. Some changes in pickling.

cdef object[:] resnames
cdef object[:] names
cdef int[:] indices
cdef long long[:] starts
Review comment (Member):

pretty sure we just need ints here too

Reply (Member Author):

Travis was failing with int and expecting it to be long long as I remember. I'll double check.

Reply (Member Author):

np.where returns np.int64, but the Mac integer is 32-bit, the same as the issue here; I've type-cast the arrays in place with np.int32 and it works with int now.
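The dtype mismatch described here can be demonstrated in plain NumPy: np.where returns platform-sized integers (np.intp, which is 64-bit on most desktop platforms), so an array destined for a Cython `int[:]` view needs an explicit cast first. A minimal sketch:

```python
# np.where yields np.intp indices; cast to int32 before viewing as int[:].
import numpy as np

mask = np.array([0, 1, 1, 0, 1])
idx = np.where(mask)[0]
print(idx.dtype)                     # np.intp: int64 on 64-bit platforms

idx32 = idx.astype(np.int32)         # now safe for a Cython `int[:]` view
print(idx32.dtype, idx32.tolist())   # int32 [1, 2, 4]
```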

@richardjgowers (Member)

@Fenil3510 don't worry about those failures. What are loading times like with these changes?

@fenilsuchak (Member Author) left a comment

@Fenil3510 don't worry about those failures. What are loading times like with these changes?

The current cythonization is in the GROParser, i.e. the parse function specifically. Profiling the parse function separately shows an improvement of 2.5-3x on any file (files tested: 2 MB to 17 MB). But overall reading via mda.Universe() shows very little gain, because parsing takes only 6% of the total time, so it is clearly not the bottleneck. I chose GROParser to get started. I will look into GROReader next and try to find the bottleneck. I haven't had time to profile it further; I'll be doing this over the next couple of days.

package/MDAnalysis/topology/c_GROParser.pyx (outdated)
package/MDAnalysis/topology/GROParser.pyx (outdated)

@orbeckst (Member)

@Fenil3510 could you please benchmark the parser on its own and full universe creation with the GRO files in the vesicle data set https://www.mdanalysis.org/MDAnalysisData/vesicles.html ?

conda install -c conda-forge MDAnalysisData

and

from MDAnalysisData.datasets import fetch_vesicle_lib

dataset = fetch_vesicle_lib()

grofiles = dataset.structure    # should be 3 GRO files

These are fairly big systems (see dataset.labels and the web page for a description). It would be nice to have a reproducible benchmark that we can also report, e.g., in a blog post or paper.
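A reproducible harness for such a benchmark can be sketched with the stdlib alone. Here `load` would be, e.g., `lambda: mda.Universe(grofile)` or a bare parser call; a dummy workload stands in below so the sketch is self-contained and the function names are hypothetical, not MDAnalysis API:

```python
# Hypothetical benchmark harness: report best and median wall-clock
# times so results are comparable across runs and machines.
import statistics
import timeit

def benchmark(load, repeat=5):
    """Run `load` `repeat` times; return (best, median) wall-clock times."""
    times = timeit.repeat(load, number=1, repeat=repeat)
    return min(times), statistics.median(times)

# dummy workload in place of e.g. lambda: mda.Universe(grofile)
best, median = benchmark(lambda: sum(range(1_000_000)))
print(f"best {best:.4f}s   median {median:.4f}s")
```

Reporting the minimum (least-noisy run) alongside the median gives a fairer picture than a single timing, which is what a blog post or paper would want.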

@fenilsuchak (Member Author)

@orbeckst sorry for the delay; my exams are running in parallel.
Here are the results and benchmarks:

  1. This is only the parser (i.e. the parse function specifically); it shows gains of 2.5x to 3.5x.
    [benchmark screenshot]

  2. Gains in building the whole universe are smaller for the smallest file but up to 1.5x on the biggest file. This is expected though.
    [benchmark screenshot]

Working on cythonizing the readers will definitely help increase gains.

@orbeckst (Member)

@Fenil3510 many thanks, very useful and encouraging!

Good luck with your exams!

@fenilsuchak (Member Author) commented May 2, 2019

It would be nice to have a reproducible benchmark that we can also report, e.g., in a blog post or paper.

Sure, I will start working on this after May 9th.

@orbeckst (Member) commented Jun 5, 2019

I merged current develop, hopefully the tests will all pass then.

(Apologies – it made the diff noisy.)

@fenilsuchak (Member Author)

I've been busy lately, I'll get back on this pretty soon.

@orbeckst (Member)

Not sure why the old numpy test (NUMPY_VERSION=1.10.4) fails, Job #1030.5

==================================== ERRORS ====================================
_________________ ERROR collecting MDAnalysisTests/test_api.py _________________
ImportError while importing test module '/home/travis/build/MDAnalysis/mdanalysis/testsuite/MDAnalysisTests/test_api.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
testsuite/MDAnalysisTests/test_api.py:29: in <module>
    import MDAnalysis as mda
package/MDAnalysis/__init__.py:184: in <module>
    from .lib import log
package/MDAnalysis/lib/__init__.py:42: in <module>
    from . import pkdtree
package/MDAnalysis/lib/pkdtree.py:35: in <module>
    from scipy.spatial import cKDTree
../../../miniconda/envs/test/lib/python3.5/site-packages/scipy/spatial/__init__.py:99: in <module>
    from .kdtree import *
../../../miniconda/envs/test/lib/python3.5/site-packages/scipy/spatial/kdtree.py:8: in <module>
    import scipy.sparse
../../../miniconda/envs/test/lib/python3.5/site-packages/scipy/sparse/__init__.py:230: in <module>
    from .csr import *
../../../miniconda/envs/test/lib/python3.5/site-packages/scipy/sparse/csr.py:13: in <module>
    from ._sparsetools import (csr_tocsc, csr_tobsr, csr_count_blocks,
E   ImportError: numpy.core.multiarray failed to import
------------------------------- Captured stderr --------------------------------
RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa

I am restarting the job... maybe it's a conda glitch???

@fenilsuchak (Member Author) commented Jun 13, 2019

Looks like an issue with Travis's NumPy version. All other PRs are failing the same test.

@fenilsuchak fenilsuchak changed the title [WIP] Cythonizes GROParser very minimally [WIP] Cythonizes GROParser & GROReader very minimally Jun 17, 2019
@codecov (bot) commented Jun 17, 2019

Codecov Report

Merging #2227 into develop will decrease coverage by 0.01%.
The diff coverage is 100%.


@@             Coverage Diff             @@
##           develop    #2227      +/-   ##
===========================================
- Coverage    89.66%   89.65%   -0.02%     
===========================================
  Files          172      172              
  Lines        21398    21414      +16     
  Branches      2785     2779       -6     
===========================================
+ Hits         19186    19198      +12     
- Misses        1616     1620       +4     
  Partials       596      596
Impacted Files Coverage Δ
package/MDAnalysis/topology/GROParser.pyx 100% <100%> (ø)
package/MDAnalysis/coordinates/TRJ.py 90.6% <0%> (-0.91%) ⬇️
package/MDAnalysis/lib/util.py 88.18% <0%> (-0.3%) ⬇️
package/MDAnalysis/core/groups.py 98.25% <0%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c820cc5...a41a5c1.

@fenilsuchak (Member Author) commented Jun 17, 2019

I'd need some help here. I declared static types in GROReader (GRO.py) for some variables and converted it into a Cython file (this method worked with GROParser). But too many tests are failing for this commit. I can build the universe with MDAnalysisData.datasets using this commit, with gains of 3-4x, but the tests fail because I can't access the position method of the Atom class, which is weird. I'm unable to trace the exact reason for this; any help would be much appreciated.

@richardjgowers (Member) commented Jun 20, 2019 via email

@@ -116,6 +116,10 @@
from ..core import flags
from ..exceptions import NoDataError
from ..lib import util
cimport cython
@cython.boundscheck(False) # Deactivate bounds checking
Review comment (Member):

these decorators aren't doing what you are expecting here, they won't apply to the whole file. I think they're probably just modifying the Timestep object.

You probably need to decorate the class method directly

@richardjgowers (Member)

@Fenil3510 the reader class wasn't getting added to mda._READERS, which is a dict of all the reader classes and is usually populated automatically by the class it inherits from... Not sure why, but I've manually added it for now in coordinates.__init__. I did a rebase and force-pushed to your branch; it's "working" now, in that the code runs and throws an error :)

@fenilsuchak (Member Author) commented Jul 8, 2019

Yeah, it's failing silently even though there is an except statement to raise KeyError if it's not in the registry. Looking into it, as many other tests are failing too.

@richardjgowers (Member)

hey @Fenil3510

I played around with writing a Cython XYZReader function here:

https://gist.github.com/richardjgowers/dbec308c364aae8b9ca84e8e22171823

There's no error checking (this would need to be added!) but it seems to go pretty fast if you want some ideas.
