-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enh: optimizing the reading of eigenstates from WFSX file #388
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have another closer look at your comments later, this week is havock here ;)
Great work!
Thanks for your comments. Indeed this shows some inconsistencies in the entire SiestaBin suite!!! :) I agree this might be fitting in the binary sile for siesta, for coherence! Thanks for noticing. Regarding the |
No, no. It wasn't there. I added it and then removed it. But couldn't it be useful in some cases to be able to get an eigenstate by its index? It would use the old subroutines and therefore avoid going back and forth between fortran and python. |
Since your method is so fast I think that having two methods for the same thing is not ideal. In any case the end-user would still require some method of knowing the k-point. I can't really see a really good use case that can't be solved by explicit |
Ok! Is this last change fine or was |
I added the possibility to read the information (not values) of all k points at once. It is useful for debugging but could be also useful for other things (you can get a Since all information is read in one call this is significantly faster than iterating over the eigenstates: import sisl
wfsx = sisl.get_sile("spin_unpolarized.bands.WFSX")
geom = sisl.geom.diamond()
def all_fortran():
wfsx.read_file_info(parent=geom)
def python_loop():
for state in wfsx.yield_eigenstate(parent=geom):
state.info
%timeit all_fortran()
%timeit python_loop()
Here is an example of what is returned by
On the 505 MB file that I attached, reading the info takes 140 ms. So I'd say this is very usable in practice even for huge WFSX files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have another closer look next week on this. Looks interesting.
Ok, all this is because I want to allow using WFSX files for plotting wavefunctions, (fat)bands and PDOS. For the wavefunctions and (fat)bands I will just add entry points in the respective plots. For the PDOS, do you think it would be useful to provide a method outside the plotting suite as well? E.g. : Or maybe in a more abstract way provide |
In general, an eigenstate iterator may be interesting(?) Then the code could be generalized and the only thing to change between reading from different sources would be the generator. EigenstateIterator(provider=wfsx.yield_eigenstate(), wrap=calc_PDOS) And if you wanted to do the same with the hamiltonian/BZ you just change the provider: EigenstateIterator(provider=bz.apply.iter.eigenstate(), wrap=calc_PDOS) In the future, for any other code that has a file with eigenstate outputs, it would be trivial to implement the calculation of PDOS. In the raw case it seems pretty stupid but you would have iterators ready to calculate the PDOS, the bands, etc... (e.g. parallelization could be built-in) I don't know exactly how it should work or if it would really be useful, but just dropping the idea here 😅 |
I have added the possibility to read the basis. Although there's not much information in it, it might be useful sometimes as a last resort, who knows (also it probably can be done much more efficiently). Also, I have reorganized things so that they are (hopefully) easier to read and understand. All functions that need external handling of the file unit are called |
c89e5d5
to
85d73bd
Compare
Can I generate a |
Also, one question for PDOS. If I want to generate the PDOS from the |
Yes, I can't see why not. Except that the
Yes, you need the overlap matrix as well. |
So just pass the list of points to the |
Hmm, I see. It may not be so viable after all... Is there a particular reason for not just using the |
I think I just needed And |
Latest commit has a |
062c642
to
854079f
Compare
Could we perhaps limit this PR to the reading of the eigenstates, you are adding too many things that ought to be in a separate MR ;) |
Hmm ok, but I wanted to make sure that the API is useful for creating the plots, which was my initial motivation. Maybe when I finish with the plots I can just keep the changes in the wfsx sile if you want. |
It is fine for now, but generally it would be better to create a new branch off of this one and test it there. Smaller merge requests that are self-contained is much better and easier to manage. |
854079f
to
49f143a
Compare
This is ready for review! After the review, all that's left is to add tests and remove the old reading function. I squashed everything into two commits so that you can check it easier: one for WFSX reading and the other to incorporate it into the plots. |
49f143a
to
18256a7
Compare
Generally one should limit the return types as dictionaries as it imposes naming conventions on certain variables that may not be optimal for the end user. In these cases namedtuple returns are better since users can do whatever they wish with them. Removed lots of not-used variables in the fortran code. Changed names of many of the _bin_* methods to explicitly mention the fortran interface. This clarifies belonging. Also ran pep8 which fixed lots of indentation problems.
I have pushed a new commit which you can check, also added a test for this. |
No, I don't think we should plaster the file with this, it can be done with I clearly made a mistake in the Hamiltonian object to add DOS and PDOS methods there. Better to have these decoupled. |
Note that now each method does not have the wfsxSileSiesta(..., parent=?, geometry=?, sc=?) where the |
Great! I'll submit another PR in a moment then because in the plots I was using |
Also, could you add spin-polarized and noncolinear spin versions of the |
Great, I have a helper script |
Yes, I'll do that |
I remember why I returned dictionaries. Dictionaries allow returning additional items in the future without breaking backwards compatibility. If you return a tuple, you are basically stuck with the choice you made in the first place. What if there's an extra size that you think could be useful to return in the future? If you make the tuple larger, working code will get errors when updating the sisl version, which can be annoying. |
Yes, but dictionaries is really annoying to work with when dealing with simple data. I agree that the flexibility is good. But I am thinking about the end users who are needing to write code. And using dictionaries is in my opinion a needless hurdle. It obfuscates code and complicates things. for out in ...iter:
no_u = out["no_u"]
data = out["data"] could be done with for no_u, data in ...iter:
... it is clear and meaningful. Also when you see other big, succesfull code projects (numpy, scipy). They only ever return dictionaries when you know you have variable data associated. And here we must have some kind of consistency. If we later wishes to return something else, then this could be solved by adding flags to the sile itself. |
Actually, this is why I would prefer namedtuple returns, it has a direct |
No, but it may change 10 years from now and then break codes whose authors may not be maintaining anymore.
I think this could be the best option. |
You can see that that is what I have done in this PR |
Aah ok yes, I was looking at |
Addresses #147 by making python keep track of the fortran file unit.
Now if one wants to iterate over all eigenstates the whole file is only read once, in contrast to the previous implementation, which read the file from the beginning at each iteration. This quick code snippet shows the difference in performance for iterating over all eigenstates for the file
spin_unpolarized.bands.WFSX
, which contains 95 k points:I have some doubts:
spin-loop[k-loop]
). I believe it's the opposite (k-loop[spin-loop]
). You can test this by trying to read a polarized WFSX file, which will fail on the second eigenstate. I attach one so that you can test it. If this is true, I need to make some changes to the old implementation.ik = 68
. Why is this? 😅Files (too big to attach): https://drive.google.com/drive/folders/1lgqckfkpq0mZo-3ucBxtXvz5fMKlXCQv?usp=sharing