-
Notifications
You must be signed in to change notification settings - Fork 659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding element attribute to txyz parser #3826
Conversation
Hello @aya9aladdin! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:
|
Codecov ReportBase: 94.33% // Head: 94.33% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## develop #3826 +/- ##
========================================
Coverage 94.33% 94.33%
========================================
Files 193 193
Lines 24976 24981 +5
Branches 3369 3370 +1
========================================
+ Hits 23562 23567 +5
Misses 1365 1365
Partials 49 49
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There is some more context in that comment: #3753 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR!
I made a couple of comments. One about the behaviour, and another one about the tests.
attrs = [Atomnames(names), | ||
Atomids(atomids), | ||
Atomtypes(types), | ||
Elements(np.array(validated_elements, dtype=object)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to add the elements
attribute only if all the names are valid element symbols. Otherwise, we take the risk of mistaking names for element symbols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember where we got on this conversation, but I actually thought we had somehow agreed that ''
was a valid "invalid element was found" fallback case?
I know there's a few places where we allow for incomplete element arrays, but raise a user warning about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An empty string is indeed what we agreed upon as a placeholder value. My logic, here, is that the column we use to fill the element attribute is defined as being either the atom names or the element symbols. If we have invalid element symbols it is likely that the column is used as atom names; then, we cannot assume that the names are actual element symbols so we should consider we do not know about the elements.
The relevant part of the Tinker documentation says:
The .xyz file is the basic Tinker Cartesian coordinates file type. It contains a title line followed by one line for each atom in the structure. Each line contains: the sequential number within the structure, an atomic symbol or name, X-, Y-, and Z-coordinates, the force field atom type number of the atom, and a list of the atoms connected to the current atom.
So, at least, it is how I understand the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok yeah that makes sense. I guess it's like a pseudo guess to work out which of the two is happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it but as I'm taking the guessing point of view, if the name column have invalid elements then guessing behavior will take two different patterns before and after the new guesser API as follows:
old behavior: guessing masses will be done from atom names which can lead to trashy results here and there
new behavior: guessing masses will be done from atom type, which will result in completely invalid output cause atom types are represented by numeric values
I think sticking to not guess any trashy masses is a good update, but I'm afraid it will break any current default behavior
attrs = [Atomnames(names), | ||
Atomids(atomids), | ||
Atomtypes(types), | ||
Elements(np.array(validated_elements, dtype=object)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember where we got on this conversation, but I actually thought we had somehow agreed that ''
was a valid "invalid element was found" fallback case?
I know there's a few places where we allow for incomplete element arrays, but raise a user warning about it?
if n.capitalize() in SYMB2Z: | ||
validated_elements.append(n.capitalize()) | ||
else: | ||
validated_elements.append('') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to raise a warning about this case? I think we already do this for a few other places.
Beyond the comments, you will have to address the todo list from the PR template.
You also need to fix the points raised in the automatic PEP8 comment: #3826 (comment) |
I'm confused about those pep8 errors, the lines that are mentioned seems to be well indented, can't figure out where is the issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there. Look at how the CHANGELOG is constructed, though. There is one block per version, and there is especially one block for the version we are currently working on (i.e. 2.4.0). This is the block you need to update rather than create a new one.
package/CHANGELOG
Outdated
|
||
09/16/22 aya9aladdin | ||
|
||
* 2.4.0 | ||
|
||
Changes | ||
*adding element attribute to TXYZParser if all atom names are valid element symbols | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the idea. However, the only things you need to do is add your handle alongside the ones already there for version 2.4.0 and add the description line with the changes for that version. There is no need to create a new block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to reference the rull request. Look at other entries to see how to do it.
Co-authored-by: Jonathan Barnoud <[email protected]>
all done except for the pep8 error messages that seems to be wiered |
Co-authored-by: Jonathan Barnoud <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! 🎉
One more PR merged 😁 Congrats! |
* adding element attribute to txyz parser * adding elements when names is valid only * update docstring and change log * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update CHANGELOG * Apply suggestions from code review Co-authored-by: Jonathan Barnoud <[email protected]> * Apply suggestions from code review Co-authored-by: Jonathan Barnoud <[email protected]> Co-authored-by: Jonathan Barnoud <[email protected]>
* adding element attribute to txyz parser * adding elements when names is valid only * update docstring and change log * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update CHANGELOG * Apply suggestions from code review Co-authored-by: Jonathan Barnoud <[email protected]> * Apply suggestions from code review Co-authored-by: Jonathan Barnoud <[email protected]> Co-authored-by: Jonathan Barnoud <[email protected]>
* adding element attribute to txyz parser * adding elements when names is valid only * update docstring and change log * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update CHANGELOG * Apply suggestions from code review Co-authored-by: Jonathan Barnoud <[email protected]> * Apply suggestions from code review Co-authored-by: Jonathan Barnoud <[email protected]> Co-authored-by: Jonathan Barnoud <[email protected]>
Fixes #
While working on my GSoC project #3753 I encountered an issue with the
TXYZParser
. As I'm removing guessing processes from happening inside parsers and transferring it to happen through theuniverse
's new guesser API (guess_topologyAttributes
), I found that TXYZParser has a special behavior of guessingmasses
fromnames
instead of guessing it fromelements
orAtomTypes
as per rest of the parsers.In the new guessing methodology, guessing happens through various guesser classesn, one of which is the
DfaultGuesser
, which is a general-purpose guesser that mimic the current guesser behavior of the library. In theDefaultGuesser
masses are guessed either fromelements
oratom types
. So, in order to not break the current behavior of theTXYZParser
, I added Elements attribute to its output to be used inmass
guessing.PR Checklist