adding element attribute to txyz parser #3826

Merged: 15 commits, Sep 21, 2022

Conversation

aya9aladdin
Contributor

Fixes #

While working on my GSoC project #3753 I encountered an issue with the TXYZParser. As I'm removing the guessing that happens inside parsers and moving it to the universe's new guesser API (guess_topologyAttributes), I found that TXYZParser has the special behavior of guessing masses from names instead of from elements or atom types, as the rest of the parsers do.

In the new guessing methodology, guessing happens through various guesser classes, one of which is the DefaultGuesser, a general-purpose guesser that mimics the library's current guessing behavior. In the DefaultGuesser, masses are guessed either from elements or from atom types. So, in order not to break the current behavior of the TXYZParser, I added an Elements attribute to its output to be used in mass guessing.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks

Hello @aya9aladdin! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 124:14: E111 indentation is not a multiple of four
Line 124:14: E117 over-indented
Line 125:18: E111 indentation is not a multiple of four
Line 126:14: E111 indentation is not a multiple of four

@codecov

codecov bot commented Sep 12, 2022

Codecov Report

Base: 94.33% // Head: 94.33% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (e114997) compared to base (80ff647).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3826   +/-   ##
========================================
  Coverage    94.33%   94.33%           
========================================
  Files          193      193           
  Lines        24976    24981    +5     
  Branches      3369     3370    +1     
========================================
+ Hits         23562    23567    +5     
  Misses        1365     1365           
  Partials        49       49           
Impacted Files Coverage Δ
package/MDAnalysis/topology/TXYZParser.py 100.00% <100.00%> (ø)


@jbarnoud
Contributor

There is some more context in that comment: #3753 (comment)

@jbarnoud jbarnoud left a comment

Thank you for the PR!

I made a couple of comments. One about the behaviour, and another one about the tests.

attrs = [Atomnames(names),
         Atomids(atomids),
         Atomtypes(types),
         Elements(np.array(validated_elements, dtype=object)),
Contributor

I think it would be better to add the elements attribute only if all the names are valid element symbols. Otherwise, we take the risk of mistaking names for element symbols.

Member

I can't remember where we got on this conversation, but I actually thought we had somehow agreed that '' was a valid "invalid element was found" fallback case?

I know there's a few places where we allow for incomplete element arrays, but raise a user warning about it?

Contributor

An empty string is indeed what we agreed upon as a placeholder value. My logic, here, is that the column we use to fill the element attribute is defined as being either the atom names or the element symbols. If we have invalid element symbols it is likely that the column is used as atom names; then, we cannot assume that the names are actual element symbols so we should consider we do not know about the elements.

The relevant part of the Tinker documentation says:

The .xyz file is the basic Tinker Cartesian coordinates file type. It contains a title line followed by one line for each atom in the structure. Each line contains: the sequential number within the structure, an atomic symbol or name, X-, Y-, and Z-coordinates, the force field atom type number of the atom, and a list of the atoms connected to the current atom.

So, at least, it is how I understand the problem.
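
A minimal sketch of the all-or-nothing check described above. The small `SYMB2Z` dictionary is a hypothetical stand-in for the library's symbol table, and `elements_or_none` is an illustrative helper name, not the function used in the PR:

```python
# Hypothetical stand-in for the SYMB2Z table (element symbol -> atomic number).
# Only a few elements are listed to keep the sketch self-contained.
SYMB2Z = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16, "Cl": 17}


def elements_or_none(names):
    """Return capitalized symbols if EVERY name is an element symbol, else None.

    Returning None models "do not add the Elements attribute at all",
    because the column is then probably holding atom names.
    """
    candidates = [name.capitalize() for name in names]
    if all(symbol in SYMB2Z for symbol in candidates):
        return candidates
    return None


symbol_column = ["C", "H", "H", "cl"]   # column used for element symbols
name_column = ["OW", "HW1", "HW2"]      # column used for atom names

print(elements_or_none(symbol_column))
print(elements_or_none(name_column))
```

With this approach a single unrecognized entry is enough to treat the whole column as atom names, which avoids mixing real symbols with '' placeholders.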

Member

Ah ok yeah that makes sense. I guess it's like a pseudo guess to work out which of the two is happening.

Contributor Author

I got it, but from the guessing point of view, if the name column has invalid elements, the guessing behavior will follow two different patterns before and after the new guesser API:
  • old behavior: masses are guessed from atom names, which can lead to trashy results here and there
  • new behavior: masses are guessed from atom types, which will result in completely invalid output because atom types are represented by numeric values
I think not guessing any trashy masses is a good update, but I'm afraid it will break the current default behavior.
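
To make the two patterns concrete, here is a rough sketch. The `MASSES` table, the `lookup_mass` helper, and the 0.0 fallback for unknown labels are all assumptions made for illustration; this is not the guesser code from the library:

```python
# Hypothetical mass table with approximate atomic masses (assumption).
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def lookup_mass(label):
    """Treat a label as an element symbol; 0.0 marks an unknown label (assumption)."""
    return MASSES.get(label.capitalize(), 0.0)


names = ["C", "H", "O"]    # name column that happens to hold element symbols
types = ["1", "5", "7"]    # force field atom types are numeric in this format

masses_from_names = [lookup_mass(n) for n in names]   # old pattern
masses_from_types = [lookup_mass(t) for t in types]   # new pattern without elements

print(masses_from_names)
print(masses_from_types)
```

Numeric atom types never match an element symbol, so in this sketch every mass guessed from types falls back to the placeholder value.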

if n.capitalize() in SYMB2Z:
    validated_elements.append(n.capitalize())
else:
    validated_elements.append('')
Member

Do we want to raise a warning about this case? I think we already do this for a few other places.
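
For illustration, one way such a warning could be raised while still falling back to ''. This is only a sketch: `validate_elements`, the reduced `SYMB2Z` table, and the message wording are assumptions, not the code that was merged:

```python
import warnings

# Hypothetical stand-in for the SYMB2Z table (element symbol -> atomic number).
SYMB2Z = {"H": 1, "C": 6, "N": 7, "O": 8}


def validate_elements(names):
    """Map names to element symbols, warning on and blanking unknown ones."""
    validated_elements = []
    for n in names:
        if n.capitalize() in SYMB2Z:
            validated_elements.append(n.capitalize())
        else:
            # Message text is illustrative only.
            warnings.warn(
                f"Unknown element {n!r} found; assigning an empty element record",
                UserWarning,
            )
            validated_elements.append('')
    return validated_elements


print(validate_elements(["C", "h", "XX"]))
```

A per-atom warning can get noisy on large files, so collecting the unknown names and warning once after the loop would be a reasonable variation.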

@jbarnoud
Contributor

Beyond the comments, you will have to address the todo list from the PR template.

  • Tests. I addressed them in this review comment: adding element attribute to txyz parser #3826 (comment)
  • Docs. You need to document your change in two ways. The first is to explain the new behaviour in the docstring of the TXYZParser. The second is to announce the change in the docstring using the .. versionchanged:: keyword. See
    .. versionchanged:: 0.18.0
    Added parsing of Record types
    .. versionchanged:: 1.0.0
    Added parsing of valid Elements
    .. versionchanged:: 2.0.0
    Bonds attribute is not added if no bonds are present in PDB file.
    If elements are invalid or partially missing, empty elements records
    are now assigned (Issue #2422).
    Aliased ``bfactors`` topologyattribute to ``tempfactors``.
    ``bfactors`` is deprecated and will be removed in 3.0 (Issue #1901)
    .. versionchanged:: 2.3.0
    Formal charges are now read from PDB files if present. No formalcharge
    attribute is created if no formal charges are present in the PDB file.
    Any formal charges not set are assumed to have a value of 0.
    Raise `UserWarning` instead `RuntimeError`
    when CONECT records are corrupt.
    for instance.
  • Changelog. You need to add your GitHub handle here:
    08/29/22 IAlibay, PicoCentauri, orbeckst, hmacdope, rmeli, miss77jun, rzhao271,
    yuxuanzhuang, hsadia538, lilyminium
    and you need to describe what you changed under that section: the point is for users who update to figure out if they are affected.
  • There is no issue to reference so you can skip that point.
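
As a sketch of the docstring point above, this is roughly where a `.. versionchanged::` entry sits. `ExampleTXYZParser` is a hypothetical stub used only to show the layout, and the wording of the entry is illustrative rather than the text that was merged:

```python
class ExampleTXYZParser:
    """Hypothetical parser stub, used only to show the docstring layout.

    Creates an Elements attribute when every atom name is a valid
    element symbol; otherwise no Elements attribute is created.

    .. versionchanged:: 2.4.0
       Adding the Elements attribute if all names are valid element symbols.
    """


print(ExampleTXYZParser.__doc__)
```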

You also need to fix the points raised in the automatic PEP8 comment: #3826 (comment)

@aya9aladdin
Contributor Author

Hello @aya9aladdin! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 124:14: E111 indentation is not a multiple of four
Line 124:14: E117 over-indented
Line 125:18: E111 indentation is not a multiple of four
Line 126:14: E111 indentation is not a multiple of four

I'm confused about those PEP 8 errors: the lines that are mentioned seem to be well indented, and I can't figure out where the issue is.

@jbarnoud jbarnoud left a comment

Almost there. Look at how the CHANGELOG is constructed, though. There is one block per version, and there is especially one block for the version we are currently working on (i.e. 2.4.0). This is the block you need to update rather than create a new one.

Comment on lines 16 to 23

09/16/22 aya9aladdin

* 2.4.0

Changes
*adding element attribute to TXYZParser if all atom names are valid element symbols

Contributor

That is the idea. However, the only things you need to do are add your handle alongside the ones already there for version 2.4.0 and add the description line with the changes for that version. There is no need to create a new block.

Contributor

You also need to reference the pull request. Look at other entries to see how to do it.

@aya9aladdin
Contributor Author

All done, except for the PEP 8 error messages, which seem weird.

@jbarnoud jbarnoud left a comment

Great! 🎉

@jbarnoud jbarnoud merged commit fa8b03b into MDAnalysis:develop Sep 21, 2022
@jbarnoud
Contributor

One more PR merged 😁 Congrats!

jaclark5 pushed a commit to jaclark5/mdanalysis that referenced this pull request Sep 22, 2022
* adding element attribute to txyz parser

* adding elements when names is valid only

* update docstring and change log

* Update TXYZParser.py

* Update TXYZParser.py

* Update TXYZParser.py

* Update TXYZParser.py

* Update TXYZParser.py

* Update TXYZParser.py

* Update TXYZParser.py

* Update CHANGELOG

* Apply suggestions from code review

Co-authored-by: Jonathan Barnoud <[email protected]>

* Apply suggestions from code review

Co-authored-by: Jonathan Barnoud <[email protected]>

Co-authored-by: Jonathan Barnoud <[email protected]>
jaclark5 pushed a commit to jaclark5/mdanalysis that referenced this pull request Sep 23, 2022
jaclark5 pushed a commit to jaclark5/mdanalysis that referenced this pull request Oct 11, 2022