adding element attribute to txyz parser #3826

aya9aladdin · 2022-09-12T21:34:31Z

Fixes #

While working on my GSoC project #3753, I encountered an issue with the TXYZParser. I am moving guessing out of the parsers and into the universe's new guesser API (guess_topologyAttributes), and I found that the TXYZParser behaves specially: it guesses masses from atom names, instead of from elements or atom types as the rest of the parsers do.

In the new guessing methodology, guessing happens through various guesser classes. One of them is the DefaultGuesser, a general-purpose guesser that mimics the library's current guessing behavior. In the DefaultGuesser, masses are guessed either from elements or from atom types. So, in order not to break the current behavior of the TXYZParser, I added an Elements attribute to its output, to be used for mass guessing.
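A minimal sketch of the lookup order described above, assuming a toy mass table and a hypothetical `guess_mass` helper (neither is the actual DefaultGuesser API): a non-empty element symbol wins, the atom type is only a fallback, and unknown keys map to 0.0 in this sketch. Because TXYZ atom types are numeric, the fallback finds nothing, which is why exposing Elements matters here.

```python
# Toy mass table (amu) for illustration only; MDAnalysis ships its own tables.
MASSES = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def guess_mass(element: str, atom_type: str) -> float:
    """Hypothetical helper: prefer the element, fall back to the atom type."""
    if element:
        # A non-empty element symbol is the most reliable source.
        return MASSES.get(element, 0.0)
    # Fallback: use the atom type as a lookup key (numeric TXYZ types miss).
    return MASSES.get(atom_type, 0.0)


print(guess_mass("C", "1"))  # 12.011: element available
print(guess_mass("", "1"))   # 0.0: numeric TXYZ type, no match
```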

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

pep8speaks · 2022-09-12T21:34:35Z

Hello @aya9aladdin! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file package/MDAnalysis/topology/TXYZParser.py:

Line 124:14: E111 indentation is not a multiple of four
Line 124:14: E117 over-indented
Line 125:18: E111 indentation is not a multiple of four
Line 126:14: E111 indentation is not a multiple of four

codecov · 2022-09-12T21:54:58Z

Codecov Report

Base: 94.33% // Head: 94.33% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (e114997) compared to base (80ff647).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #3826   +/-   ##
========================================
  Coverage    94.33%   94.33%           
========================================
  Files          193      193           
  Lines        24976    24981    +5     
  Branches      3369     3370    +1     
========================================
+ Hits         23562    23567    +5     
  Misses        1365     1365           
  Partials        49       49

Impacted Files	Coverage Δ
package/MDAnalysis/topology/TXYZParser.py	`100.00% <100.00%> (ø)`


jbarnoud · 2022-09-13T06:13:47Z

There is some more context in that comment: #3753 (comment)

jbarnoud

Thank you for the PR!

I made a couple of comments. One about the behaviour, and another one about the tests.

jbarnoud · 2022-09-13T06:16:52Z

package/MDAnalysis/topology/TXYZParser.py

        attrs = [Atomnames(names),
                 Atomids(atomids),
                 Atomtypes(types),
+                 Elements(np.array(validated_elements, dtype=object)),


I think it would be better to add the elements attribute only if all the names are valid element symbols. Otherwise, we take the risk of mistaking names for element symbols.
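A minimal sketch of the all-or-nothing rule proposed here, assuming a small hypothetical `KNOWN_SYMBOLS` set standing in for MDAnalysis's `SYMB2Z` table; the `capitalize()` normalisation mirrors the patch under review, and the helper name `elements_from_names` is hypothetical.

```python
from typing import List, Optional, Sequence

# Stand-in for a full element table such as SYMB2Z (subset for illustration).
KNOWN_SYMBOLS = {"H", "C", "N", "O", "P", "S", "Cl", "Na"}


def elements_from_names(names: Sequence[str]) -> Optional[List[str]]:
    """Return element symbols only if *every* name is a valid symbol."""
    candidates = [name.capitalize() for name in names]  # "CL" -> "Cl"
    if all(symbol in KNOWN_SYMBOLS for symbol in candidates):
        return candidates
    # At least one name is not an element symbol, so the column probably
    # holds atom names: do not create an elements attribute at all.
    return None


print(elements_from_names(["c", "h", "CL"]))  # ['C', 'H', 'Cl']
print(elements_from_names(["C1", "HW"]))      # None
```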

IAlibay · Sep 13, 2022

I can't remember where we got to in this conversation, but I thought we had somehow agreed that '' was a valid "invalid element was found" fallback case?

I know there are a few places where we allow incomplete element arrays but raise a user warning about it?

jbarnoud · Sep 14, 2022

An empty string is indeed what we agreed upon as a placeholder value. My logic here is that the column we use to fill the element attribute is defined as holding either atom names or element symbols. If we find invalid element symbols, the column is likely being used for atom names; then we cannot assume that the names are actual element symbols, so we should consider the elements unknown.

The relevant part of the Tinker documentation says:

The .xyz file is the basic Tinker Cartesian coordinates file type. It contains a title line followed by one line for each atom in the structure. Each line contains: the sequential number within the structure, an atomic symbol or name, X-, Y-, and Z-coordinates, the force field atom type number of the atom, and a list of the atoms connected to the current atom.

So, at least, it is how I understand the problem.
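To make the quoted format concrete, here is a sketch of splitting one atom line into those fields, assuming whitespace-separated columns and an integer atom type; `parse_txyz_atom_line` and `TinkerAtom` are hypothetical names, not MDAnalysis's reader. The second column is exactly the ambiguous one: it may hold an element symbol or an arbitrary atom name.

```python
from typing import List, NamedTuple, Tuple


class TinkerAtom(NamedTuple):
    index: int                            # sequential number within the structure
    name: str                             # atomic symbol *or* atom name (ambiguous)
    position: Tuple[float, float, float]  # X, Y, Z coordinates
    atom_type: int                        # force field atom type number
    bonded_to: List[int]                  # indices of connected atoms


def parse_txyz_atom_line(line: str) -> TinkerAtom:
    fields = line.split()
    x, y, z = (float(value) for value in fields[2:5])
    return TinkerAtom(
        index=int(fields[0]),
        name=fields[1],
        position=(x, y, z),
        atom_type=int(fields[5]),
        bonded_to=[int(value) for value in fields[6:]],  # may be empty
    )


atom = parse_txyz_atom_line("  1  C   0.000   1.000   2.000   1   2   3")
print(atom.name, atom.atom_type, atom.bonded_to)  # C 1 [2, 3]
```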

IAlibay · Sep 14, 2022

Ah ok, yeah, that makes sense. I guess it's like a pseudo-guess to work out which of the two is happening.

aya9aladdin · Sep 16, 2022

I get it, but from the guessing point of view, if the name column contains invalid elements, mass guessing will follow two different patterns before and after the new guesser API:
old behavior: masses are guessed from atom names, which can lead to garbage results here and there
new behavior: masses are guessed from atom types, which will give completely invalid output because atom types are represented by numeric values
I think not guessing garbage masses is a good update, but I'm afraid it will break some current default behavior.
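A sketch of the two patterns described above, assuming a naive exact-match lookup in a toy mass table (MDAnalysis's real name-based guessing applies more heuristics, so the numbers are illustrative only). Names sometimes hit and sometimes miss; numeric Tinker atom types never match a symbol-keyed table.

```python
# Toy mass table keyed by element symbol; unknown keys map to 0.0 here.
MASSES = {"C": 12.011, "H": 1.008, "O": 15.999}


def naive_mass(key: str) -> float:
    return MASSES.get(key.capitalize(), 0.0)


names = ["C", "H", "OW"]   # name column: mix of symbols and atom names
types = ["1", "5", "22"]   # Tinker force field atom type numbers

old_style = [naive_mass(name) for name in names]  # guess from names
new_style = [naive_mass(typ) for typ in types]    # guess from types

print(old_style)  # [12.011, 1.008, 0.0]
print(new_style)  # [0.0, 0.0, 0.0]
```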

testsuite/MDAnalysisTests/topology/test_txyz.py

IAlibay · 2022-09-13T18:31:25Z

package/MDAnalysis/topology/TXYZParser.py

        attrs = [Atomnames(names),
                 Atomids(atomids),
                 Atomtypes(types),
+                 Elements(np.array(validated_elements, dtype=object)),


I can't remember where we got to in this conversation, but I thought we had somehow agreed that '' was a valid "invalid element was found" fallback case?

I know there are a few places where we allow incomplete element arrays but raise a user warning about it?

IAlibay · 2022-09-13T18:31:47Z

package/MDAnalysis/topology/TXYZParser.py

+             if n.capitalize() in SYMB2Z:
+                 validated_elements.append(n.capitalize())
+             else:
+                validated_elements.append('')


Do we want to raise a warning about this case? I think we already do this in a few other places.
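One way to implement this suggestion, sketched with the standard `warnings` module and a hypothetical `validate_elements` helper (again with a toy symbol set in place of `SYMB2Z`): keep the `''` placeholder per atom and emit a single `UserWarning` listing the offending names. Note that the change that was eventually merged instead adds Elements only when all names are valid.

```python
import warnings
from typing import List, Sequence

KNOWN_SYMBOLS = {"H", "C", "N", "O", "S"}  # stand-in for SYMB2Z


def validate_elements(names: Sequence[str]) -> List[str]:
    validated: List[str] = []
    invalid: List[str] = []
    for name in names:
        symbol = name.capitalize()
        if symbol in KNOWN_SYMBOLS:
            validated.append(symbol)
        else:
            validated.append("")  # agreed placeholder for "unknown element"
            invalid.append(name)
    if invalid:
        # One warning for the whole file rather than one per atom.
        warnings.warn(
            f"Unknown element(s) for atom names {invalid}; "
            "empty strings were assigned instead.",
            UserWarning,
        )
    return validated
```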

jbarnoud · 2022-09-15T08:25:05Z

Beyond the comments, you will have to address the todo list from the PR template.

Tests. I addressed them in this review comment: adding element attribute to txyz parser #3826 (comment)

Docs. You need to document your change in two ways. The first is to explain the new behaviour in the docstring of the TXYZParser. The second is to announce the change in the docstring using the .. versionchanged:: keyword. See
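For illustration only, a framework-free sketch of the two cases such tests could cover, written as pytest-style functions around the hypothetical `elements_from_names` helper with a toy symbol set; real tests in test_txyz.py would presumably exercise the parser on Tinker files instead.

```python
from typing import List, Optional, Sequence

KNOWN_SYMBOLS = {"H", "C", "N", "O"}  # toy stand-in for SYMB2Z


def elements_from_names(names: Sequence[str]) -> Optional[List[str]]:
    candidates = [name.capitalize() for name in names]
    return candidates if all(c in KNOWN_SYMBOLS for c in candidates) else None


def test_elements_added_when_all_names_are_symbols():
    assert elements_from_names(["c", "h", "O"]) == ["C", "H", "O"]


def test_no_elements_when_any_name_is_not_a_symbol():
    # "C1" looks like an atom name, so no elements attribute is created.
    assert elements_from_names(["C", "C1"]) is None
```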

mdanalysis/package/MDAnalysis/topology/PDBParser.py

Lines 181 to 196 in e7ee5a4

    
    .. versionchanged:: 0.18.0
       Added parsing of Record types
    .. versionchanged:: 1.0.0
       Added parsing of valid Elements
    .. versionchanged:: 2.0.0
       Bonds attribute is not added if no bonds are present in PDB file.
       If elements are invalid or partially missing, empty elements records
       are now assigned (Issue #2422).
       Aliased ``bfactors`` topologyattribute to ``tempfactors``.
       ``bfactors`` is deprecated and will be removed in 3.0 (Issue #1901)
    .. versionchanged:: 2.3.0
       Formal charges are now read from PDB files if present. No formalcharge
       attribute is created if no formal charges are present in the PDB file.
       Any formal charges not set are assumed to have a value of 0.
       Raise `UserWarning` instead `RuntimeError`
       when CONECT records are corrupt.

for instance.
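Following that pattern, a sketch of what the two docstring additions could look like; the class is a hypothetical stand-in and the wording is illustrative, not the text that was merged. 2.4.0 is the version under development at the time of this review.

```python
class TXYZParserSketch:
    """Parse a Tinker ``.xyz`` file into a topology (illustrative stub).

    Reads atom names, ids and types. If every atom name is a valid
    element symbol, an ``elements`` attribute is also created; otherwise
    the name column is assumed to hold atom names and no elements are set.

    .. versionchanged:: 2.4.0
       Added an ``elements`` attribute when all atom names are valid
       element symbols.
    """
```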

Changelog. You need to add your GitHub handle here:

mdanalysis/package/CHANGELOG

Lines 36 to 37 in e7ee5a4

    
    08/29/22 IAlibay, PicoCentauri, orbeckst, hmacdope, rmeli, miss77jun, rzhao271,
             yuxuanzhuang, hsadia538, lilyminium

and you need to describe what you changed under that section:

mdanalysis/package/CHANGELOG

Line 60 in e7ee5a4

Changes

The point is for users who update to figure out whether they are affected.

There is no issue to reference so you can skip that point.

You also need to fix the points raised in the automatic PEP8 comment: #3826 (comment)

aya9aladdin · 2022-09-16T15:44:38Z

Hello @aya9aladdin! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file package/MDAnalysis/topology/TXYZParser.py:

Line 124:14: E111 indentation is not a multiple of four
Line 124:14: E117 over-indented
Line 125:18: E111 indentation is not a multiple of four
Line 126:14: E111 indentation is not a multiple of four

I'm confused about those PEP 8 errors: the lines mentioned seem to be well indented, and I can't figure out where the issue is.
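A likely explanation, assuming pep8speaks reports the 1-based column of the first non-blank character: columns 14 and 18 correspond to 13 and 17 leading spaces. That appears to match the snippet quoted in IAlibay's review, where the `if`/`else` lines carry 13 spaces and the first body line 17, while the last line (16 spaces) is not flagged. The sketch below is a hypothetical, simplified version of the E111 check; unlike pycodestyle it does not special-case continuation lines.

```python
from typing import List, Sequence, Tuple


def indentation_issues(lines: Sequence[str], width: int = 4) -> List[Tuple[int, int]]:
    """Return (line number, 1-based column) for indents not a multiple of width."""
    issues = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.lstrip(" ")
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored in this sketch
        indent = len(line) - len(stripped)
        if indent % width:
            issues.append((lineno, indent + 1))
    return issues


# Indentation copied from the reviewed snippet: 13, 17, 13 and 16 spaces.
snippet = [
    " " * 13 + "if n.capitalize() in SYMB2Z:",
    " " * 17 + "validated_elements.append(n.capitalize())",
    " " * 13 + "else:",
    " " * 16 + "validated_elements.append('')",
]
print(indentation_issues(snippet))  # [(1, 14), (2, 18), (3, 14)]
```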

jbarnoud

Almost there. Look at how the CHANGELOG is constructed, though. There is one block per version, and there is especially one block for the version we are currently working on (i.e. 2.4.0). This is the block you need to update rather than create a new one.

jbarnoud · 2022-09-19T06:55:57Z

package/CHANGELOG

+
+09/16/22 aya9aladdin
+
+* 2.4.0
+
+Changes
+  *adding element attribute to TXYZParser if all atom names are valid element symbols
+


That is the idea. However, the only things you need to do are add your handle alongside the ones already there for version 2.4.0 and add the description line with the changes for that version. There is no need to create a new block.

You also need to reference the pull request. Look at other entries to see how to do it.

package/MDAnalysis/topology/TXYZParser.py

testsuite/MDAnalysisTests/topology/test_txyz.py

Co-authored-by: Jonathan Barnoud <[email protected]>

aya9aladdin · 2022-09-19T20:55:37Z

All done except for the PEP 8 error messages, which seem weird.

package/CHANGELOG

package/MDAnalysis/topology/TXYZParser.py

Co-authored-by: Jonathan Barnoud <[email protected]>

jbarnoud

Great! 🎉

jbarnoud · 2022-09-21T05:48:14Z

One more PR merged 😁 Congrats!

* adding element attribute to txyz parser * adding elements when names is valid only * update docstring and change log * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update TXYZParser.py * Update CHANGELOG * Apply suggestions from code review Co-authored-by: Jonathan Barnoud <[email protected]> * Apply suggestions from code review Co-authored-by: Jonathan Barnoud <[email protected]> Co-authored-by: Jonathan Barnoud <[email protected]>

adding element attribute to txyz parser

45da090

github-actions bot added the Component-Topology label Sep 12, 2022

jbarnoud requested a review from micaela-matta September 13, 2022 06:14

jbarnoud reviewed Sep 13, 2022

jbarnoud added the Format-Tinker label Sep 13, 2022

IAlibay reviewed Sep 13, 2022

aya9aladdin and others added 8 commits September 16, 2022 17:00

adding elements when names is valid only

2ee44f3

update docstring and change log

173d159

Update TXYZParser.py

6fa4e87

Update TXYZParser.py

440bf34

Update TXYZParser.py

008eb6e

Update TXYZParser.py

38a2ebd

Update TXYZParser.py

46927fc

Update TXYZParser.py

8797f3a

aya9aladdin added 2 commits September 16, 2022 17:50

Merge branch 'develop' into txyz

2cbc1a4

Update TXYZParser.py

dc5e1e7

jbarnoud requested changes Sep 19, 2022

aya9aladdin and others added 2 commits September 19, 2022 22:53

Update CHANGELOG

f4f3939

Apply suggestions from code review

7349abd

Co-authored-by: Jonathan Barnoud <[email protected]>

jbarnoud reviewed Sep 20, 2022

package/CHANGELOG (outdated, resolved)

package/MDAnalysis/topology/TXYZParser.py (outdated, resolved)

aya9aladdin and others added 2 commits September 20, 2022 18:36

Apply suggestions from code review

654d1f5

Co-authored-by: Jonathan Barnoud <[email protected]>

Merge branch 'develop' into txyz

e114997

jbarnoud approved these changes Sep 21, 2022

jbarnoud merged commit fa8b03b into MDAnalysis:develop Sep 21, 2022

IAlibay added the enhancement label Sep 25, 2023
