Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obscure Eigenvalue error when calling shape_parameter for empty ag #3837

Closed
jaclark5 opened this issue Sep 20, 2022 · 16 comments · Fixed by #3839
Closed

Obscure Eigenvalue error when calling shape_parameter for empty ag #3837

jaclark5 opened this issue Sep 20, 2022 · 16 comments · Fixed by #3839

Comments

@jaclark5
Copy link
Contributor

Expected behavior

When calculating the shape_parameter of an AtomGroup, provide error if the group is empty. This is best placed in center_of_mass and center_of_charge since all other functions reference these two methods first.

Actual behavior

Obscure error that can eventually be determined to result from an empty AtomGroup:
numpy.linalg.LinAlgError: Eigenvalues did not converge

Code to reproduce the behavior

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PDB
universe = mda.Universe(PDB)
ag = universe.select_atoms("type 1")
print(len(ag), ag.shape_parameter())

Current version of MDAnalysis

  • Which version are you using? 2.4.0-dev0
  • Which version of Python? 3.9.13
  • Which operating system? MacOS 11.6
@jaclark5
Copy link
Contributor Author

An example of how to resolve this issue is shown here:
https://github.com/jaclark5/mdanalysis/tree/empty_atomgroup

@orbeckst
Copy link
Member

Would mind making a PR? Thanks.

@jaclark5 jaclark5 mentioned this issue Sep 20, 2022
3 tasks
@jaclark5
Copy link
Contributor Author

I added a draft of a pull request, is that appropriate? The way I read the contributors page suggested that I need to have an issue with a conversation before making Pull Requests.

@orbeckst
Copy link
Member

Thanks for the draft PR, that's great!

The way I read the contributors page suggested that I need to have an issue with a conversation before making Pull Requests.

I don't know if you're referring to https://userguide.mdanalysis.org/stable/contributing_code.html#contributing-to-the-main-codebase :

If your contribution is major, such as a bug fix or a new feature, start by opening an issue first. That way, other people can weigh in on the discussion before you do any work. If you also create a pull request, you should link it to the issue by including the issue number in the pull request’s description.

This is supposed to mean that we really appreciate any and all contributions but that we sometimes have a good idea if something fits into the core library or not or if a particular issue has a non-obvious solution, so we don't want you to waste your time doing a lot of work in advance. But at the end of the day, a PR is a clear way to show how you think a problem can be solved so it's always welcome.

@orbeckst
Copy link
Member

I looked at the draft PR #3839 . At the moment, center_of_mass() and center_of_charge() return empty arrays of shape (0, 3) when applied to an empty AtomGroup. My initial reaction was that this seems to be reasonable behavior, even though it's not documented.

However, these empty arrays are somewhat peculiar objects. Let's make one:

import MDAnalysis as mda; from MDAnalysisTests import datafiles as data
u = mda.Universe(data.TPR, data.XTC)
empty = u.atoms[:0]   # empty AtomGroup

x = u.atoms[:5].positions
e =  empty.center_of_mass()

where x and e are

In [28]: x
Out[28]:
array([[52.02    , 43.560005, 31.550003],
       [51.190002, 44.11    , 31.720001],
       [51.550003, 42.83    , 31.04    ],
       [52.47    , 43.170006, 32.370003],
       [53.07    , 44.21    , 30.75    ]], dtype=float32)

In [29]: e
Out[29]: array([], shape=(0, 3), dtype=float32)

What's weird is how e behaves: It does not broadcast:

In [30]: x + e
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [30], line 1
----> 1 x + e

ValueError: operands could not be broadcast together with shapes (5,3) (0,3)

whereas x + x[0] works (x[0].shape == (1, 3)).

But using this shaped, literally empty array in any kind of operations creates another empty one

In [32]: x[0] + e
Out[32]: array([], shape=(0, 3), dtype=float32)

In [33]: e * x[0]
Out[33]: array([], shape=(0, 3), dtype=float32)

In [34]: 42 * e
Out[34]: array([], shape=(0, 3), dtype=float32)

It's kind of NaN value.

It's a fair question if we want the existing behavior or if we want to replace it, e.g., with raising an exception. More broadly, what should be the behavior of a method of an AtomGroup if the method cannot provide a sensible answer?

Opinions welcome, @MDAnalysis/coredevs .

@hmacdope
Copy link
Member

I would argue an exception is the right behaviour.

@orbeckst
Copy link
Member

I am also tending towards exception.

Will we consider switching to an exception a fix (then it can go into release 2.4) or is it a behavior change (then it might have to wait for 3.0). Any opinions, especially @IAlibay ?

@IAlibay
Copy link
Member

IAlibay commented Sep 22, 2022

Sorry there's a lot of threads here and it's hard to see what the impact is. May I ask for a bit of a ELI5 on which bits of the codes are we looking to specifically change?

@jaclark5
Copy link
Contributor Author

Wait which function are we talking about? The original issue is that the anisotropy methods fail (and soon multipole moments ;) )with an obscure error when the group is empty. Couldn't I add an exception there?
Originally I suggested a change to center_of_* because they were referenced in all those other features.

@orbeckst
Copy link
Member

I had to look up ELI5... The question that @hmacdope and I have been moving towards extends beyond this particular issue: What should our AtomGroups do when a calculation is requested of an empty AtomGroup?

This question is a bit of an edge case because normally AtomGroups only have methods attached that are suitable for the attributes held by the group and so you typically don't run into a situation where a calculation will become non-sensical (incorrectly guessed masses being a glaring exception for things like center_of_mass() etc). This issue showed that it can happen that as part of normal calculations, empty AtomGroups are generated and they return weird numpy stuff (#3837 (comment)), namely shaped empty-set arrays that function as NaNs in the sense that they turn all calculations that they participate in also into these weird arrays (maybe @tylerjereddy can say more about these arrays??). We have been moving towards the following view :

  1. Calculations on empty AtomGroups should raise ValueError (or maybe exceptions.NoDataError
    class NoDataError(ValueError, AttributeError):
    which is derived from ValueError (and AttributeError)?).
  2. We consider the current behavior a bug/currently undefined behavior (and so we can fix it before 3.0).

One can make the case that the mass or charge of an empty group should be 0. But I have no idea what the center of mass or centroid of an empty group should be — certainly, [0, 0, 0] is not the correct answer because for certain molecules it could just be that. In order to be consistent (and following the Python Zen "fail early and often"), I'd suggest that any calculation on an empty AtomGroup raises an exception.

@lilyminium
Copy link
Member

I voted for an exception or NaN in the Discord poll, because that seems more consistent with numpy:

>>> import numpy as np
>>> np.mean([])
/Users/lily/anaconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/Users/lily/anaconda3/lib/python3.9/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
nan

@orbeckst
Copy link
Member

We had a discussion and a straw-poll in the public #developer channel on Discord. Here's the outcome of the discussion:

  1. Nobody likes the current behavior so we will change it.
  2. The majority seems to favor a hard fail with an exception. The more numpy-style approach to return NaN (and let the user figure out eventually that something went wrong somewhere) also had some proponents.

My fear is that with NaN we get more obscure error reports that we have to help track down to an empty AtomGroup somewhere, thus creating more support burden. Therefore, I am very much inclined to go with raising an exception. (I'd be following Errors should never pass silently. (from the Zen of Python) and "fail early" (a programming principle dating back to at least 2004).)

Based on the characterization of ValueError

Raised when an operation or function receives an argument that has the right type but an inappropriate value [...]

I'd say we raise ValueError.

@IAlibay
Copy link
Member

IAlibay commented Sep 23, 2022

I was actually thinking NoDataError would make more sense? If it's a method then you're error-ing on a lack of data not the fact that it's the wrong value (because technically there are no values?)

Re: API break, I can live with this being a 2.4.0 change, but I would ask that we be very loud about it.

@richardjgowers
Copy link
Member

richardjgowers commented Sep 23, 2022 via email

@IAlibay
Copy link
Member

IAlibay commented Sep 23, 2022

I always thought NoDataError was more like missing topology attribute,
rather than zero sized input.

On Fri, Sep 23, 2022 at 19:53, Irfan Alibay @.***>
wrote:

I was actually thinking NoDataError would make more sense? If it's a
method then you're error-ing on a lack of data not the fact that it's the
wrong value (because technically there are no values?)

Re: API break, I can live with this being a 2.4.0 change, but I would ask
that we be very loud about it.


Reply to this email directly, view it on GitHub
#3837 (comment),
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ACGSGB2TRR26LECUIG4NZN3V7X4AZANCNFSM6AAAAAAQQSXAOE
.
You are receiving this because you are on a team that was mentioned.Message
ID: @.***>

It's a bit of an academic discussion for not a big deal either way tbh, but that falls within my understanding of the chain of failure.

Empty atomgroup -> no masses -> can't use masses

The attribute is still there but only by virtue that our default is that we default to nothing instead of raising an error when there's nothing?

@orbeckst
Copy link
Member

orbeckst commented Sep 23, 2022

Similar to #3837 (comment) I'd also seen NoDataError more like to missing something from the topology. However, after a quick git grep NoDataError it looks as if we are using it quite liberally to mean "you haven't done something that would provide data that I need" or "this timestep doesn't have what I need right now (for whatever reasons)". Based on the current usage practice, I'd also be ok with NoDataError.

NoDataError is derived from ValueError (and AttributeError) so we can treat it as ValueError when catching exceptions. If we start out with ValueError then it's not a problem to make it a NoDataError later because code will still work. Turning a NoDataError into ValueError is a breaking change, though.

(RuntimeError is also a possibility but I'd prefer one of the other, more specific errors.)

Do we want NoDataError instead of ValueError?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants