Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to retrieve specific types of structure info from PubChem #544

Merged
merged 5 commits into from
Dec 15, 2019

Conversation

hay-k
Copy link
Contributor

@hay-k hay-k commented Nov 29, 2019

Currently, the function geometry_from_pubchem in utils/_pubchem.py/ constructs the geometry of a molecule based on 3D structure info if available, otherwise the 2D structure info is used. If both 3D and 2D info are not available in PubChem, None is returned.

In this pull request I have implemented a new argument for this functiion: structure. With this argument users of the function will be able to request only 3D or only 2D information. The default behavior of the function is unchanged, i.e. if the 'structure' argument is not specified, the function will behave as before this pull request, for backwards compatibility.

The reason for the implementation of this argument is, that for a bunch of molecules, the 2D structure information has nothing to do with the actual geometric sizes of the molecule. And, when 3D is not available (e.g for H2), the default behavior will return incorrect geometry based on the non-accurate 2D structure info. It is sometimes better to return nothing, than return incorrect data. With this new option users will be able to force the function to use e.g. only 3D structure info.

PubChem support confirms, that the 2D structure info is not guaranteed to reflect actual information about molecules. Here is a snipped from their reply:

PubChem 2-D structures have coordinates generated by a layout algorithm. They do not reflect anything other than that to help provide a depiction that is a suitable distance from each other. Yes, they could be normalized in some way (e.g., 0 and 1 or -0.5 and 0.5) but they are arbitrary, in a sense.
Re_ [Pubchem-help] 2D coordinates of compounds_a.pdf

My full email conversation with PubChem support is attached!

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@hay-k
Copy link
Contributor Author

hay-k commented Nov 29, 2019

@googlebot I signed it!

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@babbush babbush requested a review from ncrubin December 2, 2019 03:16
src/openfermion/utils/_pubchem.py Outdated Show resolved Hide resolved
src/openfermion/utils/_pubchem.py Show resolved Hide resolved
src/openfermion/utils/_pubchem.py Show resolved Hide resolved
@ncrubin
Copy link
Collaborator

ncrubin commented Dec 2, 2019

@hay-k Thanks for the submission and clarifying this! This is really helpful to get cleared up and make sure people are using pubchem in a way that is reasonable. I've only requested minor changes to make.

@ncrubin ncrubin self-requested a review December 2, 2019 03:36
@hay-k
Copy link
Contributor Author

hay-k commented Dec 5, 2019

Hi @ncrubin, I have commited the changes.

Copy link
Collaborator

@ncrubin ncrubin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covering the last line with a test and then we'll be done!

pubchempy_molecule = pubchempy.get_compounds(name, 'name',
record_type='2d')
else:
raise ValueError('Incorrect value for the argument structure=%s' % structure)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this line is not covered by a test. You can construct this test by mocking a get_compounds method from pubchempy which returns something other than ['2d', '3d', None] and then checking the ValueError with pytest.raises.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought...I'll just merge this now. Coveralls doesn't seem to mind. I'll add a test to this later.

@ncrubin ncrubin merged commit 6bba343 into quantumlib:master Dec 15, 2019
philipp-q pushed a commit to philipp-q/OpenFermion that referenced this pull request Sep 2, 2020
…ubChem (quantumlib#544)

* implement the structure argument for geometry_from_pubchem

* add a test case for the argument 'structure' in geometry_from_pubchem

* add the molecule name to the error message

* explicitly check all scenarios in the if statement
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants