-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functionality to retrieve specific types of structure info from PubChem #544
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
@googlebot I signed it! |
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
@hay-k Thanks for the submission and clarifying this! This is really helpful to get cleared up and make sure people are using pubchem in a way that is reasonable. I've only requested minor changes to make. |
Hi @ncrubin, I have commited the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Covering the last line with a test and then we'll be done!
pubchempy_molecule = pubchempy.get_compounds(name, 'name', | ||
record_type='2d') | ||
else: | ||
raise ValueError('Incorrect value for the argument structure=%s' % structure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this line is not covered by a test. You can construct this test by mocking a get_compounds method from pubchempy which returns something other than ['2d', '3d', None] and then checking the ValueError with pytest.raises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought...I'll just merge this now. Coveralls doesn't seem to mind. I'll add a test to this later.
…ubChem (quantumlib#544) * implement the structure argument for geometry_from_pubchem * add a test case for the argument 'structure' in geometry_from_pubchem * add the molecule name to the error message * explicitly check all scenarios in the if statement
Currently, the function geometry_from_pubchem in utils/_pubchem.py/ constructs the geometry of a molecule based on 3D structure info if available, otherwise the 2D structure info is used. If both 3D and 2D info are not available in PubChem, None is returned.
In this pull request I have implemented a new argument for this functiion: structure. With this argument users of the function will be able to request only 3D or only 2D information. The default behavior of the function is unchanged, i.e. if the 'structure' argument is not specified, the function will behave as before this pull request, for backwards compatibility.
The reason for the implementation of this argument is, that for a bunch of molecules, the 2D structure information has nothing to do with the actual geometric sizes of the molecule. And, when 3D is not available (e.g for H2), the default behavior will return incorrect geometry based on the non-accurate 2D structure info. It is sometimes better to return nothing, than return incorrect data. With this new option users will be able to force the function to use e.g. only 3D structure info.
PubChem support confirms, that the 2D structure info is not guaranteed to reflect actual information about molecules. Here is a snipped from their reply:
My full email conversation with PubChem support is attached!