Add files via upload #88

b-shields · 2020-04-21T15:30:19Z

schema test - Ahneman Science 2018 - HTE entries for ~4000 experiments entered via csv file

skearnes · 2020-04-21T21:21:47Z

Thanks Ben! I'm working on fixing the build_and_test workflow; GitHub seems to be having API issues today.

skearnes · 2020-04-21T21:29:03Z

A couple comments:

1. Please add a {{ badge }} field in a markdown cell in your notebook; this will be filled in by a link to open the notebook in colab.
2. Please run the notebook with the latest code from github; in particular, the output of validate_message has changed.
3. Please remove code that is commented out.
4. The build_and_test workflow currently runs all the cells in a notebook; it looks like this could take ~6k seconds for the one here. Is there a way to trim it down and say "this is what you'd do to get the whole thing..."?

b-shields · 2020-04-22T16:34:44Z

No problem! I think the whole site was having issues yesterday.

I'll keep these in mind moving forward. For now:

1. Badge added.
2. I should have it set up to sync my fork now (I am pretty new to git).
3. Will do.
4. For this case I could just run a random sample of the experiments and add a comment. Thoughts?

Question: since the build_and_test workflow runs everything do I need to be careful about which packages I am using?

skearnes · 2020-04-22T16:55:04Z

For this case I could just run a random sample of the experiments and add a comment. Thoughts?

Yes, I think this would make sense.
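A minimal sketch of the random-sample idea, assuming the experiments sit in a CSV with one row per experiment. The function name, the sample size, the fixed seed, and the demo column names are illustrative assumptions, not taken from the notebook.

```python
import csv
import io
import random


def sample_rows(csv_text, n, seed=0):
    """Return n randomly chosen rows (as dicts) from CSV text, reproducibly."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # A fixed seed means every run of the test workflow sees the same subset.
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


# Small stand-in for the real ~4000-row file (made-up values).
demo = "experiment_id,yield\n" + "\n".join(f"{i},{i % 100}" for i in range(4000))
subset = sample_rows(demo, n=10)
print(len(subset))
```

A comment in the notebook can then say that removing the sampling step processes the full set.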

Question: since the build_and_test workflow runs everything do I need to be careful about which packages I am using?

Possibly. If you're using packages that are not part of the standard library, you may need to pip install them in the notebook or try to avoid using them at all. To be clear, I'm less concerned about the fact that you're using additional packages than I am about making sure that the notebooks run properly in the tests :)
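One possible shape for "pip install them in the notebook" is an import that falls back to pip only when the package is missing. This is a sketch: the helper names `ensure_package` and `_pip_install` are hypothetical, and the `installer` argument exists only so the behaviour can be exercised without touching the network.

```python
import importlib
import subprocess
import sys


def _pip_install(name):
    """Install a package with the pip that belongs to the running interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", name])


def ensure_package(module_name, pip_name=None, installer=_pip_install):
    """Import module_name, installing it first if the import fails."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        installer(pip_name or module_name)
        importlib.invalidate_caches()  # make the fresh install visible
        return importlib.import_module(module_name)


# Modules that are already importable are returned without calling pip.
json_module = ensure_package("json")
print(json_module.__name__)
```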

b-shields · 2020-04-22T18:24:00Z

Yeah that makes sense.

Unfortunately the notebook also isn't running on my system now. After updating I am getting an import error:

ImportError: cannot import name '_message' from 'google.protobuf.pyext' (C:\Users\Ben\Anaconda3\envs\ord\lib\site-packages\google\protobuf\pyext\__init__.py)

Some googling suggests that this could be a python 3.7 issue. I built my conda env from the rdkit base. What env are you working in? Do you think it would be helpful to include a .yml to set up a conda env for ord?

skearnes · 2020-04-22T18:30:07Z

Arg. This might depend on your install order (see here). Maybe try following the order from the actions? Take a look here where we are using python 3.7.

skearnes · 2020-04-22T18:36:55Z

I'm also going to see if I can get rid of that import, since it might cause problems for others.

b-shields · 2020-04-22T18:47:40Z

Yup pip install --upgrade --force-reinstall protobuf did the trick!

b-shields · 2020-04-22T18:54:04Z

Ok it looks like the notebook ran properly :).

skearnes · 2020-04-22T20:00:10Z

There's a bunch of build/lib files in the PR now that I don't think you intended to add to git?

skearnes · 2020-04-22T20:36:01Z

Quick tip: you can include multiple files in git commits, and you can also make multiple commits before pushing to GitHub.

b-shields · 2020-04-22T20:55:34Z

I see what happened here. I pushed from ord-schema, which included the build/lib stuff, and mistakenly thought that it would only pull the example specified from the initial request. Hopefully it is all squared away now.

Thanks for the tip! This has ended up being a much needed git crash course 👍 .

skearnes

Please remove the *checkpoint file from the PR.

connorcoley · 2020-04-22T21:58:53Z

I've been investigating the slow validation step, and it seems like there might be some bugs related to how you're doing the stock solutions. The actual reason it's taking so long is because you're ending up with a lot of compounds where the only identifier is a name, so the validation script uses the PubChem resolver to get a SMILES.

In your final timing loop, I interrupted and inspected the reaction object after the catalyst.mix:

reaction = reaction_pb2.Reaction()
reaction.identifiers.add(value=r'Buchwald-Hartwig Amination', type='NAME')

catalyst = stock_solution(reaction, r'Pd precatalyst in DMSO')
catalyst.add_solute('CATALYST', lig_n, SMILES=lig_s)
catalyst.add_solvent(r'DMSO', SMILES=r'O=S(C)C', volume_liters=200e-9)
catalyst.mix(concentration_molar=0.05)
print(reaction)

yields

identifiers {
  type: NAME
  value: "Buchwald-Hartwig Amination"
}
inputs {
  key: "Pd precatalyst in DMSO"
  value {
    components {
      identifiers {
        type: NAME
        value: "X-Phos"
      }
      identifiers {
        type: SMILES
        value: "CC(C)C1=CC(C(C)C)=CC(C(C)C)=C1C2=C(P(C3CCCCC3)C4CCCCC4)C=CC=C2"
      }
      identifiers {
        type: SMILES
        value: "O=S(C)C"
      }
      moles {
        units: MOLES
      }
      reaction_role: CATALYST
      preparation {
        type: NONE
      }
    }
    components {
      identifiers {
        type: NAME
        value: "DMSO"
      }
      volume {
        units: LITER
      }
      reaction_role: SOLVENT
      preparation {
        type: NONE
      }
    }
  }
}
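In the output above, the DMSO component carries only a NAME identifier, which is the case that sends validation to the PubChem resolver. A check along the following lines could flag such components before validation runs. It is a sketch over plain dicts that mirror the printed message, not the real protobuf API, and the function name is hypothetical.

```python
def name_only_components(components):
    """Return the components whose identifiers are all of type NAME."""
    flagged = []
    for component in components:
        types = {identifier["type"] for identifier in component["identifiers"]}
        if types == {"NAME"}:
            flagged.append(component)
    return flagged


# Plain-dict mirror of the two components printed above.
components = [
    {
        "reaction_role": "CATALYST",
        "identifiers": [
            {"type": "NAME", "value": "X-Phos"},
            {"type": "SMILES",
             "value": "CC(C)C1=CC(C(C)C)=CC(C(C)C)=C1C2=C(P(C3CCCCC3)C4CCCCC4)C=CC=C2"},
            {"type": "SMILES", "value": "O=S(C)C"},
        ],
    },
    {
        "reaction_role": "SOLVENT",
        "identifiers": [{"type": "NAME", "value": "DMSO"}],
    },
]

flagged = name_only_components(components)
print([c["reaction_role"] for c in flagged])
```

Here only the solvent is flagged, and its SMILES has ended up on the catalyst component instead.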

connorcoley · 2020-04-22T22:11:37Z

Just to follow-up, I think there are two issues:

1. In your add_solvent method, your line to add a SMILES identifier is assigning the value to self.solute instead of self.solvent
2. The moles and volume are all being cast as integers, which sets them to zero. That's why they don't appear when you print out the reaction message. Since the amounts are so small, you'll need to scale them up so they can be printed as a float string for the unit resolver, e.g.,

def mix(self, concentration_molar=0):
    """Mix function resolves moles and volume from available information (concentration, moles, volume)"""

    self.concentration = concentration_molar

    # Resolve concentration. Amounts are scaled to micro-units (x 10**6) so
    # they can be written as float strings for the unit resolver.
    if self.moles > 0 and self.volume > 0:
        self.solute.moles.CopyFrom(unit_resolver.resolve(f'{self.moles*(10**6):16f} umol'))
        self.solvent.volume.CopyFrom(unit_resolver.resolve(f'{self.volume*(10**6):16f} uL'))
    elif self.concentration > 0 and self.volume > 0:
        # Moles are not given, so derive them from concentration and volume.
        self.moles = self.concentration * self.volume
        self.solute.moles.CopyFrom(unit_resolver.resolve(f'{self.moles*(10**6):16f} umol'))
        self.solvent.volume.CopyFrom(unit_resolver.resolve(f'{self.volume*(10**6):16f} uL'))

Fixing the first issue makes validation very quick.
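The second issue can be reproduced without ord_schema at all. The sketch below uses the 200e-9 L volume and 0.05 M concentration from the snippet earlier in this thread. How the notebook did the cast is an assumption here (a plain `int()`), and the `.6f` precision is a choice made for this example rather than the format used above.

```python
volume_liters = 200e-9
concentration_molar = 0.05
moles = concentration_molar * volume_liters

# Casting to int truncates toward zero, so a sub-unit amount becomes 0.
as_int = f"{int(volume_liters)} L"

# Scaling to micro-units first keeps the digits in a plain decimal string.
volume_text = f"{volume_liters * 1e6:.6f} uL"
moles_text = f"{moles * 1e6:.6f} umol"

print(as_int)       # 0 L
print(volume_text)  # 0.200000 uL
print(moles_text)   # 0.010000 umol
```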

b-shields · 2020-04-22T22:16:34Z

I've been investigating the slow validation step, and it seems like there might be some bugs related to how you're doing the stock solutions. The actual reason it's taking so long is because you're ending up with a lot of compounds where the only identifier is a name, so the validation script uses the PubChem resolver to get a SMILES.

I see. I did notice these lines in the output:

identifiers { type: SMILES details: "NAME resolved by PubChem" value: "CS(=O)C" }

There it is in the add_solvent function.....

if SMILES != None: self.solute.identifiers.add(value=SMILES, type='SMILES')

solute not solvent.

So that's how long it takes to resolve DMSO ~4000 times.

b-shields · 2020-04-22T22:17:53Z

Sorry I didn't see your comment above. Gotcha.

skearnes · 2020-04-27T15:00:31Z

OK, I think this is ready to go in? I'll let you merge in case you want to change anything else before merging. Thanks!

skearnes · 2020-04-29T18:24:09Z

Hey Ben, can you check that actions are enabled on your repository so the colab badge updater will run? They should be on by default, but I don't see actions listed on the "Actions" tab for your repo.

https://help.github.com/en/actions/getting-started-with-github-actions/about-github-actions#disabling-or-limiting-github-actions-for-your-repository-or-organization

b-shields · 2020-04-29T20:13:32Z

Yup, workflows were disabled. Should be all good now!

b-shields · 2020-04-29T20:40:52Z

OK, I think this is ready to go in? I'll let you merge in case you want to change anything else before merging. Thanks!

Everything looks good to me. I am seeing that it can be automatically merged but no option to merge.

skearnes · 2020-04-29T23:48:35Z

Thanks; I'm going to try to trigger the action to get the badges in the notebook, and then I'll merge.

Add files via upload

66a272c

schema test - Ahneman Science 2018 - HTE entries for ~4000 experiments

Added colab badge, removed commented code, and ran a subset of experiments for build_and_test

5576acf

Rebuilt ord env and ran notebook

597c8fa

skearnes mentioned this pull request Apr 22, 2020

Remove _message imports #99

Merged

b-shields added 16 commits April 22, 2020 16:26

Delete message_helpers.py

beb147a

Delete __init__.py

5e3dc90

Delete message_helpers_test.py

e815477

Delete bq_schema.py

7ca72d0

Delete __init__.py

843b64f

Delete bq_schema_test.py

9aaeeb5

Delete json_schema.py

699a87a

Delete json_schema_test.py

75d5388

Delete proto_to_json.py

cb32cbd

Delete proto_to_json_test.py

4c365af

Delete reaction_pb2_test.py

0876c80

Delete units.py

ace362f

Delete units_test.py

4293d74

Delete validate_reactions.py

358a88c

Delete validate_reactions_test.py

5d67825

Delete validations.py

d7dd69f

Delete top_level.txt

92ef104

b-shields added 2 commits April 22, 2020 16:42

Delete ord_schema-0.1-py3.7.egg

c6456e8

Merge branch 'master' into master

f3bccd1

skearnes requested changes Apr 22, 2020

Delete example_Ahenman-checkpoint.ipynb

88022ce

b-shields and others added 3 commits April 23, 2020 12:20

Added changes suggested in pull/88#issuecomment-618067295 and removed comments

e7eabd3

Merge branch 'master' into master

be23bd5

Merge branch 'master' into master

d083118

skearnes approved these changes Apr 27, 2020

skearnes mentioned this pull request Apr 27, 2020

Update colab_badges.yml #114

Merged

skearnes added 4 commits April 27, 2020 10:41

Merge branch 'master' into master

fd4bfa1

Merge branch 'master' into master

6cd93c6

Merge branch 'master' into master

eb4f42c

Merge branch 'master' into master

17a52ac

skearnes and others added 3 commits April 29, 2020 17:01

install ord_schema if needed

6004153

Add/Update Colab badges

18bfe86

trigger workflow

e8c2335

skearnes merged commit 11fc144 into open-reaction-database:master Apr 30, 2020

skearnes mentioned this pull request Oct 3, 2020

Add a script for generating a data table with names, SMILES, yields, etc. doylelab/rxnpredict#3

Merged
