Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add files via upload #88

Merged
merged 37 commits into from
Apr 30, 2020
Merged

Conversation

b-shields
Copy link
Contributor

schema test - Ahneman Science 2018 - HTE entries for ~4000 experiments entered via csv file

schema test - Ahneman Science 2018 - HTE entries for ~4000 experiments
@skearnes
Copy link
Collaborator

Thanks Ben! I'm working on fixing the build_and_test workflow; GitHub seems to be having API issues today.

@skearnes
Copy link
Collaborator

A couple comments:

  • Please add a {{ badge }} field in a markdown cell in your notebook; this will be filled in by a link to open the notebook in colab.
  • Please run the notebook with the latest code from github; in particular, the output of validate_message has changed.
  • Please remove code that is commented out.
  • The build_and_test workflow currently runs all the cells in a notebook; it looks like this could take ~6k seconds for the one here. Is there a way to trim it down and say "this is what you'd do to get the whole thing..."?

@b-shields
Copy link
Contributor Author

No problem! I think the whole site was having issues yesterday.

I'll keep these in mind moving forward. For now:

  • Badge added.
  • I should have it set up to sync my fork now (I am pretty new to git).
  • Will do.
  • For this case I could just run a random sample of the experiments and add a comment. Thoughts?

Question: since the build_and_test workflow runs everything do I need to be careful about which packages I am using?

@skearnes
Copy link
Collaborator

  • For this case I could just run a random sample of the experiments and add a comment. Thoughts?

Yes, I think this would make sense.

Question: since the build_and_test workflow runs everything do I need to be careful about which packages I am using?

Possibly. If you're using packages that are not part of the standard library, you may need to pip install them in the notebook or try to avoid using them at all. To be clear, I'm less concerned about the fact that you're using additional packages than I am about making sure that the notebooks run properly in the tests :)

@b-shields
Copy link
Contributor Author

Yeah that makes sense.

Unfortunately the notebook also isn't running on my system now. After updating I am getting an import error:

ImportError: cannot import name '_message' from 'google.protobuf.pyext' (C:\Users\Ben\Anaconda3\envs\ord\lib\site-packages\google\protobuf\pyext\__init__.py)

Some googling suggests that this could be a python 3.7 issue. I built my conda env from the rdkit base. What env are you working in? Do you think it would be helpful to include a .yml to set up a conda env for ord?

@skearnes
Copy link
Collaborator

Yeah that makes sense.

Unfortunately the notebook also isn't running on my system now. After updating I am getting an import error:

ImportError: cannot import name '_message' from 'google.protobuf.pyext' (C:\Users\Ben\Anaconda3\envs\ord\lib\site-packages\google\protobuf\pyext\__init__.py)

Some googling suggests that this could be a python 3.7 issue. I built my conda env from the rdkit base. What env are you working in? Do you think it would be helpful to include a .yml to set up a conda env for ord?

Arg. This might depend on your install order (see here). Maybe try following the order from the actions? Take a look here where we are using python 3.7.

@skearnes
Copy link
Collaborator

I'm also going to see if I can get rid of that import, since it might cause problems for others.

@skearnes skearnes mentioned this pull request Apr 22, 2020
@b-shields
Copy link
Contributor Author

Yup pip install --upgrade --force-reinstall protobuf did the trick!

@b-shields
Copy link
Contributor Author

Ok it looks like the notebook ran properly :).

@skearnes
Copy link
Collaborator

There's a bunch of build/lib files in the PR now that I don't think you intended to add to git?

@skearnes
Copy link
Collaborator

Quick tip: you can include multiple files in git commits, and you can also make multiple commits before pushing to GitHub.

@b-shields
Copy link
Contributor Author

I see what happened here. I pushed from ord-shema which included the build/lib stuff and mistakenly thought that it would only pull the example specified from the initial request. Hopefully it is all squared away now.

Thanks for the tip! This has ended up being a much needed git crash course 👍 .

Copy link
Collaborator

@skearnes skearnes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the *checkpoint file from the PR.

@connorcoley
Copy link
Collaborator

I've been investigating the slow validation step, and it seems like there might be some bugs related to how you're doing the stock solutions. The actual reason it's taking so long is because you're ending up with a lot of compounds where the only identifier is a name, so the validation script uses the PubChem resolver to get a SMILES.

In your final timing loop, I interrupted and inspected the reaction object after the catalyst.mix:

reaction = reaction_pb2.Reaction()
reaction.identifiers.add(value=r'Buchwald-Hartwig Amination', type='NAME')

catalyst = stock_solution(reaction, r'Pd precatalyst in DMSO')
catalyst.add_solute('CATALYST', lig_n, SMILES=lig_s)
catalyst.add_solvent(r'DMSO', SMILES=r'O=S(C)C', volume_liters=200e-9)
catalyst.mix(concentration_molar=0.05)
print(reaction)

yields

identifiers {
  type: NAME
  value: "Buchwald-Hartwig Amination"
}
inputs {
  key: "Pd precatalyst in DMSO"
  value {
    components {
      identifiers {
        type: NAME
        value: "X-Phos"
      }
      identifiers {
        type: SMILES
        value: "CC(C)C1=CC(C(C)C)=CC(C(C)C)=C1C2=C(P(C3CCCCC3)C4CCCCC4)C=CC=C2"
      }
      identifiers {
        type: SMILES
        value: "O=S(C)C"
      }
      moles {
        units: MOLES
      }
      reaction_role: CATALYST
      preparation {
        type: NONE
      }
    }
    components {
      identifiers {
        type: NAME
        value: "DMSO"
      }
      volume {
        units: LITER
      }
      reaction_role: SOLVENT
      preparation {
        type: NONE
      }
    }
  }
}

@connorcoley
Copy link
Collaborator

connorcoley commented Apr 22, 2020

Just to follow-up, I think there are two issues:

  1. In your add_solvent method, your line to add a SMILES identifier is assigning the value to self.solute instead of self.solvent
  2. The moles and volume are all being cast as integers, which sets them to zero. That's why they don't appear when you print out the reaction message. Since the amounts are so small, you'll need to scale them up so they can be printed as a float string for the unit resolver, e.g.,
def mix(self, concentration_molar=0):
        """Mix function resolves moles and volume from availible information (concentration, moles, volume)"""
        
        self.concentration = concentration_molar
        
        # Resolve concentration
        if self.moles > 0 and self.volume > 0:
            self.solute.moles.CopyFrom(unit_resolver.resolve(f'{self.moles*(10**6):16f} umol'))
            self.solvent.volume.CopyFrom(unit_resolver.resolve(f'{self.volume*(10**6):16f} uL'))
        elif self.concentration > 0 and self.volume > 0:
            self.moles = self.concentration * self.volume
            self.solute.moles.CopyFrom(unit_resolver.resolve(f'{self.moles*(10**6):16f} umol'))
            self.solvent.volume.CopyFrom(unit_resolver.resolve(f'{self.volume*(10**6):16f} uL'))

Fixing the first issue makes validation very quick

@b-shields
Copy link
Contributor Author

I've been investigating the slow validation step, and it seems like there might be some bugs related to how you're doing the stock solutions. The actual reason it's taking so long is because you're ending up with a lot of compounds where the only identifier is a name, so the validation script uses the PubChem resolver to get a SMILES.

I see. I did notice these lines in the output:

identifiers { type: SMILES details: "NAME resolved by PubChem" value: "CS(=O)C" }

There it is in the add_solvent function.....

if SMILES != None: self.solute.identifiers.add(value=SMILES, type='SMILES')

solute not solvent.

So that's how long it takes to resolve DMSO ~4000 times.

@b-shields
Copy link
Contributor Author

Sorry I didn't see your comment above. Gotcha.

@skearnes
Copy link
Collaborator

OK, I think this is ready to go in? I'll let you merge in case you want to change anything else before merging. Thanks!

@skearnes skearnes mentioned this pull request Apr 27, 2020
@skearnes
Copy link
Collaborator

Hey Ben, can you check that actions are enabled on your repository so the colab badge updater will run? They should be on by default, but I don't see actions listed on the "Actions" tab for your repo.

https://help.github.com/en/actions/getting-started-with-github-actions/about-github-actions#disabling-or-limiting-github-actions-for-your-repository-or-organization

@b-shields
Copy link
Contributor Author

Yup, workflows were disabled. Should be all good now!

@b-shields
Copy link
Contributor Author

OK, I think this is ready to go in? I'll let you merge in case you want to change anything else before merging. Thanks!

Everything looks good to me. I am seeing that it can be automatically merged but no option to merge.

@skearnes
Copy link
Collaborator

Thanks; I'm going to try to trigger the action to get the badges in the notebook, and then I'll merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants