Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

open an sdf file #154

Open
maartensandbox opened this issue Oct 23, 2023 · 10 comments
Open

open an sdf file #154

maartensandbox opened this issue Oct 23, 2023 · 10 comments

Comments

@maartensandbox
Copy link

It is not clear to me how I can generally set up a working system. As a simple example, it is not possible to set up a system starting from an sdf file:

CT1000292221


  3  2  0  0  0               999 V2000
    0.0021   -0.0041    0.0020 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0110    0.9628    0.0073 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.8669    1.3681    0.0011 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
M  END
$$$$

System("water.sdf",ff)

straight up fails.

Is there somewhere an example where I start with a practical system (say a compound from https://www.rcsb.org ) and set up an MD simulation?

@jgreener64
Copy link
Collaborator

We use Chemfiles for the file reading and residues are not currently supported there for SDF (https://chemfiles.org/chemfiles/latest/formats.html#list-of-supported-formats), so the setup fails.

If you want to read in data from the PDB you can use PDB format, see the example at https://juliamolsim.github.io/Molly.jl/stable/docs/#Simulating-a-protein, though currently atom names must exactly match the residue templates. This is a topic of future work.

@maartensandbox
Copy link
Author

And with a PDB file, how can I make the atom names match exactly while also indicating that multiple hydrogen atoms are connected to the same molecule? For example, if we follow the last example from https://www.biostat.jhsph.edu/~iruczins/teaching/260.655/links/pdbformat.pdf we see that certain hydrogen atoms will be named as "1HG1". I take it that's not supported, and I should manually place the atoms and fix up the topology? Is there yet another format that is better supported? Do hydrogen atoms follow a different convention in molly.jl?

@jgreener64
Copy link
Collaborator

The residue template in the force field xml file defines the atom names (which currently must be matched exactly) and the connectivity between them. For example for alanine in ff99SBildn.xml:

    <Residue name="ALA">
      <Atom name="N" type="N" charge="-0.4157"/>
      <Atom name="H" type="H" charge="0.2719"/>
      <Atom name="CA" type="CT" charge="0.0337"/>
      <Atom name="HA" type="H1" charge="0.0823"/>
      <Atom name="CB" type="CT" charge="-0.1825"/>
      <Atom name="HB1" type="HC" charge="0.0603"/>
      <Atom name="HB2" type="HC" charge="0.0603"/>
      <Atom name="HB3" type="HC" charge="0.0603"/>
      <Atom name="C" type="C" charge="0.5973"/>
      <Atom name="O" type="O" charge="-0.5679"/>
      <Bond atomName1="N" atomName2="H"/>
      <Bond atomName1="N" atomName2="CA"/>
      <Bond atomName1="CA" atomName2="HA"/>
      <Bond atomName1="CA" atomName2="CB"/>
      <Bond atomName1="CA" atomName2="C"/>
      <Bond atomName1="CB" atomName2="HB1"/>
      <Bond atomName1="CB" atomName2="HB2"/>
      <Bond atomName1="CB" atomName2="HB3"/>
      <Bond atomName1="C" atomName2="O"/>
      <ExternalBond atomName="N"/>
      <ExternalBond atomName="C"/>
    </Residue>

The PDB file would have to have ALA as the residue name and every atom name appearing exactly once for a residue to match this template. Future work will allow flexibility in residue names and atom names by searching for the best template.

At the minute you are probably best off removing all hydrogens and adding them back with OpenMM, which will give atom names consistent with OpenMM XML force field files. All residues will need a template available, though that is the case with most software.

@maartensandbox
Copy link
Author

I think I correctly implemented alanine for use with ff99

ATOM      1  H1  ACE     1       2.000   1.000  -0.000
ATOM      2  CH3 ACE     1       2.000   2.090   0.000
ATOM      3  H2  ACE     1       1.486   2.454   0.890
ATOM      4  H3  ACE     1       1.486   2.454  -0.890
ATOM      5  C   ACE     1       3.427   2.641  -0.000
ATOM      6  O   ACE     1       4.391   1.877  -0.000
ATOM      7  N   ALA     2       3.555   3.970  -0.000
ATOM      8  H   ALA     2       2.733   4.556  -0.000
ATOM      9  CA  ALA     2       4.853   4.614  -0.000
ATOM     10  HA  ALA     2       5.408   4.316   0.890
ATOM     11  CB  ALA     2       5.661   4.221  -1.232
ATOM     12 HB1  ALA     2       5.123   4.521  -2.131
ATOM     13 HB2  ALA     2       6.630   4.719  -1.206
ATOM     14 HB3  ALA     2       5.809   3.141  -1.241
ATOM     15  C   ALA     2       4.713   6.129   0.000
ATOM     16  O   ALA     2       3.601   6.653   0.000
ATOM     17  N   NME     3       5.846   6.835   0.000
ATOM     18  H   NME     3       6.737   6.359  -0.000
ATOM     19  CH3 NME     3       5.846   8.284   0.000
ATOM     20 HH31 NME     3       4.819   8.648   0.000
ATOM     21 HH32 NME     3       6.360   8.648   0.890
ATOM     22 HH33 NME     3       6.360   8.648  -0.890
TER
END

However, when I create the system and look at the interactions:
sys = System("alanine.pdb",ff,boundary = CubicBoundary(4.0u"nm"), loggers=(coords=CoordinateLogger(50),),rename_terminal_res = false)

then I only find sys.specific_inter_lists to contain interactions between atoms in the ALA group. Shouldn't there also be a harmonicbond force between for example the C and O atoms in the ACE group?

@jgreener64
Copy link
Collaborator

Yes there should, I think this a problem with Chemfiles only reading bonds for non-standard residues when there is a CONECT record in the PDB file. I can add a mention to the docs, but in general non-standard residues (including ACE/NME terminal caps) aren't tested yet.

See https://github.com/noeblassel/SINEQSummerSchool2023/blob/main/notebooks/dipeptide_nowater.pdb for an alanine dipeptide that reads in okay:

using Molly
ff_dir = joinpath(dirname(pathof(Molly)), "..", "data", "force_fields")
ff = MolecularForceField(joinpath.(ff_dir, ["ff99SBildn.xml", "tip3p_standard.xml"])...)
sys = System("dipeptide_nowater.pdb", ff; rename_terminal_res=false)

@maartensandbox
Copy link
Author

One more issue:

using Molly
ff = MolecularForceField("ff14SB.xml");

sys = System("big_alanine.pdb",ff,boundary = CubicBoundary(1000.0u"nm"),rename_terminal_res = false)
total_energy(sys,find_neighbors(sys))

prints 1128.45113918646 kJ mol⁻¹

the python equivalent

import openmm
from openmm import app, unit
forcefield = app.ForceField('amber/ff14SB.xml')
f = app.PDBFile('big_alanine.pdb')
system = forcefield.createSystem(f.getTopology(), nonbondedMethod=app.NoCutoff)
integrator = openmm.LangevinMiddleIntegrator(310 * unit.kelvin,1.0 / unit.picosecond,1 * unit.femtosecond)
sim = app.Simulation(f.getTopology(), system, integrator)
sim.context.setPositions(f.getPositions())
state = sim.context.getState(getEnergy=True)
state.getPotentialEnergy()

prints Quantity(value=547.6336059570312, unit=kilojoule/mole)

Other quantities are also incorrect, so presumably my file is still being read in incorrectly? Or was a different convention used between openmm and molly?

big_alanine.pdb.txt

@jgreener64
Copy link
Collaborator

It seems okay, the equivalent OpenMM call would be:

import openmm
from openmm import app, unit

forcefield = app.ForceField('amber/ff14SB.xml')
f = app.PDBFile('big_alanine.pdb')

system = forcefield.createSystem(
    f.getTopology(),
    nonbondedMethod=app.CutoffPeriodic,
    nonbondedCutoff=1*unit.nanometer,
    constraints=None,
    rigidWater=False,
    switchDistance=None,
    useDispersionCorrection=False,
)

integrator = openmm.LangevinMiddleIntegrator(310 * unit.kelvin,1.0 / unit.picosecond,1 * unit.femtosecond)
sim = app.Simulation(f.getTopology(), system, integrator)
sim.context.setPositions(f.getPositions())
state = sim.context.getState(getEnergy=True)
state.getPotentialEnergy()
Quantity(value=1128.4508056640625, unit=kilojoule/mole)

Which is much better, there is still a 3E-4 kJ/mol discrepancy.

Note you need a line like

CRYST1  100.000  100.000  100.000  90.00  90.00  90.00 P 1           1

in the PDB file for CutoffPeriodic to work.

@maartensandbox
Copy link
Author

that clarifies some stuff! I'm really struggling with basic things but from testing it looks like molly is at least a factor 10 faster than openmm (for my admittedly weird use case).

@jgreener64
Copy link
Collaborator

Cool, I'd be interested to hear roughly what you are doing if you are able to share, we might be able to add more features and docs to help.

Btw make sure you are on Molly.jl v0.18.2 or later as that version had a significant performance improvement for periodic boundary conditions.

@maartensandbox
Copy link
Author

The plan is to wrap it up in a paper (though the draft is progressing slowly), so molly will definitely be cited! We really only need a way to evaluate energies and manipulate positions. Molly is nice because unlike pythonic packages that bind to optimized C code, I can directly profile and manipulate the system structure.

At the moment I'm only interested in all-to-all, but there are two other projects I want to try molly on, one which requires mixing periodic boundary conditions in one direction with open boundary conditions in another, and another where I'll be using conventional periodic boundary conditions. Thanks a lot for the hard work that went into the package! It was bit rough to get started simulating my first molecule, but after that it's really only been smooth sailing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants