Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

N-terminal residue added with wrong chirality and/or a cis-peptide bond #145

Open
rafwiewiora opened this issue Jan 10, 2017 · 29 comments
Open

Comments

@rafwiewiora
Copy link

rafwiewiora commented Jan 10, 2017

We've been on a chirality and cis-peptide mistake hunting recently, after we discovered that Ensembler was causing these in some simulations. I also just looked through some of my manually set up simulations and discovered that in some cases pdbfixer had modeled in the N-terminal residue with the wrong chirality and with a cis-peptide bond to the next residue.

Here's an example (turns out I had pdbfixer 1.2 installed when I'd run this):
(pdbfixer 1.2 py27_1 omnia)

Starting structure: 2BQZ

Code:

import pdbfixer
from simtk.openmm.app import PDBFile

fixer_SET8 = pdbfixer.PDBFixer('pdb2bqz.ent')
fixer_SET8.removeChains([1,2,3,4,5,6,7,8,9])
fixer_SET8.findMissingResidues()
fixer_SET8.missingResidues = {(0,0): ['SER']}
fixer_SET8.findMissingAtoms()
fixer_SET8.addMissingAtoms()
PDBFile.writeFile(fixer_SET8.topology, fixer_SET8.positions, open('SET8_noH.pdb', 'w'))

Output: (SER0 as spheres, ARG1 as sticks, note both the wrong chirality of SER0 and cis-peptide bond SER0 - ARG1)

2bqz_pdbfixer_wrong_chir_cispept

Reproducing on the newest pdbfixer:
(pdbfixer 1.3.1 py35_1 omnia)

I re-run the above code 10 times. We have:

  • all good: 2/10
  • only chirality wrong: 3/10
  • only cis-peptide: 3/10
  • both chirality wrong and cis-peptide: 2/10

We check for these problems using the VMD plugins: http://www.ks.uiuc.edu/Research/vmd/plugins/chirality/ http://www.ks.uiuc.edu/Research/vmd/plugins/cispeptide/

@maxentile has written code to do this in MDTraj, I just need to write a few tests and we will contribute.

I wonder if modeller.addHydrogens() would benefit from at least adding chirality checks for the residues we protonate or even a chirality repair code? Currently the wrong chirality just goes through no problem (as does a cis-peptide):

2bqz_pdbfixer_wrongchir_cispept_proton

@jchodera will be interested

@peastman
Copy link
Member

I tried running your script on that PDB structure, and didn't get anything nearly as sensible as you described. Just a mangled residue sitting right on top of another residue. From your description, I thought we might be doing something subtly wrong. But there's nothing subtle about this; it's just completely failing. So that's a relief. :)

To understand what's going on, you need to know a bit about how PDBFixer adds missing residues, and how not at all sophisticated it is about it. It just has a template for every amino acid, a PDB structure giving one plausible conformation for it. And it drops copies of those templates down in a line sticking out from the protein. They don't even connect up to each other or anything. Then it does an energy minimization with a soft core force field, and hopes that will somehow manage to transform this into something physically plausible.

Amazingly, that actually works a lot of the time. As long as the starting conformation isn't too horrible, the minimization will fix all the problems and give you a reasonable structure. In this case, "horrible" doesn't mean "unrealistic", because it's almost always unrealistic. It just means there aren't any huge barriers blocking it from getting to a realistic configuration. If the residues are reasonably spaced out through empty space, it usually works.

So it has to figure out where to put the new residues to give the minimizer the best shot. It does a not-at-all-sophisticated analysis of the density of atoms in the local neighborhood, and tries to figure out the best direction to use.

So why's it failing in this case? The most terminal residue present in the file is the ARG, whose side chain points directly outward from the main body of the protein. Now you ask it to add one more residue to the chain. So it looks around and concludes, "The best direction for me to add new residues in is away from the main body of the protein." Unfortunately, in this case, that puts it right on top of the ARG side chain. The odds of the minimizer being able to recover from that are unfortunately fairly low.

@jchodera
Copy link
Member

It seems like the most sensible thing to do in this case is to raise an exception and fail if we can't recover from it. Otherwise, we run the risk of producing a file that is undetected and people simulate happily away without knowing something is wrong. While you ended up with something that was a mangled mess, on @rafwiewiora's system, we ended up with something subtly wrong, so the risk exists.

What about adding some basic checks afterwards, like

  • Are any of the atoms we added still clashing badly even after minimization?
  • Did any of the residues we added end up with the wrong chirality? (Harder to check for, but if you're willing to add mdtraj as a dependency, we can add chirality checking code to that.)

@MehtapIsik
Copy link

I am experiencing the same problem with PDBFixer. It is introducing cis-peptide bonds and residues with wrong chirality when it adds missing terminal residues.

I am using pdbfixer 1.4 py27_0 http://conda.binstar.org/omnia

The PDB structure I am working with is 1AO6. 4 residues from N-terminus and 3 residues from C-terminus are missing in the original structure.

I am using PDBFixer's Python API:

from simtk.openmm.app import PDBFile
import pdbfixer

def write_file(filename, contents):
    with open(filename, 'w') as outfile:
        outfile.write(contents)

def get_pdb_biological_unit(pdb_id):
    try:
        # python 2
        from urllib2 import urlopen
    except:
        # python 3
        from urllib.request import urlopen
    print('Retrieving PDB for %s...' % pdb_id)
    fullurl = 'http://www.rcsb.org/pdb/files/'+pdb_id+'.pdb1'
    f = urlopen(fullurl)
    result = f.read()
    result = result.decode('unicode_escape')
    result = result.replace('XXXX', pdb_id)
    return result

# Retrieve PDB from RCSB:
pdb_id = "1ao6"
path = "./"
pdb = get_pdb_biological_unit(pdb_id)
write_file('%s%s.pdb' %(path,pdb_id), pdb)
pdb_file = '%s%s.pdb' %(path,pdb_id)

# PDBFixer and write fixed file
fixer = pdbfixer.PDBFixer(pdb_file)
fixer.findMissingResidues()
print('Missing residues found by PDBfixer:', fixer.missingResidues)
fixer.findMissingAtoms() 
print('Missing terminal atoms found by PDBfixer:', fixer.missingTerminals)
fixer.addMissingAtoms()
fixer.removeHeterogens(False)
fixer.addMissingHydrogens(7.4)

filename = '%s_fixed.pdb'%(pdb_id)
PDBFile.writeFile(fixer.topology, fixer.positions, open(filename, 'w'))

When I check the resulting file with VMD's chirality and cispeptide plug-ins chirality error for residues 1, 2, 3, 4 and 583 are reported. Cis peptides were found between residues 1-2, 2-3, 3-4, 583-584. These residues are all added by PDBFixer.

@peastman
Copy link
Member

See the discussion above. Most likely it isn't adding them with the wrong chirality—it's totally failing to find a vaguely plausible conformation, and if you look at the structure you'll find mangled residues sitting on top of each other. The algorithm used by PDBFixer doesn't tend to fail in subtle ways. If it can't produce a reasonable conformation, it produces something severely broken that would blow up instantly if you tried to simulate it.

@jchodera
Copy link
Member

If it can't produce a reasonable conformation, it produces something severely broken that would blow up instantly if you tried to simulate it.

@rafwiewiora @MehtapIsik : Were you able to run a simulation with any of the chirality-flipped outputs?

@MehtapIsik
Copy link

I haven't run a simulation with these wrong-chirality outputs.

@peastman Do you think we should refrain from using PDBFixer when we get such problems when modeling missing terminal residues? I wonder if you have any suggestions to overcome this problem?

@peastman
Copy link
Member

The method used by PDBFixer to add missing residues isn't very sophisticated. It still manages to produce something reasonable in the vast majority of cases, but when you run into a case it can't handle, you need to try a different program that uses a more sophisticated algorithm.

@rafwiewiora
Copy link
Author

rafwiewiora commented Apr 20, 2017

If it can't produce a reasonable conformation, it produces something severely broken that would blow up instantly if you tried to simulate it.

@rafwiewiora @MehtapIsik : Were you able to run a simulation with any of the chirality-flipped outputs?

We were able to run simulations in at least 3 different systems, and because we didn't know that problem was there this had affected multiple milliseconds of data in fact. I think > 50% of my data where I had used PDBFixer in the setup is affected by this.

As for the blowing up - one of those systems (set up by Kyle) run ok with a 'normal' minimization and equilibration script, looking back at his scripts. Two others however needed a more involved minimization and equilibration scheme - e.g. a reaction-field minimization before switching to PME, or an annealing from 10K at very small time step. The confounding factor is we are always expecting big problems with minimization/equilibration because we have regions of the protein modeled in, high energy homology models, ligands modeled in or the multisite ion models. So you simply don't stop when your simulation blows up initially and you don't catch these problems.

I think this is very serious - I've been sitting on ms of broken data for various systems, now Mehtap discovers the same thing happens for her. I don't think people are even expecting this kind of problem from PDBFixer and so it might not be obvious that you have a broken model if it blows up - so you might just try to equilibrate it and go ahead. I think a chirality check in PDBFixer would be good - so that it doesn't even produce an output PDB if it discovers a problem.

@peastman
Copy link
Member

Please provide files that reproduce this problem then. The only one that's been posted so far is one that totally fails.

@rafwiewiora
Copy link
Author

My first post in here is one of those 3 cases I mentioned. Would you like the equilibration scripts too? Also just to clarify, you said in your first reply:

I tried running your script on that PDB structure, and didn't get anything nearly as sensible as you described. Just a mangled residue sitting right on top of another residue.

The pictures I posted up there only look nice because I made the residue that was modeled in small spheres to see anything, I would also describe it as a mangled mess on first look - so I'm not sure I necessarily had anything better looking than you.

For the other 2 - will provide shortly after I do an update of my stuff in my repo.

@jchodera
Copy link
Member

Tagging @rafwiewiora to provide files to reproduce this issue!

@zhang-ivy
Copy link
Contributor

@rafwiewiora and I have been trying to clean the SARS spike protein (so that we can homology model it to the 2019-nCoV spike protein) to simulate on folding at home, but we have found similar issues as mentioned above when capping with pdbfixer.

Summary of issues:

  • ACE has a very long N-C bond

  • Wrong chirality of two residues close to NME: C378 and N381

Code:

from simtk.openmm.app import PDBFile
import pdbfixer
fixer = pdbfixer.PDBFixer(filename="2ajf_rbd.pdb")
fixer.findMissingResidues()
# Add ACE or NME cap  
for chain in fixer.topology.chains():
    if chain.id == 'E':
        lastIndexInChain = [i for i, res in enumerate(chain.residues())][-1]
        fixer.missingResidues[(chain.index, lastIndexInChain + 1)] = ["NME"]
        fixer.missingResidues[(chain.index, 0)] = ["ACE"]
fixer.findMissingAtoms()
fixer.addMissingAtoms()
file = open("2ajf_rbd_capped.pdb", 'w')
PDBFile.writeFile(fixer.topology, fixer.positions, file, keepIds=True)
file.close()

Input file: 2ajf_rbd.pdb
Output capped file: 2ajf_rbd_capped.pdb
structures.zip

*I am usingpdbfixer 1.6

I think we should at the very least revisit checking for chirality/cis bond issues and warning the user of such issues before they move onto simulation (I can help look into how we would implement this). We might also discuss better ways for building in residues/caps (and mutating) in general, because these issues seem to keep resurfacing.
cc: @jchodera

@peastman
Copy link
Member

This is another case similar to the one above. It isn't just slightly wrong. It contains thoroughly messed up residues that will cause the simulation to explode as soon as you try to simulate it. Not ideal, but there's not much risk of someone not realizing they have a problem. Calling it "incorrect chirality" is like describing a wrecked car as "having a dent in the bumper"!

@zhang-ivy
Copy link
Contributor

I've equilibrated the structure above (2ajf_rbd_capped.pdb) and the simulation does not explode.

Another structure where I am seeing the same issues is attached here.
The original structure from rcsb is 5udc. This an antigen:antibody complex where the antigen is a trimer (chains F, A, D), and the antibody is composed of chains E and G. There is a long internal missing loop (> 40 residues) in chains F, A, D that I've removed (by splitting the chains) manually with my own code. Maybe splitting (and capping, if desired) chains at missing loops is a feature we could consider adding?

As input to pdbfixer, I'm providing the 5udc structure with chains split (chain F was split into F and X, chain A was split into A and Y, chain D was split into D and Z) -- 5udc_splitchain.pdb. The code I used to add missing residues is provided as clean.py. The output is 5udc_splitchain_clean.pdb. I then loaded this structure into vmd to check for maintenance of chirality and cis bonds. The cis bonds look fine.

Here are the issues I've found with pdbfixer:

  • Builds in Q26 in 5udc.F with wrong chirality (Note: I built the same missing residue for chains A, D and the chirality was not wrong)
  • Builds in S130 in 5udc.E with wrong chirality
  • Builds in T131 in 5udc.E with wrong chirality

Note: I equilibrated this structure for 50ns and nothing blew up.
Let me know if you want to see the trajectories of this structure or 2ajf_rbd_capped

5udc.zip

@peastman
Copy link
Member

What am I supposed to do with these files? They don't need any residues added to them, or even any heavy atoms. In your script, fixer.missingResidues.keys() is empty, so the loop does nothing.

@zhang-ivy
Copy link
Contributor

Sorry about that! I forgot to copy and paste the updated SEQRES to 5udc_splitchain. This is another issue we've been running into -- when pdbfixer faces a file that doesn't have a SEQRES (because it isn't a PDB directly from the rcsb), it doesn't issue a warning that the SEQRES is missing, so the user just assumes that no missing residues are present. I know that pdbfixer expects to manipulate raw rcsb pdb files, but since pdbfixer doesn't have a feature that allows me to split chains, I had to do this first using OpenMM's Topology, and I don't think writing the file using PDBFile allows me to keep SEQRES information.

5udc.zip

@peastman
Copy link
Member

#198 is an attempt at addressing this. Could you try it out and see whether it improves the behavior?

@jchodera
Copy link
Member

jchodera commented Mar 1, 2020

I think one of the major issues is that PDBFixer provides no warning when it may have inverted stereochemistry. In that sense, even an improved method for placing new residues will not be safe unless we also provide a way to check the stereochemistry and raise a warning or exception if this accidentally happens.

@peastman
Copy link
Member

peastman commented Mar 1, 2020

Do we have any evidence that it ever inverts stereochemistry while leaving everything else realistic? Or to use the same analogy as before, is that just the dent in the bumper of a car that has been completely wrecked and doesn't run?

@jchodera
Copy link
Member

jchodera commented Mar 1, 2020

Do we have any evidence that it ever inverts stereochemistry while leaving everything else realistic?

Considering that we don't have a test for that either, it doesn't matter, does it? Stereochemistry inversion is one of the easier ways to detect that something terrible has happened.

@jchodera
Copy link
Member

jchodera commented Mar 1, 2020

cc: @rafwiewiora @zhang-ivy to contribute more examples here.

@rafwiewiora
Copy link
Author

rafwiewiora commented Mar 5, 2020

Two general comments:

  1. You keep claiming, by visual inspection, that things are too bad to minimize and run. We wouldn't have to keep going in circles here, if you ever actually tried minimizing and running. In all cases you've claimed they wouldn't, we only post them here because they do minimize and run just fine.

  2. Even if you get things that look worse than what I had, as in the previous example, what does that change? (and I think you would get worse looking things if you tried running that a few times) The fact is that I have used pdbfixer to set up a structure, that minimized and equilibrated just fine, and I used it to collect milliseconds of data on Folding@home. It doesn't matter if you can reproduce that behavior or not, though I'm sure you can, the fact that it happened once and can happen again is a grave problem.

Do we have any evidence that it ever inverts stereochemistry while leaving everything else realistic?

I'm not sure how this matters -- as long as things minimize and run, and we don't have diagnostics and warnings to tell us to stop, that's the only thing that matters. Whatever exactly is broken, as long as something is, is not relevant.

  1. Just the fact that we can't do this right on any structure, no matter if it minimized or not, is a big problem. This is supposed to work for any structure, not just some easier ones.

Examples:

  1. The one that started this issue, like I said it run just fine and collected milliseconds, it's now part of a paper and all data is in this repo: https://osf.io/2h6p4/wiki/home/ -- we had to publish it because it was just the terminal residue and we already had the data, so again, this happened, one too many times at least.

  2. Here's another that also generated milliseconds on F@h, we just didn't end up using it in the paper: https://github.com/choderalab/pimento/tree/master/SETD8/setd8-p10478 -- pdbfixer script is https://github.com/choderalab/pimento/blob/master/SETD8/setd8-p10478/code/fix_pdb.py and output after equilibration, with wrong chirality on the terminus again: https://github.com/choderalab/pimento/blob/master/SETD8/setd8-p10478/equil/1ZKK_final_step.pdb

@rafwiewiora
Copy link
Author

For a more general test, I just went to the PDB and took the first 10 structures that show up for kinase "ERK2". I do:

import pdbfixer
from simtk.openmm.app import PDBFile
from glob import glob

files = glob('input/*.pdb')

for f in files:

    fixer = pdbfixer.PDBFixer(f)
    fixer.findMissingResidues()
    fixer.findMissingAtoms()
    fixer.removeHeterogens(False)
    fixer.addMissingAtoms()
    PDBFile.writeFile(fixer.topology, fixer.positions, open('{}'.format(f.split('/')[1]), 'w'))

-- this results in broken chirality for 3 out of 10 structures -- 30% failure rate on a random, general, problem.

One example, that minimizes just fine (pdb code 3c9w):

Minimizing energy...
  initial : 88073824.573 kcal/mol
  final   : -2619605.637 kcal/mol

has tens of wrong chirality residues:

image

image

image

@zhang-ivy
Copy link
Contributor

I tested #198 on 5udc.pdb, and am still seeing missing residues being built in mangled ways. See test_198.py in the attached files.

Additionally, I think it would be very helpful to users to check that atom/bond stereochemistry are biologically correct and disulfide bonds have been maintained, so I've written up some code to do that (check_stereo.py, and check_disulfides.py. The stereochemistry code relies on OpenEye, and I understand that not everyone has access to this, so it the user doesn't have a license, we could warn with something like "Warning: Unable to check atom/bond stereochemistry, please manually check, e.g. using VMD."

Note that these should be warnings, not errors, because the user may wish for the desired structure to contain D-amino acids.
Also note that I'm still working on the bond stereochemistry code -- I am having some issues with this.
to_upload.zip

@peastman
Copy link
Member

peastman commented Mar 6, 2020

Thanks! I'll take a look at 5udc and see if I can figure out what's happening.

The stereochemistry checks should be fairly simple geometric tests, so I think we can implement them ourselves instead of requiring users to have a commercial library.

@peastman
Copy link
Member

peastman commented Mar 6, 2020

Could you post the output of check_stereo.py? Since I don't have OEChem I'm trying to convert it to use RDKit instead. I want to make sure it matches your results.

@jchodera
Copy link
Member

jchodera commented Mar 7, 2020

Would it be easier to just implement the torsion checks as CustomTorsionForce or TorsionForce forces? For the standard amino acids, we can simply make a list of the four atom (names) involved in each improper torsion (the alpha carbon) or stereobond (the peptide bond), and then use these in a restraint as well. The OpenEye and RDKit code is going to be less useful in general since we can't easily deploy it without dependencies.

@peastman
Copy link
Member

peastman commented Mar 7, 2020

Yes, that's definitely what we'll want to do. I was just trying to reproduce the results from the script @zhang-ivy posted, and RDKit seemed like the easiest way to do it. But for adding restraints, torsions would be the cleanest way to do it.

@zhang-ivy
Copy link
Contributor

zhang-ivy commented Mar 8, 2020

The molecule is so mangled that OpenEye has trouble perceiving stereochemistry. Here is the output I get for check_stereo.py:

Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 1082 of molecule ''
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 3321 of molecule ''
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 3324 of molecule ''
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 7904 of molecule ''
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 8504 of molecule ''
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 14704 of molecule ''
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 14710 of molecule ''
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 14727 of molecule ''
Warning: OE3DToAtomStereo is unable to perceive atom stereo from a flat geometry on atom 16986 of molecule ''
Traceback (most recent call last):
  File "check_stereo.py", line 54, in <module>
    main()
  File "check_stereo.py", line 40, in main
    assert oechem.OE3DToInternalStereo(molecule), f"the stereochemistry perception from 3D coordinates failed"
AssertionError: the stereochemistry perception from 3D coordinates failed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants