Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when having more than 10 chain identifiers #287

Closed
FAOlivieri opened this issue Apr 11, 2024 · 7 comments
Closed

Error when having more than 10 chain identifiers #287

FAOlivieri opened this issue Apr 11, 2024 · 7 comments

Comments

@FAOlivieri
Copy link

I have a system with 9 chains. After solvating with pdbfixer, I end up with 11 chains (chain ids are 1 to 11). When pdbfixer gets to the following line:

if len(chains) == 1:
    chains[0].id = 'A'
else:
    chains[-1].id = chr(ord(chains[-2].id)+1)

it throws an error, because chains[-2].id is not parseable to 1 character. I fixed it by changing the code to

if len(chains) == 1:
    chains[0].id = 'A'
else:
    try:
        int(chains[-2].id)
    except:
        chains[-1].id = chr(ord(chains[-2].id)+1) #what happens if chain ID is more than one character? They can be two characters
    else:
        chains[-1].id = str(int(chains[-2].id)+1)

and now it works properly but what exactly is this part of the code trying to achieve? As far as I understand it's assigning to the last chain the chain ID that follows the previous one, but why is that necessary? In most systems that will be true by default. Also, as per the PDB specification, chain ID can be 2 characters, so the existing code will fail on 2 letters as well.

Couldn't this whole IF be deleted?

@peastman
Copy link
Member

Chain IDs are only allowed to be a single character. Usually they're letters. See the PDB spec.

@FAOlivieri
Copy link
Author

Ok. Could you clarify what the line in my post is trying to achieve?

@peastman
Copy link
Member

It increments the letter. If the last chain is A, the next one is called B.

@murfalo
Copy link
Contributor

murfalo commented Jun 20, 2024

Hey Peter,

I'm running into the same issue. This seems to occur because PDBFixer uses OpenMM's modeller.addSolvent, which in turn uses topology.addChain() in a number of locations without specifying an ID:

# ...creating solvent chain...
newChain = newTopology.addChain() # modeller.py line 571
# ...
# ...creating ion chain...
ionChain = modeller.topology.addChain() # modeller.py line 373

(from OpenMM modeller.py line 571 and line 373)

In these cases, the ID defaults to a string representing its index:

    def addChain(self, id=None):
        """Create a new Chain and add it to the Topology.

        Parameters
        ----------
        id : string=None
            An optional identifier for the chain.  If this is omitted, an id is
            generated based on the chain index.

        Returns
        -------
        Chain
             the newly created Chain
        """
        if id is None:
            id = str(len(self._chains)+1)
        chain = Chain(len(self._chains), self, id)
        self._chains.append(chain)
        return chain

(from OpenMM topology.py lines 120-138)

This ID will be as many characters as the index is digits, often more than a single character. In these cases, PDBFixer will throw the error seen above (i.e., ord("10") is undefined, as ord takes only single-character arguments).

Updating OpenMM to use an alphabetic chain descriptor would be straightforward, but would limit addChain to 26 chains. Maybe updating PDBFixer would be the better approach? Let me know either way and I'd be happy to contribute a fix!

Thanks,
Murphy

@FAOlivieri
Copy link
Author

Changing it to work with letters would definitely work, and would be better than my improvised solution to parse numbers and letters. Most software is already limited to 26 chain IDs anyway, so it's doubtful that it will introduce a limitation that wasn't already there

@peastman
Copy link
Member

The important thing is that it not crash. When you actually get around to saving the structure, writeFile() defaults to keepIds=False, which means it will generate new chain IDs anyway. We just need to keep it from crashing before it gets to that point. Also, the limit of 26 chains only applies to PDB format. PDBx/mmCIF allows multiple characters for the chain ID.

How about this as a behavior?

  • If the current last chain ID is a letter between A and Y, assign the next letter as the ID for the new chain.
  • If not, assign a multicharacter ID in any reasonable way.

murfalo added a commit to murfalo/pdbfixer that referenced this issue Jun 21, 2024
murfalo added a commit to murfalo/pdbfixer that referenced this issue Jun 21, 2024
murfalo added a commit to murfalo/pdbfixer that referenced this issue Jun 21, 2024
murfalo added a commit to murfalo/pdbfixer that referenced this issue Jun 21, 2024
murfalo added a commit to murfalo/pdbfixer that referenced this issue Jun 21, 2024
murfalo added a commit to murfalo/pdbfixer that referenced this issue Jun 21, 2024
peastman pushed a commit that referenced this issue Jun 26, 2024
* Bugfixed chain ID assignment in addSolvent (#287)

* Simplify logic for alphabetic ID extension

* Added `_renameNewChains()` for `addSolvent()` and `addMembrane()`
@peastman
Copy link
Member

Fixed by #294.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants