Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix parsing precursor charge in MGF files #13

Merged
merged 3 commits into from
Oct 31, 2024

Conversation

paretje
Copy link
Contributor

@paretje paretje commented Oct 24, 2024

This fixes two issues:

  • the charge value in the PEPMASS header (can) have the same suffixed format as the CHARGE header
  • the order of the headers isn't determined, so the charge value should still be saved correctly if the CHARGE header is before the PEPMASS header

NOTE: There could be multiple PEPMASS headers, and this is supported by the parser, but it's not clear to me how the CHARGE header should behave then: should the charge be set as part of the PEPMASS header then, should there be multiple CHARGE headers, or should there only be one CHARGE header, that would apply to all precursor ions. At the moment, I kept the current behaviour of just setting the last precursor ion, but it might make sense to make this a for loop, keeping behaviour consistent, at least assuming there can only be one CHARGE header. But I might be overthinking what seems to be an edge-case ...

This fixes two issues:
- the charge value in the PEPMASS header can have the same suffixed
  format as the CHARGE header
- the order of the headers isn't determined, so the charge value should
  still be saved correctly if the CHARGE header is before the PEPMASS
  header

NOTE: I'm not sure if there is a clear precedence between PEPMASS or
CHARGE here, but it might be better to only do this if charge isn't set
already, to keep the behaviour consistent regardless of order. There
could be multiple PEPMASS headers, but it's not clear to me how the
CHARGE header should behave then: should there be multiple of them, or
should we actually set the charge of all precursor ions, like we do now
if CHARGE is first.
@mobiusklein
Copy link
Owner

Thank you for adding this.

I'll review this in more detail soon, but can you please provide a toy example MGF file that would fail to parse properly without this change? The MGF parser was one of first things I wrote when I was learning Rust, so the state machine is convoluted and it would be easy to re-break things.

@paretje
Copy link
Contributor Author

paretje commented Oct 24, 2024

I added the test case that tripped in ms2rescore-rs to test/data/small.mgf and updated the assertions accordingly.

@mobiusklein
Copy link
Owner

Thank you.

The control flow looks like it handles the two cases correctly now that I've had time to read it.

While you're right that this code won't handle an MGF which includes both a CHARGE header and a charge in the PEPMASS line or an MGF with multiple CHARGE headers, they are less common today thanks to high resolution MS1 spectra. I don't see a need to rework mzdata's data model to support tracking that explicitly.

If you think it needs to be captured, we can devolve secondary charge states to Param using MS:1000633|possible charge state. That'd be purely for the purposes of tracking that information, it'd need explicit help being round-tripped. Otherwise, I can add a warning if we see multiple charge states, but otherwise do nothing with the second charge state unless a user.

@mobiusklein
Copy link
Owner

If you're happy with the state the PR is in now, please let me know and I'll merge it and cut a new release.

@paretje
Copy link
Contributor Author

paretje commented Oct 29, 2024

Yes, I'm happy with the state of the PR, I was just pondering about the theoretical correctness of my changes, also given that it doesn't seem that well defined as a standard. I looked at the pyteomics and OpenMS parser for reference, and both actually just assume there is only one precursor.

As I said, this could be change that might be worthwhile to consider, as it seems somewhat more consistent in behaviour:

--- a/src/io/mgf.rs
+++ b/src/io/mgf.rs
@@ -358,13 +358,7 @@ impl<R: io::Read, C: CentroidPeakAdapting, D: DeconvolutedPeakAdapting> MGFReade
                 }
                 "CHARGE" => {
                     builder.precursor_charge = self.parse_charge(value);
-                    if let Some(ion) = builder
-                        .description
-                        .precursor
-                        .get_or_insert_with(Precursor::default)
-                        .iter_mut()
-                        .last()
-                    {
+                    for ion in builder.description.precursor.get_or_insert_with(Precursor::default).iter_mut() {
                         if ion.charge.is_none() {
                             ion.charge = builder.precursor_charge
                         }

@mobiusklein mobiusklein merged commit 4709c37 into mobiusklein:main Oct 31, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants