-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix parsing precursor charge in MGF files #13
Conversation
This fixes two issues: - the charge value in the PEPMASS header can have the same suffixed format as the CHARGE header - the order of the headers isn't determined, so the charge value should still be saved correctly if the CHARGE header is before the PEPMASS header NOTE: I'm not sure if there is a clear precedence between PEPMASS or CHARGE here, but it might be better to only do this if charge isn't set already, to keep the behaviour consistent regardless of order. There could be multiple PEPMASS headers, but it's not clear to me how the CHARGE header should behave then: should there be multiple of them, or should we actually set the charge of all precursor ions, like we do now if CHARGE is first.
Thank you for adding this. I'll review this in more detail soon, but can you please provide a toy example MGF file that would fail to parse properly without this change? The MGF parser was one of first things I wrote when I was learning Rust, so the state machine is convoluted and it would be easy to re-break things. |
I added the test case that tripped in ms2rescore-rs to test/data/small.mgf and updated the assertions accordingly. |
Thank you. The control flow looks like it handles the two cases correctly now that I've had time to read it. While you're right that this code won't handle an MGF which includes both a If you think it needs to be captured, we can devolve secondary charge states to |
If you're happy with the state the PR is in now, please let me know and I'll merge it and cut a new release. |
Yes, I'm happy with the state of the PR, I was just pondering about the theoretical correctness of my changes, also given that it doesn't seem that well defined as a standard. I looked at the pyteomics and OpenMS parser for reference, and both actually just assume there is only one precursor. As I said, this could be change that might be worthwhile to consider, as it seems somewhat more consistent in behaviour: --- a/src/io/mgf.rs
+++ b/src/io/mgf.rs
@@ -358,13 +358,7 @@ impl<R: io::Read, C: CentroidPeakAdapting, D: DeconvolutedPeakAdapting> MGFReade
}
"CHARGE" => {
builder.precursor_charge = self.parse_charge(value);
- if let Some(ion) = builder
- .description
- .precursor
- .get_or_insert_with(Precursor::default)
- .iter_mut()
- .last()
- {
+ for ion in builder.description.precursor.get_or_insert_with(Precursor::default).iter_mut() {
if ion.charge.is_none() {
ion.charge = builder.precursor_charge
} |
This fixes two issues:
PEPMASS
header (can) have the same suffixed format as theCHARGE
headerCHARGE
header is before thePEPMASS
headerNOTE: There could be multiple
PEPMASS
headers, and this is supported by the parser, but it's not clear to me how theCHARGE
header should behave then: should the charge be set as part of thePEPMASS
header then, should there be multipleCHARGE
headers, or should there only be oneCHARGE
header, that would apply to all precursor ions. At the moment, I kept the current behaviour of just setting the last precursor ion, but it might make sense to make this a for loop, keeping behaviour consistent, at least assuming there can only be oneCHARGE
header. But I might be overthinking what seems to be an edge-case ...