Read ion mobility from mzML and write to mzIdentML #32
Comments
I'm assuming that this will only matter for mzML input? (I can't think of how it would be encoded in other supported spectrum input formats). The important classes here are:
I think coding this could be done in less than an hour; testing the functionality is a different story.
Mostly mzML for now, but parsing it from the MGF title is a possibility as well (although getting the specific CV term and units would be tricky). I think all 3 IMS types should be supported, yes. An amusing implementation would be to take the whole cvParam element (with one of the 3 supported accessions) as a string and plop it into the mzIdentML rather than trying to parse it into value, type, and units.
Well, I don't think the mzML parsing/mzid writing library used will let me just copy the whole string from one to the other, although I could possibly store the cvParam object(s) and transfer them from one to the other; parsing the value, type, and units isn't hard with that library.
Indeed. And the jmzml and jmzidml models use distinct cvParam classes, so you can't plop one into the other. So it seems 2 values will have to be carried through: the accession and the value (the unit is implied by the accession, i.e. the mobility value type). Let me know if you want me to test it. Thanks!
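A minimal sketch of the "carry the accession and value through" idea, written in Python with the standard library rather than with jmzml/jmzidml (whose APIs are not shown in this thread). The accession set, the namespace-free `<scan>` fragment, and the function names `extract_ion_mobility` and `to_mzid_cvparam` are assumptions made for illustration; the accessions in particular should be checked against the current PSI-MS CV. The term name is carried alongside the accession only for convenience, since it could also be looked up from the accession.

```python
import xml.etree.ElementTree as ET

# Assumption: accessions to carry through; verify against the current PSI-MS CV.
ION_MOBILITY_ACCESSIONS = {
    "MS:1002476",  # assumed: ion mobility drift time
    "MS:1002815",  # assumed: inverse reduced ion mobility
}


def extract_ion_mobility(scan_xml):
    """Return (accession, name, value) for each ion mobility cvParam in a <scan> fragment."""
    scan = ET.fromstring(scan_xml)
    found = []
    for cv in scan.findall("cvParam"):
        if cv.get("accession") in ION_MOBILITY_ACCESSIONS:
            found.append((cv.get("accession"), cv.get("name"), cv.get("value")))
    return found


def to_mzid_cvparam(accession, name, value):
    """Build a fresh cvParam element for the output side from the carried values."""
    element = ET.Element(
        "cvParam",
        {"cvRef": "MS", "accession": accession, "name": name, "value": value},
    )
    return ET.tostring(element, encoding="unicode")


# Hypothetical, namespace-free scan fragment used only for this example.
SCAN = """<scan>
  <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="12.5"/>
  <cvParam cvRef="MS" accession="MS:1002815" name="inverse reduced ion mobility" value="0.93"/>
</scan>"""

carried = extract_ion_mobility(SCAN)
for item in carried:
    print(to_mzid_cvparam(*item))
```

Building a new element on the output side, instead of reusing the input object, mirrors the point above that the two models use distinct cvParam classes.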
I just made a commit that should provide this functionality. Do you want me to provide a binary, or are you okay with checking out and compiling the current master branch?
I don't think I ever set up the build environment for this, so a binary would be nice (or I can wait for the next release).
Didn't seem to work for a Waters HDDDA. Here's the input and output and the command I used. I just used a random FASTA I had around. Just needed to see one SpectrumIdentificationResult and that's all I got. Might need to be a pretty large FASTA to get a random hit. I'm not sure what species this sample is, or if it's even peptides. :) The first test I did was a legit search with TIMS PASEF data, but that failed at the end (see my PR to fix that).
https://github.com/MSGFPlus/msgfplus/releases/tag/IMS_CV_Preview2 That HDDDA mzML file says that all spectra are profile, which MS-GF+ skips. However, MS-GF+ also runs its own check, which may be deciding that the spectra are centroided (it looks for a median difference of >= 50 ppm between the m/z values of consecutive peaks in a spectrum); that would be the case if you didn't see a message saying that it "skipped spectrum x since it is not centroided". Overall, if you're able to get meaningful information out of an MS-GF+ search on IMS/TIMS data, that would be great, since MS-GF+ was never designed to work on such data (and if it does work reasonably well, we might need to introduce some new scoring models to properly accommodate it).
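The centroiding heuristic described above can be sketched roughly as follows. This is an illustration of the stated rule (median gap between consecutive peaks of at least 50 ppm), not the actual MS-GF+ code; the function name `looks_centroided`, the handling of spectra with fewer than two peaks, and the example peak lists are assumptions.

```python
from statistics import median


def looks_centroided(mzs, min_median_ppm=50.0):
    """Guess whether a peak list is centroided from the spacing of its m/z values.

    Profile data has many closely spaced points per peak, so the median gap
    between consecutive m/z values is small; centroided data has larger gaps.
    """
    if len(mzs) < 2:
        # Assumption: too few peaks to judge, so do not reject the spectrum.
        return True
    ordered = sorted(mzs)
    # Gap between neighbours, expressed in parts per million of the lower m/z.
    gaps_ppm = [(hi - lo) / lo * 1e6 for lo, hi in zip(ordered, ordered[1:])]
    return median(gaps_ppm) >= min_median_ppm


profile_like = [500.000, 500.005, 500.010, 500.015, 500.020]  # about 10 ppm apart
centroid_like = [500.0, 500.5, 501.0, 650.3]                  # about 1000 ppm or more apart

print(looks_centroided(profile_like))   # False
print(looks_centroided(centroid_like))  # True
```

A rule like this would explain how a sparse ion mobility spectrum that is labeled as profile could still pass the check: with few points per peak, the median spacing can exceed the threshold.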
I'm still not seeing the CV term carried through. It says the build is from 6-28. Are you giving me the right binary? I know the data is ridiculous:
All I wanted was a single SIR to test whether the cvParam is getting carried through. It doesn't need to be a legit result. Ironically, when I ran the default CWT peak picker on this data so that the spectra really were centroided, I got NO results. Only when I turn the SNR down to 0 does it get better. The ion mobility spectra are very sparse even in profile mode, so that's probably why.
Well, the date that MS-GF+ outputs is only manually updated, and I haven't updated it yet. I will have to try that file with some FASTA file here, while using the debugger, to figure out exactly what's going on.
Well, found the main bug, which also affects other searches: MS-GF+ originally only checked the scanList in mzML spectra for the "[Thermo Trailer Extra]Monoisotopic M/Z:" userParam, so the code only entered the relevant if statement when there was at least one userParam in scanList:scan[0]. This bug also meant that the scan start time would not be output for a search on data from, say, an Agilent QTOF. https://github.com/MSGFPlus/msgfplus/releases/tag/v2018.07.17 fixes it; I did see the desired cvParam in the single search result I got (searching against a human RefSeq FASTA file I had on hand).
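To make the shape of that bug concrete, here is a small Python illustration. It is a reconstruction from the description above, not MS-GF+ source code; the dictionary layout for a scan and the function names `read_scan_gated` and `read_scan_ungated` are hypothetical.

```python
MONO_MZ_PARAM = "[Thermo Trailer Extra]Monoisotopic M/Z:"


def read_scan_gated(scan):
    """Buggy shape: nothing is read unless the scan has at least one userParam."""
    info = {}
    if scan.get("userParams"):
        for name, value in scan["userParams"].items():
            if name == MONO_MZ_PARAM:
                info["monoisotopic_mz"] = float(value)
        # cvParams are only reached inside the userParam guard.
        for name, value in scan.get("cvParams", {}).items():
            info[name] = float(value)
    return info


def read_scan_ungated(scan):
    """Fixed shape: cvParams are read whether or not any userParam exists."""
    info = {}
    mono = scan.get("userParams", {}).get(MONO_MZ_PARAM)
    if mono is not None:
        info["monoisotopic_mz"] = float(mono)
    for name, value in scan.get("cvParams", {}).items():
        info[name] = float(value)
    return info


# Hypothetical scan with cvParams but no userParams (e.g. non-Thermo data).
no_user_params = {"cvParams": {"scan start time": "12.5"}}

print(read_scan_gated(no_user_params))    # {}
print(read_scan_ungated(no_user_params))  # {'scan start time': 12.5}
```

For a scan that does carry the Thermo userParam, both versions return the same result, which is consistent with the problem going unnoticed on Thermo data.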
Excellent. Bruker TIMS results now have both ion mobility and scan time. I hadn't realized they were missing scan time previously. Thanks!
The PSI CV has been tweaked to allow ion mobility terms to be put in the mzIdentML at the SpectrumIdentificationResult the same way scan start time already could be: https://sourceforge.net/p/psidev/mailman/message/36317835/
How hard would it be to get MS-GF+ to carry this attribute through to the output mzIdentML?