
Research W ion usage #193

Closed
douweschulte opened this issue Oct 4, 2022 · 15 comments
Labels
M-needs-investigation Misc: Needs looking into by the assignee M-possible-scoring-change Misc: Something that can affect the scoring M-user-question Misc: All user questions regarding running, usage and such to guide the documentation development

Comments

@douweschulte
Member

Find out how W ions are annotated in Hecklib. Find some examples of the ion in the 'wild' (or generated datasets). This could prove useful for #81.

@douweschulte douweschulte added the M-needs-investigation Misc: Needs looking into by the assignee label Oct 4, 2022
@douweschulte
Member Author

The fragmentation model currently in use does not search for W ions. The following snippets can be used to build them in:

new AnnotatedSpectrumMatch(asm.Spectrum, asm.Match, new PeptideFragment.FragmentModel(asm.Model){ Y = null, Z = null })

And the example code from modelAll:

W = new PeptideFragment.FragmentRange {
    MinPos = 0,
    MaxPos = PeptideFragment.SEQUENCELENGTH - 1,
    HigherCharges = true,
    MassShifts = PeptideFragment.MASSSHIFT_NEUTRALLOSS | PeptideFragment.MASSSHIFT_WATERLOSS
        | PeptideFragment.MASSSHIFT_AMMONIALOSS | PeptideFragment.MASSSHIFT_HYDROGENGAIN
        | PeptideFragment.MASSSHIFT_CAPITALGAIN | PeptideFragment.MASSSHIFT_HYDROGENLOSS
        | PeptideFragment.MASSSHIFT_CAPITALLOSS | PeptideFragment.MASSSHIFT_MODIFICATIONLOSS
},

Currently used model: https://bms-developer.science.uu.nl/csharp/hecklib-core/-/blob/master/Source/HeckLib/chemistry/PeptideFragment.cs#L659

@douweschulte
Member Author

After a discussion with Richard, it is clear that reusing the Y definition for W would be perfectly fine. That would result in the following code:

new AnnotatedSpectrumMatch(asm.Spectrum, asm.Match, new PeptideFragment.FragmentModel(asm.Model) {
    W = new PeptideFragment.FragmentRange {
        MinPos = 1,
        MaxPos = PeptideFragment.SEQUENCELENGTH - 1,
        HigherCharges = true,
        MassShifts = PeptideFragment.MASSSHIFT_WATERLOSS | PeptideFragment.MASSSHIFT_AMMONIALOSS
    }
})

@douweschulte douweschulte added the M-possible-scoring-change Misc: Something that can affect the scoring label Dec 20, 2022
douweschulte added a commit that referenced this issue Jan 10, 2023
@douweschulte
Member Author

With the above implementation, every L/I is automatically determined to be J, I, or L based on W ion support in the raw data. This needs some more analysis, but looks promising.
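A minimal sketch of the idea, assuming the textbook side-chain losses: a W ion forms from a radical z ion by losing part of the side chain of the z fragment's N-terminal residue, an ethyl radical (C2H5, about 29.039 Da) for Ile and an isopropyl radical (C3H7, about 43.055 Da) for Leu. The function name, the tolerance, and the assumption of singly charged peaks are hypothetical illustration, not Stitch's or Hecklib's actual code, which may weigh the evidence differently (for example by also using d ions).

```csharp
using System;
using System.Linq;

// Hypothetical sketch, not Stitch or Hecklib code: classify an ambiguous
// Xle residue as I, L or J from W ion evidence (monoisotopic masses,
// singly charged peaks assumed).
const double Hydrogen = 1.00782503;
const double Carbon = 12.0;
const double Ethyl = 2 * Carbon + 5 * Hydrogen;     // C2H5 radical, lost from Ile (~29.039 Da)
const double Isopropyl = 3 * Carbon + 7 * Hydrogen; // C3H7 radical, lost from Leu (~43.055 Da)

// zMass: m/z of the radical z ion whose N-terminal residue is the Xle.
string ClassifyXle(double zMass, double[] peaks, double tolerance)
{
    bool HasPeak(double mz) => peaks.Any(p => Math.Abs(p - mz) <= tolerance);

    bool ileEvidence = HasPeak(zMass - Ethyl);
    bool leuEvidence = HasPeak(zMass - Isopropyl);
    if (ileEvidence && !leuEvidence) return "I";
    if (leuEvidence && !ileEvidence) return "L";
    return "J"; // no or conflicting evidence: keep the residue ambiguous
}

Console.WriteLine(ClassifyXle(500.0, new[] { 470.9609 }, 0.01)); // prints I
```

Falling back to J whenever the evidence is missing or contradictory keeps the assignment conservative.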

douweschulte added a commit that referenced this issue Jan 13, 2023
@douweschulte
Member Author

Some issues with this were fixed, and d ions were also included to provide more confidence.

@Ln9052

Ln9052 commented Nov 27, 2023

Hi Schulte,
I'd like to confirm whether the "RawDataDirectory" parameter in Stitch for identifying I/L has been deprecated. If "RawDataDirectory" has indeed been deprecated, are there alternative methods available for I/L identification?
Thank you.

@douweschulte
Member Author

No, it has not; it has moved to be specified on each file separately. This was done to allow opening multiple datasets with unrelated raw file locations. Please note that for I/L disambiguation you also have to specify an additional parameter for each file: XleDisambiguation: True.

So here is a small example of how this would look:

Input ->
    Peaks ->
        Path: Some path
        RawDataDirectory: /the/path/
        XleDisambiguation: True
        - any further settings
    <-
<-

@Ln9052

Ln9052 commented Nov 27, 2023

Hi Schulte, When attempting to use Stitch parameters "XleDisambiguation" and "RawDataDirectory", I encountered IO issues during execution. There seems to be a problem with opening and reading the file. Could you please provide suggestions on resolving this issue? Thank you.
Input ->
    pNovo ->
        Path :Path......\result\results.res
        Param :Path......\param\pNovo.param
        Name :1
        CutoffScore :90
        RawDataDirectory:F:\lining_data\MAHH_raw\1ul\HCl_4min.raw
        XleDisambiguation :True

@douweschulte
Member Author

As the parameter name implies, RawDataDirectory expects the path to a directory. So you will have to change your batchfile line from RawDataDirectory:F:\lining_data\MAHH_raw\1ul\HCl_4min.raw into RawDataDirectory:F:\lining_data\MAHH_raw\. I will add a clearer error message for this case, as I have seen other people get stuck on the same error.
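A clearer error for this case could come from a check along these lines. This is a hypothetical sketch: the function name and the message wording are illustrative and not Stitch's actual validation code.

```csharp
#nullable enable
using System.IO;

// Hypothetical check, not Stitch's actual code: returns null when the value
// can be used as RawDataDirectory, otherwise an error message for the user.
string? CheckRawDataDirectory(string path)
{
    if (Directory.Exists(path))
        return null;
    if (File.Exists(path))
    {
        // Common mistake: pointing at the raw file itself instead of its folder.
        string? parent = Path.GetDirectoryName(Path.GetFullPath(path));
        return $"RawDataDirectory must be a directory, but '{path}' is a file. " +
               $"Did you mean '{parent}'?";
    }
    return $"RawDataDirectory '{path}' does not exist.";
}
```

Suggesting the containing folder turns the most common mistake, passing the .raw file itself, into an immediately actionable message.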

@douweschulte douweschulte added the M-user-question Misc: All user questions regarding running, usage and such to guide the documentation development label Nov 27, 2023
@Ln9052

Ln9052 commented Nov 28, 2023

I tried using the latest nightly version of Stitch (#374, https://github.com/snijderlab/stitch/actions/runs/7006838956). When I changed the line in my batch file from 'RawDataDirectory:F:\lining_data\MAHH_raw\1ul\HCl_4min.raw' to either 'RawDataDirectory:F:\lining_data\MAHH_raw' or 'RawDataDirectory:F:\lining_data\MAHH_raw\1ul', I encountered the following error during execution. Could you please take a look at this issue for me? Thank you.
Error: Could not open file
│ F:\lining_data\MAHH_raw\1ul
note: The file given is a directory.
help: Make sure you are trying to open the correct raw data for this dataset.

@douweschulte
Member Author

The problem is that the output of pNovo makes it really hard to find the correct raw file. You seem to have run pNovo on an mgf file with a different format than I expected. For now, I have updated the code so that it skips any raw data it cannot find instead of giving this error. This does not make the spectra work in your case, though. If you want the spectra to work, you can send me the pNovo files so I can build in support for the slightly different format in your case.
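Skipping raw data that cannot be found, rather than failing, amounts to a lookup that may return nothing. The sketch below assumes, hypothetically, that raw files are named "<run name>.raw"; the helper name is illustrative, and Stitch's real pNovo handling may derive the run name differently.

```csharp
#nullable enable
using System;
using System.IO;
using System.Linq;

// Hypothetical sketch, not Stitch's actual code: look up "<runName>.raw"
// in the raw data directory and return null (skip) when it is not there.
string? FindRawFile(string rawDataDirectory, string runName)
{
    if (!Directory.Exists(rawDataDirectory))
        return null;
    return Directory.EnumerateFiles(rawDataDirectory)
        .FirstOrDefault(file =>
            string.Equals(Path.GetFileNameWithoutExtension(file), runName, StringComparison.OrdinalIgnoreCase)
            && string.Equals(Path.GetExtension(file), ".raw", StringComparison.OrdinalIgnoreCase));
}
```

A caller would only attach spectra when the result is non-null and otherwise continue without them, which matches the "skip instead of error" behaviour described above.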

@Ln9052

Ln9052 commented Nov 28, 2023

Hi Schulte, I used pParse.exe to convert .RAW files into .mgf files and ran them through pNovo3. I have compressed the pNovo3-related results and parameter files. If you have time, could you please take a look? Thank you for your patient assistance in answering my questions.
HCl_4min_pNovo_files.zip

douweschulte added a commit that referenced this issue Nov 28, 2023
@douweschulte
Member Author

I fixed the parser to accept this format as well. With the new nightly build this should work from now on.

@Ln9052

Ln9052 commented Nov 29, 2023

Hi Schulte, I've noticed three things. Firstly, in the Maxnovo section of the manual, there's a parameter called Fixedmodification, and the example uses M+16, which is generally considered a variable modification. I'm curious about your perspective on adding M+16 in the Fixedmodification parameter.
Additionally, when using Stitch to read Maxnovo results, I receive a warning: 'Warning: Could not parse pro forma sequence │ CTAEETDIINNM[Oxidation (M)]YPR note: The program will continue, but the spectra will be missing for this peptide (MN:15469).' It seems that the M[Oxidation (M)] notation in the Maxnovo result sequences might interfere with parsing. Could you take a look at my batchfile (monoclonal_maxnovo.txt) and msmsScans.txt to help identify the reason?
Furthermore, I've noticed that the RawDataDirectory parameter is not listed in the manual for Stitch when reading Novor files. Can you clarify whether Stitch can disambiguate I/L when reading Novor files?
Thank you very much.

@douweschulte
Member Author

Regarding the fixed modification: it is indeed odd to have oxidation as a fixed modification. Although it should be quite harmless as an example, I think it is clearer to remove it.

The parsing error you see occurs because I use ProForma for the definition of peptide sequences, and the Unimod name for oxidation used in that format is written M[Oxidation]. These errors happen because every file format seems to have devised its own names for modifications. As the error states, though, failing to parse the sequence only prevents this one peptide from having its spectrum shown; there are no other ill effects. If this behaviour is unacceptable, or you find many more modifications that break like this, I could look into adding a list of known sloppy names to resolve in the parser.
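A list of known sloppy names could be applied as a rewrite step before ProForma parsing. The sketch below is hypothetical: only the "Oxidation (M)" to "Oxidation" mapping comes from this thread, the function name is illustrative, and further entries would have to be collected from real files.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical table of tool-specific modification names mapped to the
// Unimod names that ProForma expects. Only this entry comes from the thread.
var knownSloppyNames = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Oxidation (M)"] = "Oxidation",
};

// Rewrite every bracketed modification whose name is in the table;
// unknown names are left untouched for the ProForma parser to judge.
string NormalizeModifications(string sequence) =>
    Regex.Replace(sequence, @"\[([^\]]+)\]", m =>
        knownSloppyNames.TryGetValue(m.Groups[1].Value, out var unimod)
            ? $"[{unimod}]"
            : m.Value);

Console.WriteLine(NormalizeModifications("CTAEETDIINNM[Oxidation (M)]YPR")); // prints CTAEETDIINNM[Oxidation]YPR
```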

For the Novor data, there is a parameter RawFile where you can give the path to the raw file used for this Novor run. The XleDisambiguation parameter also works there. I updated the manual. Thanks for raising the question!

douweschulte added a commit that referenced this issue Nov 29, 2023
@Ln9052

Ln9052 commented Nov 29, 2023

Thank you for your patient assistance in answering my questions.
