Failure to grab compound name from metadata dictionary #249

Closed
wjcranda opened this issue Jul 30, 2024 · 10 comments
@wjcranda

wjcranda commented Jul 30, 2024

With version 1.5.0, a negative-mode query failed at one specific spectrum in an .mgf file. The spectra before and after it ran without issue.

Running the same .mgf file through an earlier version (1.2.4) with the older Zenodo models queried every spectrum successfully. With that version, this particular spectrum had an analog match (with a low score).

It seems to me that the library additions in the latest Zenodo files may be the problem: this spectrum appears to match a library compound whose metadata is too incomplete to write the output.

Here is the full error message:


KeyError Traceback (most recent call last)
File :55

File :28, in run_ms2query(ion_mode)

File ~\AppData\Roaming\Python\Python310\site-packages\ms2query\run_ms2query.py:97, in run_complete_folder(ms2library, folder_with_spectra, results_folder, settings)
95 if os.path.isfile(file_path):
96 if os.path.splitext(file_name)[1].lower() in {".mzml", ".json", ".mgf", ".msp", ".mzxml", ".usi", ".pickle"}:
---> 97 run_ms2query_single_file(spectrum_file_name=file_name,
98 folder_with_spectra=folder_with_spectra,
99 results_folder=results_folder,
100 ms2library=ms2library, settings=settings)
101 folder_contained_spectrum_file = True
102 else:

File ~\AppData\Roaming\Python\Python310\site-packages\ms2query\run_ms2query.py:143, in run_ms2query_single_file(spectrum_file_name, folder_with_spectra, results_folder, ms2library, settings)
139 spectra = load_matchms_spectrum_objects_from_file(os.path.join(folder_with_spectra, spectrum_file_name))
140 analogs_results_file_name = return_non_existing_file_name(
141 os.path.join(results_folder,
142 os.path.splitext(spectrum_file_name)[0] + ".csv"))
--> 143 ms2library.analog_search_store_in_csv(spectra,
144 analogs_results_file_name,
145 settings)
146 print(f"Results stored in {analogs_results_file_name}")

File ~\AppData\Roaming\Python\Python310\site-packages\ms2query\ms2library.py:211, in MS2Library.analog_search_store_in_csv(self, query_spectra, results_csv_file_location, settings)
205 csv_file.write(",".join(
206 column_names_for_output(True, add_class_annotations, settings.additional_metadata_columns,
207 settings.additional_ms2query_score_columns)) + "\n")
209 results_df_generator = self.analog_search_yield_df(query_spectra, settings)
--> 211 for results_df in results_df_generator:
212 results_df.to_csv(results_csv_file_location, mode="a", header=False, float_format="%.4f", index=False)

File ~\AppData\Roaming\Python\Python310\site-packages\ms2query\ms2library.py:173, in MS2Library.analog_search_yield_df(self, query_spectra, settings, progress_bar)
171 else:
172 results_table = get_ms2query_model_prediction_single_spectrum(results_table, self.ms2query_model)
--> 173 results_df = results_table.export_to_dataframe(
174 settings.nr_of_top_analogs_to_save,
175 settings.minimal_ms2query_metascore,
176 additional_metadata_columns=settings.additional_metadata_columns,
177 additional_ms2query_score_columns=settings.additional_ms2query_score_columns)
178 yield results_df

File ~\AppData\Roaming\Python\Python310\site-packages\ms2query\results_table.py:136, in ResultsTable.export_to_dataframe(self, nr_of_top_spectra, minimal_ms2query_score, additional_metadata_columns, additional_ms2query_score_columns)
134 # For each analog the compound name is selected from sqlite
135 metadata_dict = self.sqlite_library.get_metadata_from_sqlite(list(selected_analogs["spectrum_ids"]))
--> 136 compound_name_list = [metadata_dict[analog_spectrum_id]["compound_name"]
137 for analog_spectrum_id
138 in list(selected_analogs["spectrum_ids"])]
139 smiles_list = [metadata_dict[analog_spectrum_id]["smiles"]
140 for analog_spectrum_id
141 in list(selected_analogs["spectrum_ids"])]
143 # Add inchikey and ms2query model prediction to results df
144 # results_df = selected_analogs.loc[:, ["spectrum_ids", "ms2query_model_prediction", "inchikey"]]

File ~\AppData\Roaming\Python\Python310\site-packages\ms2query\results_table.py:136, in (.0)
134 # For each analog the compound name is selected from sqlite
135 metadata_dict = self.sqlite_library.get_metadata_from_sqlite(list(selected_analogs["spectrum_ids"]))
--> 136 compound_name_list = [metadata_dict[analog_spectrum_id]["compound_name"]
137 for analog_spectrum_id
138 in list(selected_analogs["spectrum_ids"])]
139 smiles_list = [metadata_dict[analog_spectrum_id]["smiles"]
140 for analog_spectrum_id
141 in list(selected_analogs["spectrum_ids"])]
143 # Add inchikey and ms2query model prediction to results df
144 # results_df = selected_analogs.loc[:, ["spectrum_ids", "ms2query_model_prediction", "inchikey"]]

KeyError: 'compound_name'

@wjcranda
Author

I have created a temporary fix that places 'no name' in the compound name column. This lets the queries continue instead of erroring out when they reach a library match without compound name metadata, and both the positive and negative mode scripts ran to completion.

That this fix worked suggests that only the 'compound name' metadata is missing for some of the library entries. It is pure chance, however, that my spectra matched any of the library entries that lack it.

If you would like the SMILES and other details of the library matches that have no compound name, I can share my outputs as well.

results_table.txt
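
A minimal sketch of this kind of workaround, for readers who want to see the pattern. It is not the content of the attached results_table.txt and not the fix that was later merged. The dictionary below is made-up stand-in data shaped like the `metadata_dict` in the traceback, and `names_with_fallback` is a hypothetical helper name. The idea is simply to replace the direct `["compound_name"]` lookup with `dict.get` and a placeholder:

```python
# Made-up stand-in for the dictionary returned by get_metadata_from_sqlite.
# The second entry has no "compound_name" key, like the library hit that
# triggered the KeyError in the traceback.
metadata_dict = {
    "spec_1": {"compound_name": "compound A", "smiles": "CCO"},
    "spec_2": {"smiles": "CCN"},
}
spectrum_ids = ["spec_1", "spec_2"]


def names_with_fallback(metadata, ids, placeholder="no name"):
    """Return one compound name per id, using a placeholder when the key is absent."""
    return [metadata[spectrum_id].get("compound_name", placeholder)
            for spectrum_id in ids]


# metadata_dict[...]["compound_name"] would raise KeyError for "spec_2";
# the .get() lookup falls back to the placeholder instead.
compound_name_list = names_with_fallback(metadata_dict, spectrum_ids)
print(compound_name_list)
```

A placeholder keeps the output rows aligned with the matched spectra, at the cost of hiding which library entries are incomplete, so it is best treated as a stopgap.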

@niekdejonge
Collaborator

Thanks for the clear issue, and thanks for helping out others while all the developers were on holiday :) Missing compound names should indeed not break a run. I have not seen this issue before and don't have a solution right away, but I will look into it.

@niekdejonge niekdejonge self-assigned this Aug 6, 2024
@wjcranda
Author

wjcranda commented Aug 6, 2024

No problem! I really enjoy using these tools in my research. Let me know if you need more information from my side.

@rafaypir

rafaypir commented Aug 7, 2024

Hi, I'm facing a similar issue with the failure to grab the compound name from the metadata dictionary. I also found some errors in the resulting file.

For example, cf_kingdom should not contain ['UDFMLCREIMSEIU', '', '', '', '', ''], and the cf_direct_parent column also contains incorrect information.

I'm attaching the incomplete CSV file for reference.

cleaned_query_spectra.csv
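
For anyone wanting to check their own results file for the same symptom, here is a rough sketch using only the standard library. It assumes the malformed cells look like the example above, that is, a Python-style list literal where a plain class name is expected. The column name `row_id`, the sample rows, and the helper name `find_list_like_cells` are invented for illustration; only `cf_kingdom` and the malformed value come from this report.

```python
import ast
import csv
import io


def find_list_like_cells(csv_text, columns):
    """Return (row_number, column, value) for cells that parse as a Python list."""
    flagged = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column in columns:
            value = (row.get(column) or "").strip()
            if not value.startswith("["):
                continue
            try:
                parsed = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                continue  # starts with "[" but is not a list literal
            if isinstance(parsed, list):
                flagged.append((row_number, column, value))
    return flagged


# Invented two-row sample; the second cf_kingdom value is the one quoted above.
sample = "\n".join([
    "row_id,cf_kingdom",
    "1,Organic compounds",
    "2,\"['UDFMLCREIMSEIU', '', '', '', '', '']\"",
])

for hit in find_list_like_cells(sample, ["cf_kingdom"]):
    print(hit)
```

To scan a real file, the text could be read with `open(path, newline="").read()` and the list of columns extended with `cf_direct_parent` or any other class annotation column of interest.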

@niekdejonge
Collaborator

#251 should fix this issue. I am surprised that this bug did not appear earlier, but it was an easy fix. After fixing @rafaypir's issue I will do a new release.

@niekdejonge
Collaborator

#252 fixes the origin of the issue. New sqlite files still need to be generated and uploaded to Zenodo, since the mistake is contained in the sqlite files themselves.

@MarJakubec

@niekdejonge thank you for the quick fix. Do you know when the new libraries (sqlite files) will be generated? For now I have to go all the way back to the files from 2022 to run ms2query (version 2).

@niekdejonge
Collaborator

I started the run to generate them, but the positive-mode run crashed overnight due to some server issues. I will do a new release today that fixes the compound name issue, so you can run the newest version of MS2Query with the newest libraries again. It still has the issue @rafaypir noted, but luckily that is not critical for using MS2Query.

@niekdejonge
Collaborator

A new release has been made. Install (or update to) ms2query 1.5.2 to use it; it works with the latest model and library. The class annotations issue still occurs, but it will be fixed in a later version.

@niekdejonge
Collaborator

Version 1.5.3 should also fix the issue mentioned by @rafaypir. Make sure you redownload the library files (only the .sqlite file has been updated).
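
To confirm that an environment has a release with both fixes, a small check along these lines may help. This is a sketch, not part of MS2Query: `parse_version`, `has_fix` and `installed_ms2query_version` are hypothetical helper names, and the parser assumes plain numeric versions such as 1.5.3 (pre-release suffixes would need a proper version parser). It does not check whether the .sqlite file itself was redownloaded.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a plain 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_fix(installed, minimum="1.5.3"):
    """True if the installed version is at least the release named in this thread."""
    return parse_version(installed) >= parse_version(minimum)


def installed_ms2query_version():
    """Return the installed ms2query version string, or None if it is not installed."""
    try:
        return version("ms2query")
    except PackageNotFoundError:
        return None


found = installed_ms2query_version()
if found is None:
    print("ms2query is not installed in this environment")
elif has_fix(found):
    print(f"ms2query {found} includes the fixes discussed here")
else:
    print(f"ms2query {found} predates 1.5.3; consider updating")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "1.10.0" as older than "1.5.3".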
