The combined_ion.tsv format exported by FragPipe is used for analysis in DEqMS #47

Open
JXWKSO opened this issue Sep 26, 2024 · 5 comments

Comments

@JXWKSO

JXWKSO commented Sep 26, 2024

I am using the FragPipe software for database searches, and the output file combined_ion.tsv is used for quantification with directLFQ. I plan to perform differential expression analysis using DEqMS, but DEqMS requires peptide counts for each protein to adjust the quantification. How can I use the GUI to add the peptide counts for each protein to the output file, given that this information is not present in the combined_ion.tsv? Additionally, I keep encountering an error 'Could not add additional columns to protein table, printing without additional columns' whenever I try to add gene names.

@ammarcsj
Member

ammarcsj commented Oct 1, 2024

Hi, unfortunately I have no default script for adding peptide counts to the results table. If you are a little familiar with pandas, you could append the counts to the table yourself. It would go something like the code below; you will have to adapt the file names and column names.
Concerning the error when adding gene names: thanks for making me aware. I found and fixed the issue, and the fix will be included in the next release.

import pandas as pd

# Read the TSV files (adapt the file names to your own data)
combined_ion_df = pd.read_csv("combined_ion.tsv", sep="\t")
directlfq_results_df = pd.read_csv("directlfq_results_file.tsv", sep="\t")

# Create a dictionary of protein to peptide count.
# nunique() counts each distinct peptide sequence once per protein;
# use size() instead if you want the number of ion rows.
protein2count = combined_ion_df.groupby("protein")["sequence"].nunique().to_dict()

# Add the peptide counts to directlfq_results_df
directlfq_results_df["peptide_counts"] = directlfq_results_df["protein"].map(protein2count)
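
For anyone who wants to try the idea before touching real files, here is a minimal sketch on toy data. The helper name `add_peptide_counts`, its parameters, and the column names (`protein`, `sequence`, `charge`, `sample_A`) are all hypothetical placeholders, not directLFQ or FragPipe API; check the headers of your own combined_ion.tsv and directLFQ output and pass the matching names. It counts distinct peptide sequences per protein rather than rows, on the assumption that one peptide can appear in several rows (for example at different charge states).

```python
import pandas as pd


def add_peptide_counts(ion_df, protein_df,
                       ion_protein_col="protein",
                       ion_peptide_col="sequence",
                       result_protein_col="protein"):
    """Return a copy of protein_df with an added 'peptide_counts' column.

    All column names are placeholders and must be adapted to the real files.
    """
    # Number of distinct peptide sequences per protein in the ion table
    counts = ion_df.groupby(ion_protein_col)[ion_peptide_col].nunique()

    out = protein_df.copy()
    # Proteins missing from the ion table stay <NA> so they are easy to spot
    out["peptide_counts"] = out[result_protein_col].map(counts).astype("Int64")
    return out


# Toy ion table: peptide AAK is listed twice for P1 (two charge states)
ions = pd.DataFrame({
    "protein": ["P1", "P1", "P1", "P2"],
    "sequence": ["AAK", "AAK", "CCR", "DDK"],
    "charge": [2, 3, 2, 2],
})

# Toy protein-level table; P3 has no entry in the ion table
results = pd.DataFrame({
    "protein": ["P1", "P2", "P3"],
    "sample_A": [10.0, 5.0, 1.0],
})

annotated = add_peptide_counts(ions, results)
print(annotated)
```

On the toy data, P1 gets a count of 2, P2 gets 1, and P3 is left as missing. With real data, you could then write the table out with `annotated.to_csv("some_output_name.tsv", sep="\t", index=False)` (file name is a placeholder) and decide how to handle proteins without a count before passing it on.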

@JXWKSO
Author

JXWKSO commented Oct 2, 2024

Thanks for your response, and I look forward to your next update!

@ammarcsj
Member

Hi, in the newest release, the error with the gene names should now be fixed.

@enonimos

enonimos commented Nov 13, 2024

Hi, I just wanted to add a quick comment to this thread. I observe that the column annotation error still occurs when using the parquet format. It works as expected when using the tsv report. The parquet report was generated in DIA-NN 1.9.2.

@ammarcsj
Member

Thank you for this update! I will look into it.
