How to handle duplicate document IDs? #50

mrdrozdov · 2024-05-24T21:39:07Z

What if I predict a ranked list in which the same document ID appears multiple times at different positions? How can I evaluate nDCG for this using pytrec_eval, given that scores are represented as dictionaries?

seanmacavaney · 2024-05-25T08:12:07Z

Hey @mrdrozdov -- trec_eval itself checks for duplicate documents and raises an error if it finds any. So I'm not sure diverging from this behavior in the Python wrapper would make sense.
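For illustration, here is a minimal sketch of how one might catch duplicates before building the `{doc_id: score}` dictionary for a query. It is plain Python and does not call pytrec_eval; the helper name `build_run` and the sample data are hypothetical. The point is that a plain `dict` would otherwise overwrite earlier entries silently instead of raising an error.

```python
from collections import Counter


def build_run(ranked_doc_ids, scores):
    """Build a {doc_id: score} mapping, refusing duplicate document IDs.

    Hypothetical helper: the error is raised here, not by pytrec_eval.
    """
    counts = Counter(ranked_doc_ids)
    duplicates = sorted(doc_id for doc_id, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate document IDs in ranking: {duplicates}")
    return dict(zip(ranked_doc_ids, scores))


# Made-up ranking in which "d1" appears at ranks 1 and 3.
ranking = ["d1", "d2", "d1"]
scores = [3.0, 2.0, 1.0]

# Without a check, the dictionary keeps only the last score seen for "d1".
silent = dict(zip(ranking, scores))

try:
    build_run(ranking, scores)
    raised = False
except ValueError:
    raised = True
```

Whether to raise, keep the first occurrence, or keep the best score is a policy choice that depends on the task; the sketch only makes the duplicate visible.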

Even so, many measures are not well-defined in the presence of duplicate documents. E.g., you could get an nDCG score > 1 when duplicates are present. So you'd have to think carefully about what measures are potentially suitable.
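A small sketch of the nDCG > 1 case, using a textbook formulation (linear gain, log2(rank + 1) discount) written in plain Python. This is an assumption for illustration and not necessarily identical to what trec_eval computes; the function names and the toy qrels are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_doc_ids, qrels):
    """nDCG of a ranked list; the ideal ranking counts each judged doc once."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def dedupe_keep_first(ranked_doc_ids):
    """One possible policy: keep only the first occurrence of each doc ID."""
    return list(dict.fromkeys(ranked_doc_ids))


# Toy judgments: one relevant document, one non-relevant document.
qrels = {"d1": 1, "d2": 0}
with_dupes = ["d1", "d1", "d2"]

inflated = ndcg(with_dupes, qrels)                      # "d1" is credited twice
deduped = ndcg(dedupe_keep_first(with_dupes), qrels)    # "d1" is credited once
```

Here the duplicated relevant document earns gain at two ranks while the ideal ranking contains it only once, so the ratio exceeds 1. Dropping repeats brings the score back into [0, 1], but keep-first is only one option and may not suit every setting.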
