Add ability to compute trec_eval metrics directly on in-memory data structures #13
Hi Jimmy, thanks for your interest in this project. Earlier versions of trectools did indeed use trec_eval (externally calling the program and parsing its results), but we have since re-implemented many (unfortunately not all) of the trec_eval evaluation metrics. (We are trying to get them all implemented, along with new features, through an undergraduate student of Guido's whom we will try to recruit.) Please have a look at this module.

For many of the implemented metrics, we have a flag trec_eval=[True/False] (default True) that mimics trec_eval when set to True, in terms of (1) reranking the input based on the 'score' column, rather than the column with the ranking position (which is what you get if you set the flag to False); and (2) using the same implementation as trec_eval, rather than alternative implementations, as in the case of nDCG (i.e., different gain functions).

However, there are many ways to make this tool more useful for the community (e.g., result visualization/comparison, a web interface, more metrics, etc.). One straightforward thing we look forward to having is more systematic unit tests. We are currently short of time to implement many of our extensions, so it would be incredible to see people interested in contributing -- and indeed that is why we are looking for a new undergraduate/master's student to work on this. Please let us know your ideas on how to proceed.
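(For context, a minimal sketch of how this flag is typically used is below. The class and method names `TrecRun`, `TrecQrel`, `TrecEval`, and `get_ndcg` follow trectools' examples, but the exact signatures are assumptions and may differ across versions.)

```python
# Minimal sketch of evaluating a run with trectools and the trec_eval flag.
from trectools import TrecRun, TrecQrel, TrecEval

run = TrecRun("my_system.run")        # standard 6-column TREC run file
qrels = TrecQrel("my_topics.qrels")   # standard qrels file

evaluator = TrecEval(run, qrels)

# trec_eval=True (the default) re-sorts results by the 'score' column and uses
# the same nDCG formulation as trec_eval; trec_eval=False trusts the rank
# column and uses the alternative gain function instead.
ndcg_trec_eval_style = evaluator.get_ndcg(depth=10, trec_eval=True)
ndcg_alternative = evaluator.get_ndcg(depth=10, trec_eval=False)
```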
Hi @joaopalotti - Looking at your code, I see the re-implemented metrics. However, IMO, we should try to directly build bindings to the C trec_eval code itself, so that the numbers match trec_eval exactly (subtle details such as score ties are a known source of discrepancies [1]).

[1] The Impact of Score Ties on Repeatability in Document Ranking
That is correct. Note that an effort to do what you are saying was already conducted by those guys here: https://github.com/cvangysel/pytrec_eval

However, we have decided to have our own implementation as a way to (1) have finer control over aspects such as tie-breaking, formula variations, etc., and (2) quickly integrate evaluation metrics that we frequently use but that are not part of trec_eval. Our idea was to get exactly the same results as trec_eval when the parameter trec_eval is set to True.
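(For reference, pytrec_eval already exposes an in-memory, dictionary-based interface over the original C code, which is relevant to the rest of this thread. The following is a rough sketch assuming its documented `RelevanceEvaluator` API; the toy qrels/run data are made up.)

```python
import pytrec_eval

# qrels and run are plain nested dicts: query id -> doc id -> label/score.
qrel = {
    "q1": {"d1": 1, "d2": 0},
    "q2": {"d3": 2},
}
run = {
    "q1": {"d1": 1.2, "d2": 0.7},
    "q2": {"d3": 0.9, "d4": 0.4},
}

evaluator = pytrec_eval.RelevanceEvaluator(qrel, {"map", "ndcg"})
results = evaluator.evaluate(run)   # {qid: {metric: value}}
print(results["q1"]["map"])
```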
I'm happy to provide all the runs in Anserini for your unit tests!
From your paper:
This is now pretty easy... with pyserini on PyPI.
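(As an illustration only: the snippet below assumes the `SimpleSearcher` interface from early pyserini releases and a placeholder index path; the API has changed across versions, so treat the details as a sketch rather than a recipe.)

```python
# Sketch: produce a ranked list programmatically with pyserini.
from pyserini.search import SimpleSearcher

searcher = SimpleSearcher("indexes/my_lucene_index")  # placeholder index path
hits = searcher.search("black bear attacks", k=1000)

# Hits carry a docid and a score, i.e. everything needed for a TREC run line.
for rank, hit in enumerate(hits, start=1):
    print(f"{hit.docid} {rank} {hit.score:.4f}")
```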
But the real point of this issue is this: currently, as I understand it, the input to evaluation is a file. Can we make it so that we can compute evaluation metrics directly from in-memory data structures?
The question is, what should the in-memory data structures look like? A Pandas DataFrame with the standard TREC run format columns (see the sketch below)? A dictionary to support random access by qid? Something else?
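(To make the DataFrame option concrete, here is one hypothetical shape for the in-memory input, using the six standard TREC run columns. The `run_data` attribute and the no-argument `TrecRun()` constructor are assumptions about trectools' internals, not a documented API.)

```python
import pandas as pd
from trectools import TrecRun

# The standard TREC run columns, built directly in memory instead of on disk.
df = pd.DataFrame(
    [
        ("q1", "Q0", "d1", 1, 12.3, "my_system"),
        ("q1", "Q0", "d2", 2, 11.7, "my_system"),
        ("q2", "Q0", "d3", 1, 10.1, "my_system"),
    ],
    columns=["query", "q0", "docid", "rank", "score", "system"],
)

# Hypothetical usage: wrap the DataFrame without touching the filesystem.
run = TrecRun()
run.run_data = df   # assumes trectools keeps the run in a 'run_data' attribute
```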
If we can converge on something, I can even try to volunteer some of my students to contribute to this effort... :)