-
Notifications
You must be signed in to change notification settings - Fork 469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Insidious fastutil, FeatureVector, and RM3 bug: massive regression impact! #840
Comments
Does this mean there are new Anserini RM3 results corrected? And is this 'bug' actually a feature? |
Nope, I am loathe to fix this bug because all the regression numbers will change slightly. Punting for now. |
It doesn't seem right to not fix a bug because it would change numbers. Isn't this the correct, desired outcome of a bug fix? Fix the bug, update the tests..? It doesn't seem right to use / cite an RM3 implementation that is incorrect...? |
I agree, this should be fixed, but it's a question of priorities... An additional consideration is that this fix will make a bunch of papers already published - both by Waterloo and others that have started to depend on Anserini - not reproducible on master branch. This will lead to a proliferation of different numbers as "baselines" - which will all be correct, just on different versions. Yes, I understand that a proliferation of slightly different numbers is inevitable, but I'd like to hold on as long as I can... |
I would have expected the effect to be small, as only low impact terms will be ignored from each document, and then the tie breaking behaviour is not really a bug but merely an undefined property in the whole algorithm. But you write "massive impact" so maybe the effect is not small? I think the point by @daltonj is that what is now called RM3 appears not to implement RM3. I think I disagree with your last comment @lintool - I do not think that future papers should use a buggy implementation simply because previous papers did; future papers should get the algorithm they think they will use! It seems much more reasonable to have it fixed on master, and then have a branch for buggy-old-version-that-we-once-thought-implemented-RM3 for reproducibility purpose? |
Can we also update this issue quantifying the impact on MAP, and other standard metrics. How big is it? I expect the tie breaking to be small. But what about the term selection issue? I would be happy to do a code review as well as provide sample expansion term weights from the Galago implementation to compare against. |
Sorry, to clarify - "massive regression impact" means that all the regression numbers for every collection will change (we now have 25 different collections that we have regressions for)... but the changes will be small. I will quantify. |
Okay, here are the results, on Robust04:
Note that the tuned "fixed" results use the old parameter settings, without retuning. cf: https://github.com/castorini/anserini/blob/master/docs/experiments-forum2018.md For the record, these are the commands:
|
So marginal differences, phew. |
Thanks. I appreciate the fast turn around. Is there a corresponding pull request / diff to review the RM3 changes? Maybe I could take a stab at reviewing the RM3 implementation. Beyond that, I would also like to try and sync other implementations to make sure they are consistent -- e.g. the QL + RM3 for Galago vs Anserini. The terms selected and weights should be "similar". |
I was trying to upgrade
fastutil
from version 6.5.6 (an ancient version from Jun 14, 2013) to the latest, version 8.3.0, when I came across a really insidious multi-part bug. The tl;dr is that there's a bug in RM3, which will affect all regressions. Here's the full story:The class
FeatureVector
is built around the fastutilObject2FloatOpenHashMap
class, which is used by the RM3 implementation to estimate relevance models. In the current implementation, when estimating the relevance model for the feedback docs, we truncate each individual feedback document:This is the first part of the bug. Just because we ultimately want to select
fbTerms
terms for feedback doesn't mean that we should only considerfbTerms
terms from each document. This was probably done for performance reasons, although query latency really isn't affected. I checked: on my iMac Pro, query latency doesn't increase with that line removed.Now this leads to the second part of the bug: the method
pruneToSize
sorts the features by weight, but it doesn't consistently perform tie breaking. This means tie breaking is implementation specific, which means that the fastutil upgrade changed the tie-breaking behavior, which means that different terms are selected from documents, which changes the results.Insert face plam here.
So to fix this, we need to:
FeatureVector
implementation.The text was updated successfully, but these errors were encountered: