
Insidious fastutil, FeatureVector, and RM3 bug: massive regression impact! #840

Closed
lintool opened this issue Oct 25, 2019 · 10 comments
lintool commented Oct 25, 2019

I was trying to upgrade fastutil from version 6.5.6 (an ancient version from Jun 14, 2013) to the latest, version 8.3.0, when I came across a really insidious multi-part bug. The tl;dr is that there's a bug in RM3, which will affect all regressions. Here's the full story:

The class FeatureVector is built around the fastutil Object2FloatOpenHashMap class, which is used by the RM3 implementation to estimate relevance models. In the current implementation, when estimating the relevance model for the feedback docs, we truncate each individual feedback document:

docVector.pruneToSize(fbTerms);

This is the first part of the bug. Just because we ultimately want to select fbTerms terms for feedback doesn't mean that we should consider only fbTerms terms from each document. This was probably done for performance reasons, but query latency really isn't affected. I checked: on my iMac Pro, latency doesn't increase with that line removed.

Now this leads to the second part of the bug: the method pruneToSize sorts the features by weight, but it doesn't consistently perform tie breaking. This means tie breaking is implementation specific, which means that the fastutil upgrade changed the tie-breaking behavior, which means that different terms are selected from documents, which changes the results.

Insert face palm here.

So to fix this, we need to:

  1. Not prune selection from individual docs.
  2. To prevent future issues along these lines, implement consistent tie-breaking behavior in the FeatureVector implementation.
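Both fixes can be illustrated with a minimal sketch (hypothetical names, not the actual FeatureVector/RM3 code): top-k selection breaks weight ties by lexicographic term order, so the result no longer depends on the hash map's iteration order, and the relevance model is accumulated over full, untruncated document vectors with pruning applied only once at the end.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the actual Anserini code.
public class Rm3Sketch {
  // Fix 2: deterministic top-k selection. Ties on weight are broken by the
  // term's lexicographic order, so the outcome no longer depends on the hash
  // map's iteration order (which is what changed across fastutil versions).
  public static Map<String, Float> pruneToSize(Map<String, Float> weights, int k) {
    Map<String, Float> pruned = new LinkedHashMap<>();
    weights.entrySet().stream()
        .sorted(Comparator.comparing((Map.Entry<String, Float> e) -> e.getValue())
            .reversed()
            .thenComparing(Map.Entry::getKey))
        .limit(k)
        .forEach(e -> pruned.put(e.getKey(), e.getValue()));
    return pruned;
  }

  // Fix 1: accumulate weights from *untruncated* document vectors and prune
  // only once, on the combined model -- no per-document pruneToSize(fbTerms).
  public static Map<String, Float> estimateModel(List<Map<String, Float>> docVectors,
                                                 int fbTerms) {
    Map<String, Float> combined = new LinkedHashMap<>();
    for (Map<String, Float> docVector : docVectors) {
      docVector.forEach((term, w) -> combined.merge(term, w, Float::sum));
    }
    return pruneToSize(combined, fbTerms);
  }
}
```

With pruning deferred to the combined model, a term that ranks low in every individual document can still accumulate enough mass across the feedback set to be selected.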
@lintool lintool self-assigned this Oct 25, 2019
@lintool lintool changed the title Insidious fastutil, FeatureVector, and RM3: massive regression impact! Insidious fastutil, FeatureVector, and RM3 bug: massive regression impact! Oct 25, 2019

daltonj commented Dec 10, 2019

Does this mean there are corrected Anserini RM3 results? And is this 'bug' actually a feature?


lintool commented Dec 10, 2019

Nope, I am loath to fix this bug because all the regression numbers will change slightly. Punting for now.


daltonj commented Dec 10, 2019

It doesn't seem right not to fix a bug because it would change numbers. Isn't this the correct, desired outcome of a bug fix? Fix the bug, update the tests...? It doesn't seem right to use / cite an RM3 implementation that is incorrect...?


lintool commented Dec 10, 2019

I agree, this should be fixed, but it's a question of priorities...

An additional consideration is that this fix will make a bunch of papers already published - both by Waterloo and others that have started to depend on Anserini - not reproducible on master branch. This will lead to a proliferation of different numbers as "baselines" - which will all be correct, just on different versions. Yes, I understand that a proliferation of slightly different numbers is inevitable, but I'd like to hold on as long as I can...

@arjenpdevries

I would have expected the effect to be small, since only low-impact terms are ignored in each document, and the tie-breaking behaviour is not really a bug but merely an undefined property of the whole algorithm. But you write "massive impact", so maybe the effect is not small?

I think the point by @daltonj is that what is now called RM3 appears not to implement RM3. I think I disagree with your last comment @lintool - I do not think that future papers should use a buggy implementation simply because previous papers did; future papers should get the algorithm they think they are using! It seems much more reasonable to have it fixed on master, and then have a branch for buggy-old-version-that-we-once-thought-implemented-RM3 for reproducibility purposes?


daltonj commented Dec 10, 2019

Can we also update this issue to quantify the impact on MAP and other standard metrics? How big is it? I expect the tie-breaking effect to be small. But what about the term selection issue?

I would be happy to do a code review as well as provide sample expansion term weights from the Galago implementation to compare against. 


lintool commented Dec 10, 2019

Sorry, to clarify - "massive regression impact" means that all the regression numbers for every collection will change (we now have 25 different collections that we have regressions for)... but the changes will be small. I will quantify.


lintool commented Dec 10, 2019

Okay, here are the results, on Robust04:

AP                          Paper 1   Paper 2
BM25+RM3 (default)          0.2903    0.2903
BM25+RM3 (default): fixed   0.2920    0.2920
BM25+RM3 (tuned)            0.3043    0.3021
BM25+RM3 (tuned): fixed     0.3004    0.2989

Note that the tuned "fixed" results use the old parameter settings, without retuning.

cf: https://github.com/castorini/anserini/blob/master/docs/experiments-forum2018.md

For the record, these are the commands:

python src/main/python/fine_tuning/reconstruct_robus04_tuned_run.py \
 --index lucene-index.robust04.pos+docvectors+rawdocs \
 --folds src/main/resources/fine_tuning/robust04-paper1-folds.json \
 --params src/main/resources/fine_tuning/params/params.map.robust04-paper1-folds.bm25+rm3.json \
 --output run.robust04.bm25+rm3.paper1.txt


python src/main/python/fine_tuning/reconstruct_robus04_tuned_run.py \
 --index lucene-index.robust04.pos+docvectors+rawdocs \
 --folds src/main/resources/fine_tuning/robust04-paper2-folds.json \
 --params src/main/resources/fine_tuning/params/params.map.robust04-paper2-folds.bm25+rm3.json \
 --output run.robust04.bm25+rm3.paper2.txt


eval/trec_eval.9.0.4/trec_eval src/main/resources/topics-and-qrels/qrels.robust04.txt run.robust04.bm25+rm3.paper1.txt

eval/trec_eval.9.0.4/trec_eval src/main/resources/topics-and-qrels/qrels.robust04.txt run.robust04.bm25+rm3.paper2.txt

@arjenpdevries

So marginal differences, phew.


daltonj commented Dec 11, 2019

Thanks. I appreciate the fast turnaround.

Is there a corresponding pull request / diff to review the RM3 changes? Maybe I could take a stab at reviewing the RM3 implementation.

Beyond that, I would also like to try and sync other implementations to make sure they are consistent -- e.g. the QL + RM3 for Galago vs Anserini. The terms selected and weights should be "similar".
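A cross-implementation check like that could start from something as simple as the overlap between the expansion term sets the two systems produce for the same query. A hypothetical sketch (names are illustrative, not from Galago or Anserini):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: Jaccard overlap between the expansion term sets
// produced by two RM3 implementations (e.g. Galago vs. Anserini).
// A low overlap for the same query would flag a divergence worth inspecting.
public class ExpansionOverlap {
  public static double jaccard(Map<String, Float> a, Map<String, Float> b) {
    Set<String> inter = new HashSet<>(a.keySet());
    inter.retainAll(b.keySet());
    Set<String> union = new HashSet<>(a.keySet());
    union.addAll(b.keySet());
    // Two empty expansion models are treated as identical.
    return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
  }
}
```

Comparing the weights of the shared terms (e.g. by maximum absolute difference after normalization) would be the natural second step.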

lintool added a commit that referenced this issue Sep 18, 2022
Good time to fix this bug, given that Lucene 8->9 transition is already disruptive.
@lintool lintool closed this as completed Sep 18, 2022
crystina-z pushed a commit to crystina-z/anserini that referenced this issue Oct 28, 2022