Reference key in search results isn't of the same type as the actual key. #117

kkirby · 2014-11-13T15:23:37Z

In the result array after searching, the ref key isn't of the same type as the actual data. Take this example:

var index = lunr(function(){
    this.ref('id');
    this.field('firstName');
    this.field('lastName');
})
index.add({
    id: 1,
    firstName: 'John',
    lastName: 'Doe'
});
index.add({
    id: 2,
    firstName: 'Jane',
    lastName: 'Doe'
});

index.search('Doe');

The result is:

[
    {
        "ref": "1",
        "score": 0.5786669855009094
    },
    {
        "ref": "2",
        "score": 0.5786669855009094
    }
]

The ref key is of a string type, even though the actual key is a number.

The text was updated successfully, but these errors were encountered:

Updated to add the actual reference instead of relying on the map's key.

olivernn · 2014-11-17T18:44:00Z

I guess most people are using strings as keys for their documents, I'm actually a little surprised that this hasn't come up as an issue before.

I'll merge this in for the 0.6.0 release.

winniehell · 2015-08-18T22:53:33Z

Are there any plans when 0.6.0 will come? 😉
This issue is really a problem when using mongoDB ObjectIds as reference, because ObjectId.toString() can not be passed to the ObjectId constructor.

winniehell · 2015-08-18T23:09:49Z

For anybody having the same problem, here is my workaround:

Given

var searchIndex = lunr(function() {
  this.field('text');
  this.ref('_id');
});

replace

searchIndex.add(doc);

with

doc._id = doc._id._str;
searchIndex.add(doc);

and replace

var searchResults = searchIndex.search(input);
frobnicate(input[0].ref);

with

var searchResults = searchIndex.search(input);
frobnicate(new ObjectId(input[0].ref));

This lets lunr use the string representation of the ObjectIds and creates new ObjectIds when retrieving search results from lunr.

olivernn · 2015-09-28T19:49:00Z

I've thought about this a bit more and I've changed my mind on how lunr should behave here. I think the right thing to do is to insist that document references are strings, and trying to add a document with a non string ref should be an error.

The reason for this is that in several places through out the code base the reference is used as a property in an object being used as a key value store. This means that the key must be a string (or at least convert nicely to a string) because Object keys are always strings.

I think it is then more consistent with the way lunr wants to see the documents to be indexed. It assumes that by the time you do idx.add(doc) it is already in the right format for your index, i.e. lunr doesn't do any conversion etc.

If you have any other ideas or suggestions then I'd like to hear them, and sorry it took so long to respond!

kkirby · 2015-09-28T20:13:25Z

I think that's a fair idea, but I do think that in the instance that the ref is a number, lunr should treat is as a string. And for getting back the ref in the proper type, is there any reason my commit would not be acceptable?

olivernn · 2015-10-05T19:32:40Z

I tested your patch but the extra hasOwnProperty check and then property lookup did have a noticeable affect on the performance of the search. I don't think this is anything at fault with the patch, just the extra work it has to do had a non-negligible impact on the performance. This is what made me think that instead of having this overhead for all searches, rather do the check up front when adding documents to the index.

Lunr could continue as it is now by coercing the ref to a string, it is the least amount of change, i.e. none, but I'm currently leaning towards being more explicit with the requirement of string ref. It avoids any oddities where the object that is used as a ref gets coerced to a non-unique string.

Enforcing a string also is more inline with the way lunr treats the document being indexed, the idea being that before indexing the document it must be in the right format for lunr, i.e. lunr doesn't do any kind of conversion or munging of the document.

kkirby · 2015-10-05T20:37:23Z

Oh, okay. I actually thought my implementation would be faster since it's using a for loop rather than a forEach call on a function for every entity. At any rate, the lookup can't be that costly, can it? Simply changing this:

Object.keys(self.tokenStore.get(key)).forEach(function (ref) { set.add(ref) })

To this:

Object.keys(self.tokenStore.get(key)).forEach(function (ref) { set.add(documents[ref].ref) })

Shouldn't be too much overhead on the CPU.

I actually ended up doing a benchmark on it and as it turns out, my method as referenced above (using the lookup) is somehow faster on my machine: http://jsperf.com/lookupvsnolookup I found it a bit hard to believe, so I also setup a test environment on my machine using benchmark.js using this source:

import Benchmark from '/Users/kkirbatski/node_modules/benchmark';
let obj, set = null;

function setup(){
    var obj = {};
    for(var i = 0; i < 100; i++){
        obj[i] = {
            ref: i
        };
    }
    var set = {
        items: [],
        add: function(item){
            this.items.push(item);
        }
    }
}

let suite = new Benchmark.Suite('abc');
suite.add(
    'noLookup',
    function(){
        Object.keys(obj).forEach(function(ref){
            console.log(ref)
            set.add(ref);
        });
    },
    {setup}
).add(
    'lookup',
    function(){
        Object.keys(obj).forEach(function(ref){
            console.log(ref)
            set.add(obj[ref].ref);
        });
    },
    {setup}
).on('complete',function(){
    console.log('complete',this.filter('fastest').pluck('name'));
}).run();

So if I'm understanding the results correctly, if we used my method above, then we would have the best of both worlds. Retaining the reference key type, and increasing the speed of lunr. Though, I do find it very possible that I did something wrong in my tests so let me know your thoughts.

olivernn · 2015-10-26T18:03:34Z

After taking another look at this I don't know why I thought there was a performance issue. Anyway, I've pushed a fix for this in version 0.6.0 that preserves the type of the ref property in the results. The only caveat is that the ref type must be comparable, this is because of the way the ref is used in lunr.SortedSet.

Anyway, let me know if you have any issues with this, and thanks again for the patch and the issue, sorry it took so long to get a fix out for this!

kkirby · 2015-10-26T18:04:47Z

No worries, thanks!

kkirby added a commit to kkirby/lunr.js that referenced this issue Nov 13, 2014

Fixed olivernn#117

9364782

Updated to add the actual reference instead of relying on the map's key.

olivernn closed this as completed Oct 26, 2015

sherbondy mentioned this issue Jan 18, 2018

Unexpected behavior: numeric type refs get converted to strings once indexed. #326

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reference key in search results isn't of the same type as the actual key. #117

Reference key in search results isn't of the same type as the actual key. #117

kkirby commented Nov 13, 2014

olivernn commented Nov 17, 2014

winniehell commented Aug 18, 2015

winniehell commented Aug 18, 2015

olivernn commented Sep 28, 2015

kkirby commented Sep 28, 2015

olivernn commented Oct 5, 2015

kkirby commented Oct 5, 2015

olivernn commented Oct 26, 2015

kkirby commented Oct 26, 2015

Reference key in search results isn't of the same type as the actual key. #117

Reference key in search results isn't of the same type as the actual key. #117

Comments

kkirby commented Nov 13, 2014

olivernn commented Nov 17, 2014

winniehell commented Aug 18, 2015

winniehell commented Aug 18, 2015

olivernn commented Sep 28, 2015

kkirby commented Sep 28, 2015

olivernn commented Oct 5, 2015

kkirby commented Oct 5, 2015

olivernn commented Oct 26, 2015

kkirby commented Oct 26, 2015