Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference key in search results isn't of the same type as the actual key. #117

Closed
kkirby opened this issue Nov 13, 2014 · 9 comments
Closed

Comments

@kkirby
Copy link

kkirby commented Nov 13, 2014

In the result array after searching, the ref key isn't of the same type as the actual data. Take this example:

var index = lunr(function(){
    this.ref('id');
    this.field('firstName');
    this.field('lastName');
})
index.add({
    id: 1,
    firstName: 'John',
    lastName: 'Doe'
});
index.add({
    id: 2,
    firstName: 'Jane',
    lastName: 'Doe'
});

index.search('Doe');

The result is:

[
    {
        "ref": "1",
        "score": 0.5786669855009094
    },
    {
        "ref": "2",
        "score": 0.5786669855009094
    }
]

The ref key is of a string type, even though the actual key is a number.

kkirby added a commit to kkirby/lunr.js that referenced this issue Nov 13, 2014
Updated to add the actual reference instead of relying on the map's key.
@olivernn
Copy link
Owner

I guess most people are using strings as keys for their documents, I'm actually a little surprised that this hasn't come up as an issue before.

I'll merge this in for the 0.6.0 release.

@winniehell
Copy link

Are there any plans when 0.6.0 will come? 😉
This issue is really a problem when using mongoDB ObjectIds as reference, because ObjectId.toString() can not be passed to the ObjectId constructor.

@winniehell
Copy link

For anybody having the same problem, here is my workaround:

Given

var searchIndex = lunr(function() {
  this.field('text');
  this.ref('_id');
});

replace

searchIndex.add(doc);

with

doc._id = doc._id._str;
searchIndex.add(doc);

and replace

var searchResults = searchIndex.search(input);
frobnicate(input[0].ref);

with

var searchResults = searchIndex.search(input);
frobnicate(new ObjectId(input[0].ref));

This lets lunr use the string representation of the ObjectIds and creates new ObjectIds when retrieving search results from lunr.

@olivernn
Copy link
Owner

I've thought about this a bit more and I've changed my mind on how lunr should behave here. I think the right thing to do is to insist that document references are strings, and trying to add a document with a non string ref should be an error.

The reason for this is that in several places through out the code base the reference is used as a property in an object being used as a key value store. This means that the key must be a string (or at least convert nicely to a string) because Object keys are always strings.

I think it is then more consistent with the way lunr wants to see the documents to be indexed. It assumes that by the time you do idx.add(doc) it is already in the right format for your index, i.e. lunr doesn't do any conversion etc.

If you have any other ideas or suggestions then I'd like to hear them, and sorry it took so long to respond!

@kkirby
Copy link
Author

kkirby commented Sep 28, 2015

I think that's a fair idea, but I do think that in the instance that the ref is a number, lunr should treat is as a string. And for getting back the ref in the proper type, is there any reason my commit would not be acceptable?

@olivernn
Copy link
Owner

olivernn commented Oct 5, 2015

I tested your patch but the extra hasOwnProperty check and then property lookup did have a noticeable affect on the performance of the search. I don't think this is anything at fault with the patch, just the extra work it has to do had a non-negligible impact on the performance. This is what made me think that instead of having this overhead for all searches, rather do the check up front when adding documents to the index.

Lunr could continue as it is now by coercing the ref to a string, it is the least amount of change, i.e. none, but I'm currently leaning towards being more explicit with the requirement of string ref. It avoids any oddities where the object that is used as a ref gets coerced to a non-unique string.

Enforcing a string also is more inline with the way lunr treats the document being indexed, the idea being that before indexing the document it must be in the right format for lunr, i.e. lunr doesn't do any kind of conversion or munging of the document.

@kkirby
Copy link
Author

kkirby commented Oct 5, 2015

Oh, okay. I actually thought my implementation would be faster since it's using a for loop rather than a forEach call on a function for every entity. At any rate, the lookup can't be that costly, can it? Simply changing this:

Object.keys(self.tokenStore.get(key)).forEach(function (ref) { set.add(ref) })

To this:

Object.keys(self.tokenStore.get(key)).forEach(function (ref) { set.add(documents[ref].ref) })

Shouldn't be too much overhead on the CPU.

I actually ended up doing a benchmark on it and as it turns out, my method as referenced above (using the lookup) is somehow faster on my machine: http://jsperf.com/lookupvsnolookup I found it a bit hard to believe, so I also setup a test environment on my machine using benchmark.js using this source:

import Benchmark from '/Users/kkirbatski/node_modules/benchmark';
let obj, set = null;

function setup(){
    var obj = {};
    for(var i = 0; i < 100; i++){
        obj[i] = {
            ref: i
        };
    }
    var set = {
        items: [],
        add: function(item){
            this.items.push(item);
        }
    }
}

let suite = new Benchmark.Suite('abc');
suite.add(
    'noLookup',
    function(){
        Object.keys(obj).forEach(function(ref){
            console.log(ref)
            set.add(ref);
        });
    },
    {setup}
).add(
    'lookup',
    function(){
        Object.keys(obj).forEach(function(ref){
            console.log(ref)
            set.add(obj[ref].ref);
        });
    },
    {setup}
).on('complete',function(){
    console.log('complete',this.filter('fastest').pluck('name'));
}).run();

So if I'm understanding the results correctly, if we used my method above, then we would have the best of both worlds. Retaining the reference key type, and increasing the speed of lunr. Though, I do find it very possible that I did something wrong in my tests so let me know your thoughts.

@olivernn
Copy link
Owner

After taking another look at this I don't know why I thought there was a performance issue. Anyway, I've pushed a fix for this in version 0.6.0 that preserves the type of the ref property in the results. The only caveat is that the ref type must be comparable, this is because of the way the ref is used in lunr.SortedSet.

Anyway, let me know if you have any issues with this, and thanks again for the patch and the issue, sorry it took so long to get a fix out for this!

@kkirby
Copy link
Author

kkirby commented Oct 26, 2015

No worries, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants