
Tokenizer returns an empty string #178

Closed
seongjaelee opened this issue Sep 21, 2015 · 3 comments

Comments

@seongjaelee

When the given text is an empty string, lunr.tokenizer should return an empty array. Instead, it returns an array containing one empty string.

This causes an error when searching if 1) the lunr pipeline is turned off and 2) doc[field.name] is an empty string, because the empty string then ends up in docTokens and allDocumentTokens:

this._fields.forEach(function (field) {
  var fieldTokens = this.pipeline.run(lunr.tokenizer(doc[field.name]))
  console.log(fieldTokens) // when the body is empty, this logs [""], not []

  docTokens[field.name] = fieldTokens
  lunr.SortedSet.prototype.add.apply(allDocumentTokens, fieldTokens)
}, this)
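
The [""] result comes from String.prototype.split: splitting "" on a separator that cannot match an empty string, such as /[\s\-]+/, yields [""] rather than []. A minimal sketch, assuming the tokenizer roughly trims, lowercases and splits on whitespace and hyphens as lunr's did around this time (naiveTokenizer is a hypothetical name, not lunr's API):

```javascript
// Hypothetical stand-in for the tokenizer logic: trim, lowercase,
// then split on runs of whitespace and hyphens. No empty-token check.
function naiveTokenizer (str) {
  if (str == null) return []
  return str.toString().trim().toLowerCase().split(/[\s\-]+/)
}

// The separator regex needs at least one character to match, so an
// empty input string is returned as a single-element array [""].
console.log(naiveTokenizer(''))            // [ '' ]
console.log(naiveTokenizer('   '))         // [ '' ]  (trim() leaves "")
console.log(naiveTokenizer('Hello World')) // [ 'hello', 'world' ]
```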
@olivernn
Owner

It looks like lunr.trimmer contains the logic that removes empty tokens. My guess is that the lunr.trimmer pipeline function gets removed for non-English use cases?
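
To see why turning the trimmer off exposes the problem, here is a minimal sketch of a trimmer-style pipeline function and a pipeline runner. Both are hypothetical reimplementations for illustration (trimmer and runPipeline are not lunr's exported functions), assuming, as in lunr's pipeline at the time, that a pipeline function drops a token by returning undefined:

```javascript
// Hypothetical trimmer-style pipeline function: strips leading and
// trailing non-word characters and returns undefined (meaning "drop
// this token") when nothing is left.
function trimmer (token) {
  var result = token.replace(/^\W+/, '').replace(/\W+$/, '')
  return result === '' ? undefined : result
}

// Hypothetical pipeline runner: passes each token through every
// function in order and discards tokens that come back undefined.
function runPipeline (fns, tokens) {
  var out = []
  tokens.forEach(function (token, i) {
    for (var j = 0; j < fns.length && token !== undefined; j++) {
      token = fns[j](token, i, tokens)
    }
    if (token !== undefined) out.push(token)
  })
  return out
}

console.log(runPipeline([trimmer], [''])) // []     (trimmer drops it)
console.log(runPipeline([], ['']))        // [ '' ] (nothing drops it)
```

Because \W here is ASCII-only, this trimmer also turns "café" into "caf", which is one plausible reason non-English setups remove it; once it is gone, nothing else in the pipeline filters the empty string.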

It seems that the tokenizer is probably the right place for this check, since the tokenizer always runs while the trimmer may not.
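
Moving the check into the tokenizer could look like the following minimal sketch. It assumes the tokenizer trims, lowercases and splits on whitespace and hyphens; tokenizer here is a hypothetical standalone function, not the actual patch to lunr.tokenizer:

```javascript
// Hypothetical tokenizer that performs the empty-token check itself,
// so its output contains no empty strings even when the pipeline is empty.
function tokenizer (str) {
  if (str == null) return []
  return str.toString()
    .trim()
    .toLowerCase()
    .split(/[\s\-]+/)
    .filter(function (token) { return token.length > 0 })
}

console.log(tokenizer(''))             // []
console.log(tokenizer('  foo - bar ')) // [ 'foo', 'bar' ]
```

Filtering after the split also catches empty strings produced by a leading separator, e.g. "-foo" splits to ["", "foo"] before the filter.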

On a slightly related note, looking at this now, it seems odd that the tokenizer and pipeline are separate objects. From what I remember they are always used together, in which case keeping them separate might not be the right split.
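
One hypothetical way to merge the two steps is a single analyzer that owns both tokenizing and the pipeline, so the empty-token check cannot be skipped by configuring the pipeline differently. createAnalyzer and analyze are illustrative names only, and the split rule (trim, lowercase, split on whitespace and hyphens) is an assumption:

```javascript
// Hypothetical analyzer that owns both steps: the tokenizer is private,
// so callers cannot run it without the pipeline or vice versa.
function createAnalyzer (pipelineFns) {
  function tokenize (str) {
    if (str == null) return []
    return str.toString().trim().toLowerCase().split(/[\s\-]+/)
  }

  return function analyze (str) {
    var out = []
    tokenize(str).forEach(function (token, i, tokens) {
      // Each pipeline function may transform the token or return undefined to drop it.
      for (var j = 0; j < pipelineFns.length && token !== undefined; j++) {
        token = pipelineFns[j](token, i, tokens)
      }
      // The empty-token check lives here, so it runs even with no pipeline functions.
      if (token !== undefined && token !== '') out.push(token)
    })
    return out
  }
}

var analyze = createAnalyzer([])
console.log(analyze(''))        // []
console.log(analyze('Foo bar')) // [ 'foo', 'bar' ]
```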

If you can put together a PR that moves the check for empty tokens from lunr.trimmer to lunr.tokenizer, I think we can fix this for now, and I'll think about merging the pipeline and tokenizer.

@olivernn
Owner

Doh, I see you already opened a PR; I'll take a look.

@olivernn
Owner

I've now pushed a fix for this in version 0.6.0. Please take a look and let me know if you are still seeing issues.

seongjaelee added a commit to seongjaelee/docquery that referenced this issue Apr 22, 2016
Lower versions make nvatom unusable: seongjaelee/nvatom#32
The bug fix is here: olivernn/lunr.js#178