Tokenizer returns an empty string #178

seongjaelee · 2015-09-21T03:19:27Z

When the given text is an empty string, the tokenizer should return an empty array. Instead, it returns an array containing an empty string.

This causes an error when searching if 1) the lunr pipeline is off and 2) doc[field.name] is an empty string.

this._fields.forEach(function (field) {
  var fieldTokens = this.pipeline.run(lunr.tokenizer(doc[field.name]))
  console.log(fieldTokens) // when body is empty, it gives [""], not [].

  docTokens[field.name] = fieldTokens
  lunr.SortedSet.prototype.add.apply(allDocumentTokens, fieldTokens)
}, this)
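
For readers who want to see how a result like [""] can arise, here is a minimal sketch. It assumes a tokenizer that simply trims, lowercases, and splits on whitespace; `naiveTokenizer` is a hypothetical stand-in written for illustration, not lunr's actual implementation.

```javascript
// Hypothetical whitespace tokenizer, for illustration only.
// It is NOT lunr's code; it only mimics the behaviour reported above.
function naiveTokenizer(text) {
  return String(text).trim().toLowerCase().split(/\s+/)
}

// Splitting an empty string on a separator that cannot match the
// empty string yields a one-element array holding "".
var emptyResult = naiveTokenizer("")
var normalResult = naiveTokenizer("Hello World")

console.log(emptyResult.length)  // 1
console.log(normalResult.length) // 2
```

With no pipeline function to drop it afterwards, that single empty token is what ends up being indexed for the field.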

olivernn · 2015-09-28T20:13:18Z

It looks like lunr.trimmer contains the logic that removes empty tokens. My guess is that the lunr.trimmer pipeline function gets removed for non-English use cases?

It seems that the tokenizer is probably the right place for this check, since it will always be performed while the trimmer may not.

On a slightly related note, looking at this now, it seems weird that the tokenizer and pipeline are different objects. From what I remember they are always used together, in which case having them separate might not be the right split.

If you can put together a PR that moves the check for empty tokens from lunr.trimmer to lunr.tokenizer I think we can fix this for now, and I'll have a think about merging the pipeline and tokenizer.
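
As a rough sketch of what doing the empty-token check in the tokenizer could look like: the function below is hypothetical (`tokenizeWithoutEmpties` is a made-up name, and the whitespace splitting is an assumption), and it is not the change that was actually merged into lunr.

```javascript
// Hypothetical tokenizer that drops empty tokens itself, so the result
// does not depend on a later pipeline step such as a trimmer.
// Illustration only; not the code that shipped in lunr.
function tokenizeWithoutEmpties(text) {
  if (text === null || text === undefined) return []

  return String(text)
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .filter(function (token) {
      // "".split(/\s+/) yields [""], so empty strings are removed here.
      return token.length > 0
    })
}

var fromEmpty = tokenizeWithoutEmpties("")
var fromBlank = tokenizeWithoutEmpties("   ")
var fromText = tokenizeWithoutEmpties("foo   Bar")

console.log(fromEmpty.length, fromBlank.length, fromText.length)
```

Because the filter runs inside the tokenizer, an empty field produces no tokens even when the pipeline is empty.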

olivernn · 2015-09-28T20:14:24Z

Doh, I see you already opened a PR; I'll take a look.

olivernn · 2015-10-26T18:00:40Z

I've now pushed a fix for this in version 0.6.0. Please take a look and let me know if you are still seeing issues.

seongjaelee mentioned this issue Sep 21, 2015

Uncaught TypeError in lunr.js seongjaelee/nvatom#32

Closed

olivernn closed this as completed Oct 26, 2015

seongjaelee added a commit to seongjaelee/docquery that referenced this issue Apr 22, 2016

Update lunr version to 0.6.0

a238655

Lower version is causing nvatom unusable: seongjaelee/nvatom#32 The bug fix is here: olivernn/lunr.js#178

seongjaelee mentioned this issue Apr 22, 2016

Update lunr version to 0.6.0 jonmagic/docquery#9

Merged
