Ch. 06 schema.xml includes solr.PatternReplaceCharFilterFactory which breaks Tokenizer #4

nycjv321 · 2015-11-17T03:54:20Z

I've noticed that if I use the provided PatternReplaceCharFilterFactory config in chapter 6, then input strings aren't properly tokenized.

For example:

<!-- other config -->
<fieldType name="text_microblog" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="([a-zA-Z])\1+"
                replacement="$1$1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.WordDelimiterFilterFactory" 
    <!-- other config -->

with the request: http://localhost:8983/solr/#/example_collection/analysis?analysis.fieldvalue=1%202&analysis.fieldtype=text_microblog&verbose_output=1. In the response, Solr does not split the string "1 2" into the two tokens "1" and "2", which is not the behavior described in the book.

If I remove the PatternReplaceCharFilterFactory like so:

<!-- other config -->
<fieldType name="text_microblog" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.WordDelimiterFilterFactory" 
    <!-- other config -->

and restart the Solr instance, the same request produces the expected response: the string "1 2" is split into the two tokens "1" and "2".
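
As a sanity check outside Solr, here is a minimal sketch of what the char filter's regex should do on its own. It is written in Python rather than Java, so the replacement is spelled `\1\1` instead of `$1$1`; I'm assuming the two regex engines behave the same for a pattern this simple. The helper names `collapse_repeats` and `whitespace_tokenize` are made up for illustration, and `str.split()` is only a rough stand-in for solr.WhitespaceTokenizerFactory, not Solr's actual code path.

```python
import re

# Same pattern as the charFilter in the schema. Python spells the
# replacement \1\1 where Solr (Java regex) uses $1$1.
PATTERN = re.compile(r"([a-zA-Z])\1+")


def collapse_repeats(text):
    """Collapse any run of a repeated ASCII letter down to two occurrences."""
    return PATTERN.sub(r"\1\1", text)


def whitespace_tokenize(text):
    """Rough stand-in for a whitespace tokenizer: split on runs of whitespace."""
    return text.split()


if __name__ == "__main__":
    for sample in ("1 2", "sooooo goooood"):
        filtered = collapse_repeats(sample)
        print(repr(sample), "->", repr(filtered), "->", whitespace_tokenize(filtered))
```

Since the pattern only matches letters, "1 2" passes through unchanged and still splits into "1" and "2" in this sketch, so the regex by itself doesn't seem to explain what I'm seeing in Solr.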

Is this expected?
