Ch. 06 schema.xml includes solr.PatternReplaceCharFilterFactory which breaks Tokenizer #4

Open
nycjv321 opened this issue Nov 17, 2015 · 0 comments


I've noticed that if I use the provided PatternReplaceCharFilterFactory config in chapter 6, then input strings aren't properly tokenized.

For example:

<!-- other config -->
<fieldType name="text_microblog" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="([a-zA-Z])\1+"
                replacement="$1$1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.WordDelimiterFilterFactory" 
    <!-- other config -->

with the request: http://localhost:8983/solr/#/example_collection/analysis?analysis.fieldvalue=1%202&analysis.fieldtype=text_microblog&verbose_output=1. In the response, Solr does not split the string "1 2" into the two tokens "1" and "2". This is not the behavior described in the book.

If I remove the PatternReplaceCharFilterFactory like so:

<!-- other config -->
<fieldType name="text_microblog" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.WordDelimiterFilterFactory" 
    <!-- other config -->

and restart the Solr instance, the request above produces the correct response: the string "1 2" is split into two tokens, "1" and "2".
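
As a sanity check, here is a minimal sketch that applies the same pattern and replacement with plain `java.util.regex`, outside of Solr. The class name `RepeatCollapseDemo` and the `collapse` helper are made up for illustration, and this only exercises the regex itself, not Solr's char filter or tokenizer. Since the pattern only matches ASCII letters, I would expect it to leave "1 2" untouched:

```java
import java.util.regex.Pattern;

public class RepeatCollapseDemo {
    // Same pattern as the charFilter config above: a letter followed by
    // one or more repeats of that same letter.
    private static final Pattern REPEATED_LETTER =
            Pattern.compile("([a-zA-Z])\\1+");

    // Hypothetical helper: collapses any run of a repeated letter to exactly
    // two occurrences, mirroring replacement="$1$1" in the config.
    public static String collapse(String input) {
        return REPEATED_LETTER.matcher(input).replaceAll("$1$1");
    }

    public static void main(String[] args) {
        // No letters in the input, so the regex has nothing to match.
        System.out.println(collapse("1 2"));
        // Runs of three or more letters are shortened to two.
        System.out.println(collapse("yummm"));
    }
}
```

If the regex alone leaves the input unchanged, the difference I'm seeing would have to come from how the char filter interacts with the tokenizer rather than from the pattern.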

Is this expected?
