- Lucene, and hence ElasticSearch, breaks strings into terms using built-in or custom tokenizers
- Some strings don't make sense to tokenize, e.g. a uuid or guid that serves as a primary key or unique identifier; splitting it into fragments only hurts exact lookups (see the _analyze sketch below)
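- To see what tokenization does to such a value, run a sample uuid through the _analyze API (a quick sketch using the standard analyzer and a made-up uuid):
curl -XPOST 'localhost:9200/_analyze?pretty=true' \
-H 'Content-Type: application/json' \
-d '
{
  "analyzer": "standard",
  "text": "550e8400-e29b-41d4-a716-446655440000"
}'
The response contains five separate tokens (550e8400, e29b, 41d4, a716, 446655440000); none of them is the full identifier.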
- keyword: ElasticSearch mapping type that suppresses tokenization (in pre-5.x versions this was the "index": "not_analyzed" option on string fields):
curl -XPUT 'localhost:9200/orders/orders/_mapping?pretty=true' \
-H 'Content-Type: application/json' \
-d '
{
  "orders": {
    "properties": {
      "id": {
        "type": "keyword"
      }
    }
  }
}'
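- With that mapping in place, index a sample order to query against (hypothetical id value):
curl -XPUT 'localhost:9200/orders/orders/1?pretty=true' \
-H 'Content-Type: application/json' \
-d '
{
  "id": "550e8400-e29b-41d4-a716-446655440000"
}'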
- What do you think the difference will be when searching or aggregating a tokenized uuid/guid property vs. a non-tokenized one?
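- In short: on a tokenized (text) field, a match query for a single fragment such as 550e8400 already hits the document, and a terms aggregation is rejected by default because fielddata is disabled on text fields; on a keyword field only the full value matches a term query, and the aggregation works out of the box. A sketch against the mapping above:
curl -XPOST 'localhost:9200/orders/_search?pretty=true' \
-H 'Content-Type: application/json' \
-d '
{
  "query": {
    "term": { "id": "550e8400-e29b-41d4-a716-446655440000" }
  },
  "aggs": {
    "ids": { "terms": { "field": "id" } }
  }
}'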
- What if I need both the tokenized and the non-tokenized version of the same field?
- Multi-Fields mapping allows indexing the same data twice, in two different ways:
curl -XPUT 'localhost:9200/ordering/orders/_mapping?pretty=true' \
-H 'Content-Type: application/json' \
-d '
{
  "orders": {
    "properties": {
      "streetName": {
        "type": "text",
        "fields": {
          "notparsed": {
            "type": "keyword"
          }
        }
      }
    }
  }
}'
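- Now both views of the same data are queryable (hypothetical street name): a match query on streetName finds any document containing the token downing, while a term query on the streetName.notparsed sub-field matches only the exact, untokenized value:
curl -XPOST 'localhost:9200/ordering/_search?pretty=true' \
-H 'Content-Type: application/json' \
-d '
{
  "query": {
    "term": { "streetName.notparsed": "10 Downing Street" }
  }
}'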