Persistence layer for Ruby domain objects in Elasticsearch, using the Repository pattern.
This library is compatible with Ruby 3.1 and higher.
The version numbers follow the Elasticsearch major versions. Currently the main branch is compatible with version 8.x of the Elasticsearch stack.
Rubygem | | Elasticsearch |
---|---|---|
0.1 | → | 1.x |
2.x | → | 2.x |
5.x | → | 5.x |
6.x | → | 6.x |
7.x | → | 7.x |
8.x | → | 8.x |
main | → | 8.x |
Install the package from Rubygems:
gem install elasticsearch-persistence
To use an unreleased version, either add it to your Gemfile for Bundler:
gem 'elasticsearch-persistence', git: 'https://github.com/elastic/elasticsearch-rails.git', branch: 'main'
or install it from a source code checkout:
git clone https://github.com/elastic/elasticsearch-rails.git
cd elasticsearch-rails/elasticsearch-persistence
bundle install
rake install
The library provides the Repository pattern for adding persistence to your Ruby objects.
The Elasticsearch::Persistence::Repository module provides an implementation of the repository pattern and allows you to save, delete, find and search objects stored in Elasticsearch, as well as configure mappings and settings for the index. It's an unobtrusive and decoupled way of adding persistence to your Ruby objects.
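To make the idea of a decoupled repository concrete without a running cluster, here is a minimal in-memory sketch. `InMemoryRepository` is hypothetical and is not part of elasticsearch-persistence; a plain Hash stands in for the Elasticsearch index. It only illustrates the pattern: the domain object knows nothing about storage, and the repository owns save, find and delete.

```ruby
# Hypothetical in-memory stand-in for a repository; NOT part of
# elasticsearch-persistence. A plain Hash plays the role of the index.
class InMemoryRepository
  def initialize(klass: nil)
    @klass = klass # class used to rebuild domain objects (optional)
    @store = {}    # id (as String) => serialized document
  end

  # Persist any object that responds to #to_hash, keyed by its id.
  def save(document)
    hash = document.to_hash
    id = (hash[:id] || hash['id']).to_s
    @store[id] = hash
    id
  end

  # Return a domain object (or the stored Hash when no klass is set), or nil.
  def find(id)
    hash = @store[id.to_s]
    return nil if hash.nil?

    @klass ? @klass.new(hash) : hash
  end

  # Accept either the object itself or its id; true if something was removed.
  def delete(document_or_id)
    id = if document_or_id.respond_to?(:to_hash)
           hash = document_or_id.to_hash
           hash[:id] || hash['id']
         else
           document_or_id
         end
    !@store.delete(id.to_s).nil?
  end
end

repository = InMemoryRepository.new
repository.save({ id: 1, text: 'Test' })
note = repository.find(1) # the stored Hash, since no klass was given
repository.delete(1)
```

Swapping the Hash for HTTP calls to Elasticsearch does not change the domain class at all, which is the point of the pattern.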
Let's define a simple plain old Ruby object (PORO):
class Note
  attr_reader :attributes
  def initialize(attributes={})
    @attributes = attributes
  end
  def to_hash
    @attributes
  end
end
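The default serialization only needs `#to_hash`. The sketch below uses plain Ruby and the standard `json` library (no cluster involved) to show what that hash looks like on the way out, and why a note read back from Elasticsearch carries string keys such as `"id"` rather than `:id`: JSON has no symbols. Treating a local JSON round trip as a stand-in for the HTTP round trip is an assumption made for illustration.

```ruby
require 'json'

class Note
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  def to_hash
    @attributes
  end
end

note = Note.new(id: 1, text: 'Test')
# The JSON that would be indexed for this note
sent = JSON.generate(note.to_hash)
# Rebuilding from parsed JSON: symbol keys come back as strings
received = Note.new(JSON.parse(sent))
```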
Let's create a default, "dumb" repository, as a first step:
require 'elasticsearch/persistence'
class MyRepository; include Elasticsearch::Persistence::Repository; end
repository = MyRepository.new
We can save a Note instance into the repository...
note = Note.new id: 1, text: 'Test'
repository.save(note)
# PUT http://localhost:9200/repository/_doc/1 [status:201, request:0.210s, query:n/a]
# > {"id":1,"text":"Test"}
# < {"_index":"repository","_id":"1","_version":1,"created":true}
...find it...
n = repository.find(1)
# GET http://localhost:9200/repository/_doc/1 [status:200, request:0.003s, query:n/a]
# < {"_index":"repository","_id":"1","_version":1,"found":true, "_source" : {"id":1,"text":"Test"}}
=> <Note:0x007fcbfc0c4980 @attributes={"id"=>1, "text"=>"Test"}>
...search for it...
repository.search(query: { match: { text: 'test' } }).first
# GET http://localhost:9200/repository/_search [status:200, request:0.005s, query:0.002s]
# > {"query":{"match":{"text":"test"}}}
# < {"took":2, ... "hits":{"total":1, ... "hits":[{ ... "_source" : {"id":1,"text":"Test"}}]}}
=> <Note:0x007fcbfc1c7b70 @attributes={"id"=>1, "text"=>"Test"}>
...or delete it:
repository.delete(note)
# DELETE http://localhost:9200/repository/_doc/1 [status:200, request:0.014s, query:n/a]
# < {"found":true,"_index":"repository","_id":"1","_version":2}
=> {"found"=>true, "_index"=>"repository", "_id"=>"1", "_version"=>2}
The repository module provides several features for configuring and customizing its behavior:
- Configuring the Elasticsearch client being used
- Setting the index name, and object class for deserialization
- Composing mappings and settings for the index
- Creating, deleting or refreshing the index
- Finding or searching for documents
- Providing access both to domain objects and hits for search results
- Providing access to the Elasticsearch response for search results (aggregations, total, ...)
- Defining the methods for serialization and deserialization
There are two mixins you can include in your Repository class. The first, Elasticsearch::Persistence::Repository, provides the basic methods and settings you'll need. The second, Elasticsearch::Persistence::Repository::DSL, adds some class methods that let you set options shared by all instances of the class.
For simple cases, you can just include the Elasticsearch::Persistence::Repository mixin in your class:
class MyRepository
  include Elasticsearch::Persistence::Repository
  # Customize the serialization logic
  def serialize(document)
    super.merge(my_special_key: 'my_special_stuff')
  end
  # Customize the de-serialization logic
  def deserialize(document)
    puts "# ***** CUSTOM DESERIALIZE LOGIC... *****"
    super
  end
end
client = Elasticsearch::Client.new(url: ENV['ELASTICSEARCH_URL'], log: true)
repository = MyRepository.new(client: client, index_name: :my_notes, klass: Note)
repository.settings number_of_shards: 1 do
  mapping do
    indexes :text, analyzer: 'snowball'
  end
end
The repository now uses the custom Elasticsearch client and index name, as well as the custom serialization and deserialization logic.
We can create the index with the desired settings and mappings:
repository.create_index! force: true
# PUT http://localhost:9200/my_notes
# > {"settings":{"number_of_shards":1},"mappings":{ ... {"text":{"analyzer":"snowball","type":"text"}}}}}
Save the document with extra properties added by the serialize method:
repository.save(note)
# PUT http://localhost:9200/my_notes/_doc/1
# > {"id":1,"text":"Test","my_special_key":"my_special_stuff"}
=> {"_index"=>"my_notes", "_id"=>"1", "_version"=>4, ... }
And deserialize it:
repository.find(1)
# ***** CUSTOM DESERIALIZE LOGIC... *****
=> <Note:0x007f9bd782b7a0 @attributes={... "my_special_key"=>"my_special_stuff"}>
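The serialize override above relies on `super` returning the default hash. The sketch below reproduces that composition in plain Ruby; `DefaultSerialization` is a hypothetical module standing in for the repository's default (which, per this README, serializes via `#to_hash`). It also shows why `merge` rather than `merge!` matters: for a `Note`, `to_hash` returns the object's own `@attributes` hash, so `merge!` would write the extra key into the domain object itself.

```ruby
# Hypothetical stand-in for the repository's default serialization,
# used only to show how `super` composes in an override.
module DefaultSerialization
  def serialize(document)
    document.to_hash
  end
end

class SpecialKeyRepository
  include DefaultSerialization

  # Start from the default hash and add a key; `merge` returns a new Hash,
  # leaving the document's own attributes untouched.
  def serialize(document)
    super.merge(my_special_key: 'my_special_stuff')
  end
end

doc = { id: 1, text: 'Test' } # Hash#to_hash returns the Hash itself
serialized = SpecialKeyRepository.new.serialize(doc)
```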
In some cases, you'll want to set some of the repository configuration at the class level. This makes most sense when all instances of the repository share the same configuration:
require 'base64'
class NoteRepository
  include Elasticsearch::Persistence::Repository
  include Elasticsearch::Persistence::Repository::DSL
  index_name 'notes'
  klass Note
  settings number_of_shards: 1 do
    mapping do
      indexes :text, analyzer: 'snowball'
      # Do not index images
      indexes :image, index: false
    end
  end
  # Base64 encode the "image" field in the document
  #
  def serialize(document)
    hash = document.to_hash.clone
    hash['image'] = Base64.encode64(hash['image']) if hash['image']
    hash
  end
  # Base64 decode the "image" field in the document
  #
  def deserialize(document)
    hash = document['_source']
    hash['image'] = Base64.decode64(hash['image']) if hash['image']
    klass.new hash
  end
end
You can create an instance of this custom class and read back each of its configured values:
client = Elasticsearch::Client.new(url: 'http://localhost:9200', log: true)
repository = NoteRepository.new(client: client)
repository.index_name
# => 'notes'
You can also override the default configuration with options passed to the #initialize method:
client = Elasticsearch::Client.new(url: 'http://localhost:9200', log: true)
client.transport.logger.formatter = proc { |s, d, p, m| "\e[2m# #{m}\n\e[0m" }
repository = NoteRepository.new(client: client, index_name: 'notes_development')
repository.create_index!(force: true)
note = Note.new('id' => 1, 'text' => 'Document with image', 'image' => '... BINARY DATA ...')
repository.save(note)
# PUT http://localhost:9200/notes_development/_doc/1
# > {"id":1,"text":"Document with image","image":"Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"}
puts repository.find(1).attributes['image']
# GET http://localhost:9200/notes_development/_doc/1
# < {... "_source" : { ... "image":"Li4uIEJJTkFSWSBEQVRBIC4uLg==\n"}}
# => ... BINARY DATA ...
Each of the following configurations can be set for a repository instance. If you have included the Elasticsearch::Persistence::Repository::DSL mixin, you can use the class-level DSL methods to set each value, and still override the configuration for any instance by passing options to the #initialize method.
Even if you don't use the DSL mixin, you can set the instance configuration with options passed to the #initialize method.
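The lookup order described above (an #initialize option wins, then the class-level DSL value, then a built-in default) can be sketched in plain Ruby. `ConfigSketch`, `PlainRepo` and `DslRepo` are hypothetical names and this is not the library's source; the `'repository'` default comes from the index_name description in this README.

```ruby
# Hypothetical sketch of the configuration lookup order; not library code.
class ConfigSketch
  DEFAULT_INDEX_NAME = 'repository'

  class << self
    # Class-level "DSL": `index_name 'notes'` sets the value, `index_name` reads it.
    def index_name(name = nil)
      @index_name = name if name
      @index_name
    end
  end

  def initialize(options = {})
    @options = options
  end

  # Instance option first, then the class-level value, then the default.
  def index_name
    @options[:index_name] || self.class.index_name || DEFAULT_INDEX_NAME
  end
end

class PlainRepo < ConfigSketch; end

class DslRepo < ConfigSketch
  index_name 'notes'
end
```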
The repository uses the standard Elasticsearch client.
client = Elasticsearch::Client.new(url: 'http://search.server.org')
repository = NoteRepository.new(client: client)
repository.client.transport.logger = Logger.new(STDERR)
repository.client
# => Elasticsearch::Client
or with the DSL mixin:
class NoteRepository
  include Elasticsearch::Persistence::Repository
  include Elasticsearch::Persistence::Repository::DSL
  client Elasticsearch::Client.new url: 'http://search.server.org'
end
repository = NoteRepository.new
repository.client
# => Elasticsearch::Client
The index_name method specifies the Elasticsearch index to use for storage, lookup and search. The default index name is 'repository'.
repository = NoteRepository.new(index_name: 'notes_development')
repository.index_name
# => 'notes_development'
or with the DSL mixin:
class NoteRepository
  include Elasticsearch::Persistence::Repository
  include Elasticsearch::Persistence::Repository::DSL
  index_name 'notes_development'
end
repository = NoteRepository.new
repository.index_name
# => 'notes_development'
The klass method specifies the Ruby class to use when initializing objects from documents retrieved from the repository. If this value is not set, a Hash representation of the document is returned instead.
repository = NoteRepository.new(klass: Note)
repository.klass
# => Note
or with the DSL mixin:
class NoteRepository
  include Elasticsearch::Persistence::Repository
  include Elasticsearch::Persistence::Repository::DSL
  klass Note
end
repository = NoteRepository.new
repository.klass
# => Note
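The klass fallback can be illustrated with a small plain-Ruby sketch: with a class, the document's `_source` is wrapped in an object; without one, the Hash itself is returned. `SimpleNote` and `deserialize_with` are hypothetical names chosen for this example, not library methods.

```ruby
# Hypothetical stand-in for a domain class such as Note.
class SimpleNote
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end
end

# Hypothetical sketch of the klass fallback described above.
def deserialize_with(klass, document)
  source = document['_source']
  klass ? klass.new(source) : source
end

document = { '_id' => '1', '_source' => { 'id' => 1, 'text' => 'Test' } }
as_object = deserialize_with(SimpleNote, document)
as_hash   = deserialize_with(nil, document)
```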
The settings and mappings methods, provided by the elasticsearch-model gem, allow you to configure the index properties:
repository.settings number_of_shards: 1
repository.settings.to_hash
# => {:number_of_shards=>1}
repository.mappings { indexes :title, analyzer: 'snowball' }
repository.mappings.to_hash
# => { :note => {:properties=> ... }}
or with the DSL mixin:
class NoteRepository
  include Elasticsearch::Persistence::Repository
  include Elasticsearch::Persistence::Repository::DSL
  mappings { indexes :title, analyzer: 'snowball' }
  settings number_of_shards: 1
end
repository = NoteRepository.new
You can also use the #create method to instantiate a repository and set its mappings and settings with a block, in one call:
repository = NoteRepository.create(index_name: 'notes_development') do
  settings number_of_shards: 1, number_of_replicas: 0 do
    mapping dynamic: 'strict' do
      indexes :foo do
        indexes :bar
      end
      indexes :baz
    end
  end
end
The convenience methods create_index!, delete_index! and refresh_index! allow you to manage the index lifecycle. These methods can only be called on repository instances; they are not implemented at the class level.
The serialize and deserialize methods allow you to customize the serialization of the document when it is persisted to Elasticsearch, and define the initialization procedure when loading it from the storage:
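require 'active_support/core_ext/hash/keys'
class NoteRepository
  include Elasticsearch::Persistence::Repository
  # Build a new Hash with an upcased title; `upcase!` would mutate the domain object itself
  def serialize(document)
    document.to_hash.to_h { |k, v| [k, k == :title ? v.upcase : v] }
  end
  # Build the object from the stored _source, with (deeply) symbolized keys
  def deserialize(document)
    MyNote.new document['_source'].deep_symbolize_keys
  end
end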
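The title transformation can be exercised on its own. This plain-Ruby sketch returns a new Hash rather than calling `upcase!` on the stored value; an in-place `upcase!` would also change the domain object's own title, and raises `FrozenError` when the string is frozen. `upcase_title` is a hypothetical helper name.

```ruby
# Hypothetical helper: upcase the :title value while building a new Hash,
# leaving the input (and the strings inside it) unchanged.
def upcase_title(attributes)
  attributes.to_h { |k, v| [k, k == :title ? v.upcase : v] }
end

attrs = { id: 1, title: 'Quick Brown Fox' }
serialized = upcase_title(attrs)
```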
The save method allows you to store a domain object in the repository:
note = Note.new id: 1, title: 'Quick Brown Fox'
repository.save(note)
# => {"_index"=>"notes_development", "_id"=>"1", "_version"=>1, "created"=>true}
The update method allows you to perform a partial update of a document in the repository.
Use either a partial document:
repository.update id: 1, title: 'UPDATED', tags: []
# => {"_index"=>"notes_development", "_id"=>"1", "_version"=>2}
Or a script (optionally with parameters):
repository.update 1, script: { source: 'if (!ctx._source.tags.contains(params.t)) { ctx._source.tags.add(params.t) }', params: { t: 'foo' } }
# => {"_index"=>"notes_development", "_id"=>"1", "_version"=>3}
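For reference, the Elasticsearch Update API accepts either a partial document under `doc` or a `script` object with `source` and `params`; in Painless, parameters are read through `params`, and a list field is extended with `add`. The helpers below are hypothetical and only build those request bodies as Ruby Hashes; nothing is sent to a cluster, and they do not show what the repository sends internally.

```ruby
# Hypothetical helper: partial-document update body.
def partial_update_body(fields)
  { doc: fields }
end

# Hypothetical helper: scripted update body with parameters.
def script_update_body(source, params = {})
  { script: { source: source, params: params } }
end

# Painless: append the tag only if it is not already present
append_tag = 'if (!ctx._source.tags.contains(params.t)) { ctx._source.tags.add(params.t) }'
body = script_update_body(append_tag, { t: 'foo' })
```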
The delete method allows you to remove objects from the repository (pass either the object itself or its ID):
repository.delete(note)
repository.delete(1)
The find method allows you to find one or many documents in the storage and returns them as deserialized Ruby objects:
repository.save Note.new(id: 2, title: 'Fast White Dog')
note = repository.find(1)
# => <MyNote ... QUICK BROWN FOX>
notes = repository.find(1, 2)
# => [<MyNote... QUICK BROWN FOX>, <MyNote ... FAST WHITE DOG>]
When a document with a given ID isn't found, nil is returned in its place:
notes = repository.find(1, 3, 2)
# => [<MyNote ...>, nil, <MyNote ...>]
Handle the missing objects in the application code, or call compact on the result.
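Because the results keep the order of the requested IDs, with nil placeholders for missing documents, the IDs that were not found can be recovered by pairing the two lists. `split_found` is a hypothetical helper, and plain strings stand in for deserialized notes.

```ruby
# Hypothetical helper: separate found documents from the IDs that were missing,
# relying on results being in the same order as the requested IDs.
def split_found(ids, results)
  missing = ids.zip(results).select { |_id, doc| doc.nil? }.map(&:first)
  [results.compact, missing]
end

found, missing = split_found([1, 3, 2], ['note 1', nil, 'note 2'])
```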
The search method is used to retrieve objects from the repository by a query string or definition in the Elasticsearch DSL:
repository.search('fox or dog').to_a
# GET http://localhost:9200/notes_development/_search?q=fox+or+dog
# => [<MyNote ... FOX ...>, <MyNote ... DOG ...>]
repository.search(query: { match: { title: 'fox dog' } }).to_a
# GET http://localhost:9200/notes_development/_search
# > {"query":{"match":{"title":"fox dog"}}}
# => [<MyNote ... FOX ...>, <MyNote ... DOG ...>]
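The logged requests above suggest how the two call styles map onto request parameters: a String is sent as the `q` query-string parameter, and a Hash becomes the request body. The sketch below is a simplified, hypothetical illustration of that mapping (`search_request` is not a library method); it only builds the parameter Hash.

```ruby
# Hypothetical sketch: map a query String or a DSL Hash onto search parameters.
def search_request(index_name, query_or_definition, options = {})
  request = { index: index_name }
  case query_or_definition
  when String then request[:q] = query_or_definition
  when Hash   then request[:body] = query_or_definition
  else raise ArgumentError, "expected a String or Hash, got #{query_or_definition.class}"
  end
  request.merge(options)
end
```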
The returned object is an instance of the Elasticsearch::Persistence::Repository::Response::Results class, which provides access to the results, the full returned response and hits.
results = repository.search(query: { match: { title: 'fox dog' } })
# Iterate over the objects
#
results.each do |note|
  puts "* #{note.attributes[:title]}"
end
# * QUICK BROWN FOX
# * FAST WHITE DOG
# Iterate over the objects and hits
#
results.each_with_hit do |note, hit|
  puts "* #{note.attributes[:title]}, score: #{hit._score}"
end
# * QUICK BROWN FOX, score: 0.29930896
# * FAST WHITE DOG, score: 0.29930896
# Get total results
#
results.total
# => 2
# Access the raw response as a Hashie::Mash instance.
# Note that a Hashie::Mash will only be created if the 'response' method is called on the results.
results.response._shards.failed
# => 0
# Access the raw response
results.raw_response
# => {...}
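To show what such a wrapper does with the raw response, here is a simplified, hypothetical `ResultsSketch` (not the library's Results class) built from a response Hash. It assumes the Elasticsearch 7+ response shape, where `hits.total` is an object with a `value` key, and falls back to a plain integer total for older responses.

```ruby
# Hypothetical, simplified results wrapper over a raw search response Hash.
class ResultsSketch
  include Enumerable

  def initialize(raw_response, &deserialize)
    @raw = raw_response
    @deserialize = deserialize # turns one hit into a domain object
  end

  def hits
    @raw.dig('hits', 'hits') || []
  end

  # Elasticsearch 7+ returns {"value" => n, "relation" => ...}; older versions an Integer.
  def total
    t = @raw.dig('hits', 'total')
    t.is_a?(Hash) ? t['value'] : t
  end

  def each(&block)
    hits.map { |hit| @deserialize.call(hit) }.each(&block)
  end

  def each_with_hit
    hits.each { |hit| yield @deserialize.call(hit), hit }
  end
end

raw = { 'hits' => { 'total' => { 'value' => 2, 'relation' => 'eq' },
                    'hits' => [{ '_id' => '1', '_score' => 0.3, '_source' => { 'title' => 'QUICK BROWN FOX' } },
                               { '_id' => '2', '_score' => 0.3, '_source' => { 'title' => 'FAST WHITE DOG' } }] } }
results = ResultsSketch.new(raw) { |hit| hit['_source'] }
```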
An example Sinatra application is available in examples/notes/application.rb and demonstrates a rich set of features:
- How to create and configure a custom repository class
- How to work with a plain Ruby class as the domain object
- How to integrate the repository with a Sinatra application
- How to write complex search definitions, including pagination, highlighting and aggregations
- How to use search results in the application view
The ActiveRecord pattern has been deprecated as of version 6.0.0 of this gem. Please use the Repository Pattern instead. For more information on migrating 5.x ActiveRecord-based applications to use the Repository Pattern, please see this blog post.
This software is licensed under the Apache 2 license, quoted below.
Licensed to Elasticsearch B.V. under one or more contributor
license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright
ownership. Elasticsearch B.V. licenses this file to you under
the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.