NEW: See elasticstore fulfilling requests using firebase. [repo]
A pluggable integration with ElasticSearch to provide advanced content searches in Firestore.
This script can:
- monitor multiple firestore
collections
andsubcollections
and add/modify/remove indexed elasticsearch data in real time transform
,filter
,include
,exclude
andmapping
functionality for each document- communicates with client completely via Firebase (no elasticsearch client required, though a query builder is recommended)
- clean up old requests
Heavily Inspired by the Realtime Database implementation (Flashlight) by the Firebase Team
NOTE
For large firebase datasets, particularly when initially starting the script, a queuing setup has been put in place to prevent elasticsearch from responding with 429
requests. The provided QUEUE_CONCURRENT
and QUEUE_DELAY
config options are in place to mitigate that issue. Please adjust them according to your hardware.
- Install and run Elasticsearch either locally or with a service
git clone https://github.com/acupofjose/elasticstore
npm install
- Supply
.env
with variables OR define them in your environment (see supplied.env.sample
) - Edit
src/references.ts
to include your configuration (see below) npm run start
Option | Type | Parameters | Return | Description |
---|---|---|---|---|
collection |
string |
n/a | n/a | Represents a single collection in firestore |
subcollection |
string |
n/a | n/a | Represents a single subcollection of a document in firestore |
index |
string or function |
snap:Firestore.DocumentSnapshot , parentSnap:Firestore.DocumentSnapshot |
string |
Used by elasticsearch, index records will be placed under |
mappings |
object |
n/a | n/a | Used by elasticsearch, this should be an object containing {fieldName : {type : ELASTICSEARCH_FIELD_TYPE }} |
include |
Array<string> |
n/a | string[] |
Fields from firestore to be included in records passed to elasticsearch` |
exclude |
Array<string> |
n/a | string[] |
Fields from firestore to be excluded in records passed to elasticsearch` |
builder |
function |
Firestore.CollectionReference |
query | Builds the collection query that firestore will bind to and insert records to elasticsearch from |
subBuilder |
function |
Firestore.CollectionReference |
query | Builds the subcollection query that firestore will bind to and insert records to elasticsearch from |
filter |
function |
Firestore.DocumentData |
boolean | Run on an individual firestore record, if it returns false, the record will not be inserted |
transform |
function |
data: {[key: string]: any} , parentSnap:Firestore.DocumentSnapshot |
object | Transform data recieved from firestore to an object passed along to elasticsearch (run after filtering) |
onItemUpserted |
function or Promise |
data: {[key: string]: any} , parentSnap:Firestore.DocumentSnapshot , client: Elasticsearch.Client |
void | Callback after an item has been upserted to Elasticsearch |
So for instance, maybe I want to index a collection called groups
that does a tranform
ation on the data received from firestore, and maps a firestore geopoint
to an elasticsearch geo_point
// firestore (in the console)
groups: {
12341235: {
title: "Group Name", // string
description: "I'm a group", // string
location: "32,-74" // geo_point
createdAt: "9/4/2018 00:00:00 GMT-0" // date
}
}
// references (in ./src/references.ts)
{
....
{
collection: "groups",
index: "groups",
mappings: {
location: {
type: "geo_point" // elasticsearch's definition of a geopoint
}
},
transform: (data, parent) => ({
...data,
location: `${data.location._latitude},${data.location._longitude}` // transform from firestore's geopoint to elasticsearch's
})
},
....
}
When this repo was first made, Firestore had some limitations on how collections worked. You could have a root collection and every root Collection could have a Subcollection. Subcollections could not not have Subcollections.
This repo supports the use of subcollections with the note that this is expensive. Why? Because when Firestore returns a Collection Query it does not return the Subcollection's data, meaning, to listen to changes on a Subcollection requires a listener per Subcollection instance.
Have a need for a Subcollection from 400 Collection instances? That's 400 listeners. AFAIK there's not a better way to do this given Firebase's current API.
So with those caveats, the repo allows you to specify some ways to narrow how many listeners you register by narrowing your queries.
For Example
// Firestore Data Model
{
// Collection
"users" : {
"UUID-1" : {
email: "[email protected]",
isPremium: true,
// Subcollection
"profile": {
"firstName": "Romeo",
"public": false
....
}
}
"UUID-2" : {
email: "juliet@example.com",
isPremium: false,
// Subcollection
"profile": {
"firstName": "Juliet"
"public": true,
...
}
}
}
To listen to the profile
Subcollection on user
where the profile
is public , you'd create a reference like this:
{
collection: "users",
subcollection: "profile",
index: "user-profiles",
subBuilder: (ref) => ref.where('public', '==', true)
}
Or only profile
s where the user isPremium
and public (note that the index and type are changed, but that the change is arbitrary):
{
collection: "users",
subcollection: "profile",
index: "user-premium-profiles",
builder: (ref) => ref.where('isPremium', '==', false),
subBuilder: (ref) => ref.where('public', '==', true)
}
Elasticstore will listen to the search/
root collection for a new document containing a request
object key and a null
response
object key. Upon finding a request that is 'unfulfilled' (a null
response).
* Note that these keys are defined in the .env
file
Requests should be formed as new documents in the search collection.
Assuming you're using the node.js Firebase SDK, making an Elasticsearch request through Firebase would look something like this:
const result = await firebase
.firestore()
.collection("search")
.add({
request: {
index: "users",
q: "John", // Shorthand query syntax
},
response: null,
})
result.ref.onSnapshot((doc) => {
if (doc.response !== null) {
// Do things
}
})
Or with the normally expected Elasticsearch syntax body:
const result = await firebase.firestore().collection('search').add({
request: {
index: 'users',
body: {
query: {
match: {
"_all": "John"
}
}
},
response: null
})
result.ref.onSnapshot(doc => {
if (doc.response !== null) {
// Do things
}
})
Be aware that on large collection
s, this will need some tuning. Upon starting (and restarting) ALL data is re-indexed unless you choose to filter it yourself. This is a VERY expensive operation, as you will have to perform reads on every document you have in your collection
.
When dealing with subcollections, a listener is added for each collection
which then adds a listener to the specified subcollection
. If you don't filter these, you may end up with a large number of listeners for data that doesn't get changed very often.
Thanks goes to these wonderful people (emoji key):
Ruslan Petrov 💻 |
StoryStar 🐛 |
Y 🐛 |
Teju Nareddy 💻 |
This project follows the all-contributors specification. Contributions of any kind welcome!