When using nuxeo-elasticsearch, we want to be sure that the repository content is in sync with the content indexed in Elasticsearch.
This tool detects differences between the Nuxeo database repository and the content indexed in Elasticsearch.
Download nuxeo-esync-VERSION-capsule-full.jar from https://maven.nuxeo.org.
| Esync version | Nuxeo version | Elasticsearch version |
|---|---|---|
| 1.1.X | 7.10 | 1.5.2 |
| 2.0.X | 8.10 | 2.3.5 |
| 3.0.X | 9.10 | 5.6.4 |
| 4.0.X | 10.10 | 6.5.3 |
Starting with esync 3.0, the Elasticsearch REST client is used instead of the transport client.
To build the all-in-one jar from the sources, run:
mvn package
The jar is located here:
./target/nuxeo-esync-VERSION-capsule-full.jar
Create a /etc/esync.conf or ~/.esync.conf file using one of the provided samples:
- esync-postgresql.conf.example
- esync-mssql.conf.example
- esync-mongodb.conf.example
You will need to configure the database and Elasticsearch access.
Refer to the source for the full list of options available.
Esync requires Java 8 to run:
# using a default conf located in /etc/esync.conf or ~/.esync.conf
java -jar /path/to/nuxeo-esync-$VERSION-capsule-full.jar
# using another config file
java -jar /path/to/nuxeo-esync-$VERSION-capsule-full.jar /path/to/config-file.conf
# customizing the log
java -Dlog4j.configuration=file:mylog4j.xml -jar nuxeo-esync-$VERSION-capsule-full.jar
You can find the default log4j.xml here.
The default log file is /tmp/trace.log.
The tool runs several checkers concurrently.
Checkers compare the reference database (the expected content) with the Elasticsearch content (the actual content). You should run a full re-index of Elasticsearch before running the tool.
Checkers report different things:
- Errors such as a different number of documents, in total or per document type
- Missing or spurious document types in Elasticsearch
- Missing document ids in Elasticsearch
- Spurious document ids in Elasticsearch
- Differences in document properties such as ACL, path, etc.
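To make the expected/actual comparison concrete, here is a minimal Java sketch of running several checkers concurrently and collecting their reports. It assumes a hypothetical `Checker` interface that returns a list of discrepancy messages (empty when in sync); this is not the esync API, and the thread-pool size is arbitrary. It sticks to Java 8, the version esync requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CheckerRunner {

    /** Hypothetical checker: returns discrepancy messages, or an empty list when in sync. */
    public interface Checker extends Callable<List<String>> {
    }

    /** Runs all checkers concurrently and concatenates their reports in submission order. */
    public static List<String> runAll(List<Checker> checkers, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> reports = new ArrayList<>();
            // invokeAll blocks until every checker has completed
            for (Future<List<String>> result : pool.invokeAll(checkers)) {
                reports.addAll(result.get()); // rethrows if a checker failed
            }
            return reports;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two toy checkers: one in sync, one reporting a count mismatch
        Checker inSync = Collections::emptyList;
        Checker outOfSync = () -> Arrays.asList("total count differs: expected 10, actual 9");
        System.out.println(runAll(Arrays.asList(inSync, outOfSync), 2));
        // prints [total count differs: expected 10, actual 9]
    }
}
```

Each checker runs on a pool thread; `invokeAll` waits for all of them, so a slow checker delays the final report but not the other checkers.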
Here is a list of available checkers.
The first checker is a quick check that counts the total number of documents in the database and in Elasticsearch. It compares 4 document counts:
- documents without version and proxy
- version documents
- proxy documents
- orphan documents other than versions
False positive cases:
- this does not guarantee that the same documents are indexed, only the same number.
False negative cases:
- some system documents are not indexed (like CommentRelation or PublicationRelation)
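A minimal Java sketch of the count comparison described above, assuming the 4 counts have already been fetched from the database (expected) and from Elasticsearch (actual). The `CountKind` enum and `compare` method are hypothetical names for illustration; a missing count is treated as 0. As noted above, equal counts say nothing about which documents are indexed.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class CardinalitySketch {

    /** The 4 document counts: live documents, versions, proxies, orphans other than versions. */
    public enum CountKind { DOCUMENTS, VERSIONS, PROXIES, ORPHANS }

    /** Returns one message per count that differs between expected (database) and actual (Elasticsearch). */
    public static List<String> compare(Map<CountKind, Long> expected, Map<CountKind, Long> actual) {
        List<String> errors = new ArrayList<>();
        for (CountKind kind : CountKind.values()) {
            long exp = expected.getOrDefault(kind, 0L);
            long act = actual.getOrDefault(kind, 0L);
            if (exp != act) {
                errors.add(kind + ": expected " + exp + ", actual " + act);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Illustrative input values, not real measurements
        Map<CountKind, Long> db = new EnumMap<>(CountKind.class);
        db.put(CountKind.DOCUMENTS, 1200L);
        db.put(CountKind.VERSIONS, 300L);
        Map<CountKind, Long> es = new EnumMap<>(db);
        es.put(CountKind.VERSIONS, 298L);
        System.out.println(compare(db, es)); // prints [VERSIONS: expected 300, actual 298]
    }
}
```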
The Type Cardinality checker compares the number of documents of each document type, for both documents and versions.
False positive cases:
- this does not guarantee that the same documents are indexed, only the same number per primary type.
False negative cases:
- some system documents are not indexed and are reported as a missing type
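The per-type comparison can be sketched in Java as a comparison of two maps from type name to document count, assuming both maps are already populated. The class and method names are hypothetical. The example input includes CommentRelation, one of the system types the text says is not indexed, to show how it surfaces as a missing type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TypeCardinalitySketch {

    /** Compares per-type document counts (type name -> count) from the database and Elasticsearch. */
    public static List<String> compare(Map<String, Long> expected, Map<String, Long> actual) {
        // Union of all types seen on either side, sorted for a stable report
        Set<String> types = new TreeSet<>(expected.keySet());
        types.addAll(actual.keySet());
        List<String> errors = new ArrayList<>();
        for (String type : types) {
            Long exp = expected.get(type);
            Long act = actual.get(type);
            if (act == null) {
                errors.add("missing type in Elasticsearch: " + type);
            } else if (exp == null) {
                errors.add("spurious type in Elasticsearch: " + type);
            } else if (!exp.equals(act)) {
                errors.add(type + ": expected " + exp + ", actual " + act);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Long> db = new HashMap<>();
        db.put("File", 40L);
        db.put("Note", 12L);
        db.put("CommentRelation", 5L); // system type that is not indexed
        Map<String, Long> es = new HashMap<>();
        es.put("File", 40L);
        es.put("Note", 11L);
        System.out.println(compare(db, es));
        // prints [missing type in Elasticsearch: CommentRelation, Note: expected 12, actual 11]
    }
}
```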
When the Type Cardinality checker reports a difference for a document type, this checker compares the lists of document ids for that type to report the missing and spurious document ids.
False positive cases:
- none
False negative cases:
- none
Listing all document ids from the database can take time and memory.
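Missing and spurious ids are two set differences. The Java sketch below assumes both id lists fit in memory as sets, which is also why listing every id can be costly; the class and method names are hypothetical, not part of esync.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class DocumentIdsSketch {

    /** Ids present in the database (expected) but not indexed in Elasticsearch (actual). */
    public static Set<String> missing(Set<String> expected, Set<String> actual) {
        Set<String> result = new TreeSet<>(expected);
        result.removeAll(actual);
        return result;
    }

    /** Ids indexed in Elasticsearch but absent from the database. */
    public static Set<String> spurious(Set<String> expected, Set<String> actual) {
        Set<String> result = new TreeSet<>(actual);
        result.removeAll(expected);
        return result;
    }

    public static void main(String[] args) {
        // Both id sets are held fully in memory
        Set<String> db = new HashSet<>(Arrays.asList("id-1", "id-2", "id-3"));
        Set<String> es = new HashSet<>(Arrays.asList("id-2", "id-3", "id-4"));
        System.out.println("missing: " + missing(db, es));   // prints missing: [id-1]
        System.out.println("spurious: " + spurious(db, es)); // prints spurious: [id-4]
    }
}
```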
The ACL checker performs 2 checks:
- Checks that all documents that hold an ACL are correctly indexed in Elasticsearch
- Checks that all documents in Elasticsearch have a correct ACL
False positive cases:
- some ACLs can be more permissive in Elasticsearch
False negative cases:
- none
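The 2 ACL checks can be sketched in Java under a deliberately simplified, hypothetical model: each ACL is reduced to the set of principals allowed to read, keyed by document id. Neither this model nor the names below come from esync. The sketch uses strict set equality, so it does flag a more permissive ACL in Elasticsearch; the text does not say how the actual checker compares ACLs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AclSketch {

    /**
     * expected: document id -> principals allowed to read, computed from the database.
     * actual: document id -> principals stored in the Elasticsearch index.
     * Both are a hypothetical, simplified ACL model.
     */
    public static List<String> check(Map<String, Set<String>> expected, Map<String, Set<String>> actual) {
        List<String> errors = new ArrayList<>();
        // Check 1: every document holding an ACL in the database is indexed
        for (String id : new TreeSet<>(expected.keySet())) {
            if (!actual.containsKey(id)) {
                errors.add("ACL document not indexed: " + id);
            }
        }
        // Check 2: every indexed document carries the expected ACL
        // (documents absent from the expected map are skipped in this sketch)
        for (String id : new TreeSet<>(actual.keySet())) {
            Set<String> exp = expected.get(id);
            if (exp != null && !exp.equals(actual.get(id))) {
                errors.add("ACL differs: " + id);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> db = new HashMap<>();
        db.put("doc-1", new HashSet<>(Arrays.asList("Administrator")));
        db.put("doc-2", new HashSet<>(Arrays.asList("Administrator", "members")));
        Map<String, Set<String>> es = new HashMap<>();
        // doc-1 is not indexed; doc-2 is more permissive in the index
        es.put("doc-2", new HashSet<>(Arrays.asList("Administrator", "members", "Everyone")));
        System.out.println(check(db, es)); // prints [ACL document not indexed: doc-1, ACL differs: doc-2]
    }
}
```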
Nuxeo dramatically improves how content-based applications are built, managed and deployed, making customers more agile, innovative and successful. Nuxeo provides a next generation, enterprise ready platform for building traditional and cutting-edge content oriented applications. Combining a powerful application development environment with SaaS-based tools and a modular architecture, the Nuxeo Platform and Products provide clear business value to some of the most recognizable brands including Verizon, Electronic Arts, Netflix, Sharp, FICO, the U.S. Navy, and Boeing. Nuxeo is headquartered in New York and Paris. More information is available at www.nuxeo.com.