Build or update Elastic indices from Amphora or arbitrary data.
CLI installation:
git clone https://github.com/nymag/clay-reindexer
cd clay-reindexer
npm install -g
The clayReindex command will now be available.
Clay-reindex is a command-line and programmatic utility for building or updating Elastic documents in bulk. It:
- Reads data from a source, e.g. a list of page URIs
- Transforms each datum into an Elastic doc using built-in or user-defined transform functions
- Inserts each resulting document into a specified Elastic index
Pass every line in my-uris.txt into each transform function and PUT the merged results into Elastic index foo at Elastic host http://localhost:9200:
clayReindex --elasticHost http://localhost:9200 --elasticIndex foo --transforms mytransforms < my-uris.txt
Populate the local foo index with all pages in all sites, using built-in logic to infer some page document properties from Amphora data:
clayReindex pages --amphoraHost http://localhost:3001 --elasticHost http://localhost:9200 --elasticIndex foo --handlers myhandlers --transforms mytransforms
Do the same thing, but process only the URIs in my-uris.txt (note that this example targets index zar):
clayReindex pages --amphoraHost http://localhost:3001 --elasticHost http://localhost:9200 --elasticIndex zar --handlers myhandlers --transforms mytransforms < my-uris.txt
When no subcommand is specified, clayReindex simply processes data from stdin, transforms it, and upserts it into the specified index.
- batch: Maximum number of documents to PUT into Elastic in one request.
- elasticHost: String. Required. URL to Elastic host root, e.g. http://localhost:9200.
- elasticIndex: String. Required. Name of index to store new page docs.
- elasticPrefix: String. Optional. Name of the prefix of your Elastic indices.
- limit: Number. Optional. Limit the number of pages processed per site.
- prefix: String. Required. Clay IP or domain of any of its sites.
- transforms: String. Optional. Path to directory containing transforms. See "Transforms" below.
- verbose: Boolean. Optional. Log all HTTP requests.
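To illustrate what the batch option controls, here is a minimal sketch of splitting a list of documents into groups of at most batch items, one group per request. The toBatches name is hypothetical and this is not taken from the clay-reindexer source; it only shows the idea of capping documents per request.

```javascript
// Hypothetical helper: split docs into arrays of at most `batch` items.
// This sketches the idea behind the batch option; it is not the tool's code.
function toBatches(docs, batch) {
  if (!Number.isInteger(batch) || batch < 1) {
    throw new RangeError('batch must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += batch) {
    // slice() stops at the end of the array, so the last batch may be shorter
    batches.push(docs.slice(i, i + batch));
  }
  return batches;
}
```

Each returned group would then correspond to one request to Elastic.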
The pages subcommand makes it easier to re-index the built-in pages index provided by amphora-search. Using the subcommand changes the behavior of clayReindex in two ways:
- It automatically processes all page URIs, unless it detects data in stdin.
- It automatically generates partial documents using built-in logic that applies before user-specified transforms. This logic generates the following page document properties:
  - published: true if published version of page exists
  - publishTime: inferred from lastModified of published page
  - url: inferred from url of published page
  - scheduled: inferred from presence of page in site schedule
  - scheduledTime: inferred from site schedule
  - siteSlug: inferred from site slug as it appears in the sites index
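The publish-related properties above can be sketched roughly as follows. This is an illustration of the described inference, not the tool's actual implementation: the inferPublishedProps name is hypothetical, and returning published: false when no published version exists is an assumption.

```javascript
// Hypothetical sketch of the publish-related inference described above.
// `publishedPage` stands for the published version of a page, or null/undefined
// if none exists. Not the actual clay-reindexer logic.
function inferPublishedProps(publishedPage) {
  if (!publishedPage) {
    // Assumption: a page with no published version is marked unpublished
    return { published: false };
  }
  return {
    published: true,
    publishTime: publishedPage.lastModified, // inferred from lastModified
    url: publishedPage.url                   // inferred from url
  };
}
```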
In addition to the generic options described above, the pages subcommand provides these options:
- amphoraHost: String. Required. URL from which to retrieve Amphora data. The command automatically appends site paths and x-forwarded-host headers, so this could be the IP of your Clay server or simply the domain of any of your Clay sites.
- handlers: String. Optional. Path to directory with handler functions. See "Handlers" section below.
- limit: Number. Optional. Limit number of URIs in input that are processed.
Handlers allow you to populate fields of a page's Elastic doc based on components within the page.
Each file in the handlers folder:
- Should have a name matching a component name, e.g. clay-paragraph.js.
- Should export a function that returns, streams, or resolves an object. This object will be merged into the Elastic document.
Each handler function has this signature:
- ref: String. URI of component instance
- data: Object. Component instance data (does not have _ref)
- context: Object. See "Context Object."
Handlers are applied after custom transforms. The order of handler processing is not guaranteed.
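As an illustration, a handler file for the clay-paragraph component might look like the sketch below. The (ref, data, context) signature comes from the description above; the text field on the component data and the paragraphWordCount document property are assumptions made up for this example.

```javascript
// handlers/clay-paragraph.js (hypothetical example)
// Assumes the component data has a `text` string; that field and the
// `paragraphWordCount` property are illustrative, not defined by clay-reindexer.
function clayParagraphHandler(ref, data, context) {
  const text = typeof data.text === 'string' ? data.text : '';
  // Split on whitespace and drop empty strings so '' counts as zero words
  const words = text.trim().split(/\s+/).filter(Boolean);
  // The returned object is what gets merged into the Elastic document
  return { paragraphWordCount: words.length };
}

module.exports = clayParagraphHandler;
```

This version returns a plain object; per the description above, a handler could equally resolve or stream one.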
Transforms allow you to describe your own logic for populating the fields of a page's Elastic document.
Each file in the transforms folder should export a function that returns, streams, or resolves an object.
Each transform function has this signature:
- uri: This is the input of the reindexing process.
- doc: Object. The Elastic doc generated so far.
Note: The order of transform processing is not guaranteed.
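A transform file might look like the following sketch. The (uri, doc) signature follows the description above; the file name, the pageId property, and the assumption that input URIs contain a /_pages/ segment are all hypothetical.

```javascript
// transforms/page-id.js (hypothetical example)
// Assumes each input line is a page URI such as
// "example.com/_pages/about@published"; `pageId` is an illustrative property.
function pageIdTransform(uri, doc) {
  // Capture the segment after /_pages/, stopping at a version marker or slash
  const match = /\/_pages\/([^@/]+)/.exec(String(uri));
  // The returned object is the partial document contributed by this transform
  return { pageId: match ? match[1] : null };
}

module.exports = pageIdTransform;
```

Because the order of transform processing is not guaranteed, this example avoids depending on properties that another transform might add to doc.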