I have a collection of the following... let's say, "data structures":
[
  {
    "id": "abcd123455",
    "title": "Some title",
    "body": "Contents of the blog post..."
  },
  {
    "id": "xyz986724",
    "title": "Another great title",
    "body": "another contents..."
  }
]
These "data structures" are in my database, so I can export them in any format (HTML, text, JSON, YML...)
There are about 200-500 "data structures" per search index. They all have an unique ID (and this ID is not URL). The "body" is about one or two screens big. On the backend I have complete control, so can generate what is necessary, run the stork build command etc...
At the moment the search functionality on my site is implemented with the help of lunrjs (see lunrjs.com/guides/core_concepts.html). I am thinking about migration to Stork.
But... reading Stork documentation I get the impression that Stork is designed to index... let say 5-10 big (HTML?) pages.
So, the question is: how to use Stork to handle a list (collection? array?) of 200-500 documents?
I mean - how to use Stork in the "lunrjs scenario"?
(The first idea that comes into my mind is to generate the *.toml config file with one [[input.files]] entry for each document/"data structure" - and to put the documents into a separate file each (200-500 files!). Probably an overkill, I do not think Stork was designed for this)
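For illustration, with the two sample records above each exported to its own text file, I imagine such a config would look roughly like this (the file paths are made up, and the non-URL IDs are used as the url values, which the frontend would have to turn into real links):
[[input.files]]
path = "documents/abcd123455.txt"   # hypothetical per-document export
url = "abcd123455"                  # the non-URL ID
title = "Some title"
filetype = "PlainText"

[[input.files]]
path = "documents/xyz986724.txt"
url = "xyz986724"
title = "Another great title"
filetype = "PlainText"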
I use Stork to index the text content of about 11,000 web pages, and I use your first idea of doing some "pre-processing" to create a .toml config that contains everything I want indexed. I have a PHP script that scans the relevant HTML files and produces something like this:
[input]
frontmatter_handling = "Omit"
stemming = "None"
minimum_indexed_substring_length = 4
files = [
{ url = "/gallery/000001", title = "Petre (from Boutell's Heraldry)", contents = "this shield was used by boutell as the primary [snip....] ", filetype="PlainText" },
{ url = "/gallery/000002", title = "Boyd Garrison", contents = "shield device of [snip..] ", filetype="PlainText" },
{ url = "/gallery/000003", title = "Example of Varying Edge Types", contents = "this example demonstrates [snip..]. ", filetype="PlainText" },
[snip 11,000 additional entries]
]
This has the advantage that I can also "pre-scan" the input and take out any terms that I don't want included in the index.
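If it helps, here is a rough sketch of that pre-processing step. It is not my PHP script (which I haven't posted); it is a hypothetical Python equivalent that assumes your records have been exported to a documents.json file shaped like the example at the top of this issue:
#!/usr/bin/env python3
# Sketch only: assumes documents.json is a JSON array of
# {"id": ..., "title": ..., "body": ...} records exported from the database.
import json

with open("documents.json", encoding="utf-8") as f:
    documents = json.load(f)

lines = [
    "[input]",
    'frontmatter_handling = "Omit"',
    'stemming = "None"',
    "minimum_indexed_substring_length = 4",
    "files = [",
]

for doc in documents:
    # json.dumps escaping is also valid in TOML basic strings, so quotes and
    # newlines inside the body cannot break the generated config.
    url = json.dumps(doc["id"])        # the non-URL ID; the frontend maps it back to a link
    title = json.dumps(doc["title"])
    contents = json.dumps(doc["body"])
    lines.append(
        f'  {{ url = {url}, title = {title}, contents = {contents}, filetype = "PlainText" }},'
    )

lines.append("]")

with open("stork.toml", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
You then run the stork build command against the generated stork.toml exactly as you would for a handful of files; the entry count should mainly affect build time and the size of the resulting index.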