Dogfooding OpenSearch (documentation search) #696
I'd actually say we should make this a separate repo from project-website (for those who don't know, I maintain that repo). Just from a standpoint of contribution and commit control, it seems weird to merge a Jekyll site (mostly Markdown, HTML templates, and YAML) that emits flat files with something that has to be deployed and run actively. Maybe we can make this something that is useful beyond the project-website? Here is what I would propose (feel free to disagree):
My gut says the latter would be best built with something serverless, maybe AWS Lambda (easy, not OSS) or OpenWhisk (harder, OSS). I have misgivings about making OpenSearch directly accessible to the whole world, especially if this is supposed to be a reference implementation.
Right now, we have a beta version of the docs available at https://docs-beta.opensearch.org, and I think it is mostly ported from opendistro.github.io with OpenSearch branding. Are we going to keep evolving these docs over time, or use the documentation that came with the fork? What about Javadocs? Do we also want to make them accessible and searchable on opensearch.org or a subdomain? How are we currently generating the documentation on https://docs-beta.opensearch.org? I agree that all the code related to this reference implementation (build scripts, indexing scripts, cluster configuration, etc.) should live in a separate repo.
@stockholmux I am not very clear about this Jekyll plugin; can you elaborate on it? As for the security of the cluster, a middleware layer is definitely needed. I am more inclined towards API Gateway backed by Lambda, sitting in front of OpenSearch. I plan to write a more detailed proposal once we have answers to most of the questions above.
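A minimal sketch of what that middleware could look like, assuming an API Gateway proxy integration that calls a Python Lambda with `GET /search?q=...`. The endpoint URL, the `docs` index name, the `title`/`url`/`content` fields, and the 200-character limit are all hypothetical, and request signing or authentication to the cluster is left out. The point it illustrates is that the public never talks to OpenSearch directly: the function accepts only a short text string, wraps it in a fixed query template, and returns a trimmed response.

```python
import json
import urllib.request

# Hypothetical values: placeholders, not real resources.
OPENSEARCH_URL = "https://search.example.internal:9200"
INDEX = "docs"
MAX_QUERY_LENGTH = 200


def build_search_body(user_query):
    """Wrap the visitor's text in a fixed query template; raise ValueError if unusable."""
    text = (user_query or "").strip()
    if not text or len(text) > MAX_QUERY_LENGTH:
        raise ValueError("query must be 1-%d characters" % MAX_QUERY_LENGTH)
    # Only this fixed shape ever reaches OpenSearch; callers cannot send
    # aggregations, scripts, or requests against other indices.
    return {"size": 10, "query": {"match": {"content": text}}}


def send_to_opensearch(body):
    """POST the search body to the (hypothetical) cluster and return the parsed JSON."""
    req = urllib.request.Request(
        f"{OPENSEARCH_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def handler(event, context, send=send_to_opensearch):
    """Lambda entry point for an API Gateway proxy integration."""
    params = event.get("queryStringParameters") or {}
    try:
        body = build_search_body(params.get("q"))
    except ValueError as err:
        return {"statusCode": 400, "body": json.dumps({"error": str(err)})}
    result = send(body)
    # Return only titles and URLs, never the raw OpenSearch response.
    hits = [
        {"title": h["_source"].get("title"), "url": h["_source"].get("url")}
        for h in result.get("hits", {}).get("hits", [])
    ]
    return {"statusCode": 200, "body": json.dumps({"hits": hits})}
```

Lambda calls `handler(event, context)` with two arguments, so the default sender is used in production; the optional `send` parameter only exists so the function can be exercised without a cluster.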
@abbashus You'll need to build a Jekyll plugin or some other script that sends data to the OpenSearch index every time the website is built. This script will also need to remove the previous index or otherwise manage it. If you make a proper Jekyll plugin, it can be generalized for anyone else using Jekyll. I would start by looking at Jekyll Generators. Perhaps a generator plugin could produce a single _bulk body?
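Jekyll plugins are written in Ruby, so the following is not a plugin; it is a Python sketch of the payload such a generator could emit. It assumes each page has already been reduced to a dict with `title`, `url`, and `content` keys (a hypothetical shape) and that the target index is called `docs`. The `_bulk` API takes newline-delimited JSON: an action line followed by a source line for each document, with a trailing newline at the end of the body.

```python
import json


def build_bulk_body(pages, index="docs"):
    """Return an NDJSON _bulk body: one action line plus one source line per page."""
    lines = []
    for page in pages:
        # Using the page URL as _id means re-running the build overwrites
        # the same documents instead of duplicating them.
        lines.append(json.dumps({"index": {"_index": index, "_id": page["url"]}}))
        lines.append(json.dumps({
            "title": page["title"],
            "url": page["url"],
            "content": page["content"],
        }))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

Stable IDs take care of updated pages but not deleted ones. One way to handle the "remove the previous index" step is to load each build into a fresh index and then repoint an alias at it, so searches never see a half-built index.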
True, we would need a script to index the docs when the website is built. Another option I am considering is a custom script run by GitHub Actions that triggers on merges to the main branch. I will explore both options.
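For the GitHub Actions route, a workflow triggered with `on: push: branches: [main]` could build the site and then run a small upload script like the sketch below. The environment variable names (`OPENSEARCH_URL`, `OPENSEARCH_USER`, `OPENSEARCH_PASSWORD`) are hypothetical and would be mapped from repository secrets; HTTP basic auth is an assumption about how the cluster is secured. The script reads an NDJSON body produced earlier in the build and posts it to `_bulk`.

```python
import base64
import json
import os
import sys
import urllib.request


def build_bulk_request(endpoint, ndjson_body, user, password):
    """Prepare (but do not send) a POST /_bulk request with HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint.rstrip("/") + "/_bulk",
        data=ndjson_body.encode("utf-8"),
        headers={
            "Content-Type": "application/x-ndjson",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def main(path):
    # Hypothetical variable names; in a workflow these come from repository secrets.
    endpoint = os.environ["OPENSEARCH_URL"]
    user = os.environ["OPENSEARCH_USER"]
    password = os.environ["OPENSEARCH_PASSWORD"]
    with open(path, encoding="utf-8") as f:
        body = f.read()
    req = build_bulk_request(endpoint, body, user, password)
    with urllib.request.urlopen(req, timeout=30) as resp:
        result = json.load(resp)
    # _bulk can return HTTP 200 even when individual items fail, so check "errors".
    if result.get("errors"):
        sys.exit("some documents failed to index")


if __name__ == "__main__":
    main(sys.argv[1])
```

Failing the job on `errors` makes a partially indexed build show up as a red check on the merge instead of going unnoticed.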
GitHub Pages does not allow custom Jekyll plugins to run.
Note that we do an actual build for the website, then publish it, so custom plugins will work. This is only relevant for gh-pages branches that get automatically built and deployed by GitHub.
Since we want to index the documentation, and the documentation website is currently built by GitHub Pages (so no custom Jekyll plugins), we need a custom script that runs after the HTML files are generated.
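A post-build script along those lines could walk Jekyll's default output directory, `_site`, and pull the title and readable text out of each HTML file. This is a stdlib-only sketch; which tags count as page chrome to skip (`nav`, `header`, `footer`, plus `script` and `style`) is an assumption that would need adjusting to the actual theme.

```python
from html.parser import HTMLParser
from pathlib import Path


class PageText(HTMLParser):
    """Collect the <title> and visible body text of one generated page."""

    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract(html):
    """Return {"title", "content"} for one HTML document."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "content": " ".join(parser.chunks)}


def walk_site(site_dir="_site"):
    """Yield one document per generated HTML file; the URL is the path under the site root."""
    root = Path(site_dir)
    for path in sorted(root.rglob("*.html")):
        doc = extract(path.read_text(encoding="utf-8"))
        doc["url"] = "/" + path.relative_to(root).as_posix()
        yield doc
```

Working from the generated HTML means includes and layouts are already resolved, but it also means front matter that is not rendered into the page is no longer available.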
@abbashus I believe the documentation is moving off GitHub Pages as soon as possible.
I think the other thing you might be missing is the advantage of front matter in indexing. If we only use the HTML, we miss out on some very important optimizations. With front matter, the authors of individual pages can specify keywords or other metadata without it being rendered in the final HTML.
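To show what indexing from the source files could look like, here is a sketch that splits the `---`-delimited front matter off a Markdown page and turns it into separate document fields. Real front matter is YAML and should be read with a proper YAML parser; the tiny parser here only understands flat `key: value` lines and inline `[a, b]` lists, and the `title`/`keywords` field names are illustrative.

```python
def split_front_matter(source):
    """Split a Jekyll source file into (front matter dict, body text).

    Only flat `key: value` lines and inline `[a, b]` lists are handled; this is a
    deliberately tiny stand-in for a real YAML parser.
    """
    lines = source.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, source
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    # No closing delimiter: treat the whole file as body.
    return {}, source


def to_document(source, url):
    """Build an index document; front matter fields are indexed but never rendered."""
    meta, body = split_front_matter(source)
    return {
        "url": url,
        "title": meta.get("title", ""),
        "keywords": meta.get("keywords", []),
        "content": body.strip(),
    }
```

Keeping keywords in their own field, rather than appending them to `content`, is what lets a query weight them separately.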
@stockholmux What is the way forward for the documentation? Will it be co-hosted with the project-website and therefore built on different infrastructure? If so, then yes, we could use a Jekyll plugin.
Yes, indexing the front matter will help search relevance (e.g., tags), but I need to research whether we lose anything by indexing only the raw content (Markdown files) and not the HTML. Maybe we index both Markdown and HTML; that is something I need to explore further.
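One way front matter can feed into relevance is through field boosts at query time. The sketch below builds a `multi_match` search body that weights `title` and `tags` above body text; the field names and the boost values (3 and 2) are assumptions to be tuned, not measured settings.

```python
def docs_query(text, size=10):
    """Build a search body that weights front-matter fields above body text."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # "field^N" multiplies that field's score by N.
                "fields": ["title^3", "tags^2", "content"],
                "type": "best_fields",
            }
        },
        # Return only what a results list needs, plus a highlighted snippet.
        "_source": ["title", "url"],
        "highlight": {"fields": {"content": {}}},
    }
```

If both Markdown-derived and HTML-derived text end up indexed, they could go into separate fields and be added to the same `fields` list with their own boosts, which makes it easy to compare the two.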
@abbashus That's the rough plan as far as I know, though the documentation team hasn't settled on everything yet. Something else we may want to consider as part of the dogfooding exercise is search over the Discourse forums. A single universal search would be really keen.
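A universal search could be a single request against several indices, since OpenSearch accepts a comma-separated list of index names in the path. This sketch assumes hypothetical `docs` and `forum` indices and shows how the combined response could be split back into per-source sections using each hit's `_index`.

```python
from collections import defaultdict

# Hypothetical index names for the documentation and the Discourse forum content.
SECTIONS = {"docs": "Documentation", "forum": "Forums"}


def search_path(indices=("docs", "forum")):
    """One request can target several indices with a comma-separated list."""
    return "/" + ",".join(indices) + "/_search"


def group_hits(response):
    """Split a combined search response into per-source title lists, keeping score order."""
    grouped = defaultdict(list)
    for hit in response["hits"]["hits"]:
        label = SECTIONS.get(hit["_index"], hit["_index"])
        grouped[label].append(hit["_source"]["title"])
    return dict(grouped)
```

Relevance scores depend on term statistics within each index, so scores from very different corpora may not be directly comparable; showing the sources as separate sections sidesteps that.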
@abbashus I think I've found a hypersimple way to get the site content into OpenSearch. There is a way for Jekyll to output all content of the site into a single page. I can create a template that is in
Do we want to close this or split it into some action items?
Hi @abbashus, can you close this issue, since you already have an RFC submitted as a separate issue?
Is your feature request related to a problem? Please describe.
We want to make sure we're dogfooding OpenSearch even before we reach the general release stage. As it happens, we have a large amount of documentation for the product that we'd like to be able to search. What I'm thinking is that we'd build a search function on our OpenSearch.org website using OpenSearch.
Describe the solution you'd like
The primary criterion for this solution is that it uses OpenSearch. There are a couple of decisions we need to make (for example, do we want to use a standalone version of OpenSearch or an AWS managed service?), but overall I'm pretty flexible on the implementation. However we put it together, we should do it in such a way that it can serve as a reference implementation.
Describe alternatives you've considered
There is nothing else on the planet I'd use for search ;)
Eventually this will likely move over to the project website.