Dogfooding OpenSearch (documentation search) #696

Closed
CEHENKLE opened this issue May 13, 2021 · 14 comments
Labels
enhancement Enhancement or improvement to existing feature or request

Comments

@CEHENKLE
Member

Is your feature request related to a problem? Please describe.
We want to make sure we're dogfooding OpenSearch even before we reach the general release stage. As it happens, we have a large amount of documentation for the product that we'd like to be able to search. What I'm thinking is that we'd build a search function on our OpenSearch.org website using OpenSearch.

Describe the solution you'd like
The primary criterion for this solution is that it uses OpenSearch. There are a couple of decisions we need to make (like do you want to use a standalone version of OpenSearch or an AWS managed service?) but overall I'm pretty flexible on the implementation. However we put it together, we should do it in such a way that it can serve as a reference implementation.

Describe alternatives you've considered
There is nothing else on the planet I'd use for search ;)

Eventually this will likely move over to the project website.

@CEHENKLE CEHENKLE added the enhancement Enhancement or improvement to existing feature or request label May 13, 2021
@stockholmux
Member

I'd actually say we should make this a separate repo from project-website. [for those who don't know, I maintain that repo]

Just from a standpoint of contribution and commit control, it seems weird to merge a Jekyll site (mostly markdown, HTML templates and YAML) that emits flat files with something that has to be deployed and run actively. Maybe we can actually make this something that is useful beyond the project-website?

Here is what I would propose (feel free to disagree):

  • A Jekyll plugin that sends the content to OpenSearch at site build time.
  • An application with a REST interface that is accessible from client-side JS, plus any configuration needed to run OpenSearch and OpenSearch Dashboards.

My gut says the latter would be best built with something serverless: maybe AWS Lambda (easy, not OSS) or OpenWhisk (harder, OSS). I have misgivings about making OpenSearch directly accessible to the whole world, especially if this is supposed to be a reference implementation.

@abbashus
Contributor

abbashus commented Jun 1, 2021

Right now, we have a beta version of the docs available at https://docs-beta.opensearch.org, and I think it is mostly ported from opendistro.github.io with OpenSearch branding. Are we going to keep evolving the same docs over time, or use the documentation that came with the fork?

What about Javadocs? Do we also want to make them accessible and searchable on opensearch.org or a subdomain?

How are we currently generating the documentation on https://docs-beta.opensearch.org?

I agree that all the code related to this reference implementation should live in a separate repo (build scripts, indexing scripts, cluster configuration, etc.).

A Jekyll plugin that sends the content to OpenSearch at site build time.

@stockholmux I am not clear on this Jekyll plugin; can you elaborate?

As for the security of the cluster, middleware is definitely needed. I am more inclined toward a combination of API Gateway powered by Lambda in front of OpenSearch.
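A minimal sketch of that middleware idea in Python, assuming an API Gateway proxy integration that passes the search term as a `q` query-string parameter. The index name `docs`, the boosted fields, and the injected `search` callable (which would make the signed request to OpenSearch) are all hypothetical. The point is that the browser only ever sends a search string; the query body is built server-side, so the cluster itself never has to be publicly reachable.

```python
import json

# Hypothetical settings; none of these are agreed parts of the design.
INDEX = "docs"
FIELDS = ["title^3", "keywords^2", "content"]  # field boosts are illustrative
MAX_QUERY_LEN = 200
MAX_SIZE = 20


def build_search_body(q, size=10):
    """Turn the user's search string into a fixed, server-controlled query body."""
    q = q.strip()[:MAX_QUERY_LEN]
    size = max(1, min(size, MAX_SIZE))  # clamp the result count
    return {
        "size": size,
        "query": {"multi_match": {"query": q, "fields": FIELDS}},
        "_source": ["title", "url"],  # return only what a results page needs
    }


def make_handler(search):
    """Build a Lambda-style handler. `search(index, body)` is a hypothetical callable
    that performs the actual (signed) request to OpenSearch and returns its response."""

    def handler(event, context):
        params = event.get("queryStringParameters") or {}
        q = params.get("q") or ""
        if not q.strip():
            return {"statusCode": 400, "body": json.dumps({"error": "missing q"})}
        try:
            size = int(params.get("size", "10"))
        except (TypeError, ValueError):
            size = 10
        result = search(INDEX, build_search_body(q, size))
        return {"statusCode": 200, "body": json.dumps(result)}

    return handler
```

Injecting `search` keeps the request-shaping logic testable without a live cluster; in Lambda, the module would expose `handler = make_handler(real_search)`.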

I plan to create a more detailed proposal for the same, once we have most of the above questions answered.

@stockholmux
Member

@abbashus You'll need to build a Jekyll plugin or some other script to manage sending data to the OpenSearch index every time the website is built. This script will also need to remove the previous index or manage it otherwise.

If you make a proper Jekyll plugin, this can be generalized to anyone else using Jekyll.

I would start by looking at Jekyll Generators. Perhaps a generator plugin could produce a single _bulk body?
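To make the _bulk idea concrete: a real Jekyll generator would be written in Ruby, but the shape of the body it would need to emit can be sketched in Python. This assumes each page has already been reduced to a dict with a `url` key. Using the URL as the document `_id` (so a rebuilt page overwrites its old copy) and the index name `docs` are assumptions.

```python
import json


def to_bulk_body(pages, index="docs"):
    """Serialize pages into an OpenSearch _bulk request body (newline-delimited JSON).

    Every document contributes two lines: an action/metadata line and the
    document source itself.
    """
    lines = []
    for page in pages:
        lines.append(json.dumps({"index": {"_index": index, "_id": page["url"]}}))
        lines.append(json.dumps(page))
    # The _bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)
```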

@abbashus
Contributor

abbashus commented Jun 1, 2021

True, we would need a script to index the docs whenever the website is built. Another option I am considering is a custom script run by GitHub Actions that triggers on merge to the main branch. I will explore both options.

@abbashus
Contributor

abbashus commented Jun 7, 2021

GitHub Pages does not allow running custom Jekyll plugins.
https://jekyllrb.com/docs/plugins/installation/
jekyll/jekyll#5265

@dblock
Member

dblock commented Jun 7, 2021

GitHub Pages does not allow running custom Jekyll plugins.
https://jekyllrb.com/docs/plugins/installation/
jekyll/jekyll#5265

Note that we do an actual build for the website, then publish it, so custom plugins will work. This is only relevant for gh-pages branches that get automatically built and deployed by GitHub.

@abbashus
Contributor

abbashus commented Jun 7, 2021

Since we want to index the documentation, and the documentation-website is currently built by GitHub Pages (so no custom Jekyll plugins), we need a custom script that runs after the HTML files are generated.
https://github.com/opensearch-project/documentation-website#how-we-build-the-website
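A rough sketch of such a post-build script, assuming it runs over Jekyll's default `_site` output directory and uses only Python's standard-library `html.parser`. It keeps the page title separately and drops `<script>`/`<style>` content. Navigation menus and other template chrome would still end up in `content`, which a real script would probably need to filter out.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the <title> and the visible text of a generated page."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script/style bodies
        if self._in_title:
            self.title += data
        elif data.strip():
            self.parts.append(data.strip())


def extract(html):
    """Return a {"title", "content"} document for one HTML page."""
    p = TextExtractor()
    p.feed(html)
    p.close()
    return {"title": p.title.strip(), "content": " ".join(p.parts)}


def walk_site(site_dir="_site"):
    """Yield one document per generated HTML file, with a site-relative URL."""
    root = Path(site_dir)
    for path in sorted(root.rglob("*.html")):
        doc = extract(path.read_text(encoding="utf-8"))
        doc["url"] = "/" + path.relative_to(root).as_posix()
        yield doc
```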

@stockholmux
Member

@abbashus I believe the documentation is moving off GH Pages as soon as possible.

@stockholmux
Member

I think what you might also be missing is the advantage of front matter in indexing. If we're only using the HTML, we're missing out on some very important optimizations. With front matter, the authors of individual pages can specify keywords or other metadata without them being rendered in the final HTML.
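As an illustration of indexing front matter separately from the rendered body, here is a Python sketch that splits a Markdown source file into metadata and content. It handles only flat `key: value` front matter (a real script would use a YAML parser), and the comma-separated `keywords` field is a hypothetical convention, not something the docs currently define.

```python
import re

# A leading front matter block delimited by '---' lines.
FRONT_MATTER = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*\n", re.DOTALL)


def split_front_matter(text):
    """Return (metadata, body). Only flat `key: value` lines are parsed;
    this simplification stands in for a real YAML parser."""
    m = FRONT_MATTER.match(text)
    if not m:
        return {}, text
    meta = {}
    for line in m.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip().strip("\"'")
    return meta, text[m.end():]


def to_document(text, url):
    """Build an index document with front matter fields stored beside the body."""
    meta, body = split_front_matter(text)
    keywords = [k.strip() for k in meta.get("keywords", "").split(",") if k.strip()]
    return {
        "url": url,
        "title": meta.get("title", ""),
        "keywords": keywords,  # hypothetical field, searchable but never rendered
        "content": body.strip(),
    }
```

Keeping `keywords` as its own field lets a query boost it independently of the body text.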

@abbashus
Contributor

abbashus commented Jun 7, 2021

@stockholmux What is the way forward for documentation? Will the documentation be co-hosted with project-website and therefore built on different infrastructure? If so, yes, we could use a Jekyll plugin.

That way the authors of individual pages can specify keywords or other metadata without them being rendered in the end HTML

Yes, indexing the front matter will help search relevance (for example, tags), but I need to research whether we miss anything by indexing only the raw content (markdown files) and not the HTML. Maybe we index both markdown and HTML; that is something I need to explore further.

@stockholmux
Member

@abbashus That's the rough plan as far as I know. I know the documentation team hasn't settled on everything yet.

Something else we may want to consider as a dogfooding exercise is search over the Discourse forums. A single universal search would be really keen.

@stockholmux
Member

@abbashus I think I've found a hypersimple way to get the site content into OpenSearch.

There is a way for Jekyll to output all content of the site into a single page. I can create a template that is in _bulk format and then we can just feed this into OpenSearch via curl as part of the site deployment.
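One way to also handle replacing the previous index on every build: load the generated _bulk file into a fresh, timestamped index, then atomically move an alias to it with the `_aliases` API, so searches never hit a half-built index. The Python sketch below only builds the two HTTP requests (the equivalent of the curl calls) and hands them to an injectable `send` function. The endpoint, the `docs` alias name, and the index naming scheme are assumptions, and it assumes the generated action lines omit `_index` so the index named in the URL applies. Creating the new index with explicit mappings, checking the `errors` flag in the _bulk response, and deleting old indices are left out.

```python
import json
import time
import urllib.request


def bulk_request(endpoint, index, bulk_body):
    """POST the generated _bulk file into `index` (like curl --data-binary @file)."""
    return urllib.request.Request(
        f"{endpoint}/{index}/_bulk",
        data=bulk_body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def alias_swap_request(endpoint, alias, new_index, old_index=None):
    """Point `alias` at `new_index` (and away from `old_index`) in one atomic call."""
    actions = [{"add": {"index": new_index, "alias": alias}}]
    if old_index:
        actions.insert(0, {"remove": {"index": old_index, "alias": alias}})
    return urllib.request.Request(
        f"{endpoint}/_aliases",
        data=json.dumps({"actions": actions}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deploy(endpoint, bulk_body, alias="docs", old_index=None, send=urllib.request.urlopen):
    """Index into a new timestamped index, then swap the alias over to it."""
    new_index = f"{alias}-{int(time.time())}"
    send(bulk_request(endpoint, new_index, bulk_body))
    send(alias_swap_request(endpoint, alias, new_index, old_index))
    return new_index
```

The search middleware would then always query the alias, never a concrete index name.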

@dblock
Member

dblock commented Jul 16, 2021

Do we want to close this or split it into some action items?

@anasalkouz
Member

Hi @abbashus, can you close this issue, since you already have an RFC submitted as a separate issue?

5 participants