
Managing Index Setups & Configuration in Microservices Environment #148

Open
rursprung opened this issue Oct 31, 2022 · 25 comments
Labels
enhancement New feature or request

Comments

@rursprung

rursprung commented Oct 31, 2022

Is your feature request related to a problem? Please describe.
when deploying OpenSearch as part of a larger application fleet in an environment (in our case: Kubernetes) where any installation/update must be 100% hands-off (i.e. fully automated), and especially when the connected applications are microservices (i.e. lots of them, with various versions of the same service running in parallel due to canary upgrades or just general rolling upgrades), it's very hard to actually set up the proper index structures & general settings on OpenSearch:

  • this cannot be done with the deployment of OpenSearch itself as it doesn't know anything about its consumers (it can only ship basic configuration like TLS, authentication, etc. but doesn't know about any indices, specific roles, etc.)
  • if the consumer application takes care of an update there's an issue if multiple replicas of the same application are running in parallel - they might all try to do the setup/update and then either block or break each other
  • this is both about managing indices as well as related settings (e.g. roles and role mappings)
  • 24/7 operations must be supported, i.e. scaling down everything, doing the upgrade, applying the config updates and then starting back up is not an option

Describe the solution you'd like
there should be a way for consumer applications to manage OpenSearch indices in a similar way to what liquibase offers for RDBMS (SQL-based relational DBs): there it's possible to define upgrade scripts, and liquibase then keeps track of what has already been applied and what hasn't (by storing that information in dedicated table(s) on the DB). this can be used both for DDL (data definition language; e.g. changing tables) and DML (data manipulation language; e.g. migrating data), as well as any mixture of the two (e.g. changing an existing table schema and migrating the data in the process).

Describe alternatives you've considered

  • manually triggering any action is not an option as the installation of any component (or upgrade of any component) - once started - must be 100% automated
  • running the updates from outside (e.g. by the installation process) is not an option as it doesn't know when exactly which version of which application is booting up (this is handled by kubernetes)

Additional context

note: while this ticket has now been opened in the main OpenSearch repository i'm not sure whether the actual solution for this will be part of this repository. i could well imagine that the solution would be a dedicated application or an OpenSearch plugin.

@rursprung rursprung added enhancement New feature or request untriaged labels Oct 31, 2022
@dblock
Member

dblock commented Nov 1, 2022

@rursprung Do you think https://github.com/opensearch-project/opensearch-devops is a better place for this issue?

@rursprung
Author

@rursprung Do you think https://github.com/opensearch-project/opensearch-devops is a better place for this issue?

thanks for pointing to that repo! yes, it could well be that it's a better fit there - the question just is whether it'll garner as much interaction as it would here? but probably it'd make sense to move it.

i think this plays nicely into what i said in the meeting: there are so many repos, it's hard to keep track of where to open the ticket (and in the central ones there's so much going on that few people will check all new issues/PRs - i definitely don't have the time for that).

@dblock dblock transferred this issue from opensearch-project/OpenSearch Nov 2, 2022
@dblock
Member

dblock commented Nov 2, 2022

I've moved it. I don't know what a solution is for better tracking progress. My own GitHub notifications are wild, this is what I do, 🤷

@rursprung
Author

My own GitHub notifications are wild, this is what I do, 🤷

i went the opposite way: i've disabled all email notifications in GitHub and exclusively use the notifications page (which is a pinned tab for me in Firefox). it also offers some filter functionality, though i usually don't use it. i just have three states:

  • unread notifications: something new happened - how exciting!
  • read notifications: seen it but need to remember to do something there, so keep it visible as a constant nagging reminder
  • archived notifications: done, don't need to care about this anymore

i don't know why GitHub is hiding this behind the small notification icon in the top right corner and not advertising this any further. it took me a while to even find out that this exists.

(also, we've drifted a bit off-topic, but i thought it was interesting 😆)

@prudhvigodithi
Member

prudhvigodithi commented Nov 9, 2022

Hey @rursprung, from reading the description and forum post, may I know what specific part you want to keep track of in OpenSearch - a specific document, specific data, or, since you mentioned the system scales the pods (the data ingestion system), keeping track of the POD_NAME (which is randomly generated by k8s) or the deployment/statefulset name, so that data already ingested by this set is not re-ingested again?
If so, should the data ingestion system have a checkpoint from which to resume ingestion, rather than starting from the beginning, when a new pod is scaled up?
@bbarani

@rursprung
Author

Hey @rursprung, from reading the description and forum post, may I know what specific part you want to keep track of in OpenSearch - a specific document, specific data, or, since you mentioned the system scales the pods (the data ingestion system), keeping track of the POD_NAME (which is randomly generated by k8s) or the deployment/statefulset name, so that data already ingested by this set is not re-ingested again? If so, should the data ingestion system have a checkpoint from which to resume ingestion, rather than starting from the beginning, when a new pod is scaled up? @bbarani

hm, maybe i should've given an example in the initial post of what the intention is. this isn't about managing the surrounding applications (e.g. we're consuming Kafka messages and Kafka keeps track of what you've already consumed, so restarting an importer service has no impact, it'll automatically pick up wherever it has to) or checkpoints for data ingestion. this is about managing indices, settings & co.

here's an example use-case:

  1. you deploy a fresh OpenSearch cluster
  2. you deploy your application which will continuously import some data from elsewhere (let's call this "importer")
    • due to this being the first start it needs to create the indices, set up roles, maybe create search templates, etc.
    • you'll probably run several replicas of your importer in parallel for speed & availability and all of them might start at the same time - but only one of them should do the initial setup (otherwise you're either doing too much (best case) or actively destroying something (worst case))
  3. after a while you deploy a new version of your importer which just changed something in the logic
    • this version has no changes to the indices & co., so it's just a normal application version rollover and nothing special is needed here. great!
  4. now you deploy yet another new version of your importer, but this time you had to change some stuff, e.g. add some new fields to existing indices, create a new index and set up the roles accordingly, maybe also do a reindex of some data
    • all these changes need to be applied when this version starts up for the first time. but on subsequent startups you mustn't run these operations again as they might not all be idempotent (and even if they were: it'd be a time-expensive startup!)
    • depending on how you're doing the update it might be that several replicas of the new version start at the same time - so they might all try to do the upgrade

i hope this makes things a bit clearer?

@rursprung
Author

i've spent some time to draft a solution proposal for this and would like to get your feedback on it! (if there's a better way to post this - e.g. as a PR with a markdown document in some RFC repo - please let me know!)

the main questions will be:

  • do you think that this makes sense?
  • would it be ok for this functionality to reside under the opensearch-project?
  • do you have any input on anything in the document (esp. the open points)?

Solution proposal

High-Level Overview

  • This is roughly a "Liquibase for OpenSearch"
  • An application (called "tool" from now on) is provided which can manage the setup/updates of anything in OpenSearch
  • The tool is executed before the actual application and the latter only starts if the former finished successfully
  • Failures need to be analysed manually (OpenSearch is not transactional and automated rollbacks from failures are thus not possible)
  • The tool is provided with a set of files which contain the updates to be executed
  • The tool keeps track of what has already been executed and only executes the updates which haven't been executed yet
  • The tool prevents race conditions if multiple updates are trying to start in parallel
  • The tool ensures that updates are only executed if the baseline is compatible

Usage of the Tool

The tool should run before the main application and the latter should only be started if the former finished successfully.
This can e.g. be achieved by something as simple as ./updater && ./app on a Unix system.

Operation Modes of the Tool

The tool will support multiple operation modes (illustrated after this list):

  • Version check
    • Will not acquire a lock but just run the version check and report the result
  • Dry run
    • Same as the version check but will also go through the list of updates and report which ones would be executed
    • Optional: this could also be offered with acquiring a lock, but then it'd rather be a wet-dress rehearsal than a dry-run?
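As a purely illustrative sketch (the actual command-line interface hasn't been designed yet, so all flag names here are hypothetical), the operation modes could be exposed roughly like this:

```sh
./updater --mode=version-check   # hypothetical flag: only run the version check and report the result
./updater --mode=dry-run         # hypothetical flag: additionally report which updates would be executed
./updater                        # default: acquire the lock, check versions and apply all pending updates
```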

Open Points for Usage

  • Does it make sense to run the updater in an init container on Kubernetes (of course only relevant for use-cases running in Kubernetes)? See the sketch below.
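For illustration, a minimal sketch of the init-container variant is shown here; the image names and arguments are hypothetical and only serve to show the wiring:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: importer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: importer
  template:
    metadata:
      labels:
        app: importer
    spec:
      # the updater must finish successfully before the main container is started;
      # if it fails, the app never starts and Kubernetes surfaces/retries the failure
      initContainers:
        - name: schema-updater
          image: example.org/importer-updater:1.2.0   # hypothetical image
          args: ["--patch-dir", "/patches"]           # hypothetical flag
      containers:
        - name: importer
          image: example.org/importer:1.2.0           # hypothetical image
```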

Update Process

This diagram shows how the update process could work:
(diagram: update process flow - not included in this text version)

Details on Locking

The lock is acquired before the version check to ensure that nobody else is in the process of doing the same thing.

If the lock cannot be acquired in a reasonable time the process will fail to avoid hanging endlessly. Depending on the environment the failure will either lead to a re-schedule (e.g. in Kubernetes) and/or will notify an operator (due to the main process not running / the failure being reported).
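One possible implementation of the lock acquisition, without any extra infrastructure, would be OpenSearch's create-only document API: a `PUT` to `_create` fails with `409 Conflict` if the document already exists, which gives an atomic "acquire" primitive. A minimal sketch (index and field names are placeholders, not part of the proposal):

```
# try to acquire the lock: the create API fails with "409 Conflict" if the document already exists
PUT /schema-migration-lock/_create/global
{
  "owner": "importer-7d4f9c-abcde",
  "acquired_at": "2022-11-09T10:15:00Z"
}

# release the lock after a successful run (on failure it is intentionally left in place, see "Details on Error Handling")
DELETE /schema-migration-lock/_doc/global
```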

Open Points for Locking

  • Is this a global lock (i.e. if anyone has the lock then nobody else can start any other application, even if the two are just sharing a cluster but otherwise not interacting with each other) or a lock per application group?
  • Is special logic needed to avoid race conditions on locking? e.g. double mutex? Any built-in features in OpenSearch to support this?
  • Should the wait time be a fixed time or should it use a back-off logic?

Details on General Config File

Optionally, a configuration file can be provided which can contain the following information (see the sketch after this list):

  • Optional: minimum required version
    • This allows removing (very) old patch files at some point. E.g. an application which only supports one major release upgrade at a time could remove the 1.x scripts from its 3.x release and require 2.0.0 as the minimum.
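A sketch of what such a configuration file could look like, assuming YAML and purely illustrative field names:

```yaml
# general config file of the updater (all names illustrative)
minimumRequiredVersion: 2.0.0   # optional: refuse to run against schemas older than this
```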

Open Points for the General Config File

  • If we support the minimum required version then it must be possible to supply a script which brings an empty schema up to the status of that required release to still enable bootstrapping the application!

Details on Version Folders

Each version (e.g. 1.0.0, 1.0.1, 1.1.0, etc.) has its own folder. The version is the version of the schema, not of the application using it (though the two can be the same and probably will be in most cases for the sake of simplicity in case it's only a single application managing the whole schema). The version number must follow semantic versioning.

Details on Patch Files

  • The files are stored on the file system (see the example layout after this list)
  • The files are grouped into versions (each version can contain one or more files)
    • This is done by using one sub-folder per version
  • The versions are following semver
    • This gives them a clearly defined order in which they must be executed
  • The files are ordered within the version
    • This is done by convention by prefixing the filename with an increasing identifier (preferred: numeric)
      • Mustn't be changed once the version has been released
      • Can be a sparse list (to make it easier to add new migrations in-between while building a release)
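Putting the conventions above together, the file tree handed to the tool could look roughly like this (versions and file names are purely illustrative):

```
patches/
├── config.yaml                  # optional general config file (see above)
├── 1.0.0/
│   ├── 010_create-products-index.yaml
│   ├── 020_create-roles.yaml
│   └── 030_create-search-template.yaml
├── 1.1.0/
│   └── 010_add-field-to-products.yaml
└── 2.0.0/
    ├── 010_create-orders-index.yaml
    └── 020_reindex-products.yaml
```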

Content of the Patch Files

The patch files will contain the following content:

  • Metadata
    • Mandatory: Unique identifier of the update (probably a UUID, though a string would also work if it's guaranteed to be unique forever)
  • Request details
    • Mandatory: HTTP method (GET, POST, etc.)
    • Mandatory: path incl. parameters (e.g. /reindex)
    • Optional: HTTP request body OR reference to file with the payload
      • The advantage of having the body directly in here is simplicity, however escaping will become an issue
      • The advantage of using an external file will be that no escaping will be needed (it can directly be the JSON / NDJSON used by OpenSearch)

The tool will then piece together a full HTTP request out of this and send it to OpenSearch. The unique identifier is used to check whether the patch has already been applied in a previous version (it doesn't have to be exactly the same patch, as the change might have happened in several steps in earlier versions and been combined into one here).
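To make this concrete, here is a rough sketch of a single patch file, assuming YAML were chosen (the file format is still an open point below, so all field names are illustrative only):

```yaml
# 020_reindex-products.yaml (illustrative)
id: 0d4f3c4e-9f6a-4b7a-8f24-6c1d2e3f4a5b   # unique identifier of this update
request:
  method: POST
  path: /_reindex?wait_for_completion=true
  # either inline...
  body: >
    { "source": { "index": "products-v1" }, "dest": { "index": "products-v2" } }
  # ...or instead of "body" a reference to an external JSON/NDJSON file:
  # bodyFile: 020_reindex-products.json
```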

Discarded Ideas for Content of the Patch Files

  • Alternative file format / logic
    • HTTP method + path + body, with mapping logic
      • The updater can be provided with a configuration file which defines mapping logic to map old requests to new ones (changing URL paths/methods, rewriting the body, etc.)
      • Advantage: The special case of old update files written for previous major releases running on newer OpenSearch releases with changed APIs can still work
      • Disadvantage: Very complicated, probably very error prone, still doesn't support all cases (if a call needs to be split into multiple calls, etc.) without using/implementing a full scripting language...
    • Abstraction layer
      • Either needs to be aware of every single operation which can be executed (incl. for all plugins - probably via a config file) or will be extremely close to the actual HTTP calls and thus not serve any additional benefit
  • Instead of the unique identifier old patch files could also contain metadata pointing to newer patch files which contain them and/or newer patch files could contain metadata pointing to older patch files contained by them
    • Advantage: this would allow for combining multiple old steps into one new step
    • Disadvantages:
      • If any one of the previous steps has been executed before then still all individual steps need to be executed and then there's the risk that the new migration might not run properly if the previous steps have already been partially executed
      • Could be seen as an optional performance improvement which can be added after v1 if the need for it arises and the details could be planned then

Open Points for Patch Files

  • What format should they have? YAML, TOML, JSON, XML, ...?
  • Is the following performance optimisation needed? We could add "contains"/"contained in" in the metadata to group previous updates together into one
    • If at least one of the contained updates has already been executed, all of them will have to be executed individually
    • The grouped update can only be done if none of the previous ones has been executed yet
    • IMHO not needed for v1, can be added if the need for it shows

Details on Patch File Processing

Open Points for Patch File Processing

  • Should it be possible for some upgrade steps to fail (i.e. can there be optional upgrade steps)? IMHO: no, and if there are they can be added in a later version as an additional metadata attribute (would need additional definition of how they should be registered, whether operators need an additional flag to mark them as handled, etc.)
  • Should failed patch files be registered with their error message to make analysis easier or is it enough if the tool reports (= probably prints to stderr) the patch file name and error message?

Details on Error Handling

  • On error the lock should not be removed: it's not possible to roll back on errors, so the schema is then in an undefined state and needs manual analysis to determine the proper way to resolve the situation.

Details on Security

Authentication

The tool needs to provide authentication when calling OpenSearch (unless OpenSearch allows anonymous access with the necessary rights or has no security plugin installed / activated).

Open Points for Authentication

  • What authentication mechanisms does the tool support? Presumably it should support all common ones (basic auth, HTTP header, certificate, ...?) and offer static config options (config file, ENV variables) as well as dynamic callbacks to acquire the values (e.g. JWTs might not be valid long enough for the upgrade process to finish)

Authorization

The user used by the tool needs very wide-ranging rights, most likely full admin rights, to execute all updates (which range anywhere from managing indices and data in the indices to changing system and security configurations).

Discarded Solution Ideas

  • One idea was to run the logic centrally (see also below), which would offer the possibility of keeping these admin credentials more centrally. However, this is not a real security benefit: the tool still needs the rights to write the update scripts (be that to OpenSearch or anywhere else where they can then be picked up), and these scripts are in turn executed with admin rights - so an attacker could run anything they want through the update scripts (or through the user inserting them) if they manage to get to the point where they'd also have access to the credentials (i.e. are on the box running the software).

Details on Central Config Indices

Some information needs to be persisted, and it's best to do this directly in OpenSearch. For this, the following indices will be needed (example documents are sketched after this list):

  • An index containing the lock(s)
    • Depending on the exact design this could well be an index which will only ever have zero or one entries (if it has more than one then we had a race condition and the other tool will abort) with an identifier of the tool currently running
    • Or it could be an index containing the name of the application group as well as the other information (similarly with zero or one entries per application group)
  • An index containing all previously installed versions (only successful installations are listed) incl. their timestamp
  • An index containing all previously installed patches (only successful installations are listed) incl. their timestamp and version as part of which they were installed
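For illustration only, the documents in these bookkeeping indices could be as simple as the following (index and field names are placeholders):

```
# one document per successfully installed schema version
POST /schema-migration-versions/_doc
{
  "application_group": "importer",
  "version": "1.1.0",
  "installed_at": "2022-11-09T10:16:30Z"
}

# one document per successfully applied patch file
POST /schema-migration-patches/_doc
{
  "application_group": "importer",
  "patch_id": "0d4f3c4e-9f6a-4b7a-8f24-6c1d2e3f4a5b",
  "version": "1.1.0",
  "installed_at": "2022-11-09T10:16:12Z"
}
```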

General Open Points

In no particular order:

  • Find appropriate names for everything (what's the name of the tool? do we call the files "patch files", "update scripts" or something else? etc.)
  • Is the tool OpenSearch specific? At this point it probably isn't and could just as well be used to work with an Elasticsearch cluster
  • Do we need a rate limiting feature to prevent flooding OpenSearch?
    • Does OpenSearch reject requests if it's being flooded and are these rejects identifiable (to then automatically start rate limiting when we hit that?)
  • Does the tool itself need to publish some form of metrics?
  • Is the tool using one global set of indices for its configuration in an OpenSearch cluster or is there one per application group?
    • Reason against using a global set: that'd require all application groups to upgrade to the same major version of the tool at the same time (otherwise all other applications would fail to start the moment one has executed the upgrade of the tool). We can't have indices per major version as that'd mean that two different versions of the same application might otherwise not see each other and e.g. run older upgrades on newer indices or otherwise cause conflicts.
  • Do we need some house-keeping of the central configuration indices?
    • We can't really delete old versions & patches as we need to know what has been installed to ensure that we're not executing the same script a second time (see note about versioning / unique identifier on the patch files)
  • Is the following performance optimisation needed? We could add the (optional) possibility to define an all-in-one-go setup script for the case where the application has not yet been deployed, so instead of going through a ton of update scripts this could be done in a single (or rather: a few) steps
    • All migrations present would then just be registered as having been executed (so that they're not run the next time) but wouldn't actually be executed
    • Advantages:
      • Performance for bootstrapping
      • This would probably be required / play well together with the feature to define a minimum version (see above)
    • Disadvantages:
      • It'll be error prone to keep the two paths in sync
  • Is a recon tool/feature needed?
    • How would that even work? Any call (incl. to any plugin) can be executed as part of the update scripts, so there's no easy way to check everything
    • It might be better if the application itself has a set of smoke tests which it can execute after an upgrade (or even periodically while running)
  • In which programming language should the tool be implemented?
    • A language which needs a runtime (JVM, Node.JS, etc.)
      • Advantage: the OpenSearch project has lots of Java knowledge, thus most collaborators could also contribute to the tool
      • Disadvantage: a huge runtime needs to be shipped/installed just for the tool (not everyone is using JVM, Node.JS or whatever we'd be using)
    • A compiled language
      • Advantages:
        • minimal footprint
        • usually faster
      • Disadvantages:
        • depending on the language less knowledge might be present
        • depending on the language this might pose security risks (C, C++ & co. are not well suited to parse arbitrary text files!)
      • Personal preference/suggestion: implement this in Rust:
        • Compiled = small footprint (usually compiled with static linking to have one single binary file with no dependencies; cross-compile is supported => can be compiled for all target platforms on a single build machine if needed)
        • Compiler (LLVM based) supports all targets supported by OpenSearch (and many, many more)
        • The safety guarantees of Rust ensure that we don't run into the usual security issues encountered with C, C++, etc.
        • There's at least an OpenSearch Rust client, so there's some Rust knowledge in the project and others might have Rust knowledge as well (other AWS teams are also using Rust, e.g. for Bottlerocket OS)
        • Rust also makes it easy to implement the logic as a library (with only a small application around it as a wrapper which parses command line options and reads files from the file system) and would also make it possible to compile an additional C-ABI based API so that it could be used via a foreign function interface from other languages if anyone ever had the need to do so.

Discarded Alternative Solution Ideas

  • Use a central service which receives the update requests and processes them
    • Requires the service to run as a singleton to prevent race conditions without locks with various disadvantages:
      • Yet another application to manage
      • Needs to run 24/7 to wait for update requests
    • Otherwise it'd have to use locks and then there's no advantage of having this running all the time (waiting for update requests) - it'd just be a much more complicated landscape
      • Implementing it as an OpenSearch plugin would've meant that it's not running as a singleton (the plugin would potentially run on each node)
  • Provide the functionality as a library which can be used by consumer applications directly
    • There are a lot of different programming languages which are supported by OpenSearch libraries and theoretically any language can be used to interact with it by doing direct REST calls. Either the library would have to be ported to all of those languages or some couldn't make use of it.
    • There's no need to modify things at runtime, the functionality is only needed on startup, thus the application doesn't really need to have access to the functionality.

@prudhvigodithi
Member

Hey @rursprung, thanks again for putting this together. I'm open to a deep dive meeting (call) to go over the solution (please let me know if that works for you). Following are the questions I have.

  1. How do we handle the tool for normal installations other than k8s? Is the plan to run ./updater && ./app manually (or via user scripts) and then start the application?

  2. Regarding the content of the patch file: with the current problem statement, the details in the patch file cover indices, reindexing and role management, but there might be a lot of other requests for new settings (for example updating the password for a new application version). Is the tool expected to take care of all of this? If so, there is no end to what the content of the patch file could cover.

  3. If the tool is used in k8s, should it be part of every application pod's init setup? I feel it shouldn't be, as a single process that has once applied all the required settings from the patch file should be good?

  4. Also, apart from the problem statement, who are the other users who could benefit? Just checking whether this could be offered as something like a product or just like the existing client. From the description it looks to me like a CLI client that can be used (or not) for the problem statement described.

Thank you

@rursprung
Author

rursprung commented Nov 10, 2022

thanks for your feedback!

I'm open for a deep dive to have a meeting (call) to go over the solution (Please let me know if that works for you).

that'd be great! the question is if there are others in the community who'd be interested in joining this? (if so: please speak up!)

  1. How do we handle the tool for normal installations other than k8s? Is the plan to run ./updater && ./app manually (or via user scripts) and then start the application?

i'd presume that it'd just be used as ./updater && ./app

2. Regarding the content of the patch file: with the current problem statement, the details in the patch file cover indices, reindexing and role management, but there might be a lot of other requests for new settings (for example updating the password for a new application version). Is the tool expected to take care of all of this? If so, there is no end to what the content of the patch file could cover.

in my design proposal the patch file does not know any of the commands specifically. it'd be something like this:

- "method": "POST"
- "path": /_reindex
- "content": "{ ... }"

so the updater doesn't need to know what all of that means. it'll just put together the request and execute it.

3. If the tool is used in k8s, should it be part of every application pod's init setup? I feel it shouldn't be, as a single process that has once applied all the required settings from the patch file should be good?

it has to be part of every startup/init (k8s or not) as we cannot know what the status of the environment is. if you're setting things up in an automated way (k8s, ansible, hand-woven shell scripts, etc.) then the code doesn't know whether it'll run in a clean dev environment, on an outdated test environment or on production, so it always has to run to ensure that the setup is correct.
another reason for it to always run is to make sure that there hasn't been another version of the app which has already updated the schema with a breaking change (in which case the updater would fail which in turn would prevent the app from starting and an operator could then check what's going on and remove the offending old version).

4. Also, apart from the problem statement, who are the other users who could benefit? Just checking whether this could be offered as something like a product or just like the existing client. From the description it looks to me like a CLI client that can be used (or not) for the problem statement described.

i'd say anyone running any 3rd-party applications against OpenSearch (or Elasticsearch) in a distributed environment (i.e. where more than one replica of the application might be running) or in diverse environments where they want to ensure that the cluster is automatically bootstrapped or brought to the correct version without having to encode this in their own application.

@peterzhuamazon
Member

Hi @rursprung, I am not sure if this is something you are looking for:
https://github.com/opensearch-project/opensearch-cli/

@rursprung
Author

no, i don't think that this covers it:

  • it does not cover any of the version checks, locking, etc. features which would be needed
  • the CLI is built for human interaction, it does not provide a big advantage over just using direct HTTP calls in a fully scripted environment
  • i also don't think that extending the CLI with the features mentioned here would necessarily be a good idea: the two are for different usage patterns (CLI: use by hand to administrate some settings & plugins; this tool here: used in a fully autonomous mode where it automatically manages any form of updates of settings, indices, data, etc. within OpenSearch when needed)

@prudhvigodithi
Member

Adding @dblock @CEHENKLE @elfisher @bbarani

@peterzhuamazon
Member

Adding @wbeckler and the @opensearch-project/clients team into the conversation.
Please let us know your thoughts on this.

Thanks.

@bbarani
Member

bbarani commented Feb 20, 2023

@wbeckler @opensearch-project/opensearch-migrations can you provide your inputs?

@wbeckler

@rursprung There's been some movement on building automations for #29

I know it's not exactly where you're going, but I'd be curious if you felt this had an overlap with what you're thinking.

@rursprung
Author

@rursprung There's been some movement on building automations for opensearch-project/opensearch-migrations#29

I know it's not exactly where you're going, but I'd be curious if you felt this had an overlap with what you're thinking.

thanks for this update @wbeckler! i think there's some overlap, though there are also vast differences:

  • the issue you linked primarily focuses on executing tests
  • it also manages the upgrade of the actual instance and runs with docker directly (this tool here doesn't change the software or its config, it purely executes REST API calls - the rest is taken care of e.g. using kubernetes)

thus i don't think that this can (and should) be the same tool. it has a different focus and a different target audience (yours: opensearch contributors, this: cluster admins)

i see in your issue that there was also some discussion around supporting certain cluster-admin features (e.g. validating whether a cluster is ready to be upgraded). i think these things would correlate with the tool proposed here. i had not explicitly thought about this yet, as in our setup everything is created in automated ways - so e.g. taking care of index upgrades (through re-indexing) is something you just have to write as an update script and then it'll work - but for setups where there could be indices created by users it'd be useful to explicitly check the indices for re-indexing.

@davidjlynn

davidjlynn commented Apr 26, 2023

Hey @rursprung, I like the idea a lot of providing a method of schema migration for OpenSearch.

My thoughts would be that this would give application developers a tool to reduce the amount of backwards-compatibility logic as their application evolves. So, while they could still handle very old versions of their schema, they can proactively choose to upgrade their schema and hence reduce a lot of historic debt. Nice.

My input would be, you have mentioned that this is along similar lines to Liquibase but for OpenSearch.
In the simplest sense, Liquibase deals with lists of changesets and applies them to databases, and some of these changesets can be applied over different database types.
I suggest we consider extending Liquibase via an extension to support OpenSearch.

As an example, here is a Liquibase extension for MongoDB: https://github.com/liquibase/liquibase-mongodb
This started life out as a custom extension (see the repository this one is a fork from) and appears to be adopted by Liquibase now as it has reached maturity.

The motivation here would be to avoid reinventing the wheel when it comes to the management of changesets and relevant formats.
I would hope the extension would allow the solution to concentrate on:

This is an assumption that this is possible, but using this established framework may save effort in the long term.

@wbeckler

This sounds like a really great approach. If you identify any gaps in the API of OpenSearch for this purpose, please raise an issue for that.

@dblock
Member

dblock commented Apr 28, 2023

Looks like we narrowed this topic down to "live-migrating schema and data". I'll move the issue into opensearch-migrations (that didn't exist when this issue was created).

I really like the idea of reusing an existing framework like liquibase to describe the changes desired in a form or shape of a destination (e.g. current mapping -> new mapping). Applying the change could be implemented both in an external tool, and as an OpenSearch plugin/extension. If we go the latter route, the API could maybe be something like POST /_extensions/transform with source, target and options (e.g. all or nothing, background, live or marking an index read-only, etc.). Tasks could be queued with job scheduler. Would that be valuable?
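Purely to visualize the suggestion (no such endpoint exists in OpenSearch today; the path and all parameters below are hypothetical), such a call might look like:

```
POST /_extensions/transform
{
  "source": { "index": "products-v1" },
  "target": { "index": "products-v2" },
  "options": {
    "all_or_nothing": false,
    "background": true,
    "source_read_only": true
  }
}
```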

@dblock dblock transferred this issue from opensearch-project/opensearch-devops Apr 28, 2023
@rursprung
Author

this is a great suggestion @davidjlynn!

i'm currently looking a bit into this and from what i've found so far liquibase isn't really built to support NoSQL databases. both liquibase-mongodb and liquibase-cosmosdb have to implement fake shims for JDBC functionality (see e.g. MongoClientDriver). they also have some somewhat-generic liquibase.nosql wrappers which are however not upstreamed to liquibase itself (or otherwise factored out of the repositories). i've raised liquibase/liquibase#4236 for this.
liquibase-neo4j even seems to implement a full JDBC driver just for liquibase and then use that.
other NoSQL databases are integrated using their SQL APIs.

we could theoretically use the OpenSearch JDBC driver, however then we wouldn't be able to use native OpenSearch actions and i guess it would also not allow managing settings (esp. also of plugins)? (note: i've never used the OpenSearch JDBC driver and haven't looked into it yet).
using opensearch-java would IMHO make more sense than the JDBC driver.

i very much like the idea of going with Liquibase and extending it. but i feel that if we do it the way mongodb & cosmosdb were done we'll have a hacky codebase. maybe it'd make sense to check with some Liquibase developers whether there'd be a way to add better NoSQL support to liquibase-core as part of this effort?

@dblock: how do you envision the transformations with source and target looking? if you just pass two index definitions then the system can't know which columns map to which new ones if e.g. the name changes - or would you want to annotate that somehow?
i kind-of like the idea of offloading the logic to OpenSearch as that'd make it even more general (other, non-liquibase, use-cases might spring up and could then use this).
though in a first version we might not need anything additional, depending on how we define the configuration API: if you just define which API you want to call, the method to call it and the body to pass along, then that's a very generic approach which would support anything. so this could be an abstract liquibase-http, and based on that we could then add liquibase-opensearch, which in a first version just deals with the management aspects (storing which updates were applied; it needs OpenSearch knowledge to create/update this index) and later offers dedicated change types to manage the data (and settings) rather than having to specify HTTP methods directly (see the sketch below). this would then also offer smoother upgrades (the HTTP calls might break in major releases, while the liquibase change types can deal with that if done properly and abstract away from it).
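as a sketch of that idea (neither liquibase-http nor liquibase-opensearch exists at this point, and the change type name is made up), a changelog entry could look roughly like this:

```yaml
databaseChangeLog:
  - changeSet:
      id: create-products-index
      author: rursprung
      changes:
        # "httpRequest" is a hypothetical change type that a liquibase-http /
        # liquibase-opensearch extension could provide
        - httpRequest:
            method: PUT
            path: /products-v1
            body: >
              { "settings": { "number_of_shards": 1 },
                "mappings": { "properties": { "name": { "type": "text" } } } }
```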

something else i noticed: liquibase and its extensions sadly still target Java 8 (see also liquibase/liquibase#1677), but opensearch-java targets Java 11. so if we use that then our liquibase-opensearch extension would also target only Java 11. i hope that this is then all still compatible (except if somebody tries to run it on Java 8).

@nvoxland

Hi, I'm the creator of Liquibase and found this thread from liquibase/liquibase#4236. I did expand that ticket to be the general "make NoSQL databases easier to support" epic, but even with the slightly hackier workarounds I think leveraging the existing Liquibase code for everything except the OpenSearch-specific portions will be much easier and in the end more powerful/flexible than something built from scratch. But I'm also biased :) And I'm also always available for any questions you'd have, either here or at [email protected].

We target Java 8 by default because there seems to be an unfortunately large percentage of people still running that and we don't want to cut them out. But extensions like OpenSearch which require 11 can certainly build with java 11, that's not a problem at all.

@wbeckler

wbeckler commented Jun 30, 2023 via email

@bbarani bbarani removed the untriaged label Jun 30, 2023
@Npfries

Npfries commented Jun 30, 2023

Hi! I'm the author of that post. We have been running a more sophisticated version of that implementation in production for almost 18 months. I'm not sure I have a whole lot to add here but if there are any questions about how we're using it or improvements we are making, I'd be happy to answer.

We are executing the migrations alongside deployment of microservices in k8s, and primarily for application search.

@rursprung
Author

@wbeckler / @Npfries: this looks indeed interesting as well! however, for us it makes more sense to be based on liquibase to be more aligned and better integrated with our applications which already make use of liquibase.

@nvoxland: thanks for your reply! i had tried to contact you via email a while ago but never got a reply - could you please check your emails or drop me an email at [email protected] if you hadn't received it? we'd be interested in getting this started!

@rursprung
Author

@nvoxland: i still haven't given up on this - it'd be great if you could contact me so that we can get this started!
