Managing Index Setups & Configuration in Microservices Environment #148
Comments
@rursprung Do you think https://github.com/opensearch-project/opensearch-devops is a better place for this issue? |
thanks for pointing to that repo! yes, it could well be that it's a better fit there - the question just is whether it'll garner as much interaction as it would here? but probably it'd make sense to move it. i think this plays nicely into what i said in the meeting: there are so many repos, it's hard to keep track of where to open the ticket (and in the central ones there's so much going on that few people will check all new issues/PRs - i definitely don't have the time for that). |
I've moved it. I don't know what a solution is for better tracking progress. My own GitHub notifications are wild, this is what I do, 🤷 |
i went the opposite way: i've disabled all email notifications in GitHub and exclusively use the notifications page (which is a pinned tab for me in Firefox). it also offers some filter functionality, though i usually don't use it. i just have three states:
i don't know why GitHub is hiding this behind the small notification icon in the top right corner and not advertising this any further. it took me a while to even find out that this exists. (also, we've drifted a bit off-topic, but i thought it was interesting 😆) |
Hey @rursprung, from reading the description and forum post, may I know what specific part you want to keep track of in OpenSearch? For example, a specific document or specific data; or, since you mentioned the system scales the pods (the data ingestion system), keeping track of the POD_NAME (which is randomly generated by k8s) or the deployment/statefulset name, so that data already ingested from that set is not re-ingested again? |
hm, maybe i should've given an example in the initial post of what the intention is. this isn't about managing the surrounding applications (e.g. we're consuming Kafka messages and Kafka keeps track of what you've already consumed, so restarting an importer service has no impact, it'll automatically pick up wherever it has to) or checkpoints for data ingestion. this is about managing indices, settings & co. here's an example use-case:
i hope this makes things a bit clearer? |
i've spent some time to draft a solution proposal for this and would like to get your feedback on it! (if there's a better way to post this - e.g. as a PR with a markdown document in some RFC repo - please let me know!) the main questions will be:
Solution proposal

High-Level Overview
Usage of the Tool

The tool should run before the main application and the latter should only be started if the former finished successfully.
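As a minimal sketch of this sequencing, assuming a container entrypoint (the tool name, its arguments and the application command are hypothetical placeholders):

```python
#!/usr/bin/env python3
"""Entrypoint sketch: run the schema-update tool, start the app only on success."""
import os
import subprocess
import sys

# Run the update tool first and wait for it to finish.
result = subprocess.run(["schema-updater", "update", "--url", "https://opensearch:9200"])

if result.returncode != 0:
    # Update failed: exit without starting the application so that the
    # orchestrator (e.g. Kubernetes) re-schedules the pod and the failure is visible.
    sys.exit(result.returncode)

# Schema is up to date: replace this process with the main application.
os.execvp("java", ["java", "-jar", "/app/app.jar"])
```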
Operation Modes of the Tool

The tool will support multiple operation modes:

Open Points for Usage
Update Process

This diagram shows how the update process could work:

Details on Locking

The lock is acquired before the version check to ensure that nobody else is in the process of doing the same thing. If the lock cannot be acquired in a reasonable time the process will fail to avoid hanging endlessly. Depending on the environment the failure will either lead to a re-schedule (e.g. in Kubernetes) and/or will notify an operator (due to the main process not running / the failure being reported).
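As a sketch of how such a lock could be implemented (a hedged illustration, assuming a dedicated lock index with made-up names), the create-only document API (`_create`, which fails with HTTP 409 if the document already exists) allows acquiring the lock atomically:

```python
import time
import requests

OPENSEARCH = "https://opensearch:9200"  # assumption: cluster URL
LOCK = f"{OPENSEARCH}/schema-updater-locks"  # hypothetical lock index

def acquire_lock(timeout_s: float = 60.0, poll_s: float = 2.0) -> bool:
    """Create the lock document atomically; a 409 means somebody else holds it."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = requests.put(f"{LOCK}/_create/update-lock", json={"acquired_at": time.time()})
        if resp.status_code == 201:
            return True  # lock acquired
        if resp.status_code != 409:
            resp.raise_for_status()  # unexpected error: fail fast
        time.sleep(poll_s)  # lock held by someone else: retry until the deadline
    return False  # give up and fail instead of hanging endlessly

def release_lock() -> None:
    requests.delete(f"{LOCK}/_doc/update-lock")
```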
Open Points for Locking

Details on General Config File

Optionally, a configuration file can be provided which can contain the following information:
Open Points for the General Config File
Details on Version Folders

Each version (e.g. 1.0.0, 1.0.1, 1.1.0, etc.) has its own folder. The version is the version of the schema, not of the application using it (though the two can be the same and probably will be in most cases for the sake of simplicity in case it's only a single application managing the whole schema). The version number must follow semantic versioning.
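To illustrate the ordering, a small sketch of how the tool might discover and sort the version folders (the directory layout is an assumption; pre-release suffixes are ignored for brevity):

```python
from pathlib import Path

def version_key(folder: Path) -> tuple[int, ...]:
    """Parse a folder name like '1.10.2' into (1, 10, 2) for semantic ordering."""
    return tuple(int(part) for part in folder.name.split("."))

def ordered_versions(root: str) -> list[Path]:
    # Plain lexicographic sorting would put 1.10.0 before 1.2.0;
    # sorting on the parsed tuple yields proper semver order instead.
    return sorted((p for p in Path(root).iterdir() if p.is_dir()), key=version_key)

# e.g. ordered_versions("schema/") -> folders for 1.0.0, 1.0.1, 1.1.0, 1.10.0, ...
```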
Details on Patch Files

Content of the Patch Files

The patch files will contain the following content:

- a unique identifier for the patch
- the HTTP method to use (e.g. POST)
- the path to call (e.g. /_reindex)
- the content (request body) to send
The tool will then piece together a full HTTP request out of this and send it to OpenSearch. The unique identifier is used to check whether the patch has already been applied in a previous version (it doesn't have to be 1:1 the same patch, as it could've happened in several steps in earlier versions and have been combined into one here).

Discarded Ideas for Content of the Patch Files
Open Points for Patch Files
Details on Patch File Processing

Open Points for Patch File Processing
Details on Error Handling
Details on Security

Authentication

The tool needs to provide authentication when calling OpenSearch (unless OpenSearch allows anonymous access with the necessary rights or has no security plugin installed/activated).
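Two of the usual options are sketched below for illustration (HTTP basic auth and TLS client certificates; all credentials and paths are placeholders, and which mechanisms to support remains an open point):

```python
import requests

OPENSEARCH = "https://opensearch:9200"  # assumption: cluster URL

# Option 1: HTTP basic auth (credentials would come from a secret store, not literals).
resp = requests.get(
    f"{OPENSEARCH}/_cluster/health",
    auth=("updater", "changeme"),
    verify="/certs/ca.pem",  # the cluster's CA certificate
)

# Option 2: a TLS client certificate, as supported by the security plugin.
resp = requests.get(
    f"{OPENSEARCH}/_cluster/health",
    cert=("/certs/updater.pem", "/certs/updater-key.pem"),  # client cert + key
    verify="/certs/ca.pem",
)
resp.raise_for_status()
```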
Open Points for Authentication

Authorization

The user used by the tool needs very wide-ranging rights, most likely full admin rights, to execute all updates (which range anywhere from managing indices and data in the indices to changing system and security configurations).

Discarded Solution Ideas
Details on Central Config Indices

Some information needs to be persisted, and it's best to do this directly in OpenSearch. For this, the following indices will be needed:
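Presumably this would at least include an index tracking which patches have been applied and the lock index described above; a sketch of bootstrapping these (all index names and mappings are hypothetical assumptions):

```python
import requests

OPENSEARCH = "https://opensearch:9200"  # assumption: cluster URL

CHANGELOG_BODY = {
    "mappings": {
        "properties": {
            "patch_id": {"type": "keyword"},   # the unique identifier of a patch
            "version": {"type": "keyword"},    # schema version the patch belongs to
            "applied_at": {"type": "date"},
        }
    }
}

for index, body in {"schema-changelog": CHANGELOG_BODY, "schema-updater-locks": {}}.items():
    resp = requests.put(f"{OPENSEARCH}/{index}", json=body)
    # A 400 "resource_already_exists_exception" just means another replica
    # created the index first; anything else is a real error.
    if resp.status_code not in (200, 400):
        resp.raise_for_status()
```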
General Open Points

In no particular order:
Discarded Alternative Solution Ideas
|
Hey @rursprung, thanks again for putting this together. I'm open to a deep-dive meeting (call) to go over the solution (please let me know if that works for you). Following are the questions I have.
Thank you |
thanks for your feedback!
that'd be great! the question is if there are others in the community who'd be interested in joining this? (if so: please speak up!)
i'd presume that it'd just be used as
in my design proposal the patch file does not know any of the commands specifically. it'd be something like this:

```yaml
- "method": "POST"
- "path": /_reindex
- "content": "{ ... }"
```

so the updater doesn't need to know what all of that means. it'll just put together the request and execute it.
it has to be part of every startup/init (k8s or not) as we cannot know what the status of the environment is. if you're setting things up in an automated way (k8s, ansible, hand-woven shell scripts, etc.) then the code doesn't know whether it'll run in a clean dev environment, on an outdated test environment or on production, so it always has to run to ensure that the setup is correct.
i'd say anyone running any 3rd party applications against OpenSearch (or Elasticsearch) in a distributed environment (i.e. where it's possible that more than one replica of the application is running) or in diverse environments where they want to ensure that the cluster is automatically bootstrapped or brought to the correct version without having to encode this in their own application. |
Hi @rursprung, I am not sure if this is something you are looking for? |
no, i don't think that this covers it:
|
Adding @wbeckler and the @opensearch-project/clients team into the conversation. Thanks. |
@wbeckler @opensearch-project/opensearch-migrations can you provide your inputs? |
@rursprung There's been some movement on building automations for #29. I know it's not exactly where you're going, but I'd be curious if you felt this had an overlap with what you're thinking. |
thanks for this update @wbeckler! i think there's some overlap, though there are also vast differences:
thus i don't think that this can (and should) be the same tool. it has a different focus and a different target audience (yours: opensearch contributors; this: cluster admins).

i see in your issue that there was also some discussion around supporting certain cluster-admin features (e.g. validating whether a cluster is ready to be upgraded). i think these things would correlate with the tool proposed here. i had not explicitly thought about this yet, as in our setup everything is created in automated ways, so e.g. taking care of index upgrades (through re-indexing) is something you just have to do as an update script and then it'll work; but for setups where there could be indices created by users, it'd be useful to explicitly check the indices for re-indexing. |
Hey @rursprung, I like the idea of providing a method of schema migration for OpenSearch a lot. My thought would be that this would give application developers a tool to reduce the amount of backwards-compatibility logic as their application evolves. So, while they could still handle very old versions of their schema, they can proactively choose to upgrade their schema and hence reduce a lot of historic debt. Nice. My input would be: you have mentioned that this is along similar lines to Liquibase, but for OpenSearch. As an example, here is a Liquibase extension for MongoDB: https://github.com/liquibase/liquibase-mongodb. The motivation here would be to avoid reinventing the wheel when it comes to the management of changesets and the relevant formats.
This is an assumption that this is possible, but using this established framework may save effort in the long term. |
This sounds like a really great approach. If you identify any gaps in the API of OpenSearch for this purpose, please raise an issue for that. |
Looks like we narrowed this topic down to "live-migrating schema and data". I'll move the issue into opensearch-migrations (which didn't exist when this issue was created). I really like the idea of reusing an existing framework like Liquibase to describe the desired changes in the form or shape of a destination (e.g. current mapping -> new mapping). Applying the change could be implemented both in an external tool and as an OpenSearch plugin/extension. If we go the latter route, the API could maybe be something like |
this is a great suggestion @davidjlynn! i'm currently looking a bit into this, and from what i've found so far liquibase isn't really built to support NoSQL databases. both liquibase-mongodb and liquibase-cosmosdb have to implement fake shims for JDBC functionality (see e.g.

we could theoretically use the OpenSearch JDBC driver, however then we wouldn't be able to use native OpenSearch actions, and i guess it would also not allow managing settings (esp. also of plugins)? (note: i've never used the OpenSearch JDBC driver and haven't looked into it yet.)

i very much like the idea of going with Liquibase and extending it. but i feel that if we do it the way mongodb & cosmosdb were done, we'll have a hacky codebase. maybe it'd make sense to check with some Liquibase developers whether there'd be a way to add better NoSQL support to

@dblock: how would you envision the transformations with

something else i noticed: liquibase and its extensions sadly still target Java 8 (see also liquibase/liquibase#1677), but |
Hi, I'm the creator of Liquibase and found this thread from liquibase/liquibase#4236. I did expand that ticket to be the general "Make NoSQL databases easier to support" epic, but even with the slightly hackier work-arounds I think leveraging the existing Liquibase code for everything except the OpenSearch-specific portions will be much easier and in the end more powerful/flexible than something from scratch. But I'm also biased :) And I'm also always available for any questions you'd have, either here or at

We target Java 8 by default because there seems to be an unfortunately large percentage of people still running that and we don't want to cut them out. But extensions like OpenSearch which require 11 can certainly build with Java 11, that's not a problem at all. |
I just found this implementation of a JavaScript migrations library that is not liquibase but which does start to think about repeatable and reversible schema changes: https://nathanfries.com/posts/opensearch-migrations/ |
Hi! I'm the author of that post. We have been running a more sophisticated version of that implementation in production for almost 18 months. I'm not sure I have a whole lot to add here but if there are any questions about how we're using it or improvements we are making, I'd be happy to answer. We are executing the migrations alongside deployment of microservices in k8s, and primarily for application search. |
@wbeckler / @Npfries: this indeed looks interesting as well! however, for us it makes more sense to base this on liquibase so that it's more aligned and better integrated with our applications, which already make use of liquibase.

@nvoxland: thanks for your reply! i had tried to contact you via email a while ago but never got a reply - could you please check your emails or drop me an email at |
@nvoxland: i still haven't given up on this - it'd be great if you could contact me so that we can get this started! |
Is your feature request related to a problem? Please describe.
when deploying OpenSearch as part of a larger application fleet in an environment (in our case: kubernetes) where any installation/update must be 100% hands-off (i.e. fully automated), and esp. when the connected applications are microservices (i.e. lots of them, with various versions of the same service running in parallel due to canary upgrades or just rolling upgrades in general), it's very hard to actually set up the proper index structures & general settings on OpenSearch:
Describe the solution you'd like
there should be a way for consumer applications to manage opensearch indices in a similar way as can be done with liquibase for RDBMS (SQL-based relational DBs). there it's possible to define upgrade scripts and liquibase then keeps track of what has already been applied and what hasn't (by storing that information in dedicated table(s) on the DB). this can be used both for DDL (data definition language; e.g. changing tables) as well as DML (data manipulation language; e.g. migrating data) and any mixture of the two (e.g. changing an existing table schema and migrating the data in the process).
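for a flavor of what the OpenSearch equivalent of such a DDL+DML pair could look like (a hedged sketch; the index and field names are made up), consider adding a field to an existing mapping and back-filling it:

```python
import requests

OPENSEARCH = "https://opensearch:9200"  # assumption: cluster URL

# "DDL": extend the mapping of an existing index with a new field.
requests.put(
    f"{OPENSEARCH}/products/_mapping",
    json={"properties": {"name_normalized": {"type": "keyword"}}},
).raise_for_status()

# "DML": back-fill the new field on all existing documents.
requests.post(
    f"{OPENSEARCH}/products/_update_by_query",
    json={
        "script": {
            "source": "ctx._source.name_normalized = ctx._source.name.toLowerCase()",
            "lang": "painless",
        }
    },
).raise_for_status()
```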
Describe alternatives you've considered
Additional context
note: while this ticket has now been opened in the main OpenSearch repository i'm not sure whether the actual solution for this will be part of this repository. i could well imagine that the solution would be a dedicated application or an OpenSearch plugin.