Update Ingest Manager to handle Packages with ElasticSearch Transform #75153
Comments
Pinging @elastic/ingest-management (Team:Ingest Management) |
Pinging @elastic/endpoint-management (Team:Endpoint Management) |
We have an early checklist to add custom/new types on the package elastic/package-spec#27 |
@ph It looks like a request to document what is needed for different assets in the package specs. I think the next step is to create some implementation tickets in the different repos. |
@ruflin Can we get some implementation tickets created so we can start knocking some of them off? I will add some constraints to the ticket, for example that the source index has to exist for the transform to be successfully applied. Constraints and Notes:
|
There are two main issues I see at the moment:
To get things moving here I suggest the following:
The above sidesteps a few issues:
@nnamdifrankie Could you get the above 2 PRs started and link them here? |
Currently our best case is where data already exists, e.g. a 7.9 to 7.10 upgrade. We have to consider the best option for getting a document into the source index. Currently we ignore documents with certain attributes, so we can create a similar document to use as a seed. But we have to decide who will send this document.
We always test the registry locally while developing using Docker, but we can also use the code at https://github.com/elastic/endpoint-package/blob/master/Makefile#L161 |
@elastic/ingest-management @elastic/endpoint-management Following my exploration of the EPM code, I have been able to install a transform but not start it. Starting it requires the source index to exist, so we will explore options with the Elasticsearch team. However, I want to get your input on the installation strategy for transforms. Transforms in a dataset in a package have the following cases that influence how we perform the installation.
Candidate Solutions:
Assuming the current state is the desired state, we can delete the old transforms with the dataset prefix and version, along with their references in the saved object (SO), after we have successfully installed the new transforms, if any exist (a move case). Before installation, capture all transform and object references with the dataset prefix. After successful installation, delete using the information from the capture. The consideration is that we continue to support any rollback agreements in the case of failure; deleting after installing may help satisfy this requirement. Deleting may also fail, which would leave multiple versions and old transforms behind.
Calculate the diffs, drift and change cases, and apply install and delete actions. The consideration here is catching all the cases while also ensuring that we maintain rollback agreements in the case of failure. We also have to consider delete failures, which can result in an inconsistent and undesired state. Your comments are welcome. |
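The first candidate above (install first, delete afterwards) could look roughly like the sketch below. It is written in Python for brevity rather than Kibana's TypeScript, and `FakeTransformStore`, `install_then_cleanup` and the transform IDs are hypothetical stand-ins, not EPM code. A real implementation would call the Elasticsearch transform APIs and update the saved object references instead of a set in memory.

```python
class FakeTransformStore:
    """Hypothetical in-memory stand-in for the transforms in Elasticsearch."""

    def __init__(self, transform_ids=()):
        self.transforms = set(transform_ids)

    def list_with_prefix(self, prefix):
        return sorted(t for t in self.transforms if t.startswith(prefix))

    def install(self, transform_id):
        self.transforms.add(transform_id)

    def delete(self, transform_id):
        self.transforms.discard(transform_id)


def install_then_cleanup(store, prefix, new_ids):
    """Install new transforms, then delete the ones captured beforehand."""
    # 1. Capture what exists before touching anything.
    previous = store.list_with_prefix(prefix)
    # 2. Install the new transforms. If this raises, the old ones are untouched.
    for transform_id in new_ids:
        store.install(transform_id)
    # 3. Delete captured transforms that are not part of the new set.
    failed = []
    for transform_id in previous:
        if transform_id in new_ids:
            continue
        try:
            store.delete(transform_id)
        except Exception:
            # A failed delete leaves an old transform behind; report it.
            failed.append(transform_id)
    return failed
```

Returning the list of failed deletes keeps the "multiple versions left behind" case visible to the caller, so it can be retried or surfaced instead of silently ignored.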
Thanks @nnamdifrankie for the spike, @skh or @neptunian can you look at this? |
@nnamdifrankie I think I don't fully understand your example of a transform moving to another dataset, which unfortunately is important for all the follow-ups. Could you share an example? To get started, I would keep it as simple as possible. Would the following flow be possible?
@nnamdifrankie Could this cause any side effects? You mention above that parts of the transform can change. My assumption is that these changes always mean a new version of the package? If we follow the above, rollback should also be pretty straightforward, as it is just the same in reverse. Another open question for me is where in the chain of asset installation the transform fits in on install / upgrade. I assume after the templates and ingest pipelines have been loaded but before UI elements? Another thing I learned yesterday when talking to the ML team is that the source index can be a pattern. So it can be |
@ruflin Sorry I was not clear earlier.
Given we have the transform in a previous version:
Endpoint Package 0.15.0: we have a transform in the metadata dataset.
Endpoint Package 0.16.0: we move the transform to the metadata_current dataset. |
It is possible that the wildcard picks up disjoint documents that match the query and pivot of the transform. The documents will be transferred to the destination index, but the mapping of the destination index determines how useful the documents are. If a document maps correctly then it will be retrieved in queries, otherwise it will not. |
My plan was to
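To illustrate how a wildcard source pattern can pick up more indices than intended, here is a small Python sketch. The index names are made up for the example, and `fnmatchcase` only approximates Elasticsearch wildcard matching (it also understands `?` and `[...]`), so treat it as an illustration rather than the actual resolution logic.

```python
from fnmatch import fnmatchcase


def matching_indices(pattern, indices):
    """Return the index names that a wildcard source pattern would cover."""
    return [name for name in indices if fnmatchcase(name, pattern)]


# Hypothetical index names, for illustration only.
indices = [
    "metrics-endpoint.metadata-default",
    "metrics-endpoint.metadata_current-default",
    "metrics-endpoint.policy-default",
]
covered = matching_indices("metrics-endpoint.metadata*", indices)
```

With these made-up names the pattern also covers the `metadata_current` index, which is the kind of unintended overlap described above.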
The goal is to have one transform per purpose because any slight difference in the code could mean different documents in the target. |
@nnamdifrankie Isn't the version in the name enough without having the timestamp to allow a rollback in case of a failure? Can you explain the need for the timestamp? Thanks. |
It is for the forced install where the versions are the same. |
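A minimal sketch of why the version alone is not enough on a forced install, assuming a hypothetical naming scheme of `<dataset>.<name>-<version>-<timestamp>`. The function name, the scheme and the example values are illustrative only and not an agreed convention.

```python
import time


def build_transform_id(dataset, name, package_version, now_ms=None):
    """Build a transform ID that stays unique across same-version reinstalls."""
    # Hypothetical scheme: the timestamp suffix distinguishes two installs
    # of the same package version, so the old transform can be kept until
    # the new one is installed successfully.
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    return f"{dataset}.{name}-{package_version}-{ts}"


first = build_transform_id("endpoint.metadata", "default", "0.16.0", now_ms=1000)
forced = build_transform_id("endpoint.metadata", "default", "0.16.0", now_ms=2000)
```

Here `first` and `forced` share the package version but differ in the suffix, so the forced install does not have to overwrite the existing transform.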
I think I'm also still missing the part around the force install and the timestamp. When is this exactly happening? You mention above
What does this exactly mean? Could you share an example? You also mention:
Are you referring here to multiple transforms per dataset? My current assumptions are the following, and please let me know which ones are wrong:
|
@ruflin Sorry it was not clear. First let me answer in the context of your steps here
My proposal is similar to your steps, with just a change in order.
With my steps above, we do not plan to install over any transforms. Transforms are only removed after we have successfully installed the new transforms. Hence the need for timestamps or a unique identifier, even in a forced update of the same version.
Yes, if we have dataset1.transform1 and dataset2.transform2 that have the same code, update the same index and run at different times. I believe this is not a desirable state. What do you think?
Do you currently have a rollback handler in case of failure that tries to install the previous version?
I can only answer this question by testing. Everything is timing with this setup.
Let's talk about this for clarity. Wiping the destination index is technically a service outage. |
Looks like we are mostly on the same page. The only difference is whether the transform is overwritten (what we do for index templates) or whether we use versions (what we do for ingest pipelines). I guess both will work. If we use a version for the transform, let's use the package version. Could you test what the maximum time is that we have to get the new transform in place? For the rollback, @neptunian can share more here. What is our naming convention for the case where we have multiple transforms in a single dataset? Will we postfix the file name without the |
{ Where |
Two questions:
@nnamdifrankie Is the force install something you need for development purposes, or is this something you expect to see in production? I'm worried we are building something for dev that we should potentially solve in a different way. For example, we could have a special method for "overwrite / force install" that does the right thing for each asset, and in the case of a transform it would delete it and install it again. |
If by forced install you mean "reinstall" where the versions are the same: we don't delete the previous ingest pipeline if the version is the same or it's a reinstall. Since we PUT the ingest pipeline, it would update the existing versioned one if it exists. @ruflin We don't currently have a rollback for unknown errors that aren't handled. I think it was mentioned here and that we'd improve on it, but I realize some default behavior should happen. Currently it will just error out, and when you refresh Kibana it will do a "package health check" and try to reinstall whatever package you were trying to install. This behaviour is mainly there to handle unknown errors that cause Kibana to crash, though. We should add a case for catching any unknown error and trying to reinstall the previous version. This should be a minor change where we try to install the previous version in an else clause here. There is no special rollback handler. |
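The "catch any unknown error and reinstall the previous version" idea could be sketched as follows. This is Python rather than Kibana's TypeScript, and `install_with_rollback` and its `install` callback are hypothetical names, not the existing handler.

```python
def install_with_rollback(install, new_version, previous_version):
    """Try the new version; on any error, fall back to the previous one."""
    try:
        install(new_version)
        return new_version
    except Exception:
        # Nothing to roll back to on a first install, so surface the error.
        if previous_version is None:
            raise
        install(previous_version)
        return previous_version
```

The function returns the version that ended up installed, so the caller can tell a successful upgrade from a rollback.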
@neptunian @ruflin let's make sure there is a ticket for this rollback so it does not fall through the cracks. Right now I have only handled the happy path, with the belief that rollback will be handled by the main handler. |
@nnamdifrankie I'm good with closing it, but let's make sure we follow up on the issues like the index problem. |
@nnamdifrankie Can you create an issue for the index problem and close this one? |
@ph I will probably have to create it in the ML issues board and link this one. |
Describe the feature:
Given I have a package that contains an ElasticSearch Transform specification, we should be able to install, update, delete and start the transform based on the state of the package and prior history of the package.
Describe a specific use case for the feature: