Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adapt es-archiver to call
_migrate
endpoint instead of creating a migrator #56971Adapt es-archiver to call
_migrate
endpoint instead of creating a migrator #56971Changes from 23 commits
1943e3a
b3d6019
b8820a0
110187e
ec1a4ab
d72d774
41f56b3
250ff7c
72cdb3d
6f02890
1f8b861
39b07f7
a793695
06f2138
4038539
c354dc7
f62f9ae
b467cfa
2217872
89ab46e
0ea85e0
801189a
10b8fac
96b1315
26a15e5
a79485d
318fa14
4fdb39d
502c042
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a
@param
to the comment to explain what this option does / how / when it should be used?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is very naive: if a migration is already running and we call
runMigration
again withreturn=true
, multiple migrations can occur at the same time.We should maybe wait for the pending migration to finish before triggering the next one. But that adds some complex logic to handle edge cases such as multiple
rerun=true
calls when a migration is already running. We should wait for the migration to finish but only re-execute ONCE and return the same promise for every calls torerun
that occurred during the initial migration.As this API is internal I though current implementation was acceptable. Tell me if we should fireproof it more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, in theory, it could lead to a race condition if 2 migrations run over the same data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory it could yes, even if I don't think it could leads to data corruption, as in worse case, both 'threads' will access the data before any actually migrated it, and the document would be migrated then updated 'twice' from the same original data, which would only impact performances.
I can try to improve that by 'pooling' all the calls to
runMigration(rerun=true)
while one migration is running and only execute the rerun once, as explained in previous comment, if we think current implementation is too fragile.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay as long as this API is used only for tests internally. Could you add a comment about this known limitation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: do we really need this explicit timeout? AFAIK we set large enough global timeout for all integration jest tests unless something has changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it was added not to wait for 10 minutes to fail the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷♂️ if we think Kibana startup and CI are supposed to be faster now we can decrease global timeout and set it to something more reasonable once and for all (out of scope of this PR, just thinking aloud).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, that was it.
+1, but changing the global timeout is not always a pleasant surprise in term of results, this is way outside of the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: I'm wondering if it'd make sense to at least tie this to
savedObjectsManagement:all
privilege and use something likeaccess:migrateSavedObjects
tag here.The only thing is that I can't recall a case when we did something like this for internal APIs. @kobelb what do you think? Am I missing something and there is a better way to manage access to internal APIs that perform actions on behalf of internal Kibana user (SO migration)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're only using this endpoint internally for the functional test runner and don't intend for end-users in production to use this endpoint, I wouldn't tie this to the
savedObjectsManagement:all
privilege. End-users who get all access to the saved-object management feature would then be able to use this.This route is rather different than our other "internal" routes. Our other internal routes are intended to be used from the browser, but end-users are authorized to access them. We don't really want this route to be used in production at all. We definitely
canshould use theaccess:
tag on the route to restrict which users should have access to this route. However, we really only want it to be used by the functional test runner, which I don't believe has a real user to represent. If we used anaccess:migrateSavedObjects
tag here, and then didn't add it to any privileges only asuperuser
would be able to invoke this API, as they're authorized to do everything.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other stipulation is that relying on the security plugin's authorization won't apply when the user hasn't enabled security, which is the default. What are the repercussions of any user being able to start migrations in these scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to come up with a bad scenario and coming up empty. Migrations only get run on objects from older versions, which the only way you could create objects from older versions is manually via direct Elasticsearch API calls, which only superusers and kibana_system should have access to do.
Adding
access:migrateSavedObjects
tag makes sense to me as an extra layer of precaution, but I don't think we need to do anything extra for the no-security scenario. Does that work for y'all?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having just
access:migrateSavedObjects
sounds like a reasonable trade-off to me assuming it works withesArchiver
and there is no reliable way no detect "FTR" scenario usingenv.packageInfo.dist/buildNum
(to not register this endpoint at all otherwise or register it via custom test plugin or something along these lines).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were no
/internal
routes when this was written, and we decided to only run that route middleware when the url began with/api
. We can likely remove that restriction entirely, as we're only performing the authorization logic on route which have theaccess:
tag. I created #58351 to see the impact this has. @legrego do you recall any reason why we included the restriction that the route must start with/api
originally?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kobelb nope I don’t think we had a great reason, beyond what you already mentioned that we didn’t have any /internal routes at that time. I don’t see a problem removing this restriction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed the commit adding the access tag to the route, to see if that doesn't introduce another 'surprise' from the FTR call. Waiting for #58351 to change the endpoint from
api
tointernal
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI just passed on #58351, so I think we should be good, just waiting on a review at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see #58351 has been merged. Thanks a lot to the whole security team 😄 Will move the endpoint to the
internal
prefix tomorrow, and we should be good to go.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: how unexpected migration errors look like (5xx)? May they include something that we don't want consumer to see? If so, we may need to try/catch errors in the handler, log them and return something more opaque.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly here to catch the elasticsearch service errors, that are currently thrown as boom errors. These errors are already thrown as-is from other SO endpoints, so I think there is no risk here, but I can try/catch and return a generic error instead if we think this is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, then yeah, it's fine as it's right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we log a warning if this endpoint is used in production env? (and maybe if the CI env var isn't set?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also there aren't any tests on this endpoint / logic. I know this is an internal API, but it still feels like we should test that we don't break this by accident. Though CI would probably fail if this broke, it would be much better if we had more insight into what broke to cause the failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are running the FTR suites against production cloud, so I'm not sure we should be logging a warning? At least we should not check the CI var env?
Yea, I know. I can add basic unit test to check that 1/ the route is registered and 2/ calling the route properly calls the migrator. However I have no idea how to go further than that, as the API call is actually testing in the FTR. We can't really functional test a component used by the FTR framework itself, can we? Any insight on how to test that on a 'higher' level that unit is welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added integration tests on the endpoint to ensure it properly calls
migrator.runMigrations
with the correct parametersThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Platform going to promote any naming convention to clearly distinguish public (with BWC guarantees and documentation) and internal (no BWC guarantees, implicit
use at your own risk
warning and optional documentation) API endpoints? In Spaces/Security we started to define internal routes with/internal/
prefix instead of/api/
as an experiment that has been working well enough so far.I'm a bit concerned that users will quickly discover this endpoint and may start using it in any imaginable context without realizing all the consequences leading to hard-to-handle/diagnose support cases....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyone against
/internal/saved_objects/_migrate
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to introduce another convention without a broad discussion. as of me,
internal
could work, yes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, does it mean that
env.packageInfo
(dist
,buildNum
etc.) in this case is completely indistinguishable from the one we have when Kibana is run in production outside of functional tests?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/internal/saved_objects/_migrate
kinda breaks the SO route prefix. This is very minor, but forces to use a/
based router when declaring routes instead of a/saved_objects
one. Would/saved_objects/internal/_migrate
be acceptable, or do we really want the route to starts with/internal
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nevermind, did not understand the
internal
prefix was a replacement for theapi
one. Please forget my previous question