-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adapt es-archiver to call _migrate
endpoint instead of creating a migrator
#56971
Adapt es-archiver to call _migrate
endpoint instead of creating a migrator
#56971
Conversation
Pinging @elastic/kibana-platform (Team:Platform) |
294813f
to
1943e3a
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
💚 Build SucceededHistory
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on the modified data.json.gz files:
The only behavioural difference between the previous and new mechanism is that the server migrator only got knowledge of the types from the enabled plugins, where the embed es-archiver
migrator was registering every types to the migrator's schema. This normally has no impact, however some specific test suites included data with migrationVersion
for spaces
, when the test suite had the spaces
plugin disabled. This caused the bug logged here: #56731. Only 3 data files, all from the reporting
plugin, are impacted.
@tsullivan This is the cause of the changes in the x-pack/test/functional/es_archives/reporting
datas. I only removed the default space
document, that wasn't used anyway as all the tests loading the data are running without spaces
.
packages/kbn-pm/dist/index.js
Outdated
/** | ||
* Run the saved objects migration | ||
*/ | ||
async migrate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL we need to manually generate and commit this file, and best part is it got mixed LF and CRLF line feeds.
public runMigrations(): Promise<Array<{ status: string }>> { | ||
if (this.migrationResult === undefined) { | ||
public runMigrations({ rerun = false }: { rerun?: boolean } = {}): Promise< | ||
Array<{ status: string }> | ||
> { | ||
if (this.migrationResult === undefined || rerun) { | ||
this.migrationResult = this.runMigrationsInternal(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is very naive: if a migration is already running and we call runMigration
again with return=true
, multiple migrations can occur at the same time.
We should maybe wait for the pending migration to finish before triggering the next one. But that adds some complex logic to handle edge cases such as multiple rerun=true
calls when a migration is already running. We should wait for the migration to finish but only re-execute ONCE and return the same promise for every calls to rerun
that occurred during the initial migration.
As this API is internal I though current implementation was acceptable. Tell me if we should fireproof it more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, in theory, it could lead to a race condition if 2 migrations run over the same data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory it could yes, even if I don't think it could leads to data corruption, as in worse case, both 'threads' will access the data before any actually migrated it, and the document would be migrated then updated 'twice' from the same original data, which would only impact performances.
I can try to improve that by 'pooling' all the calls to runMigration(rerun=true)
while one migration is running and only execute the rerun once, as explained in previous comment, if we think current implementation is too fragile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay as long as this API is used only for tests internally. Could you add a comment about this known limitation?
registerRoutes({ | ||
http: setupDeps.http, | ||
migratorPromise: this.migrator$.pipe(first()).toPromise(), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Registration is done during setup
, the migrator is not yet created. Had to use a subject/promise as workaround.
…iver-call-migrate-endpoint
@@ -95,8 +95,10 @@ export class KibanaMigrator { | |||
* The promise resolves with an array of migration statuses, one for each | |||
* elasticsearch index which was migrated. | |||
*/ | |||
public runMigrations(): Promise<Array<{ status: string }>> { | |||
if (this.migrationResult === undefined) { | |||
public runMigrations({ rerun = false }: { rerun?: boolean } = {}): Promise< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a @param
to the comment to explain what this option does / how / when it should be used?
packages/kbn-dev-utils/src/kbn_client/kbn_client_saved_objects.ts
Outdated
Show resolved
Hide resolved
validate: false, | ||
}, | ||
async (context, req, res) => { | ||
const migrator = await migratorPromise; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we log a warning if this endpoint is used in production env? (and maybe if the CI env var isn't set?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also there aren't any tests on this endpoint / logic. I know this is an internal API, but it still feels like we should test that we don't break this by accident. Though CI would probably fail if this broke, it would be much better if we had more insight into what broke to cause the failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we log a warning if this endpoint is used in production env? (and maybe if the CI env var isn't set?)
We are running the FTR suites against production cloud, so I'm not sure we should be logging a warning? At least we should not check the CI var env?
Also there aren't any tests on this endpoint / logic
Yea, I know. I can add basic unit test to check that 1/ the route is registered and 2/ calling the route properly calls the migrator. However I have no idea how to go further than that, as the API call is actually testing in the FTR. We can't really functional test a component used by the FTR framework itself, can we? Any insight on how to test that on a 'higher' level that unit is welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added integration tests on the endpoint to ensure it properly calls migrator.runMigrations
with the correct parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is an internal API
Is Platform going to promote any naming convention to clearly distinguish public (with BWC guarantees and documentation) and internal (no BWC guarantees, implicit use at your own risk
warning and optional documentation) API endpoints? In Spaces/Security we started to define internal routes with /internal/
prefix instead of /api/
as an experiment that has been working well enough so far.
I'm a bit concerned that users will quickly discover this endpoint and may start using it in any imaginable context without realizing all the consequences leading to hard-to-handle/diagnose support cases....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyone against /internal/saved_objects/_migrate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to introduce another convention without a broad discussion. as of me, internal
could work, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are running the FTR suites against production cloud
By the way, does it mean that env.packageInfo
(dist
, buildNum
etc.) in this case is completely indistinguishable from the one we have when Kibana is run in production outside of functional tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/internal/saved_objects/_migrate
kinda breaks the SO route prefix. This is very minor, but forces to use a /
based router when declaring routes instead of a /saved_objects
one. Would /saved_objects/internal/_migrate
be acceptable, or do we really want the route to starts with /internal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nevermind, did not understand the internal
prefix was a replacement for the api
one. Please forget my previous question
…iver-call-migrate-endpoint
…iver-call-migrate-endpoint
@elastic/kibana-security We could probably use some feedback on this from a security perspective. More context in the parent issue, but I just want to make sure there aren't security risks from this that we don't realize. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM. Would like to wait on Security team before merging.
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM securitywise.
await root.setup(); | ||
await root.start(); | ||
migratorInstanceMock.runMigrations.mockClear(); | ||
}, 30000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: do we really need this explicit timeout? AFAIK we set large enough global timeout for all integration jest tests unless something has changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it was added not to wait for 10 minutes to fail the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷♂️ if we think Kibana startup and CI are supposed to be faster now we can decrease global timeout and set it to something more reasonable once and for all (out of scope of this PR, just thinking aloud).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it was added not to wait for 10 minutes to fail the test.
Yea, that was it.
if we think Kibana startup and CI are supposed to be faster now we can decrease global timeout and set it to something more reasonable once and for all
+1, but changing the global timeout is not always a pleasant surprise in term of results, this is way outside of the scope of this PR.
src/core/server/saved_objects/migrations/kibana/kibana_migrator.ts
Outdated
Show resolved
Hide resolved
src/core/server/saved_objects/migrations/kibana/kibana_migrator.ts
Outdated
Show resolved
Hide resolved
validate: false, | ||
}, | ||
async (context, req, res) => { | ||
const migrator = await migratorPromise; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is an internal API
Is Platform going to promote any naming convention to clearly distinguish public (with BWC guarantees and documentation) and internal (no BWC guarantees, implicit use at your own risk
warning and optional documentation) API endpoints? In Spaces/Security we started to define internal routes with /internal/
prefix instead of /api/
as an experiment that has been working well enough so far.
I'm a bit concerned that users will quickly discover this endpoint and may start using it in any imaginable context without realizing all the consequences leading to hard-to-handle/diagnose support cases....
Co-Authored-By: Aleh Zasypkin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry to not asking these questions during my initial review, just drifted away while thinking about public vs internal APIs and lost the track 🙈
) => { | ||
router.post( | ||
{ | ||
path: '/_migrate', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: I'm wondering if it'd make sense to at least tie this to savedObjectsManagement:all
privilege and use something like access:migrateSavedObjects
tag here.
The only thing is that I can't recall a case when we did something like this for internal APIs. @kobelb what do you think? Am I missing something and there is a better way to manage access to internal APIs that perform actions on behalf of internal Kibana user (SO migration)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're only using this endpoint internally for the functional test runner and don't intend for end-users in production to use this endpoint, I wouldn't tie this to the savedObjectsManagement:all
privilege. End-users who get all access to the saved-object management feature would then be able to use this.
This route is rather different than our other "internal" routes. Our other internal routes are intended to be used from the browser, but end-users are authorized to access them. We don't really want this route to be used in production at all. We definitely can should use the access:
tag on the route to restrict which users should have access to this route. However, we really only want it to be used by the functional test runner, which I don't believe has a real user to represent. If we used an access:migrateSavedObjects
tag here, and then didn't add it to any privileges only a superuser
would be able to invoke this API, as they're authorized to do everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other stipulation is that relying on the security plugin's authorization won't apply when the user hasn't enabled security, which is the default. What are the repercussions of any user being able to start migrations in these scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to come up with a bad scenario and coming up empty. Migrations only get run on objects from older versions, which the only way you could create objects from older versions is manually via direct Elasticsearch API calls, which only superusers and kibana_system should have access to do.
Adding access:migrateSavedObjects
tag makes sense to me as an extra layer of precaution, but I don't think we need to do anything extra for the no-security scenario. Does that work for y'all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having just access:migrateSavedObjects
sounds like a reasonable trade-off to me assuming it works with esArchiver
and there is no reliable way no detect "FTR" scenario using env.packageInfo.dist/buildNum
(to not register this endpoint at all otherwise or register it via custom test plugin or something along these lines).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kobelb was that intentional decision for some reason or we just didn't have any /internal routes that would require authorization check at that time.
There were no /internal
routes when this was written, and we decided to only run that route middleware when the url began with /api
. We can likely remove that restriction entirely, as we're only performing the authorization logic on route which have the access:
tag. I created #58351 to see the impact this has. @legrego do you recall any reason why we included the restriction that the route must start with /api
originally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kobelb nope I don’t think we had a great reason, beyond what you already mentioned that we didn’t have any /internal routes at that time. I don’t see a problem removing this restriction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed the commit adding the access tag to the route, to see if that doesn't introduce another 'surprise' from the FTR call. Waiting for #58351 to change the endpoint from api
to internal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI just passed on #58351, so I think we should be good, just waiting on a review at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see #58351 has been merged. Thanks a lot to the whole security team 😄 Will move the endpoint to the internal
prefix tomorrow, and we should be good to go.
}, | ||
router.handleLegacyErrors(async (context, req, res) => { | ||
const migrator = await migratorPromise; | ||
const result = await migrator.runMigrations({ rerun: true }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: is result
going to be consumed by the client somehow? If not, maybe it'd be safer to not return it at all for now (ideal option, we can log though) or add type annotation here (so that TS complain if signature changes) or just pick specific fields in case migrator starts returning something more sensitive? I know it sounds a bit paranoid, but you never know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not atm. I can just return a {status: 'ok'}
dummy response instead
path: '/_migrate', | ||
validate: false, | ||
}, | ||
router.handleLegacyErrors(async (context, req, res) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: how unexpected migration errors look like (5xx)? May they include something that we don't want consumer to see? If so, we may need to try/catch errors in the handler, log them and return something more opaque.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly here to catch the elasticsearch service errors, that are currently thrown as boom errors. These errors are already thrown as-is from other SO endpoints, so I think there is no risk here, but I can try/catch and return a generic error instead if we think this is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These errors are already thrown as-is from other SO endpoints, so I think there is no risk here
If so, then yeah, it's fine as it's right now.
…iver-call-migrate-endpoint
…er-call-migrate-endpoint' into kbn-55860-es-archiver-call-migrate-endpoint
…iver-call-migrate-endpoint
…iver-call-migrate-endpoint
Last changes have been done. If anyone wants to take a final look, please do it today. I will merge the PR tomorrow |
…iver-call-migrate-endpoint
💚 Build SucceededHistory
To update your PR or re-run it, just comment with: |
…igrator (elastic#56971) * es-archiver call _migrate endpoint instead of creating a migrator * fix crlf.... * use promise instead of callback accessor * attempt with disabled authent * enable security again * set mapping to dynamic before calling migration endpoint * rename space data from non-spaced tests * add documentation on the `rerun` flag * create router with the `/api/saved_objects` prefix * add unit test about strict mapping * add integration test on migrate endpoint * wrap route handler with handleLegacyErrors * add remark about limitations of the rerun flag * Apply suggestions from code review Co-Authored-By: Aleh Zasypkin <[email protected]> * do not return detailed index result * add access tag to migrate route * use /internal prefix for migrate endpoint * fix integ tests Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Josh Dover <[email protected]> Co-authored-by: Aleh Zasypkin <[email protected]>
…igrator (#56971) (#58575) * es-archiver call _migrate endpoint instead of creating a migrator * fix crlf.... * use promise instead of callback accessor * attempt with disabled authent * enable security again * set mapping to dynamic before calling migration endpoint * rename space data from non-spaced tests * add documentation on the `rerun` flag * create router with the `/api/saved_objects` prefix * add unit test about strict mapping * add integration test on migrate endpoint * wrap route handler with handleLegacyErrors * add remark about limitations of the rerun flag * Apply suggestions from code review Co-Authored-By: Aleh Zasypkin <[email protected]> * do not return detailed index result * add access tag to migrate route * use /internal prefix for migrate endpoint * fix integ tests Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Josh Dover <[email protected]> Co-authored-by: Aleh Zasypkin <[email protected]> Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Josh Dover <[email protected]> Co-authored-by: Aleh Zasypkin <[email protected]>
Summary
Fix #55860
Due to the fact that the savedObject types can now be registered in New Platform, and because these registrations are now made using programmatic API instead of static descriptors, the way the
es-archiver
worked until now (by manually parsing the plugin manifest to gather the types and schema, and then creates a migrator to migrate the index) is no longer an option. (See #55860 for full context).This PR provides a replacement by doing the following:
_migrate
endpoint as an internal savedObject API.es-archiver
to call this endpoint instead of manually performing the migration.Prerequisite for #50309, #50311, #50313
Checklist
For maintainers