Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt es-archiver to call _migrate endpoint instead of creating a migrator #56971

Merged

Conversation

pgayvallet
Copy link
Contributor

@pgayvallet pgayvallet commented Feb 6, 2020

Summary

Fix #55860

Due to the fact that the savedObject types can now be registered in New Platform, and because these registrations are now made using programmatic API instead of static descriptors, the way the es-archiver worked until now (by manually parsing the plugin manifest to gather the types and schema, and then creates a migrator to migrate the index) is no longer an option. (See #55860 for full context).

This PR provides a replacement by doing the following:

  • Expose a new _migrate endpoint as an internal savedObject API.
  • Modify the es-archiver to call this endpoint instead of manually performing the migration.

Prerequisite for #50309, #50311, #50313

Checklist

For maintainers

@pgayvallet pgayvallet added Feature:New Platform Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.7.0 v8.0.0 labels Feb 6, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@pgayvallet pgayvallet added the release_note:skip Skip the PR/issue when compiling release notes label Feb 6, 2020
@pgayvallet pgayvallet force-pushed the kbn-55860-es-archiver-call-migrate-endpoint branch from 294813f to 1943e3a Compare February 6, 2020 13:11
@elasticmachine

This comment has been minimized.

@elastic elastic deleted a comment from kibanamachine Feb 6, 2020
@elasticmachine

This comment has been minimized.

@elasticmachine

This comment has been minimized.

@elasticmachine

This comment has been minimized.

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor Author

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on the modified data.json.gz files:

The only behavioural difference between the previous and new mechanism is that the server migrator only got knowledge of the types from the enabled plugins, where the embed es-archiver migrator was registering every types to the migrator's schema. This normally has no impact, however some specific test suites included data with migrationVersion for spaces, when the test suite had the spaces plugin disabled. This caused the bug logged here: #56731. Only 3 data files, all from the reporting plugin, are impacted.

@tsullivan This is the cause of the changes in the x-pack/test/functional/es_archives/reporting datas. I only removed the default space document, that wasn't used anyway as all the tests loading the data are running without spaces.

Comment on lines 43784 to 43787
/**
* Run the saved objects migration
*/
async migrate() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL we need to manually generate and commit this file, and best part is it got mixed LF and CRLF line feeds.

Comment on lines -98 to 103
public runMigrations(): Promise<Array<{ status: string }>> {
if (this.migrationResult === undefined) {
public runMigrations({ rerun = false }: { rerun?: boolean } = {}): Promise<
Array<{ status: string }>
> {
if (this.migrationResult === undefined || rerun) {
this.migrationResult = this.runMigrationsInternal();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is very naive: if a migration is already running and we call runMigration again with return=true, multiple migrations can occur at the same time.

We should maybe wait for the pending migration to finish before triggering the next one. But that adds some complex logic to handle edge cases such as multiple rerun=true calls when a migration is already running. We should wait for the migration to finish but only re-execute ONCE and return the same promise for every calls to rerun that occurred during the initial migration.

As this API is internal I though current implementation was acceptable. Tell me if we should fireproof it more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, in theory, it could lead to a race condition if 2 migrations run over the same data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory it could yes, even if I don't think it could leads to data corruption, as in worse case, both 'threads' will access the data before any actually migrated it, and the document would be migrated then updated 'twice' from the same original data, which would only impact performances.

I can try to improve that by 'pooling' all the calls to runMigration(rerun=true) while one migration is running and only execute the rerun once, as explained in previous comment, if we think current implementation is too fragile.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okay as long as this API is used only for tests internally. Could you add a comment about this known limitation?

Comment on lines 244 to 247
registerRoutes({
http: setupDeps.http,
migratorPromise: this.migrator$.pipe(first()).toPromise(),
});
Copy link
Contributor Author

@pgayvallet pgayvallet Feb 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registration is done during setup, the migrator is not yet created. Had to use a subject/promise as workaround.

src/es_archiver/lib/indices/kibana_index.ts Show resolved Hide resolved
@pgayvallet pgayvallet marked this pull request as ready for review February 7, 2020 15:32
@pgayvallet pgayvallet requested review from a team as code owners February 7, 2020 15:32
@@ -95,8 +95,10 @@ export class KibanaMigrator {
* The promise resolves with an array of migration statuses, one for each
* elasticsearch index which was migrated.
*/
public runMigrations(): Promise<Array<{ status: string }>> {
if (this.migrationResult === undefined) {
public runMigrations({ rerun = false }: { rerun?: boolean } = {}): Promise<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a @param to the comment to explain what this option does / how / when it should be used?

src/core/server/saved_objects/routes/index.ts Outdated Show resolved Hide resolved
src/es_archiver/lib/indices/kibana_index.ts Show resolved Hide resolved
validate: false,
},
async (context, req, res) => {
const migrator = await migratorPromise;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we log a warning if this endpoint is used in production env? (and maybe if the CI env var isn't set?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also there aren't any tests on this endpoint / logic. I know this is an internal API, but it still feels like we should test that we don't break this by accident. Though CI would probably fail if this broke, it would be much better if we had more insight into what broke to cause the failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we log a warning if this endpoint is used in production env? (and maybe if the CI env var isn't set?)

We are running the FTR suites against production cloud, so I'm not sure we should be logging a warning? At least we should not check the CI var env?

Also there aren't any tests on this endpoint / logic

Yea, I know. I can add basic unit test to check that 1/ the route is registered and 2/ calling the route properly calls the migrator. However I have no idea how to go further than that, as the API call is actually testing in the FTR. We can't really functional test a component used by the FTR framework itself, can we? Any insight on how to test that on a 'higher' level that unit is welcome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added integration tests on the endpoint to ensure it properly calls migrator.runMigrations with the correct parameters

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is an internal API

Is Platform going to promote any naming convention to clearly distinguish public (with BWC guarantees and documentation) and internal (no BWC guarantees, implicit use at your own risk warning and optional documentation) API endpoints? In Spaces/Security we started to define internal routes with /internal/ prefix instead of /api/ as an experiment that has been working well enough so far.

I'm a bit concerned that users will quickly discover this endpoint and may start using it in any imaginable context without realizing all the consequences leading to hard-to-handle/diagnose support cases....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyone against /internal/saved_objects/_migrate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to introduce another convention without a broad discussion. as of me, internal could work, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are running the FTR suites against production cloud

By the way, does it mean that env.packageInfo (dist, buildNum etc.) in this case is completely indistinguishable from the one we have when Kibana is run in production outside of functional tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/internal/saved_objects/_migrate kinda breaks the SO route prefix. This is very minor, but forces to use a / based router when declaring routes instead of a /saved_objects one. Would /saved_objects/internal/_migrate be acceptable, or do we really want the route to starts with /internal ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind, did not understand the internal prefix was a replacement for the api one. Please forget my previous question

@joshdover
Copy link
Contributor

@elastic/kibana-security We could probably use some feedback on this from a security perspective. More context in the parent issue, but I just want to make sure there aren't security risks from this that we don't realize.

Copy link
Contributor

@joshdover joshdover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. Would like to wait on Security team before merging.

@rudolf
Copy link
Contributor

rudolf commented Feb 17, 2020

@elasticmachine merge upstream

@azasypkin azasypkin self-requested a review February 17, 2020 12:30
Copy link
Member

@azasypkin azasypkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM securitywise.

await root.setup();
await root.start();
migratorInstanceMock.runMigrations.mockClear();
}, 30000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: do we really need this explicit timeout? AFAIK we set large enough global timeout for all integration jest tests unless something has changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it was added not to wait for 10 minutes to fail the test.

Copy link
Member

@azasypkin azasypkin Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷‍♂️ if we think Kibana startup and CI are supposed to be faster now we can decrease global timeout and set it to something more reasonable once and for all (out of scope of this PR, just thinking aloud).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it was added not to wait for 10 minutes to fail the test.

Yea, that was it.

if we think Kibana startup and CI are supposed to be faster now we can decrease global timeout and set it to something more reasonable once and for all

+1, but changing the global timeout is not always a pleasant surprise in term of results, this is way outside of the scope of this PR.

validate: false,
},
async (context, req, res) => {
const migrator = await migratorPromise;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is an internal API

Is Platform going to promote any naming convention to clearly distinguish public (with BWC guarantees and documentation) and internal (no BWC guarantees, implicit use at your own risk warning and optional documentation) API endpoints? In Spaces/Security we started to define internal routes with /internal/ prefix instead of /api/ as an experiment that has been working well enough so far.

I'm a bit concerned that users will quickly discover this endpoint and may start using it in any imaginable context without realizing all the consequences leading to hard-to-handle/diagnose support cases....

@joshdover joshdover self-assigned this Feb 18, 2020
Copy link
Member

@azasypkin azasypkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry to not asking these questions during my initial review, just drifted away while thinking about public vs internal APIs and lost the track 🙈

) => {
router.post(
{
path: '/_migrate',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I'm wondering if it'd make sense to at least tie this to savedObjectsManagement:all privilege and use something like access:migrateSavedObjects tag here.

The only thing is that I can't recall a case when we did something like this for internal APIs. @kobelb what do you think? Am I missing something and there is a better way to manage access to internal APIs that perform actions on behalf of internal Kibana user (SO migration)?

Copy link
Contributor

@kobelb kobelb Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're only using this endpoint internally for the functional test runner and don't intend for end-users in production to use this endpoint, I wouldn't tie this to the savedObjectsManagement:all privilege. End-users who get all access to the saved-object management feature would then be able to use this.

This route is rather different than our other "internal" routes. Our other internal routes are intended to be used from the browser, but end-users are authorized to access them. We don't really want this route to be used in production at all. We definitely can should use the access: tag on the route to restrict which users should have access to this route. However, we really only want it to be used by the functional test runner, which I don't believe has a real user to represent. If we used an access:migrateSavedObjects tag here, and then didn't add it to any privileges only a superuser would be able to invoke this API, as they're authorized to do everything.

Copy link
Contributor

@kobelb kobelb Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other stipulation is that relying on the security plugin's authorization won't apply when the user hasn't enabled security, which is the default. What are the repercussions of any user being able to start migrations in these scenarios?

Copy link
Contributor

@joshdover joshdover Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to come up with a bad scenario and coming up empty. Migrations only get run on objects from older versions, which the only way you could create objects from older versions is manually via direct Elasticsearch API calls, which only superusers and kibana_system should have access to do.

Adding access:migrateSavedObjects tag makes sense to me as an extra layer of precaution, but I don't think we need to do anything extra for the no-security scenario. Does that work for y'all?

Copy link
Member

@azasypkin azasypkin Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having just access:migrateSavedObjects sounds like a reasonable trade-off to me assuming it works with esArchiver and there is no reliable way no detect "FTR" scenario using env.packageInfo.dist/buildNum (to not register this endpoint at all otherwise or register it via custom test plugin or something along these lines).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kobelb was that intentional decision for some reason or we just didn't have any /internal routes that would require authorization check at that time.

There were no /internal routes when this was written, and we decided to only run that route middleware when the url began with /api. We can likely remove that restriction entirely, as we're only performing the authorization logic on route which have the access: tag. I created #58351 to see the impact this has. @legrego do you recall any reason why we included the restriction that the route must start with /api originally?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kobelb nope I don’t think we had a great reason, beyond what you already mentioned that we didn’t have any /internal routes at that time. I don’t see a problem removing this restriction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed the commit adding the access tag to the route, to see if that doesn't introduce another 'surprise' from the FTR call. Waiting for #58351 to change the endpoint from api to internal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI just passed on #58351, so I think we should be good, just waiting on a review at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see #58351 has been merged. Thanks a lot to the whole security team 😄 Will move the endpoint to the internal prefix tomorrow, and we should be good to go.

},
router.handleLegacyErrors(async (context, req, res) => {
const migrator = await migratorPromise;
const result = await migrator.runMigrations({ rerun: true });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is result going to be consumed by the client somehow? If not, maybe it'd be safer to not return it at all for now (ideal option, we can log though) or add type annotation here (so that TS complain if signature changes) or just pick specific fields in case migrator starts returning something more sensitive? I know it sounds a bit paranoid, but you never know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not atm. I can just return a {status: 'ok'} dummy response instead

path: '/_migrate',
validate: false,
},
router.handleLegacyErrors(async (context, req, res) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: how unexpected migration errors look like (5xx)? May they include something that we don't want consumer to see? If so, we may need to try/catch errors in the handler, log them and return something more opaque.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly here to catch the elasticsearch service errors, that are currently thrown as boom errors. These errors are already thrown as-is from other SO endpoints, so I think there is no risk here, but I can try/catch and return a generic error instead if we think this is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These errors are already thrown as-is from other SO endpoints, so I think there is no risk here

If so, then yeah, it's fine as it's right now.

@joshdover joshdover removed their assignment Feb 24, 2020
@pgayvallet
Copy link
Contributor Author

pgayvallet commented Feb 25, 2020

Last changes have been done. If anyone wants to take a final look, please do it today. I will merge the PR tomorrow

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pgayvallet pgayvallet merged commit a4e82c4 into elastic:master Feb 26, 2020
pgayvallet added a commit to pgayvallet/kibana that referenced this pull request Feb 26, 2020
…igrator (elastic#56971)

* es-archiver call _migrate endpoint instead of creating a migrator

* fix crlf....

* use promise instead of callback accessor

* attempt with disabled authent

* enable security again

* set mapping to dynamic before calling migration endpoint

* rename space data from non-spaced tests

* add documentation on the `rerun` flag

* create router with the `/api/saved_objects` prefix

* add unit test about strict mapping

* add integration test on migrate endpoint

* wrap route handler with handleLegacyErrors

* add remark about limitations of the rerun flag

* Apply suggestions from code review

Co-Authored-By: Aleh Zasypkin <[email protected]>

* do not return detailed index result

* add access tag to migrate route

* use /internal prefix for migrate endpoint

* fix integ tests

Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: Josh Dover <[email protected]>
Co-authored-by: Aleh Zasypkin <[email protected]>
pgayvallet added a commit that referenced this pull request Feb 26, 2020
…igrator (#56971) (#58575)

* es-archiver call _migrate endpoint instead of creating a migrator

* fix crlf....

* use promise instead of callback accessor

* attempt with disabled authent

* enable security again

* set mapping to dynamic before calling migration endpoint

* rename space data from non-spaced tests

* add documentation on the `rerun` flag

* create router with the `/api/saved_objects` prefix

* add unit test about strict mapping

* add integration test on migrate endpoint

* wrap route handler with handleLegacyErrors

* add remark about limitations of the rerun flag

* Apply suggestions from code review

Co-Authored-By: Aleh Zasypkin <[email protected]>

* do not return detailed index result

* add access tag to migrate route

* use /internal prefix for migrate endpoint

* fix integ tests

Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: Josh Dover <[email protected]>
Co-authored-by: Aleh Zasypkin <[email protected]>

Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: Josh Dover <[email protected]>
Co-authored-by: Aleh Zasypkin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:New Platform release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.7.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adapt ES-Archiver to use savedObject types defined in the new platform
10 participants