Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Make it easy to create/consume massive changes without breaking major version #2749

Closed
peternied opened this issue Oct 17, 2022 · 27 comments
Labels
enhancement New Enhancement

Comments

@peternied
Copy link
Member

Is your feature request related to a problem? Please describe

Recently OpenSearch took a new version of apache http client, this has disrupted many plugins including the security plugin and has caused those plugins main builds to stop working. This is blocks the standard practice of making commits to main, then going to the next minor/next patch release.

Describe the solution you'd like

When releasing 2.0.0 there was an -alpha version of this build. This created an opportunity to adopt the different version and check if there were breaking changes or not. As I remember this was complex and required a lot of work, if this was easy to do and pickup it might make an easier process for making larger breaking changes available for testing.

Could we support building from a feature branch of OpenSearch to create a -{FEATURE_NAME} version of the build?

Describe alternatives you've considered

Alternative, we do nothing and things are working as expected.

Additional context

@reta This was built off the proposal you had when we triaged opensearch-project/security#2165

@peternied peternied added enhancement New Enhancement untriaged Issues that have not yet been triaged labels Oct 17, 2022
@saratvemulapalli
Copy link
Member

Thanks @peternied for opening this up. I'll move this to opensearch-plugins.

@saratvemulapalli saratvemulapalli transferred this issue from opensearch-project/opensearch-build Oct 17, 2022
@saratvemulapalli saratvemulapalli transferred this issue from opensearch-project/opensearch-plugins Oct 17, 2022
@saratvemulapalli
Copy link
Member

Just had a chat with @peternied. Looks like the question mostly is around the feature of supporting feature branches for the build.

@peternied
Copy link
Member Author

why are breaking changes on 3.x causing problems in 2.x,1.x in plugins

Our main branches are pointing to 3.0.0 , so we default to accept work on main and then backport it to 2.x , etc...

Which makes sense, but the breaking changes will only go in Majors, so shouldn’t the changes be limited to 3.0 and not backport them ?

True, but the change is so disruptive that we are still working on handling it with no firm ETA.


Hopefully this is something we solve with extensions.

I think this kind of break is a classic case for a mono-repo, its easy to measure impact of a change if you can see all of it at once before merging.

In distributed environments being really careful about breaking interfaces is the name of the game. In this case there are direct dependencies passed via public interfaces and internal systems.

Extensions is a much healthy place for managing these interfaces in a more abstract way.

Recapped from an offline conversation with @saratvemulapalli

@cliu123
Copy link
Member

cliu123 commented Oct 18, 2022

@opensearch-project/engineering-effectiveness Is it possible to have a build including OpenSearch core and all plugins for every PR of OpenSearch core before merging to surface up the potential failures in plugins earlier? Breaking changes really hold off plugins and make development on main impossible, and this could happen more frequently as more plugins start building 3.0.
Cc: @dblock @nknize

@reta
Copy link
Contributor

reta commented Oct 18, 2022

@cliu123 @saratvemulapalli @peternied I believe we should think beyond just core plugins but the ecosystem as a whole. By releasing milestones (3.0.0-M1, 3.0.0-M2, ... ) we allow not only core plugins to catch up with the changes (depending on SNAPSHOT is always unstable), but allow external plugin authors to gradually prepare for what is coming. With these regards, we should not only rely on the CI/build infrastructure but keep in mind that it is not available for external plugins.

@cliu123
Copy link
Member

cliu123 commented Oct 18, 2022

@cliu123 @saratvemulapalli @peternied I believe we should think beyond just core plugins but the ecosystem as a whole. By releasing milestones (3.0.0-M1, 3.0.0-M2, ... ) we allow not only core plugins to catch up with the changes (depending on SNAPSHOT is always unstable), but allow external plugin authors to gradually prepare for what is coming. With these regards, we should not only rely on the CI/build infrastructure but keep in mind that it is not available for external plugins.

Thanks for pointing that out! Yes, by plugins, I was referring to all OpenSearch plugins(core + external), not any particular ones. Is there a way to have a build bundling core + all plugins on each OpenSearch core PR before merging?

@reta
Copy link
Contributor

reta commented Oct 18, 2022

Is there a way to have a build bundling core + all plugins on each OpenSearch core PR before merging?

That the problem :) we don't know what are those external plugins (in general, only the ones people publish here https://opensearch.org/community_projects/)

@cliu123
Copy link
Member

cliu123 commented Oct 18, 2022

Is there a way to have a build bundling core + all plugins on each OpenSearch core PR before merging?

That the problem :) we don't know what are those external plugins (in general, only the ones people publish here https://opensearch.org/community_projects/)

I see. IMO, If there would be a way to do that for some plugins, that would be better than not doing that for any plugins. 😄

@reta
Copy link
Contributor

reta commented Oct 19, 2022

I see. IMO, If there would be a way to do that for some plugins, that would be better than not doing that for any plugins. smile

Yeah, no disaligment here, we could employ both approaches (build core + plugins and milestones) at the same time

@dblock
Copy link
Member

dblock commented Oct 19, 2022

@opensearch-project/engineering-effectiveness Is it possible to have a build including OpenSearch core and all plugins for every PR of OpenSearch core before merging to surface up the potential failures in plugins earlier? Breaking changes really hold off plugins and make development on main impossible, and this could happen more frequently as more plugins start building 3.0. Cc: @dblock @nknize

This was suggested in opensearch-project/OpenSearch#3740. Please feel free to contribute.

@peterzhuamazon
Copy link
Member

I think each plugin repo can have a daily workflow running to compile against the latest maven snapshot of OpenSearch core, so we can detect the issues earlier.

@gaiksaya
Copy link
Member

@opensearch-project/engineering-effectiveness Is it possible to have a build including OpenSearch core and all plugins for every PR of OpenSearch core before merging to surface up the potential failures in plugins earlier? Breaking changes really hold off plugins and make development on main impossible, and this could happen more frequently as more plugins start building 3.0. Cc: @dblock @nknize

Running the entire distribution build on each PR is going to slow down the development process for OpenSearch core since it takes almost an hour to finish it. One way is to run the distribution build as an optional CI check. Make it a required CI check if something like breaking change label is added.

@rishabh6788 rishabh6788 removed the untriaged Issues that have not yet been triaged label Oct 25, 2022
@bbarani
Copy link
Member

bbarani commented Oct 25, 2022

One other option we are exploring is to copy the snapshots associated with successful builds (with plugins) to a new folder (ex: /lastSuccessfulBuild). Basically any distribution failure would end up publishing the core snapshot only on /snapshots/latest and not under /snapshots/lastSuccessfulBuild. Plugin teams have an option either to use the /latest or /lastSuccessfulBuild depending on their needs. @dblock @peternied @cliu123 @reta

@dblock
Copy link
Member

dblock commented Oct 26, 2022

I understand that semantically this makes sense to add lastSuccessfulBuild (we should call it latest-successful to match), but should we just instead change the existing latest to be the last complete distribution build? I believe that was its original intent. It would be a lot less work for plugins that already rely on latest, instantly improving their workflows. When do we need the last incomplete distribution build as latest?

Some other naming ideas: we could have latest/success and latest/failure to make it prettier ;)

@gaiksaya
Copy link
Member

If we combine them (use latest) and only publish when build is successful, wouldn't we be binding the snapshot publishing to our distribution build entirely? Also how would the plugin teams know that some change in core broke the plugin compilation too until the component build fails at distribution build level?
Right now we are pro-active in detecting the failures as CIs are failing. Might be little late when it starts failing at distribution build level?
The latest can point to snapshots irrespective of success or failure of distribution build (we can move this to new workflow to decouple if required)
The success sounds good to me. Plugin teams can add CI check for both in their repo that won't stop active development but will also check for build failures before they add the component to the manifest(optional).

@dblock
Copy link
Member

dblock commented Oct 27, 2022

@gaiksaya I think we're saying alomst the same thing. Do you mean that a successful build of OpenSearch Engine (core) or common-utils = code builds + snapshot published? And a successful distribution build would be that all those parts built together into a .zip/.tar distribution? If so I am 100% there, we just need the Jenkins job to do that. In which case I then don't know why I'd want "latest" to mean anything other than "latest successful" other than for debugging when it's failed.

@gaiksaya
Copy link
Member

gaiksaya commented Oct 27, 2022

@dblock Sorry for not being clear.

  • When I say successful I mean OpenSearch Engine (core/min) + all plugins in the manifest.
  • When I say latest I mean only OpenSearch Engine (core/min) and nothing else.

We can go with only just successful but one scenario I can think about is for user wanting to use min snapshot for whatever reason (eg external plugin development or wants just the engine for some reason)
We are blocking that by saying we want all plugin members in manifest to fix their issues first and then when everything is green we will publish a snapshot which can take days. Is that okay?

@dblock
Copy link
Member

dblock commented Oct 28, 2022

I can't think of any time we would want "latest" as defined above, hence I was proposing to change the definition of "latest" (rename "successful" to "latest" in your example). You could probably try that and see what happens. And if we need the last build redirect, we could call that "current", which I think semantically would be more correct.

@gaiksaya
Copy link
Member

In that case (if we all agree) I believe we just need to move the uploading snapshots part to only successful scenario. Nothing else changes.

@bbarani
Copy link
Member

bbarani commented Oct 31, 2022

@gaiksaya @dblock I am not sure if I completely understand here. Are we saying that we will upload snapshots to/latest only after successful full build? There are cases when the core snapshot breaks just few plugins but doesn't affect other plugins and we should provide them ways to use the /latest snapshots and not just the/successfulversion for non-impacted plugins

@gaiksaya
Copy link
Member

That was exactly my point @bbarani. @dblock @peternied @saratvemulapalli @reta What do you think?

@reta
Copy link
Contributor

reta commented Nov 1, 2022

In that case (if we all agree) I believe we just need to move the uploading snapshots part to only successful scenario. Nothing else changes.

Just to reiterate, successful scenario is (core/min) + all plugins in the manifest, correct? I think it does not help with large scale change, it is chicken and egg problem: because we don't publish feature / branch artifacts, it is not possible to make the successful pull requests against any plugin to incorporate the large change (change in core + change in all plugins). Or I am missing something here?

@peternied
Copy link
Member Author

Recapturing the scenarios around OpenSearch Core merges a change and then...:

  • There is no negative impact downstream
  • There is an impact to build workflow of the OpenSearch distribution
  • There is an impact to test workflow of the OpenSearch distribution
  • There is an impact to build workflow of plugin(s) repositories
  • There is an impact to test workflow of plugin(s) repositories

I think we have strong mechanism for the detecting / driving issues impacting OpenSearch distribution - Major props to @opensearch-project/engineering-effectiveness

Where I am seeing struggles are those individual repositories. While having more build options can help prevent getting stuck, it doesn't stop the fundamental reason we are using a unified branching strategy - no matter when there are changes to OpenSearch Core / Dashboards that need to be resolved and if they cannot be worked through we need to bubble that message back up to Core/Dashboards to work through.

I doubt that there is a mechanism to manage incoming breaking changes that would result in a the same or less work later in the development lifecycle. Leaning in creating issues upward and trying to find who can help fix with issues seems like the best mechanism with the lowest overhead.

@prudhvigodithi
Copy link
Member

prudhvigodithi commented Nov 2, 2022

Hey from my opinion, I dont think so its worth to have separate paths and allow plugin teams to consume them, there are chances consuming core from either paths can also break the dependent plugins (may be it can limit).
But from term -SNAPSHOT, its expected to consider something can break if consuming the SNAPSHOT artifact.

Coming to the main point once the changes are pushed to OpenSearch, its curated carefully to backport based on if its a breaking change or not (so the downstream and releases branches are covered from pushing a breaking change). main is always on active development and where code moves fast which is true as per our branching strategy, so I feel its expected (but not always) to see some compatibility issues for the dependent repos. Even lets say its anticipated to cause a breaking change when pushed to main as @reta mentioned there are lot of internal and external plugins which is not worth to start a campaign and wait to push the code to main.

@peterzhuamazon
Copy link
Member

This PR temporarily resolve the issue to some degree.
Basically it allows us always have the best latest copy in S3 being pointed to /latest/.
If there is anything broken or user needs to update maven, we can remove the toggle so latest wont get affected. Once issue fixed, we can re-run with the toggle so latest can be updated to the actual fully corrected new version.

#2881

@bbarani
Copy link
Member

bbarani commented Jun 28, 2023

We are exploring options to surface breaking changes as part of this issue

@gaiksaya
Copy link
Member

Closing this issue in favor of opensearch-project/opensearch-devops#114
We implemented compatibility-checker that indeed gave appropriate results after few weeks of false negatives. Next milestone to notify the concerned repo about breaking change will be tracked in opensearch-project/opensearch-devops#114

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New Enhancement
Projects
None yet
Development

No branches or pull requests

10 participants