Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-32468][rpc] Switch from Akka to Pekko #22996

Merged
merged 12 commits into from
Jul 25, 2023
Merged

Conversation

zentol
Copy link
Contributor

@zentol zentol commented Jul 14, 2023

Switch to the Apache fork of Akka. Akka support ends in September, so 1.18.x would eventually be affected.

I decided not to rename classes/modules because it feels like a lot of noise in the git history for virtually no gain.
May also make it easier to revert things if we do run into a problem during release testing.

@zentol zentol requested a review from XComp July 14, 2023 09:58
@flinkbot
Copy link
Collaborator

flinkbot commented Jul 14, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flink-rpc/flink-rpc-akka-loader:pom.xml -> "Prevent akka and scala from being visible ..."
flink-rpc/flink-rpc-akka:pom.xml -> "For dependency convergence in Akka 2.6.20"
AkkaBasedEndpoint JavaDoc: "Interface for Akka based rpc gateways."
AkkaInvocationhandler JavaDoc: "The Akka (RPC) address of {@link #rpcEndpoint} ..."
AkkaInvocationHandler log: "... This is usually caused by: 1) Akka failed sending ..."
PriorityThreadsDispatcher JavaDoc: "Akka dispatcher threads creates..."
RemoteAddressExtension JavaDoc: "{@link akka.actor.ActorSystem} {@link Extension} used ..."
RobustActorSystem comment: "some parts of the akka shutdown procedure ..."
flink-table/flink-table-planner-loader/pom.xml: "Prevent akka and scala from being visible to ..."

I guess, the comments should be properly converted, though. Don't you think?

@XComp
Copy link
Contributor

XComp commented Jul 14, 2023

I decided not to rename classes/modules because it feels like a lot of noise in the git history for virtually no gain.
May also make it easier to revert things if we do run into a problem during release testing.

I see your point. But in my opinion, there are even more examples (besides the ones I mentioned in my previous comment) where we should switch to pekko, e.g. see AkkaOptions:361 which should actually point to https://pekko.apache.org/docs/pekko/current/remoting-artery.html#failure-detector

@zentol zentol force-pushed the 32468 branch 2 times, most recently from de5e40f to adc88d4 Compare July 20, 2023 13:17
@zentol
Copy link
Contributor Author

zentol commented Jul 20, 2023

I did another pass over all references containing akka that aren't modules/classes/variables; this was mostly cosmetic but there were also genuine issues.
I also renamed all akka related config keys (with deprecated akka keys of course)

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming everything would make the review easier. 😇

Anyway, I'm wondering whether we could add a README to the flink-rpc-akka module where we could explain the history behind the Akka classes and Pekko documentation. But I understand that this could be also covered by the git history. I just think that having akka and pekko in the codebase side by side might be confusing to readers in the future.

Other code location I found:

  • There is a config parameter TaskManagerOptions.EXIT_ON_FATAL_AKKA_ERROR that contains akka
  • ActorSystemExtension and AkkaActorSystemTest have invalid references in its JavaDoc
  • RpcEndpoint's JavaDoc mentions Akka
  • nit: MessageSerializationTest mentions akka

@zentol
Copy link
Contributor Author

zentol commented Jul 24, 2023

There is a config parameter TaskManagerOptions.EXIT_ON_FATAL_AKKA_ERROR that contains akka

Which is deprecated.

@zentol zentol requested a review from XComp July 24, 2023 13:00
Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find any other occurrences except for the AkkaOptions which is @PublicEvolving. Shall we add a deprecation flag here and plan to rename it with 2.0? A spotless:apply run is missing. But it looks good otherwise.

On the other note: Have we thought of providing a separate module flink-rpc-pekko and providing that one along flink-rpc-akka in 1.18 with pekko being the default one? Or is this too much of an effort and we're comfortable enough that Pekko works as is?

And we might want to brief the 1.18 release managers on that PR considering that we missed the feature freeze for 1.18 last weekend.

@zentol
Copy link
Contributor Author

zentol commented Jul 24, 2023

Have we thought of providing a separate module flink-rpc-pekko and providing that one along flink-rpc-akka in 1.18 with pekko being the default one?

Yes, but that implies duplicating all classes and increasing the dist size by ~30mb.

Or is this too much of an effort and we're comfortable enough that Pekko works as is?

I wouldn't say I'm comfortable, but I think it's better to potentially break things now than doing this in a bugfix release or running with an unsupported Akka version,

Shall we add a deprecation flag here and plan to rename it with 2.0?

That's a separate discussion I believe because it's an API breaking change. But the idea itself is good.

@XComp
Copy link
Contributor

XComp commented Jul 24, 2023

The changes look good now. I will bring this PR up in tomorrows 1.18 release sync. Would we break anything if we also rename pom modules? ...now that we've renamed the packages and classes as well.

@zentol
Copy link
Contributor Author

zentol commented Jul 24, 2023

Would we break anything if we also rename pom modules?

I think it'd be annoying for devs, because that implies having to rebuild the rpc system whenever you switch between the pre and post pekko state of the project.

Copy link
Contributor

@XComp XComp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. Nothing to add from my side. 👍 Let's wait for CI to become green and the release sync tomorrow.

@XComp
Copy link
Contributor

XComp commented Jul 25, 2023

fyi: the e2e1 test failure is fixed with FLINK-32632 in master

@XComp
Copy link
Contributor

XComp commented Jul 25, 2023

@mdedetrich pointed out today that there is a Pekko 1.0.1 release on the way. It includes a fix in pekko-remote which we use. It's not 100% clear whether the issue actually affects Flink users (it's about Scala class names with $). But considering that we have to do another CI run, anyway (due to the runtime-web test instability), it might be easy enough to bump the version to 1.0.1 right away. They triggered the publishing of the jars today (ML thread) and expect it to be done by tomorrow morning latest. WDYT?

On the topic of merging the PR: We got the verbal 👍 to merge this issue by the release managers in today's release sync. I was waiting for a summary of the meeting in the ML which didn't happen, yet.

@zentol
Copy link
Contributor Author

zentol commented Jul 25, 2023

I already restarted CI 30 minutes ago and would like to get this in today. If we haven't run into any issue so far I doubt we're affected by it (which seems more likely if you actually use Scala).

@zentol
Copy link
Contributor Author

zentol commented Jul 25, 2023

We can still bump it in the following days if needed.

@pjfanning
Copy link
Contributor

pjfanning commented Jul 25, 2023

I will release Pekko 1.0.1 jars early tomorrow (UTC). They are up for 24 hours for review. The pekko-remote issue is in an internal class but it is better to use the new release.

@zentol zentol merged commit c8ae39d into apache:master Jul 25, 2023
@pjfanning
Copy link
Contributor

@zentol Pekko 1.0.1 is available now

@mdedetrich
Copy link
Contributor

PR that bumps Pekko from 1.0.0 to 1.0.1 has been created at #23080

@He-Pin
Copy link
Member

He-Pin commented Jul 31, 2023

Hi team, I want to upgrade the current classic transport from netty 3 to netty 4 ,do you think that's ok?
I just asked on maillist https://lists.apache.org/thread/grzp3jw398rtpc6oqfybzxnry2mtyh64

@pjfanning
Copy link
Contributor

Pekko 1.0.x will be maintained for a while. It is only Pekko 1.1.x where we are considering changing to netty 4 (or even dropping netty and just keeping Artery remoting).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants