Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

[Conjure Java Runtime] Part 26: Story of Thanatosis (Removing AtlasDB-Feign Entirely) #4561

Merged
merged 16 commits into from
Mar 10, 2020

Conversation

jeremyk-91
Copy link
Contributor

Goals (and why):

  • Remove the atlasdb-feign project from the codebase.
  • Generally leave people fewer complications to think about in the remoting stack (e.g. James thought that we could still fallback when beginning to write the Conjure migration). I thought I'd give a 1 hour stab at this, and it looks like it wasn't as hard as I thought.

Implementation Description (bullets):

  • Remove atlasdb-feign, fix compile and checkstyle issues that arise from that.

Testing (What was existing testing like? What have you done to improve it?):

  • Existing tests still pass.
  • The code compiling serves as verification that it isn't referenced any more.

Concerns (what feedback would you like?):

  • I wanted to eliminate RemotingClientConfig as well, but wasn't too happy with the API break/mess there.
  • Should the experiment running proxy code be retained?
  • There are a few trouble spots:
    • Leader block mechanism to elide remote calls in a 1 node cluster used Feign but have now been switched to CJR. I've updated the failure mechanism to match, but is this correct?
    • Benchmarking infrastructure had some fancy magic with FeignOkHttpClients, which of course isn't used in CJR. Does this still work?
  • Are there some edge cases that I've missed out?

Where should we start reviewing?: Top down, looking at places which are actually +/- as opposed to just - is probably a good approach.

Priority (whenever / two weeks / yesterday): Whenever

@jeremyk-91 jeremyk-91 requested a review from gmaretic February 11, 2020 11:44
@changelog-app
Copy link

changelog-app bot commented Feb 11, 2020

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

The atlasdb-feign project is no longer published. Remoting within AtlasDB has already been going through Conjure Java Runtime for multiple versions. Direct users of classes in this project will need to find suitable alternatives - please contact the AtlasDB team if this is an issue.

Check the box to generate changelog(s)

  • Generate changelog entry

() -> AtlasDbFeignTargetFactory.DEFAULT.createProxyWithFailover(serverListConfig, type, parameters),
type,
parameters);
return VersionSelectingClients.instrumentWithClientVersionTag(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't read the PR, but we need to make sure we don't miss the code in ExperimentRunningProxy:

this.refreshingExperimentalServiceSupplier = Suppliers.memoizeWithExpiration(
                experimentalServiceSupplier::get, CLIENT_REFRESH_INTERVAL.toMinutes(), TimeUnit.MINUTES);  

Without this we loose the 30 minute client reload thing, meaning we'll get the lockups back again.

Copy link
Contributor

@felixdesouza felixdesouza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mainly good, couple of questions.

I do think we should push through with the deletion/stop deserializing remoting config. It's used in 2 places internally, and if we're keeping the ALWAYS_USE_CONJURE bit, then they will continue to work no problem. But just to clean up/require less cognitive load when perusing through the remoting layer, we should remove the configs and the VersionSelectingClient.

() -> AtlasDbFeignTargetFactory.DEFAULT.createProxyWithFailover(serverListConfig, type, parameters),
type,
parameters);
Supplier<T> clientFactory = () -> VersionSelectingClients.instrumentWithClientVersionTag(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove VersionSelectingClients? I don't think it's relevant anymore.

* the existing instance. Once the refresh interval has expired, a new instance is not guaranteed to be created until
* a method is invoked on the proxy.
*/
public class SelfRefreshingProxy<T> extends AbstractInvocationHandler {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, in my hack branch, I have the same thing, although a bit condensed, it's just the Suppliers.memoizeWithExpiration piece etc.

log.warn("Failed to read remote timestamp server ID", e);
logAfter = LOGGING_INTERVAL;
} catch (UnknownRemoteException e) {
if (400 <= e.getStatus() && e.getStatus() <= 499) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you go into a bit more detail as to why it has to be done this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added a comment. Basically it looks like the treatment of an unmapped 4xx has changed: this was a ClientErrorException in the Feign world, but looks like that has been changed to be a UnknownRemoteException in the CJR world.

@@ -48,7 +48,11 @@ default boolean shouldSupportBlockingOperations() {
return true;
}

/**
* @deprecated Conjure is now always used, and supplying alternative configs is not possible.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true? still feels like you can have something that gets deserialised into a RemotingClientConfig no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go through and purge the config stuff properly. Deprecations don't really do anything if we never remove them.

I also don't think we want to deprecate this anyway. Since if we do need to open up other configurations i.e. if we want to add some configurability for our own testing purposes, this is the right place to do it.

Starting from the config RemotingClientConfig, we can delete the two properties, and if we're super concerned we can do what we did in AtlasDbConfig, where as a class level annotation, we do @JsonIgnoreProperties({"maximumConjureRemotingProbability", "enableLegacyClientFallback"} and then we delete those properties.

We should probably add a RemotingClientConfigs.DEFAULT configuration, and then have the old thing RemotingClientConfigs.ALWAYS_USE_CONJURE refer to that, and change all of our references to that.

Then we can propagate the deletion of the java properties everywhere (i.e. and continue to get rid of VersionSelectingClients.

@@ -188,7 +187,7 @@ private static HealthCheckPinger getMockOfPingableLeaderWherePingReturns(Set<Cli

private static HealthCheckPinger getMockOfPingableLeaderWherePingThrows() {
HealthCheckPinger mockLeader = mock(HealthCheckPinger.class);
when(mockLeader.apply(ALL_CLIENTS)).thenThrow(mock(AtlasDbRemoteException.class));
when(mockLeader.apply(ALL_CLIENTS)).thenThrow(mock(IllegalStateException.class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, sorry about this 😞

Copy link
Contributor Author

@jeremyk-91 jeremyk-91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the easier-to-fix bits first, later part will come soon but have other priorities at the moment. Thanks for the review.

log.warn("Failed to read remote timestamp server ID", e);
logAfter = LOGGING_INTERVAL;
} catch (UnknownRemoteException e) {
if (400 <= e.getStatus() && e.getStatus() <= 499) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added a comment. Basically it looks like the treatment of an unmapped 4xx has changed: this was a ClientErrorException in the Feign world, but looks like that has been changed to be a UnknownRemoteException in the CJR world.

Copy link
Contributor

@felixdesouza felixdesouza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably get rid of the extra instrumentation since we don't need it. But other than that looks good. Will leave it to you to merge when you're ready.

metricsManager.getTaggedRegistry(),
ConjureJavaRuntimeTargetFactory.DEFAULT.createProxyWithQuickFailoverForTesting(
serverListConfig, type, parameters),
type);
return SelfRefreshingProxy.create(clientFactory, type);
}

private static <T> T instrument(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, instrumentation has always been separate from creating clients. i.e. ServiceCreator and in timelock (before and after my changes). Unless they're all changed, we're going to double report, and likely we'll have wrong metrics on our dashboards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline: This is messy, because just removing will mean that we lose some metrics - not true that all ServiceCreator created things are instrumented. Will do as a follow up to this PR.

@@ -21,8 +21,5 @@ private RemotingClientConfigs() {
// Constants
}

public static final RemotingClientConfig ALWAYS_USE_CONJURE = ImmutableRemotingClientConfig.builder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self/you: you'll have to change the places where this is referred to explicitly namely big internal product and it's related projects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's right. Thanks

@jeremyk-91 jeremyk-91 merged commit 5fa9117 into develop Mar 10, 2020
@delete-merged-branch delete-merged-branch bot deleted the jkong/cjr-burn branch March 10, 2020 19:07
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants