This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

ABR 18: More ETEs #5911

Merged: 32 commits merged into develop from gs/abr-more-etes on Feb 21, 2022

gsheasby
Contributor

Goals (and why):
Add further ETE tests, fixing a few problems along the way.
Notably:

  • The restore workflow needs to use TimelockManagementService while the namespace is disabled. To achieve this, I added TimelockNamespaces.getIgnoringDisabled, to bypass the check in client creation. This means that other code paths using TMS would also work while the namespace is disabled; but since this interface only contains ping and fastForwardTimestamp, this should be safe (for reasonable timestamp values).
  • In CoordinationServiceRecorder/CassandraRepairHelper, we use a KeyValueService instance. In the production codepath, these will be newly created, and thus we must use try-with-resources to avoid leaking resources. However, the ETE test codepath uses the KVS from the TransactionManager, which must not be closed. To fix this issue, I pulled out a little KvsRunner class.
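As a rough illustration of the ownership split described in the second bullet, here is a minimal sketch. It is not the code from this PR: `SketchKvs`, `SketchKvsRunner`, `ClosingKvsRunner`, `SharedKvsRunner`, `RecordingKvs` and `KvsRunnerDemo` are hypothetical names, the real `run` also takes a `Namespace`, and the real `KeyValueService` has a much larger interface.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for KeyValueService; only get() and close() matter here.
interface SketchKvs extends AutoCloseable {
    String get(String key);

    @Override
    void close(); // narrowed so that callers need not handle a checked exception
}

// Runs a task against a KVS and hides who owns (and therefore who closes) it.
interface SketchKvsRunner {
    <T> T run(Function<SketchKvs, T> task);
}

// Production-style path: the KVS is created for this call, so it is closed afterwards.
final class ClosingKvsRunner implements SketchKvsRunner {
    private final Supplier<SketchKvs> factory;

    ClosingKvsRunner(Supplier<SketchKvs> factory) {
        this.factory = factory;
    }

    @Override
    public <T> T run(Function<SketchKvs, T> task) {
        try (SketchKvs kvs = factory.get()) {
            return task.apply(kvs);
        }
    }
}

// Test-style path: the KVS is owned elsewhere (e.g. by a TransactionManager), so it is left open.
final class SharedKvsRunner implements SketchKvsRunner {
    private final SketchKvs shared;

    SharedKvsRunner(SketchKvs shared) {
        this.shared = shared;
    }

    @Override
    public <T> T run(Function<SketchKvs, T> task) {
        return task.apply(shared);
    }
}

// Minimal in-memory KVS that records whether close() has been called.
final class RecordingKvs implements SketchKvs {
    boolean closed = false;

    @Override
    public String get(String key) {
        return "value-for-" + key;
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class KvsRunnerDemo {
    public static void main(String[] args) {
        RecordingKvs owned = new RecordingKvs();
        new ClosingKvsRunner(() -> owned).run(kvs -> kvs.get("k"));

        RecordingKvs shared = new RecordingKvs();
        new SharedKvsRunner(shared).run(kvs -> kvs.get("k"));

        System.out.println("owned closed: " + owned.closed + ", shared closed: " + shared.closed);
    }
}
```

The point of the split is that callers such as a recorder or repair helper only ever see the runner interface, so they cannot accidentally close a KVS they do not own.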

==COMMIT_MSG==
Fixed an issue where the AtlasRestoreClient would be unable to perform necessary operations on timelock (fast forwarding the timestamp) during the restore process.
==COMMIT_MSG==

Implementation Description (bullets):

  • Added ETE tests
  • Fixed the exposed issues

Testing (What was existing testing like? What have you done to improve it?):

  • New ETE tests

Concerns (what feedback would you like?): See the notable things above

Where should we start reviewing?: For maximum 🌶️ , KvsRunner + friends, followed by TimelockNamespaces.

Priority (whenever / two weeks / yesterday): this week

@changelog-app

changelog-app bot commented Feb 17, 2022

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Fixed an issue where the AtlasRestoreClient would be unable to perform necessary operations on timelock (fast forwarding the timestamp) during the restore process.

Check the box to generate changelog(s)

  • Generate changelog entry

Contributor

@gmaretic gmaretic left a comment

Mostly looks good, but I have a couple of concerns.

import com.palantir.atlasdb.transaction.api.TransactionManager;
import java.util.function.Function;

final class TransactionManagerKvsRunner implements KvsRunner {
Contributor

maybe rename

Contributor Author

TransactionManagerScopedKvsRunner?

Contributor

I meant more agnostic and descriptive of behaviour/intent: SharedKvsRunner or even NonClosingKvsRunner?


@Override
public <T> T run(Namespace namespace, Function<KeyValueService, T> function) {
try (KeyValueService kvs = keyValueServiceFactory.apply(namespace)) {
Contributor

Seems a bit excessive to create and close on every call, unless we really only do it once. Ideally, we should just make the service closeable and close the KVS on close if necessary.

Contributor Author

For a given namespace, the KVS is indeed created and fetched exactly once per backup or restore.

Contributor

then gtg

@@ -79,7 +80,7 @@ public TimeLockServices get(String namespace) {
     }

     public TimeLockServices getIgnoringDisabled(String namespace) {
-        return services.computeIfAbsent(namespace, ns -> createNewClient(ns, true));
+        return Optional.ofNullable(services.get(namespace)).orElseGet(() -> createNewClient(namespace, true));
Contributor

This solves the correctness issue, but it potentially creates a new client on every call and never closes any of them. One way to fix it is to have a separate ConcurrentMap for the ignoring services rather than reusing the existing normal services.
Alternatively, and maybe better, we can keep the previous behaviour, where an explicit disable still kills the existing service (because it does all the lock cleanup etc. for us), and let the ignoring version create a new one. We would then make sure that get does a precondition check, after calling services.computeIfAbsent, to verify that the namespace is not disabled.
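To make the "creates a client on every call" concern concrete, here is a small sketch that counts client creations for the two lookup styles in the diff above. `ClientCacheSketch`, `getWithoutCaching`, `getWithCaching` and the `created` counter are hypothetical, and `Object` stands in for `TimeLockServices`; this only illustrates the map semantics, not the real `TimelockNamespaces`.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ClientCacheSketch {
    private final ConcurrentMap<String, Object> services = new ConcurrentHashMap<>();
    final AtomicInteger created = new AtomicInteger();

    // Hypothetical stand-in for createNewClient: counts how many clients get built.
    private Object createNewClient(String namespace) {
        created.incrementAndGet();
        return new Object();
    }

    // Looks in the map, but never stores the newly created client,
    // so every call for a missing namespace builds a fresh one.
    Object getWithoutCaching(String namespace) {
        return Optional.ofNullable(services.get(namespace))
                .orElseGet(() -> createNewClient(namespace));
    }

    // Stores the client on first use, so later calls reuse it.
    Object getWithCaching(String namespace) {
        return services.computeIfAbsent(namespace, this::createNewClient);
    }

    public static void main(String[] args) {
        ClientCacheSketch uncached = new ClientCacheSketch();
        ClientCacheSketch cached = new ClientCacheSketch();
        for (int i = 0; i < 3; i++) {
            uncached.getWithoutCaching("ns");
            cached.getWithCaching("ns");
        }
        System.out.println("uncached created: " + uncached.created.get());
        System.out.println("cached created: " + cached.created.get());
    }
}
```

Because the uncached clients are never placed in the map, nothing holds a reference that a later disable or shutdown could use to close them.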

Contributor Author

Ah interesting - yeah, that would work. So we'd create a new service the first time we call getIgnoringDisabled, then future get calls will throw until the service is re-enabled.

@@ -76,11 +75,16 @@ public TimelockNamespaces(
}

public TimeLockServices get(String namespace) {
Preconditions.checkArgument(
Contributor

It must go after the computeIfAbsent call, to avoid a race condition where this check passes, we then disable the namespace and call getIgnoringDisabled, and only then does this method call computeIfAbsent.
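A minimal sketch of the ordering being asked for, under heavy simplification: `NamespacesSketch`, its `disabled` set, and the `disable`/`reEnable` methods are hypothetical, clients are plain strings, and `services.remove` merely stands in for killing the existing service. It shows the shape of the check, not a full concurrency argument.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class NamespacesSketch {
    private final ConcurrentMap<String, String> services = new ConcurrentHashMap<>();
    private final Set<String> disabled = ConcurrentHashMap.newKeySet();

    // Normal path: fetch or create first, then verify the namespace is not disabled.
    // Checking afterwards means a client that getIgnoringDisabled placed in the map
    // while the namespace was disabled is never handed to a normal caller.
    String get(String namespace) {
        String client = services.computeIfAbsent(namespace, ns -> "client-for-" + ns);
        if (disabled.contains(namespace)) {
            throw new IllegalStateException("Namespace is disabled: " + namespace);
        }
        return client;
    }

    // Restore path: deliberately skips the disabled check.
    String getIgnoringDisabled(String namespace) {
        return services.computeIfAbsent(namespace, ns -> "client-for-" + ns);
    }

    // Marks the namespace disabled and drops the cached client
    // (a stand-in for closing the existing service).
    void disable(String namespace) {
        disabled.add(namespace);
        services.remove(namespace);
    }

    void reEnable(String namespace) {
        disabled.remove(namespace);
    }
}
```

With the check placed before `computeIfAbsent` instead, a thread could pass the check, lose the CPU while another thread disables the namespace and calls `getIgnoringDisabled`, and then return that restore-only client from `get`.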

@bulldozer-bot bulldozer-bot bot merged commit 06dbd5b into develop Feb 21, 2022
@bulldozer-bot bulldozer-bot bot deleted the gs/abr-more-etes branch February 21, 2022 16:28
@svc-autorelease
Collaborator

Released 0.548.0

sudiksha27 added a commit that referenced this pull request Feb 24, 2022
bulldozer-bot bot pushed a commit that referenced this pull request Feb 24, 2022
gsheasby added a commit that referenced this pull request Feb 24, 2022
@gsheasby gsheasby mentioned this pull request Feb 28, 2022