-
Notifications
You must be signed in to change notification settings - Fork 15
Conversation
Generate changelog in
|
for consistency - do we still need to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a high-level comment, before I do a more detailed review -- we should define the backup service as a conjure API. This does require a new version of timelock to be picked up, but it decouples the server-side logic from the client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted above, I would structure this differently with the idea that we make calls as efficient as possible and move the logic into timelock itself while exposing a minimal interface needed for backups and not relying on the implementation.
Happy to go into more detail offline if that helps
} | ||
|
||
// lock immutable timestamp | ||
public PrepareBackupResponse prepareBackup(AuthHeader authHeader, String namespace) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also make this endpoint accept a set of namespaces for efficiency
} | ||
|
||
// check immutable ts lock | ||
public boolean checkBackupIsValid(AuthHeader authHeader, NamespacedLockToken namespacedLockToken) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would even go as far as not having the NamespacedLockToken
directly, but some BackupToken
that is generated by timelock. Its validity might be tied to the validity of the immutable ts lock for now, but it could change in the future and the client should not have to care about that
import java.util.Set; | ||
import org.jetbrains.annotations.NotNull; | ||
|
||
public class AtlasBackupService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a thick client, most of this logic should live in timelock server
@@ -146,6 +174,11 @@ services: | |||
namespace: string | |||
request: ConjureLockRequest | |||
returns: ConjureLockResponse | |||
lockImmutableTimestamp: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should not expose this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To elaborate: this is not generally something we want to allow users to do -- it is an implementation detail of how transactions work, and exposing it both makes further changes more difficult and could cause reckless users to adversely affect things they don't know about. For example, if someone calls this and holds it for an indefinite amount of time, that blocks sweep from progressing.
atlasdb-backups/src/main/java/com/palantir/atlasdb/backup/AtlasBackupTask.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of comments, but almost there!
atlasdb-backups/src/main/java/com/palantir/atlasdb/backup/AtlasBackupTask.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
public static AtlasBackupTask create(Refreshable<ServicesConfigBlock> servicesConfigBlock, String serviceName) { | ||
ReloadingFactory reloadingFactory = DialogueClients.create(servicesConfigBlock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can go with an AtlasBackupService
that has a fixed client and the same API instead of recreating the client for each task
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the naming is misleading you here - renaming to AtlasBackupService
makes sense. The serviceName
is whatever the timelock service name is configured to be internally (not connected to the namespace), and the class will be created once (on internal service startup), and reused for multiple backup operations.
atlasdb-backups/src/main/java/com/palantir/atlasdb/backup/AtlasBackupTask.java
Outdated
Show resolved
Hide resolved
atlasdb-backups/src/main/java/com/palantir/atlasdb/backup/AtlasBackupTask.java
Outdated
Show resolved
Hide resolved
.collect(Collectors.toSet()); | ||
CompleteBackupRequest request = CompleteBackupRequest.of(tokens); | ||
CompleteBackupResponse response = atlasBackupServiceBlocking.completeBackup(authHeader, request); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: actually persist the tokens using a persister that is passed in to this class, then we have an atlas-side implementation of the persister that conforms with the current backup story in internal tooling
timelock-impl/src/main/java/com/palantir/atlasdb/backup/AtlasBackupResource.java
Outdated
Show resolved
Hide resolved
timelock-impl/src/main/java/com/palantir/atlasdb/backup/AtlasBackupResource.java
Outdated
Show resolved
Hide resolved
timelock-impl/src/main/java/com/palantir/atlasdb/backup/AtlasBackupResource.java
Outdated
Show resolved
Hide resolved
timelock-impl/src/test/java/com/palantir/atlasdb/backup/AtlasBackupResourceTest.java
Outdated
Show resolved
Hide resolved
timelock-impl/src/test/java/com/palantir/atlasdb/backup/AtlasBackupResourceTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost there (I guess I said that in the previous review, but now it's mainly nits)
atlasdb-backup/src/main/java/com/palantir/atlasdb/backup/AtlasBackupService.java
Show resolved
Hide resolved
atlasdb-backup/src/main/java/com/palantir/atlasdb/backup/AtlasBackupService.java
Show resolved
Hide resolved
storedTokens.put(backupToken.getNamespace(), backupToken); | ||
} | ||
|
||
// TODO(gs): actually persist the token using a persister passed into this class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this should be a follow-up PR
// Then we have an atlas-side implementation of the persister that conforms with the current backup story | ||
public Set<Namespace> completeBackup(Set<Namespace> namespaces) { | ||
Set<InProgressBackupToken> tokens = namespaces.stream() | ||
.map(storedTokens::get) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.map(storedTokens::get) | |
.map(storedTokens::remove) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the unlock fails, do we still want to remove the element from the map?
I guess that depends on why it failed?
- If the lock is lost, remove the element
- If we just failed to talk to timelock, keep the element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately the current API doesn't give us the information to decide...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we will retry anyway
atlasdb-backup/src/test/java/com/palantir/atlasdb/backup/AtlasBackupServiceTest.java
Show resolved
Hide resolved
@@ -117,6 +144,22 @@ types: | |||
leaderTimes: map<Namespace, LeaderTime> | |||
|
|||
services: | |||
AtlasBackupClient: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeremyk-91 do you think this is a reasonable name even though it defies the usual naming? AtlasBackupService should really be the class we expose to clients, so I think this makes sense
timelock-impl/src/test/java/com/palantir/atlasdb/util/TimelockTestUtils.java
Show resolved
Hide resolved
timelock-impl/src/main/java/com/palantir/atlasdb/backup/AtlasBackupResource.java
Outdated
Show resolved
Hide resolved
timelock-impl/src/main/java/com/palantir/atlasdb/backup/AtlasBackupResource.java
Show resolved
Hide resolved
Optional<InProgressBackupToken> prepareBackup(Namespace namespace) { | ||
try { | ||
return Optional.of(tryPrepareBackup(namespace)); | ||
} catch (Exception ex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the above ever throw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like not - it must have done in an earlier version of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having said that, don't you get some sort of Exception if we fail to reach timelock? Don't we want to handle that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is server-side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good!
private InProgressBackupToken prepareBackup(Namespace namespace) { | ||
AsyncTimelockService timelock = timelock(namespace); | ||
LockImmutableTimestampResponse response = timelock.lockImmutableTimestamp(IdentifiedTimeLockRequest.create()); | ||
long timestamp = timelock.getFreshTimestamp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realised that technically we can reuse the immutable timestamp here, but let's just keep this as it makes the change smaller
Current status (1/11):
The implementation and wiring looks complete and ready for review to me, needs more tests.
Goals (and why): Atlas should provide backup and restore services, so that our internal backup + restore solution is not overly reliant on AtlasDB internals.
Implementation Description (bullets):
Testing (What was existing testing like? What have you done to improve it?):
Concerns (what feedback would you like?):
Where should we start reviewing?:
Priority (whenever / two weeks / yesterday):