-
Notifications
You must be signed in to change notification settings - Fork 15
feature: Expose API to let client define number of expected values in a Get. #6655
feature: Expose API to let client define number of expected values in a Get. #6655
Conversation
return getCache().get(tableRef, cells, uncached -> { | ||
int cachedCells = cells.size() - uncached.size(); | ||
int numberOfCellsExpectingValuePostCache = expectedNumberOfPresentCells - cachedCells; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we actually storing empty values in this cache (and on the one on CachingTransaction
)?
If so, this might not be the correct thing to do, as we'd be wrongly lowering the number of expected present cells, leading to throwing expectations down the line or potentially returning empty values when we shouldn't have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot! It's a bit messy:
- This cache uses lock watches, which does cache empty reads.
- The caching transaction (which I don't think is actually used in normal atlasdb usage) does also cache empty reads.
So yeah we might need a bit of refactoring here; as you note, the expression as written isn't quite right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored! Let me know how do you think things look like now.
...ests-shared/src/test/java/com/palantir/atlasdb/transaction/impl/SnapshotTransactionTest.java
Outdated
Show resolved
Hide resolved
...ests-shared/src/test/java/com/palantir/atlasdb/transaction/impl/SnapshotTransactionTest.java
Outdated
Show resolved
Hide resolved
...sdb-impl-shared/src/main/java/com/palantir/atlasdb/transaction/impl/SnapshotTransaction.java
Outdated
Show resolved
Hide resolved
...sdb-impl-shared/src/main/java/com/palantir/atlasdb/transaction/impl/SnapshotTransaction.java
Outdated
Show resolved
Hide resolved
return getCache().get(tableRef, cells, uncached -> { | ||
int cachedCells = cells.size() - uncached.size(); | ||
int numberOfCellsExpectingValuePostCache = expectedNumberOfPresentCells - cachedCells; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot! It's a bit messy:
- This cache uses lock watches, which does cache empty reads.
- The caching transaction (which I don't think is actually used in normal atlasdb usage) does also cache empty reads.
So yeah we might need a bit of refactoring here; as you note, the expression as written isn't quite right
(tableReference, toRead) -> Futures.immediateFuture(super.getWithExpectedNumberOfCells( | ||
tableReference, toRead, expectedNumberOfPresentCells))) | ||
.get(); | ||
} catch (InterruptedException | ExecutionException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be missing something: usually we need to reset the interrupted flag on catching InterruptedException
- is this different for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... not sure? I just assumed the pattern used on the standard get method was correct and repeated it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So Throwables
handles re-interruption if an InterruptException
is passed... though this is probably actually an old bug, as InterruptedException
isn't guaranteed to have a cause (and/or one of type InterruptedException
).
I think (though you should check) that in the normal get and here, this is wrong, and should be
} catch (InterruptedException e) {
throw Throwables.rewrapAndThrowUncheckedException(e); // This does the InterruptedException magic
} catch (ExecutionException e) {
throw Throwables.rewrapAndThrowUncheckedException(e.getCause());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need to do the above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InterruptedException::getCause should return null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, you're right! Fixed the CachingTransaction
but forgot to fix the instance here.
atlasdb-client/src/main/java/com/palantir/atlasdb/transaction/impl/CachingTransaction.java
Outdated
Show resolved
Hide resolved
*/ | ||
@Idempotent | ||
Map<Cell, byte[]> getWithExpectedNumberOfCells( | ||
TableReference tableRef, Set<Cell> cells, int expectedNumberOfPresentCells); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this one: the docs read well. But, I'm not sure we want to expose this to Atlas clients in general: we should see if there's a way to restrict this to the AtlasDB proxy only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any such features that we hide from general AtlasDb users today? If so, any examples on how it's done?
atlasdb-api/src/main/java/com/palantir/atlasdb/transaction/api/Transaction.java
Outdated
Show resolved
Hide resolved
...ests-shared/src/test/java/com/palantir/atlasdb/transaction/impl/SnapshotTransactionTest.java
Show resolved
Hide resolved
…ed value is empty
Map<Cell, byte[]> getWithCachedRef( | ||
TableReference tableReference, | ||
Set<Cell> cells, | ||
Function<CacheLookupResult, ListenableFuture<Map<Cell, byte[]>>> valueLoader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could have also just replaced the existing get
and getAsync
to receive the CacheLookupResult
instead of creating new two separate methods. Let me know what you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How bad would this change have been? Basically I think it's nicer to just have one get/getAsync, but I understand refactoring this might be very difficult/costly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not very, tbh. Although this PR is already getting quite big, so I'll try to do the refactor in a separate one.
atlasdb-api/src/main/java/com/palantir/atlasdb/keyvalue/api/cache/TransactionScopedCache.java
Show resolved
Hide resolved
...sdb-impl-shared/src/main/java/com/palantir/atlasdb/transaction/impl/SnapshotTransaction.java
Outdated
Show resolved
Hide resolved
long nonEmptyValuesInCache = cachedCells.values().stream() | ||
.filter(value -> value != PtBytes.EMPTY_BYTE_ARRAY) | ||
.count(); | ||
long numberOfCellsExpectingValuePostCache = | ||
expectedNumberOfPresentCells - nonEmptyValuesInCache; | ||
|
||
return Futures.immediateFuture(super.getWithExpectedNumberOfCells( | ||
tableReference, toRead, numberOfCellsExpectingValuePostCache)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to add some tests to this class verifying that we call super...
with less expected values if we have things cached, but couldn't think of a good way to do it... any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It is tangential, but if we pass in a mock delegate to the constructor we can verify stuff on it right? Admittedly this kind of tests ForwardingTransaction itself too but I think that's fine
# Conflicts: # .palantir/revapi.yml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty solid! Just a couple of smaller bits left
Map<Cell, byte[]> getWithCachedRef( | ||
TableReference tableReference, | ||
Set<Cell> cells, | ||
Function<CacheLookupResult, ListenableFuture<Map<Cell, byte[]>>> valueLoader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How bad would this change have been? Basically I think it's nicer to just have one get/getAsync, but I understand refactoring this might be very difficult/costly
atlasdb-api/src/main/java/com/palantir/atlasdb/transaction/api/Transaction.java
Show resolved
Hide resolved
atlasdb-api/src/main/java/com/palantir/atlasdb/transaction/api/Transaction.java
Show resolved
Hide resolved
atlasdb-client/src/main/java/com/palantir/atlasdb/transaction/impl/CachingTransaction.java
Show resolved
Hide resolved
(tableReference, toRead) -> Futures.immediateFuture(super.getWithExpectedNumberOfCells( | ||
tableReference, toRead, expectedNumberOfPresentCells))) | ||
.get(); | ||
} catch (InterruptedException | ExecutionException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So Throwables
handles re-interruption if an InterruptException
is passed... though this is probably actually an old bug, as InterruptedException
isn't guaranteed to have a cause (and/or one of type InterruptedException
).
I think (though you should check) that in the normal get and here, this is wrong, and should be
} catch (InterruptedException e) {
throw Throwables.rewrapAndThrowUncheckedException(e); // This does the InterruptedException magic
} catch (ExecutionException e) {
throw Throwables.rewrapAndThrowUncheckedException(e.getCause());
}
...ests-shared/src/test/java/com/palantir/atlasdb/transaction/impl/SnapshotTransactionTest.java
Outdated
Show resolved
Hide resolved
...ests-shared/src/test/java/com/palantir/atlasdb/transaction/impl/SnapshotTransactionTest.java
Outdated
Show resolved
Hide resolved
...sdb-impl-shared/src/main/java/com/palantir/atlasdb/transaction/impl/SnapshotTransaction.java
Outdated
Show resolved
Hide resolved
...ests-shared/src/test/java/com/palantir/atlasdb/transaction/impl/SnapshotTransactionTest.java
Show resolved
Hide resolved
long nonEmptyValuesInCache = cachedCells.values().stream() | ||
.filter(value -> value != PtBytes.EMPTY_BYTE_ARRAY) | ||
.count(); | ||
long numberOfCellsExpectingValuePostCache = | ||
expectedNumberOfPresentCells - nonEmptyValuesInCache; | ||
|
||
return Futures.immediateFuture(super.getWithExpectedNumberOfCells( | ||
tableReference, toRead, numberOfCellsExpectingValuePostCache)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It is tangential, but if we pass in a mock delegate to the constructor we can verify stuff on it right? Admittedly this kind of tests ForwardingTransaction itself too but I think that's fine
if (cacheLookup.missedCells().isEmpty()) { | ||
return Futures.immediateFuture(filterEmptyValues(cacheLookup.cacheHits())); | ||
} else { | ||
return Futures.transform( | ||
valueLoader.apply(cacheLookup.missedCells()), | ||
uncachedValues -> processUncachedCells( | ||
tableReference, cacheLookup.cacheHits(), cacheLookup.missedCells(), uncachedValues), | ||
MoreExecutors.directExecutor()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially controversial change here: before if we had no missed cells, we'd just return early. But now we delegate to the value loader, so it can decide if too many cells were cached and throw. Not extra cells should be loaded, though, since we pass an empty Set as the missed cells.
Modified the test for the existing behaviour here: https://github.com/palantir/atlasdb/pull/6655/files#diff-0205f833db5fc2b2032f684954c530b494e176e210ab17d9bdd0713869ef1981R152-R163
0155cb8
to
a25666a
Compare
2ad1b7a
to
2ab80f9
Compare
return arguments; | ||
} | ||
|
||
@Unsafe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the @Unsafe
marker here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the SafeLoggingPropagation check. Otherwise we fail to compile with the following message:
/Volumes/git/Projects/atlasdb/atlasdb-api/src/main/java/com/palantir/atlasdb/transaction/api/exceptions/MoreCellsPresentThanExpectedException.java:59: error: [SafeLoggingPropagation] Safe logging annotations should be propagated to encapsulating elements to allow static analysis tooling to work with as much information as possible. This check can be auto-fixed using `./gradlew classes testClasses -PerrorProneApply=SafeLoggingPropagation`
private static List<Arg<?>> argsFrom(Map<Cell, byte[]> retrievedCells, long expectedNumberOfCells) {
^
(see https://github.com/palantir/gradle-baseline#baseline-error-prone-checks)
Did you mean '@Unsafe private static List<Arg<?>> argsFrom(Map<Cell, byte[]> retrievedCells, long expectedNumberOfCells) {'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just a style nit on the new tests
...src/test/java/com/palantir/atlasdb/transaction/impl/expectations/CellCountValidatorTest.java
Outdated
Show resolved
Hide resolved
import com.palantir.logsafe.exceptions.SafeIllegalStateException; | ||
import java.util.Map; | ||
|
||
public class MoreCellsPresentThanExpectedException extends IllegalStateException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the entire message that is passed to super
will be unrendered IMO
* rename validator * changing API to return result type * rev api * isOk final * dry * spotless * Result refactor (#6692) * implement result on children * method reference
[skip ci]
[skip ci]
[skip ci]
[skip ci]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Thanks for updating the PR! Looks great
...a/com/palantir/atlasdb/transaction/api/exceptions/MoreCellsPresentThanExpectedException.java
Show resolved
Hide resolved
atlasdb-api/src/main/java/com/palantir/atlasdb/transaction/api/Transaction.java
Show resolved
Hide resolved
long expectedNumberOfPresentCellsToFetch, Map<Cell, CacheValue> cachedLookup) { | ||
Map<Cell, byte[]> cachedCellsWithNonEmptyValue = EntryStream.of(cachedLookup) | ||
.filterValues(value -> value.value().isPresent() | ||
&& !Arrays.equals(value.value().get(), PtBytes.EMPTY_BYTE_ARRAY)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this was the bit we missed the last time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really! That was correct from the beginning.
We were over subtracting from the number of expected cells here and fixed in this commit: f11d972
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay. I see, yep thanks
...ests-shared/src/test/java/com/palantir/atlasdb/transaction/impl/SnapshotTransactionTest.java
Show resolved
Hide resolved
Released 0.920.0 |
General
Before this PR:
Only users with schemas that expected all cells to be present could actually make use of the performance improvement provided by #6624.
The idea is to let clients specify how many values they expect at most, so they can benefit from skipping the immutable timestamp lock check if they know only one of two cells is every going to be present, for example.
After this PR:
==COMMIT_MSG==
Expose API letting clients specify how many values they expect at most from a Get. This let us skip the immutable timestamp lock even in the case we've done some empty reads if the client knows it's safe to do so.
==COMMIT_MSG==
Priority:
Concerns / possible downsides (what feedback would you like?): Probably have to pay a bit more attention on how this plays with the caches we have. Are we caching empty values? Also run the risk of returning empty results when we shouldn't if the client misuses the API.
Is documentation needed?:
Compatibility
Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?:
Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?:
The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.):
Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?:
Does this PR need a schema migration?
Testing and Correctness
What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?:
What was existing testing like? What have you done to improve it?:
If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.:
If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?:
Execution
How would I tell this PR works in production? (Metrics, logs, etc.):
Has the safety of all log arguments been decided correctly?:
Will this change significantly affect our spending on metrics or logs?:
How would I tell that this PR does not work in production? (monitors, etc.):
If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?:
If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC):
Scale
Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.:
Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?:
Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?:
Development Process
Where should we start reviewing?:
If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?:
Please tag any other people who should be aware of this PR:
@jeremyk-91
@sverma30
@raiju