Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

feature: Expose API to let client define number of expected values in a Get. #6655

Merged
merged 46 commits into from
Aug 30, 2023

Conversation

LucasIME
Copy link
Contributor

@LucasIME LucasIME commented Jul 4, 2023

General

Before this PR:
Only users with schemas that expected all cells to be present could actually make use of the performance improvement provided by #6624.

The idea is to let clients specify how many values they expect at most, so they can benefit from skipping the immutable timestamp lock check if they know only one of two cells is every going to be present, for example.

After this PR:

==COMMIT_MSG==
Expose API letting clients specify how many values they expect at most from a Get. This let us skip the immutable timestamp lock even in the case we've done some empty reads if the client knows it's safe to do so.
==COMMIT_MSG==

Priority:

Concerns / possible downsides (what feedback would you like?): Probably have to pay a bit more attention on how this plays with the caches we have. Are we caching empty values? Also run the risk of returning empty results when we shouldn't if the client misuses the API.

Is documentation needed?:

Compatibility

Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?:

Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?:

The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.):

Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?:

Does this PR need a schema migration?

Testing and Correctness

What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?:

What was existing testing like? What have you done to improve it?:

If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.:

If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?:

Execution

How would I tell this PR works in production? (Metrics, logs, etc.):

Has the safety of all log arguments been decided correctly?:

Will this change significantly affect our spending on metrics or logs?:

How would I tell that this PR does not work in production? (monitors, etc.):

If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?:

If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC):

Scale

Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.:

Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?:

Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?:

Development Process

Where should we start reviewing?:

If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?:

Please tag any other people who should be aware of this PR:
@jeremyk-91
@sverma30
@raiju

Comment on lines 924 to 926
return getCache().get(tableRef, cells, uncached -> {
int cachedCells = cells.size() - uncached.size();
int numberOfCellsExpectingValuePostCache = expectedNumberOfPresentCells - cachedCells;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we actually storing empty values in this cache (and on the one on CachingTransaction)?

If so, this might not be the correct thing to do, as we'd be wrongly lowering the number of expected present cells, leading to throwing expectations down the line or potentially returning empty values when we shouldn't have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot! It's a bit messy:

  • This cache uses lock watches, which does cache empty reads.
  • The caching transaction (which I don't think is actually used in normal atlasdb usage) does also cache empty reads.

So yeah we might need a bit of refactoring here; as you note, the expression as written isn't quite right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored! Let me know how do you think things look like now.

Comment on lines 924 to 926
return getCache().get(tableRef, cells, uncached -> {
int cachedCells = cells.size() - uncached.size();
int numberOfCellsExpectingValuePostCache = expectedNumberOfPresentCells - cachedCells;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot! It's a bit messy:

  • This cache uses lock watches, which does cache empty reads.
  • The caching transaction (which I don't think is actually used in normal atlasdb usage) does also cache empty reads.

So yeah we might need a bit of refactoring here; as you note, the expression as written isn't quite right

(tableReference, toRead) -> Futures.immediateFuture(super.getWithExpectedNumberOfCells(
tableReference, toRead, expectedNumberOfPresentCells)))
.get();
} catch (InterruptedException | ExecutionException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be missing something: usually we need to reset the interrupted flag on catching InterruptedException - is this different for some reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... not sure? I just assumed the pattern used on the standard get method was correct and repeated it here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So Throwables handles re-interruption if an InterruptException is passed... though this is probably actually an old bug, as InterruptedException isn't guaranteed to have a cause (and/or one of type InterruptedException).

I think (though you should check) that in the normal get and here, this is wrong, and should be

} catch (InterruptedException e) {
  throw Throwables.rewrapAndThrowUncheckedException(e); // This does the InterruptedException magic
} catch (ExecutionException e) {
  throw Throwables.rewrapAndThrowUncheckedException(e.getCause());
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need to do the above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InterruptedException::getCause should return null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, you're right! Fixed the CachingTransaction but forgot to fix the instance here.

*/
@Idempotent
Map<Cell, byte[]> getWithExpectedNumberOfCells(
TableReference tableRef, Set<Cell> cells, int expectedNumberOfPresentCells);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this one: the docs read well. But, I'm not sure we want to expose this to Atlas clients in general: we should see if there's a way to restrict this to the AtlasDB proxy only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any such features that we hide from general AtlasDb users today? If so, any examples on how it's done?

Comment on lines +64 to +67
Map<Cell, byte[]> getWithCachedRef(
TableReference tableReference,
Set<Cell> cells,
Function<CacheLookupResult, ListenableFuture<Map<Cell, byte[]>>> valueLoader);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have also just replaced the existing get and getAsync to receive the CacheLookupResult instead of creating new two separate methods. Let me know what you prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How bad would this change have been? Basically I think it's nicer to just have one get/getAsync, but I understand refactoring this might be very difficult/costly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very, tbh. Although this PR is already getting quite big, so I'll try to do the refactor in a separate one.

Comment on lines 140 to 147
long nonEmptyValuesInCache = cachedCells.values().stream()
.filter(value -> value != PtBytes.EMPTY_BYTE_ARRAY)
.count();
long numberOfCellsExpectingValuePostCache =
expectedNumberOfPresentCells - nonEmptyValuesInCache;

return Futures.immediateFuture(super.getWithExpectedNumberOfCells(
tableReference, toRead, numberOfCellsExpectingValuePostCache));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to add some tests to this class verifying that we call super... with less expected values if we have things cached, but couldn't think of a good way to do it... any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. It is tangential, but if we pass in a mock delegate to the constructor we can verify stuff on it right? Admittedly this kind of tests ForwardingTransaction itself too but I think that's fine

@LucasIME LucasIME marked this pull request as ready for review July 20, 2023 16:10
@LucasIME LucasIME requested a review from jeremyk-91 July 20, 2023 16:20
Copy link
Contributor

@jeremyk-91 jeremyk-91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty solid! Just a couple of smaller bits left

Comment on lines +64 to +67
Map<Cell, byte[]> getWithCachedRef(
TableReference tableReference,
Set<Cell> cells,
Function<CacheLookupResult, ListenableFuture<Map<Cell, byte[]>>> valueLoader);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How bad would this change have been? Basically I think it's nicer to just have one get/getAsync, but I understand refactoring this might be very difficult/costly

(tableReference, toRead) -> Futures.immediateFuture(super.getWithExpectedNumberOfCells(
tableReference, toRead, expectedNumberOfPresentCells)))
.get();
} catch (InterruptedException | ExecutionException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So Throwables handles re-interruption if an InterruptException is passed... though this is probably actually an old bug, as InterruptedException isn't guaranteed to have a cause (and/or one of type InterruptedException).

I think (though you should check) that in the normal get and here, this is wrong, and should be

} catch (InterruptedException e) {
  throw Throwables.rewrapAndThrowUncheckedException(e); // This does the InterruptedException magic
} catch (ExecutionException e) {
  throw Throwables.rewrapAndThrowUncheckedException(e.getCause());
}

Comment on lines 140 to 147
long nonEmptyValuesInCache = cachedCells.values().stream()
.filter(value -> value != PtBytes.EMPTY_BYTE_ARRAY)
.count();
long numberOfCellsExpectingValuePostCache =
expectedNumberOfPresentCells - nonEmptyValuesInCache;

return Futures.immediateFuture(super.getWithExpectedNumberOfCells(
tableReference, toRead, numberOfCellsExpectingValuePostCache));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. It is tangential, but if we pass in a mock delegate to the constructor we can verify stuff on it right? Admittedly this kind of tests ForwardingTransaction itself too but I think that's fine

Comment on lines -97 to -105
if (cacheLookup.missedCells().isEmpty()) {
return Futures.immediateFuture(filterEmptyValues(cacheLookup.cacheHits()));
} else {
return Futures.transform(
valueLoader.apply(cacheLookup.missedCells()),
uncachedValues -> processUncachedCells(
tableReference, cacheLookup.cacheHits(), cacheLookup.missedCells(), uncachedValues),
MoreExecutors.directExecutor());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially controversial change here: before if we had no missed cells, we'd just return early. But now we delegate to the value loader, so it can decide if too many cells were cached and throw. Not extra cells should be loaded, though, since we pass an empty Set as the missed cells.

Modified the test for the existing behaviour here: https://github.com/palantir/atlasdb/pull/6655/files#diff-0205f833db5fc2b2032f684954c530b494e176e210ab17d9bdd0713869ef1981R152-R163

@LucasIME LucasIME force-pushed the lmeireles/expose-get-with-expected-size branch from 0155cb8 to a25666a Compare July 25, 2023 13:56
@LucasIME LucasIME force-pushed the lmeireles/expose-get-with-expected-size branch from 2ad1b7a to 2ab80f9 Compare August 1, 2023 13:06
return arguments;
}

@Unsafe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the @Unsafe marker here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the SafeLoggingPropagation check. Otherwise we fail to compile with the following message:

/Volumes/git/Projects/atlasdb/atlasdb-api/src/main/java/com/palantir/atlasdb/transaction/api/exceptions/MoreCellsPresentThanExpectedException.java:59: error: [SafeLoggingPropagation] Safe logging annotations should be propagated to encapsulating elements to allow static analysis tooling to work with as much information as possible. This check can be auto-fixed using `./gradlew classes testClasses -PerrorProneApply=SafeLoggingPropagation`
    private static List<Arg<?>> argsFrom(Map<Cell, byte[]> retrievedCells, long expectedNumberOfCells) {
                                ^
    (see https://github.com/palantir/gradle-baseline#baseline-error-prone-checks)
  Did you mean '@Unsafe private static List<Arg<?>> argsFrom(Map<Cell, byte[]> retrievedCells, long expectedNumberOfCells) {'?

Copy link
Contributor

@Sam-Kramer Sam-Kramer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just a style nit on the new tests

import com.palantir.logsafe.exceptions.SafeIllegalStateException;
import java.util.Map;

public class MoreCellsPresentThanExpectedException extends IllegalStateException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the entire message that is passed to super will be unrendered IMO

Copy link
Contributor

@jeremyk-91 jeremyk-91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks for updating the PR! Looks great

long expectedNumberOfPresentCellsToFetch, Map<Cell, CacheValue> cachedLookup) {
Map<Cell, byte[]> cachedCellsWithNonEmptyValue = EntryStream.of(cachedLookup)
.filterValues(value -> value.value().isPresent()
&& !Arrays.equals(value.value().get(), PtBytes.EMPTY_BYTE_ARRAY))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this was the bit we missed the last time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really! That was correct from the beginning.

We were over subtracting from the number of expected cells here and fixed in this commit: f11d972

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay. I see, yep thanks

@bulldozer-bot bulldozer-bot bot merged commit 954092e into develop Aug 30, 2023
@bulldozer-bot bulldozer-bot bot deleted the lmeireles/expose-get-with-expected-size branch August 30, 2023 14:39
@svc-autorelease
Copy link
Collaborator

Released 0.920.0

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants