Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Simplify getCandidateCellsForSweeping() #2439

Merged
merged 2 commits into from
Oct 10, 2017

Conversation

gbonik
Copy link
Contributor

@gbonik gbonik commented Oct 5, 2017

Initially I hoped to take advantage of in-database filtering if we get
to implement the "transaction table sweeping" feature. However, seems
like that wasn't a great decision on my part - unclear when (and if) we
get that feature done, and also how much of improvement we would
actually get. The extra logic makes the code significantly more complex,
so I think we need to back off and give up the idea.

This doesn't invalidate the sweep rewrite project. The new impls still
bring significant performance improvements.


This change is Reviewable

Initially I hoped to take advantage of in-database filtering if we get
to implement the "transaction table sweeping" feature. However, seems
like that wasn't a great decision on my part - unclear when (and if) we
get that feature done, and also how much of improvement we would
actually get. The extra logic makes the code significantly more complex,
so I think we need to back off and give up the idea.

This doesn't invalidate the sweep rewrite project. The new impls still
bring significant performance improvements.
@gsheasby gsheasby self-assigned this Oct 9, 2017
@gsheasby gsheasby requested review from gsheasby and hsaraogi October 9, 2017 14:14
Copy link
Contributor

@gsheasby gsheasby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me (just a few nits) - @hsaraogi will also take a look today.

candidateBatch.add(ImmutableCandidateCellForSweeping.builder()
.cell(cell)
.sortedTimestamps(candidate ? timestampArr : EMPTY_LONG_ARRAY)
.sortedTimestamps(timestampArr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to check if we should still be returning EMPTY_LONG_ARRAY if there is just one timestamp - I see that case excluded at SweepableCellFilter.getCellToSweep, so looks ok to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if there is one timestamp, you still might need to sweep it (if that timestamp is uncommitted)

assertThat(getAllCandidates(conservativeRequest(PtBytes.EMPTY_BYTE_ARRAY, 40L, 30L))).isEmpty();
public void singleCellSpanningSeveralPages() {
new TestDataBuilder()
.put(10, 1, 1000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 1000L (and similar below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -125,7 +120,7 @@ public void returnCellsInOrder() {
.putEmpty(3, 2, 10L)
.putEmpty(3, 3, 10L)
.store();
assertThat(getAllCandidates(conservativeRequest(PtBytes.EMPTY_BYTE_ARRAY, 30L, 5L))
assertThat(getAllCandidates(conservativeRequest(PtBytes.EMPTY_BYTE_ARRAY, 30L, 100))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 100L (repeated on line 138 below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that argument is actually an int (the batch size)

+ " FROM ("
+ " SELECT row_name, col_name, ts"
+ " FROM " + prefixedTableName
+ " WHERE ts < ? " + bounds.predicates + getIgnoredTimestampPredicate()
Copy link
Contributor

@hsaraogi hsaraogi Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should thorough have the getIgnoredTimestampPredicate? This brings us to the question if shouldCheckIfLatestValueIsEmpty and timestampsToIgnore should be two separate values or grouped as SweepStrategyParams (maybe an enum with types CONSERVATIVE/ThOROUGH).

timestampsToIgnore will be -1 for Conservative and empty for THOROUGH and we can add binds in the conservative query as well..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss what the API should be (and in fact we could have discussed this a few months back when I introduced it), but in this PR I'm trying to keep changes minimal.

This is a question of the level of abstraction. In the current design, KVS doesn't really know anything about sweep strategies; we just tell it whether to ignore timestamps and/or load empty values. In this design, the two concepts are orthogonal, and that's why you see getIgnoredTimestampPredicate() in both "if" branches. We could change the design and make KVS aware of sweep strategies, but that's beyond the scope of this pull request.

Copy link
Contributor

@hsaraogi hsaraogi Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to point out that shouldCheckIfLatestValueIsEmpty and timestampsToIgnore are not independent concepts but are correlated based on the sweep strategy, seems like a bad design to allow many more cases here when there are exactly two possibilities.

Open to the idea of doing this in a separate PR, but I think allowing exactly two sets of values will simplify the queries here.

Copy link
Contributor

@gsheasby gsheasby Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think that the KVS should not know about sweep - its one job is to make the appropriate calls to the DB (via the clientPool, in the C* case) in a correct and efficient way. It shouldn't have to know why the user wants to get these timestamps - +1 to keeping as-is.

However, the boundaries are blurred somewhat as we're specifically satisfying a getCandidateCellsForSweeping request, implying that it's reasonable to know something about what being a CandidateCellForSweeping might entail.

But also, the branching here is no more than it was before this PR, and I don't think it's an egregious combinatorial explosion.

On balance, I'd like this to merge as is - any refactor to try and cut down on the branching should be in a separate PR, and should also be a low priority, given the limited potential for simplification and the urgency of getting this + Oracle sweep merged in to develop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding removing branching - I want to benchmark a different query later that would eliminate this branching (for thorough, check values for all rows rather than only load the latest values). But of course I need to get the Oracle part done first - we can get back to this later.

}

@Test
public void returnFirstAndLastCellOfThePage() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test still seems valuable, why did we get rid of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tested implementation-specific behavior that doesn't exist anymore

@codecov-io
Copy link

Codecov Report

Merging #2439 into feature/sweep-rewrite will decrease coverage by 0.03%.
The diff coverage is 97.29%.

Impacted file tree graph

@@                     Coverage Diff                     @@
##             feature/sweep-rewrite    #2439      +/-   ##
===========================================================
- Coverage                    60.36%   60.32%   -0.04%     
- Complexity                    4470     4690     +220     
===========================================================
  Files                          843      843              
  Lines                        39797    39782      -15     
  Branches                      4096     4083      -13     
===========================================================
- Hits                         24023    24000      -23     
- Misses                       14284    14290       +6     
- Partials                      1490     1492       +2
Impacted Files Coverage Δ Complexity Δ
...va/com/palantir/atlasdb/sweep/SweepTaskRunner.java 92.13% <ø> (-0.09%) 19 <0> (ø)
...om/palantir/atlasdb/keyvalue/dbkvs/impl/DbKvs.java 84.45% <ø> (-0.03%) 137 <0> (ø)
...eep/CassandraGetCandidateCellsForSweepingImpl.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...impl/AbstractGetCandidateCellsForSweepingTest.java 97.69% <100%> (-0.7%) 21 <4> (-2)
...eyvalue/impl/GetCandidateCellsForSweepingShim.java 87.14% <100%> (+1.61%) 14 <0> (-9) ⬇️
...postgres/PostgresGetCandidateCellsForSweeping.java 97.93% <100%> (+0.82%) 2 <1> (-2) ⬇️
...java/com/palantir/lock/impl/LockClientIndices.java 85% <0%> (-10%) 5% <0%> (ø)
...ain/java/com/palantir/leader/LeadershipEvents.java 83.33% <0%> (-8.34%) 0% <0%> (ø)
...rc/main/java/com/palantir/lock/LockCollection.java 66.66% <0%> (-4.77%) 11% <0%> (-1%)
...yvalue/dbkvs/impl/ConnectionManagerAwareDbKvs.java 88.88% <0%> (-3.71%) 5% <0%> (-1%)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3f9b464...77b8c16. Read the comment docs.

@gbonik gbonik merged commit 801904a into feature/sweep-rewrite Oct 10, 2017
dxiao pushed a commit that referenced this pull request Oct 14, 2017
* The big sweep rewrite, part 3: Postgres impl (#2021)

* The big sweep rewrite, part 3: Postgres impl

- Implement KeyValueService.getCandidateCellsForSweeping() for Postgres

- Add AgnosticResultRow.getBoolean() and LightAgnosticResultRow.getArray()

* Address CR comments

Also change the stopping condition in computeNext()

* Add a release note

* Add the ORDER BY that I forgot

* Rewrite a fragile test

* Join pages based on the number of (cell, ts) pairs

* Move release note to develop

* The big sweep rewrite, part 4: Cassandra impl (#2231)

* The big sweep rewrite, part 4: Cassandra impl

Iterate over cells in a table without ever loading more than a given
number at once regardless of how wide the rows are, but still perform
efficiently on narrow tables.

This should finally allow us to run sweep safely on
Cassandra with decent performance.

* Address clockfort's comments + checkstyle + add release notes

* Address code review comments

* Split up test cases in SingleRowColumnPagerTest

* Tests for KeyRanges

* Update locks

* Return Optional instead of passing in @output list

* Update dependency locks

* Fix TestDataBuilder after merge

* Make tests green

- Fix a small bug in CassandraGetCandidateCellsForSweepingImpl

- Remove an irrelevant test (should_not_oom_when_...) and make
  CassKVSeSweepTaskRunnerIntegrationTest non-parameterized.
  (getRangeOfTimestamps() is not used in the sweep code path anymore)

- Relax assertions in testSweepBatchesUpToDeleteBatchSize

* Bring back the old variable name to make checkstyle happy

* Run ./gradlew generateLock saveLock

* Reapply changes

* Fix compile errors

* Simplify getCandidateCellsForSweeping() (#2439)

* Simplify getCandidateCellsForSweeping()

Initially I hoped to take advantage of in-database filtering if we get
to implement the "transaction table sweeping" feature. However, seems
like that wasn't a great decision on my part - unclear when (and if) we
get that feature done, and also how much of improvement we would
actually get. The extra logic makes the code significantly more complex,
so I think we need to back off and give up the idea.

This doesn't invalidate the sweep rewrite project. The new impls still
bring significant performance improvements.

* The big sweep rewrite, part 5: Oracle impl

- Extract the paging logic from the Postgres impl into a separate class
  named CandidatePagingState so that we could reuse it in the Oracle impl
  (and also test the logic with unit tests)

- Add a new benchmark fullTableScanOneWideRowThorough that simulates
  sweep of a very wide row. This is important for Oracle - a naive
  implementation would perform extremely poorly in that scenario.

* A bunch of refactors

- Make a builder for FullQuery

- Rename RangeBoundPredicates -> RangePredicateHelper and make it
  append to a FullQuery.Builder instead of returning a string + args.

- Make DbKvsGetCandidateCellsForSweeping a concrete class with common
  logic. Now, database-specific logic is behind the CellTsPairLoader
  interface.

- Add full test coverage for CandidatePagingState

* Tautological tests for RangePredicateHelper

* Import the "correct" assertThat

* Static classes

* Refactors

* CandidatePagingState: extract state-updating methods

Also, reorganised class

* Expand the comment in OracleCellTsPageLoaderFactory

Also filter out empty pages

* Fix release notes

* Fix release notes for real
@felixdesouza felixdesouza deleted the feature/simplify-get-candidates branch July 2, 2019 22:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants