DbKvs sweep refactors to address last PR comments #2497
As dxiao pointed out, CandidatePagingState does too many things. The solution is to make CellTsPairLoader return an iterator and move all paging logic inside that iterator, and remove CellTsPairLoaderFactory. Then we replace CandidatePagingState with CandidateGroupingIterator, which now doesn't have any paging logic and only takes care of grouping (cell, ts) pairs by cell.

Also remove CandidatePageJoiningIterator because it's not used anymore. We could have done this earlier but I forgot.

Plan for the future:
- Since we gave up on the in-database filtering idea, we can replace KVS.getCandidateCellsForSweeping() with a simpler call like getCellTsPairs() which would do what CellTsPairLoader does now.
- Implement the new call for the remaining KVS's. We need at least the InMemoryKVS. We should decide the destiny of JdbcKVS and CqlKVS.
- Remove all remaining usages of getRangeOfTimestamps(). I think that's basically deleteRange() on Cassandra.
- Remove KVS.getRangeOfTimestamps()!

[no release notes]
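For readers skimming the thread, the "grouping only, no paging" split described above can be sketched roughly as follows. This is a simplified illustration, not the actual CandidateGroupingIterator: `CellTs`, `CellGroup` and `GroupingSketch` are hypothetical stand-ins, cells are reduced to strings, and the input pages are assumed to arrive sorted by cell.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: groups consecutive (cell, ts) pairs by cell.
// It has no paging logic; it only consumes the pages the source hands over.
final class GroupingSketch implements Iterator<CellGroup> {
    private final Iterator<List<CellTs>> pages;
    private Iterator<CellTs> currentPage = new ArrayList<CellTs>().iterator();
    private CellTs pending; // first pair of the next group, if already read

    GroupingSketch(Iterator<List<CellTs>> pages) {
        this.pages = pages;
    }

    // Returns the next pair across page boundaries, or null when exhausted.
    private CellTs readPair() {
        while (!currentPage.hasNext() && pages.hasNext()) {
            currentPage = pages.next().iterator();
        }
        return currentPage.hasNext() ? currentPage.next() : null;
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            pending = readPair();
        }
        return pending != null;
    }

    @Override
    public CellGroup next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        CellGroup group = new CellGroup(pending.cell);
        // Keep absorbing pairs until the cell changes or the input ends.
        while (pending != null && pending.cell.equals(group.cell)) {
            group.timestamps.add(pending.ts);
            pending = readPair();
        }
        return group;
    }

    public static void main(String[] args) {
        List<List<CellTs>> pages = Arrays.asList(
                Arrays.asList(new CellTs("a", 1L), new CellTs("a", 2L)),
                Arrays.asList(new CellTs("a", 3L), new CellTs("b", 5L)));
        Iterator<CellGroup> groups = new GroupingSketch(pages.iterator());
        while (groups.hasNext()) {
            CellGroup group = groups.next();
            System.out.println(group.cell + " -> " + group.timestamps);
        }
    }
}

// Hypothetical stand-in for a (cell, ts) pair; the real code uses CellTsPairInfo.
final class CellTs {
    final String cell;
    final long ts;

    CellTs(String cell, long ts) {
        this.cell = cell;
        this.ts = ts;
    }
}

// Hypothetical stand-in for one grouped candidate: a cell plus its timestamps.
final class CellGroup {
    final String cell;
    final List<Long> timestamps = new ArrayList<>();

    CellGroup(String cell) {
        this.cell = cell;
    }
}
```

Note that a cell whose pairs straddle a page boundary still ends up in a single group, which is the reason grouping can live apart from paging.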
Codecov Report
@@ Coverage Diff @@
## develop #2497 +/- ##
=============================================
- Coverage 60.1% 60.02% -0.09%
+ Complexity 4411 4043 -368
=============================================
Files 857 857
Lines 40033 40023 -10
Branches 4079 4076 -3
=============================================
- Hits 24062 24023 -39
- Misses 14503 14532 +29
Partials 1468 1468
To speed things along towards the LTS, I'm going to do the renames and potentially some further refactors myself.
Some questions might remain:
- Do we care that we go through an Iterator twice instead of once in order to make the code cleaner?
- Does the Iterator returned by DbKvs.getCandidateCellsForSweeping do an eager page load of the kind we were trying to avoid in OracleCellTsPageLoader.PageIterator?
@@ -204,7 +204,7 @@ private static DbKvs createOracle(ExecutorService executor,
         OverflowValueLoader overflowValueLoader = new OracleOverflowValueLoader(oracleDdlConfig, tableNameGetter);
         DbKvsGetRange getRange = new OracleGetRange(
                 connections, overflowValueLoader, tableNameGetter, valueStyleCache, oracleDdlConfig);
-        CellTsPairLoaderFactory cellTsPairLoaderFactory = new OracleCellTsPageLoaderFactory(
+        CellTsPairLoader cellTsPairLoaderFactory = new OracleCellTsPageLoader(
nit: rename
                 query.getArgs());
     }

     private void updateCountOfExaminedCellTsPairsInCurrentRow(List<CellTsPairInfo> results) {
We're iterating through the results twice - once to bundle each one into a CellTsPairInfo, and again to figure out the count of cellTsPairsAlreadyExaminedInCurrentRow.
Is it worth cutting this iteration time in half by updating cellTsPairsAlreadyExaminedInCurrentRow within loadPage()?
             if (nextRow == null) {
                 reachedEnd = true;
             } else {
                 startRowInclusive = nextRow;
These fields go together very tightly. Consider extracting a Token class to cover startRowInclusive, startColInclusive, startTsInclusive, reachedEnd, and cellTsPairsAlreadyExaminedInCurrentRow.
Maybe not cells..currentRow, as only Oracle cares about this, and the Token could be used across Postgres and Oracle (and maybe Cassandra?)
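A token along these lines might look roughly like the sketch below. It is only an illustration of the suggestion, not the CellTsPairToken class that ended up in the PR: the class name `PagingTokenSketch`, the factory method names, and the choice of an empty column and timestamp 0 as the start of a row are all assumptions. Following the comment above, the Oracle-specific counter is left out.

```java
// Hypothetical immutable paging token; field names follow the review comment.
final class PagingTokenSketch {
    private final byte[] startRowInclusive;
    private final byte[] startColInclusive;
    private final long startTsInclusive;
    private final boolean reachedEnd;

    private PagingTokenSketch(byte[] row, byte[] col, long ts, boolean reachedEnd) {
        // Defensive copies keep the token immutable even if the caller reuses its arrays.
        this.startRowInclusive = row.clone();
        this.startColInclusive = col.clone();
        this.startTsInclusive = ts;
        this.reachedEnd = reachedEnd;
    }

    // Start scanning at the beginning of the given row (assumed: empty column, ts 0).
    static PagingTokenSketch startOfRow(byte[] row) {
        return new PagingTokenSketch(row, new byte[0], 0L, false);
    }

    // Resume scanning in the middle of a row.
    static PagingTokenSketch withinRow(byte[] row, byte[] col, long ts) {
        return new PagingTokenSketch(row, col, ts, false);
    }

    // Marks the end of the table: there is nothing left to load.
    static PagingTokenSketch end() {
        return new PagingTokenSketch(new byte[0], new byte[0], 0L, true);
    }

    byte[] startRowInclusive() {
        return startRowInclusive.clone();
    }

    byte[] startColInclusive() {
        return startColInclusive.clone();
    }

    long startTsInclusive() {
        return startTsInclusive;
    }

    boolean reachedEnd() {
        return reachedEnd;
    }
}
```

With the position bundled into one value, a page loader only needs a single mutable field (the current token) instead of four or five that must be updated together.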
 public class DbKvsGetCandidateCellsForSweeping {

-    private final CellTsPairLoaderFactory cellTsPairLoaderFactory;
+    private final CellTsPairLoader cellTsPairLoaderFactory;
Found another non-rename. Intellij lets you do the rename automagically.
-        return Iterators.filter(new PageIterator(loader, state), page -> !page.isEmpty());
+        Iterator<List<CellTsPairInfo>> cellTsIter = cellTsPairLoaderFactory.createPageIterator(tableRef, request);
+        Iterator<List<CandidateCellForSweeping>> rawIter = CandidateGroupingIterator.create(cellTsIter);
+        return Iterators.filter(rawIter, page -> !page.isEmpty());
Does this Iterator do an eager page load of the kind we were trying to avoid in OracleCellTsPageLoader.PageIterator?
Thanks for following through quickly with the refactor!
having-next-ness is a state, not an argument
Well... two, actually. This one is a CellTsPairToken. We already had a Token class in DbKvs :-/
I don't think we can do it nicely
I made the changes I want to make - deferring review of this to @hsaraogi.
There's probably more we can do here (the two PageIterator classes are rather similar), but I don't think it's worth it at this stage.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/sweep/CellTsPairToken.java, line 23 at r2 (raw file):
Value.Immutable? Comments from Reviewable |
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/sweep/CellTsPairToken.java, line 23 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
We'll have a builder that will make the various types of tokens easier to construct.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/oracle/OracleCellTsPageLoader.java, line 116 at r2 (raw file):
final?
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/oracle/OracleCellTsPageLoader.java, line 230 at r1 (raw file): Previously, gsheasby (Glenn Sheasby) wrote…
Discussed offline with Glenn; this doesn't seem to significantly affect the time complexity.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/oracle/OracleCellTsPageLoader.java, line 258 at r2 (raw file):
Let's add a message for the IllegalStateException here.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/oracle/OracleCellTsPageLoader.java, line 258 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
Out of curiosity, why do we need this check?
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/oracle/OracleCellTsPageLoader.java, line 258 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
It will be clearer to have the precondition in continueRow.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/postgres/PostgresCellTsPageLoader.java, line 95 at r2 (raw file):
This seems strange: an iterator checking hasNext in next and throwing?
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/postgres/PostgresCellTsPageLoader.java, line 95 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
We should either not throw here and return an empty list, or just throw the exception (if any) we get when trying to load a page.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/postgres/PostgresCellTsPageLoader.java, line 128 at r2 (raw file):
query -> fullQuery.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/postgres/PostgresCellTsPageLoader.java, line 132 at r2 (raw file):
Rename to getFullQuery, or we have getQuery().getQuery().
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/postgres/PostgresCellTsPageLoader.java, line 182 at r2 (raw file):
Let's move the check to continueRow.
Reviewed 4 of 15 files at r1, 3 of 5 files at r2.
Review status: 4 of 13 files reviewed at latest revision, 12 unresolved discussions.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/postgres/PostgresCellTsPageLoader.java, line 95 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
Why is it strange? This is how iterators are supposed to work. Try this: Collections.emptyList().iterator().next(); The only strange thing about this is that a more common exception type to throw is NoSuchElementException - I am being lazy here and using Preconditions.checkState instead.
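The behaviour referred to here can be checked with the JDK alone. The sketch below runs the suggested one-liner and then shows a small hand-written iterator following the same convention; `IteratorContractDemo` and `OneShot` are hypothetical names used only for illustration. For comparison, Guava's Preconditions.checkState throws IllegalStateException rather than NoSuchElementException.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

final class IteratorContractDemo {
    // Returns the simple name of the exception thrown by next() on an exhausted iterator.
    static String exceptionFromExhaustedIterator() {
        Iterator<Object> empty = Collections.<Object>emptyList().iterator();
        try {
            empty.next();
            return "none";
        } catch (NoSuchElementException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(exceptionFromExhaustedIterator());
    }
}

// Hypothetical one-element iterator: next() checks hasNext() and throws when exhausted.
final class OneShot<T> implements Iterator<T> {
    private T value;

    OneShot(T value) {
        this.value = value;
    }

    @Override
    public boolean hasNext() {
        return value != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException("iterator exhausted");
        }
        T result = value;
        value = null;
        return result;
    }
}
```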
Review status: 4 of 13 files reviewed at latest revision, 12 unresolved discussions.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/sweep/DbKvsGetCandidateCellsForSweeping.java, line 40 at r1 (raw file): Previously, gsheasby (Glenn Sheasby) wrote…
Yes, the filtering iterator indeed needs to load the next page when hasNext() is called. However, this is not a problem: every call to hasNext() will be followed by next(), so no work is wasted. The reason why I was trying to avoid that in PageIterators is that CandidateGroupingIterator calls PageIterator.hasNext() after each call to PageIterator.next().
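To make the "hasNext() has to load" point concrete, here is a hand-rolled filtering iterator with a counter for how many elements it has pulled from its source. It is an illustrative stand-in, not Guava's Iterators.filter implementation; the class name `FilteringSketch` and the `pulled` counter are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical filtering iterator: it cannot answer hasNext() without
// reading ahead until it finds an element that passes the predicate.
final class FilteringSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> keep;
    private T buffered;
    private boolean hasBuffered;
    int pulled; // number of elements taken from the source so far

    FilteringSketch(Iterator<T> source, Predicate<T> keep) {
        this.source = source;
        this.keep = keep;
    }

    @Override
    public boolean hasNext() {
        while (!hasBuffered && source.hasNext()) {
            T candidate = source.next();
            pulled++;
            if (keep.test(candidate)) {
                buffered = candidate;
                hasBuffered = true;
            }
        }
        return hasBuffered;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = buffered;
        buffered = null;
        hasBuffered = false;
        return result; // already loaded by hasNext(), so no extra work happens here
    }

    public static void main(String[] args) {
        List<List<Integer>> pages = Arrays.asList(
                Arrays.<Integer>asList(), Arrays.asList(1, 2), Arrays.<Integer>asList(), Arrays.asList(3));
        FilteringSketch<List<Integer>> nonEmpty =
                new FilteringSketch<>(pages.iterator(), page -> !page.isEmpty());
        while (nonEmpty.hasNext()) {
            System.out.println(nonEmpty.next() + " (pulled so far: " + nonEmpty.pulled + ")");
        }
    }
}
```

As long as each hasNext() is followed by a next(), the read-ahead is consumed rather than wasted, which is the argument made in the comment above.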
Review status: 3 of 13 files reviewed at latest revision, 11 unresolved discussions.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/oracle/OracleCellTsPageLoader.java, line 116 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
No - it's updated by this class. This variable now encapsulates all the state.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/oracle/OracleCellTsPageLoader.java, line 258 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
Done
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/postgres/PostgresCellTsPageLoader.java, line 128 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
Done
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/postgres/PostgresCellTsPageLoader.java, line 132 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
Done.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/sweep/CellTsPairToken.java, line 23 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
Done
Review status: 3 of 13 files reviewed at latest revision, 11 unresolved discussions.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/oracle/OracleCellTsPageLoader.java, line 258 at r2 (raw file):
We need this check because we increment the timestamp by one to find the next starting position. So if it is equal to Long.MAX_VALUE, it would overflow. This should never be the case, but we check anyway for paranoia reasons.
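A guard of this kind could look like the sketch below. The helper name `nextStartTs` and the exception message are hypothetical, not taken from the PR; the point is only that `Long.MAX_VALUE + 1` silently wraps around to `Long.MIN_VALUE` in Java, so the check has to happen before the increment.

```java
final class NextTimestampSketch {
    // Hypothetical helper: the timestamp at which the next page should start,
    // given the last timestamp already returned.
    static long nextStartTs(long lastSeenTs) {
        if (lastSeenTs == Long.MAX_VALUE) {
            // Fail loudly with a message instead of wrapping to Long.MIN_VALUE.
            throw new IllegalStateException(
                    "Cannot continue after timestamp Long.MAX_VALUE: incrementing it would overflow");
        }
        return lastSeenTs + 1;
    }

    public static void main(String[] args) {
        System.out.println(nextStartTs(41L));
        System.out.println(Long.MAX_VALUE + 1 == Long.MIN_VALUE); // unchecked increment wraps
    }
}
```

`Math.addExact(lastSeenTs, 1L)` would be an alternative; it throws ArithmeticException on overflow instead of IllegalStateException.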
Review status: 3 of 13 files reviewed at latest revision, 8 unresolved discussions.
atlasdb-dbkvs/src/main/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/postgres/PostgresCellTsPageLoader.java, line 182 at r2 (raw file): Previously, hsaraogi (Himangi Saraogi) wrote…
Done
atlasdb-dbkvs/src/test/java/com/palantir/atlasdb/keyvalue/dbkvs/impl/sweep/CandidateGroupingIteratorTest.java, line 104 at r3 (raw file):
Should we verify that ts is sorted here?
Some very small comments, looks good overall.
Reviewed 9 of 15 files at r1, 2 of 5 files at r2, 3 of 3 files at r3.
Priority (whenever / two weeks / yesterday):
Ideally before we cut for the LTS