Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

compare cassandra revert to 0.60.0 #2522

Closed
wants to merge 11 commits into from

Conversation

nziebart
Copy link
Contributor

@nziebart nziebart commented Oct 19, 2017

Goals (and why):

Implementation Description (bullets):

Concerns (what feedback would you like?):

Where should we start reviewing?:

Priority (whenever / two weeks / yesterday):


This change is Reviewable

gsheasby and others added 11 commits October 13, 2017 17:45
* The big sweep rewrite, part 3: Postgres impl (#2021)

* The big sweep rewrite, part 3: Postgres impl

- Implement KeyValueService.getCandidateCellsForSweeping() for Postgres

- Add AgnosticResultRow.getBoolean() and LightAgnosticResultRow.getArray()

* Address CR comments

Also change the stopping condition in computeNext()

* Add a release note

* Add the ORDER BY that I forgot

* Rewrite a fragile test

* Join pages based on the number of (cell, ts) pairs

* Move release note to develop

* The big sweep rewrite, part 4: Cassandra impl (#2231)

* The big sweep rewrite, part 4: Cassandra impl

Iterate over cells in a table without ever loading more than a given
number at once regardless of how wide the rows are, but still perform
efficiently on narrow tables.

This should finally allow us to run sweep safely on
Cassandra with decent performance.

* Address clockfort's comments + checkstyle + add release notes

* Address code review comments

* Split up test cases in SingleRowColumnPagerTest

* Tests for KeyRanges

* Update locks

* Return Optional instead of passing in @output list

* Update dependency locks

* Fix TestDataBuilder after merge

* Make tests green

- Fix a small bug in CassandraGetCandidateCellsForSweepingImpl

- Remove an irrelevant test (should_not_oom_when_...) and make
  CassKVSeSweepTaskRunnerIntegrationTest non-parameterized.
  (getRangeOfTimestamps() is not used in the sweep code path anymore)

- Relax assertions in testSweepBatchesUpToDeleteBatchSize

* Bring back the old variable name to make checkstyle happy

* Run ./gradlew generateLock saveLock

* Reapply changes

* Fix compile errors

* Simplify getCandidateCellsForSweeping() (#2439)

* Simplify getCandidateCellsForSweeping()

Initially I hoped to take advantage of in-database filtering if we get
to implement the "transaction table sweeping" feature. However, seems
like that wasn't a great decision on my part - unclear when (and if) we
get that feature done, and also how much of improvement we would
actually get. The extra logic makes the code significantly more complex,
so I think we need to back off and give up the idea.

This doesn't invalidate the sweep rewrite project. The new impls still
bring significant performance improvements.

* The big sweep rewrite, part 5: Oracle impl

- Extract the paging logic from the Postgres impl into a separate class
  named CandidatePagingState so that we could reuse it in the Oracle impl
  (and also test the logic with unit tests)

- Add a new benchmark fullTableScanOneWideRowThorough that simulates
  sweep of a very wide row. This is important for Oracle - a naive
  implementation would perform extremely poorly in that scenario.

* A bunch of refactors

- Make a builder for FullQuery

- Rename RangeBoundPredicates -> RangePredicateHelper and make it
  append to a FullQuery.Builder instead of returning a string + args.

- Make DbKvsGetCandidateCellsForSweeping a concrete class with common
  logic. Now, database-specific logic is behind the CellTsPairLoader
  interface.

- Add full test coverage for CandidatePagingState

* Tautological tests for RangePredicateHelper

* Import the "correct" assertThat

* Static classes

* Refactors

* CandidatePagingState: extract state-updating methods

Also, reorganised class

* Expand the comment in OracleCellTsPageLoaderFactory

Also filter out empty pages

* Fix release notes

* Fix release notes for real
* Exclude PostgresDbPersistenceTimelockTestSuite from ete tests

* Rename test as the regex is case sensitive
[no release notes]
* Better logging for internal customer

* Update docs

* Copy over #2444
* change version number

* Clarify 0.60.0 release

* Delete stray line

* Touch up release notes
* sweep full table

* fix tests

* logsafe

* release notes

* pr comments

* docs

* unnecessary method call

* checkstyle

* fix build

* fix build

* javadocs

* Random logging bits
* DbKvs sweep refactors to address last PR comments

As dxiao pointed out, CandidatePagingState does too many things. The
solution is to make CellTsPairLoader return an iterator and move all
paging logic inside that iterator, and remove CellTsPairLoaderFactory.
Then we replace CandidatePagingState with CandidateGroupingIterator,
which now doesn't have any paging logic and only takes care of grouping
(cell, ts) pairs by cell.

Also remove CandidatePageJoiningIterator because it's not used anymore.
We could have done this earlier but I forgot.

Plan for the future:
- Since we gave up on the in-database filtering idea, we can replace
  KVS.getCandidateCellsForSweeping() with a simpler call like
  getCellTsPairs() which would do what CellTsPairLoader does now.
- Implement the new call for the remaining KVS's. We need at least
  the InMemoryKVS. We should decide the destiny of JdbcKVS and CqlKVS.
- Remove all remaining usages of getRangeOfTimestamps(). I think that's
  basically deleteRange() on Cassandra.
- Remove KVS.getRangeOfTimestamps()!

[no release notes]

* Renames

* Create OracleCellTsPageLoader.PageIterator.Token class

* Create PostgresCellTsPageLaoder.PageIterator.Token

* Fix Precondition

having-next-ness is a state, not an argument

* One Token To Rull Them All

Well... two, actually. This one is a CellTsPairToken. We already had a
Token class in DbKvs :-/

* Make CellTsPairToken final; replace modification with factory methods

* Have computeNextStartPosition return a token

* Remove the TODO

I don't think we can do it nicely

* empty commit

* Nits

* Move duplicated precondition to CellTsPairToken and add error message
* Renames

* Make CellTsPairToken immutable

* Sort timestamps array, just in case
Increase Cell Ts pairs to 1024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants