[QoS] Qos ete test #2708

hsaraogi · 2017-11-17T17:07:24Z

Goals (and why): One ETE test; more are coming, but we can merge this one in.

Priority (whenever / two weeks / yesterday): today

…re and increase with each success (#2630) * Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success * add logging when we stop reducing the batch size multiplier * further improve the tests * Allow sweep to recover faster after backing off. Before we would increase by 1% for each successive success, if we had reduced a value to 1 it would be 70 iterations before we got 2 and 700 iterations before we got back to 1000. Now we always 25 iterations with the lower batch size and then try increasing the rate by doubling each time. This means that when sweep has to back off it should speed up again quickly. * Use an AtomicInteger to handle concurrent updates

* SweeperServiceImpl now logs when it starts sweeping make it clear if it is running full sweep or not * Added sweep parameters to the log lines * no longer default the service parameter in the interface, this way we can see when the parameter isn't provided and we are defaulting to true. Behaviour is unchanged but we can log a message when defaulting.

* Wrap next() and hasNext() in traces * Use span names as safe * Remove iterator wrappings * checkstyle * refactor methods and remove misleading traces * Fix unit tests * release notes * Final nits * fix java arrays usage

* [no release notes] Drive-by add test for when a batch is full

…#2636) * change metrics manager to warn plus log the metric name * more timestamp tracker tests * release notes

* Create a CassandraClient * Propagate CassandraClient to all classes but CKVS * Use CassandraClient on CKVS * Propagate CassandraClient to remaining Impl classes * Use CassandraClient in tests * [no release notes]

* Release notes banners * fix pr numbers

* Add getNamespace [no release notes] * Timelock client config cannot be empty * Make it explicit that unspecified namespace is only possible for InMemoryKVS * CR comments

* thoughts * More tests for RIH * Paranoid logging * statics * javadoc part 1 * polling refreshable * Unit tests * Remove the old RIH * lock lock * Tests that test how we deal with exceptions * logging * [no release notes] * CR comments part 1 * Make interval configurable * Standard nasty time edge cases * lastSeenValue does not need to be volatile

…ng (#2622) * ServiceCreator.applyDynamic() * Propagate config through TMs * Json Serialization fixes * Some refactoring * lock/lock * Fixed checkstyle * CR comments part 1 * Switch to RPIH * add test * [no release notes] forthcoming in part 4 * checkstyle

* [no release notes] Document behaviour regarding index rows * fix compile bug * ``List``

* Sanitize Client API * Instrument CassandraClient * checkstyle * Address comment * [no release notes] * checkstyle * Fix cas

* 0 nodes part 1 * add support for 0 servers in a ServerListConfig * extend deserialization tests * More tests * code defensively * [no release notes] defer to 2648 * Fixed CR nits * singleton server list

* check immutable ts * checkstyle * release notes * Fix TM creation * checkstyle

* Propagate method names down to multiget_slice * Add the corresponding KVS method to remaining methods * Add TODO * [no release notes] * nit

* Instrument CqlExecutor * [no release notes]

* Upgrade to newer Awaitility. * locks [no release notes] * unused import

* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts * Add changes into missing file * Doc changes * Exclude Tracing and HdrHistogram from Tritium dependencies * update locks * Add excluded dependencies explicitly * Fix merge conflict in relase notes * Uncomment dependencies * Regenerate locks

* Log out Paxos values when recording Paxos events * Updated release notes * Checkstyle * Pull request number * Address comments * fix docs

* Trace and instrument the thrift client * Instrument CqlExecutor * Fix metric names of IntrumentedCassandraClient * Fix nit * Also log internal table references * Checkstyle * simplify metric names * Address comments * add slow logging to the cassandra thrift client * add slow logging to cqlExecutor * fix typos * Add tracing to the CassandraClient * trace cqlExecutor queries * Add slow-logging in the CassandraClient * Delete InstrumentedCC and InstrumentedCqlExec * Fix small nits * Checkstyle * Add kvs method names to slow logs * Fix wrapping of exception * Extract CqlQuery * Move kvs-slow-log and tracing of CqlExecutor to CCI * Propagate execute_cql3_query api breaks * checkstyle * delete unused string * checkstyle * fix number of mutations on batch_mutate * some refactors * fix compile

* Extract TracingCassandraClient Extract ProfilingCassandraClient Move todos and some cleanup Cherry-pick QoS metrics to develop (#2679) * [QoS] Feature/qos meters (#2640) * Metrics for bytes and counts in each read/write * Refactors, dont throw if recordMetrics throws * Use meters instead of histograms * Multiget bytes * Batch mutate exact size * Cqlresult size * Calculate exact byte sizes for all thrift objects * tests and bugfixes - partial * More tests and bugs fixed * More tests and cr comments * byte buffer size * Remove register histogram * checkstyle * checkstyle * locks and license * Qos metrics CassandraClient * Exclude unused classes * fix cherry pick

gsheasby

Mostly checked the new "Qos...TestSuite" classes.

gsheasby · 2017-11-22T16:36:11Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraReadTestSuite.java

+
+        serializableTransactionManager
+                .runTaskWithRetry((transaction) -> {
+                    writeNTodosOfSize(transaction, 200, 1_000);


nit, ONE_TODO_SIZE_IN_BYTES is derived from this (plus a small constant overhead)

gsheasby · 2017-11-22T16:37:22Z

...ndra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraKeyValueServiceImpl.java

            } catch (Exception e) {
-                throw Throwables.unwrapAndThrowAtlasDbDependencyException(e);
+                throw QosAwareThrowables.unwrapAndThrowRateLimitExceededOrAtlasDbDependencyException(e);


do we need lines 2475-2477 as well?

gsheasby · 2017-11-22T16:42:24Z

...sdb-config/src/test/java/com/palantir/atlasdb/http/RateLimitExceededExceptionMapperTest.java

-
-import com.palantir.atlasdb.qos.ratelimit.RateLimitExceededException;
-
-public class RateLimitExceededExceptionMapperTest {


Did you mean to make this go away?

gsheasby · 2017-11-22T16:43:54Z

atlasdb-ete-tests/src/main/java/com/palantir/atlasdb/AtlasDbEteServer.java

        }
    }

-    private TransactionManager createTransactionManagerWithRetry(AtlasDbConfig config, Environment environment)
+    private TransactionManager createTransactionManagerWithRetry(AtlasDbConfig config, Environment environment,


For these methods it's super weird to have the environment parameter sandwiched in between two config parameters.

gsheasby · 2017-11-22T16:44:40Z

atlasdb-ete-tests/src/main/java/com/palantir/atlasdb/AtlasDbEteServer.java

        return TransactionManagers.builder()
                .config(config)
                .schemas(ETE_SCHEMAS)
                .registrar(environment.jersey()::register)
                .userAgent("ete test")
+                .runtimeConfigSupplier(() -> atlasDbRuntimeConfigOptional)


@jeremyk-91 @nziebart Just checking: are these comments now resolved elsewhere?

gsheasby · 2017-11-22T16:50:03Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraReadTestSuite.java

+    @Test
+    public void shouldBeAbleToReadLargeAmountsExceedingTheLimitSecondTimeWithSoftLimiting() {
+        assertThat(readOneBatchOfSize(12)).hasSize(12);
+        // The second read might actually be faster as the transaction/metadata


as the transaction/metadata.... what? Is cached?

Looks like we don't check that any soft limiting actually happened here. Not sure if we can.

Yup, cached.

gsheasby · 2017-11-22T16:51:03Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraReadTestSuite.java

+        List<Future<List<Todo>>> futures = new ArrayList<>(numReadsPerThread);
+
+        long start = System.nanoTime();
+        IntStream.range(0, numReadsPerThread)


[test correctness issue] Typo, this should be numThreads.
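For reference, a minimal sketch of the intended shape, assuming the outer loop is meant to create one task per thread and each task performs `numReadsPerThread` reads. `PerThreadReadsSketch`, `runReads`, and `readOnce` are hypothetical stand-ins, not code from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

class PerThreadReadsSketch {
    // Hypothetical stand-in for a single read; returns the number of items read.
    static int readOnce() {
        return 1;
    }

    static int runReads(int numThreads, int numReadsPerThread) {
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        try {
            // One future per thread, so both the list size and the loop bound are numThreads.
            List<Future<Integer>> futures = new ArrayList<>(numThreads);
            IntStream.range(0, numThreads).forEach(i ->
                    futures.add(executor.submit(() -> {
                        int total = 0;
                        // The per-thread read count belongs inside the task.
                        for (int read = 0; read < numReadsPerThread; read++) {
                            total += readOnce();
                        }
                        return total;
                    })));
            int totalReads = 0;
            for (Future<Integer> future : futures) {
                totalReads += future.get();
            }
            return totalReads;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("read task failed", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runReads(4, 3)); // prints 12
    }
}
```

If the loop bound were `numReadsPerThread` instead, the total number of reads would only match the `numThreads * numReadsPerThread` figure used in the rate calculation when the two values happen to be equal.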

gsheasby · 2017-11-22T16:54:08Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraWriteTestSuite.java

+import com.palantir.atlasdb.todo.Todo;
+import com.palantir.atlasdb.todo.TodoResource;
+
+public class QosCassandraWriteTestSuite extends EteSetup {


Why does this one extend EteSetup whereas the read suite doesn't?

gsheasby · 2017-11-22T16:55:35Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraWriteTestSuite.java

+        ForkJoinPool threadPool = new ForkJoinPool(100);
+        List<Future<?>> futures = Lists.newArrayList();
+
+        IntStream.range(0, 100).parallel().forEach(i ->


there's a 👍 , yet we still have parallel...
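A minimal sketch of the non-parallel variant, assuming the lambda only submits work to the pool and appends the resulting future to `futures`. `SequentialSubmitSketch` and `runTasks` are hypothetical names, and a plain `ArrayList` stands in for `Lists.newArrayList()`. Submission does not block, so a sequential stream should still leave the pool to run the tasks concurrently, and only one thread ever mutates the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class SequentialSubmitSketch {
    static int runTasks(int numTasks) {
        ForkJoinPool threadPool = new ForkJoinPool(numTasks);
        AtomicInteger completed = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            // Sequential stream: only the calling thread appends to the list.
            IntStream.range(0, numTasks).forEach(i ->
                    futures.add(threadPool.submit(() -> {
                        // Placeholder for the real write; runs on a pool thread.
                        completed.incrementAndGet();
                    })));
            // Wait for every submitted task before reading the counter.
            for (Future<?> future : futures) {
                future.get();
            }
            return completed.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        } finally {
            threadPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runTasks(100)); // prints 100
    }
}
```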

gsheasby · 2017-11-22T16:56:04Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraWriteTestSuite.java

+        assertThat(exceptionCounter.get()).isGreaterThan(90);
+    }
+
+    @Test


I would say we don't need both of these tests any more.

jeremyk-91

Great tests!

Let's discuss this in person, I have some concerns about using NANOSECONDS.toSeconds() and then making assertions that do math on top of that value - depending on the time intervals there can be a considerable loss of accuracy
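To make the concern concrete, here is a minimal sketch of how `TimeUnit.NANOSECONDS.toSeconds()` truncates and how that skews a rate derived from it. The class name, method names, and numbers are illustrative and not taken from the PR.

```java
import java.util.concurrent.TimeUnit;

class ElapsedSecondsPrecision {
    // Whole-second conversion: the sub-second part is discarded.
    static long wholeSeconds(long elapsedNanos) {
        return TimeUnit.NANOSECONDS.toSeconds(elapsedNanos);
    }

    // Fractional seconds: the sub-second part is kept.
    static double fractionalSeconds(long elapsedNanos) {
        return (double) elapsedNanos / TimeUnit.SECONDS.toNanos(1);
    }

    public static void main(String[] args) {
        long elapsedNanos = 2_500_000_000L; // hypothetical 2.5 s measurement
        long bytes = 25_000;                // hypothetical number of bytes transferred

        long truncatedRate = bytes / wholeSeconds(elapsedNanos);    // 25000 / 2
        double exactRate = bytes / fractionalSeconds(elapsedNanos); // 25000 / 2.5
        System.out.println(truncatedRate + " vs " + exactRate);     // prints "12500 vs 10000.0"
    }
}
```

With these numbers the truncated rate comes out 25% above the true rate, which is enough to trip an assertion that only allows a 10% margin over a 10,000 bytes/s limit.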

jeremyk-91 · 2017-11-23T10:23:46Z

atlasdb-ete-tests/src/main/java/com/palantir/atlasdb/AtlasDbEteServer.java

        return TransactionManagers.builder()
                .config(config)
                .schemas(ETE_SCHEMAS)
                .registrar(environment.jersey()::register)
                .userAgent("ete test")
+                .runtimeConfigSupplier(() -> atlasDbRuntimeConfigOptional)


As Himangi mentions we aren't dealing with 2 for now.

1 is done - see QosRateLimiter https://github.com/palantir/atlasdb/blob/feature/qos-service-api/qos-service-impl/src/main/java/com/palantir/atlasdb/qos/ratelimit/QosRateLimiter.java.

Aside: the concurrency in that class feels a bit weird (I think it is correct, but could result in multiple creations on a change - though last-one-wins is probably fine), but that's not the scope of this.

jeremyk-91 · 2017-11-23T10:31:48Z

atlasdb-ete-tests/src/main/java/com/palantir/atlasdb/AtlasDbEteServer.java

-            return createTransactionManagerWithRetry(config.getAtlasDbConfig(), environment);
+            return createTransactionManagerWithRetry(config.getAtlasDbConfig(), config.getAtlasDbRuntimeConfig(),
+                    environment
+            );


nit: The ordering on the factories should be consistent (i.e. install/runtime/environment, or install/environment/runtime). I generally prefer install/runtime/environment as this is closer to the internal server framework's ordering.

The ordering currently is install/runtime/environment.

Not on createTransactionManager.

jeremyk-91 · 2017-11-23T10:34:38Z

atlasdb-ete-tests/src/main/java/com/palantir/atlasdb/AtlasDbEteConfiguration.java


-    public AtlasDbEteConfiguration(@JsonProperty("atlasdb") AtlasDbConfig atlasdb) {
+    public AtlasDbEteConfiguration(@JsonProperty("atlasdb") AtlasDbConfig atlasdb,
+            @JsonProperty("atlasdbRuntime") Optional<AtlasDbRuntimeConfig> atlasDbRuntimeConfig) {


nit: probably atlasDbRuntime for consistency?

jeremyk-91 · 2017-11-23T10:36:47Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraEteTestSetup.java

+    static final int readBytesPerSecond = 10_000;
+    static final int writeBytesPerSecond = 10_000;
+    private static final int CASSANDRA_PORT_NUMBER = 9160;
+    static final int MAX_SOFT_LIMITING_SLEEP_MILLIS = 2000;


In general I think the non-private ones should probably be protected rather than package visibility (the qos cassandra read test works because it's in the same package, but if I extend this elsewhere I probably want to be able to access some of these).

IntelliJ kept warning because this is not actually extended anywhere outside the package. I don't feel very strongly, but should we leave it as is for now and change the visibility if it needs to be used outside the package?

jeremyk-91 · 2017-11-23T10:38:21Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraEteTestSetup.java

+            transaction.put(TodoSchema.todoTable(), write);
+            return null;
+        });
+    }


(as before, protected?)

jeremyk-91 · 2017-11-23T10:49:38Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraReadEteTest.java

+
+        assertThatAllReadsWereSuccessful(futures, numReadsPerThread);
+        long actualReadRate = (numThreads * numReadsPerThread * ONE_TODO_SIZE_IN_BYTES) / timeTakenToReadInSeconds;
+        assertThat(actualReadRate).isLessThan(readBytesPerSecond + (readBytesPerSecond / 10 /* to allow burst time */));


The bursty-ness allows us to save up to five extra seconds of permits, so I think we can bound this by readBytesPerSecond * (time + 5) / time rather than have the 1/10th fudge factor.

In general I'd prefer if we used doubles for the math above though, to avoid losses of precision with integer division
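A rough sketch of that bound in floating point, assuming (per the comment above) that the limiter can bank up to five seconds of permits. `BurstAwareRateBound`, `maxBytesAllowed`, and `maxRateAllowed` are hypothetical names used only for illustration.

```java
import java.util.concurrent.TimeUnit;

class BurstAwareRateBound {
    // Assumption from the review comment: up to five seconds of permits can be saved up.
    static final double BURST_SECONDS = 5.0;

    static double elapsedSeconds(long elapsedNanos) {
        return (double) elapsedNanos / TimeUnit.SECONDS.toNanos(1);
    }

    // Upper bound on total bytes: limit * (time + burst).
    static double maxBytesAllowed(double bytesPerSecond, long elapsedNanos) {
        return bytesPerSecond * (elapsedSeconds(elapsedNanos) + BURST_SECONDS);
    }

    // Same bound expressed as a rate: limit * (time + burst) / time.
    static double maxRateAllowed(double bytesPerSecond, long elapsedNanos) {
        return maxBytesAllowed(bytesPerSecond, elapsedNanos) / elapsedSeconds(elapsedNanos);
    }

    public static void main(String[] args) {
        long elapsedNanos = TimeUnit.SECONDS.toNanos(20);
        System.out.println(maxBytesAllowed(10_000, elapsedNanos)); // prints 250000.0
        System.out.println(maxRateAllowed(10_000, elapsedNanos));  // prints 12500.0
    }
}
```

Comparing total bytes against `maxBytesAllowed` avoids the final division altogether, so the assertion no longer depends on how the elapsed time is rounded.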

jeremyk-91 · 2017-11-23T10:53:44Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraWriteEteTest.java

+        writeNTodosOfSize(200, 1_000);
+        long secondWriteTime = stopwatch.elapsed(TimeUnit.MILLISECONDS);
+
+        assertThat(secondWriteTime).isGreaterThan(firstWriteTime);


Are the queries meant to be the same? It seems like the second query is roughly 10X larger than the first in terms of the number of bytes involved

jeremyk-91 · 2017-11-23T10:54:03Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraWriteEteTest.java

+    public void shouldNotBeAbleToWriteLargeAmountsIfSoftLimitSleepWillBeMoreThanConfiguredBackoffTime() {
+        // Have one quick limit-exceeding write, as the rate-limiter
+        // will let anything pass through until the limit is exceeded.
+        writeNTodosOfSize(1, 100_000);


jeremyk-91 · 2017-11-23T10:55:59Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraWriteEteTest.java

+
+    @Test
+    public void writeRateLimitShouldBeRespectedByConcurrentWritingThreads() throws InterruptedException {
+        int oneTodoSizeInBytes = 167;


nit: was hoping to have a programmatic way of generating this, though definitely non-blocking

I thought about that, but it's not clean enough: it depends on the size of the encoded row, which is not easy to calculate directly. We might be able to leverage ThriftObjectSizeUtils, but that kind of defeats the purpose of a test, as we start using implementation details.

jeremyk-91 · 2017-11-23T10:57:02Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraWriteEteTest.java

+        assertThatAllWritesWereSuccessful(futures);
+        long actualWriteRate = (numThreads * numWritesPerThread * oneTodoSizeInBytes) / timeTakenToWriteInSeconds;
+        assertThat(actualWriteRate).isLessThan(
+                writeBytesPerSecond + (writeBytesPerSecond / 10 /* to allow burst time */));


As with the other test, I'm a bit concerned about the possible accuracy losses with converting the time to a whole number of seconds, and then integer division here

gsheasby

Remaining things that I can see:

Fix integer division issue
Fix argument order in AtlasDbEteServer.createTransactionManager
Possibly write one more test (big write limited, then small write OK).

gsheasby · 2017-11-23T12:52:12Z

atlasdb-ete-tests/src/main/java/com/palantir/atlasdb/AtlasDbEteServer.java

-            return createTransactionManagerWithRetry(config.getAtlasDbConfig(), environment);
+            return createTransactionManagerWithRetry(config.getAtlasDbConfig(), config.getAtlasDbRuntimeConfig(),
+                    environment
+            );


Not on createTransactionManager.

gsheasby · 2017-11-23T13:00:10Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraReadEteTest.java

+        assertThatThrownBy(() -> readOneBatchOfSize(200))
+                .isInstanceOf(RateLimitExceededException.class)
+                .hasMessage("Rate limited. Available capacity has been exhausted.");
+    }


We could also have a test to establish that we can be rate limited on one read and then have a smaller read succeed.

I don't think that's the behavior we will have, but yes, we can add a test for reads/writes in this order.

gsheasby · 2017-11-23T14:55:35Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraReadEteTest.java

@@ -81,6 +81,11 @@ public void shouldNotBeAbleToReadLargeAmountsIfSoftLimitSleepWillBeMoreThanConfi
        assertThatThrownBy(() -> readOneBatchOfSize(200))
                .isInstanceOf(RateLimitExceededException.class)
                .hasMessage("Rate limited. Available capacity has been exhausted.");
+
+        // TODO(hsaraogi): This should not happen.


Yeah... considering the current impl it does. Maybe we should have a triviality threshold?

It's always hard to predict though :)

jeremyk-91

Yep, think this looks good. Not sure about why there are 68 commits though :O but the change looks fine!

jeremyk-91 · 2017-11-23T18:50:10Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraReadEteTest.java

@@ -81,6 +81,11 @@ public void shouldNotBeAbleToReadLargeAmountsIfSoftLimitSleepWillBeMoreThanConfi
        assertThatThrownBy(() -> readOneBatchOfSize(200))
                .isInstanceOf(RateLimitExceededException.class)
                .hasMessage("Rate limited. Available capacity has been exhausted.");
+
+        // TODO(hsaraogi): This should not happen.


Yeah... considering the current impl it does. Maybe we should have a triviality threshold?

It's always hard to predict though :)

jeremyk-91 · 2017-11-23T19:19:00Z

atlasdb-ete-tests/src/test/java/com/palantir/atlasdb/ete/QosCassandraReadEteTest.java

+        assertThatAllReadsWereSuccessful(futures, numReadsPerThread);
+        double actualBytesRead = numThreads * numReadsPerThread * ONE_TODO_SIZE_IN_BYTES;
+        double maxReadBytesLimit = readBytesPerSecond * ((double) readTime / TimeUnit.SECONDS.toNanos(1)
+                + 5 /* to allow for rate-limiter burst */);


Fixed the comments.

* Extremely basic QosServiceResource * Make resource an interface * Add client PathParam * Clean up javax.ws.rs dependencies * Create stub for AtlasDbQosClient * Calls to checkLimit use up a credit; throw when out of credits * Add QosServiceResourceImpl + test * AutoDelegate for Cassandra.Client * Rename QosService stuff * Pass AtlasDbQosClient to CassandraClient * Check limit on multiget_slice * Check limit on batch_mutate * Don't test we aren't soft-limited while we can never be soft-limited * Check limit on remaining CassandraClient methods * Scheduled refresh of AtlasDbQosClient.credits * Refresh every second Once we have configurable quotas on the QoS service, they will be more understandable (per second rather than per-10-seconds). * Mount qos-service on Timelock * Checkstyle * Update dependency locks * Dont throw limitExceededException * Move client param around * Comment * Qos Service config (#2644) * Service config * Allow clients to run without configuring limits * simpler tests * [QoS] qos ete test (#2652) * checkpoint * checkpoint * working test * check passing * unused deps * [QoS] rate limiter (#2653) * rate limiting * update license and docs * [QoS] Feature/qos client (#2650) * Create one qosCLient for each service QosClientBuilder hooked up to KVS create Create the QosClient in CassandraClientPoolImpl if the config is specified. 
Create FakeQosClient if the config is not specified Cleanup get broken tests to pass * Locks * Fix failing tests * Add getNamespace [no release notes] * Create QosClient at the Top level * fix test * test and checkstyle fixes * locks * deps * fix tests * [QoS] Feature/qos meters (#2640) * Metrics for bytes and counts in each read/write * Refactors, dont throw if recordMetrics throws * Use meters instead of histograms * Multiget bytes * Batch mutate exact size * Cqlresult size * Calculate exact byte sizes for all thrift objects * tests and bugfixes - partial * More tests and bugs fixed * More tests and cr comments * byte buffer size * Remove register histogram * checkstyle * checkstyle * locks and license * [QoS] QosClient with ratelimiter (#2667) * QosClient with ratelimiter * Checkstyle * locks * [QoS] Create a jaxrs-client for the integ tests (#2675) * Create a jaxrs-client for the integ tests * build fix * clean up * Nziebart/merge develop into qos (#2683) * Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success (#2630) * Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success * add logging when we stop reducing the batch size multiplier * further improve the tests * Allow sweep to recover faster after backing off. Before we would increase by 1% for each successive success, if we had reduced a value to 1 it would be 70 iterations before we got 2 and 700 iterations before we got back to 1000. Now we always 25 iterations with the lower batch size and then try increasing the rate by doubling each time. This means that when sweep has to back off it should speed up again quickly. 
* Use an AtomicInteger to handle concurrent updates * SweeperService logging improvements (#2618) * SweeperServiceImpl now logs when it starts sweeping make it clear if it is running full sweep or not * Added sweep parameters to the log lines * no longer default the service parameter in the interface, this way we can see when the parameter isn't provided and we are defaulting to true. Behaviour is unchanged but we can log a message when defaulting. * Refactor TracingKVS (#2643) * Wrap next() and hasNext() in traces * Use span names as safe * Remove iterator wrappings * checkstyle * refactor methods and remove misleading traces * Fix unit tests * release notes * Final nits * fix java arrays usage * Delete docs (#2657) * [20 minute tasks] Add test for when a batch is full (#2655) * [no release notes] Drive-by add test for when a batch is full * MetricRegistry log level downgrade + multiple timestamp tracker tests (#2636) * change metrics manager to warn plus log the metric name * more timestamp tracker tests * release notes * Extract interface for Cassandra client (#2660) * Create a CassandraClient * Propagate CassandraClient to all classes but CKVS * Use CassandraClient on CKVS * Propagate CassandraClient to remaining Impl classes * Use CassandraClient in tests * [no release notes] * client -> namespace [no release notes] (#2654) * 0.65.2 and 0.66.0 release notes (#2663) * Release notes banners * fix pr numbers * [QoS] Add getNamespace to AtlasDBConfig (#2661) * Add getNamespace [no release notes] * Timelock client config cannot be empty * Make it explicit that unspecified namespace is only possible for InMemoryKVS * CR comments * Live Reloading the TimeLock Block, Part 1: Pull to Push (#2621) * thoughts * More tests for RIH * Paranoid logging * statics * javadoc part 1 * polling refreshable * Unit tests * Remove the old RIH * lock lock * Tests that test how we deal with exceptions * logging * [no release notes] * CR comments part 1 * Make interval configurable * 
Standard nasty time edge cases * lastSeenValue does not need to be volatile * Live Reloading the TimeLock Block, Part 2: TransactionManagers Plumbing (#2622) * ServiceCreator.applyDynamic() * Propagate config through TMs * Json Serialization fixes * Some refactoring * lock/lock * Fixed checkstyle * CR comments part 1 * Switch to RPIH * add test * [no release notes] forthcoming in part 4 * checkstyle * [TTT] [no release notes] Document behaviour regarding index rows (#2658) * [no release notes] Document behaviour regarding index rows * fix compile bug * ``List`` * Refactor and Instrument CassandraClient api (#2665) * Sanitize Client API * Instrument CassandraClient * checkstyle * Address comment * [no release notes] * checkstyle * Fix cas * Live Reloading the TimeLock Block, Part 3: Working with 0 Nodes (#2647) * 0 nodes part 1 * add support for 0 servers in a ServerListConfig * extend deserialization tests * More tests * code defensively * [no release notes] defer to 2648 * Fixed CR nits * singleton server list * check immutable ts (#2406) * check immutable ts * checkstyle * release notes * Fix TM creation * checkstyle * Propagate top-level KVS method names to CassandraClient (#2669) * Propagate method names down to multiget_slice * Add the corresponding KVS method to remaining methods * Add TODO * [no release notes] * nit * Extract cql executor interface (#2670) * Instrument CqlExecutor * [no release notes] * bump awaitility (#2668) * Upgrade to newer Awaitility. 
* locks [no release notes] * unused import * Bump Atlas on Tritium 0.8.4 to fix dependency conflicts (#2662) * Bump Atlas on Tritium 0.8.4 to fix dependency conflicts * Add changes into missing file * Doc changes * Exclude Tracing and HdrHistogram from Tritium dependencies * update locks * Add excluded dependencies explicitly * Fix merge conflict in relase notes * Uncomment dependencies * Regenerate locks * Correctly log Paxos events (#2674) * Log out Paxos values when recording Paxos events * Updated release notes * Checkstyle * Pull request number * Address comments * fix docs * Slow log and tracing (#2673) * Trace and instrument the thrift client * Instrument CqlExecutor * Fix metric names of IntrumentedCassandraClient * Fix nit * Also log internal table references * Checkstyle * simplify metric names * Address comments * add slow logging to the cassandra thrift client * add slow logging to cqlExecutor * fix typos * Add tracing to the CassandraClient * trace cqlExecutor queries * Add slow-logging in the CassandraClient * Delete InstrumentedCC and InstrumentedCqlExec * Fix small nits * Checkstyle * Add kvs method names to slow logs * Fix wrapping of exception * Extract CqlQuery * Move kvs-slow-log and tracing of CqlExecutor to CCI * Propagate execute_cql3_query api breaks * checkstyle * delete unused string * checkstyle * fix number of mutations on batch_mutate * some refactors * fix compile * Refactor cassandra client (#2676) * Extract TracingCassandraClient Extract ProfilingCassandraClient Move todos and some cleanup Cherry-pick QoS metrics to develop (#2679) * [QoS] Feature/qos meters (#2640) * Metrics for bytes and counts in each read/write * Refactors, dont throw if recordMetrics throws * Use meters instead of histograms * Multiget bytes * Batch mutate exact size * Cqlresult size * Calculate exact byte sizes for all thrift objects * tests and bugfixes - partial * More tests and bugs fixed * More tests and cr comments * byte buffer size * Remove register 
histogram * checkstyle * checkstyle * locks and license * Qos metrics CassandraClient * Exclude unused classes * fix cherry pick * use supplier for object size [no release notes] * fix merge in AtlasDbConfig * qos rate limiting (#2709) * rate limiting * [QoS] total time spent talking to Cassandra (#2687) * total-time * [QoS] Client config (#2690) * qos config * respect max backoff itme * [QoS] [Refactor] Query Weights (#2697) * query weights * extra tests * [QoS] Number of rows per query (#2698) * num rows * checkstyle * fix tests * no int casting * fix numRows calculation on batch_mutate * [QoS] CAS metrics (#2705) * cas metrics * exceptions (#2706) * [QoS] Guava license (#2703) * guava license * Cleanup: class reference * [QoS] live reload (#2710) * live reload and logging * millis * checkpoint * fix tests * comments * checkstyle * [QoS] Don't rate limit CAS (#2711) * dont limit cas * Remove tests of deleted method * Cherrypick/qos exception mapping (#2715) * very simple ratelimitexceededexception * Need to be able to throw RLEE directly from Cass, rather than ADDE(RLEE)s * fix bug with ADDE(RLEE) * Exception Mapper * unravel a bad javadoc * CR comments part 1 * lock lock * split qos aware throwables * visibility * fix compile break * checkstyle * handle exceptions properly * [QoS] Estimate the number of read bytes w/ number of rows (#2717) * Refactor the name of the functions * Estimate based on the number of rows * Fix modifiers on ThriftQueryWeighers * Add unit tests to estimation logic * ThriftQueryWeighers.multigetSlice takes a List, not number of rows * getRangeSlices takes KeyRange, not count * weight estimates (#2725) * [QoS] Fix exceptions thrown on CqlExecutor (#2696) * Address #2683 comments * Clarify query and add cause * Add just the cqlQuery.queryFormat * checkstyle * Update test We changed the error message... 
histogram * checkstyle * checkstyle * locks and license * Qos metrics CassandraClient * Exclude unused classes * fix cherry pick * use supplier for object size [no release notes] * fix merge in AtlasDbConfig * rate limiting * total-time * qos config * respect max backoff itme * query weights * extra tests * num rows * checkstyle * fix tests * no int casting * Qos ete tests * shouldFailIfWritingTooManyBytes * fix test * rm file * Remove metrics * Test shouldFailIfReadingTooManyBytes * canBeWritingLargeNumberOfBytesConcurrently * checkstyle * cannotWriteLargeNumberOfBytesConcurrently * fix tests * create tm in test * More read tests (after writing a lot of data at once) * WIP * Tests that should pas * Actually update the rate * Add another test * More tests and address comments * Dont extend etesetup * Make dumping data faster * cleanup * wip * Add back lost file * Cleanup * Write tests * numReadsPerThread -> numThreads * More write tests, cleanup, check style fixes * Refactor to avoid code duplication * Cleanup * cr comments * Small read/write after a rate-limited read/write * annoying no new linw at eof * Uniform parameters for hard limiting * [QoS] Fix/qos system table rate limiting (#2739) * Don't consume any estimated bytes for a _transaction or metadata table query * Add tests * cr comments * Merge develop to the feature branch (#2741) * Merge develop * Re-delete CqlQueryUtils * Nziebart/cell timestamps qos (#2745) * handle qos exceptions in cell timestamp loader [no release notes] * actually just remove checked exception * Remove the throws in the method signature * Differentiate between read and write limits when logging (#2751) * Differentiate between read and write limits when logging * Type -> name * Use longs in the rate limiter and handle negative adjustments. (#2758) * Differentiate between read and write limits when logging * handle negative adjustments * More tests * pr comments
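
The sweep back-off described in the #2630 notes above (shrink the batch size towards 1 on failure, then wait 25 successful iterations and double) can be sketched roughly as follows. This is only an illustration, not the actual SweepBatchConfig code: the class name `BatchSizeBackoff`, its method names, and the choice to halve on each failure are all assumptions; the commit message only states that values decrease to 1, that 25 iterations run at the lower size, and that an AtomicInteger is used.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a batch size that backs off on failure and recovers by doubling.
final class BatchSizeBackoff {
    // From the commit message: stay at the lower batch size for 25 iterations.
    private static final int SUCCESSES_BEFORE_INCREASE = 25;

    private final int maxBatchSize;
    private final AtomicInteger batchSize;
    private final AtomicInteger successes = new AtomicInteger(0);

    BatchSizeBackoff(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
        this.batchSize = new AtomicInteger(maxBatchSize);
    }

    int current() {
        return batchSize.get();
    }

    // Assumption: halve on each failure, never dropping below 1.
    void onFailure() {
        successes.set(0);
        batchSize.updateAndGet(size -> Math.max(1, size / 2));
    }

    // After 25 consecutive successes, double the batch size, capped at the configured maximum.
    // Each counter is updated atomically, but the pair is not updated as one unit.
    void onSuccess() {
        if (successes.incrementAndGet() >= SUCCESSES_BEFORE_INCREASE) {
            successes.set(0);
            batchSize.updateAndGet(size -> (int) Math.min((long) maxBatchSize, size * 2L));
        }
    }
}
```

With a maximum of 1000, recovering from 1 takes ten doublings under this sketch, i.e. 250 successful iterations, compared with the roughly 700 iterations the commit message quotes for the old 1% increase.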

tboam and others added 30 commits November 8, 2017 10:32

Refactor TracingKVS (#2643)

c1e21ee

* Wrap next() and hasNext() in traces * Use span names as safe * Remove iterator wrappings * checkstyle * refactor methods and remove misleading traces * Fix unit tests * release notes * Final nits * fix java arrays usage

Delete docs (#2657)

53afd91

[20 minute tasks] Add test for when a batch is full (#2655)

e25d806

* [no release notes] Drive-by add test for when a batch is full

MetricRegistry log level downgrade + multiple timestamp tracker tests (#2636)

88e3ffe

* change metrics manager to warn plus log the metric name * more timestamp tracker tests * release notes

Extract interface for Cassandra client (#2660)

8b67855

* Create a CassandraClient * Propagate CassandraClient to all classes but CKVS * Use CassandraClient on CKVS * Propagate CassandraClient to remaining Impl classes * Use CassandraClient in tests * [no release notes]

client -> namespace [no release notes] (#2654)

c0c05f6

0.65.2 and 0.66.0 release notes (#2663)

fede1c3

* Release notes banners * fix pr numbers

[QoS] Add getNamespace to AtlasDBConfig (#2661)

a2be749

* Add getNamespace [no release notes] * Timelock client config cannot be empty * Make it explicit that unspecified namespace is only possible for InMemoryKVS * CR comments

[TTT] [no release notes] Document behaviour regarding index rows (#2658)

8fdd50b

* [no release notes] Document behaviour regarding index rows * fix compile bug * ``List``

Refactor and Instrument CassandraClient api (#2665)

6fed36f

* Sanitize Client API * Instrument CassandraClient * checkstyle * Address comment * [no release notes] * checkstyle * Fix cas

Live Reloading the TimeLock Block, Part 3: Working with 0 Nodes (#2647)

e8e85f9

* 0 nodes part 1 * add support for 0 servers in a ServerListConfig * extend deserialization tests * More tests * code defensively * [no release notes] defer to 2648 * Fixed CR nits * singleton server list

check immutable ts (#2406)

74180cf

* check immutable ts * checkstyle * release notes * Fix TM creation * checkstyle

Propagate top-level KVS method names to CassandraClient (#2669)

d4bf805

* Propagate method names down to multiget_slice * Add the corresponding KVS method to remaining methods * Add TODO * [no release notes] * nit

Extract cql executor interface (#2670)

247f60c

* Instrument CqlExecutor * [no release notes]

bump awaitility (#2668)

913e5e7

* Upgrade to newer Awaitility. * locks [no release notes] * unused import

Correctly log Paxos events (#2674)

7fb4d17

* Log out Paxos values when recording Paxos events * Updated release notes * Checkstyle * Pull request number * Address comments * fix docs

use supplier for object size [no release notes]

22e129a

fix merge

757a282

fix merge in AtlasDbConfig

a45809d

rate limiting

e3bd685

total-time

dd55403

qos config

2318f69

respect max backoff itme

c7dff29

cleanup

2f49e0e

hsaraogi changed the base branch from fix/qos-ete-base to feature/qos-service-api November 22, 2017 16:39

hsaraogi added 3 commits November 22, 2017 16:41

wip

be06b19

Add back lost file

d5b5a8a

Cleanup

3f080db

gsheasby suggested changes Nov 22, 2017

Write tests

504f345

hsaraogi force-pushed the qos-ete-test branch from 441e0b4 to 504f345 Compare November 22, 2017 17:04

hsaraogi added 3 commits November 22, 2017 17:09

numReadsPerThread -> numThreads

59e2b6b

More write tests, cleanup, check style fixes

35327a9

Refactor to avoid code duplication

b04b4fd

hsaraogi force-pushed the qos-ete-test branch from e2dbc9f to 80f4ccf Compare November 22, 2017 19:27

Cleanup

c9ebe6e

hsaraogi force-pushed the qos-ete-test branch 2 times, most recently from 48996d0 to c9ebe6e Compare November 22, 2017 19:32

jeremyk-91 reviewed Nov 23, 2017

gsheasby suggested changes Nov 23, 2017

cr comments

af9d67c

hsaraogi force-pushed the qos-ete-test branch from e0acbc3 to af9d67c Compare November 23, 2017 14:05

gsheasby approved these changes Nov 23, 2017

hsaraogi added 2 commits November 23, 2017 14:49

Small read/write after a rate-limited read/write

6f483d9

annoying no new linw at eof

dd18730

gsheasby reviewed Nov 23, 2017

Uniform parameters for hard limiting

be0759c

jeremyk-91 approved these changes Nov 23, 2017

Merge branch 'feature/qos-service-api' into qos-ete-test

304f4ed

hsaraogi merged commit d2d7b18 into feature/qos-service-api Nov 24, 2017

hsaraogi deleted the qos-ete-test branch November 24, 2017 11:01


		import com.palantir.atlasdb.qos.ratelimit.RateLimitExceededException;

		public class RateLimitExceededExceptionMapperTest {
