[QoS] Feature/qos meters #2640

hsaraogi · 2017-11-06T18:06:59Z

Goals (and why): Meters for recording client activity, specifically the number of reads and writes, and the bytes read and written.

Implementation Description (bullets): meters for counts, histograms for byte counts.

TODO:

We may also want cellsRead and cellsWritten, per @nziebart's comments on the internal planning doc.
Tests for ThriftObjectSizeUtils.

Concerns (what feedback would you like?): These metrics may be too big (to send over the network); how do we calculate their effect on rate-limiting? Should we have a computation on the client side that aggregates the effects of the metrics and sends scores to the server periodically?

Where should we start reviewing?: QosMetrics.java

Priority (whenever / two weeks / yesterday): soon

schlosna · 2017-11-06T18:15:38Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/QosMetrics.java

+
+        //TODO: The histogram should have a sliding window reservoir.
+        bytesRead = metricsManager.registerHistogram(QosMetrics.class, "bytesRead");
+        bytesWritten = metricsManager.registerHistogram(QosMetrics.class, "bytesWritten");


If we go with a sliding time window reservoir, we definitely need to consider how much memory it will need and how to bound it.

What will the sliding window reservoir give us? Does that affect the cumulative counts or just things like m15?

I want to keep the metrics info for roughly the last 30s around. The exponentially decaying reservoir's getSnapshot() gives us quantiles, which are not easy to reason about.

I was thinking we could rate limit based on simple metrics like total bytes read/written in the last 30s. These are difficult to extract from an ExponentiallyDecayingReservoir, while a sliding window reservoir's getSnapshot() gives us this information for free.
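A total over a time window doesn't need quantiles at all. A minimal stdlib sketch, assuming a hypothetical `SlidingWindowSum` class with an injectable nanosecond clock (none of this is from the PR): each record is kept until it ages out, so memory grows with the number of events inside the window, which is the same memory-overhead question raised above for time-window reservoirs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Hypothetical sketch: total of the values recorded during the last windowNanos. */
final class SlidingWindowSum {
    private static final class Entry {
        final long timestampNanos;
        final long value;

        Entry(long timestampNanos, long value) {
            this.timestampNanos = timestampNanos;
            this.value = value;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long windowNanos;
    private final LongSupplier clockNanos; // injectable so tests can control time
    private long runningSum = 0;

    SlidingWindowSum(long windowNanos, LongSupplier clockNanos) {
        this.windowNanos = windowNanos;
        this.clockNanos = clockNanos;
    }

    synchronized void record(long value) {
        long now = clockNanos.getAsLong();
        evictExpired(now);
        entries.addLast(new Entry(now, value));
        runningSum += value;
    }

    synchronized long sum() {
        evictExpired(clockNanos.getAsLong());
        return runningSum;
    }

    // Drops entries that have fallen out of the window; memory is O(events in window).
    private void evictExpired(long now) {
        while (!entries.isEmpty() && now - entries.peekFirst().timestampNanos >= windowNanos) {
            runningSum -= entries.removeFirst().value;
        }
    }
}
```

In production the clock would be `System::nanoTime` and the window 30 seconds' worth of nanoseconds.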

Do we even need the Snapshot? We definitely want the sum of X in a window, but it's not clear to me that we want the quantiles, which are the only reason we'd need to keep the Reservoir around.

The question we're interested in here is: what is the current rate at which I'm reading / writing data? This can probably be answered by just having a cumulative count (Counter probably works, though Meter gives some extra info like mean rate and m5 rate for free).
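A hedged illustration of that point: given only a cumulative count, the recent rate is the change in the count divided by the time elapsed between two samples. `CumulativeRate` is a hypothetical class built on the JDK's `LongAdder`; Dropwizard's `Meter` keeps a cumulative count (`getCount()`) and additionally exposes a mean rate and exponentially weighted 1/5/15-minute rates.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

/** Hypothetical sketch: derives a recent rate from a cumulative counter. */
final class CumulativeRate {
    private final LongAdder count = new LongAdder();
    private final LongSupplier clockNanos;
    private long lastCount;
    private long lastSampleNanos;

    CumulativeRate(LongSupplier clockNanos) {
        this.clockNanos = clockNanos;
        this.lastSampleNanos = clockNanos.getAsLong();
    }

    /** Hot path: only bumps the cumulative count. */
    void mark(long n) {
        count.add(n);
    }

    long getCount() {
        return count.sum();
    }

    /** Events per second since the previous call to this method. */
    synchronized double ratePerSecondSinceLastSample() {
        long now = clockNanos.getAsLong();
        long current = count.sum();
        long elapsedNanos = now - lastSampleNanos;
        double rate = elapsedNanos <= 0 ? 0.0 : (current - lastCount) * 1_000_000_000.0 / elapsedNanos;
        lastCount = current;
        lastSampleNanos = now;
        return rate;
    }
}
```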

It would also be nice to have a histogram showing the average amount of data per query, but I don't think we need that for QoS right now.

Switched to meters here.

gsheasby · 2017-11-06T18:40:32Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+
+    private int getApproximateReadByteCount(Map<ByteBuffer, List<ColumnOrSuperColumn>> result) {
+        return result.entrySet().stream()
+                .mapToInt((entry) -> entry.getKey().array().length + entry.getValue().stream()


this is hard to reason about. Perhaps:
.mapToInt(entry -> getRowKeySize(entry) + getValueSize(entry))

Also noting that we briefly discussed verbally whether the computational overhead of this is low enough. We think so.
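One way to apply that suggestion, sketched with hypothetical helper names and the column type left generic (the size of a single column is passed in as a function). It also measures the row key with `ByteBuffer.remaining()`: `array().length` reports the length of the whole backing array, which overcounts for wrapped or sliced buffers, and it throws for direct or read-only buffers.

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical sketch of the getRowKeySize/getValueSize split. */
final class ReadSizeEstimator {
    static <C> long getApproximateReadByteCount(
            Map<ByteBuffer, List<C>> result, ToLongFunction<C> columnSize) {
        long total = 0; // long rather than int to avoid overflow on large reads
        for (Map.Entry<ByteBuffer, List<C>> entry : result.entrySet()) {
            total += getRowKeySize(entry.getKey()) + getValueSize(entry.getValue(), columnSize);
        }
        return total;
    }

    // Counts only the readable bytes between position and limit.
    static long getRowKeySize(ByteBuffer rowKey) {
        return rowKey.remaining();
    }

    static <C> long getValueSize(List<C> columns, ToLongFunction<C> columnSize) {
        long total = 0;
        for (C column : columns) {
            total += columnSize.applyAsLong(column);
        }
        return total;
    }
}
```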

gsheasby · 2017-11-06T18:42:38Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+        // SELECT %s, %s FROM %s WHERE key=%s;
+
+        // Or for sweep?
+        // Should we consider all of these reads when recording metrics?


I think we should examine the CqlResult object returned. We can probably use the size of the rows within this object.

gsheasby · 2017-11-06T18:43:29Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/QosMetrics.java

+import com.codahale.metrics.Meter;
+import com.palantir.atlasdb.util.MetricsManager;
+
+public class QosMetrics {


gsheasby · 2017-11-06T18:43:59Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/QosMetrics.java

+    private final Meter readRequestCount;
+    private final Meter writeRequestCount;
+    private final Histogram bytesRead;
+    private final Histogram bytesWritten;


We may also want cellsRead and cellsWritten, per @nziebart's comments on the internal planning doc.

nziebart · 2017-11-06T22:12:41Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+    private int getApproximateReadByteCount(Map<ByteBuffer, List<ColumnOrSuperColumn>> result) {
+        return result.entrySet().stream()
+                .mapToInt((entry) -> entry.getKey().array().length + entry.getValue().stream()
+                        .mapToInt(columnOrSuperColumn -> SerializationUtils.serialize(columnOrSuperColumn).length)


I see @gsheasby commented that this should be cheap, but I think we can get the raw data pretty easily here (Column#name and Column#value are just ByteBuffers)

A columnOrSuperColumn has four fields:

public Column column; // optional
public SuperColumn super_column; // optional
public CounterColumn counter_column; // optional
public CounterSuperColumn counter_super_column; // optional

Column has:

public ByteBuffer name; // required
public ByteBuffer value; // optional
public long timestamp; // optional
public int ttl; // optional

SuperColumn has:

public ByteBuffer name; // required
public List<Column> columns; // required

CounterColumn has:

public ByteBuffer name; // required
public long value; // required

CounterSuperColumn has:

public ByteBuffer name; // required
public List<CounterColumn> columns; // required

Extracting the ByteBuffers to calculate the size will be lengthy (not difficult). Happy to do that if it's better in terms of perf.

Also, note that serialization here does not give us the exact size in bytes, only an estimate; but I think the relative sizes matter rather than the absolute ones, so this might be fine.
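For comparison, a sketch of the ByteBuffer-walking approach. The classes below are hypothetical stand-ins that mirror only the Thrift fields listed in this thread (the real generated classes in `org.apache.cassandra.thrift` have more members), and counting `timestamp`/`ttl` as fixed-width fields even when unset is an assumption, so the result is still an estimate, just without a serialization pass.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical stand-ins mirroring only the Thrift fields listed in this thread.
final class Column { ByteBuffer name; ByteBuffer value; long timestamp; int ttl; }
final class SuperColumn { ByteBuffer name; List<Column> columns; }
final class CounterColumn { ByteBuffer name; long value; }
final class CounterSuperColumn { ByteBuffer name; List<CounterColumn> columns; }
final class ColumnOrSuperColumn {
    Column column;                           // optional
    SuperColumn super_column;                // optional
    CounterColumn counter_column;            // optional
    CounterSuperColumn counter_super_column; // optional
}

final class ThriftSizeSketch {
    // Optional ByteBuffer fields may be null; remaining() counts only the readable bytes.
    static long sizeOf(ByteBuffer buffer) {
        return buffer == null ? 0 : buffer.remaining();
    }

    // Assumption: timestamp and ttl are counted as fixed-width fields even when unset.
    static long sizeOf(Column column) {
        if (column == null) {
            return 0;
        }
        return sizeOf(column.name) + sizeOf(column.value) + Long.BYTES + Integer.BYTES;
    }

    static long sizeOf(CounterColumn column) {
        return column == null ? 0 : sizeOf(column.name) + Long.BYTES;
    }

    static long sizeOf(SuperColumn superColumn) {
        if (superColumn == null) {
            return 0;
        }
        long size = sizeOf(superColumn.name);
        if (superColumn.columns != null) {
            for (Column column : superColumn.columns) {
                size += sizeOf(column);
            }
        }
        return size;
    }

    static long sizeOf(CounterSuperColumn superColumn) {
        if (superColumn == null) {
            return 0;
        }
        long size = sizeOf(superColumn.name);
        if (superColumn.columns != null) {
            for (CounterColumn column : superColumn.columns) {
                size += sizeOf(column);
            }
        }
        return size;
    }

    // Normally only one of the four optional fields is set; summing handles any combination.
    static long sizeOf(ColumnOrSuperColumn cosc) {
        if (cosc == null) {
            return 0;
        }
        return sizeOf(cosc.column) + sizeOf(cosc.super_column)
                + sizeOf(cosc.counter_column) + sizeOf(cosc.counter_super_column);
    }
}
```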

nziebart · 2017-11-06T22:13:44Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

@@ -75,22 +93,58 @@ public void batch_mutate(Map<ByteBuffer, Map<String, List<Mutation>>> mutation_m
            throws InvalidRequestException, UnavailableException, TimedOutException, TException {
        checkLimitAndCall(() -> {
            super.batch_mutate(mutation_map, consistency_level);
+
+            qosMetrics.updateWriteCount();
+            qosMetrics.updateBytesWritten(getApproximateWriteByteCount(mutation_map));


we could combine these two steps in a QosMetrics#recordBytesWritten method (since they always need to happen together)
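A minimal sketch of that combined method, using JDK `LongAdder`s as stand-ins for the registered Meters (the class name `QosMetricsSketch` is hypothetical).

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stdlib stand-in for QosMetrics; the real class registers Meters instead. */
final class QosMetricsSketch {
    private final LongAdder writeRequestCount = new LongAdder();
    private final LongAdder bytesWritten = new LongAdder();

    /** Records one write request and its size together, so callers can't update only one. */
    void recordBytesWritten(long numBytesWritten) {
        writeRequestCount.increment();
        bytesWritten.add(numBytesWritten);
    }

    long getWriteRequestCount() {
        return writeRequestCount.sum();
    }

    long getBytesWritten() {
        return bytesWritten.sum();
    }
}
```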

nziebart · 2017-11-06T22:14:46Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+                    int approximateBytesInStrings = singleMap.keySet().stream().mapToInt(String::length).sum();
+                    int approximateBytesInMutations = singleMap.values().stream()
+                            .mapToInt(listOfMutations -> listOfMutations.stream()
+                                    .mapToInt(mutation -> SerializationUtils.serialize(mutation).length)


again would prefer not serializing if we can avoid it

A mutation again has:

public ColumnOrSuperColumn column_or_supercolumn; // optional
public Deletion deletion; // optional

We can reuse the ColumnOrSuperColumn calculation here; happy to do that if it's better.
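A sketch of the write-side calculation using plain loops. `SimpleColumn`, `Deletion` and `Mutation` are hypothetical, simplified stand-ins: the real `Mutation` wraps a `ColumnOrSuperColumn`, and the real `Deletion` also has an optional `SlicePredicate`, both omitted here. One deliberate difference from the snippet above: table names are measured as UTF-8 bytes, since `String::length` counts UTF-16 chars rather than bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the Thrift Mutation structure.
final class SimpleColumn { ByteBuffer name; ByteBuffer value; }
final class Deletion { long timestamp; ByteBuffer super_column; }
final class Mutation { SimpleColumn column; Deletion deletion; }

final class WriteSizeSketch {
    static long getApproximateWriteByteCount(Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap) {
        long total = 0;
        // Plain loops rather than streams, since batch_mutate is a hot path.
        for (Map.Entry<ByteBuffer, Map<String, List<Mutation>>> row : mutationMap.entrySet()) {
            total += sizeOf(row.getKey());
            for (Map.Entry<String, List<Mutation>> table : row.getValue().entrySet()) {
                // Encoded length in bytes; String.length() would count UTF-16 chars.
                total += table.getKey().getBytes(StandardCharsets.UTF_8).length;
                for (Mutation mutation : table.getValue()) {
                    total += sizeOf(mutation);
                }
            }
        }
        return total;
    }

    static long sizeOf(Mutation mutation) {
        long size = 0;
        if (mutation.column != null) {
            size += sizeOf(mutation.column.name) + sizeOf(mutation.column.value);
        }
        if (mutation.deletion != null) {
            // Assumption: the deletion timestamp is counted as 8 bytes even when unset.
            size += Long.BYTES + sizeOf(mutation.deletion.super_column);
        }
        return size;
    }

    private static long sizeOf(ByteBuffer buffer) {
        return buffer == null ? 0 : buffer.remaining();
    }
}
```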

codecov-io · 2017-11-07T11:31:37Z

Codecov Report

Merging #2640 into feature/qos-service-api will increase coverage by 0.04%.
The diff coverage is 73.75%.

@@                      Coverage Diff                      @@
##             feature/qos-service-api    #2640      +/-   ##
=============================================================
+ Coverage                      60.38%   60.43%   +0.04%     
  Complexity                      4715     4715              
=============================================================
  Files                            875      877       +2     
  Lines                          40132    40268     +136     
  Branches                        4026     4027       +1     
=============================================================
+ Hits                           24233    24335     +102     
- Misses                         14409    14439      +30     
- Partials                        1490     1494       +4

Impacted Files	Coverage Δ	Complexity Δ
...antir/atlasdb/qos/ratelimit/SmoothRateLimiter.java	`94.28% <ø> (ø)`	`0 <0> (ø)`	⬇️
...om/palantir/atlasdb/qos/ratelimit/RateLimiter.java	`58.92% <ø> (ø)`	`0 <0> (ø)`	⬇️
...palantir/atlasdb/qos/ratelimit/QosRateLimiter.java	`85.71% <ø> (ø)`	`0 <0> (ø)`	⬇️
...alantir/atlasdb/keyvalue/cassandra/QosMetrics.java	`100% <100%> (ø)`	`0 <0> (?)`
...java/com/palantir/atlasdb/util/MetricsManager.java	`100% <100%> (ø)`	`14 <1> (+1)`	⬆️
...asdb/keyvalue/cassandra/ThriftObjectSizeUtils.java	`68.53% <68.53%> (ø)`	`0 <0> (?)`
...ir/atlasdb/keyvalue/cassandra/CassandraClient.java	`80% <75%> (-12.86%)`	`0 <0> (ø)`
.../atlasdb/transaction/impl/RecomputingSupplier.java	`66.66% <0%> (-19.05%)`	`5% <0%> (-1%)`
...a/com/palantir/common/base/BatchingVisitables.java	`75.12% <0%> (-1.04%)`	`18% <0%> (ø)`
... and 5 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 22b1e7a...21125a6.

gsheasby · 2017-11-07T11:40:39Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

-        return checkLimitAndCall(() -> super.execute_cql3_query(query, compression, consistency));
+        CqlResult cqlResult = checkLimitAndCall(() -> super.execute_cql3_query(query, compression, consistency));
+        try {
+            recordBytesRead(SerializationUtils.serialize(cqlResult).length);


feels like we want to do something smarter here than serializing the whole thing...
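One possibly smarter option, sketched with hypothetical stand-in classes: sum the row keys and the column names/values in the result's rows, and ignore the result type, row count and schema metadata entirely.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical stand-ins for the Thrift CqlResult/CqlRow/Column fields used here; the real
// CqlResult also carries a result type, a row count and schema metadata, ignored below.
final class CqlColumn { ByteBuffer name; ByteBuffer value; }
final class CqlRow { ByteBuffer key; List<CqlColumn> columns; }
final class CqlResult { List<CqlRow> rows; }

final class CqlResultSizeSketch {
    /** Approximate bytes read: row keys plus column names and values, without serializing. */
    static long approximateBytesRead(CqlResult result) {
        if (result == null || result.rows == null) {
            return 0; // e.g. a result that carries no rows
        }
        long total = 0;
        for (CqlRow row : result.rows) {
            total += remaining(row.key);
            if (row.columns != null) {
                for (CqlColumn column : row.columns) {
                    total += remaining(column.name) + remaining(column.value);
                }
            }
        }
        return total;
    }

    private static long remaining(ByteBuffer buffer) {
        return buffer == null ? 0 : buffer.remaining();
    }
}
```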

hsaraogi · 2017-11-07T10:56:40Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+    private int getApproximateReadByteCount(Map<ByteBuffer, List<ColumnOrSuperColumn>> result) {
+        return result.entrySet().stream()
+                .mapToInt((entry) -> entry.getKey().array().length + entry.getValue().stream()
+                        .mapToInt(columnOrSuperColumn -> SerializationUtils.serialize(columnOrSuperColumn).length)


hsaraogi · 2017-11-07T11:04:36Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+                    int approximateBytesInStrings = singleMap.keySet().stream().mapToInt(String::length).sum();
+                    int approximateBytesInMutations = singleMap.values().stream()
+                            .mapToInt(listOfMutations -> listOfMutations.stream()
+                                    .mapToInt(mutation -> SerializationUtils.serialize(mutation).length)


hsaraogi · 2017-11-07T11:19:22Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+            qosMetrics.updateReadCount();
+            List<KeySlice> range_slices = super.get_range_slices(column_parent, predicate, range, consistency_level);
+            int approximateBytesRead = range_slices.stream()
+                    .mapToInt(keySlice -> SerializationUtils.serialize(keySlice).length)


KeySlice is also just:

public ByteBuffer key; // required
public List<ColumnOrSuperColumn> columns; // required

hsaraogi · 2017-11-07T11:22:03Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+        // SELECT %s, %s FROM %s WHERE key=%s;
+
+        // Or for sweep?
+        // Should we consider all of these reads when recording metrics?


hsaraogi · 2017-11-07T11:43:08Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/QosMetrics.java

+
+        //TODO: The histogram should have a sliding window reservoir.
+        bytesRead = metricsManager.registerHistogram(QosMetrics.class, "bytesRead");
+        bytesWritten = metricsManager.registerHistogram(QosMetrics.class, "bytesWritten");


tboam

I'm wondering if CassandraClient is doing too many things here and we should have 2 delegates - one for checking the client limits, and one recording data read/written.

You could also convince me that we should be recording reads/writes requested vs reads/writes executed and this is better done in a single class.
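A hedged sketch of the two-delegate split, using a hypothetical one-method `KvClient` interface in place of the real client API and a `BooleanSupplier` in place of the rate limiter. The wrapping order decides which question the metrics answer: with the limiter on the outside, rejected calls never reach the recorder, so it counts reads executed; putting the recorder outside the limiter would count reads requested instead.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.BooleanSupplier;

/** Hypothetical one-method stand-in for the client API. */
interface KvClient {
    byte[] get(String key);
}

/** Delegate 1: only decides whether a call may proceed. */
final class RateLimitingClient implements KvClient {
    private final KvClient delegate;
    private final BooleanSupplier tryAcquirePermit; // hypothetical hook for the rate limiter

    RateLimitingClient(KvClient delegate, BooleanSupplier tryAcquirePermit) {
        this.delegate = delegate;
        this.tryAcquirePermit = tryAcquirePermit;
    }

    @Override
    public byte[] get(String key) {
        if (!tryAcquirePermit.getAsBoolean()) {
            throw new IllegalStateException("Rate limited");
        }
        return delegate.get(key);
    }
}

/** Delegate 2: only records what the wrapped client actually returned. */
final class MetricsRecordingClient implements KvClient {
    private final KvClient delegate;
    private final LongAdder readCount = new LongAdder();
    private final LongAdder bytesRead = new LongAdder();

    MetricsRecordingClient(KvClient delegate) {
        this.delegate = delegate;
    }

    @Override
    public byte[] get(String key) {
        byte[] result = delegate.get(key);
        readCount.increment();
        bytesRead.add(result == null ? 0 : result.length);
        return result;
    }

    long getReadCount() {
        return readCount.sum();
    }

    long getBytesRead() {
        return bytesRead.sum();
    }
}
```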

tboam · 2017-11-08T11:21:06Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+        }
+    }
+
+    private int getApproximateWriteByteCount(Map<ByteBuffer, Map<String, List<Mutation>>> mutation_map) {


Might be good to get the Cassandra benchmarks on here as soon as possible; I'm interested to see whether streaming through all these objects adds any overhead.

Yeah it may be better to just use regular for loops here, since this is going to be a hot codepath

Also, might be nice to refactor out the size calculation logic into a static helper class

Refactored to a helper class ThriftObjectSizeUtils. We are no longer serializing objects to calculate the size in bytes.

tboam · 2017-11-08T11:23:08Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+                .sum();
+        try {
+            recordBytesRead(approximateBytesRead);
+        } catch (Exception e) {


I would really hope that recording metrics can't throw an Exception for any reason. Why do we need this?

tboam · 2017-11-08T11:29:33Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/QosMetrics.java

+
+        //TODO: The histogram should have a sliding window reservoir.
+        bytesRead = metricsManager.registerHistogram(QosMetrics.class, "bytesRead");
+        bytesWritten = metricsManager.registerHistogram(QosMetrics.class, "bytesWritten");


tboam

Nearly there I think. I agree with Nathan that some of the methods would be clearer without using lambdas

tboam · 2017-11-09T10:05:40Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+                predicate, consistency_level);
+        try {
+            recordBytesRead(getApproximateReadByteCount(result));
+        } catch (Exception e) {


As before I don't think there should be any need to catch exceptions from the metrics code so we can remove all of these.

What if there is a NullPointerException or some other bug in our metrics calculation? We would end up failing Cassandra operations if we don't catch these. Ideally, a failure while recording metrics shouldn't make the operation itself throw, right?

Let's move the try/catch into the recordBytesRead method
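A sketch of moving the guard into the recording method. Passing a `LongSupplier` is an assumption here (the later commit log in this thread does mention "use supplier for object size"); it means the size calculation itself runs inside the `try`, so a bug there is logged and swallowed instead of failing the Cassandra call. `SafeQosRecorder` is hypothetical and logs via `java.util.logging` only to stay dependency-free.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: metric recording that can never fail the call it wraps. */
final class SafeQosRecorder {
    private static final Logger log = Logger.getLogger(SafeQosRecorder.class.getName());
    private final LongAdder bytesRead = new LongAdder();

    /**
     * The supplier is evaluated inside the try block, so an exception thrown while
     * computing the size (e.g. a NullPointerException) is logged, not propagated.
     */
    void recordBytesRead(LongSupplier sizeSupplier) {
        try {
            bytesRead.add(sizeSupplier.getAsLong());
        } catch (RuntimeException e) {
            log.log(Level.WARNING, "Failed to record bytes read; ignoring", e);
        }
    }

    long getBytesRead() {
        return bytesRead.sum();
    }
}
```

A call site would then read `recordBytesRead(() -> getApproximateReadByteCount(result))` with no try/catch of its own.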

tboam · 2017-11-09T10:39:16Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+        qosMetrics.updateBytesRead(numBytesRead);
+    }
+
+    private void recordBytesWritten(long numBytesWitten) {


nit: numBytesWritten

tboam

We have discussed being defensive and catching all Exceptions for now; we will add tests and remove this later (once we need the number of X for rate limiting).

nziebart

Good to merge, couple comments that could be addressed now or in a later PR

nziebart · 2017-11-09T12:18:58Z

atlasdb-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/CassandraClient.java

+                predicate, consistency_level);
+        try {
+            recordBytesRead(getApproximateReadByteCount(result));
+        } catch (Exception e) {


nziebart · 2017-11-09T12:19:24Z

...b-cassandra/src/main/java/com/palantir/atlasdb/keyvalue/cassandra/ThriftObjectSizeUtils.java

+                + getSliceRangeCountSize();
+    }
+
+    private static long getCqlMetadataSize(CqlMetadata schema) {


Good to see we're being thorough here. However some of these sizes could probably be removed for simplicity. I think we are mainly interested in the sizes of the column names and values we read and write, and not so much the extra metadata included in the request/response objects. (Fine to merge as is for now though)

Merging as is for now, with most code paths tested.

* Metrics for bytes and counts in each read/write
* Refactors, dont throw if recordMetrics throws
* Use meters instead of histograms
* Multiget bytes
* Batch mutate exact size
* Cqlresult size
* Calculate exact byte sizes for all thrift objects
* tests and bugfixes - partial
* More tests and bugs fixed
* More tests and cr comments
* byte buffer size
* Remove register histogram
* checkstyle
* checkstyle
* locks and license

Extract ProfilingCassandraClient Move todos and some cleanup Cherry-pick QoS metrics to develop (#2679) * [QoS] Feature/qos meters (#2640) * Metrics for bytes and counts in each read/write * Refactors, dont throw if recordMetrics throws * Use meters instead of histograms * Multiget bytes * Batch mutate exact size * Cqlresult size * Calculate exact byte sizes for all thrift objects * tests and bugfixes - partial * More tests and bugs fixed * More tests and cr comments * byte buffer size * Remove register histogram * checkstyle * checkstyle * locks and license * Qos metrics CassandraClient * Exclude unused classes

* Extract TracingCassandraClient Extract ProfilingCassandraClient Move todos and some cleanup Cherry-pick QoS metrics to develop (#2679) * [QoS] Feature/qos meters (#2640) * Metrics for bytes and counts in each read/write * Refactors, dont throw if recordMetrics throws * Use meters instead of histograms * Multiget bytes * Batch mutate exact size * Cqlresult size * Calculate exact byte sizes for all thrift objects * tests and bugfixes - partial * More tests and bugs fixed * More tests and cr comments * byte buffer size * Remove register histogram * checkstyle * checkstyle * locks and license * Qos metrics CassandraClient * Exclude unused classes * fix cherry pick

* Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success (#2630) * Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success * add logging when we stop reducing the batch size multiplier * further improve the tests * Allow sweep to recover faster after backing off. Before we would increase by 1% for each successive success, if we had reduced a value to 1 it would be 70 iterations before we got 2 and 700 iterations before we got back to 1000. Now we always 25 iterations with the lower batch size and then try increasing the rate by doubling each time. This means that when sweep has to back off it should speed up again quickly. * Use an AtomicInteger to handle concurrent updates * SweeperService logging improvements (#2618) * SweeperServiceImpl now logs when it starts sweeping make it clear if it is running full sweep or not * Added sweep parameters to the log lines * no longer default the service parameter in the interface, this way we can see when the parameter isn't provided and we are defaulting to true. Behaviour is unchanged but we can log a message when defaulting. 
* Refactor TracingKVS (#2643) * Wrap next() and hasNext() in traces * Use span names as safe * Remove iterator wrappings * checkstyle * refactor methods and remove misleading traces * Fix unit tests * release notes * Final nits * fix java arrays usage * Delete docs (#2657) * [20 minute tasks] Add test for when a batch is full (#2655) * [no release notes] Drive-by add test for when a batch is full * MetricRegistry log level downgrade + multiple timestamp tracker tests (#2636) * change metrics manager to warn plus log the metric name * more timestamp tracker tests * release notes * Extract interface for Cassandra client (#2660) * Create a CassandraClient * Propagate CassandraClient to all classes but CKVS * Use CassandraClient on CKVS * Propagate CassandraClient to remaining Impl classes * Use CassandraClient in tests * [no release notes] * client -> namespace [no release notes] (#2654) * 0.65.2 and 0.66.0 release notes (#2663) * Release notes banners * fix pr numbers * [QoS] Add getNamespace to AtlasDBConfig (#2661) * Add getNamespace [no release notes] * Timelock client config cannot be empty * Make it explicit that unspecified namespace is only possible for InMemoryKVS * CR comments * Live Reloading the TimeLock Block, Part 1: Pull to Push (#2621) * thoughts * More tests for RIH * Paranoid logging * statics * javadoc part 1 * polling refreshable * Unit tests * Remove the old RIH * lock lock * Tests that test how we deal with exceptions * logging * [no release notes] * CR comments part 1 * Make interval configurable * Standard nasty time edge cases * lastSeenValue does not need to be volatile * Live Reloading the TimeLock Block, Part 2: TransactionManagers Plumbing (#2622) * ServiceCreator.applyDynamic() * Propagate config through TMs * Json Serialization fixes * Some refactoring * lock/lock * Fixed checkstyle * CR comments part 1 * Switch to RPIH * add test * [no release notes] forthcoming in part 4 * checkstyle * [TTT] [no release notes] Document behaviour 
regarding index rows (#2658) * [no release notes] Document behaviour regarding index rows * fix compile bug * ``List`` * Refactor and Instrument CassandraClient api (#2665) * Sanitize Client API * Instrument CassandraClient * checkstyle * Address comment * [no release notes] * checkstyle * Fix cas * Live Reloading the TimeLock Block, Part 3: Working with 0 Nodes (#2647) * 0 nodes part 1 * add support for 0 servers in a ServerListConfig * extend deserialization tests * More tests * code defensively * [no release notes] defer to 2648 * Fixed CR nits * singleton server list * check immutable ts (#2406) * check immutable ts * checkstyle * release notes * Fix TM creation * checkstyle * Propagate top-level KVS method names to CassandraClient (#2669) * Propagate method names down to multiget_slice * Add the corresponding KVS method to remaining methods * Add TODO * [no release notes] * nit * Extract cql executor interface (#2670) * Instrument CqlExecutor * [no release notes] * bump awaitility (#2668) * Upgrade to newer Awaitility. 
* locks [no release notes] * unused import * Bump Atlas on Tritium 0.8.4 to fix dependency conflicts (#2662) * Bump Atlas on Tritium 0.8.4 to fix dependency conflicts * Add changes into missing file * Doc changes * Exclude Tracing and HdrHistogram from Tritium dependencies * update locks * Add excluded dependencies explicitly * Fix merge conflict in relase notes * Uncomment dependencies * Regenerate locks * Correctly log Paxos events (#2674) * Log out Paxos values when recording Paxos events * Updated release notes * Checkstyle * Pull request number * Address comments * fix docs * Slow log and tracing (#2673) * Trace and instrument the thrift client * Instrument CqlExecutor * Fix metric names of IntrumentedCassandraClient * Fix nit * Also log internal table references * Checkstyle * simplify metric names * Address comments * add slow logging to the cassandra thrift client * add slow logging to cqlExecutor * fix typos * Add tracing to the CassandraClient * trace cqlExecutor queries * Add slow-logging in the CassandraClient * Delete InstrumentedCC and InstrumentedCqlExec * Fix small nits * Checkstyle * Add kvs method names to slow logs * Fix wrapping of exception * Extract CqlQuery * Move kvs-slow-log and tracing of CqlExecutor to CCI * Propagate execute_cql3_query api breaks * checkstyle * delete unused string * checkstyle * fix number of mutations on batch_mutate * some refactors * fix compile * Refactor cassandra client (#2676) * Extract TracingCassandraClient Extract ProfilingCassandraClient Move todos and some cleanup Cherry-pick QoS metrics to develop (#2679) * [QoS] Feature/qos meters (#2640) * Metrics for bytes and counts in each read/write * Refactors, dont throw if recordMetrics throws * Use meters instead of histograms * Multiget bytes * Batch mutate exact size * Cqlresult size * Calculate exact byte sizes for all thrift objects * tests and bugfixes - partial * More tests and bugs fixed * More tests and cr comments * byte buffer size * Remove register 
histogram * checkstyle * checkstyle * locks and license * Qos metrics CassandraClient * Exclude unused classes * fix cherry pick * use supplier for object size [no release notes] * fix merge in AtlasDbConfig

* Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success (#2630) * Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success * add logging when we stop reducing the batch size multiplier * further improve the tests * Allow sweep to recover faster after backing off. Before we would increase by 1% for each successive success, if we had reduced a value to 1 it would be 70 iterations before we got 2 and 700 iterations before we got back to 1000. Now we always 25 iterations with the lower batch size and then try increasing the rate by doubling each time. This means that when sweep has to back off it should speed up again quickly. * Use an AtomicInteger to handle concurrent updates * SweeperService logging improvements (#2618) * SweeperServiceImpl now logs when it starts sweeping make it clear if it is running full sweep or not * Added sweep parameters to the log lines * no longer default the service parameter in the interface, this way we can see when the parameter isn't provided and we are defaulting to true. Behaviour is unchanged but we can log a message when defaulting. 
* Refactor TracingKVS (#2643) * Wrap next() and hasNext() in traces * Use span names as safe * Remove iterator wrappings * checkstyle * refactor methods and remove misleading traces * Fix unit tests * release notes * Final nits * fix java arrays usage * Delete docs (#2657) * [20 minute tasks] Add test for when a batch is full (#2655) * [no release notes] Drive-by add test for when a batch is full * MetricRegistry log level downgrade + multiple timestamp tracker tests (#2636) * change metrics manager to warn plus log the metric name * more timestamp tracker tests * release notes * Extract interface for Cassandra client (#2660) * Create a CassandraClient * Propagate CassandraClient to all classes but CKVS * Use CassandraClient on CKVS * Propagate CassandraClient to remaining Impl classes * Use CassandraClient in tests * [no release notes] * client -> namespace [no release notes] (#2654) * 0.65.2 and 0.66.0 release notes (#2663) * Release notes banners * fix pr numbers * [QoS] Add getNamespace to AtlasDBConfig (#2661) * Add getNamespace [no release notes] * Timelock client config cannot be empty * Make it explicit that unspecified namespace is only possible for InMemoryKVS * CR comments * Live Reloading the TimeLock Block, Part 1: Pull to Push (#2621) * thoughts * More tests for RIH * Paranoid logging * statics * javadoc part 1 * polling refreshable * Unit tests * Remove the old RIH * lock lock * Tests that test how we deal with exceptions * logging * [no release notes] * CR comments part 1 * Make interval configurable * Standard nasty time edge cases * lastSeenValue does not need to be volatile * Live Reloading the TimeLock Block, Part 2: TransactionManagers Plumbing (#2622) * ServiceCreator.applyDynamic() * Propagate config through TMs * Json Serialization fixes * Some refactoring * lock/lock * Fixed checkstyle * CR comments part 1 * Switch to RPIH * add test * [no release notes] forthcoming in part 4 * checkstyle * [TTT] [no release notes] Document behaviour 
regarding index rows (#2658) * [no release notes] Document behaviour regarding index rows * fix compile bug * ``List`` * Refactor and Instrument CassandraClient api (#2665) * Sanitize Client API * Instrument CassandraClient * checkstyle * Address comment * [no release notes] * checkstyle * Fix cas * Live Reloading the TimeLock Block, Part 3: Working with 0 Nodes (#2647) * 0 nodes part 1 * add support for 0 servers in a ServerListConfig * extend deserialization tests * More tests * code defensively * [no release notes] defer to 2648 * Fixed CR nits * singleton server list * check immutable ts (#2406) * check immutable ts * checkstyle * release notes * Fix TM creation * checkstyle * Propagate top-level KVS method names to CassandraClient (#2669) * Propagate method names down to multiget_slice * Add the corresponding KVS method to remaining methods * Add TODO * [no release notes] * nit * Extract cql executor interface (#2670) * Instrument CqlExecutor * [no release notes] * bump awaitility (#2668) * Upgrade to newer Awaitility. 
* locks [no release notes]
* unused import
* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts (#2662)
* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts
* Add changes into missing file
* Doc changes
* Exclude Tracing and HdrHistogram from Tritium dependencies
* update locks
* Add excluded dependencies explicitly
* Fix merge conflict in release notes
* Uncomment dependencies
* Regenerate locks
* Correctly log Paxos events (#2674)
* Log out Paxos values when recording Paxos events
* Updated release notes
* Checkstyle
* Pull request number
* Address comments
* fix docs
* Slow log and tracing (#2673)
* Trace and instrument the thrift client
* Instrument CqlExecutor
* Fix metric names of InstrumentedCassandraClient
* Fix nit
* Also log internal table references
* Checkstyle
* simplify metric names
* Address comments
* add slow logging to the cassandra thrift client
* add slow logging to cqlExecutor
* fix typos
* Add tracing to the CassandraClient
* trace cqlExecutor queries
* Add slow-logging in the CassandraClient
* Delete InstrumentedCC and InstrumentedCqlExec
* Fix small nits
* Checkstyle
* Add kvs method names to slow logs
* Fix wrapping of exception
* Extract CqlQuery
* Move kvs-slow-log and tracing of CqlExecutor to CCI
* Propagate execute_cql3_query api breaks
* checkstyle
* delete unused string
* checkstyle
* fix number of mutations on batch_mutate
* some refactors
* fix compile
* Refactor cassandra client (#2676)
* Extract TracingCassandraClient
  Extract ProfilingCassandraClient
  Move todos and some cleanup
  Cherry-pick QoS metrics to develop (#2679)
* [QoS] Feature/qos meters (#2640)
* Metrics for bytes and counts in each read/write
* Refactors, don't throw if recordMetrics throws
* Use meters instead of histograms
* Multiget bytes
* Batch mutate exact size
* Cqlresult size
* Calculate exact byte sizes for all thrift objects
* tests and bugfixes - partial
* More tests and bugs fixed
* More tests and cr comments
* byte buffer size
* Remove register histogram
* checkstyle
* checkstyle
* locks and license
* Qos metrics CassandraClient
* Exclude unused classes
* fix cherry pick
* use supplier for object size [no release notes]
* fix merge in AtlasDbConfig
* rate limiting
* total-time
* qos config
* respect max backoff time
* query weights
* extra tests
* num rows
* checkstyle
* fix tests
* no int casting
* Qos ete tests
* shouldFailIfWritingTooManyBytes
* fix test
* rm file
* Remove metrics
* Test shouldFailIfReadingTooManyBytes
* canBeWritingLargeNumberOfBytesConcurrently
* checkstyle
* cannotWriteLargeNumberOfBytesConcurrently
* fix tests
* create tm in test
* More read tests (after writing a lot of data at once)
* WIP
* Tests that should pass
* Actually update the rate
* Add another test
* More tests and address comments
* Don't extend etesetup
* Make dumping data faster
* cleanup
* wip
* Add back lost file
* Cleanup
* Write tests
* numReadsPerThread -> numThreads
* More write tests, cleanup, check style fixes
* Refactor to avoid code duplication
* Cleanup
* cr comments
* Small read/write after a rate-limited read/write
* annoying no new line at eof
* Uniform parameters for hard limiting
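The "Metrics for bytes and counts in each read/write" and "don't throw if recordMetrics throws" commits describe a pattern that can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the PR's `QosMetrics` code: plain `LongAdder` counters stand in for the Dropwizard meters the PR registers, and `QosMetricsSketch`, `recordRead` and `metricFailures` are made-up names.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;
import java.util.function.ToLongFunction;

// Hypothetical sketch, not AtlasDB's QosMetrics: LongAdder counters stand in for Dropwizard meters.
final class QosMetricsSketch {
    final LongAdder readRequestCount = new LongAdder();
    final LongAdder bytesRead = new LongAdder();
    final LongAdder metricFailures = new LongAdder();

    /**
     * Runs the read, then records one request and the size of its result.
     * Metrics are best-effort: if sizing the result throws, the failure is counted
     * and the read's result is still returned to the caller.
     */
    <T> T recordRead(Supplier<T> read, ToLongFunction<T> sizeOf) {
        T result = read.get();
        try {
            long bytes = sizeOf.applyAsLong(result); // size first, so a sizing failure records nothing
            readRequestCount.increment();
            bytesRead.add(bytes);
        } catch (RuntimeException e) {
            metricFailures.increment();
        }
        return result;
    }

    public static void main(String[] args) {
        QosMetricsSketch metrics = new QosMetricsSketch();
        String value = metrics.recordRead(() -> "abc", String::length);
        String other = metrics.recordRead(() -> "x", s -> { throw new IllegalStateException("sizing bug"); });
        System.out.println(value + " " + other + " reads=" + metrics.readRequestCount.sum()
                + " bytes=" + metrics.bytesRead.sum() + " failures=" + metrics.metricFailures.sum());
    }
}
```

The design point is that a bug in size calculation (for example in a thrift object size helper) degrades observability but never fails the query itself.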

* Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success (#2630)
* Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success
* add logging when we stop reducing the batch size multiplier
* further improve the tests
* Allow sweep to recover faster after backing off.
  Before, we would increase by 1% for each successive success; if we had reduced a value to 1, it would take 70 iterations to reach 2 and 700 iterations to get back to 1000. Now we always run 25 iterations with the lower batch size and then try increasing the rate by doubling each time. This means that when sweep has to back off, it should speed up again quickly.
* Use an AtomicInteger to handle concurrent updates
* SweeperService logging improvements (#2618)
* SweeperServiceImpl now logs when it starts sweeping, making it clear whether it is running full sweep or not
* Added sweep parameters to the log lines
* no longer default the service parameter in the interface; this way we can see when the parameter isn't provided and we are defaulting to true. Behaviour is unchanged, but we can log a message when defaulting.
* Refactor TracingKVS (#2643)
* Wrap next() and hasNext() in traces
* Use span names as safe
* Remove iterator wrappings
* checkstyle
* refactor methods and remove misleading traces
* Fix unit tests
* release notes
* Final nits
* fix java arrays usage
* Delete docs (#2657)
* [20 minute tasks] Add test for when a batch is full (#2655)
* [no release notes] Drive-by add test for when a batch is full
* MetricRegistry log level downgrade + multiple timestamp tracker tests (#2636)
* change metrics manager to warn plus log the metric name
* more timestamp tracker tests
* release notes
* Extract interface for Cassandra client (#2660)
* Create a CassandraClient
* Propagate CassandraClient to all classes but CKVS
* Use CassandraClient on CKVS
* Propagate CassandraClient to remaining Impl classes
* Use CassandraClient in tests
* [no release notes]
* client -> namespace [no release notes] (#2654)
* 0.65.2 and 0.66.0 release notes (#2663)
* Release notes banners
* fix pr numbers
* [QoS] Add getNamespace to AtlasDBConfig (#2661)
* Add getNamespace [no release notes]
* Timelock client config cannot be empty
* Make it explicit that unspecified namespace is only possible for InMemoryKVS
* CR comments
* Live Reloading the TimeLock Block, Part 1: Pull to Push (#2621)
* thoughts
* More tests for RIH
* Paranoid logging
* statics
* javadoc part 1
* polling refreshable
* Unit tests
* Remove the old RIH
* lock lock
* Tests that test how we deal with exceptions
* logging
* [no release notes]
* CR comments part 1
* Make interval configurable
* Standard nasty time edge cases
* lastSeenValue does not need to be volatile
* Live Reloading the TimeLock Block, Part 2: TransactionManagers Plumbing (#2622)
* ServiceCreator.applyDynamic()
* Propagate config through TMs
* Json Serialization fixes
* Some refactoring
* lock/lock
* Fixed checkstyle
* CR comments part 1
* Switch to RPIH
* add test
* [no release notes] forthcoming in part 4
* checkstyle
* [TTT] [no release notes] Document behaviour regarding index rows (#2658)
* [no release notes] Document behaviour regarding index rows
* fix compile bug
* ``List``
* Refactor and Instrument CassandraClient api (#2665)
* Sanitize Client API
* Instrument CassandraClient
* checkstyle
* Address comment
* [no release notes]
* checkstyle
* Fix cas
* Live Reloading the TimeLock Block, Part 3: Working with 0 Nodes (#2647)
* 0 nodes part 1
* add support for 0 servers in a ServerListConfig
* extend deserialization tests
* More tests
* code defensively
* [no release notes] defer to 2648
* Fixed CR nits
* singleton server list
* check immutable ts (#2406)
* check immutable ts
* checkstyle
* release notes
* Fix TM creation
* checkstyle
* Propagate top-level KVS method names to CassandraClient (#2669)
* Propagate method names down to multiget_slice
* Add the corresponding KVS method to remaining methods
* Add TODO
* [no release notes]
* nit
* Extract cql executor interface (#2670)
* Instrument CqlExecutor
* [no release notes]
* bump awaitility (#2668)
* Upgrade to newer Awaitility.
* locks [no release notes]
* unused import
* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts (#2662)
* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts
* Add changes into missing file
* Doc changes
* Exclude Tracing and HdrHistogram from Tritium dependencies
* update locks
* Add excluded dependencies explicitly
* Fix merge conflict in release notes
* Uncomment dependencies
* Regenerate locks
* Correctly log Paxos events (#2674)
* Log out Paxos values when recording Paxos events
* Updated release notes
* Checkstyle
* Pull request number
* Address comments
* fix docs
* Slow log and tracing (#2673)
* Trace and instrument the thrift client
* Instrument CqlExecutor
* Fix metric names of InstrumentedCassandraClient
* Fix nit
* Also log internal table references
* Checkstyle
* simplify metric names
* Address comments
* add slow logging to the cassandra thrift client
* add slow logging to cqlExecutor
* fix typos
* Add tracing to the CassandraClient
* trace cqlExecutor queries
* Add slow-logging in the CassandraClient
* Delete InstrumentedCC and InstrumentedCqlExec
* Fix small nits
* Checkstyle
* Add kvs method names to slow logs
* Fix wrapping of exception
* Extract CqlQuery
* Move kvs-slow-log and tracing of CqlExecutor to CCI
* Propagate execute_cql3_query api breaks
* checkstyle
* delete unused string
* checkstyle
* fix number of mutations on batch_mutate
* some refactors
* fix compile
* Refactor cassandra client (#2676)
* Extract TracingCassandraClient
  Extract ProfilingCassandraClient
  Move todos and some cleanup
  Cherry-pick QoS metrics to develop (#2679)
* [QoS] Feature/qos meters (#2640)
* Metrics for bytes and counts in each read/write
* Refactors, don't throw if recordMetrics throws
* Use meters instead of histograms
* Multiget bytes
* Batch mutate exact size
* Cqlresult size
* Calculate exact byte sizes for all thrift objects
* tests and bugfixes - partial
* More tests and bugs fixed
* More tests and cr comments
* byte buffer size
* Remove register histogram
* checkstyle
* checkstyle
* locks and license
* Qos metrics CassandraClient
* Exclude unused classes
* fix cherry pick
* use supplier for object size [no release notes]
* fix merge in AtlasDbConfig
* rate limiting
* total-time
* qos config
* respect max backoff time
* query weights
* extra tests
* num rows
* checkstyle
* fix tests
* no int casting
* Qos ete tests
* shouldFailIfWritingTooManyBytes
* fix test
* rm file
* Remove metrics
* Test shouldFailIfReadingTooManyBytes
* canBeWritingLargeNumberOfBytesConcurrently
* checkstyle
* cannotWriteLargeNumberOfBytesConcurrently
* fix tests
* create tm in test
* More read tests (after writing a lot of data at once)
* WIP
* Tests that should pass
* Actually update the rate
* Add another test
* More tests and address comments
* Don't extend etesetup
* Make dumping data faster
* cleanup
* wip
* Add back lost file
* Cleanup
* Write tests
* numReadsPerThread -> numThreads
* More write tests, cleanup, check style fixes
* Refactor to avoid code duplication
* Cleanup
* cr comments
* Small read/write after a rate-limited read/write
* annoying no new line at eof
* Uniform parameters for hard limiting
* Don't consume any estimated bytes for a _transaction or metadata table query
* Add tests
* cr comments
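The sweep backoff change in #2630 ("run 25 iterations with the lower batch size and then try increasing the rate by doubling each time") can be sketched as below, next to the old 1%-per-success rule it replaced. This is a hypothetical illustration, not AtlasDB's `SweepBatchConfig` code: the class and method names are made up, and halving on each failure is an assumption, since the commit only says values decrease to 1. An `AtomicInteger` is used, as the commit mentions, so concurrent updates are not lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the #2630 backoff; not AtlasDB's SweepBatchConfig code.
// Assumptions: each failure halves the batch size (never below 1); after a failure the next
// SUCCESSES_BEFORE_GROWTH successes keep the reduced size, and every later success doubles it,
// capped at maxBatchSize.
final class SweepBatchBackoff {
    static final int SUCCESSES_BEFORE_GROWTH = 25;

    private final int maxBatchSize;
    private final AtomicInteger batchSize;
    private final AtomicInteger successesSinceFailure = new AtomicInteger();

    SweepBatchBackoff(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
        this.batchSize = new AtomicInteger(maxBatchSize);
    }

    int current() {
        return batchSize.get();
    }

    void onFailure() {
        successesSinceFailure.set(0);
        batchSize.updateAndGet(size -> Math.max(1, size / 2));
    }

    void onSuccess() {
        if (successesSinceFailure.incrementAndGet() > SUCCESSES_BEFORE_GROWTH) {
            batchSize.updateAndGet(size -> Math.min(maxBatchSize, size * 2));
        }
    }

    /** Iterations the old "grow by 1% per success" rule needs to get from 1 to the target. */
    static int legacyIterations(double target) {
        double size = 1.0;
        int iterations = 0;
        while (size < target) {
            size *= 1.01;
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        SweepBatchBackoff backoff = new SweepBatchBackoff(1000);
        while (backoff.current() > 1) {
            backoff.onFailure(); // 1000 -> 500 -> 250 -> ... -> 1
        }
        int iterations = 0;
        while (backoff.current() < 1000) {
            backoff.onSuccess();
            iterations++;
        }
        System.out.println("new rule: " + iterations + " iterations; old 1% rule: "
                + legacyIterations(2) + " to reach 2, " + legacyIterations(1000) + " to reach 1000");
    }
}
```

Under these assumptions, recovering from 1 to 1000 takes 25 flat iterations plus 10 doublings (2^10 = 1024, capped at 1000), i.e. 35 successes, while the old rule needs 70 iterations to reach 2 and 695 to reach 1000, consistent with the "70" and "700" figures in the commit message.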

* Extremely basic QosServiceResource
* Make resource an interface
* Add client PathParam
* Clean up javax.ws.rs dependencies
* Create stub for AtlasDbQosClient
* Calls to checkLimit use up a credit; throw when out of credits
* Add QosServiceResourceImpl + test
* AutoDelegate for Cassandra.Client
* Rename QosService stuff
* Pass AtlasDbQosClient to CassandraClient
* Check limit on multiget_slice
* Check limit on batch_mutate
* Don't test we aren't soft-limited while we can never be soft-limited
* Check limit on remaining CassandraClient methods
* Scheduled refresh of AtlasDbQosClient.credits
* Refresh every second
  Once we have configurable quotas on the QoS service, they will be more understandable (per second rather than per-10-seconds).
* Mount qos-service on Timelock
* Checkstyle
* Update dependency locks
* Don't throw limitExceededException
* Move client param around
* Comment
* Qos Service config (#2644)
* Service config
* Allow clients to run without configuring limits
* simpler tests
* [QoS] qos ete test (#2652)
* checkpoint
* checkpoint
* working test
* check passing
* unused deps
* [QoS] rate limiter (#2653)
* rate limiting
* update license and docs
* [QoS] Feature/qos client (#2650)
* Create one qosClient for each service
  QosClientBuilder hooked up to KVS create
  Create the QosClient in CassandraClientPoolImpl if the config is specified.
  Create FakeQosClient if the config is not specified
  Cleanup
  get broken tests to pass
* Locks
* Fix failing tests
* Add getNamespace [no release notes]
* Create QosClient at the Top level
* fix test
* test and checkstyle fixes
* locks
* deps
* fix tests
* [QoS] Feature/qos meters (#2640)
* Metrics for bytes and counts in each read/write
* Refactors, don't throw if recordMetrics throws
* Use meters instead of histograms
* Multiget bytes
* Batch mutate exact size
* Cqlresult size
* Calculate exact byte sizes for all thrift objects
* tests and bugfixes - partial
* More tests and bugs fixed
* More tests and cr comments
* byte buffer size
* Remove register histogram
* checkstyle
* checkstyle
* locks and license
* [QoS] QosClient with ratelimiter (#2667)
* QosClient with ratelimiter
* Checkstyle
* locks
* [QoS] Create a jaxrs-client for the integ tests (#2675)
* Create a jaxrs-client for the integ tests
* build fix
* clean up
* Nziebart/merge develop into qos (#2683)
* Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success (#2630)
* Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success
* add logging when we stop reducing the batch size multiplier
* further improve the tests
* Allow sweep to recover faster after backing off.
  Before, we would increase by 1% for each successive success; if we had reduced a value to 1, it would take 70 iterations to reach 2 and 700 iterations to get back to 1000. Now we always run 25 iterations with the lower batch size and then try increasing the rate by doubling each time. This means that when sweep has to back off, it should speed up again quickly.
* Use an AtomicInteger to handle concurrent updates
* SweeperService logging improvements (#2618)
* SweeperServiceImpl now logs when it starts sweeping, making it clear whether it is running full sweep or not
* Added sweep parameters to the log lines
* no longer default the service parameter in the interface; this way we can see when the parameter isn't provided and we are defaulting to true. Behaviour is unchanged, but we can log a message when defaulting.
* Refactor TracingKVS (#2643)
* Wrap next() and hasNext() in traces
* Use span names as safe
* Remove iterator wrappings
* checkstyle
* refactor methods and remove misleading traces
* Fix unit tests
* release notes
* Final nits
* fix java arrays usage
* Delete docs (#2657)
* [20 minute tasks] Add test for when a batch is full (#2655)
* [no release notes] Drive-by add test for when a batch is full
* MetricRegistry log level downgrade + multiple timestamp tracker tests (#2636)
* change metrics manager to warn plus log the metric name
* more timestamp tracker tests
* release notes
* Extract interface for Cassandra client (#2660)
* Create a CassandraClient
* Propagate CassandraClient to all classes but CKVS
* Use CassandraClient on CKVS
* Propagate CassandraClient to remaining Impl classes
* Use CassandraClient in tests
* [no release notes]
* client -> namespace [no release notes] (#2654)
* 0.65.2 and 0.66.0 release notes (#2663)
* Release notes banners
* fix pr numbers
* [QoS] Add getNamespace to AtlasDBConfig (#2661)
* Add getNamespace [no release notes]
* Timelock client config cannot be empty
* Make it explicit that unspecified namespace is only possible for InMemoryKVS
* CR comments
* Live Reloading the TimeLock Block, Part 1: Pull to Push (#2621)
* thoughts
* More tests for RIH
* Paranoid logging
* statics
* javadoc part 1
* polling refreshable
* Unit tests
* Remove the old RIH
* lock lock
* Tests that test how we deal with exceptions
* logging
* [no release notes]
* CR comments part 1
* Make interval configurable
* Standard nasty time edge cases
* lastSeenValue does not need to be volatile
* Live Reloading the TimeLock Block, Part 2: TransactionManagers Plumbing (#2622)
* ServiceCreator.applyDynamic()
* Propagate config through TMs
* Json Serialization fixes
* Some refactoring
* lock/lock
* Fixed checkstyle
* CR comments part 1
* Switch to RPIH
* add test
* [no release notes] forthcoming in part 4
* checkstyle
* [TTT] [no release notes] Document behaviour regarding index rows (#2658)
* [no release notes] Document behaviour regarding index rows
* fix compile bug
* ``List``
* Refactor and Instrument CassandraClient api (#2665)
* Sanitize Client API
* Instrument CassandraClient
* checkstyle
* Address comment
* [no release notes]
* checkstyle
* Fix cas
* Live Reloading the TimeLock Block, Part 3: Working with 0 Nodes (#2647)
* 0 nodes part 1
* add support for 0 servers in a ServerListConfig
* extend deserialization tests
* More tests
* code defensively
* [no release notes] defer to 2648
* Fixed CR nits
* singleton server list
* check immutable ts (#2406)
* check immutable ts
* checkstyle
* release notes
* Fix TM creation
* checkstyle
* Propagate top-level KVS method names to CassandraClient (#2669)
* Propagate method names down to multiget_slice
* Add the corresponding KVS method to remaining methods
* Add TODO
* [no release notes]
* nit
* Extract cql executor interface (#2670)
* Instrument CqlExecutor
* [no release notes]
* bump awaitility (#2668)
* Upgrade to newer Awaitility.
* locks [no release notes]
* unused import
* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts (#2662)
* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts
* Add changes into missing file
* Doc changes
* Exclude Tracing and HdrHistogram from Tritium dependencies
* update locks
* Add excluded dependencies explicitly
* Fix merge conflict in release notes
* Uncomment dependencies
* Regenerate locks
* Correctly log Paxos events (#2674)
* Log out Paxos values when recording Paxos events
* Updated release notes
* Checkstyle
* Pull request number
* Address comments
* fix docs
* Slow log and tracing (#2673)
* Trace and instrument the thrift client
* Instrument CqlExecutor
* Fix metric names of InstrumentedCassandraClient
* Fix nit
* Also log internal table references
* Checkstyle
* simplify metric names
* Address comments
* add slow logging to the cassandra thrift client
* add slow logging to cqlExecutor
* fix typos
* Add tracing to the CassandraClient
* trace cqlExecutor queries
* Add slow-logging in the CassandraClient
* Delete InstrumentedCC and InstrumentedCqlExec
* Fix small nits
* Checkstyle
* Add kvs method names to slow logs
* Fix wrapping of exception
* Extract CqlQuery
* Move kvs-slow-log and tracing of CqlExecutor to CCI
* Propagate execute_cql3_query api breaks
* checkstyle
* delete unused string
* checkstyle
* fix number of mutations on batch_mutate
* some refactors
* fix compile
* Refactor cassandra client (#2676)
* Extract TracingCassandraClient
  Extract ProfilingCassandraClient
  Move todos and some cleanup
  Cherry-pick QoS metrics to develop (#2679)
* [QoS] Feature/qos meters (#2640)
* Metrics for bytes and counts in each read/write
* Refactors, don't throw if recordMetrics throws
* Use meters instead of histograms
* Multiget bytes
* Batch mutate exact size
* Cqlresult size
* Calculate exact byte sizes for all thrift objects
* tests and bugfixes - partial
* More tests and bugs fixed
* More tests and cr comments
* byte buffer size
* Remove register histogram
* checkstyle
* checkstyle
* locks and license
* Qos metrics CassandraClient
* Exclude unused classes
* fix cherry pick
* use supplier for object size [no release notes]
* fix merge in AtlasDbConfig
* qos rate limiting (#2709)
* rate limiting
* [QoS] total time spent talking to Cassandra (#2687)
* total-time
* [QoS] Client config (#2690)
* qos config
* respect max backoff time
* [QoS] [Refactor] Query Weights (#2697)
* query weights
* extra tests
* [QoS] Number of rows per query (#2698)
* num rows
* checkstyle
* fix tests
* no int casting
* fix numRows calculation on batch_mutate
* [QoS] CAS metrics (#2705)
* cas metrics
* exceptions (#2706)
* [QoS] Guava license (#2703)
* guava license
* Cleanup: class reference
* [QoS] live reload (#2710)
* live reload and logging
* millis
* checkpoint
* fix tests
* comments
* checkstyle
* [QoS] Don't rate limit CAS (#2711)
* don't limit cas
* Remove tests of deleted method
* Cherrypick/qos exception mapping (#2715)
* very simple ratelimitexceededexception
* Need to be able to throw RLEE directly from Cass, rather than ADDE(RLEE)s
* fix bug with ADDE(RLEE)
* Exception Mapper
* unravel a bad javadoc
* CR comments part 1
* lock lock
* split qos aware throwables
* visibility
* fix compile break
* checkstyle
* handle exceptions properly
* [QoS] Estimate the number of read bytes w/ number of rows (#2717)
* Refactor the name of the functions
* Estimate based on the number of rows
* Fix modifiers on ThriftQueryWeighers
* Add unit tests to estimation logic
* ThriftQueryWeighers.multigetSlice takes a List, not number of rows
* getRangeSlices takes KeyRange, not count
* weight estimates (#2725)
* [QoS] Fix exceptions thrown on CqlExecutor (#2696)
* Address #2683 comments
* Clarify query and add cause
* Add just the cqlQuery.queryFormat
* checkstyle
* Update test
  We changed the error message...
* [QoS] Qos ete test (#2708)
* Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success (#2630)
* Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success
* add logging when we stop reducing the batch size multiplier
* further improve the tests
* Allow sweep to recover faster after backing off.
  Before, we would increase by 1% for each successive success; if we had reduced a value to 1, it would take 70 iterations to reach 2 and 700 iterations to get back to 1000. Now we always run 25 iterations with the lower batch size and then try increasing the rate by doubling each time. This means that when sweep has to back off, it should speed up again quickly.
* Use an AtomicInteger to handle concurrent updates
* SweeperService logging improvements (#2618)
* SweeperServiceImpl now logs when it starts sweeping, making it clear whether it is running full sweep or not
* Added sweep parameters to the log lines
* no longer default the service parameter in the interface; this way we can see when the parameter isn't provided and we are defaulting to true. Behaviour is unchanged, but we can log a message when defaulting.
* Refactor TracingKVS (#2643)
* Wrap next() and hasNext() in traces
* Use span names as safe
* Remove iterator wrappings
* checkstyle
* refactor methods and remove misleading traces
* Fix unit tests
* release notes
* Final nits
* fix java arrays usage
* Delete docs (#2657)
* [20 minute tasks] Add test for when a batch is full (#2655)
* [no release notes] Drive-by add test for when a batch is full
* MetricRegistry log level downgrade + multiple timestamp tracker tests (#2636)
* change metrics manager to warn plus log the metric name
* more timestamp tracker tests
* release notes
* Extract interface for Cassandra client (#2660)
* Create a CassandraClient
* Propagate CassandraClient to all classes but CKVS
* Use CassandraClient on CKVS
* Propagate CassandraClient to remaining Impl classes
* Use CassandraClient in tests
* [no release notes]
* client -> namespace [no release notes] (#2654)
* 0.65.2 and 0.66.0 release notes (#2663)
* Release notes banners
* fix pr numbers
* [QoS] Add getNamespace to AtlasDBConfig (#2661)
* Add getNamespace [no release notes]
* Timelock client config cannot be empty
* Make it explicit that unspecified namespace is only possible for InMemoryKVS
* CR comments
* Live Reloading the TimeLock Block, Part 1: Pull to Push (#2621)
* thoughts
* More tests for RIH
* Paranoid logging
* statics
* javadoc part 1
* polling refreshable
* Unit tests
* Remove the old RIH
* lock lock
* Tests that test how we deal with exceptions
* logging
* [no release notes]
* CR comments part 1
* Make interval configurable
* Standard nasty time edge cases
* lastSeenValue does not need to be volatile
* Live Reloading the TimeLock Block, Part 2: TransactionManagers Plumbing (#2622)
* ServiceCreator.applyDynamic()
* Propagate config through TMs
* Json Serialization fixes
* Some refactoring
* lock/lock
* Fixed checkstyle
* CR comments part 1
* Switch to RPIH
* add test
* [no release notes] forthcoming in part 4
* checkstyle
* [TTT] [no release notes] Document behaviour regarding index rows (#2658)
* [no release notes] Document behaviour regarding index rows
* fix compile bug
* ``List``
* Refactor and Instrument CassandraClient api (#2665)
* Sanitize Client API
* Instrument CassandraClient
* checkstyle
* Address comment
* [no release notes]
* checkstyle
* Fix cas
* Live Reloading the TimeLock Block, Part 3: Working with 0 Nodes (#2647)
* 0 nodes part 1
* add support for 0 servers in a ServerListConfig
* extend deserialization tests
* More tests
* code defensively
* [no release notes] defer to 2648
* Fixed CR nits
* singleton server list
* check immutable ts (#2406)
* check immutable ts
* checkstyle
* release notes
* Fix TM creation
* checkstyle
* Propagate top-level KVS method names to CassandraClient (#2669)
* Propagate method names down to multiget_slice
* Add the corresponding KVS method to remaining methods
* Add TODO
* [no release notes]
* nit
* Extract cql executor interface (#2670)
* Instrument CqlExecutor
* [no release notes]
* bump awaitility (#2668)
* Upgrade to newer Awaitility.
* locks [no release notes]
* unused import
* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts (#2662)
* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts
* Add changes into missing file
* Doc changes
* Exclude Tracing and HdrHistogram from Tritium dependencies
* update locks
* Add excluded dependencies explicitly
* Fix merge conflict in release notes
* Uncomment dependencies
* Regenerate locks
* Correctly log Paxos events (#2674)
* Log out Paxos values when recording Paxos events
* Updated release notes
* Checkstyle
* Pull request number
* Address comments
* fix docs
* Slow log and tracing (#2673)
* Trace and instrument the thrift client
* Instrument CqlExecutor
* Fix metric names of InstrumentedCassandraClient
* Fix nit
* Also log internal table references
* Checkstyle
* simplify metric names
* Address comments
* add slow logging to the cassandra thrift client
* add slow logging to cqlExecutor
* fix typos
* Add tracing to the CassandraClient
* trace cqlExecutor queries
* Add slow-logging in the CassandraClient
* Delete InstrumentedCC and InstrumentedCqlExec
* Fix small nits
* Checkstyle
* Add kvs method names to slow logs
* Fix wrapping of exception
* Extract CqlQuery
* Move kvs-slow-log and tracing of CqlExecutor to CCI
* Propagate execute_cql3_query api breaks
* checkstyle
* delete unused string
* checkstyle
* fix number of mutations on batch_mutate
* some refactors
* fix compile
* Refactor cassandra client (#2676)
* Extract TracingCassandraClient
  Extract ProfilingCassandraClient
  Move todos and some cleanup
  Cherry-pick QoS metrics to develop (#2679)
* [QoS] Feature/qos meters (#2640)
* Metrics for bytes and counts in each read/write
* Refactors, don't throw if recordMetrics throws
* Use meters instead of histograms
* Multiget bytes
* Batch mutate exact size
* Cqlresult size
* Calculate exact byte sizes for all thrift objects
* tests and bugfixes - partial
* More tests and bugs fixed
* More tests and cr comments
* byte buffer size
* Remove register histogram
* checkstyle
* checkstyle
* locks and license
* Qos metrics CassandraClient
* Exclude unused classes
* fix cherry pick
* use supplier for object size [no release notes]
* fix merge in AtlasDbConfig
* rate limiting
* total-time
* qos config
* respect max backoff time
* query weights
* extra tests
* num rows
* checkstyle
* fix tests
* no int casting
* Qos ete tests
* shouldFailIfWritingTooManyBytes
* fix test
* rm file
* Remove metrics
* Test shouldFailIfReadingTooManyBytes
* canBeWritingLargeNumberOfBytesConcurrently
* checkstyle
* cannotWriteLargeNumberOfBytesConcurrently
* fix tests
* create tm in test
* More read tests (after writing a lot of data at once)
* WIP
* Tests that should pass
* Actually update the rate
* Add another test
* More tests and address comments
* Don't extend etesetup
* Make dumping data faster
* cleanup
* wip
* Add back lost file
* Cleanup
* Write tests
* numReadsPerThread -> numThreads
* More write tests, cleanup, check style fixes
* Refactor to avoid code duplication
* Cleanup
* cr comments
* Small read/write after a rate-limited read/write
* annoying no new line at eof
* Uniform parameters for hard limiting
* [QoS] Fix/qos system table rate limiting (#2739)
* Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success (#2630)
* Fix SweepBatchConfig values to properly decrease to 1 with each failure and increase with each success
* add logging when we stop reducing the batch size multiplier
* further improve the tests
* Allow sweep to recover faster after backing off.
  Before, we would increase by 1% for each successive success; if we had reduced a value to 1, it would take 70 iterations to reach 2 and 700 iterations to get back to 1000. Now we always run 25 iterations with the lower batch size and then try increasing the rate by doubling each time. This means that when sweep has to back off, it should speed up again quickly.
* Use an AtomicInteger to handle concurrent updates * SweeperService logging improvements (#2618) * SweeperServiceImpl now logs when it starts sweeping make it clear if it is running full sweep or not * Added sweep parameters to the log lines * no longer default the service parameter in the interface, this way we can see when the parameter isn't provided and we are defaulting to true. Behaviour is unchanged but we can log a message when defaulting. * Refactor TracingKVS (#2643) * Wrap next() and hasNext() in traces * Use span names as safe * Remove iterator wrappings * checkstyle * refactor methods and remove misleading traces * Fix unit tests * release notes * Final nits * fix java arrays usage * Delete docs (#2657) * [20 minute tasks] Add test for when a batch is full (#2655) * [no release notes] Drive-by add test for when a batch is full * MetricRegistry log level downgrade + multiple timestamp tracker tests (#2636) * change metrics manager to warn plus log the metric name * more timestamp tracker tests * release notes * Extract interface for Cassandra client (#2660) * Create a CassandraClient * Propagate CassandraClient to all classes but CKVS * Use CassandraClient on CKVS * Propagate CassandraClient to remaining Impl classes * Use CassandraClient in tests * [no release notes] * client -> namespace [no release notes] (#2654) * 0.65.2 and 0.66.0 release notes (#2663) * Release notes banners * fix pr numbers * [QoS] Add getNamespace to AtlasDBConfig (#2661) * Add getNamespace [no release notes] * Timelock client config cannot be empty * Make it explicit that unspecified namespace is only possible for InMemoryKVS * CR comments * Live Reloading the TimeLock Block, Part 1: Pull to Push (#2621) * thoughts * More tests for RIH * Paranoid logging * statics * javadoc part 1 * polling refreshable * Unit tests * Remove the old RIH * lock lock * Tests that test how we deal with exceptions * logging * [no release notes] * CR comments part 1 * Make interval configurable * 
Standard nasty time edge cases * lastSeenValue does not need to be volatile * Live Reloading the TimeLock Block, Part 2: TransactionManagers Plumbing (#2622) * ServiceCreator.applyDynamic() * Propagate config through TMs * Json Serialization fixes * Some refactoring * lock/lock * Fixed checkstyle * CR comments part 1 * Switch to RPIH * add test * [no release notes] forthcoming in part 4 * checkstyle * [TTT] [no release notes] Document behaviour regarding index rows (#2658) * [no release notes] Document behaviour regarding index rows * fix compile bug * ``List`` * Refactor and Instrument CassandraClient api (#2665) * Sanitize Client API * Instrument CassandraClient * checkstyle * Address comment * [no release notes] * checkstyle * Fix cas * Live Reloading the TimeLock Block, Part 3: Working with 0 Nodes (#2647) * 0 nodes part 1 * add support for 0 servers in a ServerListConfig * extend deserialization tests * More tests * code defensively * [no release notes] defer to 2648 * Fixed CR nits * singleton server list * check immutable ts (#2406) * check immutable ts * checkstyle * release notes * Fix TM creation * checkstyle * Propagate top-level KVS method names to CassandraClient (#2669) * Propagate method names down to multiget_slice * Add the corresponding KVS method to remaining methods * Add TODO * [no release notes] * nit * Extract cql executor interface (#2670) * Instrument CqlExecutor * [no release notes] * bump awaitility (#2668) * Upgrade to newer Awaitility. 
* locks [no release notes]
* unused import
* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts (#2662)
* Bump Atlas on Tritium 0.8.4 to fix dependency conflicts
* Add changes into missing file
* Doc changes
* Exclude Tracing and HdrHistogram from Tritium dependencies
* update locks
* Add excluded dependencies explicitly
* Fix merge conflict in relase notes
* Uncomment dependencies
* Regenerate locks
* Correctly log Paxos events (#2674)
* Log out Paxos values when recording Paxos events
* Updated release notes
* Checkstyle
* Pull request number
* Address comments
* fix docs
* Slow log and tracing (#2673)
* Trace and instrument the thrift client
* Instrument CqlExecutor
* Fix metric names of IntrumentedCassandraClient
* Fix nit
* Also log internal table references
* Checkstyle
* simplify metric names
* Address comments
* add slow logging to the cassandra thrift client
* add slow logging to cqlExecutor
* fix typos
* Add tracing to the CassandraClient
* trace cqlExecutor queries
* Add slow-logging in the CassandraClient
* Delete InstrumentedCC and InstrumentedCqlExec
* Fix small nits
* Checkstyle
* Add kvs method names to slow logs
* Fix wrapping of exception
* Extract CqlQuery
* Move kvs-slow-log and tracing of CqlExecutor to CCI
* Propagate execute_cql3_query api breaks
* checkstyle
* delete unused string
* checkstyle
* fix number of mutations on batch_mutate
* some refactors
* fix compile
* Refactor cassandra client (#2676)
* Extract TracingCassandraClient
* Extract ProfilingCassandraClient
* Move todos and some cleanup
* Cherry-pick QoS metrics to develop (#2679)
* [QoS] Feature/qos meters (#2640)
* Metrics for bytes and counts in each read/write
* Refactors, dont throw if recordMetrics throws
* Use meters instead of histograms
* Multiget bytes
* Batch mutate exact size
* Cqlresult size
* Calculate exact byte sizes for all thrift objects
* tests and bugfixes - partial
* More tests and bugs fixed
* More tests and cr comments
* byte buffer size
* Remove register histogram
* checkstyle
* checkstyle
* locks and license
* Qos metrics CassandraClient
* Exclude unused classes
* fix cherry pick
* use supplier for object size [no release notes]
* fix merge in AtlasDbConfig
* rate limiting
* total-time
* qos config
* respect max backoff itme
* query weights
* extra tests
* num rows
* checkstyle
* fix tests
* no int casting
* Qos ete tests
* shouldFailIfWritingTooManyBytes
* fix test
* rm file
* Remove metrics
* Test shouldFailIfReadingTooManyBytes
* canBeWritingLargeNumberOfBytesConcurrently
* checkstyle
* cannotWriteLargeNumberOfBytesConcurrently
* fix tests
* create tm in test
* More read tests (after writing a lot of data at once)
* WIP
* Tests that should pas
* Actually update the rate
* Add another test
* More tests and address comments
* Dont extend etesetup
* Make dumping data faster
* cleanup
* wip
* Add back lost file
* Cleanup
* Write tests
* numReadsPerThread -> numThreads
* More write tests, cleanup, check style fixes
* Refactor to avoid code duplication
* Cleanup
* cr comments
* Small read/write after a rate-limited read/write
* annoying no new linw at eof
* Uniform parameters for hard limiting
* Don't consume any estimated bytes for a _transaction or metadata table query
* Add tests
* cr comments
* Merge develop to the feature branch (#2741)
* Merge develop
* Re-delete CqlQueryUtils
* Nziebart/cell timestamps qos (#2745)
* handle qos exceptions in cell timestamp loader [no release notes]
* actually just remove checked exception
* Remove the throws in the method signature
* Differentiate between read and write limits when logging (#2751)
* Differentiate between read and write limits when logging
* Type -> name
* Use longs in the rate limiter and handle negative adjustments. (#2758)
* Differentiate between read and write limits when logging
* handle negative adjustments
* More tests
* pr comments
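
Several of the squashed commits above deal with byte-based rate limiting ("rate limiting", "Actually update the rate", "Use longs in the rate limiter and handle negative adjustments"). The sketch below is not the AtlasDB implementation; it is a minimal single-process illustration, assuming a limiter measured in bytes per second where a request first consumes an estimated size and is later corrected by its actual size, a correction that may be negative (a refund). The class `ByteRateLimiterSketch` and its methods are hypothetical, and timestamps are passed in explicitly so the behaviour is deterministic.

```java
/**
 * Hypothetical sketch (not AtlasDB's QoS rate limiter): limits throughput in bytes per second,
 * keeping all state in longs. Callers consume an estimated size, then call adjust() once the
 * actual size is known; a negative correction refunds capacity.
 */
final class ByteRateLimiterSketch {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    private final long bytesPerSecond;
    private final long maxBurstNanos; // unused capacity that may accumulate, expressed as time
    private long nextFreeNanos;       // instant by which all previously consumed bytes are paid for

    ByteRateLimiterSketch(long bytesPerSecond, long maxBurstBytes, long startNanos) {
        if (bytesPerSecond <= 0 || maxBurstBytes < 0) {
            throw new IllegalArgumentException("rate must be positive and burst non-negative");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.maxBurstNanos = bytesToNanos(maxBurstBytes);
        this.nextFreeNanos = startNanos - maxBurstNanos; // start with a full burst allowance
    }

    /**
     * Time needed to "earn" the given bytes. Java integer division truncates toward zero, so
     * bytesToNanos(-n) == -bytesToNanos(n); multiplyExact throws instead of silently overflowing.
     */
    private long bytesToNanos(long bytes) {
        return Math.multiplyExact(bytes, NANOS_PER_SECOND) / bytesPerSecond;
    }

    /**
     * Records an estimated read or write and returns how long (in nanoseconds) the caller should
     * back off before issuing it; 0 means proceed now. Earlier requests determine the wait and
     * this request's own cost delays the next caller ("pay later").
     */
    synchronized long consumeEstimate(long estimatedBytes, long nowNanos) {
        // Capacity left unused for longer than the burst window is forfeited.
        nextFreeNanos = Math.max(nextFreeNanos, nowNanos - maxBurstNanos);
        long waitNanos = Math.max(0L, nextFreeNanos - nowNanos);
        nextFreeNanos += bytesToNanos(estimatedBytes);
        return waitNanos;
    }

    /** Corrects an earlier estimate once the real size is known; a negative delta is a refund. */
    synchronized void adjust(long estimatedBytes, long actualBytes) {
        nextFreeNanos += bytesToNanos(actualBytes - estimatedBytes);
    }
}
```

With 1000 bytes/s and no burst, a 1000-byte request proceeds at once and the next caller waits one second; if a 500-byte estimate turns out to be 100 bytes, `adjust(500, 100)` shortens later waits by 0.4 s. This sketch only computes a backoff; commit names such as shouldFailIfWritingTooManyBytes suggest the real client may also fail requests outright, which is not modelled here.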

hsaraogi assigned gsheasby Nov 6, 2017

hsaraogi requested a review from nziebart November 6, 2017 18:06

schlosna reviewed Nov 6, 2017

gsheasby reviewed Nov 6, 2017

nziebart reviewed Nov 6, 2017

gsheasby reviewed Nov 7, 2017

hsaraogi commented Nov 7, 2017

tboam reviewed Nov 8, 2017

hsaraogi added 2 commits November 8, 2017 17:45

Metrics for bytes and counts in each read/write

60e3593

Refactors, dont throw if recordMetrics throws

9756bcc

hsaraogi force-pushed the feature/qos-meters branch from 3752cef to 9756bcc November 8, 2017 17:48

hsaraogi changed the title ~~Feature/qos meters~~ [QoS] Feature/qos meters Nov 8, 2017

hsaraogi added 5 commits November 8, 2017 18:16

Use meters instead of histograms

b548633

Multiget bytes

89c920a

Batch mutate exact size

06613a2

Cqlresult size

2f59382

Calculate exact byte sizes for all thrift objects

a94c331
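
The commit "Use meters instead of histograms" swaps sampled distributions for running totals and rates. That presumably fits the review discussion on this PR, which questions whether quantiles (and hence a histogram reservoir) are needed at all. As a JDK-only illustration, not Dropwizard's `Meter` and not the real `QosMetrics` class, meter-style bookkeeping for read/write counts and byte totals needs only a few `LongAdder`s; the class `QosMeterSketch` and its methods are hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical JDK-only stand-in for meter-style QoS metrics: running totals of operations
 * and bytes, with no reservoir of samples (so memory use is constant).
 */
final class QosMeterSketch {
    private final LongAdder reads = new LongAdder();
    private final LongAdder writes = new LongAdder();
    private final LongAdder bytesRead = new LongAdder();
    private final LongAdder bytesWritten = new LongAdder();
    private final long startNanos;

    QosMeterSketch(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Records one read of the given size; LongAdder keeps contended increments cheap. */
    void recordRead(long numBytes) {
        reads.increment();
        bytesRead.add(numBytes);
    }

    /** Records one write of the given size. */
    void recordWrite(long numBytes) {
        writes.increment();
        bytesWritten.add(numBytes);
    }

    long readCount() {
        return reads.sum();
    }

    long writeCount() {
        return writes.sum();
    }

    long totalBytesRead() {
        return bytesRead.sum();
    }

    long totalBytesWritten() {
        return bytesWritten.sum();
    }

    /** Mean bytes read per second since construction, analogous to a meter's mean rate. */
    double meanBytesReadPerSecond(long nowNanos) {
        long elapsedNanos = nowNanos - startNanos;
        return elapsedNanos <= 0 ? 0.0 : bytesRead.sum() * 1e9 / elapsedNanos;
    }
}
```

The trade-off is that totals like these cannot answer "how many bytes in the last 30 seconds" or report quantiles; they only support cumulative counts and rates derived from them.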

tboam reviewed Nov 9, 2017

tboam approved these changes Nov 9, 2017

tests and bugfixes - partial

9cfbb96

nziebart approved these changes Nov 9, 2017

hsaraogi added 2 commits November 9, 2017 15:04

More tests and bugs fixed

dd73344

More tests and cr comments

de9dc41

hsaraogi force-pushed the feature/qos-meters branch from 8ac4698 to de9dc41 November 10, 2017 18:37

byte buffer size

5414df9
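
The commit subject "byte buffer size" does not say what changed. As general background only, and not as a description of that commit, a common pitfall when sizing `ByteBuffer`-typed Thrift fields is measuring the capacity or backing array rather than the readable region. The helper `ByteBufferSizeSketch.payloadSize` below is hypothetical, and treating `null` as an unset field of size zero is an assumption; `remaining()`, `capacity()` and `ByteBuffer.wrap(byte[], int, int)` are standard `java.nio` methods.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: counts the bytes a binary field actually carries. */
final class ByteBufferSizeSketch {
    private ByteBufferSizeSketch() {}

    /**
     * Returns the number of bytes between position and limit. capacity() and array().length
     * can be larger, e.g. for a buffer that wraps part of a bigger array. remaining() does not
     * move the position, so the buffer is left untouched. null (assumed: unset field) counts as 0.
     */
    static long payloadSize(ByteBuffer buffer) {
        return buffer == null ? 0L : buffer.remaining();
    }
}
```

For example, `ByteBuffer.wrap(new byte[100], 20, 30)` has a capacity of 100 but carries 30 bytes, and `payloadSize` reports 30.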

hsaraogi force-pushed the feature/qos-meters branch from 95c315c to 5414df9 November 10, 2017 18:52

hsaraogi and others added 5 commits November 10, 2017 18:54

Remove register histogram

2e262fe

Merge branch 'feature/qos-service-api' into feature/qos-meters

40e8ff6

checkstyle

d642c07

checkstyle

71d668d

locks and license

21125a6

nziebart merged commit 3d1f43f into feature/qos-service-api Nov 10, 2017

hsaraogi commented Nov 6, 2017

codecov-io commented Nov 7, 2017

Codecov Report

tboam left a comment

tboam left a comment

tboam left a comment

nziebart left a comment