
jsonnet for running loki using boltdb-shipper #2547

Merged: 2 commits into grafana:master on Sep 17, 2020

Conversation

sandeepsukhani (Contributor):

What this PR does / why we need it:
Jsonnet for deploying Loki using boltdb-shipper. It also adds the new compactor service, which dedupes and compacts the boltdb files uploaded to the store.

codecov-commenter commented Aug 25, 2020:

Codecov Report

Merging #2547 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2547      +/-   ##
==========================================
- Coverage   62.87%   62.86%   -0.01%     
==========================================
  Files         170      170              
  Lines       15049    15049              
==========================================
- Hits         9462     9461       -1     
- Misses       4826     4832       +6     
+ Partials      761      756       -5     
Impacted Files Coverage Δ
pkg/canary/comparator/comparator.go 78.43% <0.00%> (-2.36%) ⬇️
pkg/promtail/targets/file/filetarget.go 66.28% <0.00%> (-0.58%) ⬇️
pkg/logql/evaluator.go 92.88% <0.00%> (+0.40%) ⬆️
pkg/promtail/targets/file/tailer.go 75.00% <0.00%> (+4.16%) ⬆️

'boltdb.shipper.cache-location': '/data/boltdb-cache',

// Disable chunk transfer
'ingester.max-transfer-retries': 0,
Contributor:

In my testing I also had to add `ingester.lifecycler.join_after: 0s`. Do we need it here?

},
},
},

Contributor:

For cleanliness I ended up removing some options from the `table_manager_args`. In my case I was moving from Bigtable, so I had something like:

  table_manager_args:: std.mergePatch(
    super.table_manager_args, {
      'bigtable.grpc-client-rate-limit': null,
      'bigtable.grpc-client-rate-limit-burst': null,
      'bigtable.backoff-on-ratelimits': null,
      'bigtable.table-cache.enabled': null,
    }
  ),

I'm not sure if it makes sense, or whether it doesn't really matter since Bigtable is not enabled.

sandeepsukhani (Author):

Someone could be using both (moving to or away from Bigtable) and would have to add an override again. It is safe to have those extra flags even if you are not using Bigtable. We could take care of it by refactoring the jsonnet to include them only when Bigtable is one of the stores, but that is time-consuming, so it would not happen anytime soon.
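As a sketch of what that refactoring might look like, the `std.mergePatch` removal above could be made conditional. Here `bigtable_enabled` is a hypothetical `_config` flag, not something defined in this PR:

```jsonnet
// Sketch only: strip the Bigtable flags unless Bigtable is one of the stores.
// `bigtable_enabled` is a hypothetical _config flag, not part of this PR.
table_manager_args:: std.mergePatch(
  super.table_manager_args,
  if $._config.bigtable_enabled then {} else {
    'bigtable.grpc-client-rate-limit': null,
    'bigtable.grpc-client-rate-limit-burst': null,
    'bigtable.backoff-on-ratelimits': null,
    'bigtable.table-cache.enabled': null,
  }
),
```

Since `std.mergePatch` treats `null` values as deletions, merging an empty object leaves the args untouched when the flag is set.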

Contributor:

Makes sense, thanks!

owen-d (Member) left a comment:

Hey, this is looking great. I left a bunch of suggestions, many of which are non-blocking, but I'd appreciate your thoughts on them. I like how all the shipper functionality is in its own file, but it'd be nice if we put it behind some if-statement guards so we could drop this reference into our main Loki definition, letting users toggle a few configs and get boltdb-shipper support without having to reference the new files themselves.

Looking forward to merging this 🎊

querier_pvc_size: '10Gi',
compactor_pvc_size: '10Gi',

boltdb_shipper_shared_store: error 'must define boltdb_shipper_shared_store',
Member:

I think this should be hidden behind a boltdb_shipper_enabled variable inside _config so that we can do things like

      storage_config+: if $._config.boltdb_shipper_enabled then {
        boltdb_shipper: {
          shared_store: $._config.boltdb_shipper_shared_store,
        },
      } else {},

to avoid clutter in our configs
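As a sketch of how such a flag might be consumed downstream (the import path and the `'gcs'` value are illustrative assumptions, not from this PR):

```jsonnet
// Sketch: a downstream environment opting in to boltdb-shipper.
// The import path and 'gcs' value are illustrative, not from this PR.
local loki = import 'loki/loki.libsonnet';

loki {
  _config+:: {
    boltdb_shipper_enabled: true,        // hypothetical toggle suggested above
    boltdb_shipper_shared_store: 'gcs',  // example object store
  },
}
```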

index_period_hours: 24,
loki+: {
chunk_store_config+: {
write_dedupe_cache_config:: {},
Member:

Not sure why the dedupe cache is an empty config here. We should be able to leave it out.

Contributor:

Or do you mean we actually should not care because it will be automatically disabled? https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/#write-deduplication-disabled

sandeepsukhani (Author):

We do not use that config, so it was better to remove it instead of using the default, which would have some values set. This avoids setting a wrong expectation for someone who doesn't know much about it.
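For background, the trailing `::` in `write_dedupe_cache_config:: {}` marks the field as hidden in jsonnet, so it is omitted from the manifested YAML entirely rather than rendered with default values. A minimal illustration:

```jsonnet
// Hidden fields (declared with '::') are left out when the object
// is manifested as JSON or YAML.
{
  visible_field: 1,
  hidden_field:: 2,  // hidden: not emitted in the output
}
// Manifesting this object yields only: { "visible_field": 1 }
```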

},
storage_config+: {
boltdb_shipper: {
shared_store: $._config.boltdb_shipper_shared_store,
Member:

This should also be guarded by an if clause for those who don't use the shipper.

'boltdb.shipper.cache-location': '/data/boltdb-cache',

// Disable chunk transfer
'ingester.max-transfer-retries': 0,
Member:

I think we should try to ensure `'ingester.max-transfer-retries': 0` is specified in the config file, as per our best practices, rather than being a one-off in the args.

sandeepsukhani (Author):

When running StatefulSets, only one ingester is rolled out at a time, i.e. there would only ever be one ingester going down and coming back up, so it won't find another ingester to transfer chunks to. I have added a comment for it with the requested changes.


// The ingesters should persist index files on a persistent
// volume in order to be crash resilient.
local ingester_data_pvc =
Member:

It might be nice to expose these as hidden fields rather than local variables such that they could be overridden/altered easily, for instance if someone wanted to change storage class names.
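Concretely, the suggestion is to turn the `local` binding into a hidden field on the enclosing object. A sketch, assuming the PVC helpers already used in this file (`ingester-data` and `ingester_pvc_size` are illustrative names, not from this PR):

```jsonnet
// Instead of a local:
//   local ingester_data_pvc = ...;
// expose the PVC as a hidden field so downstream users can override it.
ingester_data_pvc::
  pvc.new('ingester-data') +  // illustrative name
  pvc.mixin.spec.resources.withRequests({ storage: $._config.ingester_pvc_size }) +
  pvc.mixin.spec.withAccessModes(['ReadWriteOnce']) +
  pvc.mixin.spec.withStorageClassName('fast'),

// A user environment could then change e.g. the storage class:
// ingester_data_pvc+:: pvc.mixin.spec.withStorageClassName('ssd'),
```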


querier_args+:: {
// Use PVC for caching
'boltdb.shipper.cache-location': '/data/boltdb-cache',
Member:

I think this should be in the config file behind an if flag

querier_service:
$.util.serviceFor($.querier_statefulset),

memcached_index_writes:: {}, // we don't dedupe index writes when using boltdb-shipper so don't deploy a cache for it.
Member:

This definitely needs to be behind an if statement

memcached_index_writes:: {}, // we don't dedupe index writes when using boltdb-shipper so don't deploy a cache for it.

// Use PVC for compactor instead of node disk.
local compactor_data_pvc =
Member:

It might be nice to expose these as hidden fields rather than local variables such that they could be overridden/altered easily, for instance if someone wanted to change storage class names.

pvc.mixin.spec.withAccessModes(['ReadWriteOnce']) +
pvc.mixin.spec.withStorageClassName('fast'),

compactor_args::
Member:

Again, I think we should prefer config file > args

sandeepsukhani (Author):

I have kept service-specific configs as args to avoid rolling out all the services unnecessarily on a config change, and it keeps service-specific args from being passed to every other service. I don't mind changing it, but it would be good to hear what you think about it.

statefulSet.mixin.spec.updateStrategy.withType('RollingUpdate') +
statefulSet.mixin.spec.template.spec.securityContext.withFsGroup(10001), // 10001 is the group ID assigned to Loki in the Dockerfile

compactor_service:
Member:

Is the service used?

sandeepsukhani (Author):

I think not, I have removed it. Thanks!

owen-d (Member) left a comment:

Looking good. Thanks for the work on this.

@sandeepsukhani sandeepsukhani merged commit 1765d7f into grafana:master Sep 17, 2020
4 participants