
jsonnet for running loki using boltdb-shipper #2547

Merged: 2 commits into grafana:master on Sep 17, 2020

Conversation

sandeepsukhani (Contributor):

What this PR does / why we need it:
Jsonnet for deploying Loki using boltdb-shipper. It also adds the new compactor service, which dedupes and compacts the boltdb files uploaded to the store.

codecov-commenter commented Aug 25, 2020:

Codecov Report

Merging #2547 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2547      +/-   ##
==========================================
- Coverage   62.87%   62.86%   -0.01%     
==========================================
  Files         170      170              
  Lines       15049    15049              
==========================================
- Hits         9462     9461       -1     
- Misses       4826     4832       +6     
+ Partials      761      756       -5     
Impacted Files Coverage Δ
pkg/canary/comparator/comparator.go 78.43% <0.00%> (-2.36%) ⬇️
pkg/promtail/targets/file/filetarget.go 66.28% <0.00%> (-0.58%) ⬇️
pkg/logql/evaluator.go 92.88% <0.00%> (+0.40%) ⬆️
pkg/promtail/targets/file/tailer.go 75.00% <0.00%> (+4.16%) ⬆️

'boltdb.shipper.cache-location': '/data/boltdb-cache',

// Disable chunk transfer
'ingester.max-transfer-retries': 0,
Contributor:

In my testing I also had to add `ingester.lifecycler.join_after: 0s`. Do we need it here?

},
},
},

Contributor:

For cleanliness I ended up removing some options from the `table_manager_args`. In my case I was moving from Bigtable, so I had something like:

  table_manager_args:: std.mergePatch(
    super.table_manager_args, {
      'bigtable.grpc-client-rate-limit': null,
      'bigtable.grpc-client-rate-limit-burst': null,
      'bigtable.backoff-on-ratelimits': null,
      'bigtable.table-cache.enabled': null,
    }
  ),

I'm not sure if it makes sense, or whether it doesn't really matter since Bigtable is not enabled.

sandeepsukhani (Author):

Someone could be using both (moving to or away from Bigtable) and would have to add an override again. It is safe to have those extra flags even if you are not using Bigtable. We could take care of it by refactoring the jsonnet to include them only when Bigtable is one of the stores, but that is time-consuming, so it would not happen anytime soon.
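As a sketch of what that refactoring might look like, the `std.mergePatch` removal above could be made conditional. Here `bigtable_enabled` is a hypothetical `_config` flag, not something defined in this PR:

```jsonnet
// Sketch only: strip the Bigtable flags unless Bigtable is one of the stores.
// `bigtable_enabled` is a hypothetical _config flag, not part of this PR.
table_manager_args:: std.mergePatch(
  super.table_manager_args,
  if $._config.bigtable_enabled then {} else {
    'bigtable.grpc-client-rate-limit': null,
    'bigtable.grpc-client-rate-limit-burst': null,
    'bigtable.backoff-on-ratelimits': null,
    'bigtable.table-cache.enabled': null,
  }
),
```

Since `std.mergePatch` treats `null` values as deletions, merging an empty object leaves the args untouched when the flag is set.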

Contributor:

Makes sense, thanks!

owen-d (Member) left a comment:

Hey, this is looking great. I left a bunch of suggestions, many of which are non-blocking, but I'd appreciate your thoughts on them. I like how all the shipper functionality is in its own file, but it'd be nice if we put it behind some if-statement guards so we could drop this reference into our main Loki definition, letting users toggle a few configs and get boltdb-shipper support without having to reference the new files themselves.

Looking forward to merging this 🎊

querier_pvc_size: '10Gi',
compactor_pvc_size: '10Gi',

boltdb_shipper_shared_store: error 'must define boltdb_shipper_shared_store',
Member:

I think this should be hidden behind a boltdb_shipper_enabled variable inside _config so that we can do things like

      storage_config+: if $._config.boltdb_shipper_enabled then {
        boltdb_shipper: {
          shared_store: $._config.boltdb_shipper_shared_store,
        },
      } else {},

to avoid clutter in our configs
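As a sketch of how such a flag might be consumed downstream (the import path and the `'gcs'` value are illustrative assumptions, not from this PR):

```jsonnet
// Sketch: a downstream environment opting in to boltdb-shipper.
// The import path and 'gcs' value are illustrative, not from this PR.
local loki = import 'loki/loki.libsonnet';

loki {
  _config+:: {
    boltdb_shipper_enabled: true,        // hypothetical toggle suggested above
    boltdb_shipper_shared_store: 'gcs',  // example object store
  },
}
```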

index_period_hours: 24,
loki+: {
chunk_store_config+: {
write_dedupe_cache_config:: {},
Member:

Not sure why the dedupe cache is an empty config here. We should be able to leave it out.

Contributor:

Or do you mean we actually should not care because it will be automatically disabled? https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/#write-deduplication-disabled

sandeepsukhani (Author):

We do not use that config, so it was better to remove it instead of using the default, which would have some values set. This avoids setting a wrong expectation for someone who doesn't know much about it.
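For background, the trailing `::` in `write_dedupe_cache_config:: {}` marks the field as hidden in jsonnet, so it is omitted from the manifested YAML entirely rather than rendered with default values. A minimal illustration:

```jsonnet
// Hidden fields (declared with '::') are left out when the object
// is manifested as JSON or YAML.
{
  visible_field: 1,
  hidden_field:: 2,  // hidden: not emitted in the output
}
// Manifesting this object yields only: { "visible_field": 1 }
```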

},
storage_config+: {
boltdb_shipper: {
shared_store: $._config.boltdb_shipper_shared_store,
Member:

This should also be guarded by an if clause for those who don't use the shipper.

'boltdb.shipper.cache-location': '/data/boltdb-cache',

// Disable chunk transfer
'ingester.max-transfer-retries': 0,
Member:

I think we should try to ensure `'ingester.max-transfer-retries': 0` is specified in the config file, as per our best practices, rather than being a one-off in the args.

sandeepsukhani (Author):

When running StatefulSets, only one ingester is rolled out at a time, i.e. there would only ever be one ingester going down and coming back up, so it won't find another ingester to transfer chunks to. I have added a comment for it with the requested changes.


// The ingesters should persist index files on a persistent
// volume in order to be crash resilient.
local ingester_data_pvc =
Member:

It might be nice to expose these as hidden fields rather than local variables such that they could be overridden/altered easily, for instance if someone wanted to change storage class names.
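Concretely, the suggestion is to turn the `local` binding into a hidden field on the enclosing object. A sketch, assuming the PVC helpers already used in this file (`ingester-data` and `ingester_pvc_size` are illustrative names, not from this PR):

```jsonnet
// Instead of a local:
//   local ingester_data_pvc = ...;
// expose the PVC as a hidden field so downstream users can override it.
ingester_data_pvc::
  pvc.new('ingester-data') +  // illustrative name
  pvc.mixin.spec.resources.withRequests({ storage: $._config.ingester_pvc_size }) +
  pvc.mixin.spec.withAccessModes(['ReadWriteOnce']) +
  pvc.mixin.spec.withStorageClassName('fast'),

// A user environment could then change e.g. the storage class:
// ingester_data_pvc+:: pvc.mixin.spec.withStorageClassName('ssd'),
```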


querier_args+:: {
// Use PVC for caching
'boltdb.shipper.cache-location': '/data/boltdb-cache',
Member:

I think this should be in the config file behind an if flag

querier_service:
$.util.serviceFor($.querier_statefulset),

memcached_index_writes:: {}, // we don't dedupe index writes when using boltdb-shipper so don't deploy a cache for it.
Member:

This definitely needs to be behind an if statement

memcached_index_writes:: {}, // we don't dedupe index writes when using boltdb-shipper so don't deploy a cache for it.

// Use PVC for compactor instead of node disk.
local compactor_data_pvc =
Member:

It might be nice to expose these as hidden fields rather than local variables such that they could be overridden/altered easily, for instance if someone wanted to change storage class names.

pvc.mixin.spec.withAccessModes(['ReadWriteOnce']) +
pvc.mixin.spec.withStorageClassName('fast'),

compactor_args::
Member:

Again, I think we should prefer config file > args

sandeepsukhani (Author):

I have kept service-specific configs as args to avoid rolling out all the services unnecessarily on a config change, and it keeps service-specific args from being passed to every other service. I don't mind changing it, but it would be good to hear what you think about it.

statefulSet.mixin.spec.updateStrategy.withType('RollingUpdate') +
statefulSet.mixin.spec.template.spec.securityContext.withFsGroup(10001), // 10001 is the group ID assigned to Loki in the Dockerfile

compactor_service:
Member:

Is the service used?

sandeepsukhani (Author):

I think not, I have removed it. Thanks!

owen-d (Member) left a comment:

Looking good. Thanks for the work on this.

@sandeepsukhani sandeepsukhani merged commit 1765d7f into grafana:master Sep 17, 2020
4 participants