jsonnet for running loki using boltdb-shipper #2547
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #2547      +/-   ##
==========================================
- Coverage   62.87%   62.86%   -0.01%
==========================================
  Files         170      170
  Lines       15049    15049
==========================================
- Hits         9462     9461       -1
- Misses       4826     4832       +6
+ Partials      761      756       -5
'boltdb.shipper.cache-location': '/data/boltdb-cache',

// Disable chunk transfer
'ingester.max-transfer-retries': 0,
In my testing I also had to add ingester.lifecycler.join_after: 0s as well. Do we need it here?
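If it does turn out to be needed, a minimal sketch (assuming the setting lives under the loki block of _config, matching the config path mentioned above) might look like:

loki+: {
  ingester+: {
    lifecycler+: {
      // Hypothetical: join the ring immediately instead of waiting, per the comment above.
      join_after: '0s',
    },
  },
},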
},
},
},
For cleanliness I ended up removing some options from the table_manager_args. In my case I was moving from bigtable, so I had something like:

table_manager_args:: std.mergePatch(
  super.table_manager_args, {
    'bigtable.grpc-client-rate-limit': null,
    'bigtable.grpc-client-rate-limit-burst': null,
    'bigtable.backoff-on-ratelimits': null,
    'bigtable.table-cache.enabled': null,
  }
),

Not sure if this makes sense here or whether, since bigtable is not enabled, it does not really matter.
Someone could be using both (moving to or away from bigtable) and would have to add an override again. It is safe to have those extra flags even if you are not using bigtable. We could take care of it by refactoring the jsonnet to include them only when bigtable is one of the stores, but that is time-consuming, so it would not happen anytime soon.
Makes sense, thanks.
Hey, this is looking great. I left a bunch of suggestions, many of which are non-blocking, but I'd appreciate your thoughts on them. I like how all the shipper functionality is in its own file, but it'd be nice if we put it behind some if-statement guards so that we can drop this reference into our main Loki definition and users can toggle a few configs to get boltdb-shipper support without having to reference the new files themselves.
Looking forward to merging this 🎊
querier_pvc_size: '10Gi',
compactor_pvc_size: '10Gi',

boltdb_shipper_shared_store: error 'must define boltdb_shipper_shared_store',
I think this should be hidden behind a boltdb_shipper_enabled variable inside _config so that we can do things like

storage_config+: if $._config.boltdb_shipper_enabled then {
  boltdb_shipper: {
    shared_store: $._config.boltdb_shipper_shared_store,
  },
} else {},

to avoid clutter in our configs.
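As a usage sketch (the flag name and the store value here are illustrative, not part of this PR), an environment could then enable the shipper with something like:

_config+:: {
  boltdb_shipper_enabled: true,
  boltdb_shipper_shared_store: 'gcs',
},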
index_period_hours: 24,
loki+: {
  chunk_store_config+: {
    write_dedupe_cache_config:: {},
Not sure why the dedupe cache is an empty config here. We should be able to leave it out.
Or do you mean we actually should not care because it will be automatically disabled? https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/#write-deduplication-disabled
We do not use that config, so it was better to remove it instead of using the default, which would have some values set. It avoids setting a wrong expectation for someone who doesn't know much about it.
},
storage_config+: {
  boltdb_shipper: {
    shared_store: $._config.boltdb_shipper_shared_store,
This should also be guarded by an if clause for those who don't use the shipper.
'boltdb.shipper.cache-location': '/data/boltdb-cache',

// Disable chunk transfer
'ingester.max-transfer-retries': 0,
I think we should try to ensure 'ingester.max-transfer-retries': 0 is specified in the config file, as per our best practices, rather than being a one-off thing in the args.
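For illustration only, a config-file equivalent (assuming it lands under the ingester block of _config.loki) could look something like:

loki+: {
  ingester+: {
    // Don't retry chunk transfers; see the discussion below on statefulset rollouts.
    max_transfer_retries: 0,
  },
},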
When running statefulsets, only one ingester is rolled out at a time, i.e. there would always be only one ingester going down and coming back up, so it won't find another ingester to transfer chunks to. I have added a comment for it with the requested changes.
// The ingesters should persist index files on a persistent
// volume in order to be crash resilient.
local ingester_data_pvc =
It might be nice to expose these as hidden fields rather than local variables such that they could be overridden/altered easily, for instance if someone wanted to change storage class names.
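A rough sketch of that idea, reusing only the pvc mixins already shown in this file (the field name and the downstream override are hypothetical):

// Hidden field instead of a local, so environments can override it.
ingester_data_pvc::
  pvc.mixin.spec.withAccessModes(['ReadWriteOnce']) +
  pvc.mixin.spec.withStorageClassName('fast'),

// Downstream override, e.g. to use a different storage class:
// ingester_data_pvc+:: pvc.mixin.spec.withStorageClassName('ssd'),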
querier_args+:: {
  // Use PVC for caching
  'boltdb.shipper.cache-location': '/data/boltdb-cache',
I think this should be in the config file, behind an if flag.
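A sketch of what that could look like (assuming the boltdb_shipper_enabled flag suggested earlier and the cache_location key of the boltdb_shipper storage config):

storage_config+: if $._config.boltdb_shipper_enabled then {
  boltdb_shipper+: {
    cache_location: '/data/boltdb-cache',
  },
} else {},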
querier_service:
  $.util.serviceFor($.querier_statefulset),

memcached_index_writes:: {},  // we don't dedupe index writes when using boltdb-shipper so don't deploy a cache for it.
This definitely needs to be behind an if statement.
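One possible shape for that guard (a sketch only; it assumes the boltdb_shipper_enabled flag from earlier and that the base definition exposes memcached_index_writes as an overridable field):

memcached_index_writes:
  if $._config.boltdb_shipper_enabled
  then {}  // no index-write dedupe with boltdb-shipper, so no cache to deploy
  else super.memcached_index_writes,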
memcached_index_writes:: {},  // we don't dedupe index writes when using boltdb-shipper so don't deploy a cache for it.

// Use PVC for compactor instead of node disk.
local compactor_data_pvc =
It might be nice to expose these as hidden fields rather than local variables such that they could be overridden/altered easily, for instance if someone wanted to change storage class names.
pvc.mixin.spec.withAccessModes(['ReadWriteOnce']) +
pvc.mixin.spec.withStorageClassName('fast'),

compactor_args::
Again, I think we should prefer config file > args
I have kept service-specific configs as args to avoid rolling out all the services unnecessarily on a config change, and it lets us keep service-specific args from being passed to every other service. I don't mind changing it, but it would be good to hear what you think about it.
statefulSet.mixin.spec.updateStrategy.withType('RollingUpdate') +
statefulSet.mixin.spec.template.spec.securityContext.withFsGroup(10001),  // 10001 is the group ID assigned to Loki in the Dockerfile

compactor_service:
Is the service used?
I think not, I have removed it. Thanks!
Force-pushed d0970d8 to 509aef5.
Looking good. Thanks for the work on this.
What this PR does / why we need it:
Jsonnet for deploying Loki using boltdb-shipper. It also includes the new compactor service, which helps dedupe and compact uploaded boltdb files in the store.