jsonnet for running loki using boltdb-shipper #2547

Merged: 2 commits, Sep 17, 2020
84 changes: 84 additions & 0 deletions production/ksonnet/loki/boltdb_shipper.libsonnet
@@ -0,0 +1,84 @@
{
  local pvc = $.core.v1.persistentVolumeClaim,
  local volumeMount = $.core.v1.volumeMount,
  local container = $.core.v1.container,
  local statefulSet = $.apps.v1.statefulSet,
  local service = $.core.v1.service,
  local containerPort = $.core.v1.containerPort,

  _config+:: {
    // run ingesters and queriers as statefulsets when using boltdb-shipper to avoid using node disk for storing the index.
    stateful_ingesters: if self.using_boltdb_shipper then true else super.stateful_ingesters,
    stateful_queriers: if self.using_boltdb_shipper then true else super.stateful_queriers,

    boltdb_shipper_shared_store: error 'must define boltdb_shipper_shared_store',
Member:

I think this should be hidden behind a boltdb_shipper_enabled variable inside _config so that we can do things like

      storage_config+: if $._config.boltdb_shipper_enabled then {
        boltdb_shipper: {
          shared_store: $._config.boltdb_shipper_shared_store,
        },
      } else {},

to avoid clutter in our configs.

    compactor_pvc_size: '10Gi',
    index_period_hours: if self.using_boltdb_shipper then 24 else super.index_period_hours,
    loki+: if self.using_boltdb_shipper then {
      chunk_store_config+: {
        write_dedupe_cache_config:: {},
Member:

Not sure why the dedupe cache is an empty config here. We should be able to leave it out.

Contributor:

Or do you mean we actually should not care because it will automatically be disabled? https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/#write-deduplication-disabled

Contributor Author:

We don't use that config, so it was better to remove it instead of falling back to the default, which would have some values set. That avoids setting a wrong expectation for anyone who doesn't know much about it.
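A minimal sketch of the jsonnet hidden-field (::) mechanism at work here, with illustrative names and values:

    // Overriding a visible field with :: hides it from manifestation entirely.
    local base = { chunk_store_config: { write_dedupe_cache_config: { memcached: {} } } };
    local patched = base { chunk_store_config+: { write_dedupe_cache_config:: {} } };
    // Renders chunk_store_config as an empty object: the dedupe block is gone
    // from the generated Loki config.
    std.manifestJson(patched)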

      },
      storage_config+: {
        boltdb_shipper: {
          shared_store: $._config.boltdb_shipper_shared_store,
Member:

This should also be guarded by an if clause for those who don't use the shipper.

        },
      },
    } else {},
  },

Contributor:

For cleanliness I ended up removing some options from the table_manager_args ... In my case I was moving from bigtable, so I had something like

  table_manager_args:: std.mergePatch(
    super.table_manager_args, {
      'bigtable.grpc-client-rate-limit': null,
      'bigtable.grpc-client-rate-limit-burst': null,
      'bigtable.backoff-on-ratelimits': null,
      'bigtable.table-cache.enabled': null,
    }
  ),

Not sure if it makes sense or, since bigtable isn't enabled, it doesn't really matter.
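(For reference: std.mergePatch implements JSON Merge Patch, RFC 7396, where a null value on the right-hand side deletes the corresponding key; that is what makes the override above drop the bigtable flags rather than set them to null.)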

Contributor Author:

Someone could be using both (moving to or away from bigtable) and would have to add an override again. It is safe to have those extra flags even if you are not using bigtable. We could take care of it by refactoring the jsonnet to include them only when bigtable is one of the stores, but that is time-consuming, so it would not happen anytime soon.
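A sketch of what that refactor could look like, assuming a hypothetical using_bigtable config flag (the values shown are placeholders, not real defaults):

    // Hypothetical guard: emit the bigtable flags only when bigtable is a store.
    table_manager_args+:: if $._config.using_bigtable then {
      'bigtable.grpc-client-rate-limit': 5,
      'bigtable.grpc-client-rate-limit-burst': 5,
      'bigtable.backoff-on-ratelimits': true,
      'bigtable.table-cache.enabled': true,
    } else {},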

Contributor:

Makes sense, thanks.

  ingester_args+:: if $._config.using_boltdb_shipper then {
    // Persist index in pvc
    'boltdb.shipper.active-index-directory': '/data/index',

    // Use PVC for caching
    'boltdb.shipper.cache-location': '/data/boltdb-cache',
  } else {},

  querier_args+:: if $._config.using_boltdb_shipper then {
    // Use PVC for caching
    'boltdb.shipper.cache-location': '/data/boltdb-cache',
Member:

I think this should be in the config file behind an if flag.
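A sketch of that suggestion, moving the flag into the generated config file inside _config (this assumes the boltdb_shipper block accepts a cache_location key mirroring the CLI flag, per the Loki storage docs):

    loki+: if $._config.using_boltdb_shipper then {
      storage_config+: {
        boltdb_shipper+: {
          // Same setting as -boltdb.shipper.cache-location, but in the config file.
          cache_location: '/data/boltdb-cache',
        },
      },
    } else {},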

  } else {},

  // we don't dedupe index writes when using boltdb-shipper so don't deploy a cache for it.
  memcached_index_writes:: if $._config.using_boltdb_shipper then {} else self.memcached_index_writes,

  // Use PVC for compactor instead of node disk.
  compactor_data_pvc:: if $._config.using_boltdb_shipper then
    pvc.new('compactor-data') +
    pvc.mixin.spec.resources.withRequests({ storage: $._config.compactor_pvc_size }) +
    pvc.mixin.spec.withAccessModes(['ReadWriteOnce']) +
    pvc.mixin.spec.withStorageClassName('fast')
  else {},

  compactor_args:: if $._config.using_boltdb_shipper then {
    'config.file': '/etc/loki/config/config.yaml',
    'boltdb.shipper.compactor.working-directory': '/data/compactor',
    'boltdb.shipper.compactor.shared-store': $._config.boltdb_shipper_shared_store,
    target: 'compactor',
  } else {},

  local compactor_ports =
    [
      containerPort.new(name='http-metrics', port=$._config.http_listen_port),
    ],

  compactor_container:: if $._config.using_boltdb_shipper then
    container.new('compactor', $._images.compactor) +
    container.withPorts(compactor_ports) +
    container.withArgsMixin($.util.mapToFlags($.compactor_args)) +
    container.withVolumeMountsMixin([volumeMount.new('compactor-data', '/data')]) +
    container.mixin.readinessProbe.httpGet.withPath('/ready') +
    container.mixin.readinessProbe.httpGet.withPort($._config.http_listen_port) +
    container.mixin.readinessProbe.withTimeoutSeconds(1) +
    $.util.resourcesRequests('4', '2Gi')
  else {},

  compactor_statefulset: if $._config.using_boltdb_shipper then
    statefulSet.new('compactor', 1, [$.compactor_container], $.compactor_data_pvc) +
    statefulSet.mixin.spec.withServiceName('compactor') +
    $.config_hash_mixin +
    $.util.configVolumeMount('loki', '/etc/loki/config') +
    statefulSet.mixin.spec.updateStrategy.withType('RollingUpdate') +
    statefulSet.mixin.spec.template.spec.securityContext.withFsGroup(10001)  // 10001 is the group ID assigned to Loki in the Dockerfile
  else {},
}
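For reference, $.util.mapToFlags (from the ksonnet util library used throughout these files) turns an args object into -key=value CLI arguments, so the compactor_args above should render roughly as follows (illustrative, order not guaranteed):

    // Approximate rendered arguments for the compactor container:
    // -config.file=/etc/loki/config/config.yaml
    // -boltdb.shipper.compactor.working-directory=/data/compactor
    // -boltdb.shipper.compactor.shared-store=<boltdb_shipper_shared_store>
    // -target=compactor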
9 changes: 9 additions & 0 deletions production/ksonnet/loki/config.libsonnet
@@ -9,6 +9,15 @@

    grpc_server_max_msg_size: 100 << 20,  // 100MB

    // flag for tuning things when boltdb-shipper is the current or upcoming index type.
    using_boltdb_shipper: false,

    // flags for running ingesters/queriers as statefulsets instead of deployments.
    stateful_ingesters: false,
    ingester_pvc_size: '5Gi',

    stateful_queriers: false,
    querier_pvc_size: '10Gi',

    querier: {
      // This value should be set equal to (or less than) the CPU cores of the system the querier runs.
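To opt in from a downstream environment, an override along these lines should be enough (a sketch; 'gcs' is only an example value for the shared store):

    local loki = import 'loki/loki.libsonnet';

    loki {
      _config+:: {
        using_boltdb_shipper: true,
        // Required once the shipper is enabled; the default is an error otherwise.
        boltdb_shipper_shared_store: 'gcs',
      },
    }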
1 change: 1 addition & 0 deletions production/ksonnet/loki/images.libsonnet
@@ -12,5 +12,6 @@
    tableManager: self.loki,
    query_frontend: self.loki,
    ruler: self.loki,
    compactor: self.loki,
  },
}
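Note that the compactor reuses the main Loki image; it becomes a distinct component only through the target: 'compactor' argument set in compactor_args above.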
45 changes: 40 additions & 5 deletions production/ksonnet/loki/ingester.libsonnet
@@ -1,10 +1,18 @@
{
  local container = $.core.v1.container,
  local pvc = $.core.v1.persistentVolumeClaim,
  local volumeMount = $.core.v1.volumeMount,
  local statefulSet = $.apps.v1.statefulSet,

  ingester_args::
    $._config.commonArgs {
      target: 'ingester',
    } + if $._config.stateful_ingesters then
    {
      // Disable chunk transfer when using statefulsets, since an ingester that is
      // going down won't find another ingester joining the ring to transfer its chunks to.
      'ingester.max-transfer-retries': 0,
    } else {},

  ingester_container::
    container.new('ingester', $._images.ingester) +
@@ -15,13 +23,17 @@
    container.mixin.readinessProbe.withInitialDelaySeconds(15) +
    container.mixin.readinessProbe.withTimeoutSeconds(1) +
    $.util.resourcesRequests('1', '5Gi') +
    $.util.resourcesLimits('2', '10Gi') +
    if $._config.stateful_ingesters then
      container.withVolumeMountsMixin([
        volumeMount.new('ingester-data', '/data'),
      ]) else {},

  local deployment = $.apps.v1.deployment,

  local name = 'ingester',

  ingester_deployment: if !$._config.stateful_ingesters then
    deployment.new(name, 3, [$.ingester_container]) +
    $.config_hash_mixin +
    $.util.configVolumeMount('loki', '/etc/loki/config') +
@@ -30,10 +42,33 @@
    deployment.mixin.spec.withMinReadySeconds(60) +
    deployment.mixin.spec.strategy.rollingUpdate.withMaxSurge(0) +
    deployment.mixin.spec.strategy.rollingUpdate.withMaxUnavailable(1) +
    deployment.mixin.spec.template.spec.withTerminationGracePeriodSeconds(4800)
  else {},

  ingester_data_pvc:: if $._config.stateful_ingesters then
    pvc.new('ingester-data') +
    pvc.mixin.spec.resources.withRequests({ storage: '10Gi' }) +
    pvc.mixin.spec.withAccessModes(['ReadWriteOnce']) +
    pvc.mixin.spec.withStorageClassName('fast')
  else {},

  ingester_statefulset: if $._config.stateful_ingesters then
    statefulSet.new('ingester', 3, [$.ingester_container], $.ingester_data_pvc) +
    statefulSet.mixin.spec.withServiceName('ingester') +
    $.config_hash_mixin +
    $.util.configVolumeMount('loki', '/etc/loki/config') +
    $.util.configVolumeMount('overrides', '/etc/loki/overrides') +
    $.util.antiAffinity +
    statefulSet.mixin.spec.updateStrategy.withType('RollingUpdate') +
    statefulSet.mixin.spec.template.spec.securityContext.withFsGroup(10001) +  // 10001 is the group ID assigned to Loki in the Dockerfile
    statefulSet.mixin.spec.template.spec.withTerminationGracePeriodSeconds(4800)
  else {},

  ingester_service:
    if !$._config.stateful_ingesters then
      $.util.serviceFor($.ingester_deployment)
    else
      $.util.serviceFor($.ingester_statefulset),

  local podDisruptionBudget = $.policy.v1beta1.podDisruptionBudget,
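Since the PVC templates above hardcode withStorageClassName('fast'), an environment that needs a different storage class has to override the object itself; a sketch, assuming the ksonnet mixin merges fields via spec+: so that later values win:

    local loki = import 'loki/loki.libsonnet';

    loki {
      _config+:: { stateful_ingesters: true },
      // Late-bound override of the hidden PVC template defined above.
      ingester_data_pvc+:: $.core.v1.persistentVolumeClaim.mixin.spec.withStorageClassName('standard'),
    }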
3 changes: 3 additions & 0 deletions production/ksonnet/loki/loki.libsonnet
@@ -14,5 +14,8 @@
(import 'query-frontend.libsonnet') +
(import 'ruler.libsonnet') +

// BoltDB Shipper support
(import 'boltdb_shipper.libsonnet') +

// Supporting services
(import 'memcached.libsonnet')
38 changes: 34 additions & 4 deletions production/ksonnet/loki/querier.libsonnet
@@ -1,5 +1,8 @@
{
  local container = $.core.v1.container,
  local pvc = $.core.v1.persistentVolumeClaim,
  local volumeMount = $.core.v1.volumeMount,
  local statefulSet = $.apps.v1.statefulSet,

  querier_args::
    $._config.commonArgs {
@@ -14,17 +17,44 @@
    container.mixin.readinessProbe.httpGet.withPort($._config.http_listen_port) +
    container.mixin.readinessProbe.withInitialDelaySeconds(15) +
    container.mixin.readinessProbe.withTimeoutSeconds(1) +
    $.util.resourcesRequests('4', '2Gi') +
    if $._config.stateful_queriers then
      container.withVolumeMountsMixin([
        volumeMount.new('querier-data', '/data'),
      ]) else {},

  local deployment = $.apps.v1.deployment,

  querier_deployment: if !$._config.stateful_queriers then
    deployment.new('querier', 3, [$.querier_container]) +
    $.config_hash_mixin +
    $.util.configVolumeMount('loki', '/etc/loki/config') +
    $.util.configVolumeMount('overrides', '/etc/loki/overrides') +
    $.util.antiAffinity
  else {},

  // PVC for queriers when running as statefulsets
  querier_data_pvc:: if $._config.stateful_queriers then
    pvc.new('querier-data') +
    pvc.mixin.spec.resources.withRequests({ storage: $._config.querier_pvc_size }) +
    pvc.mixin.spec.withAccessModes(['ReadWriteOnce']) +
    pvc.mixin.spec.withStorageClassName('fast')
  else {},

  querier_statefulset: if $._config.stateful_queriers then
    statefulSet.new('querier', 3, [$.querier_container], $.querier_data_pvc) +
    statefulSet.mixin.spec.withServiceName('querier') +
    $.config_hash_mixin +
    $.util.configVolumeMount('loki', '/etc/loki/config') +
    $.util.configVolumeMount('overrides', '/etc/loki/overrides') +
    $.util.antiAffinity +
    statefulSet.mixin.spec.updateStrategy.withType('RollingUpdate') +
    statefulSet.mixin.spec.template.spec.securityContext.withFsGroup(10001)  // 10001 is the group ID assigned to Loki in the Dockerfile
  else {},

  querier_service:
    if !$._config.stateful_queriers then
      $.util.serviceFor($.querier_deployment)
    else
      $.util.serviceFor($.querier_statefulset),
}