flush boltdb to object store #1837
Conversation
Force-pushed from 08985e8 to ae64e5f.
Have you considered naming the "archiver" as "shipper", keeping the same naming as Thanos / Cortex blocks storage?
pkg/loki/modules.go
Outdated
@@ -277,6 +292,17 @@ func (t *Loki) stopTableManager() error {
}

func (t *Loki) initStore() (err error) {
    if ActiveIndexType(t.cfg.SchemaConfig) == "boltdb" && t.cfg.StorageConfig.BoltDBArchiverConfig.Enable {
        t.cfg.StorageConfig.BoltDBArchiverConfig.IngesterName = t.cfg.Ingester.LifecyclerConfig.ID
        if t.cfg.Target == Ingester {
How is this going to work in single binary mode? I think it would override one of them, no?
Active boltdb files are kept in a separate directory anyways. This is just an optimization to avoid downloading files unnecessarily.
I think you should use a switch here and include the "all" target, just for clarity of what is going on.
My concern is that we don't really know which one is picked for the "all" target; probably none, so default to ArchiverModeReadWrite?
Yes, it would be ArchiverModeReadWrite.
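For illustration, a minimal sketch of what that switch could look like (the config fields and the WriteOnly/ReadWrite constants are from this PR; ArchiverModeReadOnly and which target maps to which mode are my assumptions):

switch t.cfg.Target {
case Ingester:
    // ingesters only write index files, so upload-only
    t.cfg.StorageConfig.BoltDBArchiverConfig.Mode = local.ArchiverModeWriteOnly
case Querier:
    // queriers only read index files, so download-only
    t.cfg.StorageConfig.BoltDBArchiverConfig.Mode = local.ArchiverModeReadOnly
case All:
    // single binary: both upload and download
    t.cfg.StorageConfig.BoltDBArchiverConfig.Mode = local.ArchiverModeReadWrite
}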
pkg/loki/modules.go
Outdated
@@ -473,3 +499,16 @@ var modules = map[moduleName]module{
        deps: []moduleName{Querier, Ingester, Distributor, TableManager},
    },
}

// ActiveIndexType type returns index type which would be applicable to metrics that would be pushed starting now
Suggested change:
- // ActiveIndexType type returns index type which would be applicable to metrics that would be pushed starting now
+ // ActiveIndexType type returns index type which would be applicable to logs that would be pushed starting now
pkg/loki/modules.go
Outdated
// ActiveIndexType type returns index type which would be applicable to metrics that would be pushed starting now
// Note: Another periodic config can be applicable in future which can change index type
func ActiveIndexType(cfg chunk.SchemaConfig) string {
I know you want to add tests later, but you should write a test for that function; it doesn't look like it will change, and it seems easy to write.
pkg/storage/store.go
Outdated
MaxChunkBatchSize int `yaml:"max_chunk_batch_size"`
storage.Config `yaml:",inline"`
MaxChunkBatchSize int `yaml:"max_chunk_batch_size"`
BoltDBArchiverConfig local.ArchiverConfig `yaml:"bolt_db_archiver_config"`
Suggested change:
- BoltDBArchiverConfig local.ArchiverConfig `yaml:"bolt_db_archiver_config"`
+ BoltDBArchiverConfig local.ArchiverConfig `yaml:"boltdb_archiver_config"`
pkg/storage/store.go
Outdated
@@ -4,6 +4,10 @@ import (
    "context"
    "flag"

    "github.com/grafana/loki/pkg/storage/stores"
Something is wrong with the formatting of those 2 new packages; they should be grouped with everything Loki-related.
If you're using VS Code I think there is a way to configure it so that it does this automatically. https://gist.github.com/cyriltovena/d52bf9ae05c371ea0c8018d48d15d6bf might help you.
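For reference, goimports-style grouping would keep standard library, third-party, and Loki imports in separate blocks, roughly like this (a sketch; the exact set of surrounding imports in the file may differ):

import (
    "context"
    "flag"

    "github.com/cortexproject/cortex/pkg/chunk"

    "github.com/grafana/loki/pkg/storage/stores"
    "github.com/grafana/loki/pkg/storage/stores/local"
)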
pkg/loki/modules.go
Outdated
// We want ingester to also query the store when using boltdb
if ActiveIndexType(t.cfg.SchemaConfig) == "boltdb" {
    t.cfg.Ingester.QueryStore = true
    // When using archiver, limit max look back for query to MaxChunkAge + upload interval by archiver + 15 mins to query only data whose index is not pushed yet
Can you explain why we need a look-back parameter?
In microservices mode it would query only the data which the ingester itself has flushed, while in single binary mode it would query all the data for the whole query duration, which would mean duplicates. I set the lookback period just long enough to not miss any data.
What do you think?
I think there are 2 use cases here around a filesystem and we should figure out what we want to support:
- A shared filesystem (like SAN or NFS)
- A non-shared filesystem (each instance running on its own machine, for example)
In the second use case each ingester would always be responsible for querying all of the data, but in the first case this would be treated just like S3 or GCS... I'm almost wondering if we should just define them as different store types to be able to use in the logic here for setting QueryStoreMaxLookBack?
The other concern is that when we are using a shared object store, how much duplicate data are we going to get from each ingester, and is there a way to limit each ingester to only using the index files it generated when querying the store?
So there are 2 things here wrt storage: chunks and index files. I would say it is safe to assume chunks would always be in a shared filesystem, otherwise queriers would not be able to fetch them. For the index, it should not matter to differentiate, because we keep syncing the files anyway and we just need to query ingesters for live data that they have not uploaded or have only just uploaded, which we are already doing here. I might be wrong here, but does it make sense?
I'd like to support the case where chunks/index are not on a shared filesystem. I think we can do this easily enough if the ingester doesn't force this limited lookback but instead can do a lookback over all the data, WDYT?
It is doable, but I think it would be better to have a config instead of inferring whether the storage is shared or not. We can expose a store lookback config on ingesters which would default to being set the way we are doing it now.
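As a rough sketch of that idea (the field name and zero/negative-value semantics here are my assumptions, not part of this PR):

// Hypothetical ingester config fields for an explicit store lookback.
type Config struct {
    // ... existing ingester config ...
    QueryStore bool `yaml:"query_store"`
    // QueryStoreMaxLookBackPeriod limits how far back the ingester queries the store.
    // 0 could mean "derive from MaxChunkAge + archiver upload interval" (current behaviour),
    // and a negative value could mean "no limit, look back over all data".
    QueryStoreMaxLookBackPeriod time.Duration `yaml:"query_store_max_look_back_period"`
}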
pkg/storage/store.go
Outdated
    return err
}

storage.RegisterIndexClient("boltdb", func() (client chunk.IndexClient, e error) {
You can remove those named returns as they are not used.
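i.e. roughly the same registration without the named returns (sketch):

storage.RegisterIndexClient("boltdb", func() (chunk.IndexClient, error) {
    // ... construct and return the boltdb index client as before ...
})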
pkg/storage/stores/local/archiver.go
Outdated
// ArchiverModeWriteOnly is to allow only write operations
ArchiverModeWriteOnly

ArchiverFileUploadInterval = 15 * time.Minute
Should have a comment for the public const.
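For example, something along these lines (sketch wording):

// ArchiverFileUploadInterval is how often the archiver uploads the locally written boltdb files to the object store.
ArchiverFileUploadInterval = 15 * time.Minute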
pkg/storage/stores/local/archiver.go
Outdated
f.BoolVar(&cfg.Enable, "boltdb.archiver.enable", false, "Enable archival of boltdb files to a store")
f.StringVar(&cfg.StoreConfig.Store, "boltdb.archiver.store", "filesystem", "Store for keeping boltdb files")
f.StringVar(&cfg.CacheLocation, "boltdb.archiver.cache-location", "", "Cache location for restoring boltDB files for queries")
f.DurationVar(&cfg.CacheTTL, "boltdb.archiver.cache-ttl", 24*time.Hour, "TTL for boltDB files restored in cache for queries")
f.DurationVar(&cfg.ResyncInterval, "boltdb.archiver.resync-interval", 5*time.Minute, "Resync downloaded files with the store")
Your only documentation is these flags, and I think they need a bit more detail, unless you plan to add a full page of documentation.
I think I will have to add a full page of documentation about running it, because we can't put too much in the help text of flags.
I think it's a good call!
pkg/ingester/ingester.go
Outdated
    return err
}

err = sendBatches(queryServer.Context(), itr, queryServer, req.Limit)
I have some concerns here.
1 - You're sending data directly and stats won't reflect that change; you need some refactoring to support sending with stats.
2 - Even if you do support stats, I'm not sure you can send the data from the store and the ingester via gRPC sequentially. I could be wrong, but I think you're losing ordering by not considering and deduping lines prior to sending them. In other words, there could be overlap of data.
I'm not 100% sure, but you'll need to build tests for this if you keep it that way.
I was under the impression that whatever duplicates there are, the querier would dedupe them. I will investigate more. I feel the order of entries is more important here.
The querier dedupes per batch:

// NewQueryClientIterator returns an iterator over a QueryClient.
func NewQueryClientIterator(client logproto.Querier_QueryClient, direction logproto.Direction) EntryIterator {
    return &queryClientIterator{
        client:    client,
        direction: direction,
    }
}

func (i *queryClientIterator) Next() bool {
    for i.curr == nil || !i.curr.Next() {
        batch, err := i.client.Recv()
        if err == io.EOF {
            return false
        } else if err != nil {
            i.err = err
            return false
        }
        i.curr = NewQueryResponseIterator(i.client.Context(), batch, i.direction)
    }
    return true
}
Imagine the second batch contains half of the data of the first batch. You'll end up not deduping this data. You can use that iterator in a test to prove it's fine.
}

archiver, err := NewArchiver(archiverCfg, archiveStoreClient, func(name string, operation int) (db *bbolt.DB, e error) {
    return boltDBIndexClient.GetDB(name, operation)
This is a good candidate for an interface. I think it makes more sense than a function.
Sorry, I am not sure why we need an interface here? We would only ever have one implementation, and it looks ugly to pass boltDBIndexClient to Archiver, which itself embeds the Archiver. Maybe I am wrong, but it would be good to hear your thoughts.
type BoltDBGetter interface {
    GetLocal(name string, operation int) (*bbolt.DB, error)
}

It will be easier for you to work with. I don't have strong opinions on that one; I just like to use interfaces for decoupling. This is just a nit.
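For illustration, wiring that in could look roughly like this (a sketch; the NewArchiver parameter types are my assumptions, and I'm using the existing GetDB method name so the boltdb index client satisfies the interface directly):

// BoltDBGetter abstracts access to the locally written boltdb files.
type BoltDBGetter interface {
    GetDB(name string, operation int) (*bbolt.DB, error)
}

func NewArchiver(cfg ArchiverConfig, storeClient chunk.ObjectClient, getter BoltDBGetter) (*Archiver, error) {
    // ... use getter.GetDB(period, local.DBOperationRead) wherever the callback was used ...
}

// Caller side: the boltdb index client already implements GetDB, so it can be passed directly.
archiver, err := NewArchiver(archiverCfg, archiveStoreClient, boltDBIndexClient)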
pkg/storage/stores/local/archiver.go
Outdated

// uploadFile uploads one of the files locally written by ingesters to archive.
func (a *Archiver) uploadFile(ctx context.Context, period string) error {
    if a.cfg.Mode == ArchiverModeReadWrite {
There is a lot going on in this file; I think you can break it into at least two files, one for uploading and one for downloading.
pkg/storage/stores/local/archiver.go
Outdated
    return err
}

db, err := a.localBoltdbGetter(period, local.DBOperationRead)
I think the ingester should also use the local boltdb for the store when querying the store for extra data that might not have been delivered to others yet. It looks like currently, if we are a readwriter (single binary), the archiver ingester will also return data from other ingesters; this is not necessary and will generate a lot of duplicates.
See #1837 (comment). WDYT?
I can't think of a cleaner way to limit an ingester to querying only the data it has generated itself in single binary mode. Do you have any thoughts?
Definitely excited about the possibilities. I think you should create a cluster and run this in our environment; there are a lot of questions around maintenance and the upgrade path. I think you can speed things up by running your own cluster with canaries.
Yes, we will run it when it gets into decent shape. Thanks for all the reviews!
Force-pushed from c359313 to 4e660a9.
pkg/ingester/ingester.go
Outdated
    return err
}

itr.Push(storeItr)
Push advances the iterator.
I would create the heap at the end: have an array, var iters []iter.EntryIterator, at the beginning. If the array has only one item, use that item directly; otherwise use a NewHeapIterator.
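A minimal sketch of that pattern, using the names from this PR's code (ingesterItr is a hypothetical name for the in-memory iterator):

// Collect all iterators first.
var iters []iter.EntryIterator
iters = append(iters, ingesterItr)
if storeItr != nil {
    iters = append(iters, storeItr)
}

// Only pay for a heap iterator when there is actually more than one source.
var itr iter.EntryIterator
if len(iters) == 1 {
    itr = iters[0]
} else {
    itr = iter.NewHeapIterator(ctx, iters, req.Direction)
}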
pkg/ingester/instance.go
Outdated
defer helpers.LogError("closing iterator", iter.Close)

return sendBatches(ctx, iter, queryServer, req.Limit)
return iter.NewHeapIterator(ctx, iters, req.Direction), nil
To avoid nesting multiple HeapIterators, which can slow down execution, you could return an array of iterators here and append the store iterator later if needed.
Force-pushed from 7c3ae3c to 8d09c40.
Force-pushed from 795555c to 4ea0cb4.
Hey @periklis, I really appreciate your kind words and your interest in this. While this is not yet battle-tested, query correctness and not losing any index are of utmost importance to us as well. We have been running this for the last 2 weeks in an internal cluster alongside another cluster without the boltdb shipper. We are ensuring both of them get the same logs, and I am going to write a tool which queries both clusters, compares the results, and pushes some metrics based on the results. I will let you know when it is ready so that you can also give it a try if you want. Please feel free to reach out to me for any help/issues. We can always connect in the #loki-dev grafana slack channel.
Force-pushed from 584f3be to c29759f.
Sorry, I missed commenting on your question. See the attached image, which shows files for the ongoing week. We are running 5 stateful ingesters, so there is 1 boltdb file per ingester. We add a timestamp at the end of the filename to avoid someone not using k8s overwriting previous files after restarts, and some other issues they can run into. When the tables rotate again on Thursday we will have another folder named after the new table. If you are interested in reading the contents of the boltdb files, I think it would not be straightforward, since index entries depend on the schema version you are using, and you will have to import some of the code from the loki repo to save some effort.
cmd/loki/loki-local-config.yaml
Outdated
@@ -18,11 +18,11 @@ ingester:

schema_config:
Let's revert that file. You can add a new file next time and add it to your global gitignore.
pkg/ingester/ingester.go
Outdated
}

if start.Before(end) {
    storeRequest := recreateRequestWithTime(req, start, end)
I was not clear, but my idea was that there are a lot of ifs here and I don't think you have a test covering each of those.
I think you should create a function like this:

// buildStoreRequest returns a store request from an ingester request, returning nil if no request should be made based on the configuration. The request may be truncated due to QueryStoreMaxLookBackPeriod; explain the purpose of QueryStoreMaxLookBackPeriod here.
func buildStoreRequest(config Config, req *logproto.QueryRequest) (*logql.SelectParams, error)

This has the advantage of making it easier to test. Now you can write tests with a combination of requests and configs and set some expectations as to when a request will be fired and how/which.
Usage-wise it would look like this:
if req, err := buildStoreRequest(i.cfg, req); err == nil && req != nil {
    storeItr, err := i.store.LazyQuery(ctx, req)
    if err != nil {
        return err
    }
    itrs = append(itrs, storeItr)
}
@@ -35,3 +38,30 @@ func TestUniqueDeps(t *testing.T) {
    expected := []moduleName{Server, Overrides, Distributor, Ingester}
    assert.Equal(t, expected, uniqueDeps(input))
}

func TestActiveIndexType(t *testing.T) {
    var cfg chunk.SchemaConfig
I think you can transform those into table-driven tests with names.
Sorry, I am not sure I get it. Can you please elaborate more so that we don't have to come back again on this?
Ohh I didn't know the term. Thanks for the link!
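For reference, a named table-driven version might look roughly like this (a sketch; the period-config fields come from the Cortex schema config, and the expected results are my assumptions about how ActiveIndexType behaves):

func TestActiveIndexType(t *testing.T) {
    for _, tc := range []struct {
        name     string
        cfg      chunk.SchemaConfig
        expected string
    }{
        {
            name: "single boltdb period",
            cfg: chunk.SchemaConfig{
                Configs: []chunk.PeriodConfig{{IndexType: "boltdb"}},
            },
            expected: "boltdb",
        },
        {
            name: "last applicable period wins",
            cfg: chunk.SchemaConfig{
                Configs: []chunk.PeriodConfig{{IndexType: "aws"}, {IndexType: "boltdb"}},
            },
            expected: "boltdb",
        },
    } {
        t.Run(tc.name, func(t *testing.T) {
            assert.Equal(t, tc.expected, ActiveIndexType(tc.cfg))
        })
    }
}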
// Note: We are assuming that user would never store chunks in table based store otherwise NewObjectClient would return an error.

// ToDo: Try passing on ObjectType from Cortex to the callback for creating custom index client.
boltdbShipperEncounter := 0
Yes, we do need that to be in Cortex. I'm fine with having this hack in the meantime. How do we know that the callback is called in the same order as the configs?
We need some tests too.
@@ -56,6 +62,16 @@ func NewStore(cfg Config, storeCfg chunk.StoreConfig, schemaCfg chunk.SchemaConf
    }, nil
}

// NewTableClient creates a TableClient for managing tables for index/chunk store.
// ToDo: Add support in Cortex for registering custom table client like index client.
func NewTableClient(name string, cfg Config) (chunk.TableClient, error) {
This is supported now I think.
It is not supported yet. We have updated the Cortex version in vendor to v1.0.0, and my changes were added after that. I will have to update it again to the latest master; I think we should do that in a follow-up commit.
pkg/storage/stores/factory.go
Outdated
)

// NewObjectClient makes a new ObjectClient of the desired type.
func NewObjectClient(storeType string, cfg storage.Config) (chunk.ObjectClient, error) {
Why don't you use the factory from Cortex? https://github.com/cortexproject/cortex/blob/8f59b141ac55e8884436047cddb93498234a11e4/pkg/chunk/storage/factory.go#L280
This too needs Cortex updated to the latest master. I will open a PR to update it.
pkg/storage/stores/local/uploads.go
Outdated
    return err
}

filePath := path.Join(snapshotPath, fmt.Sprintf("%s.%d", s.uploader, time.Now().Unix()))
tempfile might be a better name.
LGTM
- files are stored in a folder per periodic table and are named after the ingester
- flushed every 15 mins to make the index available to other services
- files are also flushed before the ingester stops to avoid any data loss
- new stores can be implemented easily
- ingester to also query the store when using boltdb
…from periodic config, other refactorings
Force-pushed from 0a9aa10 to 753f57c.
@sandeepsukhani thanks for this work. May I know if there are any plans to support indexing for Azure tables similar to DynamoDB? We have an Azure backend and that's primarily the requirement we are looking for too. Thank you.
What this PR does / why we need it:
This can be useful for running Loki using just boltdb and any of the supported object stores.
Some details about implementation:
Checklist