Adds WAL support (experimental) #2981

Merged
merged 58 commits into from
Nov 27, 2020

@owen-d (Member) commented Nov 23, 2020

This is intended as an intermediate PR before marking the WAL as GA. It exposes WAL configurations and defaults them to false. This is not expected to be included in a release yet as there are a few other things which need fixing:

  1. The WAL has exposed some existing race conditions in the ingester by making them more likely. Notably there are now five places where stream chunks are read or edited:
  • During writes
  • During reads
  • During transfers
  • During flushes
  • During WAL checkpointing
    Fortunately, these race conditions are (a) unlikely and (b) any data loss they cause is negated by the WAL. A follow-up PR will introduce concurrency controls around them.
  2. Further investigation into the dedupe ratio with the WAL enabled.

In order to reduce cognitive load during review, I'm submitting this initial PR which will be built upon by another.

done chan struct{}
}

func newIngesterSeriesIter(ing *Ingester) *ingesterSeriesIter {
Contributor

I think you should abstract the ingester away behind an interface defined here. This way you clearly show what you need and reduce the coupling.

This also has the advantage of making testing easier, which I think you don't have for the Iter() part.

It might require refactoring the ingester code to properly expose what you need, so we can keep this for later.

Member Author

The interface is defined as

type SeriesIter interface {
	Num() int
	Iter() <-chan *SeriesWithErr
	Stop()
}

Do you mean something else?
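
For readers following this exchange, a minimal sketch of one way to satisfy `SeriesIter`. `Series` and `SeriesWithErr` here are simplified, hypothetical stand-ins for the real types, and `sliceSeriesIter` is an illustrative name. The iterator works from a slice captured at construction time and, like the `done chan struct{}` field in the snippet above, `Stop` closes a channel so the producer goroutine exits instead of blocking on a channel nobody reads.

```go
package main

import "fmt"

// Series and SeriesWithErr are simplified, hypothetical stand-ins.
type Series struct {
	Name   string
	Chunks int
}

type SeriesWithErr struct {
	Err    error
	Series *Series
}

type SeriesIter interface {
	Num() int
	Iter() <-chan *SeriesWithErr
	Stop()
}

// sliceSeriesIter iterates over series captured when it is constructed.
type sliceSeriesIter struct {
	series []*Series
	done   chan struct{}
}

func newSliceSeriesIter(series []*Series) *sliceSeriesIter {
	return &sliceSeriesIter{series: series, done: make(chan struct{})}
}

func (it *sliceSeriesIter) Num() int { return len(it.series) }

// Iter starts a producer goroutine that sends each series, then closes
// the channel when it runs out or when Stop is called.
func (it *sliceSeriesIter) Iter() <-chan *SeriesWithErr {
	ch := make(chan *SeriesWithErr)
	go func() {
		defer close(ch)
		for _, s := range it.series {
			select {
			case ch <- &SeriesWithErr{Series: s}:
			case <-it.done:
				return
			}
		}
	}()
	return ch
}

// Stop unblocks the producer. It must be called at most once.
func (it *sliceSeriesIter) Stop() { close(it.done) }

// Compile-time check that sliceSeriesIter implements SeriesIter.
var _ SeriesIter = (*sliceSeriesIter)(nil)

func main() {
	it := newSliceSeriesIter([]*Series{{Name: "a", Chunks: 2}, {Name: "b", Chunks: 1}})
	defer it.Stop()
	for s := range it.Iter() {
		fmt.Println(s.Series.Name, s.Series.Chunks, s.Err)
	}
}
```

Because the slice is captured up front, a concurrent push cannot change which series this iterator yields, although each series' contents can still change unless they are copied as well.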

Contributor

Yes, I was talking about the Ingester passed in as a parameter to newIngesterSeriesIter here.
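
A minimal sketch of the reviewer's suggestion, assuming the iterator only needs to enumerate streams: the constructor depends on a narrow interface rather than `*Ingester`, so a test can pass an in-memory fake. `streamLister`, `ListStreams`, `seriesIter`, and `fakeIngester` are hypothetical names; as noted earlier in the thread, the real ingester would need some refactoring to expose such a method.

```go
package main

import "fmt"

// stream is a simplified, hypothetical stand-in for an ingester stream.
type stream struct {
	labels string
	chunks int
}

// streamLister is the only thing the iterator needs from the ingester.
// After refactoring, *Ingester could satisfy it.
type streamLister interface {
	ListStreams() []*stream
}

type seriesIter struct {
	src streamLister
}

func newSeriesIter(src streamLister) *seriesIter {
	return &seriesIter{src: src}
}

// Num reports how many series the source currently holds.
func (it *seriesIter) Num() int { return len(it.src.ListStreams()) }

// fakeIngester is a test double: no locks, goroutines, or storage needed.
type fakeIngester struct{ streams []*stream }

func (f *fakeIngester) ListStreams() []*stream { return f.streams }

func main() {
	fake := &fakeIngester{streams: []*stream{
		{labels: `{app="a"}`, chunks: 2},
		{labels: `{app="b"}`, chunks: 1},
	}}
	fmt.Println(newSeriesIter(fake).Num()) // 2
}
```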


for _, stream := range streams {
// TODO(owen-d): use a pool
chunks, err := toWireChunks(stream.chunks, nil)
Contributor

You most likely have a race here:

  1. The list of streams can change.
  2. The list of chunks can change too.

You should write a test that runs two goroutines, one that pushes entries to the ingester and one that reads this iterator, and then run it with go test -race.

This is definitely tricky, because you want a snapshot of the chunks, but you're using a channel that is driven by the reader, so things can change between reads.

I feel like you should either lock everything until iteration is over, which seems dangerous, or return an array of SeriesWithErr.

Member Author

This is/will be covered by two things:

  1. Streams are buffered into an intermediate slice when a checkpoint starts.
    This helps synchronize the perSeriesDuration with a list of chunks and avoids the problem of that list changing. The caveat is that an old stream may be kept around for up to the checkpoint interval (default 5m).

    // Need to buffer streams internally so the read lock isn't held trying to write to a blocked channel.
    // Read len(inst.streams) under the read lock; RUnlock without a matching RLock panics.
    inst.streamsMtx.RLock()
    streams := make([]*stream, 0, len(inst.streams))
    inst.streamsMtx.RUnlock()
    _ = inst.forAllStreams(func(stream *stream) error {
        streams = append(streams, stream)
        return nil
    })

    // Give a 10% buffer to the checkpoint duration in order to account for
    // new series, slow writes, etc.
    perSeriesDuration := (90 * c.dur) / (100 * time.Duration(n))
    ticker := time.NewTicker(perSeriesDuration)

  2. The locking is mainly covered in a follow-up PR I'm preparing.

Contributor

Please include concurrent-access tests in your follow-up PR.

}

for _, ref := range r.RefEntries {
recordPool.PutEntries(ref.Entries)
Contributor

Nit, or for later: this should be a leaky bucket, like:

BytesBufferPool = pool.New(1<<9, 1<<13, 2, func(size int) interface{} { return make([]byte, 0, size) })
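
A minimal sketch of such a size-bucketed pool for byte slices, using only `sync.Pool`. The `pool.New(1<<9, 1<<13, 2, ...)` call above appears to take a minimum size, a maximum size, a growth factor, and a constructor; `bucketedPool` below is a hypothetical, simplified stand-in with those three sizing parameters, not that package's implementation. Requests are served from the smallest class that fits, and slices outside the configured range are never pooled, so one oversized record cannot pin a large buffer.

```go
package main

import (
	"fmt"
	"sync"
)

// bucketedPool keeps one sync.Pool per capacity class:
// minSize, minSize*factor, ... up to maxSize.
type bucketedPool struct {
	sizes   []int
	buckets []sync.Pool // accessed by index and never copied
}

func newBucketedPool(minSize, maxSize, factor int) *bucketedPool {
	if minSize <= 0 || factor < 2 {
		panic("minSize must be > 0 and factor >= 2")
	}
	p := &bucketedPool{}
	for s := minSize; s <= maxSize; s *= factor {
		p.sizes = append(p.sizes, s)
	}
	p.buckets = make([]sync.Pool, len(p.sizes))
	return p
}

// Get returns an empty slice with capacity >= n.
func (p *bucketedPool) Get(n int) []byte {
	for i, size := range p.sizes {
		if n > size {
			continue
		}
		if b, ok := p.buckets[i].Get().(*[]byte); ok {
			return (*b)[:0]
		}
		return make([]byte, 0, size)
	}
	return make([]byte, 0, n) // larger than maxSize: plain allocation
}

// Put files b under the largest class its capacity can fully serve.
// Slices smaller than minSize or larger than maxSize are dropped.
func (p *bucketedPool) Put(b []byte) {
	c := cap(b)
	if len(p.sizes) == 0 || c > p.sizes[len(p.sizes)-1] {
		return
	}
	for i := len(p.sizes) - 1; i >= 0; i-- {
		if c >= p.sizes[i] {
			b = b[:0]
			p.buckets[i].Put(&b)
			return
		}
	}
}

func main() {
	p := newBucketedPool(1<<9, 1<<13, 2) // classes: 512, 1024, 2048, 4096, 8192
	buf := p.Get(600)
	fmt.Println(len(buf), cap(buf)) // 0 1024
	buf = append(buf, "entry"...)
	p.Put(buf)
	fmt.Println(cap(p.Get(1000)) >= 1000) // true
}
```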

pkg/ingester/wal.go (outdated, resolved)
pkg/ingester/stream.go (outdated, resolved)
@cyriltovena (Contributor) left a comment

I'm pretty happy with the state of this PR; this is very good work, @owen-d.

My only concern is the race raised in the Iter() code path. I'd like to see something that avoids it.

uses entry pool in stream push/tailer

removes unnecessary pool interaction

checkpointbytes comment

fillchunk helper, record resetting in tests via pool

redundant comment

defers wg done in recovery

s/num/count/

checkpoint wal uses a logger

encodeWithTypeHeader now creates its own []byte

removes pool from decodeEntries

wal stop can error
pkg/ingester/stream.go (outdated, resolved)
pkg/ingester/stream.go (outdated, resolved)
@cyriltovena (Contributor) left a comment

LGTM

@owen-d owen-d merged commit 4d9865a into grafana:master Nov 27, 2020
@owen-d owen-d deleted the wal/headblock branch November 27, 2020 14:43
@lookfirst

@owen-d

This broke your documentation: https://grafana.com/docs/loki/latest/installation/local/

failed parsing config: /opt/loki/loki-local-config.yaml: yaml: unmarshal errors:
  line 7: field wal not found in type ingester.Config

@jlj77 commented Feb 3, 2021

> @owen-d
> This broke your documentation: https://grafana.com/docs/loki/latest/installation/local/
>
> failed parsing config: /opt/loki/loki-local-config.yaml: yaml: unmarshal errors:
>   line 7: field wal not found in type ingester.Config

Confirmed.

Service came up fine with the following lines commented out of the config file:

...
ingester:
#  wal:
#    enabled: true
#    dir: /tmp/wal
#    recover: true
  lifecycler:
    address: 127.0.0.1
...
