Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exporter/prometheusremotewrite: add a Write-Ahead-Log #3017

Conversation

odeke-em
Copy link
Member

@odeke-em odeke-em commented Apr 25, 2021

Uses an off the shelf WAL implementation to add capabilities
to the Prometheus remote exporter using github.com/tidwall/wal.

By default the WAL will be on and to configure the WAL location,
please use this prometheusremotewrite exporter YAML configuration:

exporters:
  prometheusremotewrite:
    endpoint: "http://some.url:9411/api/prom/push"
    wal:
        directory: ./waldir/wal_directory
        truncate_frequency: 200s
        cache_size: 300

whose fields are quite similar to Prometheus' WAL fields per https://docs.google.com/document/d/1cCcoFgjDFwU2n823tKuMvrIhzHty4UDyn0IcfUHiyyI/edit#heading=h.mlf37ibqjgov

We are using an off-the-shelf WAL because:
By the time that we get OTLP metrics, we can convert to Prometheus proto,
but trying to implement the Prometheus storage interfaces would involve
either by-passing to-Prometheus-Proto and then save to Prometheus raw Go,
then retrieve Prometheus raw Go and then convert to Prometheus Proto.
It is much easier to get an off the shelf WAL implementation and add
it to the Prometheus implementation.

Fixes open-telemetry/prometheus-interoperability-spec#9

@odeke-em odeke-em requested a review from a team April 25, 2021 05:56
@odeke-em
Copy link
Member Author

Work in progress

@bogdandrutu
Copy link
Member

Converting to draft per author comment that this is WIP

go.mod Show resolved Hide resolved
@odeke-em
Copy link
Member Author

odeke-em commented Apr 26, 2021 via email

@odeke-em odeke-em force-pushed the exporter-prometheusremotewrite-add-WAL branch from c6ba14f to d9a6ab1 Compare April 27, 2021 00:08
@odeke-em odeke-em marked this pull request as ready for review April 27, 2021 00:09
@odeke-em
Copy link
Member Author

Alrighty, tests added and ready for action!

@odeke-em odeke-em requested a review from bogdandrutu April 27, 2021 04:39
@odeke-em
Copy link
Member Author

/cc @rakyll @alolita @Aneurysm9

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an initiative to add generic persistent queues to all exporters: https://docs.google.com/document/d/1Y4vNthCGdYI61ezeAzL5dXWgiZ73y9eSjIDitk3zXsU/edit#

I do not think we need to have a custom solution just for this exporter unless we believe it is really necessary due to this exporter having unique requirements (in which case I would like to know what they are).

@odeke-em
Copy link
Member Author

odeke-em commented May 2, 2021

There is an initiative to add generic persistent queues to all exporters: https://docs.google.com/document/d/1Y4vNthCGdYI61ezeAzL5dXWgiZ73y9eSjIDitk3zXsU/edit#

I do not think we need to have a custom solution just for this exporter unless we believe it is really necessary due to this exporter having unique requirements (in which case I would like to know what they are).

Thank you for the comment @tigrannajaryan! We talked about this a few times in a couple of meeting, both the Prometheus Working Group and the Agent working group. Basically we are taking a pragmatic approach to meet our deadlines, AWS want to get the capability in before a deadline that's coming close and whenever the generic WAL is added, the code can simply be moved.

exporter/prometheusremotewriteexporter/wal.go Outdated Show resolved Hide resolved
exporter/prometheusremotewriteexporter/wal.go Show resolved Hide resolved
exporter/prometheusremotewriteexporter/wal.go Outdated Show resolved Hide resolved
exporter/prometheusremotewriteexporter/wal.go Outdated Show resolved Hide resolved
exporter/prometheusremotewriteexporter/wal.go Show resolved Hide resolved
exporter/prometheusremotewriteexporter/README.md Outdated Show resolved Hide resolved
exporter/prometheusremotewriteexporter/README.md Outdated Show resolved Hide resolved
exporter/prometheusremotewriteexporter/wal.go Outdated Show resolved Hide resolved
exporter/prometheusremotewriteexporter/wal.go Show resolved Hide resolved
exporter/prometheusremotewriteexporter/wal.go Show resolved Hide resolved
@odeke-em odeke-em force-pushed the exporter-prometheusremotewrite-add-WAL branch from c95cddb to 087e159 Compare May 5, 2021 19:50
@odeke-em odeke-em requested a review from tigrannajaryan May 5, 2021 20:11
@odeke-em
Copy link
Member Author

odeke-em commented May 5, 2021

Reviews addressed, please help me with another wave @rakyll @bogdandrutu @tigrannajaryan and thank you!

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can be merged if other reviewers also approve.

Copy link
Contributor

@rakyll rakyll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good, added a few comments. We can follow up with improvements.

exporter/prometheusremotewriteexporter/exporter.go Outdated Show resolved Hide resolved
exporter/prometheusremotewriteexporter/wal.go Show resolved Hide resolved
return prwe.wal.WriteBatch(batch)
}

func (prwe *PrwExporter) readFromWALThenExport(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we flushing the WAL at a Shutdown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On shutdown, the WAL's .Close() method is invoked and that toggles a shutdown, before invoking this method in (*PrwExporter).turnOnWALIfEnabled(), we derive a cancellable context.Context and after the prwe.closeChan receives a value, it also invokes cancel() which ensures that this method will terminate alright.

@odeke-em odeke-em force-pushed the exporter-prometheusremotewrite-add-WAL branch from 080f211 to 19a51fc Compare May 7, 2021 22:28
@odeke-em odeke-em requested a review from tigrannajaryan May 7, 2021 22:32
@odeke-em
Copy link
Member Author

odeke-em commented May 7, 2021

@tigrannajaryan @rakyll @bogdandrutu addressed please take a look, thank you!

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase from main and make sure the build passes.

odeke-em added a commit to orijtech/opentelemetry-collector that referenced this pull request Aug 14, 2021
This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in PR open-telemetry#3597, which
was split off from PR open-telemetry#3017.

Note: there is very rare condition for which we can perhaps send the
same data many times and it can happen if we haven't yet truncated the
WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry#3017
Updates PR open-telemetry#3597
Fixes open-telemetry/prometheus-interoperability-spec#9
odeke-em added a commit to orijtech/opentelemetry-collector that referenced this pull request Aug 14, 2021
This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in PR open-telemetry#3597, which
was split off from PR open-telemetry#3017.

Note: there is very rare condition for which we can perhaps send the
same data many times and it can happen if we haven't yet truncated the
WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry#3017
Updates PR open-telemetry#3597
Fixes open-telemetry/prometheus-interoperability-spec#9
odeke-em added a commit to orijtech/opentelemetry-collector that referenced this pull request Aug 16, 2021
This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in PR open-telemetry#3597, which
was split off from PR open-telemetry#3017.

Note: there is very rare condition for which we can perhaps send the
same data many times and it can happen if we haven't yet truncated the
WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry#3017
Updates PR open-telemetry#3597
Fixes open-telemetry/prometheus-interoperability-spec#9
odeke-em added a commit to orijtech/opentelemetry-collector that referenced this pull request Aug 16, 2021
This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in PR open-telemetry#3597, which
was split off from PR open-telemetry#3017.

Note: there is very rare condition for which we can perhaps send the
same data many times and it can happen if we haven't yet truncated the
WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry#3017
Updates PR open-telemetry#3597
Fixes open-telemetry/prometheus-interoperability-spec#9
odeke-em added a commit to orijtech/opentelemetry-collector that referenced this pull request Aug 16, 2021
This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in PR open-telemetry#3597, which
was split off from PR open-telemetry#3017.

Note: there is very rare condition for which we can perhaps send the
same data many times and it can happen if we haven't yet truncated the
WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry#3017
Updates PR open-telemetry#3597
Fixes open-telemetry/prometheus-interoperability-spec#9
odeke-em added a commit to orijtech/opentelemetry-collector that referenced this pull request Aug 16, 2021
This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in PR open-telemetry#3597, which
was split off from PR open-telemetry#3017.

Note: there is very rare condition for which we can perhaps send the
same data many times and it can happen if we haven't yet truncated the
WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry#3017
Updates PR open-telemetry#3597
Fixes open-telemetry/prometheus-interoperability-spec#9
odeke-em added a commit to orijtech/opentelemetry-collector that referenced this pull request Aug 16, 2021
This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in PR open-telemetry#3597, which
was split off from PR open-telemetry#3017.

Note: there is very rare condition for which we can perhaps send the
same data many times and it can happen if we haven't yet truncated the
WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry#3017
Updates PR open-telemetry#3597
Fixes open-telemetry/prometheus-interoperability-spec#9
odeke-em added a commit to orijtech/opentelemetry-collector that referenced this pull request Aug 16, 2021
This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in PR open-telemetry#3597, which
was split off from PR open-telemetry#3017.

Note: there is very rare condition for which we can perhaps send the
same data many times and it can happen if we haven't yet truncated the
WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry#3017
Updates PR open-telemetry#3597
Fixes #3727
Fixes open-telemetry/prometheus-interoperability-spec#9
odeke-em added a commit to orijtech/opentelemetry-collector that referenced this pull request Aug 22, 2021
This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in PR open-telemetry#3597, which
was split off from PR open-telemetry#3017.

Note: there is very rare condition for which we can perhaps send the
same data many times and it can happen if we haven't yet truncated the
WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry#3017
Updates PR open-telemetry#3597
Fixes #3727
Fixes open-telemetry/prometheus-interoperability-spec#9
odeke-em added a commit to orijtech/opentelemetry-collector-contrib that referenced this pull request Sep 11, 2021
This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in
PR open-telemetry/opentelemetry-collector#3597, which
was split off from PR open-telemetry/opentelemetry-collector#3017.

Note: there is very rare condition for which we can perhaps send the
same data a couple of times, and it can happen if we haven't yet truncated
the WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry/opentelemetry-collector#3017
Updates PR open-telemetry/opentelemetry-collector#3597
Fixes #4751
Fixes open-telemetry/prometheus-interoperability-spec#9
jpkrohling added a commit to open-telemetry/opentelemetry-collector-contrib that referenced this pull request Mar 7, 2022
* exporter/prometheusremotewrite: glue up and use Write-Ahead-Log

This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in
PR open-telemetry/opentelemetry-collector#3597, which
was split off from PR open-telemetry/opentelemetry-collector#3017.

Note: there is very rare condition for which we can perhaps send the
same data a couple of times, and it can happen if we haven't yet truncated
the WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry/opentelemetry-collector#3017
Updates PR open-telemetry/opentelemetry-collector#3597
Fixes #4751
Fixes open-telemetry/prometheus-interoperability-spec#9

* fix go.mod

Signed-off-by: Anthony J Mirabella <[email protected]>

* exporter/prw: WALConfig should be exported to allow programmatic manipulation

Signed-off-by: Anthony J Mirabella <[email protected]>

* Add CHANGELOG entry

Signed-off-by: Anthony J Mirabella <[email protected]>

* fix lint error

Signed-off-by: Anthony J Mirabella <[email protected]>

* mod tidy

Signed-off-by: Anthony J Mirabella <[email protected]>

* tidy up WAL logic and comments

Signed-off-by: Anthony J Mirabella <[email protected]>

* Handle error from closing WAL

Signed-off-by: Anthony J Mirabella <[email protected]>

* Ensure WAL processor keeps running after error

Signed-off-by: Anthony J Mirabella <[email protected]>

* prwe/WAL: refactor WAL run loop

Signed-off-by: Anthony J Mirabella <[email protected]>

* make gotidy

Signed-off-by: Anthony J Mirabella <[email protected]>

* lint

Signed-off-by: Anthony J Mirabella <[email protected]>

* fix data races

Signed-off-by: Anthony J Mirabella <[email protected]>

* Ensure locking around entire WAL persistence routine

Signed-off-by: Anthony J Mirabella <[email protected]>

* Address PR feedback

Signed-off-by: Anthony J Mirabella <[email protected]>

* fix lint issues

Signed-off-by: Anthony J Mirabella <[email protected]>

* update read index inside WAL reader routine

Signed-off-by: Anthony J Mirabella <[email protected]>

* Update CHANGELOG.md

* Undo unrelated go.mod changes

Signed-off-by: Anthony J Mirabella <[email protected]>

Co-authored-by: Emmanuel T Odeke <[email protected]>
Co-authored-by: Alex Boten <[email protected]>
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
foxlegend pushed a commit to foxlegend/opentelemetry-collector-contrib that referenced this pull request May 16, 2022
* exporter/prometheusremotewrite: glue up and use Write-Ahead-Log

This change completes the WAL capability for the Prometheus Remote-Write exporter
that allows recovery of data if the prior write requests weren't yet exported.

This wires up a Write-Ahead-Log (WAL) that was implemented in
PR open-telemetry/opentelemetry-collector#3597, which
was split off from PR open-telemetry/opentelemetry-collector#3017.

Note: there is very rare condition for which we can perhaps send the
same data a couple of times, and it can happen if we haven't yet truncated
the WAL, the RemoteWrite endpoint received the prior data but the process
received a Ctrl+C, kill signal. However, this will be very rare and
would have to be timed so fast and precisely.

Replaces PR open-telemetry/opentelemetry-collector#3017
Updates PR open-telemetry/opentelemetry-collector#3597
Fixes open-telemetry#4751
Fixes open-telemetry/prometheus-interoperability-spec#9

* fix go.mod

Signed-off-by: Anthony J Mirabella <[email protected]>

* exporter/prw: WALConfig should be exported to allow programmatic manipulation

Signed-off-by: Anthony J Mirabella <[email protected]>

* Add CHANGELOG entry

Signed-off-by: Anthony J Mirabella <[email protected]>

* fix lint error

Signed-off-by: Anthony J Mirabella <[email protected]>

* mod tidy

Signed-off-by: Anthony J Mirabella <[email protected]>

* tidy up WAL logic and comments

Signed-off-by: Anthony J Mirabella <[email protected]>

* Handle error from closing WAL

Signed-off-by: Anthony J Mirabella <[email protected]>

* Ensure WAL processor keeps running after error

Signed-off-by: Anthony J Mirabella <[email protected]>

* prwe/WAL: refactor WAL run loop

Signed-off-by: Anthony J Mirabella <[email protected]>

* make gotidy

Signed-off-by: Anthony J Mirabella <[email protected]>

* lint

Signed-off-by: Anthony J Mirabella <[email protected]>

* fix data races

Signed-off-by: Anthony J Mirabella <[email protected]>

* Ensure locking around entire WAL persistence routine

Signed-off-by: Anthony J Mirabella <[email protected]>

* Address PR feedback

Signed-off-by: Anthony J Mirabella <[email protected]>

* fix lint issues

Signed-off-by: Anthony J Mirabella <[email protected]>

* update read index inside WAL reader routine

Signed-off-by: Anthony J Mirabella <[email protected]>

* Update CHANGELOG.md

* Undo unrelated go.mod changes

Signed-off-by: Anthony J Mirabella <[email protected]>

Co-authored-by: Emmanuel T Odeke <[email protected]>
Co-authored-by: Alex Boten <[email protected]>
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
hughesjj pushed a commit to hughesjj/opentelemetry-collector that referenced this pull request Apr 27, 2023
…#3017)

Bumps [setuptools](https://github.com/pypa/setuptools) from 67.6.1 to 67.7.1.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/CHANGES.rst)
- [Commits](pypa/setuptools@v67.6.1...v67.7.1)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
swiatekm pushed a commit to swiatekm/opentelemetry-collector that referenced this pull request Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support write-ahead log (WAL) capabilities similar to Prometheus server
7 participants