Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exporterhelper should reject a queue size of 0 #4584

Closed
mx-psi opened this issue Dec 20, 2021 · 26 comments · Fixed by #6545
Closed

exporterhelper should reject a queue size of 0 #4584

mx-psi opened this issue Dec 20, 2021 · 26 comments · Fixed by #6545
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@mx-psi
Copy link
Member

mx-psi commented Dec 20, 2021

Describe the bug

exporterhelper allows for a QueueSize of 0, which results in a permanently blocked exporter that drops all data (see open-telemetry/opentelemetry-collector-contrib#6718 (comment)).

Steps to reproduce

Create a pipeline with an exporter that exposes the QueueSettings struct and set the queue size to 0.

What did you expect to see?

The Collector would fail to start and provide a descriptive error.

What did you see instead?

The Collector fails each time a payload gets to the exporter with a message similar to:

exporterhelper/queued_retry.go:99       Dropping data because sending_queue is full. Try increasing queue_size.   {"kind": "exporter", "name": "otlp", "dropped_items": 3}

What version did you use?
Version: (e.g., v0.4.0, 1eb551b, etc)

Latest commit reproduces 81ab024

What config did you use?
Config: (e.g. the yaml config file)

Example config with contrib distro
receivers:
  hostmetrics:
    scrapers:
      load:
    collection_interval: 2s

exporters:
  otlp:
    endpoint: 0.0.0.0:4317
    sending_queue:
      queue_size: 0

  
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [otlp]

Environment

not relevant

@mx-psi mx-psi added the bug Something isn't working label Dec 20, 2021
@DiptoChakrabarty
Copy link

@mx-psi I would like to work on this , as I can understand from the codebase the error occurs here

if !qrs.queue.Produce(req) {
qrs.logger.Error(
"Dropping data because sending_queue is full. Try increasing queue_size.",
zap.Int("dropped_items", req.count()),
)
span.AddEvent("Dropped item, sending_queue is full.", trace.WithAttributes(qrs.traceAttributes...))
return errSendingQueueIsFull

I was thinking of adding a condition to check if queue size is set to 0 and returning the relevant error message else returning the error already present in the code

@jpkrohling jpkrohling added the good first issue Good for newcomers label Jan 3, 2022
@bogdandrutu
Copy link
Member

@DiptoChakrabarty I think we should fail fast, which means we should add a "Validate" on the ExporterSettings and ask users to call that when they embed the config. This change may take some time to "propagate" to all users, so in the meantime we can probably return an error in "Start" for couple of versions.

@mx-psi
Copy link
Member Author

mx-psi commented Jan 10, 2022

I agree with Bogdan's suggested approach @DiptoChakrabarty

@FreakyNobleGas
Copy link
Contributor

Hey, I'm new to contributing and noticed this issue was starting to go stale. I would like to take a stab at it if that's alright.

I've setup my environment locally, and wanted to confirm that I am looking at the correct location for the start function, is this correct?

Also, I noticed that in the Prometheus exporter where @mx-psi made a similar fix, we are checking for negative values, I don't believe that has been implemented for the OTLP exporter.

@mx-psi
Copy link
Member Author

mx-psi commented Feb 2, 2022

Hey @FreakyNobleGas,

I would like to take a stab at it if that's alright.

sure, go ahead :)

I've setup my environment locally, and wanted to confirm that I am looking at the correct location for the start function, is this correct?

If we were to check this at startup, I think that would be the right spot. However, as Bogdan pointed above, a better approach is to add a Validate() error method to the QueueSettings struct. This method would then be called by individual exporters that use the QueueSettings in their Validate method (e.g. see here the OTLP exporter Validate method). The advantage of this is that we can do a 'dry run' to check configuration without actually running the Collector.

So I believe the plan is to:

  1. open a first PR to add the Validate method to QueueSettings and maybe add it to the opentelemetry-collector components.
  2. open other PRs on the opentelemetry-collector-contrib to call the Validate method on every component from contrib that uses the QueueSettings struct.

Does that make sense?

Also, I noticed that in the Prometheus exporter where @mx-psi made a similar fix, we are checking for negative values, I don't believe that has been implemented for the OTLP exporter.

We can also check for negative values here, they are also nonsensical.

@FreakyNobleGas
Copy link
Contributor

That makes sense to me and I really appreciate the guidance here. I'll start working on this. :)

@TylerHelmuth
Copy link
Member

Can this issue be closed?

@mx-psi
Copy link
Member Author

mx-psi commented Mar 22, 2022

@TylerHelmuth While the Validate method is available on the QueueSettings, we still need to call QueueSettings.Validate on each Config.Validate method on opentelemetry-collector and opentelemetry-collector-contrib. This should be quite easy if you want to give it a try!

@TylerHelmuth
Copy link
Member

@mx-psi yes, but I've got some other items to complete first. Once I've got those done I'll check back and if its still up for grabs I'll take it.

@bogdandrutu
Copy link
Member

bogdandrutu commented Mar 23, 2022

@mx-psi. I think we should use reflection in the Config.Validation and look for all members that implement Validate and call it?

See https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configtest/configtest.go#L44 where we check for tags, but we can have the same logic for validate.

@mx-psi
Copy link
Member Author

mx-psi commented Mar 23, 2022

I think we should use reflection in the Config.Validation and look for all members that implement Validate and call it?

This sounds like a good idea, but it's unclear to me where this should happen. Would the service recursively check all struct fields on config.Config and call Validate for all of them? Or would this be a default implementation of Validate for component config structs?

@FreakyNobleGas
Copy link
Contributor

@TylerHelmuth, if you're not already working on this, I would like to work on it. :)

@TylerHelmuth
Copy link
Member

@FreakyNobleGas its all yours.

@FreakyNobleGas
Copy link
Contributor

Hey @mx-psi & @bogdandrutu, I've been attempting to implement the approach bogdandrutu described using reflection, but having some trouble with getting the Validate method using reflect.

Just as a proof of concept, my initial approach was to go through each Exporter in config.go and find the nested structures that implement the Validate method.

Here is the function I wrote to accomplish this, but the conditional to check if Validate is a method for the struct is never hit even though the struct does have a Validate function. (I confirmed this with print statements that I've removed for clarity)

func findValidateMethod(r reflect.Type) {
	
	switch r.Kind() {
	case reflect.Ptr:
		r = r.Elem()
		findValidateMethod(r)
	case reflect.Struct:
		nf := r.NumField()
		for i := 0; i < nf; i++ {
			f := r.Field(i).Type

			method := reflect.ValueOf(&f).MethodByName("Validate")
			if method.IsValid() {
				fmt.Println("Validate exists!")
			}

			if f.Kind() == reflect.Struct {
				findValidateMethod(f)
			}
		}
	}
}

@mx-psi
Copy link
Member Author

mx-psi commented Apr 1, 2022

I am not familiar enough with reflection in Go to really help with this, but I can say that I managed to reproduce it https://go.dev/play/p/o73crbqvPbE and the problem is that there are zero methods on the reflect.Value you are building.

@FreakyNobleGas
Copy link
Contributor

@mx-psi, this was extremely helpful! After taking a look at your code and an example from GeekforGeeks, I realized that the function should take type "reflect.Value" instead of "reflect.Type". With this change, the program now reports that Validate() exists. Here's a link to the example you sent with this modification.

Another lesson learned from your example is that the receiver type for the struct method will also have an impact. If an addressable struct is passed to the MethodByName function, it will be able to detect struct methods with both pointer and value receivers, however, if the struct is not addressable, then only methods with a value receiver will be detected. The reflection package has a function called Addr() which can return a pointer to the struct, but only if it is addressable. In the Go Playground example I shared, an example of an unaddressable struct is SubConfigOne.

When I tested this code in config.go, the function is now reporting that Validate does exist on structs such as exporterhelper.QueueSettings. (Woot!) My only concern is not being able to catch the Validate method on unaddressable structs, which introduces the possibility that Validate wouldn't be executed and with this implementation, we wouldn't be able to detect it.

Curious on your thoughts and thank you again for the help.

@mx-psi
Copy link
Member Author

mx-psi commented Apr 4, 2022

Glad to see I could help @FreakyNobleGas. Being able to run Validate on struct fields like SubConfigOne is important, since many of the components use such structs. Again, definitely not as familiar as I would like with reflect, but I think reflect.Indirect will help you here.

@bogdandrutu
Copy link
Member

@mx-psi have you looked into https://github.com/go-playground/validator it may be easier to use something that already offers some kind of support. Not sure if it helps, just throwing ideas to you :))

@FreakyNobleGas
Copy link
Contributor

Hey @mx-psi & @bogdandrutu, this took me forever, but I believe I have a proper solution to this issue using the standard reflect library that can call methods of both by value and by pointer regardless if the struct was passed as a pointer or by value.

Here's the code that represents this solution: https://go.dev/play/p/1Y8d-mXR4IP
(@bogdandrutu, I wanted to let you know that I'm not ignoring your last idea, I just wanted to share this first in case this is a good solution :) )

@bogdandrutu
Copy link
Member

bogdandrutu commented Apr 14, 2022

Look here how I check that a reflect.Value implements an interface. You should not look for method, you should check if it implements an interface: https://github.com/open-telemetry/opentelemetry-collector/pull/4879/files#diff-42d3514f5e3be7f3e45e57a1a6fddedacd15835b09a044cc74107f972973d1adR227

@FreakyNobleGas
Copy link
Contributor

Apologies if I'm misunderstanding this, but I thought the intention was to call Validate() on structs that exist within an interface such as QueueSettings.

@bogdandrutu
Copy link
Member

bogdandrutu commented Apr 15, 2022

@FreakyNobleGas indeed, but go is duck typing, see https://medium.com/@matryer/golang-advent-calendar-day-one-duck-typing-a513aaed544d

TLDR: If a struct has a "Validate() error" func it means it implements that interface.

@FreakyNobleGas
Copy link
Contributor

Hey @bogdandrutu,

I was able to get a working solution for this, but I'm curious if there is still a need since config/config.go has been deprecated?

Here is the link to my solution to call all validate methods based on your recommendation to use duck typing. I can move it to another location if you still think it is useful.

@mx-psi
Copy link
Member Author

mx-psi commented May 23, 2022

I'm curious if there is still a need since config/config.go has been deprecated?

There is still a need. The config package is not being removed, it's just being moved to a different place in the repository.

mattsains added a commit to mattsains/opentelemetry-collector that referenced this issue Nov 17, 2022
mattsains added a commit to mattsains/opentelemetry-collector that referenced this issue Nov 17, 2022
@mattsains
Copy link
Contributor

mattsains commented Nov 18, 2022

I was working on this by making some modifications to @FreakyNobleGas 's solution and integrated it into the new config implementation here: 84e49f1. However, it looks like the implementation changed again in #6572 so I'll be re-targetting my changes on top of that new PR.

mattsains added a commit to mattsains/opentelemetry-collector that referenced this issue Nov 18, 2022
mattsains added a commit to mattsains/opentelemetry-collector that referenced this issue Nov 18, 2022
@mattsains
Copy link
Contributor

It looks like this functionality is already being worked on in this draft PR: #6545

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
7 participants