Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

awscloudwatchlogsexporter exporter needs to support dynamic log stream names #10887

Closed
thomasbaldwin opened this issue Jun 10, 2022 · 17 comments
Closed
Labels

Comments

@thomasbaldwin
Copy link
Contributor

Describe the bug
If you are running a daemonset with a configmap that uses the awscloudwatchlogsexporter plugin as an exporter, and you have more than one node, you will have an issue in which multiple actors are trying to write to the same log stream at the same time. This causes an issue with cloud watch in which it will say you have an invalid sequence token.

Steps to reproduce
If possible, provide a recipe for reproducing the error.

What did you expect to see?
I would like to have something like the awsemfexporter has, in which you can provide {NodeName} as the log stream name, and it will resolve it to the k8s.node.name

What did you see instead?
No ability to do this

What version did you use?
0.51

What config did you use?
Config: (e.g. the yaml config file)

Environment
AWS EKS

@thomasbaldwin thomasbaldwin added the bug Something isn't working label Jun 10, 2022
@djaglowski djaglowski added the priority:p2 Medium label Jun 10, 2022
@djaglowski
Copy link
Member

@boostchicken, pinging you as code owner.

@boostchicken
Copy link
Member

boostchicken commented Jun 10, 2022

Sure let me give this a look and see if I can colme up with an LOE, thanks @djaglowski

@boostchicken
Copy link
Member

boostchicken commented Jun 10, 2022

@thomasbaldwin can you provide some configs or more artifacts that cause this issue? If I can easily reproduce it I am sure I can find a quick fix.

I am hesitant to put a templating language in there. Not saying no, I just want to discuss if it is inline with Otel as a whole. It would be very easy just to add 4 unique chars to the start of the name, or a date, but bringing in a whole templating language is a big change for OTel-contrib as a whole and whatever we do I want to make sure it is reusable.

@djaglowski who else should we involve in this discussion in terms of the overall direction that the collector is headed. I don't want to add a bunch of functionality that is out of line with the project as a whole. I asked for some extra opinions on slack as well.

@djaglowski
Copy link
Member

Thanks for the quick response @boostchicken.

Looking at this a little closer, it's not clear to me that this is really an issue for this component to solve. If I'm understanding the issue correctly, the proposed idea ultimately is adding or modifying an attribute on the log record. Is that right? If so, it probably should be handled by a processor (eg attributesprocessor or transformprocessor).

@boostchicken
Copy link
Member

boostchicken commented Jun 11, 2022

Upon further thought, the real issue here is there is no singleton for AWS API clients A name just might solve it. There is a sequence counter in the API call, if its out of order it is rejected, lots of services have it.

So I think what we need to do here is find every AWS API call in Otel-contrib and funnel it through a Singleton for each AWS Service Client. Make a factory, then share instances across threads.

@djaglowski This is no small change, if I go about it I would still like people's blessing. Also does AWS have anything to say about this?

@boostchicken
Copy link
Member

So using a single client might fix it, but I must yield to the documented AWS solution

https://aws.amazon.com/premiumsupport/knowledge-center/cloudwatch-invalid-sequence-token/#:~:text=InvalidSequenceToken%20errors%20when%20multiple%20sources,stream%2C%20you%20receive%20an%20error.

We need to make the log streams unique. Ill take a peak and see what the emf exporter does and if i cn make it generic.

@boostchicken
Copy link
Member

OK found the code here it is

// getLogInfo retrieves the log group and log stream names from a given set of metrics.
func getLogInfo(rm *pmetric.ResourceMetrics, cWNamespace string, config *Config) (string, string, bool) {
var logGroup, logStream string
groupReplaced := true
streamReplaced := true
if cWNamespace != "" {
logGroup = fmt.Sprintf("/metrics/%s", cWNamespace)
}
strAttributeMap := attrMaptoStringMap(rm.Resource().Attributes())
// Override log group/stream if specified in config. However, in this case, customer won't have correlation experience
if len(config.LogGroupName) > 0 {
logGroup, groupReplaced = replacePatterns(config.LogGroupName, strAttributeMap, config.logger)
}
if len(config.LogStreamName) > 0 {
logStream, streamReplaced = replacePatterns(config.LogStreamName, strAttributeMap, config.logger)
}
return logGroup, logStream, (groupReplaced && streamReplaced)
}

We need to refactor all of that out to the internal/aws namespace, update awsemfexporter, and then update cloudwatchlogsexporter

The change could be done quicker if we duplicate code, I doubt anyone wants to do that

@djaglowski Gonna wait on comments before I move forward.

@boostchicken
Copy link
Member

boostchicken commented Jun 11, 2022

Yeah this is gonna be ugly, we gotta re-do the whole exporter. It only creates 1 Pusher because it assumes the name never changes. The name could change on any attribute that is on a Logs.[i]Resource().Attributes Attributes are not in scope till translation, the pusher has already been made. Also, since we will now have multiple pushers we are going to have to have a Map of pushers so we don't re-create them all the time.

func (emf *emfExporter) getPusher(logGroup, logStream string) cwlogs.Pusher {
emf.pusherMapLock.Lock()
defer emf.pusherMapLock.Unlock()
var ok bool
var streamToPusherMap map[string]cwlogs.Pusher
if streamToPusherMap, ok = emf.groupStreamToPusherMap[logGroup]; !ok {
streamToPusherMap = map[string]cwlogs.Pusher{}
emf.groupStreamToPusherMap[logGroup] = streamToPusherMap
}
var emfPusher cwlogs.Pusher
if emfPusher, ok = streamToPusherMap[logStream]; !ok {
emfPusher = cwlogs.NewPusher(aws.String(logGroup), aws.String(logStream), emf.retryCnt, *emf.svcStructuredLog, emf.logger)
streamToPusherMap[logStream] = emfPusher
}
return emfPusher
}
func (emf *emfExporter) listPushers() []cwlogs.Pusher {
emf.pusherMapLock.Lock()
defer emf.pusherMapLock.Unlock()
pushers := []cwlogs.Pusher{}
for _, pusherMap := range emf.groupStreamToPusherMap {
for _, pusher := range pusherMap {
pushers = append(pushers, pusher)
}
}
return pushers
}

@boostchicken
Copy link
Member

boostchicken commented Jun 11, 2022

Paging @aateeqi and @Aneurysm9

Is AWS willing to pickup this work or is the community going to have to solve this?

@thomasbaldwin
Copy link
Contributor Author

thomasbaldwin commented Jun 13, 2022

@boostchicken thanks so much for your quick replies! is there a quick and easy way you would recommend getting it done? Even if it's a little sloppy, this is a blocker for something I'm working on, so I would be willing to run my own image until this feature request gets merged in. I would also be willing to work with you on the cleanest way to do it

@boostchicken
Copy link
Member

boostchicken commented Jun 24, 2022

I pretty much outlined it above, could make a quick copy paste job out of it. I wanted to give AWS sometime to chime in, I suppose they aren't gonna support this. If you wanna take a crack at it, go for it. I will be happy to review all your contributions and help you get up to speed on the contribution process where I can. If not I think I can get to this in a couple days.

Cheers!

@boostchicken
Copy link
Member

boostchicken commented Jun 24, 2022

The only thing that makes me hesitate is the need to refactor the functions I posted above out into the internal/aws space. It's a pretty material change and I am not sure how it might impact tests etc amd will require changes in multiple modules.

You could literally copy and paste these functions, replacing the current pusher and fix this. That would be the ugly way which would cause a large amount of duplicate code and not be of quality high enough to be considered for an OSS project imho. However, it will unblock you until we do it right.

boostchicken added a commit to boostchicken/opentelemetry-collector-contrib that referenced this issue Jul 8, 2022
Moving to multiple pushers

Signed-off-by: John Dorman <[email protected]>
@boostchicken
Copy link
Member

Hey @thomasbaldwin give this a go #12179

@boostchicken
Copy link
Member

It should now do the same replacements as the emf module. Give it a go!

@thomasbaldwin
Copy link
Contributor Author

sounds good @boostchicken ! will check it out and post back!

@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Nov 9, 2022
@github-actions
Copy link
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants