Skip to content
This repository has been archived by the owner on May 25, 2022. It is now read-only.

filelog: add file_name_resolved and file_path_resolved attributes #189

Merged
merged 6 commits into from
Jun 22, 2021

Conversation

sumo-drosiek
Copy link
Member

Add file_name_resolved and file_path_resolved attributes. They are going to keep information about absolute path and file name after symlinks resolution

fixes #181
Replacement for #184 (new head repository)

@sumo-drosiek sumo-drosiek requested a review from a team June 16, 2021 07:00
@codecov
Copy link

codecov bot commented Jun 16, 2021

Codecov Report

Merging #189 (4049dee) into main (cdbb6d6) will decrease coverage by 0.0%.
The diff coverage is 86.4%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main    #189     +/-   ##
=======================================
- Coverage   75.7%   75.7%   -0.1%     
=======================================
  Files         95      95             
  Lines       4371    4400     +29     
=======================================
+ Hits        3313    3334     +21     
- Misses       736     741      +5     
- Partials     322     325      +3     
Impacted Files Coverage Δ
operator/builtin/input/file/reader.go 67.0% <68.0%> (-1.3%) ⬇️
operator/builtin/input/file/config.go 77.0% <100.0%> (+3.5%) ⬆️
operator/builtin/input/file/file.go 73.5% <100.0%> (ø)
testutil/mocks.go 72.2% <0.0%> (-2.0%) ⬇️
operator/helper/transformer.go 100.0% <0.0%> (ø)

@sumo-drosiek
Copy link
Member Author

Original comment

Do you expect a performance hit here?

This could be computed once per Reader, but that would mean it does not detect changes in symlinks. So perhaps the next best thing would be to check if it is needed at all. Since the default configuration does not need it, this would potentially avoid a performance hit for many users.

@djaglowski

I moved path resolving to Reader init. I also added test for rotation case and it looks fine. There is no issue with rotation due to implemented fingerprint.

One thing I observed during test, is that if we rotate to file with the same content it won't work. Is that a serious issue? Should it be fixed?

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks very nice. A few suggestions though.

Additionally, I think we should look at performance impact. Will you please run make bench on main and on your branch, and post the relevant benchmark results here?

if we rotate to file with the same content it won't work

This is as designed. Files with exactly the same contents are typically not meant to be reingested. Most commonly, this occurs when files are rotated using the copy/truncate method. In this scenario, multiple files with the same fingerprint exist briefly (after copy, before truncate), but they are deduplicated so as not to duplicate ingestion. One could perhaps define a use case that requires reingestion of the same content under a different file name, but I'm somewhat skeptical that this would be of enough value to support automatically.

operator/builtin/input/file/file_test.go Show resolved Hide resolved
operator/builtin/input/file/file.go Outdated Show resolved Hide resolved
operator/builtin/input/file/reader.go Outdated Show resolved Hide resolved
operator/builtin/input/file/file_test.go Show resolved Hide resolved
@sumo-drosiek sumo-drosiek force-pushed the drosiek-file-path-resolved branch from e2fcca1 to e4a6bbe Compare June 16, 2021 14:58
Signed-off-by: Dominik Rosiek <[email protected]>
@sumo-drosiek sumo-drosiek force-pushed the drosiek-file-path-resolved branch from e4a6bbe to 07b59ea Compare June 16, 2021 15:23
@@ -138,24 +142,36 @@ func (c InputConfig) Build(context operator.BuildContext) ([]operator.Operator,
filePathField = entry.NewAttributeField("file_path")
}

fileNameResolvedField := entry.NewNilField()
if c.IncludeFileNameResolved {
fileNameResolvedField = entry.NewAttributeField("file_name_resolved")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should look into standardizing file related attribute names as Otel semantic conventions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@tigrannajaryan tigrannajaryan Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think we need to decide on how we group this attributes, perhaps introduce a file.* namespace and put everything there. Perhaps also some of the attributes discussed here should in that namespace, e.g. stream?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be added to https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/resource/semantic_conventions?

Maybe it's just a question of where is the right place to add the conventions, but this particular location seems to imply that we are defining a file as a resource. I'm not opposed to this, but my understanding is that, in the context of the file_input operator / filelog receiver, we should not consider files to be resources. They are not the things that emitted the logs. They are essentially just a mechanism by which logs are transmitted.

So just to be clear, are we talking about defining file-related fields as a resource, but then just using the same convention to structure our attributes here? Or do we need to add a parallel section to the spec, which specifically defines attribute conventions for files?

Maybe this isn't an important nuance, but I want to make sure we're not missing it.

Copy link
Member

@djaglowski djaglowski Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass proposal for a new file.* namespace:

file.name
file.path
file.name.resolved
file.path.resolved
file.stream

If we can agree on a general structure here, then perhaps we can switch to that in this PR, and formalize the semantic convention asynchronously, and backport to this repo if changes are made there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right, these are not resource attributes, they are log record attributes.

@sumo-drosiek sumo-drosiek force-pushed the drosiek-file-path-resolved branch from bd34ffc to 51d22df Compare June 17, 2021 06:42
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

We have one outstanding item (tracked in #191) to address regarding the establishment of a file.* namespace in the semantic conventions, but I believe we are using reasonable attribute names and can adapt them later if necessary.

Any objections to merging now @tigrannajaryan?

@tigrannajaryan
Copy link
Member

This looks good to me.

We have one outstanding item (tracked in #191) to address regarding the establishment of a file.* namespace in the semantic conventions, but I believe we are using reasonable attribute names and can adapt them later if necessary.

Any objections to merging now @tigrannajaryan?

No objection. Please file a spec repo issue and add to the next Log SIG agenda.

@djaglowski
Copy link
Member

Opened opentelemetry-specification#1772

@djaglowski djaglowski merged commit 0f9b1e6 into open-telemetry:main Jun 22, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add real filepath to the file input
3 participants