filelog: add file_name_resolved and file_path_resolved attributes #189
Codecov Report
@@           Coverage Diff           @@
##            main    #189     +/-   ##
=======================================
- Coverage   75.7%   75.7%   -0.1%
=======================================
  Files         95      95
  Lines       4371    4400     +29
=======================================
+ Hits        3313    3334     +21
- Misses       736     741      +5
- Partials     322     325      +3
I moved path resolution into Reader init. I also added a test for the rotation case, and it looks fine: rotation causes no issue thanks to the implemented fingerprint. One thing I observed during testing is that rotating to a file with the same content won't work. Is that a serious issue? Should it be fixed?
Overall, this looks very nice. A few suggestions though.
Additionally, I think we should look at the performance impact. Will you please run make bench on main and on your branch, and post the relevant benchmark results here?
if we rotate to file with the same content it won't work
This is as designed. Files with exactly the same contents are typically not meant to be reingested. Most commonly, this occurs when files are rotated using the copy/truncate method. In this scenario, multiple files with the same fingerprint exist briefly (after copy, before truncate), but they are deduplicated so as not to duplicate ingestion. One could perhaps define a use case that requires reingestion of the same content under a different file name, but I'm somewhat skeptical that this would be of enough value to support automatically.
Signed-off-by: Dominik Rosiek <[email protected]>
…tion of symlink targets Signed-off-by: Dominik Rosiek <[email protected]>
Signed-off-by: Dominik Rosiek <[email protected]>
e2fcca1 to e4a6bbe
Signed-off-by: Dominik Rosiek <[email protected]>
e4a6bbe to 07b59ea
Signed-off-by: Dominik Rosiek <[email protected]>
@@ -138,24 +142,36 @@ func (c InputConfig) Build(context operator.BuildContext) ([]operator.Operator,
		filePathField = entry.NewAttributeField("file_path")
	}

	fileNameResolvedField := entry.NewNilField()
	if c.IncludeFileNameResolved {
		fileNameResolvedField = entry.NewAttributeField("file_name_resolved")
I think we should look into standardizing file-related attribute names as OTel semantic conventions.
Valid point. Are there any ideas for that? Should it be added to https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/resource/semantic_conventions?
Yes. I think we need to decide how to group these attributes, perhaps by introducing a file.* namespace and putting everything there. Perhaps some of the attributes discussed here should also be in that namespace, e.g. stream?
Should it be added to https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/resource/semantic_conventions?
Maybe it's just a question of where the right place to add the conventions is, but this particular location seems to imply that we are defining a file as a resource. I'm not opposed to this, but my understanding is that, in the context of the file_input operator / filelog receiver, we should not consider files to be resources. They are not the things that emitted the logs. They are essentially just a mechanism by which logs are transmitted.
So just to be clear, are we talking about defining file-related fields as a resource, but then just using the same convention to structure our attributes here? Or do we need to add a parallel section to the spec, which specifically defines attribute conventions for files?
Maybe this isn't an important nuance, but I want to make sure we're not missing it.
First pass proposal for a new file.* namespace:
file.name
file.path
file.name.resolved
file.path.resolved
file.stream
If we can agree on a general structure here, then perhaps we can switch to that in this PR, and formalize the semantic convention asynchronously, and backport to this repo if changes are made there.
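To make the proposed switch concrete, here is a small sketch of how the attribute names used in this PR might map onto the proposed file.* names. The `proposedNames` table and `renameAttributes` helper are hypothetical and exist only for illustration; file.stream is left out because the thread does not name its current attribute.

```go
package main

import "fmt"

// proposedNames maps the attribute names used in this PR to the names from the
// first-pass file.* proposal. The table itself is an illustrative assumption.
var proposedNames = map[string]string{
	"file_name":          "file.name",
	"file_path":          "file.path",
	"file_name_resolved": "file.name.resolved",
	"file_path_resolved": "file.path.resolved",
}

// renameAttributes returns a copy of attrs with known keys renamed.
// Keys that are not in proposedNames are copied unchanged.
func renameAttributes(attrs map[string]string) map[string]string {
	out := make(map[string]string, len(attrs))
	for key, value := range attrs {
		if renamed, ok := proposedNames[key]; ok {
			out[renamed] = value
			continue
		}
		out[key] = value
	}
	return out
}

func main() {
	attrs := map[string]string{
		"file_name":          "current.log",
		"file_name_resolved": "app-2021.log",
		"other":              "untouched",
	}
	renamed := renameAttributes(attrs)
	fmt.Println(renamed["file.name"], renamed["file.name.resolved"], renamed["other"])
	// prints: current.log app-2021.log untouched
}
```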
I think you are right, these are not resource attributes, they are log record attributes.
bd34ffc to 51d22df
Signed-off-by: Dominik Rosiek <[email protected]>
51d22df to 4049dee
This looks good to me.
We have one outstanding item (tracked in #191) to address regarding the establishment of a file.* namespace in the semantic conventions, but I believe we are using reasonable attribute names and can adapt them later if necessary.
Any objections to merging now @tigrannajaryan?
No objection. Please file a spec repo issue and add to the next Log SIG agenda.
Add file_name_resolved and file_path_resolved attributes. They hold the absolute path and the file name after symlink resolution.
Fixes #181
Replacement for #184 (new head repository)