Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add log.origin fields #563

Merged
merged 7 commits into from
Sep 19, 2019
Merged

Add log.origin fields #563

merged 7 commits into from
Sep 19, 2019

Conversation

felixbarny
Copy link
Member

No description provided.

schemas/log.yml Outdated
@@ -43,3 +43,28 @@
short: Name of the logger.
description: >
The name of the logger inside an application. This is usually the name of the class which initialized the logger, or can be a custom name.

- name: source.file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry if what we call source here will conflict with what we have in source under https://github.com/elastic/ecs/blob/master/schemas/source.yml At the same time I think some argument could be it is similar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the same time I think some argument could be it is similar?

I'd say it's actually not really similar to source. We could also rename to log.origin.* to avoid ambiguities.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on origin.

@felixbarny felixbarny changed the title Add log.source fields Add log.origin fields Sep 17, 2019
felixbarny added a commit to felixbarny/java-ecs-logging that referenced this pull request Sep 17, 2019
felixbarny added a commit to felixbarny/java-ecs-logging that referenced this pull request Sep 17, 2019
schemas/log.yml Outdated
example: init
short: The function which originated the log event.
description: >
The name of the function or method which originated the log event.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: If it's a method, would we expect it to also indicate what class it belongs to? Something like Foo.bar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it would just be the bar in this case. We could, however, add an optional origin.class for languages which support that concept.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, SGTM. Lets only add it if we already have a use case for it. But

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually do have a use case for logging. For java logging libraries, we usually get both the file name and the class name of the origin. See also https://docs.oracle.com/javase/7/docs/api/java/lang/StackTraceElement.html#getClassName().

I didn't add it at first because not all languages have the concept of a class. But when it's optional, why not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I could name quite a few that have a similar concept ;-) In the case of Golang I would also use .class even though that is not 100% correct. @axw Hope you agree :-)

+1 on adding class directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) I'm fine with class as a name, but I would have made it a qualified name like Foo.bar as you described. @felixbarny what was the reason behind excluding the class name? Is it useful to search on them independently?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason why the class name should be included log.origin.function, am I missing something?

We can definitely have an additional field for the class (log.origin.class) which contains the fully qualified class name like org.elasticsearch.bootstrap.Bootstrap. But we currently don't have that field in the APM schema where we have module instead, which would translate to a Java package.

I'd like to defer these potentially controversial discussions and get the lowest common denominator in which, I think, is log.origin.file.name, log.origin.file.line and log.origin.function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the description it mentions, it can be a function or method. So if it is a method, would it be wrong to put here Bootstrap.foo? What I'm getting at, it would be nice if the shortcut to this discussion is that both options are fine. Note: I left out the package name on purpose, was not even thinking of it before @felixbarny brought up the example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would seem a bit weird to me as the method name is foo, not Bootstrap.foo. https://docs.oracle.com/javase/7/docs/api/java/lang/StackTraceElement.html#getMethodName() would also just return the method name.

What would be the benefit of concatenating the simple class name and the method name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixbarny I'm not too fussed if this is deferred. I don't like to sound like a broken record, but the use case I have in mind is once again profiling, and I was hoping we could find a common definition of fields.

I'd say the benefit of having them combined is having a simple display name, so that UIs don't need to know how to recombine the class and method name. e.g. it's "Class::Method" in C++.

schemas/log.yml Outdated
@@ -43,3 +43,28 @@
short: Name of the logger.
description: >
The name of the logger inside an application. This is usually the name of the class which initialized the logger, or can be a custom name.

- name: origin.file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's reuse the file object here: https://github.com/elastic/ecs/blob/master/schemas/file.yml#L15

So this becomes log.origin.file.path I guess?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! It would be log.origin.file.name though. Should we then also do log.origin.file.line instead of log.origin.line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make sense I guess. I wonder if we "must" extend file for this or we just use it here? I'm good with just adding it here but I'm sure at @webmat will have an opinion on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with directly adding 2-3 specific fields for now 👍

We're going for something like this for package.* as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Using the file.* object here would require a change to ECS, since it is not currently defined as a reusable (nestable) object. I'm not saying this is wrong, as the idea has come up before to capture events like file-copy events, but we should consider this as we work through this PR, addressing where else file.* could be nested. For now, I would suggest we don't re-use the file.* object here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not using the full file object here but re-uses the same structure. Coming up with slightly different names just not to re-use the file object structure doesn't feel quite right. But I think I don't quite understand what you have in mind here. Which names would you suggest?

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

origin SGTM.

I somehow remember a long time ago we had discussions around origin and source in Filebeat. Because of this pinging @urso @ph for them to have a quick look if they also agree, origin is good here.

felixbarny added a commit to elastic/ecs-logging-java that referenced this pull request Sep 18, 2019
CHANGELOG.next.md Outdated Show resolved Hide resolved
@ruflin
Copy link
Member

ruflin commented Sep 19, 2019

@felixbarny @axw Suggestion for getting this PR in quickly: Remove function from this PR and open a separate one with it like this we can merge the other two fields quickly.

@axw
Copy link
Member

axw commented Sep 19, 2019

@ruflin @felixbarny after further consideration, I'm personally happy for this to go in as-is. What I have in mind is still a bit future-looking, and could just as easily be implemented as a different field, such as "displayname".

schemas/log.yml Outdated Show resolved Hide resolved
Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM to also merge as is. @felixbarny really sorry, spotted the core / extended part just now. Can you change and then we get it in? There is also a merge conflict ..

@felixbarny felixbarny merged commit c6e5685 into elastic:master Sep 19, 2019
@felixbarny felixbarny deleted the log-source branch September 19, 2019 10:38
felixbarny added a commit to felixbarny/java-ecs-logging that referenced this pull request Sep 19, 2019
felixbarny added a commit to elastic/ecs-logging-java that referenced this pull request Sep 19, 2019
@webmat
Copy link
Contributor

webmat commented Sep 23, 2019

Sorry for not responding to that PR last week, when this was happening.

I'm not 100% sold on the addition of origin under log.*. In another context, we had been discussing people's confusion between log.* and event.*, and therefore wanted to introduce this clarification to the description of the log field set. So the thinking was: log.* is for the archaic or low level details, whereas event.* is for the more interesting or relevant details.

I'm not saying we need to move this into event.origin.* necessarily, but simply that I think we need to move this out from under log.* and find a better place for it.

@MikePaquette
Copy link
Contributor

Likewise, sorry for the late response. I have concerns similar to what @webmat has raised. To help with ECS spec consistency and to avoid ECS user confusion, we've been providing high-level guidance for the understanding of certain fields and field sets (e.g., #516 (comment), #290 (comment)) as follows:

  1. Where the event comes from:
    event.module, event.dataset
  2. How the event should be analyzed/visualized/correlated:
    event.kind, event.category, event.action, event.type, event.outcome
  3. How the event was created or transported (e.g., syslog details)
    log.*
  4. Everything else about the event
    all other ECS field sets

I am not sure I fully understand the proposed use of log.origin, but it seems to me that it fits better into No. 1 and/or No. 4 above rather than into No. 3.

@felixbarny
Copy link
Member Author

  1. How the event was created or transported (e.g., syslog details)
    log.*

Sounds like log.origin.* makes sense in this case then 🙂

It captures in which source file (incl. name of the function and line number) the log event was created. For example:

print 'Loading user defined schemas: {0}'.format(include_glob)

log.origin.file.name: generator.py
log.origin.file.line: 27
log.origin.funtion: main

@webmat
Copy link
Contributor

webmat commented Sep 24, 2019

Filebeat currently populates log.file.name with the name of the log file an event was read from. This is not currently an ECS field, but could eventually become one.

We're now introducing log.origin.file.name, and I get the feeling people will start using this to track the log file name / line where a log originated from (e.g. /var/log/system.log), and be confused why Filebeat uses a different field.

I think creating a code-specific field set would add a lot of semantic meaning to these events. What about this?

code.file.name: generator.py
code.file.line: 27
code.function: main

The code field set could be used for the source of errors/messages, and also for other things eventually. I'm sure there's events in the upcoming Code Kibana app that would benefit from having a code field set.

@felixbarny
Copy link
Member Author

I get the feeling people will start using this to track the log file name / line where a log originated from (e.g. /var/log/system.log)

That would be a misuse of log.origin then. Something we can't really avoid with any field. But we could improve the documentation to be more clear about that.

I'm generally fine with code too but down the line, we probably need a structure which represents a stack trace element which is reusable in different contexts. For example, to represent profiling data, as @axw mentioned, to represent parsed stack traces, which contain an array of stack trace elements and also as a single stack trace element for the origin of a log event. I'm not sure code is the best namespace for the logging use case. Yes, it's about a stack trace element which basically represents a code reference but it's in the context of logging. Hence my vote would still go for log.origin.

Let's include @mw-ding and @zfy0701 from the code team to this discussion who are working on APM/Code integration.

@mw-ding
Copy link

mw-ding commented Sep 24, 2019

Thanks @felixbarny for cc. It's great to know that we can have the origin of the logs. This will definitely enable some use cases for Code integrations.

Questions. This change is for elastic common schema, so when will this be available in the actually data? Also, even with this schema, the data is not guaranteed to have them, right? Any ideas on the coverage?

@ruflin
Copy link
Member

ruflin commented Sep 25, 2019

Also +1 to continue with log.origin and improve documentation and tackle code afterwards.

@felixbarny
Copy link
Member Author

when will this be available in the actually data? Also, even with this schema, the data is not guaranteed to have them, right? Any ideas on the coverage?

This data should always be considered optional. It's basically just a recommendation for people to use these fields. Additionally, I have created https://github.com/elastic/java-ecs-logging which is a plugin for the most commonly used Java loggers which format log events to an ECS compatible structure. By default, they don't include the log.origin.* fields but it can be enabled with a flag. The reason it's disabled by default is that this information is quite expensive to gather. The loggers basically have to capture a stack trace for each log event.

@webmat
Copy link
Contributor

webmat commented Sep 25, 2019

@mw-ding The coverage will be dependent on adding support in the sources of these events, so it will come over time. You should sync up with folks implementing the sources you have in mind to get their take on implementing this -- I assume it's mostly APM clients? But in short, your solutions needs to support this being present or absent.

These fields being in ECS also means that users will be able to populate these fields in their custom events as well. This means more opportunities for the Code app to hook up with arbitrary sources for integration :-)

@webmat
Copy link
Contributor

webmat commented Sep 25, 2019

So the consensus is to use log.origin.* for now, and potentially do a breaking change in 8.0 / ECS 2.0?

Seems like an unnecessary pain, since this is still new and unpublished (master is just a dev branch).

@ruflin
Copy link
Member

ruflin commented Sep 25, 2019

I think I will be happy with log.origin.* for logs also in the long run, so no breaking changes needed ;-)

@mw-ding
Copy link

mw-ding commented Sep 25, 2019

Yes, this is new and super helpful for Code. Will keep this in mind for future integrations. @felixbarny @webmat

@webmat
Copy link
Contributor

webmat commented Sep 25, 2019

@felixbarny @ruflin I've been finishing up the work on #540 -- meant to clarify the purpose of log.*.

As you can see from #540 (comment), I've come around: I agree with log.origin.*, after all. I retract my hesitations with these new fields.

I think I was conflating the log.origin.* details with the work on stack traces :-)

@mustafacagataytulun
Copy link

I have a follow-up question if I may ask: Is it the correct approach to concatenate full namespace with method name for log.origin.function field? Or is it more related to event.dataset field? Suppose that I have an application named TestWebApp and a code block like this:

namespace MyCompany.TestWebApp
{
  public class AccountsController
  {
    public List<Account> GetAccounts()
    {
      logger.Info("Accounts have been retrieved successfully.");
      
      // Do other things.
      // ...
    }
  }
}

What is the correct approach?

  1. log.origin.function: GetAccounts and event.dataset: MyCompany.TestWebApp.AccountsController
  2. log.origin.function: MyCompany.TestWebApp.AccountsController.GetAccounts
  3. Or something else?

@felixbarny
Copy link
Member Author

Here's what I would recommend:

log.origin.file.name: AccountsController.cs
log.origin.file.line: 42
log.origin.function: GetAccounts
log.logger: MyCompany.TestWebApp.AccountsController

event.dataset is not recommended to use to store the name of the logger. See #281 (comment) for more details.

@mustafacagataytulun
Copy link

Thank you @felixbarny, that explanation is very useful for me.

One more question about log.origin.file.name: Is this field intended just for file name or full path? I mean by full path something like this: /home/beawolf/MyProjects/TestWebApp/AccountsController.cs

@felixbarny
Copy link
Member Author

That would be just the name. Do you need to store the full path? A good place for that would be log.origin.file.path, although that field is not part of ECS yet.

Another potentially useful addition would be log.origin.class which would be the fully qualified class name of the class which originated the log event - MyCompany.TestWebApp.AccountsController in your case.

Not sure if that's possible in .NET but in Java, the logger name (log.logger) may be set to something other than the class name.

@mustafacagataytulun
Copy link

In .NET side, log4net provides full location information in a single field, hence my question was based on that actually. We have just started our implementation, so I haven't got any solid argument for usefulness of storing file path right now.

log4net is the counterpart of log4j in .NET world, and you can set logger name other than the class name as you suspected.

@felixbarny
Copy link
Member Author

Could you link at the docs for which properties are available for the location information?

@mustafacagataytulun
Copy link

Here is the related doc page: log4net LocationInfo (Not so informative unfortunately.)

And here are some actual example values for it in my application:

ClassName: TestWebApp.AccountsController
FileName: /home/beawolf/MyProjects/TestWebApp/AccountsController.cs
FullInfo: TestWebApp.AccountsController.GetAccounts(/home/beawolf/MyProjects/TestWebApp/AccountsController.cs:46)
LineNumber: 46
MethodName: GetAccounts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants