
Add error.stack_trace #562

Merged: 3 commits, Sep 19, 2019
Conversation

felixbarny
Member

Based on discussions in #154 and elastic/ecs-logging-java#25.

One missing piece when using ECS for logging is that there's no dedicated field for the stack trace.

We have discussed having a schema for a parsed version of the stack trace. But I don't think that's needed to get started, and for some users it may not even be desirable, as a parsed representation would be less readable in the Discover tab in Kibana.

We can always add a schema for a parsed variant of the stack trace later and use a key like error.stack_trace_parsed for that. But at the moment we don't really have a use case for a parsed stack trace.

That's why this PR just adds a dedicated field for the plain-text stack trace to the error schema.

With this addition, we can do the following mapping:
error.code: exception class name
error.message: exception message
error.stack_trace: plain-text exception stack trace
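
For illustration, here is a minimal Java sketch of how these three fields could be populated from a Throwable. The class and method names are made up for the example and not tied to any particular logging library:

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Map;

public class ErrorFields {

    // Build the three proposed ECS error fields from an exception.
    static Map<String, String> fromThrowable(Throwable t) {
        StringWriter out = new StringWriter();
        // printStackTrace writes the familiar multi-line, plain-text stack trace
        t.printStackTrace(new PrintWriter(out));
        return Map.of(
            "error.code", t.getClass().getName(),             // exception class name
            "error.message", String.valueOf(t.getMessage()),  // exception message (null-safe)
            "error.stack_trace", out.toString()               // plain-text stack trace
        );
    }

    public static void main(String[] args) {
        fromThrowable(new RuntimeException("boom"))
            .forEach((field, value) -> System.out.println(field + " -> " + value));
    }
}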

/cc @pgomulka

@felixbarny felixbarny requested a review from webmat September 16, 2019 13:23
@felixbarny
Member Author

Question: would it be allowed schema-wise for this field to be an array? I think it would, right? One use case for that is to put each line of the stack trace into its own array element so that the JSON stays human-readable: https://github.com/elastic/beats/pull/9988/files#diff-a639ebd9050d80e47bd50d53210aab49R57

One challenge with that approach is that Filebeat does not really like multi-line JSON. But there's a way to make it work:

filebeat.inputs:
  - type: log
    # paths expects a list of file globs
    paths: ["test.log.json"]
    # we can't use the normal JSON parsing as it requires the JSON to be on a single line
    # json.keys_under_root: true
    # json.message_key: message
    # treat every line that does not start with '{' as a continuation of the previous event
    multiline.pattern: '^{'
    multiline.negate: true
    multiline.match: after
processors:
  - decode_json_fields:
      fields: ["message"]
      max_depth: 1
      target: ""
      overwrite_keys: true
  # in case error.stack_trace can't be an array, join error.stack_trace_lines into a single string
  - script:
      when:
        has_fields: ['error.stack_trace_lines']
      lang: javascript
      id: flatten_stack_trace_lines
      source: >
        function process(event) {
            event.Put("error.stack_trace", event.Get("error.stack_trace_lines").join("\n"));
            event.Delete("error.stack_trace_lines");
        }

@felixbarny
Member Author

When we get this in, I think it makes sense to update the default layout of the Logs UI to include the stack trace.

/cc @weltenwort

felixbarny added a commit to felixbarny/java-ecs-logging that referenced this pull request Sep 16, 2019
felixbarny added a commit to felixbarny/java-ecs-logging that referenced this pull request Sep 16, 2019
@mustafacagataytulun

@felixbarny I'm not the reviewer, but here are my two cents: I have never thought of or used a stack trace as an array in my entire career, and have never needed it to be one. Moreover, a stack trace is a single string field in most programming languages. It might add a little processing overhead to split it line by line into an array.

felixbarny added a commit to felixbarny/java-ecs-logging that referenced this pull request Sep 18, 2019
@felixbarny
Member Author

felixbarny commented Sep 18, 2019

I should have provided more context.

The array is solely used to make the logs more readable. Having the whole stack trace in one JSON string forces the contents onto a single line, which is not very readable, to say the least.

Compare that to this variant, where each line of the stack trace is rendered as a separate array element on its own line:

{"@timestamp":"2019-09-17T13:16:48.038Z", "log.level":"ERROR", "message":"Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.RuntimeException: Expected: controller used to showcase what happens when an exception is thrown] with root cause", "process.thread.name":"http-nio-8080-exec-1","log.logger":"org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/].[dispatcherServlet]","log.origin":{"file":"DirectJDKLog.java","function":"log","line":175},"error.code":"java.lang.RuntimeException","error.message":"Expected: controller used to showcase what happens when an exception is thrown","error.stack_trace":[
	"java.lang.RuntimeException: Expected: controller used to showcase what happens when an exception is thrown",
	"\tat org.springframework.samples.petclinic.system.CrashController.triggerException(CrashController.java:33)",
	"\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)",
	"\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)",
	"\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)",
	"\tat java.lang.reflect.Method.invoke(Method.java:498)",
	"\tat org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)",
	"\tat java.lang.Thread.run(Thread.java:748)"]}

That makes the JSON logs actually human-readable so that there shouldn't be a need to have plain-text logs as a fallback. Even during development, this format should be good enough, at least for my taste.

With the Filebeat configuration I pasted above, you can also convert the array back to a regular string. But if ECS supports a value being either a scalar or an array (as Elasticsearch/Lucene does), joining the array into a single string would be optional.

It might add a little processing overhead to split it line by line into an array.

It takes some processing overhead to merge the array back into a string, but rendering it as an array does not cause overhead. At least not with my implementation here 🙂:
https://github.com/elastic/java-ecs-logging/blob/33ae6d4eeaabb84b904ecfaad80b965a7c02bb47/ecs-logging-core/src/main/java/co/elastic/logging/EcsJsonSerializer.java#L192-L208
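
Roughly, the idea is to walk the Throwable's frames and write each one directly as its own JSON array element, so the stack trace never has to be built as one big string and then split. A simplified sketch of that approach (not the actual code behind the link; it skips JSON string escaping and nested causes):

public class StackTraceArraySerializer {

    // Appends error.stack_trace as a JSON array with one element per stack trace line.
    static void serializeStackTraceAsArray(StringBuilder json, Throwable t) {
        json.append("\"error.stack_trace\":[\n");
        // header line, e.g. "java.lang.RuntimeException: Expected: ..."
        json.append("\t\"").append(t).append('"');
        for (StackTraceElement frame : t.getStackTrace()) {
            // each frame becomes its own array element, prefixed like a normal stack trace line
            json.append(",\n\t\"\\tat ").append(frame).append('"');
        }
        json.append(']');
    }

    public static void main(String[] args) {
        StringBuilder json = new StringBuilder();
        serializeStackTraceAsArray(json, new RuntimeException("Expected: demo"));
        System.out.println(json);
    }
}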

@@ -28,3 +28,11 @@
type: keyword
description: >
Error code describing the error.

- name: stack_trace
level: core
Contributor

Let's make this extended.

@ruflin
Contributor

ruflin commented Sep 18, 2019

@felixbarny You unintentionally opened another can of worms with this. Overall, ++ on the proposal, and yes, every field can also be an array in Lucene. So let's get to the other problem: what does error actually mean in ECS? Is it the error from the event, like a stack trace, or an error in the sense of APM, or is it an error that can happen during processing (for example from the ingest pipeline)? Unfortunately we use it for both at the moment. As long as only one error happens, all is fine, but if we start mixing the two in one event, things become awkward.

@felixbarny
Member Author

@webmat already hinted that it probably makes sense to store pipeline errors in a pipeline namespace. I agree with that. The pipeline should have a reserved namespace so that it does not conflict or otherwise interfere with the rest of the ECS spec.

For the time being, I think we can live with the fact that error.message is also used if there are processing errors. Luckily, in that case, it adds a message instead of overriding it, making error.message an array. It's awkward but it's not the end of the world.

An error can be an APM error or an error logged by a logger. Or even both. In the Java agent, we are creating APM errors when logger.error("message", exception); is called. We then annotate the log with the error.id used by APM so that the log line can be correlated with the APM error.
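
To illustrate the correlation idea only: below is a hypothetical sketch using SLF4J's MDC. The real agent adds error.id automatically, and reportToApm is a made-up placeholder, not an actual API:

import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class ErrorIdCorrelation {

    private static final Logger logger = LoggerFactory.getLogger(ErrorIdCorrelation.class);

    static void handle(Exception exception) {
        // use the same id for the APM error event and for the log line
        String errorId = UUID.randomUUID().toString();
        // reportToApm(errorId, exception);  // placeholder for whatever creates the APM error
        MDC.put("error.id", errorId);
        try {
            // a JSON layout that includes MDC entries would emit error.id on this line,
            // so the log line can be correlated with the APM error
            logger.error("message", exception);
        } finally {
            MDC.remove("error.id");
        }
    }
}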

@mustafacagataytulun

Having the whole stacktrace in one JSON string forces the contents to be in a single line, which is not very readable to say the least.

Actually, it is readable in Kibana, because line endings (\r\n) are included in the stack trace (at least when using C#), and Kibana elegantly shows them line by line. However, as I look at your example, @felixbarny, you are referring to the readability of the raw JSON. Fair enough. :)

It takes some processing overhead to merge the array back to a string but rendering as an array does not cause overhead. At least not with my implementation here 🙂:
https://github.com/elastic/java-ecs-logging/pull/28/files#diff-02b493ca1092b92c519dc566f2ee10eeR192-R208

I was actually referring to the conversion of the single stack trace string into an array, line by line, because programming languages generally give you the whole stack trace as a single string.

Here is another consideration (it came to my mind when I read @ruflin's comment): semantically speaking, making the stack_trace field an array may be interpreted as each element of the array being a different stack trace. But that may just be my misinterpretation, of course. :)

@ruflin ruflin left a comment (Contributor)

I'm good with merging this as is. I think we can follow up on all the other discussions. As for array or not: in Lucene everything is an array anyway, so it's more a question of documentation.

@felixbarny felixbarny merged commit 4109312 into elastic:master Sep 19, 2019
@felixbarny felixbarny deleted the stack-trace branch September 19, 2019 09:27
@webmat
Contributor

webmat commented Sep 23, 2019

The error field set should only encapsulate the metadata of the error, such as the stack trace. So I agree with this addition.

@webmat
Contributor

webmat commented Sep 23, 2019

As for the array vs. string discussion: I think it would make sense to keep this as the original string for now. If we want an array representation in Elasticsearch, I think we need to merge this idea with that of the structured / parsed exception details.

From an ECS point of view, the log readability discussion is unrelated, IMO. You're free to do what you want in your log file format, as long as it looks like a single block of text when it's saved to Elasticsearch.

In other words, if you need readable JSON logs (do these exist? ;-) ), feel free to split into an array in there, but then re-join them to a block of text prior to indexing.

@weltenwort
Member

Even though every field can be an array in Elasticsearch there's still a semantic difference between "line1\nline2" and ["line1", "line2"] in terms of analysis and querying, right? The former might work better with various Elasticsearch features like custom tokenization or proximity, slop and other span-based queries.

@felixbarny
Member Author

felixbarny commented Sep 24, 2019

I think this is actually pretty readable and probably good enough so that most people don't need fallback plain-text logs anymore.

{"@timestamp":"2019-08-06T12:09:12.375Z", "log.level": "INFO", "message":"Tomcat started on port(s): 8080 (http) with context path ''", "service.name":"spring-petclinic","process.thread.name":"restartedMain","log.logger":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer"}
{"@timestamp":"2019-08-06T12:09:12.379Z", "log.level": "INFO", "message":"Started PetClinicApplication in 7.095 seconds (JVM running for 9.082)", "service.name":"spring-petclinic","process.thread.name":"restartedMain","log.logger":"org.springframework.samples.petclinic.PetClinicApplication"}
{"@timestamp":"2019-08-06T14:08:40.199Z", "log.level":"DEBUG", "message":"init find form", "service.name":"spring-petclinic","process.thread.name":"http-nio-8080-exec-8","log.logger":"org.springframework.samples.petclinic.owner.OwnerController","transaction.id":"28b7fb8d5aba51f1","trace.id":"2869b25b5469590610fea49ac04af7da"}
{"@timestamp":"2019-09-17T13:16:48.038Z", "log.level":"ERROR", "message":"Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.RuntimeException: Expected: controller used to showcase what happens when an exception is thrown] with root cause", "process.thread.name":"http-nio-8080-exec-1","log.logger":"org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/].[dispatcherServlet]","log.origin":{"file.name":"DirectJDKLog.java","function":"log","file.line":175},"error.type":"java.lang.RuntimeException","error.message":"Expected: controller used to showcase what happens when an exception is thrown","error.stack_trace":[
	"java.lang.RuntimeException: Expected: controller used to showcase what happens when an exception is thrown",
	"\tat org.springframework.samples.petclinic.system.CrashController.triggerException(CrashController.java:33)",
	"\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)",
	"\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)",
	"\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)",
	"\tat java.lang.reflect.Method.invoke(Method.java:498)",
	"\tat org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)",
	"\tat java.lang.Thread.run(Thread.java:748)"]}

The benefit of not parsing out the different elements of a stack trace is that IDEs would still link to the source files, as they would with normal stack traces, because they recognize the structure.

Of course, having plain-text logs as a fallback may not really be needed to begin with, but I think it does help people transition to centralized logging.

feel free to split into an array in there, but then re-join them to a block of text prior to indexing.

I can live with that, but I'd also like to discuss options for making that simpler in Filebeat. At the moment, Filebeat does not really like multi-line JSON. Also, a built-in processor which transforms an array into a string would be cool, in order to make that use case more of a first-class citizen. I'll take this to the Beats repo.

Personally, I think it makes sense to normalize it to a scalar string just from a data consistency standpoint. When looking at a log event in Discover, I'd much rather see error.stack_trace as opposed to error.stack_trace.0, error.stack_trace.1, ...

@pgomulka you raised some concerns about that, wdyt?

there's still a semantic difference between "line1\nline2" and ["line1", "line2"] in terms of analysis and querying, right?

In terms of analyzing and querying, it would be the same in this case, as error.stack_trace is not indexed at all because the field can be quite large.

index: false

@pgomulka

My main general concern (not only about the error field but other fields too) is the readability of these logs.
I like the way it looks now in your example: compact and human-readable.

I am worried, however, that ECS will diverge into a more machine-oriented protocol. This might not be terribly bad if we assume that there will always have to be a process in the middle with rich logic to transform human-readable logs into the ECS schema.
If, however, ECS becomes the default schema for JSON logs (for the Elasticsearch server or any other app), then verbosity and readability might become an issue.
As pointed out earlier, the JSON format already makes reading logs harder; it might put people off from trying to make their apps' logs ECS compliant. My impression is that this is the goal in the Elastic ecosystem, at least?

@webmat
Contributor

webmat commented Sep 25, 2019

Elasticsearch supports plaintext and NDJSON log output, correct?

NDJSON logs are by definition a single JSON object per line. And regardless of the semantics of whether we strive for NDJSON there, I think JSON logs should be optimized for programmatic intake in general.

If a user wants human readability they should pick the plaintext log format.

Finally, if a developer needs the JSON logs to be more human-readable, they are free to set compact=false in their log4j configs, and have the whole thing prettified for human-readable JSON events.

In short, I think the JSON log outputs we offer should be NDJSON.

@pgomulka

@webmat At the moment Elasticsearch supports both plaintext and multiline JSON logs (error stack traces being multiline, as in the examples by @felixbarny). The plan is to remove plaintext logs at some point and keep only JSON logs. This is already done for Elasticsearch in Docker.

If we maintain both plaintext and JSON, we could skip worrying about the readability of the JSON logs altogether. However, keeping both forces us to maintain duplicated configs and would make it hard to populate all the information in both logs. ES doesn't generate all that many logs, but disk space usage might be an issue to consider too.
I feel like this is open for discussion, though.

Regarding compact=false, I don't think we would need this. If an application decides to use JSON logs as its only output, it needs them to be readable.
If it maintains both plaintext and JSON, it might keep the plaintext readable and make the JSON as easy as possible for machine intake.
