JSON #8

Closed
swizzley opened this issue Jul 25, 2016 · 7 comments

Comments

@swizzley

I suppose this is more of a feature request, or perhaps a documentation request if this already exists, but I'd like to be able to use your beat to ingest JSON arrays. For whatever reason, my response body comes in as a JSON string instead of actually being unmarshalled as JSON. This appears to be an issue with Elasticsearch rather than your beat, since I have to wrap my JSON array in an object before I can even curl it into ES manually. So adding a "field" under JSONBODY would be the ideal way to do this, or perhaps that already exists and I'm just not doing it right. Either way, any help would be appreciated. I'm more than happy to send you a pull request if I can understand where and how this is being done.
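
For illustration, the manual workaround described above (wrapping a top-level JSON array in an object before sending it to Elasticsearch) could be sketched in Go roughly like this. It is only a sketch: `wrapIfArray` and its `field` parameter are hypothetical names, not part of httpbeat or its configuration.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// wrapIfArray is a hypothetical helper, not httpbeat code.
// If the top-level JSON value in raw is an array, it returns
// {"<field>": [...]}; any other input is returned unchanged.
func wrapIfArray(raw []byte, field string) ([]byte, error) {
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 || trimmed[0] != '[' {
		return raw, nil
	}
	// json.RawMessage keeps the array bytes as they are (no float
	// conversion); Marshal returns an error if they are not valid JSON.
	return json.Marshal(map[string]json.RawMessage{field: trimmed})
}
```

Calling `wrapIfArray([]byte("[1,2]"), "items")` yields an object with a single `items` key holding the array, which Elasticsearch can accept as a document source.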

@christiangalsterer
Owner

Hi @swizzley,

I think what you are looking for is available with version 1.1.0.

If the HTTP endpoint returns a proper JSON structure, it is also added in the jsonBody field (see https://github.com/christiangalsterer/httpbeat/blob/master/docs/fields.asciidoc). You can also configure whether "dots" in the structure are replaced and whether the structure is flattened; see https://github.com/christiangalsterer/httpbeat/blob/master/docs/configuration.asciidoc

@swizzley
Author

Yeah, my problem is that the endpoint returns a JSON array, so the JSON is printed as a string and the response field ends up with this massive JSON blob as its body. Additionally, I don't want to create unique IDs in Elasticsearch for each poll interval; I just want the documents inside the polled JSON to derive their ID from a given field and then simply update that document on subsequent polls. I've figured out how to do this with a for loop that derives the _id, selects the document from the response with a little jq, and adds the metadata before sending everything to _bulk, but ideally I'd like to integrate these methods into your beat. What do you think?
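
As a rough illustration of the loop described above, here is a minimal Go sketch that turns a JSON array of objects into an Elasticsearch `_bulk` request body, taking each document's `_id` from one of its own fields. The function name `bulkBody` and its parameters are hypothetical, not httpbeat code. The sketch uses the `index` action, which replaces any existing document with the same `_id`, so repeated polls overwrite rather than duplicate. Older Elasticsearch versions may also require a `_type` entry in the action line; that is left out here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// bulkBody is a hypothetical helper, not httpbeat code. It expects rawArray
// to be a JSON array of objects and emits one action line plus one source
// line per document, each terminated by a newline as _bulk requires.
func bulkBody(rawArray []byte, index, idField string) ([]byte, error) {
	var docs []map[string]json.RawMessage
	if err := json.Unmarshal(rawArray, &docs); err != nil {
		return nil, fmt.Errorf("response is not a JSON array of objects: %w", err)
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode appends "\n" after every value
	for i, doc := range docs {
		rawID, ok := doc[idField]
		if !ok {
			return nil, fmt.Errorf("document %d has no %q field", i, idField)
		}
		var id string
		if err := json.Unmarshal(rawID, &id); err != nil {
			id = string(rawID) // non-string IDs (e.g. numbers): use the JSON text
		}
		if id == "" {
			return nil, fmt.Errorf("document %d has an empty %q field", i, idField)
		}
		action := map[string]map[string]string{
			"index": {"_index": index, "_id": id},
		}
		if err := enc.Encode(action); err != nil {
			return nil, err
		}
		if err := enc.Encode(doc); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}
```

The returned bytes could then be sent as the body of a POST to the `_bulk` endpoint.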

@rompic
Contributor

rompic commented Nov 19, 2016

Maybe an option to deactivate the body field in the response would help.
I just ran into this limitation with a very big response (using Graylog as a target for the Logstash output):
message [java.lang.IllegalArgumentException: Document contains at least one immense term in field="beat_response_body" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '...', original message: bytes can be at most 32766 in length; got 167036]

It worked after deactivating the body manually and recompiling.

http://stackoverflow.com/questions/24019868/utf8-encoding-is-longer-than-the-max-length-32766

@rompic
Contributor

rompic commented Nov 20, 2016

We could actually set the body field to nil if unmarshalling succeeds.
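
A minimal Go sketch of that idea, assuming a simplified event type: `responseEvent` and `newResponseEvent` are hypothetical stand-ins, not httpbeat's actual structs. The string body is only kept when the response does not parse as a JSON object or array.

```go
package main

import "encoding/json"

// responseEvent is a hypothetical, simplified stand-in for the published
// event. With omitempty, a nil Body is left out of the marshalled output.
type responseEvent struct {
	Body     *string     `json:"body,omitempty"`
	JSONBody interface{} `json:"jsonBody,omitempty"`
}

// newResponseEvent fills JSONBody and leaves Body nil when raw parses as a
// JSON object or array; otherwise it keeps raw as the string body.
func newResponseEvent(raw string) responseEvent {
	var parsed interface{}
	if err := json.Unmarshal([]byte(raw), &parsed); err == nil {
		switch parsed.(type) {
		case map[string]interface{}, []interface{}:
			return responseEvent{JSONBody: parsed}
		}
	}
	return responseEvent{Body: &raw}
}
```

With this shape, a large JSON response is never also sent as one huge string term, which is what triggered the 32766-byte error above.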

@christiangalsterer
Owner

christiangalsterer commented Nov 20, 2016

Hi everybody,
thank you for all your feedback. I was also wondering whether I should slightly change the behaviour so that either the jsonBody or the body field is returned.

I see the following options:

  1. Always return both fields
  2. Set body to nil, if unmarshalling to json is successful
  3. Users need to specify the return format. body field is then either string or json.

Would be great to get some feedback on your preferences.

@rompic
Contributor

rompic commented Nov 20, 2016

Thanks for the fast response.
I personally don't see any value in option 1, as the same information is sent twice.
I'm actually not sure why anyone would want to send JSON as a string, but having an option (3) would also make it possible to skip unmarshalling non-JSON output in the first place, which could be a minor performance improvement. Since a JSON-related setting (dot mode) already exists, I would vote for 3.

christiangalsterer added a commit that referenced this issue Dec 18, 2016
* [Output format of response body is now defined via output_format parameter](#8). Default is 'string'

Bugfixes
* [Missing es2x template](#13)
* [Correct parsing of large numbers in JSON output](#12)
@christiangalsterer
Owner

Released with version 3.0.0.

There is now a new "output_format" parameter that specifies the format of the response body. If it is not set or is set to "string", the body is returned as a string in the "body" field. If it is set to "json", the response body is returned as JSON in the "jsonBody" field.
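
The selection logic described here could be sketched in Go roughly as follows. This is an illustration only, not the code released in 3.0.0: `bodyFields` is a hypothetical function, and returning an error for unparsable JSON or an unknown format value is an assumption, since the thread does not say how those cases are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bodyFields is a hypothetical helper, not httpbeat code. It returns the
// event fields for a response body according to the output_format value.
func bodyFields(raw []byte, outputFormat string) (map[string]interface{}, error) {
	switch outputFormat {
	case "", "string": // unset or "string": keep the body as a plain string
		return map[string]interface{}{"body": string(raw)}, nil
	case "json": // "json": publish the parsed structure instead
		var parsed interface{}
		if err := json.Unmarshal(raw, &parsed); err != nil {
			// Assumption: a body that fails to parse is reported as an error.
			return nil, fmt.Errorf("output_format is json but body did not parse: %w", err)
		}
		return map[string]interface{}{"jsonBody": parsed}, nil
	default:
		// Assumption: unknown values are rejected rather than ignored.
		return nil, fmt.Errorf("unsupported output_format %q", outputFormat)
	}
}
```

Only one of the two fields is ever populated, so the same payload is not sent twice.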
