Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Event Json input and output codecs #4436

Merged
merged 14 commits into from
Apr 22, 2024

Conversation

kkondaka
Copy link
Collaborator

Description

Support for event_json input and output codecs

Issues Resolved

Resolves #4404

Check List

  • [X ] New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • [X ] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Krishna Kondaka <[email protected]>
Krishna Kondaka added 2 commits April 18, 2024 00:36
.getThis();
final JacksonEvent event = (JacksonEvent)logBuilder.build();
final Record<Event> record = new Record<>(event);
final String externalOriginationTime = (String)metadata.get(EventJsonDefines.EXTERNAL_ORIGINATION_TIME);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably have Jackson handle these conversions by annotating the DefaultEventMetadata properties and deserializing the metadata to the DefaultEventMetadata class

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the time an Instant in production use cases?

final JacksonLog.Builder logBuilder = JacksonLog.builder()
.withData(data)
.withEventMetadataAttributes((Map<String, Object>)metadata.get(EventJsonDefines.ATTRIBUTES))
.withTimeReceived(Instant.parse((String)metadata.get(EventJsonDefines.TIME_RECEIVED)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this configurable and default to not passing anything here so that the new time the Event reaches the source is applied to the Event.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event time received should not be optional, I think. Tags, attributes and externalorigination time are optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default behavior when using this codec should be to create a new time received time, which is the default behavior when an Event is created.

Signed-off-by: Krishna Kondaka <[email protected]>
@graytaylor0
Copy link
Member

There's still some missing test coverage in data-prepper-api

Comment on lines 83 to 91
/*
public void setEventType(String eType) {
eventType = eType;
}

public void setTimeReceived(Instant receivedTime) {
timeReceived = receivedTime;
}
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this ?

Signed-off-by: Krishna Kondaka <[email protected]>
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/
package org.opensearch.dataprepper.plugins.codec.eventjson;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Java convention for packages with spaces is to use underscore. So you could rename to:

package org.opensearch.dataprepper.plugins.codec.event_json;

outputCodec.writeEvent(event, outputStream);
outputCodec.complete(outputStream);
assertThat(outputCodec.getExtension(), equalTo(EventJsonOutputCodec.EVENT_JSON));
inputCodec.parse(new ByteArrayInputStream(outputStream.toByteArray()), record -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no guarantee that this consumer is even called.

A simple change is to update the consumer to put the records into a collection. Then assert the size. Then assert the format.

List<Record> records = new LinkedList<>();
inputCodec.parse(new ByteArrayInputStream(outputStream.toByteArray()), records::add);

assertThat(records.size(), equalTo(1));
for(Record record : records) {
  Event e = record.getData();
  // All the same from here.
}

public void start(OutputStream outputStream, Event event, OutputCodecContext context) throws IOException {
Objects.requireNonNull(outputStream);
generator = factory.createGenerator(outputStream, JsonEncoding.UTF8);
generator.writeStartObject();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that you are starting an object. But, there is no array. So won't it look like this?

{{"data":..., "metadata"...},{"data":..., "metadata"...},{"data":..., "metadata"...}}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Modified to generate an array.

outputCodec.start(outputStream, null, null);
outputCodec.writeEvent(event, outputStream);
outputCodec.complete(outputStream);
inputCodec.parse(new ByteArrayInputStream(outputStream.toByteArray()), record -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as below about the assertions here.

}

@Test
public void extendedTest() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a test that operates against multiple events.

import java.util.UUID;
import java.util.stream.Collectors;

public class EventJsonInputOutputCodecTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this test is asserting symmetry between the codecs.

We should also have test suites for the individual codecs with known values to ensure that the format is consistent. That is, somebody could make a change which causes the input codec to no longer work with the same values. The current tests would pass, but we would have broken backward compatibility which could pose problems when using this to write to S3.

*/
package org.opensearch.dataprepper.plugins.codec.eventjson;

public class EventJsonOutputCodecConfig {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keeping it for future use where we may want to control what fields should be included in the output.

public void emptyTest() throws Exception {
ByteArrayInputStream inputStream = new ByteArrayInputStream("".getBytes());
inputCodec = createInputCodec();
inputCodec.parse(inputStream, record -> { });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make these different tests for isolation:

@ParameterizedTest
@ValueSource(strings = { "[]", "[{}]", "{}" })
...

inputCodec = createInputCodec();
inputCodec.parse(inputStream, record -> { });
inputStream = new ByteArrayInputStream("[]".getBytes());
inputCodec.parse(inputStream, record -> { });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do something to assert this.

Probably:

Consumer<Record<Event>> consumer = mock(Consumer.class);
inputCodec.parse(inputStream, consumer);
verifyNoInteractions(consumer);

Krishna Kondaka added 4 commits April 19, 2024 05:26
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
@@ -44,22 +44,24 @@ public String getExtension() {
public void start(OutputStream outputStream, Event event, OutputCodecContext context) throws IOException {
Objects.requireNonNull(outputStream);
generator = factory.createGenerator(outputStream, JsonEncoding.UTF8);
generator.writeStartObject();
generator.writeStartArray();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should keep the writeStartObject.

This could allow us to also include metadata such as the version.

e.g.

{
  "metadata" : {"version" : "2.8"},
  "events" : [
    {"data":...,"metadata"},
    {"data":...,"metadata"},
    {"data":...,"metadata"},
  ]
}

See this code:

generator.writeStartObject();
generator.writeFieldName(config.getKeyName());
generator.writeStartArray();

Though, I don't think we need to make the events key configurable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. But the following should be enough.

{
  "version" : "2.8",
  "events" : [
    {"data":...,"metadata"},
    {"data":...,"metadata"},
    {"data":...,"metadata"},
  ]
}

if (data == null) {
return null;
}
if (overrideTimeReceived) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want to run this part if override_time_received is false

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, the overrideTimeReceived flag "true" means it will override with the value from the data and "false" means it will take Instant.now()?

}

public EventJsonInputCodec createInputCodec() {
when(eventJsonInputCodecConfig.getOverrideTimeReceived()).thenReturn(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to setup and then add a test for when the override_time_received is true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified as per the suggestion

Krishna Kondaka added 2 commits April 20, 2024 01:59
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
final Map<String, Object> innerJson = objectMapper.readValue(jsonParser, Map.class);
final List<Map<String, Object>> maps = (List<Map<String, Object>>)innerJson.get(EventJsonDefines.EVENTS);
final String version = (String)innerJson.get(EventJsonDefines.VERSION);
if (!version.equals(DataPrepperVersion.getCurrentVersion().toString())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need strict matching here? One thought is that we could be strict on the major version and ignore the minor version.

if (jsonParser.getCurrentToken() == JsonToken.START_OBJECT) {
final Map<String, Object> innerJson = objectMapper.readValue(jsonParser, Map.class);
final List<Map<String, Object>> maps = (List<Map<String, Object>>)innerJson.get(EventJsonDefines.EVENTS);
final String version = (String)innerJson.get(EventJsonDefines.VERSION);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to move this check into a small method to help make the code more readable.

graytaylor0
graytaylor0 previously approved these changes Apr 22, 2024
Signed-off-by: Krishna Kondaka <[email protected]>
@@ -48,6 +48,22 @@ public EventJsonInputCodec(final EventJsonInputCodecConfig config) {
this.overrideTimeReceived = config.getOverrideTimeReceived();
}

private boolean isCompatibleVersion(Map<String, Object> json) {
final String versionStr = (String)json.get(EventJsonDefines.VERSION);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DataPrepperVersion class provides a lot of this.

final String versionStr = (String)json.get(EventJsonDefines.VERSION);
final DataPrepperVersion definedVersion = DataPrepperVersion.parse(versionStr);
if(definedVersion.getMajorVersion() != DataPrepperVersion.getCurrentVersion().getMajorVersion()) {
  LOG.error("Version mismatch! Current version {} Received data version {}", DataPrepperVersion.getCurrentVersion(), definedVersion);
            return false;
}

Krishna Kondaka added 2 commits April 22, 2024 20:32
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this and making these improvements to the feature!

@kkondaka kkondaka merged commit df17cc5 into opensearch-project:main Apr 22, 2024
72 of 74 checks passed
srikanthjg pushed a commit to srikanthjg/data-prepper that referenced this pull request Apr 24, 2024
* Event Json input and output codecs

Signed-off-by: Krishna Kondaka <[email protected]>

* Modified test case to check for event metadata attributes

Signed-off-by: Krishna Kondaka <[email protected]>

* Modified the coverage to 0.9

Signed-off-by: Krishna Kondaka <[email protected]>

* Addressed review comments

Signed-off-by: Krishna Kondaka <[email protected]>

* Fixes for failing coverage tests

Signed-off-by: Krishna Kondaka <[email protected]>

* Addressed review comments

Signed-off-by: Krishna Kondaka <[email protected]>

* Fixed test coverage

Signed-off-by: Krishna Kondaka <[email protected]>

* Added more tests

Signed-off-by: Krishna Kondaka <[email protected]>

* Added more tests for coverage

Signed-off-by: Krishna Kondaka <[email protected]>

* Addressed review comments

Signed-off-by: Krishna Kondaka <[email protected]>

* Fixed code coverage failure

Signed-off-by: Krishna Kondaka <[email protected]>

* Addressed review comments

Signed-off-by: Krishna Kondaka <[email protected]>

* Addressed review comments

Signed-off-by: Krishna Kondaka <[email protected]>

* Addressed review comments

Signed-off-by: Krishna Kondaka <[email protected]>

---------

Signed-off-by: Krishna Kondaka <[email protected]>
Co-authored-by: Krishna Kondaka <[email protected]>
@kkondaka kkondaka added this to the v2.8 milestone May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data Prepper Event Json Codec
4 participants