Releases: opensearch-project/data-prepper

2.10.2

10 Dec 21:25
7d9913b

2024-12-09 Version 2.10.2


Bug Fixes

  • FIX: missed exception in plugin error (#5105)

Security

  • Fix otel_logs_source server configuration for getHttpAuthenticationService. Fixes CVE-2024-55886. (#5215)

2.10.1

21 Oct 18:57
99dadc6

2024-10-21 Version 2.10.1


Bug Fixes

  • [BUG] Kinesis source is failing on startup (#5084)

2.10.0

15 Oct 20:09
18b2049

2024-10-15 Version 2.10.0


Features

  • Kafka source: support SASL/SCRAM mechanisms (#4241)
  • OpenSearch Bulk API Source (#248)
  • Support AWS Kinesis Data Streams as a Source (#1082)
  • Support OpenTelemetry logs in S3 source (#5028)

Enhancements

  • Use HTML in JsonPropertyDescription instead of Markdown (#4984)
  • Variable drain time when shutting down via shutdown API (#4966)
  • Make max connections and acquire timeout configurable on S3 sink client (#4949)
  • Support BigDecimal data type in expressions (#4817)
  • Caching implementation of EventKeyFactory (#4843)
  • Json codec changes with specific json input codec config (#5054)

Bug Fixes

  • [BUG] Close OpenSearch RestHighLevelClient in OpenSearchClientRefresher on shutdown and initialization failure (#4770)

Security

Maintenance

  • Fixes and improvements for AbstractSinkTest (#5021)
  • Update the test logging to include the failed assertion (#4987)

2.9.0

28 Aug 16:17
ca5a12e

2024-08-28 Version 2.9.0


Features

  • Support sets and set operations in Data Prepper expressions (#3854)
  • Add startsWith expression function (#4840)
  • Support default route option for Events that match no other route (#4615)
  • Delete input for processors which expand the event (#3968)
  • Dynamic Rule Detection (#4600)
  • Kafka Source should support message headers (#4565)
  • Aggregate processor: add option to allow raw events (#4598)
  • Add support for start and end times in count and histogram aggregate actions (#4614)
  • Add an option to count unique values of specified key(s) to CountAggregateAction (#4644)
  • Flatten processor: option for keys without brackets (#4616)
  • Modify Key Value processor to support string literal grouping (#4599)
  • Make AWS credential management available in data-prepper-config.yaml (#2570)

Enhancements

  • Support enhanced configuration of the Kafka source and buffer loggers (#4126)
  • Update the rename_keys and delete_entries processors to use EventKey (#4636)
  • Update the mutate string processors to use the EventKey. (#4649)
  • OpenSearch Sink add support for sending pipeline parameter in BulkRequest (#4609)
  • Add support for Kafka headers and timestamp in the Kafka Source (#4566)

Bug Fixes

  • [BUG] Visibility duplication protection fails when using S3 source for large files and receiving 10 messages from SQS queue (#4812)
  • [BUG] ChangeVisibilityTimeout call failure during pipeline shutdown. (#4575)
  • [BUG] Service-map relationship should be created regardless of missing traceGroupName (#4821)
  • [BUG] Unable to create stateful processors with multiple workers. (#4660)
  • [BUG] Routes: regex doesn't work (#4763)
  • [BUG] Grok plugin CLOUDFRONT_ACCESS_LOG pattern does not compile (#4604)
  • [BUG] The user_agent processor throws exceptions with multiple threads. (#4618)
  • [BUG] DynamoDB source export converts Numbers ending in 0 to scientific notation (#3840)
  • Fix null document in DLQ object (#4814)
  • Fix KeyValue Processor value grouping bug (#4606)

Security

Maintenance

  • Removes Zookeeper from Data Prepper (#4707)
  • Tools to generate User Agent strings in the performance-test project (#4620)

2.8.1

02 Aug 00:55
6d4776d

2024-08-01 Version 2.8.1


Bug Fixes

  • Jackson 2.17.0 LockFreePool causes memory issues (#4729)

Maintenance

  • Updates Jackson to 2.17.2 (#4753)
  • Updates to Armeria 1.29.0 (#4741)
  • Parquet codec tests fix (#4742)

2.8.0

16 May 20:36
10b94f1

2024-05-16 Version 2.8.0


Features

  • Support Full load and CDC from AWS DocumentDB (#4534)
  • Support conditional expression to evaluate based on the data type for a given field (#4478, #4523, #4500)
  • Allow using event fields in s3 sink object_key (#3310)
  • Support ndjson with a codec (#2700)
  • Support S3 bucket ownership validation on the S3 sink (#4468)
  • Support encoding JSON (#832, #4514)
  • Support for Event Json input and output codecs (#4436)
  • Add support for dynamic bucket and default bucket in S3 sink (#4402)
  • Add support to export/full load MongoDB/DocumentDB collection with _id field of different data type (#4503)

Enhancements

  • HTTP data chunking support for kafka buffer (#4475)
  • ENH: automatic credential refresh in kafka source (#4258)
  • Add creation and aggregation of dynamic S3 groups based on events (#4346)
  • Truncate Processor: Add support to truncate all fields in an event (#4317)
  • Provide validations of AWS accountIds (#4398)
  • Better metrics on OpenSearch document errors (#4344)
  • Better metrics for OpenSearch duplicate documents (#4343)
  • Address route and subpipeline for pipeline transformation (#4528)
  • Add support for BigDecimal in ConvertType processor (#4316)
  • Checkpoint records at an interval for TPS case when AckSet is enabled (#4526)
  • Write stream events that time out writing to the internal buffer in a separate thread (#4524)
  • Key value processor enhancements (#4521)
  • Add bucket owner support to s3 sink (#4504)
  • Initial work to support core data types in Data Prepper (#4496)
  • Changing logging level for config transformation and fixing rule (#4466)
  • Add folder-based partitioning for s3 scan source (#4455)
  • Pipeline Configuration Transformation (#4446)
  • Added support for multiple workers in S3 Scan Source (#4439)
  • Bootstrap the RuleEngine package (#4442)
  • Make s3 partition size configurable and add unit test for S3 partition creator classes (#4437)
  • Remove creating S3 prefix path partition upfront (#4432)
  • Change s3 sink client to async client (#4425)
  • Create new codec for each s3 group in s3 sink (#4410)
  • Validate the AWS account Id in the S3 source using a new annotation (#4400)
  • Add server connections metric to http and otel sources (#4393)
  • Log the User-Agent when Data Prepper shuts down from POST /shutdown (#4390)
  • Add aggregate_threshold with maximum_size to s3 sink (#4385)
  • Refactor PipelinesDataFlowModelParser to take in an InputStream instead of a file path (#4289)
  • Add support to use old ddb stream image for REMOVE events (#4275)

Bug Fixes

  • Fix count aggregation exemplar data (#4341)
  • Revert HTTP data chunking changes for kafka buffer done in PR 4266 (#4329)
  • Fix Router performance issue (#4327)
  • Do not require field_split_characters to not be empty for key_value processor (#4358)
  • Do not write empty lists of DlqObject to the DLQ (#4403)
  • Fix transient test failure for subpipelines (#4479)
  • Fix JacksonEvent to propagate ExternalOriginalTime if it is set at the time of construction (#4489)
  • FIX: null certificate value should be valid in opensearch connection (#4494)
  • [BUG] Incorrect Behavior of Obfuscate Processor with Predefined Pattern "%{CREDIT_CARD_NUMBER}" (#4340)
  • [BUG] Empty DLQ entries when version conflicts occur (#4301)
  • [BUG] otel sources should show a clearer exception when receiving data that cannot be processed based on the configured compression type (#4022)
  • [BUG] Unable to set field_delimiter_regex (#2946)
  • Fix aggregate processor local mode (#4529)
  • Add long as a target type for convert_entry_type processor (#4120)
  • Fix write json basic test (#4527)
  • Fix depth field in template (#4509)
  • Fix for S3PartitionCreatorScheduler ConcurrentModification Exception (#4473)
  • Fix acknowledgements in DynamoDB (#4419)
  • Fix DocumentDB source S3PathPrefix null or empty (#4472)
  • Fix an issue where exception messages are masked (#4416)
  • Fix bug where using upsert or update without routing parameter caused… (#4397)
  • Fix bug in s3 sink dynamic bucket and catch invalid bucket message (#4413)
  • Fix flaky PipelineConfigurationFileReaderTest (#4386)
  • Aggregate Processor: local mode should work when there is no when condition (#4380)

Security

Maintenance

  • Gradle 8.7 (#4417)
  • Adds a Gradle convention plugin for Maven publication (#4421)
  • MAINT: allow latest schema version if not specified in confluent schema (#4453)
  • Publish expression and logstash-configuration to Maven (#4474)
  • Create unit test report as html (#4384)
  • Update Stream Ack Manager unit test and code refactor (#4383)
  • Grpc exception handler: Modified to return BADREQUEST for some internal errors (#4387)
  • Remove unexpected event handle message (#4388)
  • Bump parquet version to 1.14.0. (#4520)
  • Clear system property to disable s3 scan when stream worker exits, set s3 sink threshold to 15 s...

2.7.0

27 Mar 15:36
5d8c59b

2024-03-27 Version 2.7.0


Features

  • Add a GeoIP processor. (#253, #3941, #3942)
  • Flatten json processor (#4128)
  • Add select_entries processor (#4147)
  • Decompress processor (#4016)
  • Support parsing of XML fields in Events (#4165, #4024)
  • Processor for parsing Amazon Ion documents (#3730)
  • Append values to lists in an event (#4129)
  • MapToList processor (#3935)
  • Date processor to convert from epoch_second, epoch_milli, or epoch_nano (#2929, #4076)
  • Support reading of old image for delete events on DynamoDB source (#4261)
  • Add string truncate processor to the family of mutate string processors (#3925)
  • Add join function (#4075)

Enhancements

  • Support format expressions for routing in the opensearch sink (#3833)
  • Allow . and @ characters to be part of json pointer in expressions (#4130)
  • Support maximum request length configurations in the HTTP and OTel sources (#3931)
  • Provide a config option to do node local aggregation (#4306)
  • Allow peer forwarder to skip sending events to remote peer (#3996)
  • Include encrypted data key in Kafka buffer message. (#3655)
  • Support larger message sizes in Kafka Buffer (#3916)
  • Modify S3 Source to allow multiple SQS workers (#4239)
  • Add support for tracking performance of individual Events in the grok processor (#4196)
  • Support codec on the file source to help with testing (#4018)
  • Provide a delay processor to put a delay in the processor for debugging and testing (#3938)
  • Support ByteCount in plugin parser (#3191)
  • Add Buffer Latency Metric (#4237)
  • Adds an append mode to the file sink (#3687)

Bug Fixes

  • Attempting to evaluate if a key is null throws an Exception if the value is a List for conditional expressions (#4109)
  • Data Prepper process threads stop when processors throw exceptions (#4103)
  • Upsert action requires existing document in OpenSearch (#4036)
  • Many Grok failures do not tag events (#4031)
  • Using update, upsert, or delete actions without specifying document_id crashes the pipeline with NPE (#3988)
  • OpenSearch Sink upsert action fails to create new document if it doesn't exist already (#3934)
  • DynamoDB source global state not found for export (#3579)
  • Missing Configuration details in Kafka documentation (#3157)
  • File Source fails to process large files. (#707)
  • Add key_value_when conditional to key_value processor (#4246)
  • Adds Kafka producer metrics for buffer usage (#4139)
  • Throw a more useful error when the S3 source is unable to determine bucket ownership (#4021)
  • Add sts_header_overrides to s3 dlq configuration (#3845)
  • Delay reading from the Kafka buffer as long as the circuit breaker is open (#4135)
  • Use timer for sink latency metrics (#4174)
  • Fix bug where process worker would shut down if a processor drops all events (#4262)
  • Send acknowledgements to source when events are forwarded to remote peer (#4305)
  • Injecting timestamp in index name that is not a suffix throws IllegalArgumentException (#3957)

Security

Maintenance

  • Create Kafka buffer integration tests for KMS (#3980, #4040)
  • Fixes Dependabot updates are not configured for all projects (#3301)

2.6.2

19 Feb 19:13
e6e9583

2024-02-19 Version 2.6.2


Enhancements

  • Add 4xx aggregate metric and shard progress metric for dynamodb source (#3913)

Bug Fixes

  • S3 Scan has potential to filter out objects with the same timestamp (#4123)
  • Kafka buffer attempts to create a topic when disabled (#4111)
  • Grok processor match requests continue after timeout (#4026)
  • Serialization error during peer-forwarding (#3981)
  • BlockingBuffer.bufferUsage metric does not include records in-flight (#3936)
  • Null Pointer Exception in Key Value Processor (#3928)
  • Incomplete route set leads to duplicates when E2E ack is enabled. (#3866)
  • Data Prepper is losing connections from S3 pool (#3809)
  • Key value processor will throw NPE if source key does not exist in the Event (#3496)
  • Exception in substitute string processor shuts down processor work but not pipeline (#2956)
  • Add 4xx aggregate metric and shard progress metric for dynamodb source (#3921)

Security

2.6.1

07 Dec 20:02
4fee8cd

2023-12-07 Version 2.6.1


Enhancements

  • Add aggregate metrics for ddb source export and stream (#3728)

Bug Fixes

  • Update and upsert bulk actions do not include changes from document_root_key, exclude_keys, etc. (#3745)
  • S3 source processes SQS notification when S3 folder is created (#3727)

Security

2.6.0

28 Nov 17:41
c9bcacd

2023-11-28 Version 2.6.0


Features

  • Support DynamoDB as a source. (#2932)
  • Use Kafka as a buffer (#3322)
  • Support dynamically changing the visibility timeout for S3 Source with SQS queue (#2485)
  • Create or update Amazon OpenSearch Serverless network policy (#3577)
  • Sink level metric for end to end latency (#3494)

Enhancements

  • Use Amazon Linux as base Docker image (#3505)
  • Allow the Kafka buffer (and others that do not require the heap) to bypass the heap circuit breaker (#3616)
  • Improve gRPC request exception logging (#3621)
  • Configure the delay in the random string source (#3601)
  • Add distribution_version flag to opensearch source (#3636)

Bug Fixes

  • Data Prepper is writing empty DLQ objects (#3644)
  • Bulk Operation Retry Strategy should print cause of error (#3504)
  • ISM index rollover actions fail because of missing setting for otel-v1-apm-span-* indices (#3506)
  • AWS opensearch source error: ElasticsearchVersionInfo.buildFlavor (#3640)
  • No permissions for writing to Amazon OpenSearch Serverless collection only shows errors after max_retries limit is reached (#3508)
  • NullPointer exception in DefaultKafkaClusterConfigSupplier get API (#3528)
  • Fix bug so global read-only items do not expire from TTL in DynamoDB source coordination store (#3703)
  • Check if failedDeleteCount is positive before logging an SQS error (#3686)
  • Docker image jre-jammy contains Berkeley DB (#3543)
  • Race condition in DefaultEventHandle (#3617)

Security

Maintenance

  • Update to the Gradle 8.x version which supports Java 21. Gradle 8.3 supports up to Java 20. (#3330)
  • Start building Data Prepper on Java 21 (#3329)
  • Integration tests to validate data going to OpenSearch (#3678)
  • Unit tests fail on Windows machine (#3459)
  • Fix disabled E2E ack integration tests in PipelinesWithAcksIT.java (#3472)
  • Remove the @Deprecated from Record (#3536)
  • Remove all unnecessary projects in the 2.6 branch (#3605)
  • Update end-to-end tests to run from the released Docker image (#3566)