Merge branch 'main' into 438-upgrade-java-version
patchwork01 authored Jan 10, 2023
2 parents 21edc92 + 46f0305 commit 182cb50
Showing 121 changed files with 1,313 additions and 276 deletions.
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,46 @@ Releases
This page documents the releases of Sleeper. Performance figures for each release
are available [here](docs/12-performance-test.md)

## Version 0.13.0

This contains the following improvements:

General code improvements:
- The various compaction related modules are now all submodules of one parent compaction module.
- Simplified the names of Cloudwatch log groups.

Standard ingest:
- Refactored standard ingest code to simplify it and make it easier to use.
- Observability of ingest jobs: it is now possible to see the status of ingest jobs (i.e. whether they are queued,
in progress, finished) and how long they took to run.
- Fixed bug where standard ingest would fail if it needed to upload a file greater than 5GB to S3. This was done
by replacing the use of put object with transfer manager.
- The minimum part size to be used for uploads is now configurable and defaults to 128MB.
- Changed the default value of `sleeper.ingest.arrow.max.local.store.bytes` from 16GB to 2GB to reduce the latency
before data is uploaded to S3.
- Various integration tests were converted to unit tests to speed up the build process.
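
The upload changes above come down to simple arithmetic, sketched here for illustration. This is not Sleeper's implementation: the class and method names are hypothetical, the 5GB single-request limit is assumed to be 5 GiB, and the 128MB default part size is taken as 134217728 bytes.

```java
// Hypothetical helper, for illustration only; not part of Sleeper.
final class MultipartUploadMaths {
    // Assumption: the 5GB limit for a single put object request is 5 GiB.
    static final long SINGLE_PUT_LIMIT_BYTES = 5L * 1024 * 1024 * 1024;
    // Assumption: the 128MB default part size is 128 MiB (134217728 bytes).
    static final long DEFAULT_PART_SIZE_BYTES = 134_217_728L;

    private MultipartUploadMaths() {
    }

    // True if the file is small enough to upload in one put object request.
    static boolean fitsInSinglePut(long fileSizeBytes) {
        return fileSizeBytes <= SINGLE_PUT_LIMIT_BYTES;
    }

    // Number of parts needed to upload the file, rounding up the last part.
    static long partCount(long fileSizeBytes, long partSizeBytes) {
        if (fileSizeBytes < 0 || partSizeBytes <= 0) {
            throw new IllegalArgumentException("Sizes must be positive");
        }
        return (fileSizeBytes + partSizeBytes - 1) / partSizeBytes;
    }

    public static void main(String[] args) {
        long sixGiB = 6L * 1024 * 1024 * 1024;
        System.out.println(fitsInSinglePut(sixGiB));                    // false
        System.out.println(partCount(sixGiB, DEFAULT_PART_SIZE_BYTES)); // 48
    }
}
```

A 6 GiB file is too large for a single request, but splits into 48 parts at the default part size.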

Bulk import:
- Added new Dataframe based approach to bulk import that uses a custom partitioner so that Spark partitions the
data according to Sleeper's leaf partitions. The data is then sorted within those partitions. This avoids the
global sort required by the other Dataframe based approach, and means there is one fewer pass through the data
to be loaded. This reduced the time of a test bulk import job from 24 to 14 minutes.
- EBS storage can be configured for EMR clusters created for bulk import jobs.
- Bumped default EMR version to 6.8.0 and Spark version to 3.3.0.
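
The idea behind the new Dataframe based approach can be sketched in plain Java, without Spark. This is an illustration under assumptions, not the bulk import code: leaf partitions are modelled as ranges of `long` keys separated by sorted split points, and all names are hypothetical. Each key is routed to its leaf partition, then each partition is sorted on its own, so no global sort is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, for illustration only; not part of Sleeper.
final class LeafPartitionSortSketch {

    private LeafPartitionSortSketch() {
    }

    // Assumption: with sorted split points s0 < s1 < ..., partition 0 holds keys
    // below s0, partition i holds keys in [s(i-1), s(i)), and the last partition
    // holds keys at or above the final split point.
    static int leafPartitionFor(long key, long[] splitPoints) {
        int index = Arrays.binarySearch(splitPoints, key);
        // When the key is not found, binarySearch returns -(insertionPoint) - 1.
        return index >= 0 ? index + 1 : -index - 1;
    }

    // Groups keys by leaf partition, then sorts within each partition.
    static Map<Integer, List<Long>> partitionThenSort(List<Long> keys, long[] splitPoints) {
        Map<Integer, List<Long>> byPartition = new TreeMap<>();
        for (long key : keys) {
            byPartition
                    .computeIfAbsent(leafPartitionFor(key, splitPoints), p -> new ArrayList<>())
                    .add(key);
        }
        byPartition.values().forEach(Collections::sort);
        return byPartition;
    }

    public static void main(String[] args) {
        long[] splitPoints = {10, 20};
        // Prints {0=[5, 7], 1=[10, 15], 2=[25]}
        System.out.println(partitionThenSort(List.of(25L, 5L, 15L, 10L, 7L), splitPoints));
    }
}
```

Because every key in one partition sorts before every key in the next, the per-partition sorts already give data that is ordered within each leaf partition.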

Compactions:
- Compactions can now be run on Graviton Fargate containers.

Scripts:
- The script to report information about the partitions now reports more detailed information about the number of
elements in a partition and whether it needs splitting.
- System test script reports elapsed time.

Build:
- Various improvements to GitHub Actions reliability.
- Created a Docker image that can be used to deploy Sleeper. This avoids the user needing to install multiple tools
locally.

## Version 0.12.0

This contains the following improvements:
1 change: 1 addition & 0 deletions docs/12-performance-test.md
@@ -53,3 +53,4 @@ otherwise not have been noticed.
|----------------|------------|-----------------------------|---------------------------------
| 0.11.0 | 13/06/2022 | 366000 | 160000
| 0.12.0 | 18/10/2022 | 378000 | 146600
| 0.13.0 | 06/01/2023 | 326000 | 144000
2 changes: 1 addition & 1 deletion example/basic/instance.properties
@@ -33,7 +33,7 @@ sleeper.region=eu-west-2

# The version of Sleeper to use. This property is used to identify the correct jars in the S3JarsBucket and to
# select the correct tag in the ECR repositories.
sleeper.version=0.13.0-SNAPSHOT
sleeper.version=0.14.0-SNAPSHOT

# The id of the VPC to deploy to.
sleeper.vpc=1234567890
24 changes: 22 additions & 2 deletions example/full/instance.properties
@@ -1,5 +1,5 @@
#
# Copyright 2022 Crown Copyright
# Copyright 2023 Crown Copyright
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -53,7 +53,7 @@ sleeper.region=eu-west-2

# The version of Sleeper to use. This property is used to identify the correct jars in the S3JarsBucket and to
# select the correct tag in the ECR repositories.
sleeper.version=0.13.0-SNAPSHOT
sleeper.version=0.14.0-SNAPSHOT

# The id of the VPC to deploy to.
sleeper.vpc=1234567890
@@ -181,6 +181,26 @@ sleeper.ingest.arrow.max.local.store.bytes=2147483648
# (arrow-based ingest only) [1K]
sleeper.ingest.arrow.max.single.write.to.file.records=1024

# The implementation of the async S3 client to use for upload during ingest.
# Valid values are 'java' or 'crt'. This determines the implementation of S3AsyncClient that gets used.
# With 'java' it makes a single PutObject request for each file.
# With 'crt' it uses the AWS Common Runtime (CRT) to make multipart uploads.
# Note that the CRT option is recommended. Using the Java option may cause failures if any file is >5GB in size, and
# will lead to the following warning:
# "The provided S3AsyncClient is not an instance of S3CrtAsyncClient, and thus multipart upload/download feature is not
# enabled and resumable file upload is not supported. To benefit from maximum throughput, consider using
# S3AsyncClient.crtBuilder().build() instead."
# (async partition file writer only)
sleeper.ingest.async.client.type=crt

# The part size in bytes to use for multipart uploads.
# (CRT async ingest only) [128MB]
sleeper.ingest.async.crt.part.size.bytes=134217728

# The target throughput for multipart uploads, in GB/s. Determines how many parts should be uploaded simultaneously.
# (CRT async ingest only)
sleeper.ingest.async.crt.target.throughput.gbps=10

# The name of a bucket that contains files to be ingested via ingest jobs. This bucket should already
# exist, i.e. it will not be created as part of the cdk deployment of this instance of Sleeper. The ingest
# and bulk import stacks will be given read access to this bucket so that they can consume data from it.
2 changes: 1 addition & 1 deletion java/athena/pom.xml
@@ -19,7 +19,7 @@
<parent>
<artifactId>aws</artifactId>
<groupId>sleeper</groupId>
<version>0.13.0-SNAPSHOT</version>
<version>0.14.0-SNAPSHOT</version>
</parent>

<modelVersion>4.0.0</modelVersion>
2 changes: 1 addition & 1 deletion java/build/pom.xml
@@ -19,7 +19,7 @@
<parent>
<artifactId>aws</artifactId>
<groupId>sleeper</groupId>
<version>0.13.0-SNAPSHOT</version>
<version>0.14.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion java/bulk-import/bulk-import-common/pom.xml
@@ -19,7 +19,7 @@
<parent>
<artifactId>bulk-import</artifactId>
<groupId>sleeper</groupId>
<version>0.13.0-SNAPSHOT</version>
<version>0.14.0-SNAPSHOT</version>
</parent>

<modelVersion>4.0.0</modelVersion>
2 changes: 1 addition & 1 deletion java/bulk-import/bulk-import-runner/pom.xml
@@ -19,7 +19,7 @@
<parent>
<groupId>sleeper</groupId>
<artifactId>bulk-import</artifactId>
<version>0.13.0-SNAPSHOT</version>
<version>0.14.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion java/bulk-import/bulk-import-starter/pom.xml
@@ -19,7 +19,7 @@
<parent>
<groupId>sleeper</groupId>
<artifactId>bulk-import</artifactId>
<version>0.13.0-SNAPSHOT</version>
<version>0.14.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion java/bulk-import/pom.xml
@@ -18,7 +18,7 @@
<parent>
<artifactId>aws</artifactId>
<groupId>sleeper</groupId>
<version>0.13.0-SNAPSHOT</version>
<version>0.14.0-SNAPSHOT</version>
</parent>

<packaging>pom</packaging>
2 changes: 1 addition & 1 deletion java/cdk-custom-resources/pom.xml
@@ -20,7 +20,7 @@
<parent>
<groupId>sleeper</groupId>
<artifactId>aws</artifactId>
<version>0.13.0-SNAPSHOT</version>
<version>0.14.0-SNAPSHOT</version>
</parent>

<artifactId>cdk-custom-resources</artifactId>
2 changes: 1 addition & 1 deletion java/cdk-environment/pom.xml
@@ -19,7 +19,7 @@
<parent>
<artifactId>aws</artifactId>
<groupId>sleeper</groupId>
<version>0.13.0-SNAPSHOT</version>
<version>0.14.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 1 addition & 1 deletion java/cdk/pom.xml
@@ -19,7 +19,7 @@
<parent>
<artifactId>aws</artifactId>
<groupId>sleeper</groupId>
<version>0.13.0-SNAPSHOT</version>
<version>0.14.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

2 changes: 2 additions & 0 deletions java/cdk/src/main/java/sleeper/cdk/stack/IngestStack.java
@@ -64,6 +64,7 @@
import static sleeper.configuration.properties.SystemDefinedInstanceProperty.INGEST_CLUSTER;
import static sleeper.configuration.properties.SystemDefinedInstanceProperty.INGEST_JOB_DLQ_URL;
import static sleeper.configuration.properties.SystemDefinedInstanceProperty.INGEST_JOB_QUEUE_URL;
import static sleeper.configuration.properties.SystemDefinedInstanceProperty.INGEST_LAMBDA_FUNCTION;
import static sleeper.configuration.properties.SystemDefinedInstanceProperty.INGEST_TASK_DEFINITION_FAMILY;
import static sleeper.configuration.properties.UserDefinedInstanceProperty.ECR_INGEST_REPO;
import static sleeper.configuration.properties.UserDefinedInstanceProperty.ID;
@@ -312,6 +313,7 @@ private void lambdaToCreateIngestTasks(IBucket configBucket, Queue ingestJobQueu
.schedule(Schedule.rate(Duration.minutes(instanceProperties.getInt(INGEST_TASK_CREATION_PERIOD_IN_MINUTES))))
.targets(Collections.singletonList(new LambdaFunction(handler)))
.build();
instanceProperties.set(INGEST_LAMBDA_FUNCTION, handler.getFunctionName());
instanceProperties.set(INGEST_CLOUDWATCH_RULE, rule.getRuleName());
}

2 changes: 1 addition & 1 deletion java/clients/pom.xml
@@ -19,7 +19,7 @@
<parent>
<artifactId>aws</artifactId>
<groupId>sleeper</groupId>
<version>0.13.0-SNAPSHOT</version>
<version>0.14.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

@@ -21,7 +21,6 @@
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.apache.hadoop.conf.Configuration;
import sleeper.ClientUtils;
import sleeper.configuration.jars.ObjectFactory;
import sleeper.configuration.jars.ObjectFactoryException;
import sleeper.configuration.properties.InstanceProperties;
@@ -37,6 +36,7 @@
import sleeper.statestore.StateStore;
import sleeper.statestore.StateStoreException;
import sleeper.statestore.StateStoreProvider;
import sleeper.util.ClientUtils;
import sleeper.utils.HadoopConfigurationProvider;

import java.io.IOException;
@@ -21,7 +21,6 @@
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import sleeper.ClientUtils;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.configuration.properties.table.TableProperties;
import sleeper.configuration.properties.table.TablePropertiesProvider;
@@ -34,6 +33,7 @@
import sleeper.query.tracker.TrackedQuery;
import sleeper.query.tracker.exception.QueryTrackerException;
import sleeper.statestore.StateStoreException;
import sleeper.util.ClientUtils;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
@@ -24,12 +24,12 @@
import com.amazonaws.services.sqs.model.ReceiveMessageResult;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import sleeper.ClientUtils;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.core.record.Record;
import sleeper.core.record.ResultsBatch;
import sleeper.core.record.serialiser.JSONResultsBatchSerialiser;
import sleeper.core.schema.Schema;
import sleeper.util.ClientUtils;

import java.io.IOException;
import java.util.List;
@@ -29,14 +29,14 @@
import com.google.gson.JsonObject;
import org.java_websocket.client.WebSocketClient;
import org.java_websocket.handshake.ServerHandshake;
import sleeper.ClientUtils;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.configuration.properties.SystemDefinedInstanceProperty;
import sleeper.configuration.properties.table.TableProperties;
import sleeper.configuration.properties.table.TablePropertiesProvider;
import sleeper.query.model.Query;
import sleeper.query.model.QuerySerDe;
import sleeper.statestore.StateStoreException;
import sleeper.util.ClientUtils;

import java.io.IOException;
import java.net.URI;
@@ -22,7 +22,6 @@
import com.facebook.collections.ByteArray;
import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.conf.Configuration;
import sleeper.ClientUtils;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.configuration.properties.table.TableProperties;
import sleeper.configuration.properties.table.TablePropertiesProvider;
@@ -37,6 +36,7 @@
import sleeper.statestore.StateStore;
import sleeper.statestore.StateStoreException;
import sleeper.statestore.StateStoreProvider;
import sleeper.util.ClientUtils;

import java.io.BufferedWriter;
import java.io.FileOutputStream;
@@ -20,14 +20,14 @@
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import sleeper.ClientUtils;
import sleeper.compaction.job.CompactionJobStatusStore;
import sleeper.compaction.status.store.job.DynamoDBCompactionJobStatusStore;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.console.ConsoleInput;
import sleeper.status.report.compaction.job.CompactionJobStatusReportArguments;
import sleeper.status.report.compaction.job.CompactionJobStatusReporter;
import sleeper.status.report.job.query.JobQuery;
import sleeper.util.ClientUtils;

import java.io.IOException;
import java.time.Clock;
@@ -19,13 +19,13 @@
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import sleeper.ClientUtils;
import sleeper.compaction.status.store.task.DynamoDBCompactionTaskStatusStore;
import sleeper.compaction.task.CompactionTaskStatusStore;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.status.report.compaction.task.CompactionTaskQuery;
import sleeper.status.report.compaction.task.CompactionTaskStatusReportArguments;
import sleeper.status.report.compaction.task.CompactionTaskStatusReporter;
import sleeper.util.ClientUtils;

import java.io.IOException;

@@ -23,13 +23,13 @@
import com.amazonaws.services.sqs.model.QueueAttributeName;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;
import com.amazonaws.services.sqs.model.ReceiveMessageResult;
import sleeper.ClientUtils;
import sleeper.compaction.job.CompactionJobSerDe;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.configuration.properties.table.TablePropertiesProvider;
import sleeper.job.common.CommonJobUtils;
import sleeper.query.model.QuerySerDe;
import sleeper.splitter.SplitPartitionJobDefinitionSerDe;
import sleeper.util.ClientUtils;

import java.io.IOException;
import java.util.Map;
@@ -20,7 +20,6 @@
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.apache.hadoop.conf.Configuration;
import sleeper.ClientUtils;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.configuration.properties.table.TablePropertiesProvider;
import sleeper.statestore.StateStore;
@@ -32,6 +31,7 @@
import sleeper.status.report.filestatus.FileStatusReporter;
import sleeper.status.report.filestatus.JsonFileStatusReporter;
import sleeper.status.report.filestatus.StandardFileStatusReporter;
import sleeper.util.ClientUtils;

import java.io.IOException;
import java.util.HashMap;
@@ -23,7 +23,6 @@
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.QueueAttributeName;
import sleeper.ClientUtils;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.console.ConsoleInput;
import sleeper.ingest.job.status.IngestJobStatusStore;
@@ -32,6 +31,7 @@
import sleeper.status.report.ingest.job.IngestJobStatusReportArguments;
import sleeper.status.report.ingest.job.IngestJobStatusReporter;
import sleeper.status.report.job.query.JobQuery;
import sleeper.util.ClientUtils;

import java.io.IOException;
import java.time.Clock;
@@ -19,13 +19,13 @@
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import sleeper.ClientUtils;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.ingest.status.store.task.DynamoDBIngestTaskStatusStore;
import sleeper.ingest.task.IngestTaskStatusStore;
import sleeper.status.report.ingest.task.IngestTaskQuery;
import sleeper.status.report.ingest.task.IngestTaskStatusReportArguments;
import sleeper.status.report.ingest.task.IngestTaskStatusReporter;
import sleeper.util.ClientUtils;

import java.io.IOException;

@@ -25,8 +25,8 @@
import com.amazonaws.services.sqs.model.SendMessageRequest;
import org.apache.commons.lang3.tuple.ImmutablePair;
import org.apache.commons.lang3.tuple.Pair;
import sleeper.ClientUtils;
import sleeper.configuration.properties.InstanceProperties;
import sleeper.util.ClientUtils;

import java.io.IOException;
import java.util.HashSet;