Before using, please give our design docs a quick read.
For Gradle projects, add the following to your build.gradle:

```groovy
implementation group: 'com.adobe.s3fs', name: 'S3-FileSystem', version: '1.0'
```

For Maven projects, add the following to your pom.xml:

```xml
<dependency>
  <groupId>com.adobe.s3fs</groupId>
  <artifactId>S3-FileSystem</artifactId>
  <version>1.0</version>
</dependency>
```
The S3-FileSystem library comes with most of its dependencies shaded inside a "fat" jar. There are two dependencies which are not shaded and must be provided by users on the classpath:

- `org.apache.hadoop:hadoop-common`
- `org.apache.hadoop:hadoop-mapreduce-client-core` (only needs to be provided when running `fsck`)
A DynamoDB table with the following schema needs to be created by the user:

- partition key: `path` (String)
- sort key: `children` (String)
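As a sketch, such a table could be created with the AWS CLI as follows. The table name is purely illustrative (it just has to match what you configure in `fs.s3k.metastore.dynamo.table.<some-bucket>`), and on-demand billing is one option among others:

```bash
aws dynamodb create-table \
  --table-name s3fs-metadata-some-bucket \
  --attribute-definitions AttributeName=path,AttributeType=S AttributeName=children,AttributeType=S \
  --key-schema AttributeName=path,KeyType=HASH AttributeName=children,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST
```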
All the configuration properties can be passed via the Hadoop configuration.
Most of the properties also contain a bucket discriminator. This allows the same Hadoop config and process to access different buckets with different configurations.
Additionally, some properties are context aware. This allows different clients that access the same bucket to have different configurations. It is mostly useful for supplying different configurations to workers (M/R tasks, Spark executors etc.) and coordinators (Spark driver, AppMaster etc.).
A context is specified when a Java process is started via the system property `-Dfs.s3k.metastore.context.id=name`. Afterwards, the same property in the same Hadoop configuration can be suffixed with two different context ids. For example, you can have both `fs.s3k.metastore.dynamo.max.http.conn.some-bucket.sparkdriver` and `fs.s3k.metastore.dynamo.max.http.conn.some-bucket.sparkexecutor` in the same Hadoop config.
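To illustrate (the context ids and values below are hypothetical), a Spark job could start its driver with `-Dfs.s3k.metastore.context.id=sparkdriver` and its executors with `-Dfs.s3k.metastore.context.id=sparkexecutor`, while both read the same Hadoop configuration:

```properties
# The context id suffix selects which value each process sees
fs.s3k.metastore.dynamo.max.http.conn.some-bucket.sparkdriver=20
fs.s3k.metastore.dynamo.max.http.conn.some-bucket.sparkexecutor=50
```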
Although you can map S3-FileSystem to any scheme you wish, we recommend using a scheme different from `s3n` and `s3a`. This is because once you write data using S3-FileSystem, you can read it back only with S3-FileSystem. On the other hand, the filesystems usually mapped to those schemes can be used interchangeably on the same data.
You can configure the scheme like this: `fs.s3k.impl=com.adobe.s3fs.filesystem.HadoopFileSystemAdapter`.
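With that mapping in place, paths under the `s3k` scheme are served by S3-FileSystem. For instance (bucket and path are illustrative):

```bash
# Listing through S3-FileSystem; s3a:// or s3n:// paths keep going through
# whatever filesystem is mapped to those schemes
hadoop fs -ls s3k://some-bucket/some/path
```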
Property | Description | Default | Context aware |
---|---|---|---|
fs.s3k.metastore.dynamo.table.<some-bucket> | Identifies the DynamoDB table used as metadata store for the S3 bucket named some-bucket | No default | No
fs.s3k.metastore.dynamo.suffix.count.<some-bucket> | Identifies the number of suffixes (sub-partitions) that will be used to distribute load in a directory. Please see the "Considerations on performance" section for more details | No default | No
fs.s3k.metastore.dynamo.base.exponential.delay.<some-bucket> | Base exponential delay (in milliseconds) used by the DynamoDB client's backoff strategy | 80 | Yes
fs.s3k.metastore.dynamo.max.exponential.delay.<some-bucket> | Maximum exponential delay (in milliseconds) used by the DynamoDB client's backoff strategy | 60000 | Yes
fs.s3k.metastore.dynamo.backoff.full.jitter.<some-bucket> | Whether to use full jitter or equal jitter (true for full jitter, false for equal jitter) | true | Yes
fs.s3k.metastore.dynamo.max.retries.<some-bucket> | Maximum retries used by the DynamoDB client's backoff strategy | 50 | Yes
fs.s3k.metastore.dynamo.max.http.conn.<some-bucket> | Maximum connection pool size of the DynamoDB client | 50 | Yes
fs.s3k.metastore.dynamo.access.<some-bucket> | AWS access key id used by the DynamoDB client. This property is optional; IAM roles can be used instead | No default | No
fs.s3k.metastore.dynamo.secret.<some-bucket> | AWS secret access key used by the DynamoDB client. This property is optional; IAM roles can be used instead | No default | No
fs.s3k.metastore.dynamo.endpoint.<some-bucket> | AWS DynamoDB endpoint. This property is optional | No default | No
fs.s3k.metastore.dynamo.signing.region.<some-bucket> | AWS DynamoDB signing region. This property is optional | No default | No
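For example, a minimal metastore configuration for a bucket named some-bucket might look like this (the table name is illustrative, the suffix count follows the recommendation from the performance section, and credentials are omitted assuming IAM roles are used):

```properties
fs.s3k.metastore.dynamo.table.some-bucket=s3fs-metadata-some-bucket
fs.s3k.metastore.dynamo.suffix.count.some-bucket=10
```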
Property | Description | Default | Context aware |
---|---|---|---|
fs.s3k.metastore.operation.log.factory.class | Configures the operation log implementation used. You can disable the operation log by setting this to com.adobe.s3fs.metastore.api.DisabledMetadataOperationLog. Note that while this will improve performance, you will lose all the recovery information | com.adobe.s3fs.operationlog.S3MetadataOperationLogFactory | No
fs.s3k.operationlog.s3.bucket.<some-bucket> | The bucket in which the operation log will be written. Due to a current limitation this property should always be set to the bucket in which the data is stored | No default | No
fs.s3k.operationlog.s3.base.exponential.delay.<some-bucket> | Base exponential delay (in milliseconds) used by the S3 client of the operation log | 10 | Yes
fs.s3k.operationlog.s3.max.exponential.delay.<some-bucket> | Maximum exponential delay (in milliseconds) used by the S3 client's backoff strategy | 30000 | Yes
fs.s3k.operationlog.s3.backoff.full.jitter.<some-bucket> | Whether to use full jitter or equal jitter (true for full jitter, false for equal jitter) | true | Yes
fs.s3k.operationlog.s3.max.retries.<some-bucket> | Maximum retries used by the S3 client's backoff strategy | 50 | Yes
fs.s3k.operationlog.s3.max.http.conn.<some-bucket> | Maximum connection pool size of the S3 client | 220 | Yes
fs.s3k.operationlog.s3.access.<some-bucket> | AWS access key id used by the S3 client. This property is optional; IAM roles can be used instead | No default | No
fs.s3k.operationlog.s3.secret.<some-bucket> | AWS secret access key used by the S3 client. This property is optional; IAM roles can be used instead | No default | No
fs.s3k.operationlog.s3.endpoint.<some-bucket> | AWS S3 endpoint. This property is optional | No default | No
fs.s3k.operationlog.s3.signing.region.<some-bucket> | AWS S3 signing region. This property is optional | No default | No
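For example, keeping the default operation log implementation only requires pointing it at a bucket, which (because of the limitation noted above) must be the data bucket itself; the commented line shows how it could be disabled instead:

```properties
fs.s3k.operationlog.s3.bucket.some-bucket=some-bucket
# To trade recovery information for performance, the operation log can be disabled:
# fs.s3k.metastore.operation.log.factory.class=com.adobe.s3fs.metastore.api.DisabledMetadataOperationLog
```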
For writing and reading actual data to and from S3, S3-FileSystem relies on a separate FileSystem implementation. This can be EMRFS if you are running your workload in EMR, or S3A if you are running outside EMR.
To configure the underlying filesystem you need to set `fs.s3k.storage.underlying.filesystem.scheme.<some-bucket>` to the scheme that the other filesystem is using.
To pass configuration to the underlying filesystem you can set properties in the following manner: `fs.s3k.storage.underlying.fs.prop.marker.actual-property.<some-bucket>=value`
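As a sketch, wiring S3A as the underlying store for some-bucket could look like this (the pass-through property `fs.s3a.connection.maximum` is just an illustrative S3A setting, not something S3-FileSystem requires):

```properties
# Route the actual S3 reads and writes for some-bucket through S3A
fs.s3k.storage.underlying.filesystem.scheme.some-bucket=s3a
# Pass a property down to the underlying S3A filesystem
fs.s3k.storage.underlying.fs.prop.marker.fs.s3a.connection.maximum.some-bucket=100
```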
Property | Description | Default | Context aware |
---|---|---|---|
fs.s3k.thread.pool.size.<some-bucket> | The thread pool size. Note that this property must be correlated with the HTTP connection pool sizes of the DynamoDB client and S3 operation log client | 10 | Yes |
fs.s3k.thread.pool.max.tasks.<some-bucket> | Maximum pending operations that can be queued if the thread pool is full | 50 | Yes |
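As an illustration of the correlation noted in the table above (the numbers are hypothetical), raising the thread pool size usually means keeping the client connection pools at least as large:

```properties
fs.s3k.thread.pool.size.some-bucket=20
fs.s3k.thread.pool.max.tasks.some-bucket=100
# Connection pools should accommodate the extra threads
fs.s3k.metastore.dynamo.max.http.conn.some-bucket=50
fs.s3k.operationlog.s3.max.http.conn.some-bucket=220
```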
Bear in mind that with newly created S3 buckets and DynamoDB tables you will have a single partition. You should slowly ramp up your workload and allow auto-scaling to happen behind the scenes.
In the case of DynamoDB you can pre-provision throughput, but we advise against it as it is not very cost-efficient.
Because the first level children of a directory are stored under the same partition key, the operation throughput when adding/removing/renaming the first level children is limited by the throughput (3,000 RCU/s and 1,000 WCU/s) of a single partition in DynamoDB.
To improve this, you can use the `fs.s3k.metastore.dynamo.suffix.count.<some-bucket>` property. Essentially, this property will shard the data even further inside a directory, giving you a first level child operation throughput of suffix-count * 3,000 RCU/s and suffix-count * 1,000 WCU/s.
Note that increasing this property too much will decrease listing performance. Running S3-FileSystem at scale at Adobe, we have found that a value of 10 is good enough for most workloads.
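Applying that recommendation to a bucket would look like this (the bucket name is illustrative):

```properties
# 10 sub-partitions per directory: roughly 10 * 3,000 RCU/s and 10 * 1,000 WCU/s
# of first level child operations, at some cost to listing performance
fs.s3k.metastore.dynamo.suffix.count.some-bucket=10
```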
Some operations (e.g. `mkdir` and `create`) of the FileSystem contract have the implied semantics that the parent path may not exist and should be created all the way up to the root of the directory tree. When running these types of operations in a massively parallel context, there will be a large number of requests on the root paths of the directory tree, which can cause the DynamoDB partitions hosting the data for those paths to become overloaded.
This behaviour is mostly visible with very flat directory trees. For example: an M/R job creating temporary folders for attempts s3://bucket/job-output/task_attempt_[1..N].
At the moment, the only mitigation we have for this is to configure a very relaxed backoff strategy for the metastore's DynamoDB client. For example:
```properties
fs.s3k.metastore.dynamo.suffix.count.<some-bucket>=80
fs.s3k.metastore.dynamo.max.exponential.delay.<some-bucket>=60000
fs.s3k.metastore.dynamo.backoff.full.jitter.<some-bucket>=true
fs.s3k.metastore.dynamo.max.retries.<some-bucket>=50
```