chore: cleanup #774

Merged 3 commits on Aug 31, 2022
2 changes: 1 addition & 1 deletion README.adoc
@@ -29,7 +29,7 @@ link:https://github.com/rajasekarv/vega[vega], etc. It also provides bindings to

* Local file system
* AWS S3
* Azure Data Lake Storage Gen 2 (link:docs/ADLSGen2-HOWTO.md[HOW-TO])
* Azure Blob Storage / Azure Datalake Storage Gen2
* Google Cloud Storage

.Support features
3 changes: 0 additions & 3 deletions TODO

This file was deleted.

46 changes: 0 additions & 46 deletions build/setup_localstack.sh

This file was deleted.

14 changes: 0 additions & 14 deletions docker-compose.yml
@@ -24,17 +24,3 @@ services:
image: mcr.microsoft.com/azure-storage/azurite
ports:
- 10000:10000

# setup-localstack:
# image: localstack/localstack:0.14.4
# depends_on:
# - localstack
# entrypoint: "/bin/bash"
# command:
# - /setup_localstack.sh
# volumes:
# - "./build/setup_localstack.sh:/setup_localstack.sh"
# - "./rust/tests/data/golden:/data/golden"
# - "./rust/tests/data/simple_table:/data/simple_table"
# - "./rust/tests/data/simple_commit:/data/simple_commit"
# - "./rust/tests/data/concurrent_workers:/data/concurrent_workers"
47 changes: 0 additions & 47 deletions docs/ADLSGen2-HOWTO.md

This file was deleted.

22 changes: 21 additions & 1 deletion python/docs/source/usage.rst
@@ -29,13 +29,31 @@ To load the current version, use the constructor:
>>> dt = DeltaTable("../rust/tests/data/delta-0.2.0")

Depending on your storage backend, you could use the ``storage_options`` parameter to provide some configuration.
Currently only AWS S3 is supported.
Configuration is defined for specific backends - `s3 options`_, `azure options`_.

.. code-block:: python

>>> storage_options = {"AWS_ACCESS_KEY_ID": "THE_AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY":"THE_AWS_SECRET_ACCESS_KEY"}
>>> dt = DeltaTable("../rust/tests/data/delta-0.2.0", storage_options=storage_options)

The configuration can also be provided via the environment, and the basic service provider is derived from the URL
being used. We try to support many of the well-known formats to identify basic service properties.
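As a sketch of the environment-based configuration described above (using the same placeholder credential values as the ``storage_options`` example; with them set, the constructor can pick up credentials without an explicit ``storage_options`` dict):

```python
import os

# Provide the same AWS credentials via the environment instead of
# passing a storage_options dict to the DeltaTable constructor.
os.environ["AWS_ACCESS_KEY_ID"] = "THE_AWS_ACCESS_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "THE_AWS_SECRET_ACCESS_KEY"
# dt = DeltaTable("s3://bucket/path")  # would now read credentials from the env
```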

__S3__:

* s3://<bucket>/<path>
* s3a://<bucket>/<path>

__Azure__:

* az://<container>/<path>
* adl://<container>/<path>
* abfs://<container>/<path>

__GCS__:

* gs://<bucket>/<path>
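The scheme-to-backend mapping implied by the lists above can be sketched as follows (an illustration only, not part of the ``deltalake`` package; the real detection lives in the Rust ``StorageUrl`` type):

```python
from urllib.parse import urlparse

# Well-known URL schemes and the storage service they identify.
SCHEME_TO_BACKEND = {
    "s3": "S3", "s3a": "S3",
    "az": "Azure", "adl": "Azure", "abfs": "Azure", "abfss": "Azure",
    "gs": "GCS",
    "file": "Local", "": "Local",  # plain paths have no scheme
}

def detect_backend(table_uri: str) -> str:
    """Derive the basic service provider from a table URI's scheme."""
    scheme = urlparse(table_uri).scheme
    return SCHEME_TO_BACKEND.get(scheme, "Unknown")
```

For example, ``detect_backend("s3a://bucket/path")`` resolves to the S3 backend, while a bare filesystem path falls through to the local backend.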

Alternatively, if you have a data catalog you can load it by reference to a
database and table name. Currently only AWS Glue is supported.

@@ -61,6 +79,8 @@ Besides local filesystems, the following backends are supported:
* Google Cloud Storage, detected by the prefix ``gs://``.

.. _`specific instructions`: https://github.com/delta-io/delta-rs/blob/main/docs/ADLSGen2-HOWTO.md
.. _`s3 options`: https://github.com/delta-io/delta-rs/blob/17999d24a58fb4c98c6280b9e57842c346b4603a/rust/src/builder.rs#L423-L491
.. _`azure options`: https://github.com/delta-io/delta-rs/blob/17999d24a58fb4c98c6280b9e57842c346b4603a/rust/src/builder.rs#L524-L539


Time Travel
94 changes: 82 additions & 12 deletions rust/src/builder.rs
@@ -3,21 +3,22 @@
use crate::delta::{DeltaTable, DeltaTableError};
use crate::schema::DeltaDataTypeVersion;
use crate::storage::file::FileStorageBackend;
#[cfg(any(feature = "s3", feature = "s3-rustls"))]
use crate::storage::s3::{S3StorageBackend, S3StorageOptions};
use crate::storage::DeltaObjectStore;
use chrono::{DateTime, FixedOffset, Utc};
use object_store::path::Path;
use object_store::{DynObjectStore, Error as ObjectStoreError, Result as ObjectStoreResult};
use std::collections::HashMap;
use std::sync::Arc;
use url::Url;

#[cfg(any(feature = "s3", feature = "s3-rustls"))]
use crate::storage::s3::{S3StorageBackend, S3StorageOptions};
#[cfg(any(feature = "s3", feature = "s3-rustls"))]
use object_store::aws::AmazonS3Builder;
#[cfg(feature = "azure")]
use object_store::azure::MicrosoftAzureBuilder;
#[cfg(feature = "gcs")]
use object_store::gcp::GoogleCloudStorageBuilder;
use object_store::path::Path;
use object_store::{DynObjectStore, Error as ObjectStoreError, Result as ObjectStoreResult};
use std::collections::HashMap;
use std::sync::Arc;
use url::Url;

/// possible version specifications for loading a delta table
#[derive(Debug, Clone, PartialEq, Eq)]
@@ -280,9 +281,32 @@ impl StorageUrl {
///
/// # Well-known formats
///
/// The lists below enumerate some well-known URIs that are understood by the
/// parse function. We parse URIs to refer to a specific storage location, which
/// is accessed using the internal storage backends.
///
/// ## Azure
///
/// URIs according to <https://github.com/fsspec/adlfs#filesystem-interface-to-azure-datalake-gen1-and-gen2-storage>:
///
/// * az://<container>/<path>
/// * adl://<container>/<path>
/// * abfs(s)://<container>/<path>
///
/// URIs according to <https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction-abfs-uri>:
///
/// * abfs(s)://<file_system>@<account_name>.dfs.core.windows.net/<path>
///
/// and a custom one
///
/// * azure://<container>/<path>
///
/// ## S3
/// * s3://<bucket>/<path>
/// * s3a://<bucket>/<path>
///
/// ## GCS
/// * gs://<bucket>/<path>
pub fn parse(s: impl AsRef<str>) -> ObjectStoreResult<Self> {
let s = s.as_ref();

@@ -329,6 +353,11 @@ impl StorageUrl {
self.url.scheme()
}

/// Returns the URL host
pub fn host(&self) -> Option<&str> {
self.url.host_str()
}

/// Returns this [`StorageUrl`] as a string
pub fn as_str(&self) -> &str {
self.as_ref()
@@ -338,8 +367,7 @@
pub fn service_type(&self) -> StorageService {
match self.url.scheme() {
"file" => StorageService::Local,
"az" | "abfs" | "abfss" | "adls2" | "azure" | "wasb" => StorageService::Azure,
// TODO is s3a permissible?
"az" | "abfs" | "abfss" | "adls2" | "azure" | "wasb" | "adl" => StorageService::Azure,
"s3" | "s3a" => StorageService::S3,
"gs" => StorageService::GCS,
_ => StorageService::Unknown,
@@ -395,11 +423,53 @@ fn get_storage_backend(
}
#[cfg(feature = "azure")]
StorageService::Azure => {
let url: &Url = storage_url.as_ref();
// TODO we have to differentiate ...
let container_name = url.host_str().ok_or(ObjectStoreError::NotImplemented)?;
let (container_name, url_account) = match storage_url.scheme() {
"az" | "adl" | "azure" => {
let container = storage_url.host().ok_or(ObjectStoreError::NotImplemented)?;
(container.to_owned(), None)
}
"adls2" => {
log::warn!("Support for the 'adls2' scheme is deprecated and will be removed in a future version. Use `az://<container>/<path>` instead.");
let account = storage_url.host().ok_or(ObjectStoreError::NotImplemented)?;
let container = storage_url
.prefix
.parts()
.next()
.ok_or(ObjectStoreError::NotImplemented)?
.to_owned();
(container.as_ref().to_string(), Some(account))
}
"abfs" | "abfss" => {
// abfs(s) might refer to the fsspec convention abfs://<container>/<path>
// or the convention for the hadoop driver abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>
let url: &Url = storage_url.as_ref();
if url.username().is_empty() {
(
url.host_str()
.ok_or(ObjectStoreError::NotImplemented)?
.to_string(),
None,
)
} else {
let parts: Vec<&str> = url
.host_str()
.ok_or(ObjectStoreError::NotImplemented)?
.splitn(2, '.')
.collect();
if parts.len() != 2 {
Err(ObjectStoreError::NotImplemented)
} else {
Ok((url.username().to_owned(), Some(parts[0])))
}?
}
}
_ => todo!(),
};
let mut builder = get_azure_builder_from_options(options.unwrap_or_default())
.with_container_name(container_name);
if let Some(account) = url_account {
builder = builder.with_account(account);
}
if let Some(allow) = allow_http {
builder = builder.with_allow_http(allow);
}
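The two `abfs(s)` conventions that the Rust match arm above distinguishes can be illustrated with a small Python sketch (for illustration only; the authoritative logic is the Rust code, and `urlparse` is standing in for the `url` crate):

```python
from urllib.parse import urlparse

def parse_abfs(uri: str):
    """Return (container, account_or_None) for an abfs(s) URI.

    fsspec convention:  abfs://<container>/<path>              -> no username part
    hadoop convention:  abfss://<fs>@<account>.dfs.core.windows.net/<path>
                        -> container in the username, account in the host
    """
    parts = urlparse(uri)
    if not parts.username:
        # fsspec style: the host is the container; the account
        # name must come from configuration instead.
        return parts.hostname, None
    # hadoop driver style: the username is the file system (container),
    # and the account is the first dot-separated label of the host.
    account = parts.hostname.split(".", 1)[0]
    return parts.username, account
```

A hadoop-style URI with a host that does not contain a dot would be rejected in the Rust code (`NotImplemented`); the sketch omits that check for brevity.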
70 changes: 0 additions & 70 deletions rust/src/delta_dataframe.rs

This file was deleted.
