Write mode doesn't work with Azure storage #955

Closed
Anna050689 opened this issue Nov 23, 2022 · 3 comments · Fixed by #912
Labels
bug Something isn't working

Comments

Anna050689 commented Nov 23, 2022

Environment

Delta-rs version: 0.6.3

Binding: Python 3.9.13

Environment:

  • Cloud provider: Azure
  • OS: Windows
  • Other:

Bug

What happened:
An error is raised when I try to write a dataframe to Azure Storage, even though valid credentials are provided via AZURE_STORAGE_CONNECTION_STRING, AZURE_STORAGE_CONTAINER_NAME, and AZURE_STORAGE_BLOB_NAME.

What you expected to happen:
I expect the dataframe to be written to the table at the given path.

How to reproduce it:

from typing import Optional
import os

from deltalake import PyDeltaTableError
import pandas as pd
from deltalake.writer import write_deltalake


def save_data(path: Optional[str], df: pd.DataFrame, storage_options):
    """
    Save data in Delta format to Azure Storage

    :param
    path:
    str which should be should be in the next format for the connection to Azure Storage: "azure://{container_name}/{blob_name}"
    """
    if df is not None:
        try:
            write_deltalake(path, df, storage_options=storage_options)
        except PyDeltaTableError as error:
            if "Missing configuration AZURE_STORAGE_ACCOUNT" in error.args[0] \
                    or "Failed to find valid credential" in error.args[0]:
                message = f"It seems that authentication to Azure Storage failed due to absent or " \
                          f"invalid credentials.\nThe details of the error - {error}\n" \
                          f"Please set up valid credentials in order to write the table to {path}."
                print(message)
                raise PermissionError(message)
            elif 'A socket operation was attempted to an unreachable network' in error.args[0] \
                    or 'Failed to read checkpoint content: Generic S3 error' in error.args[0]:
                message = f"It seems that authentication to the AWS S3 bucket failed due to absent or " \
                          f"invalid credentials.\nThe details of the error - {error}\n" \
                          f"Please set up valid credentials in order to write the table to {path}."
                print(message)
                raise PermissionError(message)
            else:
                # re-raise errors that were not classified above
                raise


if __name__ == "__main__":
    df = pd.DataFrame([['Anna', 'Girl']], columns=['Name', 'Surname'])
    save_data(
        "azure://databricks/deltafiles/TestDimDate",
        df,
        storage_options={
            "AZURE_STORAGE_ACCOUNT_NAME": os.getenv("AZURE_STORAGE_ACCOUNT_NAME"),
            "AZURE_STORAGE_ACCOUNT_KEY": os.getenv("AZURE_STORAGE_ACCOUNT_KEY"),
            "AZURE_STORAGE_CONNECTION_STRING": os.getenv("AZURE_STORAGE_CONNECTION_STRING")
        })

More details:
Traceback of the error:

Traceback (most recent call last):
  File "C:\Users\Hanna_Imshenetska\Projects\Project_for_selfeducation\loader\write_delta_table.py", line 39, in <module>
    save_data(
  File "C:\Users\Hanna_Imshenetska\Projects\Project_for_selfeducation\loader\write_delta_table.py", line 19, in save_data
    write_deltalake(path, df, storage_options=storage_options)
  File "C:\Users\Hanna_Imshenetska\Projects\Project_for_selfeducation\venv\lib\site-packages\deltalake\writer.py", line 168, in write_deltalake
    storage_options = dict(
TypeError: dict() got multiple values for keyword argument 'AZURE_STORAGE_ACCOUNT_NAME'
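
The TypeError here is how Python reports the same keyword being passed twice: per the traceback, writer.py builds the merged options with a dict(...) call that unpacks both the table's stored options and the caller's storage_options, so any key present in both collides. A minimal standalone sketch of the failure mode, independent of deltalake:

a = {"AZURE_STORAGE_ACCOUNT_NAME": "foo"}
b = {"AZURE_STORAGE_ACCOUNT_NAME": "bar"}

dict(**a, **b)       # TypeError: dict() got multiple values for keyword argument ...
merged = {**a, **b}  # OK: in a dict literal the right-hand value simply wins

Merging with a dict literal instead of a dict() call is one way to avoid the collision.
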
Anna050689 added the bug label on Nov 23, 2022
0xdarkman commented

I experience the same error.

deltalake 0.6.3

import pandas as pd
from deltalake.writer import write_deltalake

account_name = "AAA"
account_key = "BBB"

# any small dataframe reproduces the error
df = pd.DataFrame({"Name": ["Anna"], "Surname": ["Girl"]})

write_deltalake(
    table_or_uri="abfss://[email protected]/TABLENAME",
    data=df,
    mode="overwrite",
    storage_options={
        "AZURE_STORAGE_ACCOUNT_NAME": account_name,
        "AZURE_STORAGE_ACCOUNT_KEY": account_key,
    },
)

The call fails with:

    **(table._storage_options or {}), **(storage_options or {})
TypeError: type object got multiple values for keyword argument 'AZURE_STORAGE_ACCOUNT_NAME'


0xdarkman commented Nov 24, 2022

from deltalake import DeltaTable
from deltalake.writer import write_deltalake

storage_options = {
  "AZURE_STORAGE_ACCOUNT_NAME": account_name,
  "AZURE_STORAGE_ACCOUNT_KEY": account_key,
}

table_path = "abfss://[email protected]/TABLE_NAME"
dt = DeltaTable(table_path, storage_options=storage_options)

write_deltalake(table_or_uri=dt, data=df, mode="overwrite")


Remember to convert the dataframe to a pyarrow Table and pass it via the data= keyword:

import pyarrow as pa
tb = pa.Table.from_pandas(df, preserve_index=False)
write_deltalake(table_or_uri=dt, data=tb, mode="overwrite")
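
Putting the two comments together, a complete sketch of the workaround (placeholder account, container, and credential values; it assumes the Delta table already exists at the path, since the DeltaTable constructor loads an existing table):

import pandas as pd
import pyarrow as pa
from deltalake import DeltaTable
from deltalake.writer import write_deltalake

storage_options = {
    "AZURE_STORAGE_ACCOUNT_NAME": "AAA",  # placeholder credentials
    "AZURE_STORAGE_ACCOUNT_KEY": "BBB",
}

# Attach the options to the table object instead of passing them to
# write_deltalake directly, so the two option dicts are never merged.
dt = DeltaTable(
    "abfss://[email protected]/TABLE_NAME",
    storage_options=storage_options,
)

# Convert the dataframe to a pyarrow Table and pass it as data=.
df = pd.DataFrame({"Name": ["Anna"], "Surname": ["Girl"]})
tb = pa.Table.from_pandas(df, preserve_index=False)
write_deltalake(table_or_uri=dt, data=tb, mode="overwrite")
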

wjones127 added a commit that referenced this issue Nov 30, 2022
# Description

Adding Azure integration tests to the Python bindings.

~~Will need to rebase after we merge #893.~~

# Related Issue(s)

- fixes #955

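A minimal sketch of what such a round-trip Azure test might look like, assuming pytest and credentials supplied through environment variables; the test name and container are placeholders, and the tests actually added in the PR may differ:

import os

import pandas as pd
import pyarrow as pa
import pytest
from deltalake import DeltaTable
from deltalake.writer import write_deltalake

# Hypothetical sketch of a round-trip test; placeholder container name.
@pytest.mark.skipif(
    "AZURE_STORAGE_ACCOUNT_NAME" not in os.environ,
    reason="Azure credentials not configured",
)
def test_roundtrip_azure():
    storage_options = {
        "AZURE_STORAGE_ACCOUNT_NAME": os.environ["AZURE_STORAGE_ACCOUNT_NAME"],
        "AZURE_STORAGE_ACCOUNT_KEY": os.environ["AZURE_STORAGE_ACCOUNT_KEY"],
    }
    table_uri = "azure://my-container/roundtrip-test"
    data = pa.Table.from_pandas(
        pd.DataFrame({"Name": ["Anna"], "Surname": ["Girl"]}),
        preserve_index=False,
    )
    write_deltalake(table_uri, data, mode="overwrite",
                    storage_options=storage_options)
    dt = DeltaTable(table_uri, storage_options=storage_options)
    assert dt.to_pyarrow_table().equals(data)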