
Change default write mode to overwrite to be consistent with pandas #1209

Merged (1 commit, Jan 22, 2020)
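For reference, pandas writers silently replace an existing destination by default, which is the behavior this PR adopts for the Koalas defaults. A minimal sketch (the path is illustrative; `to_parquet` needs a parquet engine such as pyarrow installed):

```python
import pandas as pd

pdf = pd.DataFrame({"a": [1, 2, 3]})

# pandas' to_parquet has no 'mode' parameter: writing to an existing path
# simply replaces the file rather than raising an error.
pdf.to_parquet("/tmp/example.parquet")
pdf.to_parquet("/tmp/example.parquet")  # overwrites silently
```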
27 changes: 15 additions & 12 deletions databricks/koalas/frame.py
@@ -3278,7 +3278,7 @@ def cache(self):
"""
return _CachedDataFrame(self._internal)

- def to_table(self, name: str, format: Optional[str] = None, mode: str = 'error',
+ def to_table(self, name: str, format: Optional[str] = None, mode: str = 'overwrite',
partition_cols: Union[str, List[str], None] = None,
**options):
"""
@@ -3297,8 +3297,9 @@ def to_table(self, name: str, format: Optional[str] = None, mode: str = 'error',
- 'json'
- 'csv'

- mode : str {'append', 'overwrite', 'ignore', 'error', 'errorifexists'}, default 'error'.
-     Specifies the behavior of the save operation when the table exists already.
+ mode : str {'append', 'overwrite', 'ignore', 'error', 'errorifexists'}, default
+     'overwrite'. Specifies the behavior of the save operation when the table exists
+     already.

- 'append': Append the new data to existing data.
- 'overwrite': Overwrite existing data.
@@ -3333,7 +3334,7 @@ def to_table(self, name: str, format: Optional[str] = None, mode: str = 'error',
self.to_spark().write.saveAsTable(name=name, format=format, mode=mode,
partitionBy=partition_cols, **options)
Comment on lines 3334 to 3335

Collaborator: oh, should these lines be self.to_spark().write.mode(mode).saveAsTable(...)?

Collaborator: And ditto for the other to_xxx methods related to Spark IO?

Member Author: Eh, do you mean for consistency? saveAsTable in Spark seems to have mode.

Member Author: I just made the changes back.

Collaborator: nvm, I thought we should explicitly specify mode with write.mode(mode), but it's not needed.
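For context on the exchange above: PySpark's DataFrameWriter accepts the save mode either as a keyword argument to saveAsTable or via write.mode(...), so the two spellings should behave identically. A minimal sketch, assuming a running SparkSession and a Spark DataFrame `sdf` (the table name is illustrative):

```python
# Passing the save mode directly to saveAsTable ...
sdf.write.saveAsTable("example_table", format="parquet", mode="overwrite")

# ... should behave the same as setting it via DataFrameWriter.mode first.
sdf.write.mode("overwrite").saveAsTable("example_table", format="parquet")
```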


- def to_delta(self, path: str, mode: str = 'error',
+ def to_delta(self, path: str, mode: str = 'overwrite',
partition_cols: Union[str, List[str], None] = None, **options):
"""
Write the DataFrame out as a Delta Lake table.
@@ -3342,8 +3343,9 @@ def to_delta(self, path: str, mode: str = 'error',
----------
path : str, required
Path to write to.
- mode : str {'append', 'overwrite', 'ignore', 'error', 'errorifexists'}, default 'error'.
-     Specifies the behavior of the save operation when the destination exists already.
+ mode : str {'append', 'overwrite', 'ignore', 'error', 'errorifexists'}, default
+     'overwrite'. Specifies the behavior of the save operation when the destination
+     exists already.

- 'append': Append the new data to existing data.
- 'overwrite': Overwrite existing data.
@@ -3391,7 +3393,7 @@ def to_delta(self, path: str, mode: str = 'error',
self.to_spark_io(
path=path, mode=mode, format="delta", partition_cols=partition_cols, **options)

- def to_parquet(self, path: str, mode: str = 'error',
+ def to_parquet(self, path: str, mode: str = 'overwrite',
partition_cols: Union[str, List[str], None] = None,
compression: Optional[str] = None):
"""
@@ -3401,8 +3403,9 @@ def to_parquet(self, path: str, mode: str = 'error',
----------
path : str, required
Path to write to.
- mode : str {'append', 'overwrite', 'ignore', 'error', 'errorifexists'}, default 'error'.
-     Specifies the behavior of the save operation when the destination exists already.
+ mode : str {'append', 'overwrite', 'ignore', 'error', 'errorifexists'},
+     default 'overwrite'. Specifies the behavior of the save operation when the
+     destination exists already.

- 'append': Append the new data to existing data.
- 'overwrite': Overwrite existing data.
@@ -3445,7 +3448,7 @@ def to_parquet(self, path: str, mode: str = 'error',
path=path, mode=mode, partitionBy=partition_cols, compression=compression)

def to_spark_io(self, path: Optional[str] = None, format: Optional[str] = None,
- mode: str = 'error', partition_cols: Union[str, List[str], None] = None,
+ mode: str = 'overwrite', partition_cols: Union[str, List[str], None] = None,
**options):
"""Write the DataFrame out to a Spark data source.

@@ -3461,8 +3464,8 @@ def to_spark_io(self, path: Optional[str] = None, format: Optional[str] = None,
- 'orc'
- 'json'
- 'csv'
- mode : str {'append', 'overwrite', 'ignore', 'error', 'errorifexists'}, default 'error'.
-     Specifies the behavior of the save operation when data already exists.
+ mode : str {'append', 'overwrite', 'ignore', 'error', 'errorifexists'}, default
+     'overwrite'. Specifies the behavior of the save operation when data already exists.

- 'append': Append the new data to existing data.
- 'overwrite': Overwrite existing data.
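Taken together, the frame.py changes mean a bare to_table / to_delta / to_parquet / to_spark_io call now overwrites an existing destination instead of raising. A hedged sketch of the new default behavior (the import reflects the Koalas package of this era; the path is illustrative):

```python
import databricks.koalas as ks

kdf = ks.DataFrame({"a": [1, 2, 3]})

# Before this PR the default mode was 'error', so the second call below
# would fail because the destination already exists. With the new default
# of 'overwrite', it replaces the previous output, matching pandas.
kdf.to_parquet("/tmp/example_output")
kdf.to_parquet("/tmp/example_output")  # overwrites instead of raising

# The old fail-fast behavior remains available explicitly:
# kdf.to_parquet("/tmp/example_output", mode="error")
```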
1 change: 1 addition & 0 deletions docs/source/reference/io.rst
@@ -75,6 +75,7 @@ JSON
:toctree: api/

read_json
+ DataFrame.to_json

HTML
----