Change default write mode to overwrite to be consistent with pandas #1209

HyukjinKwon · 2020-01-22T02:49:57Z

pandas always overwrites an existing file. Currently, only the JSON and CSV writers overwrite in Koalas. This PR proposes that all writers overwrite by default, for now.

>>> import pandas as pd
>>> pd.DataFrame({'a':[1,2,3]})
   a
0  1
1  2
2  3
>>> pd.DataFrame({'a':[1,2,3]}).to_parquet("/tmp/abc.parquet")
>>> pd.DataFrame({'a':[1,2,3]}).to_parquet("/tmp/abc.parquet")
>>> pd.DataFrame({'a':[1,2,3]}).to_parquet("/tmp/abc.parquet")
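The repeated to_parquet calls above succeed because pandas replaces the existing file. Spark writers instead take a save mode (append, overwrite, ignore, error/errorifexists), and Spark's own default is to raise if the target exists. As a rough illustration of what those modes mean, here is a pure-Python sketch on a single local file. `save_text` and `read_text` are hypothetical helpers written for this example, not Koalas or Spark APIs.

```python
import os
import tempfile


def save_text(path, text, mode="overwrite"):
    """Hypothetical helper mimicking Spark-style save modes on one local file."""
    exists = os.path.exists(path)
    if mode in ("error", "errorifexists"):
        if exists:
            raise FileExistsError(f"path {path} already exists")
    elif mode == "ignore":
        if exists:
            return  # silently keep the existing data
    elif mode not in ("overwrite", "append"):
        raise ValueError(f"unknown save mode: {mode}")
    # "append" adds to existing data; every other mode reaching here writes a fresh file
    with open(path, "a" if mode == "append" else "w") as f:
        f.write(text)


def read_text(path):
    with open(path) as f:
        return f.read()


tmp_dir = tempfile.mkdtemp()
target = os.path.join(tmp_dir, "abc.txt")

save_text(target, "1,2,3\n")                   # first write creates the file
save_text(target, "4,5,6\n")                   # default "overwrite" replaces it, like pandas
after_overwrite = read_text(target)

save_text(target, "7,8,9\n", mode="append")    # adds to the existing content
after_append = read_text(target)

save_text(target, "ignored\n", mode="ignore")  # no-op because the file exists
after_ignore = read_text(target)
```

With "overwrite" as the default, calling the writer twice on the same path behaves like the pandas session; callers who want a failure on an existing target would pass mode="error" explicitly.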

codecov-io · 2020-01-22T03:21:54Z

Codecov Report

Merging #1209 into master will decrease coverage by 1.36%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1209      +/-   ##
==========================================
- Coverage   95.18%   93.82%   -1.37%     
==========================================
  Files          35       35              
  Lines        7201     7201              
==========================================
- Hits         6854     6756      -98     
- Misses        347      445      +98

| Impacted Files | Coverage Δ | |
|---|---|---|
| databricks/koalas/frame.py | `96.96% <100%> (ø)` | ⬆️ |
| databricks/koalas/usage_logging/__init__.py | `24.32% <0%> (-72.98%)` | ⬇️ |
| databricks/koalas/usage_logging/usage_logger.py | `50% <0%> (-50%)` | ⬇️ |
| databricks/koalas/__init__.py | `78.72% <0%> (-6.39%)` | ⬇️ |
| databricks/conftest.py | `94% <0%> (-4%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 535424b...7062bb8.

ueshin

I noticed another, related thing.

ueshin · 2020-01-22T04:27:14Z

databricks/koalas/frame.py

        self.to_spark().write.saveAsTable(name=name, format=format, mode=mode,
                                          partitionBy=partition_cols, **options)


Oh, should these lines be self.to_spark().write.mode(mode).saveAsTable(...)?

And likewise for the other to_xxx methods related to Spark I/O?

Eh, do you mean for consistency? saveAsTable in Spark already seems to accept mode.

I just reverted the changes.

nvm, I thought we should explicitly specify mode with write.mode(mode), but it's not needed.
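The point settled in this thread, that passing mode as a keyword to saveAsTable and chaining write.mode(mode) before it come to the same thing, can be illustrated with a toy stand-in for a chained writer. `ToyWriter` is hypothetical and only mimics the shape of a Spark-style writer; it assumes, as the thread concludes, that the mode keyword is forwarded to the same setter that the chained call uses.

```python
class ToyWriter:
    """Hypothetical stand-in for a chained writer; not PySpark's DataFrameWriter."""

    def __init__(self):
        self._mode = "errorifexists"  # Spark's default save mode
        self.saved = []               # records (table name, effective mode)

    def mode(self, save_mode):
        # None leaves the current mode untouched
        if save_mode is not None:
            self._mode = save_mode
        return self

    def saveAsTable(self, name, mode=None):
        # The keyword is routed through the same setter as the chained call
        self.mode(mode)
        self.saved.append((name, self._mode))


via_keyword = ToyWriter()
via_keyword.saveAsTable("t", mode="overwrite")

via_chain = ToyWriter()
via_chain.mode("overwrite").saveAsTable("t")
```

Under that assumption both writers record the same effective mode, which is why the explicit write.mode(mode) call was judged unnecessary.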

ueshin

LGTM.

ueshin · 2020-01-22T17:55:00Z

Thanks! merging.

ueshin reviewed Jan 22, 2020

Change default write mode to overwrite to be consistent with pandas

7062bb8

HyukjinKwon force-pushed the default-mode branch from c4952e4 to 7062bb8 on January 22, 2020 06:27

ueshin approved these changes Jan 22, 2020

ueshin merged commit 473bee8 into databricks:master Jan 22, 2020

HyukjinKwon deleted the default-mode branch September 11, 2020 07:52
