-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude Index columns for exposed Spark DataFrame and disallow Koalas DataFrame with no index #655
Conversation
f361f81
to
4ca989e
Compare
cf533d2
to
9740e70
Compare
9740e70
to
6f3c98e
Compare
kdf = DataFrame(pdf) | ||
return_schema = kdf._sdf.schema | ||
if len(pdf) <= limit: | ||
return kdf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly just a refactoring to disallow _InternalDataFrame(sdf=sdf, index_map=index_map)
.
… DataFrame with no index
dc131cc
to
ad958b8
Compare
@@ -1915,10 +1910,27 @@ def rename(index): | |||
index_name if index_name is not None else rename(index_name))) | |||
index_map.remove(info) | |||
|
|||
new_data_columns = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These codes are needed because previously it removed the index when reset_index
. Now it sets the default index.
Codecov Report
@@ Coverage Diff @@
## master #655 +/- ##
==========================================
+ Coverage 93.19% 93.43% +0.23%
==========================================
Files 32 32
Lines 5294 5314 +20
==========================================
+ Hits 4934 4965 +31
+ Misses 360 349 -11
Continue to review full report at Codecov.
|
Softagram Impact Report for pull/655 (head commit: c4bfbce)⭐ Change Overview
⭐ Details of Dependency Changes
📄 Full report
Give feedback on this report to [email protected] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thanks Takuya!! |
Why disallow the KDFs with no index? |
Biggest reason was: #633 introduced operations on different Koalas DataFrame which is based upon index. So, if no index is set, operations cannot be performed on different Koalas DataFrames (because this feature is based upon a join with index column). Namely, previously the cases below ks.range(10) + ks.range(10) # This was an actual usecase reported
spark_df = spark.read.redshift("...")
ks.DataFrame(spark_df) + ks.DataFrame(spark_df) were disallowed because it has no index ^. Now, it has a default index - #639. Secondly, it makes much easier implementing other APIs. Currently, many APIs such as cumulative sum, max, etc. don't work when index does not exist because the implementation is dependent on index column. Lastly, since Koalas DataFrame, form now on, always sets the default index (as pandas does), we can allow those cases now, which makes more sense and closer to pandas' https://github.com/databricks/koalas/pull/655/files#diff-ebfd51978d1aa348373b958a1b27aa47R936 |
cc @rxin, BTW, this is the PR we disabled no-index case. |
This PR is a followup of and proposes two things:
So, for instance,
to_spark()
now shows:and
index_map
is not expected to be empty in Koalas DataFrame, always. It sets the default index explicitly per #639