Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ExtensionDtypes as type arguments. #2106

Merged
merged 12 commits into from
Mar 25, 2021

Conversation

ueshin
Copy link
Collaborator

@ueshin ueshin commented Mar 18, 2021

Support ExtensionDtypes as type arguments by reusing NameTypeHolder for DataFrame's type annotation.
Also support to infer Spark DataType from the return type annotation with ExtensionDtypes.

Before:

>>> ks.Series[pd.Int32Dtype()]
Traceback (most recent call last):
...
TypeError: Parameters to generic types must be types. Got Int32Dtype().

After:

>>> ks.Series[pd.Int32Dtype()]
databricks.koalas.typedef.typehints.SeriesType[databricks.koalas.series.NameType]
>>> def a() -> ks.Series[pd.Int32Dtype()]:
...   pass
...
>>> infer_return_type(a)
SeriesType[IntegerType]

@codecov-io
Copy link

codecov-io commented Mar 18, 2021

Codecov Report

Merging #2106 (e682727) into master (f4408e3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2106   +/-   ##
=======================================
  Coverage   95.36%   95.37%           
=======================================
  Files          60       60           
  Lines       13549    13573   +24     
=======================================
+ Hits        12921    12945   +24     
  Misses        628      628           
Impacted Files Coverage Δ
databricks/koalas/frame.py 96.55% <100.00%> (+0.01%) ⬆️
databricks/koalas/series.py 96.92% <100.00%> (+0.05%) ⬆️
databricks/koalas/typedef/typehints.py 94.53% <100.00%> (-0.03%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4408e3...e682727. Read the comment docs.

@ueshin ueshin requested a review from HyukjinKwon March 18, 2021 21:01
@ueshin ueshin marked this pull request as ready for review March 18, 2021 21:01
self.tpe = types.StructType(
[
types.StructField(n if n is not None else ("c%s" % i), t)
types.StructField(name_like_string(n) if n is not None else ("c%s" % i), t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix!

new_params.append(new_class)
else:
new_params.append(param.type if isinstance(param, np.dtype) else param)
return Tuple[tuple(new_params)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused by the function name at the first sight. It might be helpful to add a docstring or simply annotate the return type.

@xinrong-meng
Copy link
Contributor

The PR looks great to me. Thanks!

@ueshin
Copy link
Collaborator Author

ueshin commented Mar 25, 2021

Thanks! Let me merge this now. Please feel free to leave comments if any. cc @HyukjinKwon @itholic

@ueshin ueshin merged commit 0d2cef0 into databricks:master Mar 25, 2021
@ueshin ueshin deleted the infer_return_type branch March 25, 2021 20:13
@@ -334,6 +334,12 @@


def _create_tuple_for_frame_type(params):
"""
This is a workaround to support variadic generic in DataFrame.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's clear, thanks!

@HyukjinKwon
Copy link
Member

👍

@itholic
Copy link
Contributor

itholic commented Mar 31, 2021

Looks all clear to me. Thanks for the work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants