-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
types use new for generics. no metaprogramming #4016
Conversation
python/src/iceberg/types.py
Outdated
|
||
_implemented = {} # type: ignore | ||
|
||
def __new__(cls): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if it is better to move this logic into a Singleton class and any class can extend it to be a singleton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think the benefits of that might be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible reuse later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably continue to use Singleton so that you don't have to call object.__new__
and can use super().__new__
instead. That seems safer to me for some reason.
python/src/iceberg/types.py
Outdated
if cls in cls._implemented: | ||
return cls._implemented[cls] | ||
else: | ||
ret = object.__new__(cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep a set here? Seems just _instance = None
should be OK, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a dictionary, but I put it there so that get
would work otherwise I have to first check for None
right?
python/src/iceberg/types.py
Outdated
doc: Optional[str] = None, | ||
): | ||
key = is_optional, field_id, name, field_type, doc | ||
cls._implemented[key] = cls._implemented.get(key, object.__new__(cls)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this line, will it always create a new object, object.__new__(cls)
and pass it to get method call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right. Unnecessarily inefficient. changed it now to use or
instead
This is looking good, but I just committed @jun-he's change that adds eq since it was simple and unblocks him. Can you rebase and merge in those changes? That has a Singleton class that can be used in a couple places. |
add missing docstrings; comments in __new__ icebergtype back to type; new on all classes; manual str, repr remove instantiating a new object each check in __new__ implemented to instances Type back to IcebergType add whitespace before/after Example in docstrings
I took a look at this test failure and it looks like it might be transient. I suspect rerunning the 3.8 test should succeed. That being said I noticed that |
|
||
def __new__(cls, precision: int, scale: int): | ||
key = precision, scale | ||
cls._instances[key] = cls._instances.get(key) or object.__new__(cls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need to pass precision and scale in for __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not in __new__
's semantics to deal with initialization. It could be done but I think that's bad practice and it's left to __init__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I understand. So Python is still going to call __init__
after this method on the object that this returns?
Is it a concern that __init__
is called every time this is returned?
python/src/iceberg/types.py
Outdated
|
||
class NestedField(IcebergType): | ||
"""equivalent of `NestedField` type from Java implementation""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc strings should use normal sentence case, so this should start with a capital letter.
Also, I don't think that we want to refer to the Java implementation. This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked.
python/src/iceberg/types.py
Outdated
|
||
class StructType(Type): | ||
def __init__(self, fields: list): | ||
def __new__(cls, fields: List[NestedField]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to change this to *fields
instead of List[NestedField]
? That would allow a bit more natural syntax:
s: StructType = StructType(NestedField(True, 1, "col", StringType()))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure is. looks much better this way. all set now
python/src/iceberg/types.py
Outdated
|
||
class NestedField(IcebergType): | ||
"""This represents a field of a struct, a map key, a map value, or a list element. This is where field IDs, names, docs, and nullability are tracked.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doc string should be wrapped at 130 characters, right?
python/src/iceberg/types.py
Outdated
|
||
Example: | ||
>>> MapType(key_id=1, key_type=StringType(), value_id=2, value_type=IntegerType(), value_is_optional=True) | ||
MapType(key=NestedField(is_optional=False, field_id=1, name='key', field_type=StringType(), doc=None), value=NestedField(is_optional=True, field_id=2, name='value', field_type=IntegerType(), doc=None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this also needs to be wrapped to the line length.
python/src/iceberg/types.py
Outdated
): | ||
super().__init__( | ||
f"list<{element_type}>", | ||
f"ListType(element_is_optional={element_is_optional}, element_id={element_id}, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: It would be nice if the argument order were consistent. This is slightly misleading because element_is_optional
is last if you're passing by position.
|
||
_instances: Dict[Tuple[bool, int, str, IcebergType, Optional[str]], "NestedField"] = {} | ||
|
||
def __new__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing this out in ipython, I missed a couple args and got this error message:
__new__() missing 2 required positional arguments: 'name' and 'field_type'
Is there a way to improve that? Or is that helpful enough?
Thanks, @CircArgs! Just a couple minor fixes to go and then we can get this in. The only semi-major thing is my question about |
Hey @rdblue the |
Thanks a lot @rdblue. I've pulled in your commits here. Sorry I wasn't quite on the same page with what you were thinking, makes sense with the flag in your changes |
This PR is in contrast to #3981 which sought to use a pattern similar to the python typings e.g.
List, Dict, Union, etc
to create types that could be used inside iceberg and also server as types that could be statically checked when used to type code.After discussions with @samredai and @rdblue I've revised it further so there is no real metaprogramming yet we still get much of the value.
The same syntax as the current code is used to create types:
IntegerType(), StructType( [ NestedField(True, 1, "required_field", StringType()), NestedField(False, 2, "optional_field", IntegerType()), ] )
yet we get
==
for free (no dedicated__eq__
methods) and can useisinstance
to check types instead ofissubclass
as was the case in #3981.Take this example:
This
types.py
is about 100 lines less code than the current one with greater functionality as describedThe centerpiece of this PR is simply the
__new__
method on the baseIcebergType
which checks the attribute_implemented: Dict[Tuple[str, Tuple[Any]], "IcebergType"]
which you can see keeps track ofIcebergType
instances by storing keys to the type's name and attributes (as defined it the init)Thank you to @samredai and @rdblue for helping to inspire this change.
Note: if this PR is accepted #3981 should be closed