diff --git a/rfcs/20200211-tf-types.md b/rfcs/20200211-tf-types.md new file mode 100644 index 000000000..ba027eeb2 --- /dev/null +++ b/rfcs/20200211-tf-types.md @@ -0,0 +1,343 @@ +# TensorFlow Canonical Type System + +| Status | Accepted | +:-------------- |:---------------------------------------------------- | +| **RFC #** | [208](https://github.com/tensorflow/community/pull/208) +| **Author(s)** | Dan Moldovan (mdan@google.com) | +| **Sponsor** | Gaurav Jain (gjn@google.com) | +| **Updated** | 2020-03-21 | + +## Objective + +This RFC proposes a new TensorFlow module and namespace (`tf.types`) dedicated to storing implementation-free type definitions, similar to Java interfaces or C++ forward declarations. This module has no other dependencies inside TensorFlow, so any other internal module can depend on it to ensure interoperability without the risk of creating circular dependencies. These definitions can also be used by external users, for example in pytype annotations. +The RFC focuses on the Python API, however the design should be reviewed with cross-language consistency in mind. + +## Motivation + +**Interoperability and composability**. A set of standard types that formalize an interface and decouples it from implementation ensures composability between components, especially when multiple implementations are involved. + +**Supports the [acyclic dependencies principle](https://en.wikipedia.org/wiki/Acyclic_dependencies_principle)**. In many instances, circular dependencies are caused between low-level complex components that need to compose (in this [example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/operators/control_flow.py#L361), AutoGraph needs to recognize datasets, and datasets need to use AutoGraph). Interface extraction is a common pattern for breaking such cycles. + +**Supports pytype**. A set of static types that is consistent under Python’s `isinstance`/`issubclass` is required to support [PEP-484 type annotations](https://www.python.org/dev/peps/pep-0484/) in TensorFlow. This module can serve as the basis for that. + +**Helps formalize requirements for new APIs**. Having a formal, implementation-independent definition for things such as tensors, variables, iterables, iterators makes it easy to document and test compatibility between APIs. + +## User Benefit + +Application developers may use these canonical definitions for pytype annotations. + +Library developers can more easily define their API interfaces by referring to this namespace. + +Developers of modules internal to TensorFlow can use this module to avoid creating circular dependencies. + +## Design Proposal + +### The `tf.types` Namespace / Module + +All the declarations exposed under the `tf.types` namespace reside in the `python/types/*.py` module. These are [abstract base classes](https://docs.python.org/3.7/library/abc.html) with a bare minimum of method definitions and minimal or no implementation, which serve to formalize and document the contract of common types such as `Tensor`, `Variable`, etc. + +These definitions may be used as PEP 484 type hints, although in some cases they may be type- or shape- erased (for example, `tf.types.Tensor` may not necessarily be parametrized by `dtype` or `shape`). Note however that designs which parametrize on shape do exist, see for instance [tensorflow#31579](https://github.com/tensorflow/tensorflow/issues/31579). + +The type definitions are consistent with standard Python [subtyping mechanics](https://docs.python.org/3.8/library/typing.html#nominal-vs-structural-subtyping) such as instance checks or protocols (in versions prior to Python 3.8, it is difficult to simultaneously support both). + +### General Principles + +This module should not contain any implementation code. An advantage of that is that users exploring the implementation of specific types will not need to inspect this module. However, users who do not wish to inspect the code may visit the documentation of these generic types to better understand specifically what are the concrete subclasses of this type expected to do. + +The `tf.types` module may depend on external packages (such as `numpy`) _strictly for the purpose of defining type annotations and documentation_. No dependencies to other TensorFlow interfaces are allowed. Any dependencies on external packages which themselves depend on TensorFlow are expressly forbidden. + +Changes to definitions inside `tf.types` must be approved by TensorFlow leads, and typically should be accompanied by an RFC. + +All type declarations are based on PEP-484 and related specifications, and defined using [typing](https://docs.python.org/3/library/typing.html), with the aim of being compatible with static type checkers like [pytype](https://github.com/google/pytype), [mypy](http://mypy-lang.org/), [pyre](https://pyre-check.org/). + +It is recommended that internal and external type annotations, `isinstance` and `issubclass` checks use these types, eventually deprecating helpers like `tf.is_tensor`. However, concrete types continue to exist - for example, variables are instances of `tf.Variable`, which is now a subclass of `tf.types.Variable`. + +Class type definitions define a minimum of abstract methods and properties which are required for pytype compatibility. + +### Custom `Tensor` types and `tf.register_tensor_conversion_function` + +Custom objects can be used in standard TF operations using [tf.register_tensor_conversion_function](https://www.tensorflow.org/api_docs/python/tf/register_tensor_conversion_function). This dependency injection mechanism allows implicit casting from existing types such as list and tuple without modifying the type definition of these objects. + +``` +>>> class MyClass: +... pass +>>> def conversion_func(value, dtype=None, name=None, as_ref=False): +... return tf.constant(1) +>>> tf.register_tensor_conversion_function(MyClass, conversion_func) +>>> obj = MyClass() +>>> tf.convert_to_tensor(obj) + +``` + +However, `register_tensor_conversion_function` is not compatible with static type checking. + +As a side note, types which that can be converted to a [NumPy array](https://docs.scipy.org/doc/numpy/user/basics.dispatch.html#basics-dispatch) can leverage that mechanism instead, because TensorFlow supports implicit conversion from `ndarray`: + +``` +>>> class MyClass: +... def __array__(self): +... return np.array(1) +>>> obj = MyClass() +>>> tf.convert_to_tensor(obj) + +``` + +For true custom `Tensor` objects, we propose a protocol approach similar to NumPy’s, as alternative to `register_tensor_conversion_function`: + +``` +>>> class MyClass: +... def __tf_tensor__(self): +... return tf.constant(1) +>>> obj = MyClass() +>>> tf.convert_to_tensor(obj) + +``` + +Note that the mechanism above can be made compatible with static type checks using [typing.Protocol](https://www.python.org/dev/peps/pep-0544/#defining-a-protocol): + +``` +>>> class SupportsTensor(Protocol): +... def __tf_tensor__(self): +... pass +>>> def f(x: SupportsTensor): +... pass +>>> obj = MyClass() +>>> f(obj) # passes static type checks +``` + +Ultimately, `TensorLike` is to become a union of the protocol type along with any other types supported through legacy mechanisms: + +``` +TensorLike = Union[List, Tuple, tf.Tensor, SupportsTensor, ...] +``` + +The `Protocol` type is only standardized in Python 3.8. Backports exist through [typing_extensions](https://github.com/python/typing/tree/master/typing_extensions), although they still don’t support Python 3.5. Therefore, typing annotations will only be supported in 3.6+, and complete support is only available in 3.8+. + +Note that `Protocol` subtypes require [@runtime_checkable](https://www.python.org/dev/peps/pep-0544/#runtime-checkable-decorator-and-narrowing-types-by-isinstance) in order to be compatible with `isinstance`. However, that degrades the performance of `isinstance` in a way similar to `abc.ABCMeta`. For that reason, TensorFlow internal logic is encouraged to use the the more direct `hasattr` test for structural type checks of this kind. + +Although this RFC proposes the deprecation of `register_tensor_conversion_function`, it does not set a timeline for removing it. It remains an open question whether interim support for type annotations should be added to `register_tensor_conversion_function`. + +### Support for `tf.function`'s `input_signature` + +Note: this section is non-normative, and only establishes direction for future work. + +The type system listed here can be expanded to allow input signatures using type annotations, see for instance [this thread](https://github.com/tensorflow/tensorflow/issues/31579). + +Presently, the [input_signature](https://www.tensorflow.org/api_docs/python/tf/function) optional mechanism uses [tf.TensorSpec](https://www.tensorflow.org/api_docs/python/tf/TensorSpec) to describe the function arguments: + +``` +>>> @function(input_signature=[TensorSpec([3], dtype=int32)]) +... def f(x): +... tf.print(x) +>>> f(constant([1, 2, 3])) +[1 2 3] +>>> f(constant([1, 2])) # Shape mismatch +ValueError: Python inputs incompatible with input_signature +>>> f(constant([1.0, 2.0, 3.0])) # DType mismatch +ValueError: Python inputs incompatible with input_signature +``` + +It is expected however that some or all of this information will be repeated by the function's type annotations. Type annotations may be generic, for example by only specifying a dtype: + +``` +>>> @function(input_signature=[TensorSpec([3], dtype=int32)]) +... def f(x: Tensor[int32]): +... ... +``` + +In such cases, `tf.function` is expected to verify that the type annotation matches the `input_signature`. + +In the long term, this RFC recommends that type annotations fully replace the `input_signature`, so far as the Python type annotation system allows it. This RFC does not prescribe a scheme for such type annotations; the implementation should make best use of the available Python capabilities and standards. + +An example of such annotations could be: + +``` +>>> class BatchSize(Dimension): +... value = 32 +>>> @function +... def f(x: Tensor[Int32, Shape2D[BatchSize, DynamicSize, DynamicSize]]): +... ... +``` + +Internally, such type annotations should still be represented as `tf.TypeSpec` objects, ensuring backward compatbility. + +### Initial Type Hierarchy + +TensorFlow generally adopts an incremental development method. This RFC aims to remain consistent with that. + +Below are listed the major types presently used in TensorFlow. All types included in this list are subject to [normal compatibility rules](https://www.tensorflow.org/guide/versions), so they are unlikely to change in the future. It is therefore preferable to maintain a strict minimum of orthogonal declarations and carefully vet any additions. + +Most of these symbols will not be initially exported as public symbols. Only internal submodules will be able to use unexported types. The unexported types may be gradually exposed under `tf.types` or under `tf.types.experimental`. + +The initial type hierarchy is focused on V2 symbols. We expect to encounter places where these symbols would not be compatible with V1 code; in such cases, the V1 symbols will not be affected. + +#### Types created by this RFC + +These types will be added with the initial creation of the `tf.types` namespace. + +* Core tensor types + + * `DType` + * `Shape` + * `Tensor` - generic dense tensor + * `TensorLike` - any type that can be implicitly converted to `Tensor` (see for example https://github.com/tensorflow/addons/blob/master/tensorflow_addons/utils/types.py) + +#### Potential types for subsequent implementation + +These types are raised for discussion by this RFC, but are not part of the original implementation, unless they are strictly required for consistency (to be determined during the initial submission). + +Many of these are expected to be required when breaking the cyclic dependencies that currently exist between submodules. However, it is hoped that opening them up for discussion early can help create a more coherent type system. + +* Core types + + * Tensor specializations + * `Symbol` - the regular graph tensor + * `Value` - eager tensors + * `Variable` + +* Container types + + * `Composite` - low-level static structure (opaque to GraphDef/IR) + * `Module` - builder for structures of `Variables` (invisible to GraphDef/IR) + * `Optional` - basic programming construct, currently prototyped in `tf.data.experimental.Optional`; unlike `typing.Optional`, it doesn't include `None` + * `List` - superclass for `TensorArray`, `Queue`, etc. (opaque to GraphDef/IR) + +* Higher-level types + * `Dataset` - ETL pipeline + * `Iterator` - basic stateful programming construct + * `Iterable` - basic stateless programming construct + * `Function` - basic programming construct + * `Error` - superclass of all TF-specific errors + + * Distributed types + * `DistributedDataset` - collective ETL + * `DistributedIterator` - collective iterator + + * Low-level execution primitives + * `Graph` - GraphDef/IR program + * `FunctionGraph` - IR of a single concrete function + +#### Adding new types + +This module may contain public symbols, exported using `@tf_export`, and private (unexported) symbols. Private symbols are reserved exclusively for internal submodules. Only public types are subject to the normal compatibility guarantees. + +Private types should only be added here if more than one submodule requires them. + +Public types represent established, well-known types that are critical to the TensorFlow API. They may only be added with API owners approval. In general, a type should be thoroughly scrutinized before being made public. Prefer to err on the side of keeping it private, when in doubt. Ideally, new public types should be introduced using the RFC process. Using `experimental` to pilot new types before a complete implementation is encouraged. + +A good candidate for a public `tf.types` definition meets the following criteria: + * has at least two concrete implementations, and at least one is part of the core TensorFlow API + * represents a well-established programming abstraction (e.g. list, map, object) + * does not overlap with existing definitions + * is consistent with existing definitions + * is compatible with all applicable core APIs + +#### Detailed notes + +##### Optional + +`tf.types.Optional` is the [nullable type](https://en.wikipedia.org/wiki/Nullable_type) in TensorFlow. + +Example graph code: + +``` + >>> ds = tf.data.Dataset.range(3) + >>> itr = iter(ds) + >>> opt_next = tf.data.experimental.get_next_as_optional(itr) + >>> tf.print(opt_next.has_value(), opt_next.get_value()) + 1 0 +``` + +It is not fully equivalent with `typing.Optional`, as TensorFlow has no explicit type or value for `None`. + +### Alternatives Considered + +* N/A + +### Performance Implications + +* There is a potential performance concern if using `abc` for the abstract base types, which are about an order of magnitude slower for `isinstance` checks. The cost of `isinstance` may be non-negligible for eager execution or scaling to large graphs. In such cases, we may want to avoid using `abc`. See https://github.com/tensorflow/community/pull/208#discussion_r382494902. + +### Dependencies + +* None, by definition. + +### Engineering Impact + +* Engineering impact: Separate interfaces allow for faster loading times by reducing coupling between modules. +* Maintenance: Minimal maintenance overhead since there is no functionality involved. The TensorFlow team and contributors will maintain the documentation up to date. Changes should be reviewed and approved by the TensorFlow team leads. + +### Platforms and Environments + +* Platforms: Python only, in the first stage. However, the type system should be aligned as much as possible with the core types in the TensorFlow runtime, and be language-independent as much as possible. +* Execution environments: The type system is independent of platform. This also implies that no platform-specific types (such as `TPUTensor`) exist. + +### Best Practices + +* This set of type definitions support the acyclic dependencies principle, by requiring that implementations avoid lateral dependencies (e.g. with a linter rule). + +### Tutorials and Examples + +* As the design matures, we plan to showcase libraries that leverage this pattern. +* Type annotations will be included in existing tutorials as definitions become final. + +### Compatibility + +* New minor version. Existing classes (`tf.Tensor`) will become subclasses of the new type interfaces. +* Most subcomponents of TF (Lite, distributed, function, SavedModel) will depend on this new module, although their functionality is not impacted. +* Libraries which depend on TensorFlow are encouraged to refer to `tf.types` definitions, rather than the concrete implementations for better future compatibility. + +### User Impact + +* Users will see a new `tf.types` module, that may be referenced from documentation and type annotations. + +## Design Review Notes + +Changes since initial PR: + * New section discusses overlap of type system with existing register tensor conversion function: + * TLDR: The python protocol mechanism is much more compatible with type annotations compared to register_tensor_conversion_function. + * Python Protocol works with type annotations. + * We should be able to deprecate `register_tensor_conversion_function`. + * Protocol support for type annotations introduced with typing in 3.8, backported to 3.x except 3.5, so type annotations will not work correctly in 3.5. + * New section discussing relationship with TensorSpec/TypeSpec/input signatures: + * TLDR: Will continue to have input_signature in conjunction with function type annotations. A complete design for annotation-based input_signature is future work and currently out of scope. + * We will need to verify that input_signarture agrees with type annotations when both are present. + * There is concern about having two ways of expressing signatures (type annotations and input_signature). + * There were many questions around the best way to represent known and unknown shapes, interaction with concrete functions, the concrete type of the result of tf.function, etc. A few points raised: + * `input_signature` cannot be recognized as type annotation. + * Supporting parameterized types with values as required for shapes is tricky. + * Dimension values can be known or not known in today’s implementation; `Literal` may be able specify dimension sizes, but that needs to be verified. + * Supporting meta functions where signature is an argument still requires the input signature mechanism (e.g. get_concrete_function). + * If we add this to `tf.function`, should the function take a signature argument that looks like the type annotation? Unclear whether that’s possible or makes sense. + * Will need to support both at the same time at least for a while. + * If both are present then they will likely need to be merged to the most specific common signature. + * There is a concern about whether to choose specific or generic shapes when the type annotation is incomplete. + * Union types will also need to instantiate multiple concrete functions (like `tf.function` does today). + * A proposal was to allow a function with types to be passed to tf.function as long as you don’t pass a `input_signature`. +This also applies to `tf.map_fn` and other APIs, not just `tf.function`. + * Another question was about using type annotations to parametrize structures of types, e.g. where output has the same structure but different types. This may be beyond the expressivity of the current type annotation system. + * New section containing guidance on adding new public generic types: + * TLDR: The RFC only adds a very small number of types, a complete type system is out of scope. + * There is some concern about the graph-vs-eager distinction when composing `Tensor` objects, e.g. with `CompositeTensor`. + * For now we’ll try to avoid exporting specific types for Graph vs. Eager until the `CompositeTensor` issue is resolved. Most likely we’ll need some type erasure. + * RFC proposes guidance for adding new types in the future. + * Other changes: + * Added mention on `Tensor`-like types + * List of lists can be converted to tensor but only if they are not ragged. The type annotation system will not enforce that. + * Clarified difference between `tf.Optional` and `typing.Optional`. + * Added clarifications that the spirit of RFC is to be compatible with Python standards and support all python static checkers, not just pytype. + * `isinstance` is not necessary the recommended mechanism, but the type system should be compatible with it and other Python mechanisms + * Discussion on open questions + * Having multiple type definitions (e.g. `tf.Tensor`, `tf.types.Tensor`) can be confusing to users: + * We want to try to expose only one type. + * The preference is to only expose the generic type, but in some cases the specialized one is already exported. In some cases, like `tf.Tensor` we can make the switch. + * In other cases, like `tf.Variable`, we can’t because `Variable` objects are instantiated directly. + * So we’ll punt on `tf.Variable` for now. + * Flat vs. hierarchical namespace: + * Does not need to be answered now, since the initial number of types is small. + * A good default is to mimic the existing namespace structure (e.g. for datasets use `tf.types.data.Dataset`) + * Should we include highly specialized types? e.g. `FuncGraph` classes for cond/while branches: + * They may be needed for Keras. + * In general, if an external library needs a type that’s an indication that it might be useful to export that type. + * `isinstance` support for `register_tensor_conversion_function` using e.g. ABC magics. + * Decision made to accelerate the deprecation of register_tensor_conversion_function instead.