From 7af3f0b67f091ed7108944cd0255cd57543d5995 Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Thu, 6 Feb 2020 10:15:10 -0800 Subject: [PATCH 01/21] Initial commit for RFC standalone Keras repository --- rfcs/20200205-standalone-keras-repository.md | 323 +++++++++++++++++++ 1 file changed, 323 insertions(+) create mode 100644 rfcs/20200205-standalone-keras-repository.md diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md new file mode 100644 index 000000000..4a3242ceb --- /dev/null +++ b/rfcs/20200205-standalone-keras-repository.md @@ -0,0 +1,323 @@ +# Standalone Keras Repository + +| Status | Proposed | +:-------------- |:---------------------------------------------------- | +| **RFC #** | [NNN](https://github.com/tensorflow/community/pull/NNN) (update when you have community PR #)| +| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) | +| **Sponsor** | Francois Chollet (fchollet@google.com), Karmel Allison (karmel@google.com) | +| **Updated** | 2020-02-05 | + +## Objective + +Split the Keras code from Tensorflow main Github repository to its own +repository, and use Tensorflow as a public dependency. + +Currently any contribution to Keras code will require building entire +Tensorflow, which is quite expensive to do with normal hardware and toolset +setting. Having a separate repository will allow Keras to just build its own +code without building Tensorflow. This should greatly improve the external +developer velocity when they contribute to Keras code. + +On the TensorFlow side, the split should increase maintainability and greater +modularity by ensuring that the TensorFlow high-level API can be built entirely +on top of public TensorFlow low-level APIs. + +## Motivation + +Building open source Tensorflow project end to end is an extensive exercise. +With a standard GCP instance, it might take more than 1 hour to finish the whole +build process, it might take longer with a mac laptop. Although the local build +cache might help speed up the follow up builds, the initial time cost is too +high for normal software development. Internally Google has a distributed build +and caching service, which we heavily rely on, and can finish all Keras tests +within 5 mins, sadly we can't expose this to external contributors. + +This lead to a few issues: + +* Discourage contribution since external developers can't test their change and +make sure it is correct. +* External developers send unverified PR and Google reviewers spend time back +and forth, fixing the PR. Sometimes PR is just not moving forward because of the +lengthy feedback loop. + +There are other side benefits if we split the repository. Keras, as the high +level API, probably should have similar access level as end user for low level +API. When splitting the repository, Keras will have to import Tensorflow and +rely on TF public APIs. If Keras ends up using many TF private functions, it +might be an indication of tight coupling of implementation details, or if some +certain functions is used extensively, we might want to consider exposing them +as public low level API. + +This design is also aligned with design for [Modular Tensorflow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), +which splits the Tensorflow into smaller components, and not tightly coupled +together. + +## User Benefit + +External contributors should experience much shorter turn-around time when +building/testing Keras since they don't need to build TF anymore. This should +have a positive effect for developer engagement. + +## Design Proposal + +### New location of the code + +Github: the code will live at [keras-team/keras](https://github.com/keras-team/keras), +joining the other Keras SIG projects and replacing the current external Keras +codebase. tf.Keras will also replace Keras on PyPI. + +Also considered: tensorflow/keras. + +| keras-team/keras | tensorflow/keras | +:------------------- |:------------------------------------------- | +|Under the umbrella of Keras SIG, which hosts all other Keras related projects like keras-application, KerasTuner etc.|Under the umbrella of tensorflow, which also hosts other TF related projects.| +|Lots of existing followers on keras-team, who may not be easily migrated to TF project.|No cross org repo management cost on Github. Could rely on a lot of existing setup in Tensorflow.| +|Can't easily delete keras project, which already have tons of stars and incoming reference links. Continued existence of external Keras code will create confusion ("why is there tensorflow/keras AND keras-team/keras?")|Issue/PR under the same org can be transferred easily, but not cross the different org. See here| + +### Source of Truth +Tensorflow uses Google internal code repository as the source of truth. Every PR +is converted to Google internal change first, submitted in internal system, and +then copied to Github as commits. At the same time, PR is marked as merged with +the corresponding commit hash. + +For Keras, since we are trying to promote user engagement, we hope it will use +Github as a source of truth. This will have the following implications: + +* We expect the majority of the code development/contribution from Github +and the devtools/tests/scripts should focus on the Github development use +case. See more details below. +* Keras CI/presubmit build for Github repo is targeting tf-nightly pip +package as dependency. This means any change to TF will take at most 24 +hours to be reflected to Keras. +* Code will be mirrored to Google internal code repository via Google internal +tools within a very short time window. The Google internal CI tests will run on +HEAD for both Keras and TF code. +* CI build for Github might break when it sees a new version of tf-nightly, if +certain behavior has been changed and didn't caught by unit tests. We have +observed a few similar cases on [tf/addons] +(https://github.com/tensorflow/addons). We hope this can be reduced by stronger +unit test coverage by Google internel sysmte, when both TF and Keras code are +tested on HEAD. +* PIP package management. Keras will now follow the tf-estimator approach. +"pip install tensorflow" should also install keras as well. There are more +details for the PIP package in the [Improved pip package structure] +(https://github.com/tensorflow/community/pull/182). + +### Dependency Cleanup +As the high level API, Keras should have the direct dependency to TF low level +API, but not the other way around. Unfortunately there is some existing reverse +dependency in the TF code that relies on Keras, which we should update/remove +when we split the repository. + +The current usage of Keras from Tensorflow are: +* Unit tests, which should be converted to integration tests, or port the tests +to Keras repository. +* feature_column. +* Legacy tf.layers in v1 API. +* legacy RNN cells. +* TPU support code for optimizer_v2. +* SavedModel. +* TF Lite. + +All the imports can be changed to use dynamic import like below: + +```python +try: + from tensorflow.python.keras.engine import base_layer +except ImportError: + tf.logging.error('keras is not installed, please pip install keras') + base_layer = None +``` + +### Use Keras to use public TF API symbol + +The current Keras code will still work if they still do: +```python +from tensorflow.python.ops import array_ops + +ones = array_ops.ones([2, 3]) +``` + +On the other hand, since Keras is a separate repository, having it only use TF +public API symbol will heavily reduce the chance of breakage because of relying +on private methods or implementation details. We think this is a item that is +critial to the health of the repository. This also allows TF to change without +breaking Keras. + +The converted code should look like: + +```python +import tensorflow as tf + +ones = tf.ones([2, 3]) +``` + +During this conversion, we might notice that certain functions used in Keras are +not public API. Decision should be made on a case by case base for whether: + +* Copy the function to Keras +* Replace the usage with other alternative public API +* Make the function a new TF public API. + +Note that this work can be contributed by the open source community. + +### Two stage change process + +For any change that is changing both Tensorflow and Keras, they will need to be +split into two, one as a PR to TF, and the other one as PR to Keras. Here is +some common scenario to split the change. + +1. Adding a new behavior to Tensorflow, and let Keras rely on it. Note that the +TF change needs to be submitted first, and keras PR needs to wait for the new TF +nightly PIP. Also note that the rollback of TF PR will cause Keras to break, the +rollback sequence should be PR 33333 and then PR 22222. The Google internal test +for TF should catch this. + +```python +# Existing scenario. +# PR 11111 (2 files updated) +# +++ tensorflow/python/ops/array_ops.py +def some_new_function(inputs): + ... + +# +++ tensorflow/python/keras/layers/core.py + +class new_layer(Layer): + + def call(inputs): + array_ops.some_new_function(inputs) + ... +``` + +```python +# New scenario. +# PR 22222 (1 file updated) +# +++ tensorflow/python/ops/array_ops.py +@tf.export('some_new_function') +def some_new_function(inputs): + ... + +================================== +# PR 33333 (1 file updated) +# +++ tensorflow/python/keras/layers/core.py + +class new_layer(Layer): + + def call(inputs): + tf.some_new_function(inputs) + ... +``` + +2. Changing an existing behavior of TF function signature. +Note that the PR 22222 needs to submit with both new and old function since +Google internal CI is still testing from HEAD. The existing_function can be +deleted after PR 33333 is submitted. Also note that this issue is caused by +Keras not using public TF API but the implementation details. Moving towards +public API should reduce the chance of this kind of change. + +```python +# Existing scenario. +# PR 11111 (2 files updated) +# tensorflow/python/ops/array_ops.py +<<< +def existing_function(inputs): + ... +>>> +def new_function(inputs, knob1=False, knob2=1): + ... +# tensorflow/python/keras/layers/core.py + +class existing_layer(Layer): + + def call(inputs): +<<< + array_ops.existing_function(inputs) +>>> + array_ops.new_function( + inputs, + knob1=True, + knob2=3) +``` + +```python +# New scenario. +# PR 22222 (1 file updated) +# tensorflow/python/ops/array_ops.py +<<< +def existing_function(inputs): + ... +>>> +def existing_function(inputs): + return new_function( + inputs, + knob1=False, + knob2=1) + +def new_function(inputs, knob1, knob2=1): + ... + +================================== +# PR 33333 (1 file updated) +# tensorflow/python/keras/layers/core.py +class existing_layer(Layer): + + def call(inputs): +<<< + array_ops.existing_function(inputs) + ... +>>> + array_ops.new_function( + inputs, + knob1=True, + knob2=3) +``` + + +### Performance Implications +Probably will have some performance implications on python if we change to use +public APIs. Need some benchmark to ensure the status quo. + +### Dependencies +Tensorflow PIP package will auto install keras package, which shouldn't cause +any difference on end user side. Under the hood, Keras will be a different +package imported by tf_core, same as we do for TF estimator. + +### Engineering Impact +* The build and test time locally should be greatly reduced, since compiling TF +is no longer needed, and also Keras is pure python which doesn't need any +complication. +* The cross boundary change will require some extra handling since the change +needs to be split into two or more. Same as the rollback. +* Tooling on Github side is not as good as the Google internal tool. This will +impact the engineer velocity for existing Keras team members. + +### Best Practices, Tutorials and Examples +* The new Keras repository should have a new contribution guide about how to +setup a local test environment and iterate based on that. Similar one in +tf/addons can be used as an example. +* The existing TF doc needs to be updated to advocate that Keras code now lives +in a different repository, and the new process of sending PR etc. +* When filing an issue, people might need to consider where to send the issue, +eg if it is a Keras issue or an issue caused by TF but surfaced by Keras. The +different ownership of the repository will also cause difficulties for +transferring the issue. + +### User Impact +* No end user facing change for current TF user, only affecting the developer, +eg in flight PR during the transition period. +* For current Keras PIP package users, they will get the new TF keras package +when they update their PIP, which should have more features than the current +Keras-team version. + +## Questions and Discussion Topics + +1. Tools for issue tracking: We can't rely on Google internal bug tracking tool +since it's not publicly visible, also if managing Github issues across the ong +is hard, we might need to find some alternatives for tracking bugs/features etc. +2. OSS test for TPU related code. Since the TPU is not available during local +test, The verification will have to be done when the PR is mirror to Google +internal. +3. Transition period for moving the Keras code from tensorflow/tensorflow to +keras-team/keras. All the inflight PR/issue will be affected, either they need +to be copied to keras-team/keras, and if they also touch tensorflow, then they +need to split into two. \ No newline at end of file From 0145eea24e3a7d37d411caae1aba11f074d95875 Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Thu, 6 Feb 2020 10:58:24 -0800 Subject: [PATCH 02/21] Update wording and nits. --- rfcs/20200205-standalone-keras-repository.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 4a3242ceb..92b50aa0c 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -129,7 +129,7 @@ except ImportError: base_layer = None ``` -### Use Keras to use public TF API symbol +### Update Keras to use public TF API symbol The current Keras code will still work if they still do: ```python @@ -139,10 +139,10 @@ ones = array_ops.ones([2, 3]) ``` On the other hand, since Keras is a separate repository, having it only use TF -public API symbol will heavily reduce the chance of breakage because of relying +public API symbol will heavily reduce the chance of breakage caused by relying on private methods or implementation details. We think this is a item that is -critial to the health of the repository. This also allows TF to change without -breaking Keras. +critial to the health of the project. This also allows TF to change internal +implementation without worrying breaking Keras. The converted code should look like: @@ -155,23 +155,23 @@ ones = tf.ones([2, 3]) During this conversion, we might notice that certain functions used in Keras are not public API. Decision should be made on a case by case base for whether: -* Copy the function to Keras -* Replace the usage with other alternative public API -* Make the function a new TF public API. +* Copy the functionality from TF to Keras. +* Replace the usage with other alternative TF public API. +* Make the functionality a new TF public API. Note that this work can be contributed by the open source community. ### Two stage change process For any change that is changing both Tensorflow and Keras, they will need to be -split into two, one as a PR to TF, and the other one as PR to Keras. Here is +split into two, one as a PR to TF, and the other PR to Keras. Here is some common scenario to split the change. 1. Adding a new behavior to Tensorflow, and let Keras rely on it. Note that the TF change needs to be submitted first, and keras PR needs to wait for the new TF nightly PIP. Also note that the rollback of TF PR will cause Keras to break, the rollback sequence should be PR 33333 and then PR 22222. The Google internal test -for TF should catch this. +for TF should catch the error if the rollback sequence is not correct. ```python # Existing scenario. From 0405d803912771e21eec0590cb70fb9d63fdc421 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Chollet?= Date: Thu, 6 Feb 2020 12:06:45 -0800 Subject: [PATCH 03/21] Update 20200205-standalone-keras-repository.md --- rfcs/20200205-standalone-keras-repository.md | 292 ++++++++++--------- 1 file changed, 160 insertions(+), 132 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 92b50aa0c..69eb2466e 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -4,122 +4,139 @@ :-------------- |:---------------------------------------------------- | | **RFC #** | [NNN](https://github.com/tensorflow/community/pull/NNN) (update when you have community PR #)| | **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) | -| **Sponsor** | Francois Chollet (fchollet@google.com), Karmel Allison (karmel@google.com) | -| **Updated** | 2020-02-05 | +| **Sponsor** | Karmel Allison (karmel@google.com) | +| **Updated** | 2020-02-05 | ## Objective -Split the Keras code from Tensorflow main Github repository to its own -repository, and use Tensorflow as a public dependency. +Move the Keras code from the TensorFlow main GitHub repository to its own +repository, with TensorFlow as a dependency. -Currently any contribution to Keras code will require building entire -Tensorflow, which is quite expensive to do with normal hardware and toolset -setting. Having a separate repository will allow Keras to just build its own -code without building Tensorflow. This should greatly improve the external -developer velocity when they contribute to Keras code. +## Motivation -On the TensorFlow side, the split should increase maintainability and greater -modularity by ensuring that the TensorFlow high-level API can be built entirely -on top of public TensorFlow low-level APIs. +### Build times -## Motivation +Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole +build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a +distributed build and caching service, which Googlers heavily rely on, +that can build TensorFlow and run all Keras tests within 5 mins. Sadly, +we can't expose this to external contributors. + +Currently, any contribution to Keras code will require building all of +TensorFlow, which is quite expensive to do for average users. +Having a separate repository will allow the Keras package to be built +without building TensorFlow. This should greatly improve the +velocity of open-source developers when they contribute to Keras code. -Building open source Tensorflow project end to end is an extensive exercise. -With a standard GCP instance, it might take more than 1 hour to finish the whole -build process, it might take longer with a mac laptop. Although the local build -cache might help speed up the follow up builds, the initial time cost is too -high for normal software development. Internally Google has a distributed build -and caching service, which we heavily rely on, and can finish all Keras tests -within 5 mins, sadly we can't expose this to external contributors. +### Community Benefit -This lead to a few issues: +The difficulty of building TensorFlow from scratch in order to make a PR +to Keras code has been a significant source of issues: -* Discourage contribution since external developers can't test their change and -make sure it is correct. -* External developers send unverified PR and Google reviewers spend time back +* It discouraged contributions, since many external developers couldn't test +their changes and make sure they were correct. +* External developers would send unverified PRs, and Google reviewers spend time back and forth, fixing the PR. Sometimes PR is just not moving forward because of the lengthy feedback loop. -There are other side benefits if we split the repository. Keras, as the high -level API, probably should have similar access level as end user for low level -API. When splitting the repository, Keras will have to import Tensorflow and -rely on TF public APIs. If Keras ends up using many TF private functions, it -might be an indication of tight coupling of implementation details, or if some -certain functions is used extensively, we might want to consider exposing them -as public low level API. +With the new standalone Keras repository, external contributors +should experience much shorter turn-around time when +building/testing Keras, since they don't need to build TensorFlow anymore. +This should have a positive impact on building a vibrant open-source +developer community. + +In addition, by getting the Keras team at Google to start developing Keras +using the same public tools and infrastructure as third-party developers, +we make the development process more transparent and more community-oriented. + +### TensorFlow API modularity -This design is also aligned with design for [Modular Tensorflow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), -which splits the Tensorflow into smaller components, and not tightly coupled -together. +There are other side-benefits if we split the repository. Currently, Keras +has to rely on a number of private TensorFlow APIs. However, a litmus test +of the quality of the public TensorFlow low-level APIs is that they should +be strictly sufficient to a higher-level API like Keras. +After splitting the repository, Keras will have to import TensorFlow and +rely exclusively on public APIs. If Keras still ends up using TensorFlow +private features, it might be an indication of tight coupling of +implementation details. If certain private features are extensively used, +we might want to consider exposing them as public low level API. -## User Benefit +This design is also aligned with thee design for +[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), +which splits the TensorFlow project into smaller components that are not +tightly coupled together. -External contributors should experience much shorter turn-around time when -building/testing Keras since they don't need to build TF anymore. This should -have a positive effect for developer engagement. ## Design Proposal ### New location of the code -Github: the code will live at [keras-team/keras](https://github.com/keras-team/keras), +GitHub: the code will live at [keras-team/keras](https://github.com/keras-team/keras), joining the other Keras SIG projects and replacing the current external Keras -codebase. tf.Keras will also replace Keras on PyPI. +codebase. `tf.Keras` will also replace Keras on PyPI. -Also considered: tensorflow/keras. +Also considered: `tensorflow/keras`. | keras-team/keras | tensorflow/keras | :------------------- |:------------------------------------------- | |Under the umbrella of Keras SIG, which hosts all other Keras related projects like keras-application, KerasTuner etc.|Under the umbrella of tensorflow, which also hosts other TF related projects.| -|Lots of existing followers on keras-team, who may not be easily migrated to TF project.|No cross org repo management cost on Github. Could rely on a lot of existing setup in Tensorflow.| +|Lots of existing followers on keras-team, who may not be easily migrated to TF project.|No cross org repo management cost on GitHub. Could rely on a lot of existing setup in TensorFlow.| |Can't easily delete keras project, which already have tons of stars and incoming reference links. Continued existence of external Keras code will create confusion ("why is there tensorflow/keras AND keras-team/keras?")|Issue/PR under the same org can be transferred easily, but not cross the different org. See here| ### Source of Truth -Tensorflow uses Google internal code repository as the source of truth. Every PR -is converted to Google internal change first, submitted in internal system, and -then copied to Github as commits. At the same time, PR is marked as merged with -the corresponding commit hash. - -For Keras, since we are trying to promote user engagement, we hope it will use -Github as a source of truth. This will have the following implications: - -* We expect the majority of the code development/contribution from Github -and the devtools/tests/scripts should focus on the Github development use -case. See more details below. -* Keras CI/presubmit build for Github repo is targeting tf-nightly pip + +TensorFlow uses a Google-internal code repository as its source of truth. Every PR +submitted though GitHub is converted to a Google-internal change first, +submitted through the internal system, and then copied to GitHub as commits. +At the same time, PR is marked as merged with the corresponding commit hash. + +Likewise, issue tracking and code review takes place through Google-internal tools. + +For Keras, since we are trying to promote community engagement, we hope to use +GitHub as source of truth. This will have the following implications: + +* We expect the majority of the code development/contribution from GitHub +and the dev tools / tests / scripts should focus on the GitHub development use +case. See below for more details. +* Keras CI/presubmit build for the GitHub repo should target the `tf-nightly` pip package as dependency. This means any change to TF will take at most 24 -hours to be reflected to Keras. -* Code will be mirrored to Google internal code repository via Google internal -tools within a very short time window. The Google internal CI tests will run on -HEAD for both Keras and TF code. -* CI build for Github might break when it sees a new version of tf-nightly, if -certain behavior has been changed and didn't caught by unit tests. We have -observed a few similar cases on [tf/addons] -(https://github.com/tensorflow/addons). We hope this can be reduced by stronger -unit test coverage by Google internel sysmte, when both TF and Keras code are -tested on HEAD. -* PIP package management. Keras will now follow the tf-estimator approach. -"pip install tensorflow" should also install keras as well. There are more -details for the PIP package in the [Improved pip package structure] -(https://github.com/tensorflow/community/pull/182). +hours to be reflected on the Keras side. +* The Keras code will be mirrored to a Google-internal code repository via Google-internal +tools within a very short time window after each change. +The Google-internal CI tests will run on HEAD for both Keras and TF code. +* The CI build for the repository on GitHub might break when it sees a +new version of `tf-nightly`, if certain behavior has been changed and wasn't +caught by unit tests. We have observed a few similar cases with +[tf/addons](https://github.com/tensorflow/addons). +We hope this can be reduced by stronger +unit test coverage by Google internel systems, when both TF and Keras code are +tested at HEAD. +* pip package management. Keras will now follow the `tf-estimator` approach. +"pip install tensorflow" should also install Keras (from PyPI) as well. +There are more details for the pip package in the +[Improved pip package structure](https://github.com/tensorflow/community/pull/182) RFC. ### Dependency Cleanup -As the high level API, Keras should have the direct dependency to TF low level -API, but not the other way around. Unfortunately there is some existing reverse -dependency in the TF code that relies on Keras, which we should update/remove + +As the high-level API of TensorFlow, Keras should have a direct dependency on +TF low-level APIs, but not the other way around. Unfortunately, there is some existing reverse +logic in the TF code that relies on Keras, which we should update/remove when we split the repository. -The current usage of Keras from Tensorflow are: +The current usage of Keras from TensorFlow are: * Unit tests, which should be converted to integration tests, or port the tests to Keras repository. -* feature_column. -* Legacy tf.layers in v1 API. +* `feature_column`. +* Legacy `tf.layers` in v1 API. * legacy RNN cells. -* TPU support code for optimizer_v2. +* TPU support code for `optimizer_v2`. * SavedModel. * TF Lite. -All the imports can be changed to use dynamic import like below: +All Keras imports in integration tests can be changed to use dynamic import like below: ```python try: @@ -129,22 +146,22 @@ except ImportError: base_layer = None ``` -### Update Keras to use public TF API symbol +### Update Keras to only use public TF APIs -The current Keras code will still work if they still do: +The current Keras code will still work if we do e.g.: ```python from tensorflow.python.ops import array_ops ones = array_ops.ones([2, 3]) ``` -On the other hand, since Keras is a separate repository, having it only use TF -public API symbol will heavily reduce the chance of breakage caused by relying -on private methods or implementation details. We think this is a item that is +However, since Keras is a separate repository, having it only use TF +public APIs will heavily reduce the chance of breakage caused by relying +on private methods or implementation details. We think this point is critial to the health of the project. This also allows TF to change internal -implementation without worrying breaking Keras. +implementation details without worrying about breaking Keras. -The converted code should look like: +The converted code should look like e.g.: ```python import tensorflow as tf @@ -152,26 +169,29 @@ import tensorflow as tf ones = tf.ones([2, 3]) ``` -During this conversion, we might notice that certain functions used in Keras are -not public API. Decision should be made on a case by case base for whether: +During this conversion, we might notice that certain TF features used in Keras are +not public. A decision should be made on a case-by-case basis: * Copy the functionality from TF to Keras. -* Replace the usage with other alternative TF public API. +* Replace the usage with another alternative TF public API. * Make the functionality a new TF public API. -Note that this work can be contributed by the open source community. +**Note that the open-source community is encouraged to contribute to this effort.** + +### Two-stage change process -### Two stage change process +For any change that is affecting both TensorFlow and Keras, the change +will need to be split into two, one as a PR to the TF repo, +and the other as a PR to the Keras repo. Here are some common scenarios: -For any change that is changing both Tensorflow and Keras, they will need to be -split into two, one as a PR to TF, and the other PR to Keras. Here is -some common scenario to split the change. +1. Adding a new feature to TensorFlow, and having Keras rely on it. Note that the +TF change needs to be submitted first, and the Keras PR needs to wait for the new TF +nightly to become available on PyPI. -1. Adding a new behavior to Tensorflow, and let Keras rely on it. Note that the -TF change needs to be submitted first, and keras PR needs to wait for the new TF -nightly PIP. Also note that the rollback of TF PR will cause Keras to break, the -rollback sequence should be PR 33333 and then PR 22222. The Google internal test -for TF should catch the error if the rollback sequence is not correct. +Also note that any rollback of the TF PR will cause Keras to break, the +rollback sequence should be PR 33333 and then PR 22222 (see example below). +The Google-internal test for TF should catch the error if the rollback sequence +is not correct. ```python # Existing scenario. @@ -208,12 +228,14 @@ class new_layer(Layer): ... ``` -2. Changing an existing behavior of TF function signature. -Note that the PR 22222 needs to submit with both new and old function since -Google internal CI is still testing from HEAD. The existing_function can be +2. Changing the behavior of an existing TF API. + +Note that the PR 22222 needs to be submitted with both the new and old +function since Google internal CI is still testing from HEAD. +The previous function can be deleted after PR 33333 is submitted. Also note that this issue is caused by -Keras not using public TF API but the implementation details. Moving towards -public API should reduce the chance of this kind of change. +Keras not using exclusively public TF API, but relying on TF implementation details. +Moving towards only using public APIs should reduce the likelihood of this kind of issue. ```python # Existing scenario. @@ -274,50 +296,56 @@ class existing_layer(Layer): ### Performance Implications -Probably will have some performance implications on python if we change to use -public APIs. Need some benchmark to ensure the status quo. + +There may be some performance implications as we move towards only using +public TF APIs. We need to maintain a benchmark to ensure that there +is no performance regression. ### Dependencies -Tensorflow PIP package will auto install keras package, which shouldn't cause -any difference on end user side. Under the hood, Keras will be a different -package imported by tf_core, same as we do for TF estimator. - -### Engineering Impact -* The build and test time locally should be greatly reduced, since compiling TF -is no longer needed, and also Keras is pure python which doesn't need any -complication. -* The cross boundary change will require some extra handling since the change -needs to be split into two or more. Same as the rollback. -* Tooling on Github side is not as good as the Google internal tool. This will -impact the engineer velocity for existing Keras team members. + +The TensorFlow pip package will auto-install the Keras package, which shouldn't make +any difference on the end-user side. Under the hood, Keras will be a different +package imported by `tf_core`, like what we do for TF estimator. + +### Developer experience Impact + +* The local build and test times should be greatly reduced, since compiling TF +is no longer needed, and Keras is a pure-Python project. +* Cross-boundary changes will require some extra handling since such changes +needs to be split into two or more PRs. Same for rollbacks. +* Tooling on the GitHub side (for code review, etc.) is not as good as +Google-internal tools. This may impact the develoment velocity for +Keras team members at Google. ### Best Practices, Tutorials and Examples + * The new Keras repository should have a new contribution guide about how to -setup a local test environment and iterate based on that. Similar one in +setup a local test environment and iterate based on that. A similar one in tf/addons can be used as an example. -* The existing TF doc needs to be updated to advocate that Keras code now lives -in a different repository, and the new process of sending PR etc. +* The existing TF docs needs to be updated to highlight that Keras code now lives +in a different repository, with a new process for sending PRs, etc. * When filing an issue, people might need to consider where to send the issue, -eg if it is a Keras issue or an issue caused by TF but surfaced by Keras. The +e.g. is it a Keras issue or an issue caused by TF but surfaced by Keras. The different ownership of the repository will also cause difficulties for transferring the issue. ### User Impact -* No end user facing change for current TF user, only affecting the developer, -eg in flight PR during the transition period. -* For current Keras PIP package users, they will get the new TF keras package -when they update their PIP, which should have more features than the current + +* No end-user facing change for current TF users; the split would only affect +developers, e.g. in-flight PRs during the transition period. +* For current Keras pip package users, they will get the new TF keras package +when they update their pip, which should have more features than the current Keras-team version. ## Questions and Discussion Topics -1. Tools for issue tracking: We can't rely on Google internal bug tracking tool -since it's not publicly visible, also if managing Github issues across the ong +1. Tools for issue tracking: we can't rely on Google-internal bug tracking tool +since it's not publicly visible, also if managing GitHub issues across the orgs is hard, we might need to find some alternatives for tracking bugs/features etc. -2. OSS test for TPU related code. Since the TPU is not available during local -test, The verification will have to be done when the PR is mirror to Google -internal. -3. Transition period for moving the Keras code from tensorflow/tensorflow to -keras-team/keras. All the inflight PR/issue will be affected, either they need -to be copied to keras-team/keras, and if they also touch tensorflow, then they -need to split into two. \ No newline at end of file +2. OSS tests for TPU-related code. Since TPUs are not available during local +testing, the verification will have to be done when the PR is mirrored to Google's +internal systems. +3. Transition period for moving the Keras code from `tensorflow/tensorflow` to +`keras-team/keras`. All in-flight PRs / issues will be affected: they need +to be copied to `keras-team/keras`, or if they also touch TensorFlow, then they +need to split into two. From 3cce978931330e9b01347d4f970c66e670fd148c Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Thu, 6 Feb 2020 12:20:45 -0800 Subject: [PATCH 04/21] Update the RFC nubmer --- rfcs/20200205-standalone-keras-repository.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 69eb2466e..1a8765efa 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -2,7 +2,7 @@ | Status | Proposed | :-------------- |:---------------------------------------------------- | -| **RFC #** | [NNN](https://github.com/tensorflow/community/pull/NNN) (update when you have community PR #)| +| **RFC #** | [202](https://github.com/tensorflow/community/pull/202) | | **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) | | **Sponsor** | Karmel Allison (karmel@google.com) | | **Updated** | 2020-02-05 | From 3f0dc9276eb4f287120aa3ac60ee37fd605098d6 Mon Sep 17 00:00:00 2001 From: Qianli Scott Zhu Date: Mon, 10 Feb 2020 10:50:17 -0800 Subject: [PATCH 05/21] Update rfcs/20200205-standalone-keras-repository.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Frédéric Branchaud-Charron --- rfcs/20200205-standalone-keras-repository.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 1a8765efa..e4da10510 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -64,7 +64,7 @@ private features, it might be an indication of tight coupling of implementation details. If certain private features are extensively used, we might want to consider exposing them as public low level API. -This design is also aligned with thee design for +This design is also aligned with the design for [Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), which splits the TensorFlow project into smaller components that are not tightly coupled together. From 0109f0f81f9920f6e11c608d2e89e63a95f184e1 Mon Sep 17 00:00:00 2001 From: Qianli Scott Zhu Date: Mon, 10 Feb 2020 10:59:51 -0800 Subject: [PATCH 06/21] Update rfcs/20200205-standalone-keras-repository.md Co-Authored-By: Gabriel de Marmiesse --- rfcs/20200205-standalone-keras-repository.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index e4da10510..da332e1ce 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -84,7 +84,7 @@ Also considered: `tensorflow/keras`. :------------------- |:------------------------------------------- | |Under the umbrella of Keras SIG, which hosts all other Keras related projects like keras-application, KerasTuner etc.|Under the umbrella of tensorflow, which also hosts other TF related projects.| |Lots of existing followers on keras-team, who may not be easily migrated to TF project.|No cross org repo management cost on GitHub. Could rely on a lot of existing setup in TensorFlow.| -|Can't easily delete keras project, which already have tons of stars and incoming reference links. Continued existence of external Keras code will create confusion ("why is there tensorflow/keras AND keras-team/keras?")|Issue/PR under the same org can be transferred easily, but not cross the different org. See here| +|Can't easily delete keras project, which already have tons of stars and incoming reference links. Continued existence of external Keras code will create confusion ("why is there tensorflow/keras AND keras-team/keras?")|Issue/PR under the same org can be transferred easily, but not cross the different org. See [here](https://help.github.com/en/github/managing-your-work-on-github/transferring-an-issue-to-another-repository)| ### Source of Truth From b654b44ea14f07090d841efc37690790b92bed5c Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Mon, 10 Feb 2020 11:22:15 -0800 Subject: [PATCH 07/21] Update tf to Keras usage with more details. --- rfcs/20200205-standalone-keras-repository.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index da332e1ce..5df82e840 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -129,12 +129,12 @@ when we split the repository. The current usage of Keras from TensorFlow are: * Unit tests, which should be converted to integration tests, or port the tests to Keras repository. -* `feature_column`. -* Legacy `tf.layers` in v1 API. -* legacy RNN cells. +* `feature_column`, which uses Keras base layer and model. +* Legacy `tf.layers` in v1 API, which uses Keras base layer as base class. +* legacy RNN cells, which uses Keras serialization and deserialization. * TPU support code for `optimizer_v2`. -* SavedModel. -* TF Lite. +* TF Lite for keras model saving utils. +* Aliases from tf.losses/metrics/initializers/optimizers in tf.compat.v1. All Keras imports in integration tests can be changed to use dynamic import like below: From 7c57540a428d0b3c3e1fa0ac727580545396175f Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Mon, 10 Feb 2020 11:29:21 -0800 Subject: [PATCH 08/21] Change to use the list view for comparison. Make it easy for comment to point to the exact place. --- rfcs/20200205-standalone-keras-repository.md | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 5df82e840..50fd90513 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -80,11 +80,19 @@ codebase. `tf.Keras` will also replace Keras on PyPI. Also considered: `tensorflow/keras`. -| keras-team/keras | tensorflow/keras | -:------------------- |:------------------------------------------- | -|Under the umbrella of Keras SIG, which hosts all other Keras related projects like keras-application, KerasTuner etc.|Under the umbrella of tensorflow, which also hosts other TF related projects.| -|Lots of existing followers on keras-team, who may not be easily migrated to TF project.|No cross org repo management cost on GitHub. Could rely on a lot of existing setup in TensorFlow.| -|Can't easily delete keras project, which already have tons of stars and incoming reference links. Continued existence of external Keras code will create confusion ("why is there tensorflow/keras AND keras-team/keras?")|Issue/PR under the same org can be transferred easily, but not cross the different org. See [here](https://help.github.com/en/github/managing-your-work-on-github/transferring-an-issue-to-another-repository)| +Pros: +1. Under the umbrella of Keras SIG, which hosts all other Keras related projects +like keras-application, KerasTuner etc. +1. Lots of existing followers on keras-team, who may not be easily migrated to +TF project. +1. Can't easily delete keras project, which already have tons of stars and +incoming reference links. Continued existence of external Keras code will create +confusion ("why is there tensorflow/keras AND keras-team/keras?"). + +Cons: +1. The repo isn't under the same organization as tensorflow, which makes it hard +to manage issues/PRs and references across the organization. +1. Existing issue/PR under the same org can be transferred easily, but not cross the different org. See [here](https://help.github.com/en/github/managing-your-work-on-github/transferring-an-issue-to-another-repository). ### Source of Truth From 86d2c3737ba5f285e486494196a3de17383148a7 Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Mon, 10 Feb 2020 11:37:08 -0800 Subject: [PATCH 09/21] Update the section for PIP package dependency. Intake the suggestion from @gabrieldemarmiesse based on the tf-addons. --- rfcs/20200205-standalone-keras-repository.md | 24 +++++++++++--------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 50fd90513..d6fee7f3d 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -109,19 +109,21 @@ GitHub as source of truth. This will have the following implications: * We expect the majority of the code development/contribution from GitHub and the dev tools / tests / scripts should focus on the GitHub development use case. See below for more details. -* Keras CI/presubmit build for the GitHub repo should target the `tf-nightly` pip -package as dependency. This means any change to TF will take at most 24 -hours to be reflected on the Keras side. -* The Keras code will be mirrored to a Google-internal code repository via Google-internal -tools within a very short time window after each change. +* Keras CI/presubmit build for the GitHub repo should target a stable PIP +version of tensorflow package as dependency. It could either be a `tf-nightly` +with explicit version, or a release candidate version, or a stable version. +Depend on a floating `tf-nightly` could cause CI build to be instable, which has +been observed in other repository +[like tf-addons](https://github.com/tensorflow/addons/pull/912). +* The Keras code will be mirrored to a Google-internal code repository via +Google-internal tools within a very short time window after each change. The Google-internal CI tests will run on HEAD for both Keras and TF code. -* The CI build for the repository on GitHub might break when it sees a -new version of `tf-nightly`, if certain behavior has been changed and wasn't -caught by unit tests. We have observed a few similar cases with +* The CI build for the repository on GitHub might break when it points to a +new version of `tf-nightly`, if certain behavior has been changed and wasn't +caught by unit tests. We have observed a few similar cases with [tf/addons](https://github.com/tensorflow/addons). -We hope this can be reduced by stronger -unit test coverage by Google internel systems, when both TF and Keras code are -tested at HEAD. +We hope this can be reduced by stronger unit test coverage by Google internel +systems, when both TF and Keras code are tested at HEAD. * pip package management. Keras will now follow the `tf-estimator` approach. "pip install tensorflow" should also install Keras (from PyPI) as well. There are more details for the pip package in the From 1504e525f8819adc4e81259bffed69e97b17a187 Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Mon, 10 Feb 2020 13:08:20 -0800 Subject: [PATCH 10/21] Update the preferred PIP package version. --- rfcs/20200205-standalone-keras-repository.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index d6fee7f3d..afe6d3607 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -110,8 +110,13 @@ GitHub as source of truth. This will have the following implications: and the dev tools / tests / scripts should focus on the GitHub development use case. See below for more details. * Keras CI/presubmit build for the GitHub repo should target a stable PIP -version of tensorflow package as dependency. It could either be a `tf-nightly` -with explicit version, or a release candidate version, or a stable version. +version of tensorflow package as dependency. It could either be (preferably in +this order): + * a stable version + * a release candidate version + * a `tf-nightly` with explicit version. +Using a nightly version for testing should be motivated by the usage of a API +feature not present in the stable or pre-release version. Depend on a floating `tf-nightly` could cause CI build to be instable, which has been observed in other repository [like tf-addons](https://github.com/tensorflow/addons/pull/912). From f4af07b21f59f9a163287b2fd2b2d6772f9e7349 Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Mon, 10 Feb 2020 13:30:17 -0800 Subject: [PATCH 11/21] Update more details about the Github migration. --- rfcs/20200205-standalone-keras-repository.md | 28 ++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index afe6d3607..5722323bb 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -309,6 +309,34 @@ class existing_layer(Layer): knob2=3) ``` +### Github Repository Migration + +* For any open Github PR/issue in Keras-team/keras, it need to be copied to +Tensorflow if the content is still relevant in Tensorflow. Otherwise it will +be closed as obsolete. We intend to have a clean keras-team/keras repository +before we copy any issue or PR from TF side. +* For any opening PR in Tensorflow for Keras, team will try to merge them as +much as possible before the migration. For any open PR that hasn't been merged, +we will check if it is still relevant/active, and will be copied to +keras-team/keras. +* The permission of keras-team/keras need to be updated as the codebase is new. +The access level for the repository need to be reestablished. +From least access to most access, the permission levels for an organization repository are: + + * Read: Recommended for non-code contributors who want to view or discuss the project. + * Triage: Recommended for contributors who need to proactively manage issues + and pull requests without write access. + * Write: Recommended for contributors who actively push to your project. + * Maintain: Recommended for project managers who need to manage the repository without access to sensitive or destructive actions. + * Admin: Recommended for people who need full access to the project, including sensitive and destructive actions like managing security or deleting a + repository. + +Any existing Keras-team active member should get `Triage` level for now, and +more permission will be granted once we identified active contributers. In the +meantime, Keras team in Google will manage the repository initially, and will +share more permissions with the community member. + +See more details about the project permission in https://help.github.com/en/github/setting-up-and-managing-organizations-and-teams/repository-permission-levels-for-an-organization. ### Performance Implications From 6872453f36b5025804d6f426543189d5c0913bdf Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Tue, 18 Feb 2020 10:04:20 -0800 Subject: [PATCH 12/21] Update wording wrt to pure python project. --- rfcs/20200205-standalone-keras-repository.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 5722323bb..dac3d8bbe 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -353,7 +353,8 @@ package imported by `tf_core`, like what we do for TF estimator. ### Developer experience Impact * The local build and test times should be greatly reduced, since compiling TF -is no longer needed, and Keras is a pure-Python project. +is no longer needed, and Keras is so far a pure-Python project (this might +change in future when custom c ops are added to Keras). * Cross-boundary changes will require some extra handling since such changes needs to be split into two or more PRs. Same for rollbacks. * Tooling on the GitHub side (for code review, etc.) is not as good as From 7144521f783d8dc12b85f2f949e31978766c804e Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Tue, 18 Feb 2020 11:20:48 -0800 Subject: [PATCH 13/21] Update the sectoin for TF to keras dependency. --- rfcs/20200205-standalone-keras-repository.md | 29 ++++++++++++++------ 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index dac3d8bbe..984d4ecb8 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -137,9 +137,9 @@ There are more details for the pip package in the ### Dependency Cleanup As the high-level API of TensorFlow, Keras should have a direct dependency on -TF low-level APIs, but not the other way around. Unfortunately, there is some existing reverse -logic in the TF code that relies on Keras, which we should update/remove -when we split the repository. +TF low-level APIs, but not the other way around. Unfortunately, there is some +existing reverse logic in the TF code that relies on Keras, which we should +update/remove when we split the repository. The current usage of Keras from TensorFlow are: * Unit tests, which should be converted to integration tests, or port the tests @@ -150,17 +150,28 @@ to Keras repository. * TPU support code for `optimizer_v2`. * TF Lite for keras model saving utils. * Aliases from tf.losses/metrics/initializers/optimizers in tf.compat.v1. +* Symoblic/eager tensor logic for ops that is tightly coupled with Keras due to + current implementation of functional API and TF Ops layers. -All Keras imports in integration tests can be changed to use dynamic import like below: +For usage like tf.layers to keras.layers, it can't be removed due to the API +contract and guarantee. We should use LasyLoader to walk around the cyclic +dependency issue. ```python -try: - from tensorflow.python.keras.engine import base_layer -except ImportError: - tf.logging.error('keras is not installed, please pip install keras') - base_layer = None +from tensorflow.python.util.lasy_loader import LasyLoader + +BaseLayer = LasyLoader( + 'BaseLayer', globals(), 'keras.layers.Layer') +if not BaseLayer: + raise ImportError('Keras is not installed, please pip install keras.') ``` +Other dependency should be removed as much as possible, eg move the util/code +from Keras to TF, or rework the implementation detail. + +**Note that this is a key point to prevent Keras accidentally break Tensorflow.** + + ### Update Keras to only use public TF APIs The current Keras code will still work if we do e.g.: From ecbc169ee43f60a745bcb3152455f74cd5a171da Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Tue, 25 Feb 2020 11:06:18 -0800 Subject: [PATCH 14/21] Address the comment from @apassos 1. Udpate the motivation to emphasize the modularity of TF. 2. Update the section for reverse dependency from TF to keras. --- rfcs/20200205-standalone-keras-repository.md | 77 ++++++++++---------- 1 file changed, 40 insertions(+), 37 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 984d4ecb8..b021c7a10 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -14,19 +14,35 @@ repository, with TensorFlow as a dependency. ## Motivation +### TensorFlow API modularity + +Currently, Keras has to rely on a number of private TensorFlow APIs. However, a +litmus test of the quality of the public TensorFlow low-level APIs is that they +should be strictly sufficient to a higher-level API like Keras. +After splitting the repository, Keras will have to import TensorFlow and +rely exclusively on public APIs. If Keras still ends up using TensorFlow +private features, it might be an indication of tight coupling of +implementation details. If certain private features are extensively used, +we might want to consider exposing them as public low level API. + +This design is also aligned with the design for +[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), +which splits the TensorFlow project into smaller components that are not +tightly coupled together. + ### Build times Building the open-source TensorFlow project end-to-end is an extensive exercise. -With a standard GCP instance, it might take more than one hour to finish the whole -build process (it might take longer with a Mac laptop). Although the local build -cache might help speed up the follow-up builds, the initial time cost is too -high for regular software development workflows. Internally, Google has a +With a standard GCP instance, it might take more than one hour to finish the +whole build process (it might take longer with a Mac laptop). Although the local +build cache might help speed up the follow-up builds, the initial time cost is +too high for regular software development workflows. Internally, Google has a distributed build and caching service, which Googlers heavily rely on, that can build TensorFlow and run all Keras tests within 5 mins. Sadly, we can't expose this to external contributors. Currently, any contribution to Keras code will require building all of -TensorFlow, which is quite expensive to do for average users. +TensorFlow c++ binary, which is quite expensive to do for average users. Having a separate repository will allow the Keras package to be built without building TensorFlow. This should greatly improve the velocity of open-source developers when they contribute to Keras code. @@ -38,37 +54,20 @@ to Keras code has been a significant source of issues: * It discouraged contributions, since many external developers couldn't test their changes and make sure they were correct. -* External developers would send unverified PRs, and Google reviewers spend time back -and forth, fixing the PR. Sometimes PR is just not moving forward because of the -lengthy feedback loop. - -With the new standalone Keras repository, external contributors -should experience much shorter turn-around time when -building/testing Keras, since they don't need to build TensorFlow anymore. -This should have a positive impact on building a vibrant open-source +* External developers would send unverified PRs, and Google reviewers spend time +back and forth, fixing the PR. Sometimes PR is just not moving forward because +of the lengthy feedback loop. + +With the new standalone Keras repository, external contributors should +experience much shorter turn-around time when building/testing Keras, since they +don't need to build TensorFlow anymore. +This should have a positive impact on building a vibrant open-source developer community. In addition, by getting the Keras team at Google to start developing Keras using the same public tools and infrastructure as third-party developers, we make the development process more transparent and more community-oriented. -### TensorFlow API modularity - -There are other side-benefits if we split the repository. Currently, Keras -has to rely on a number of private TensorFlow APIs. However, a litmus test -of the quality of the public TensorFlow low-level APIs is that they should -be strictly sufficient to a higher-level API like Keras. -After splitting the repository, Keras will have to import TensorFlow and -rely exclusively on public APIs. If Keras still ends up using TensorFlow -private features, it might be an indication of tight coupling of -implementation details. If certain private features are extensively used, -we might want to consider exposing them as public low level API. - -This design is also aligned with the design for -[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), -which splits the TensorFlow project into smaller components that are not -tightly coupled together. - ## Design Proposal @@ -141,17 +140,18 @@ TF low-level APIs, but not the other way around. Unfortunately, there is some existing reverse logic in the TF code that relies on Keras, which we should update/remove when we split the repository. -The current usage of Keras from TensorFlow are: -* Unit tests, which should be converted to integration tests, or port the tests -to Keras repository. +So far there are about 120 usages for Keras within Tensorflow, the current usage +are: +* Unit tests, which relies on Keras to verify certain behavior of TF, like +distribution strategy, tf.function, and eager context. They should either be +converted to integration tests, or port the tests to Keras repository. * `feature_column`, which uses Keras base layer and model. * Legacy `tf.layers` in v1 API, which uses Keras base layer as base class. * legacy RNN cells, which uses Keras serialization and deserialization. -* TPU support code for `optimizer_v2`. +* TPU support code does a isinstance() check for `optimizer_v2`. * TF Lite for keras model saving utils. * Aliases from tf.losses/metrics/initializers/optimizers in tf.compat.v1. -* Symoblic/eager tensor logic for ops that is tightly coupled with Keras due to - current implementation of functional API and TF Ops layers. +* Keras symbolic tensor check in the ops library for tf.function. For usage like tf.layers to keras.layers, it can't be removed due to the API contract and guarantee. We should use LasyLoader to walk around the cyclic @@ -167,7 +167,10 @@ if not BaseLayer: ``` Other dependency should be removed as much as possible, eg move the util/code -from Keras to TF, or rework the implementation detail. +from Keras to TF, or rework the implementation detail. For any of the +dependencies that have to stay, need to use public Keras API only. A +check will also be added to TF to make sure there isn't any dependencies being +added in future for need of Keras. **Note that this is a key point to prevent Keras accidentally break Tensorflow.** From ce8a20f82126dd829c1afdd22d34ca8d1d5520ec Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Tue, 25 Feb 2020 13:31:46 -0800 Subject: [PATCH 15/21] Update the states for python change history. These numbers might help us decide whether the split PR execise is too much a overhead or not. --- rfcs/20200205-standalone-keras-repository.md | 43 ++++++++++++++------ 1 file changed, 31 insertions(+), 12 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index b021c7a10..eaafbf609 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -67,6 +67,9 @@ developer community. In addition, by getting the Keras team at Google to start developing Keras using the same public tools and infrastructure as third-party developers, we make the development process more transparent and more community-oriented. +In the meantime, some of the workload for repository management can be shared +with community so that Keras team member within Google won't be the bottleneck +for all the issues. ## Design Proposal @@ -198,8 +201,8 @@ import tensorflow as tf ones = tf.ones([2, 3]) ``` -During this conversion, we might notice that certain TF features used in Keras are -not public. A decision should be made on a case-by-case basis: +During this conversion, we might notice that certain TF features used in Keras +are not public. A decision should be made on a case-by-case basis: * Copy the functionality from TF to Keras. * Replace the usage with another alternative TF public API. @@ -211,11 +214,25 @@ not public. A decision should be made on a case-by-case basis: For any change that is affecting both TensorFlow and Keras, the change will need to be split into two, one as a PR to the TF repo, -and the other as a PR to the Keras repo. Here are some common scenarios: - -1. Adding a new feature to TensorFlow, and having Keras rely on it. Note that the -TF change needs to be submitted first, and the Keras PR needs to wait for the new TF -nightly to become available on PyPI. +and the other as a PR to the Keras repo. This will introduce overhead and slow +down the change for area's like distribution stragey, and other areas that +might under active development. + +With the internal change history between 2019-01-01 and 2020-01-01: +1. There are 6756 changes submitted to tensorflow/python +2. There are 5115 changes submitted to tensorflow/python but not +tensorflow/python/keras. +3. Among the 1641 changes submitted to tensorflow/keras, 1338 of them change +Keras only without touching tensorflow, and 303 of them change both Keras and +TF. +This means about 18.5% change that change Keras will change TF, and 4.4% change +that change TF will touch Keras in the meantime. + +Here are some common scenarios: + +1. Adding a new feature to TensorFlow, and having Keras rely on it. Note that +the TF change needs to be submitted first, and the Keras PR needs to wait for +the new TF nightly to become available on PyPI. Also note that any rollback of the TF PR will cause Keras to break, the rollback sequence should be PR 33333 and then PR 22222 (see example below). @@ -335,9 +352,11 @@ we will check if it is still relevant/active, and will be copied to keras-team/keras. * The permission of keras-team/keras need to be updated as the codebase is new. The access level for the repository need to be reestablished. -From least access to most access, the permission levels for an organization repository are: +From least access to most access, the permission levels for an organization +repository are: - * Read: Recommended for non-code contributors who want to view or discuss the project. + * Read: Recommended for non-code contributors who want to view or discuss the + project. * Triage: Recommended for contributors who need to proactively manage issues and pull requests without write access. * Write: Recommended for contributors who actively push to your project. @@ -360,9 +379,9 @@ is no performance regression. ### Dependencies -The TensorFlow pip package will auto-install the Keras package, which shouldn't make -any difference on the end-user side. Under the hood, Keras will be a different -package imported by `tf_core`, like what we do for TF estimator. +The TensorFlow pip package will auto-install the Keras package, which shouldn't +make any difference on the end-user side. Under the hood, Keras will be a +different package imported by `tf_core`, like what we do for TF estimator. ### Developer experience Impact From 9123f7feb31e72f7ab282e2e712d8622ec7faf72 Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Tue, 25 Feb 2020 14:40:18 -0800 Subject: [PATCH 16/21] Updating the section for CI and presubmit. --- rfcs/20200205-standalone-keras-repository.md | 61 +++++++++++++++++--- 1 file changed, 54 insertions(+), 7 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index eaafbf609..5ef4f8820 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -219,14 +219,15 @@ down the change for area's like distribution stragey, and other areas that might under active development. With the internal change history between 2019-01-01 and 2020-01-01: -1. There are 6756 changes submitted to tensorflow/python -2. There are 5115 changes submitted to tensorflow/python but not +1. There are 6756 changes submitted to tensorflow/python +2. There are 5115 changes submitted to tensorflow/python but not tensorflow/python/keras. -3. Among the 1641 changes submitted to tensorflow/keras, 1338 of them change -Keras only without touching tensorflow, and 303 of them change both Keras and -TF. -This means about 18.5% change that change Keras will change TF, and 4.4% change -that change TF will touch Keras in the meantime. +3. Among the 1641 changes submitted to tensorflow/keras, 1338 of +them change Keras only without touching tensorflow, and 303 of them change both +Keras and TF. + +This means about 18.5% change that change Keras will change TF, and +4.4% change that change TF will touch Keras in the meantime. Here are some common scenarios: @@ -340,6 +341,52 @@ class existing_layer(Layer): knob2=3) ``` +### Continuous integration and presubmit test +Due to the fact that Keras code is also being used within Google, apart from +the normal Github CI (action) tests, We will also run the same tests internally +against HEAD. +1. Github CI and presubmit test will use a stable version of TF binary during +test. +2. Google CI and presubmit test will run against HEAD for both TF and Keras +code. Note that we won't allow submiting Keras code directly to Google +internal code repo, engineers within Google are still allowed to create changes +internally and run test for it. + +The gap between the HEAD version and TF used by Keras should be +as close as possible. Large gap is expect to cause issue for debugging and code +tracing. + +There are a few common cases that either CI could break: +1. Github CI could break when the version of TF it depend on is changed. We +think this can be mitigated by pinning Keras to a explicit version of TF, rather +than a floating version like `tf-nightly`. The presubmit test when changing the +verison nubmer should catch this. In the case that a new stable version is +breaking some Keras test, we should + + 1a. Disable the failed tests and move forward to minimize the gap between + TF HEAD and Keras used version. Report the isuse TF team for fix. + + 1b. In the case of major breakage, Keras will stay with old version, report + to TF team and get the issue fixed. + + We hope 1b case should be minimized since same tests are running on Google + CI as well. + +2. Google CI could break when a submitted PR for Keras is mirrored into Google +code base. We can't foresee these breakage since we don't run global presumbit +internally for every CL. In the case of breakage, since external contributor +won't notice this, Keras team in Google will: + + 2a. Rollback the original Keras PR if the fault is at Keras side (miss test + coverage, or bad code interface). + + 2b. Update the internal tests to correctly rely on Keras public contract, or + disable the failed test for the moment. + + We hope both case can be minimized with the internal dependency cleanup as + well as only relying on public TF API described above. + + ### Github Repository Migration * For any open Github PR/issue in Keras-team/keras, it need to be copied to From 11f3828d37b9635287b6bdf0d4f69e9a4404aa28 Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Tue, 25 Feb 2020 14:45:25 -0800 Subject: [PATCH 17/21] Update style and fix typo. --- rfcs/20200205-standalone-keras-repository.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 5ef4f8820..7db6c83c5 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -363,24 +363,25 @@ than a floating version like `tf-nightly`. The presubmit test when changing the verison nubmer should catch this. In the case that a new stable version is breaking some Keras test, we should - 1a. Disable the failed tests and move forward to minimize the gap between - TF HEAD and Keras used version. Report the isuse TF team for fix. + * Disable the failed tests and move forward to minimize the gap between + TF HEAD and Keras used version. Report the issue TF team for fix. - 1b. In the case of major breakage, Keras will stay with old version, report + * In the case of major breakage, Keras will stay with old version, report to TF team and get the issue fixed. - We hope 1b case should be minimized since same tests are running on Google - CI as well. + We hope the second case should be minimized since same tests are running on + Google CI as well. Any change that might break Keras should be caught + with internal presubmits. 2. Google CI could break when a submitted PR for Keras is mirrored into Google code base. We can't foresee these breakage since we don't run global presumbit internally for every CL. In the case of breakage, since external contributor won't notice this, Keras team in Google will: - 2a. Rollback the original Keras PR if the fault is at Keras side (miss test + * Rollback the original Keras PR if the fault is at Keras side (miss test coverage, or bad code interface). - 2b. Update the internal tests to correctly rely on Keras public contract, or + * Update the internal tests to correctly rely on Keras public contract, or disable the failed test for the moment. We hope both case can be minimized with the internal dependency cleanup as From 5de7dfea348bb718443ba0e1b9a29c45faea8021 Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Tue, 25 Feb 2020 15:49:32 -0800 Subject: [PATCH 18/21] Adding a section for Alternative considered. --- rfcs/20200205-standalone-keras-repository.md | 26 ++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 7db6c83c5..d388b1052 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -419,6 +419,32 @@ share more permissions with the community member. See more details about the project permission in https://help.github.com/en/github/setting-up-and-managing-organizations-and-teams/repository-permission-levels-for-an-organization. +### Alternative Considered + +Split the Tensorflow python and c++ code into separate pip package, eg +tf-core and tf-python, and tf-python will use a stable version of tf-core +package to build Tensorflow python. We have to maintain the compatiblity +between c++ layer and python layer, which is currently quite stable. + + Pros: + * This should allow us to enjoy speed up of build time for OSS build, since + build/test TF won't require building all the c kernel, which is the majority + of the build time. Internal CI won't be affected since it will always run + against HEAD. + * All the python code still lives in one repository, so we don't need to + split the change into 2 if it changes Keras and TF python at the same time. + + Cons: + * The change that touch both c kernel and TF python code will need to do the + two stage commit process, if the python change relies on c kernel change. + * Less motivated to cleanup the cross dependency between TF and Keras since + it is no longer a required task. + * With Google internel code repo as source of turth, most of the workflow/ + tools will still be Google centric instead of Github centric. + * Keras-team/keras code base will still be there if we don't move new TF + code to it. Having a staled version out there is not ideal, and we should + really merge them (code/issue/community member) together. + ### Performance Implications There may be some performance implications as we move towards only using From 132f542e627fb65cdcced7f6e2060cf4199eaf7a Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Wed, 26 Feb 2020 15:10:57 -0800 Subject: [PATCH 19/21] Fix typo for LasyLoader --- rfcs/20200205-standalone-keras-repository.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index d388b1052..c49f94985 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -157,13 +157,13 @@ converted to integration tests, or port the tests to Keras repository. * Keras symbolic tensor check in the ops library for tf.function. For usage like tf.layers to keras.layers, it can't be removed due to the API -contract and guarantee. We should use LasyLoader to walk around the cyclic +contract and guarantee. We should use LazyLoader to walk around the cyclic dependency issue. ```python -from tensorflow.python.util.lasy_loader import LasyLoader +from tensorflow.python.util.lazy_loader import LazyLoader -BaseLayer = LasyLoader( +BaseLayer = LazyLoader( 'BaseLayer', globals(), 'keras.layers.Layer') if not BaseLayer: raise ImportError('Keras is not installed, please pip install keras.') From 3cb07db07eaf376268201252427f531c9272cf1c Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Wed, 4 Mar 2020 10:42:51 -0800 Subject: [PATCH 20/21] Update the RFC wrt to the design meeting outcome. --- rfcs/20200205-standalone-keras-repository.md | 37 ++++++++++---------- 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index c49f94985..84c105437 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -156,24 +156,22 @@ converted to integration tests, or port the tests to Keras repository. * Aliases from tf.losses/metrics/initializers/optimizers in tf.compat.v1. * Keras symbolic tensor check in the ops library for tf.function. -For usage like tf.layers to keras.layers, it can't be removed due to the API -contract and guarantee. We should use LazyLoader to walk around the cyclic -dependency issue. - -```python -from tensorflow.python.util.lazy_loader import LazyLoader - -BaseLayer = LazyLoader( - 'BaseLayer', globals(), 'keras.layers.Layer') -if not BaseLayer: - raise ImportError('Keras is not installed, please pip install keras.') -``` - -Other dependency should be removed as much as possible, eg move the util/code -from Keras to TF, or rework the implementation detail. For any of the -dependencies that have to stay, need to use public Keras API only. A -check will also be added to TF to make sure there isn't any dependencies being -added in future for need of Keras. +The conclusion from the design meetings are: +1. We prefer to have a clear cut, which means any backwards deps from TF to +keras is not accepted. LazyLoading should be used for this case. +2. `feature_column` package will be moved to Tensorflow/Estimator project, +and will still exported under tf.feature_column name space. +3. Legacy `tf.layers` and RNN cell code will move to keras and still export +under same name space. +4. TPU support code will change to use new type spec, which will be proposed by +new RFC from Dan. +5. The saving util will either be moved to TF, or copeid to TF lite. +6. Any unittest in TF that rely on Keras should either be moved to Keras, or +if it is an integration test, we will create a new package for it, which will +do verification e2e. +7. Keras symbolic tensor check in the ops library, we will do some write with +composite tensor. Since this is a implementation details, but not a hard code +level dependency, this shouldn't be a blocking issue. **Note that this is a key point to prevent Keras accidentally break Tensorflow.** @@ -208,6 +206,9 @@ are not public. A decision should be made on a case-by-case basis: * Replace the usage with another alternative TF public API. * Make the functionality a new TF public API. +So far there is a long tail of private API usage, can they shouldn't be blocking +the repo split, as long as the majority of the usage has been addressed. + **Note that the open-source community is encouraged to contribute to this effort.** ### Two-stage change process From 1343f17cd5249e251bf0aa2d02fbfbefc61b7baf Mon Sep 17 00:00:00 2001 From: qlzh727 Date: Wed, 4 Mar 2020 11:08:07 -0800 Subject: [PATCH 21/21] Update the RFC status to 'Accepted' --- rfcs/20200205-standalone-keras-repository.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/20200205-standalone-keras-repository.md b/rfcs/20200205-standalone-keras-repository.md index 84c105437..46f174ca3 100644 --- a/rfcs/20200205-standalone-keras-repository.md +++ b/rfcs/20200205-standalone-keras-repository.md @@ -1,6 +1,6 @@ # Standalone Keras Repository -| Status | Proposed | +| Status | Accepted | :-------------- |:---------------------------------------------------- | | **RFC #** | [202](https://github.com/tensorflow/community/pull/202) | | **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |