Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Standalone Keras Repository #202

Merged
merged 23 commits into from
Mar 4, 2020
Merged

Conversation

qlzh727
Copy link
Member

@qlzh727 qlzh727 commented Feb 6, 2020

Comment period is open till 2/25/2020.

Standalone Keras Repository

Status Proposed
RFC # 202
Author(s) Qianli Zhu ([email protected]), Francois Chollet ([email protected])
Sponsor Karmel Allison ([email protected])
Updated 2020-02-05

Objective

Move the Keras code from the TensorFlow main GitHub repository to its own
repository, with TensorFlow as a dependency.

rfcs/20200205-standalone-keras-repository.md Show resolved Hide resolved
rfcs/20200205-standalone-keras-repository.md Show resolved Hide resolved

### Two-stage change process

For any change that is affecting both TensorFlow and Keras, the change
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we have again the cache sharing problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you detail what is the cache you are mentioning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rfcs/20200205-standalone-keras-repository.md Show resolved Hide resolved

## Questions and Discussion Topics

1. Tools for issue tracking: we can't rely on Google-internal bug tracking tool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This another critical point that need to be solved. We cannot have a weaker issue tracking than TF. Check the full thread at #29

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the notice. I will check with infra team and support team to see if they have any suggestions for improving this situation. Will leave this open for discussion in the design meeting.

rfcs/20200205-standalone-keras-repository.md Show resolved Hide resolved
@bhack
Copy link
Contributor

bhack commented Feb 7, 2020

Please announce this on the TF dev mailinglist(s) as usual. /cc @ewilderj

@pavanky
Copy link

pavanky commented Feb 7, 2020

Will the ci still run keras tests when changes are made to the tensorflow repository ?

* pip package management. Keras will now follow the `tf-estimator` approach.
"pip install tensorflow" should also install Keras (from PyPI) as well.
There are more details for the pip package in the
[Improved pip package structure](https://github.com/tensorflow/community/pull/182) RFC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also considerconda this time? Anaconda is a huge part of Data Science and ML ecosystem and tf.keras not being a part of it doesn't feel right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gunan from TF build team, and also @seanpmorgan who mentioned this in the SIG-addon meeting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this item on the design review meeting. So far this is not active target we are supporting. The meeting conclusion are:

Conda uses a different toolchain, with no ABI compatibility between TF+Conda. Because we don’t have C interfaces, any custom ops built for TF will not work for Conda, and vice versa. There is a set of people who publish TF for Conda. But tf addons, io, etc. are not published for Conda. There’s no easy way around this because it would require doubling the TF release burden entirely.
Keras in theory should be easier? Insofar as it is Python only (currently). But Keras will in the future include C(++). That should be fine as long as there is a Python interface to the exposed parts, or C parts are used internally.

* Replace the usage with another alternative TF public API.
* Make the functionality a new TF public API.

**Note that the open-source community is encouraged to contribute to this effort.**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice description of the problem. Since there is a lot of decision making going on there, I believe it would be nice for people in charge of decisions to open issues describing what they want to do for all the unwanted imports we'll have. It's hard to help when you don't know what will be decided to resolve the problem (here option 1? 2? or 3?).

If we don't do that, I believe the open-source community won't help much just because of a lack of guidance.

Co-Authored-By: Frédéric Branchaud-Charron <[email protected]>
@googlebot
Copy link

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

rfcs/20200205-standalone-keras-repository.md Show resolved Hide resolved
rfcs/20200205-standalone-keras-repository.md Show resolved Hide resolved
rfcs/20200205-standalone-keras-repository.md Outdated Show resolved Hide resolved
rfcs/20200205-standalone-keras-repository.md Show resolved Hide resolved
* We expect the majority of the code development/contribution from GitHub
and the dev tools / tests / scripts should focus on the GitHub development use
case. See below for more details.
* Keras CI/presubmit build for the GitHub repo should target a stable PIP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the disadvantage of slowing down keras's adoption of new TF features and APIs.

This document should clarify when would CI against tf stable be expected to break, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also important to highlight the release process for TF)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not supposed to slow down adoption of new features compared to targeting an unpinned nightly all the time. Maybe it's not clear in the document, but if a pull request requires a new feature in tensorflow nightly, the person making the pull request can change the targeted version of tensorflow in the CI. So we don't have to wait at all.

Do I get it right or you were referring to another problem?

rfcs/20200205-standalone-keras-repository.md Outdated Show resolved Hide resolved
rfcs/20200205-standalone-keras-repository.md Show resolved Hide resolved
rfcs/20200205-standalone-keras-repository.md Outdated Show resolved Hide resolved
developer community.

In addition, by getting the Keras team at Google to start developing Keras
using the same public tools and infrastructure as third-party developers,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would split this point into multiple sub-points, because I believe that this argument is not sufficiently highlighted.

  • Github is not made to manage monorepos like tensorflow. Keras issues and pull requests get mixed up with the tensorflow core ones.
  • The source of truth being in github is very important because it allows the community to manage the repo. Currently, some commits pop up without any public pull request or public user, making it impossible for non-googlers to understand what is the rational behind a commit, or even who to contact if it caused a bug: e.g. tensorflow/tensorflow@9698ae1
  • As long as keras is in tensorflow, it's hard for us to have any impact on the processes/tests/tools that keras use internally. Imagine for example that we want to use Isort, or pytest, or github actions... In all cases it's a lot more work (code) and a lot more people who need to review and give us a green light. With a repository with the size of tensorflow, it's just unfeasable unless we have a face-to-face meeting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keras issues and pull requests get mixed up with the tensorflow core ones.

before the split, we had the same issue. Not sure how to fix that.

@x10000year
Copy link

x10000year commented Feb 18, 2020

If keras is going to be moved to a standalone repo, are v1 apis (tf.get_variable(), tf.variable_scope(), etc.) for building static graph still available in the tensorflow repo? Can we build custom models without using keras? The v1 apis are really very simple and flexible to use for building static graph, while the keras api forces people to use OOP style and gives a lot of pain for industrial engineers who are familiar with static graphs and functional style.

@qlzh727
Copy link
Member Author

qlzh727 commented Feb 18, 2020

If keras is going to be moved to a standalone repo, are v1 apis (tf.get_variable(), tf.variable_scope(), etc.) for building static graph still available in the tensorflow repo? Can we build custom models without using keras? The v1 apis are really very simple and flexible to use for building static graph, while the keras api forces people to use OOP style and gives a lot of pain for industrial engineers who are familiar with static graphs and functional style.

All the public API symbols will still remain available for v1.

1. Udpate the motivation to emphasize the modularity of TF.
2. Update the section for reverse dependency from TF to keras.
These numbers might help us decide whether the split PR execise is too
much a overhead or not.
@ematejska ematejska added the RFC: Proposed RFC Design Document label Feb 28, 2020
@qlzh727
Copy link
Member Author

qlzh727 commented Mar 4, 2020

@ematejska, I have updated RFC with latest result from the design meeting. Would u like to have the meeting notes also being posted here?

@ematejska
Copy link
Contributor

@qlzh727 Yes please.

@ematejska
Copy link
Contributor

Also, if you could update the status of the RFC to Accepted, that would be appreciated. Thanks.

@ematejska ematejska added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Mar 4, 2020
@qlzh727
Copy link
Member Author

qlzh727 commented Mar 4, 2020

Meeting notes from the design review on Mar 03 (taken by @karmel)

Notes

  • Dependencies between TF and Keras. Two ways: Keras on non-public TF; TF backwards on Keras. Majority of TF -> Keras dependencies are in higher level TF APIs (old tf.layers; feature columns).

    • For v1 things, eg layers, we will freeze a copy of necessary APIs (eg base layer v1) in core TF. This has the advantage of also preventing accidental changes.
    • Higher cost; requires a refactor. Relying on existing unit tests to tell us if we break something.
    • Lazy loading is easier in many ways.
    • But, deceptively easy; may lead to integration pains down the line.
    • If we can decouple future changes to Keras from tf.layers, we will be happier.
    • Can we move v1 keras layer to tf core? Ends up being tricky-- graph/eager is the split in Keras versus v1/v2. Version selecting.
    • If we make a copy, what is the scope of the copy? Strictly to break the circular dependencies. Trim down current base_layer to reduce the amount of code copied to TF.
    • So, concretely, what would be copied for tf.layers? There is a very long blaze dep list right now, but doesn’t need to be. Eg mixed precision imports optimizers.
    • Split out legacy tf.layers into Keras? In tf we re-export that. Requires we inject into TF namespaces from other packages, but maybe that’s not so bad.
    • Can tf.feature_column, tf.layers be separated out and depend on keras and TF?
    • tf.lite depends on some private utils from Keras saving. Some of these can be forked. They are at least public APIs in general.
    • Suggestion: move tf.feature_column to Estimator; move tf.layers to Keras (in some sort of legacy quarantine)
    • Unit tests move either to the top-most package, or into a separate integration test package. Distribution strategies is one of the largest clients here. Would be inconvenient if those tests lived in Keras (since change has to be split); moving to integration tests outside TF seems a better solution long-term. Also allows “import tensorflow as tf”, which is preferable. Can be run on nightly and head.
  • Conclusions:

    • tf.feature_column to Estimator
    • RNN Cells to Keras
    • tf.layers to Keras or frozen; whichever is less painful.
    • Integration tests to new repo
    • TPU is just instance checks; resolve with tf.types
    • TF Lite either fork utils or move to tf core
    • No LazyLoader
  • Keras deps on private TF symbols

    • Head cases (count >= 5) we can resolve by using different APIs in Keras
    • Long tail (count < 5). What to do with those? Many are in dataset APIs.
    • apassos: We are less concerned about these private usages; we can address over time. We will need to get owners of private APIs on board.
    • wicke: I don’t think we should split before we have a CL that gets us to where estimator is today. This list is too long; we need to narrow the list down to fewer of these before we can realistically split.
    • scottzhu: currently I am addressing the most-used of these. Generated CL from mihai’s script that converts all public. Can take care of head cases and leave longer tail.
    • raw_ops for ops.
    • CondBranchFuncGraph-- why do we need this? Pretty ugly check. Seems to be mostly to raise a better error. Likely what this means is that some amount of FuncGraph needs to be made public and documented so that we can allow this. Or improve error messages from core; we are solving this for Keras, but other users will have similar problems. A new public API that says, I am in a conditional branch of the graph.
    • Much of the long-tail will be case-by-case
    • How do we avoid a proliferation of experimental APIs that we depend on? That’s better than private, since at least API owners see and govern changes to experimental. We generally require global tap, and one-release deprecation before removing.
  • Source of truth is Github

    • What happens if external change to Keras can’t be brought into google3 because something internal breaks? It will happen, and it will be painful. It will eat up time, and has for Swift, MLIR, etc. We must add tests to OSS keras every time this happens. But that’s the cost of having it be OSS first.
    • That’s a cost, but also a good forcing function-- our APIs should be stable.
    • We may have to have a policy that says, if you are on private APIs, we may break you on syncs.
    • Syncing should ideally be frequent, but may fail and require manual intervention.
    • When are we ready to make the split?

apassos: no circular deps; triaging private symbols + preventing backsliding; for the gaps, we have a plan.

  • We did all this work to bring in Keras, now we’re separating it back out. In was critical for 2.0. What will we do at the next big change?

  • In many ways, the way we did the 2.0 change was too entangled. And would have been better if it had been more structured. But now we’ve learned and grown, and we can use that going forward.

  • From OSS community: conda packages?

    • Conda uses a different toolchain, with no ABI compatibility between TF+Conda. Because we don’t have C interfaces, any custom ops built for TF will not work for Conda, and vice versa. There is a set of people who publish TF for Conda. But tf addons, io, etc. are not published for Conda. There’s no easy way around this because it would require doubling the TF release burden entirely.
    • Keras in theory should be easier? Insofar as it is Python only (currently). But Keras will in the future include C(++). That should be fine as long as there is a Python interface to the exposed parts, or C parts are used internally.
  • How are we going to manage issues and PRs?

    • PRs cannot span the two repos. Must be submitted separately, without breaking CI.
    • For issues, will we transfer between the repos? We can crosslink if the issue moves to TF. Issue in Keras will stay, and link to the TF issue.
    • We will have to reset permissions for keras repo.
    • Automatically pull in PRs to CLs?

@qlzh727
Copy link
Member Author

qlzh727 commented Mar 4, 2020

Also, if you could update the status of the RFC to Accepted, that would be appreciated. Thanks.

Done.

@bhack
Copy link
Contributor

bhack commented Mar 4, 2020

We need to handle cross repositories PR/ISSUE but we don't have this feature naturally in Github. See https://codetree.com/guides/managing-issues-across-multiple-github-repositories

@ematejska ematejska merged commit 1ee72f3 into tensorflow:master Mar 4, 2020
@bhack bhack mentioned this pull request Apr 4, 2020
@bhack
Copy link
Contributor

bhack commented May 28, 2020

What Is the timeline to reactivate the standalone repository?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes RFC: Accepted RFC Design Document: Accepted by Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.