
[RELAY][FRONTEND] Tensorflow frontend. #2216

Merged 24 commits on Feb 5, 2019
Conversation

srkreddy1238
Contributor

Thanks for contributing to TVM! Please refer to guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers.

@srkreddy1238 srkreddy1238 force-pushed the tf-relay branch 4 times, most recently from 79df1dc to 90b99db Compare December 6, 2018 15:06
@srkreddy1238
Contributor Author

@jroesch @Huyuwei & @tqchen welcome to review.
Except for LSTM, it is now equivalent to the NNVM frontend.

LSTM is failing with some strange errors, which differ between python2 (LLVM/FCmp) and python3 (schedule_injective). I am working on it.

@srkreddy1238 srkreddy1238 changed the title [WIP][RELAY][FRONTEND] Tensorflow frontend. [RELAY][FRONTEND] Tensorflow frontend. Dec 11, 2018
@srkreddy1238 srkreddy1238 force-pushed the tf-relay branch 2 times, most recently from 7a04cdf to 6f9be59 Compare December 31, 2018 15:21
Member

@jroesch jroesch left a comment


Looks good. I think we should try to remove all references to NNVM from Relay code so we don't confuse new users.

Files with review comments:
- python/tvm/relay/frontend/tensorflow.py
- tests/python/frontend/tensorflow/test_forward.py
- tutorials/relay/from_tensorflow.py
@srkreddy1238 srkreddy1238 force-pushed the tf-relay branch 2 times, most recently from 901af11 to 9a704fd Compare January 7, 2019 16:32
@srkreddy1238
Contributor Author

@jroesch thanks for the review.
I cleaned up the docstrings and moved testing.tf to relay. You may take another look.
#2382 is a dependency for this PR. Everything now works well except LSTM.

I came across a strange error for LSTM that I have yet to debug.

Does the log below ring any bells?

Traceback (most recent call last):
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/relay/backend/compile_engine.py", line 78, in lower
    return _backend._CompileEngineLower(self, key)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 185, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/base.py", line 72, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: TVMCall CFunc Error:
Traceback (most recent call last):
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 55, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/relay/op/op.py", line 186, in schedule_injective
    return topi.generic.schedule_injective(outputs)
  File "<decorator-gen-62>", line 2, in schedule_injective
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/target.py", line 273, in dispatch_func
    return generic_func_node(*args)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/target.py", line 135, in __call__
    return _api_internal._GenericFuncCallFunc(self, *args)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 185, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/base.py", line 72, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: TVMCall CFunc Error:
Traceback (most recent call last):
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 55, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/srk/.local/lib/python3.6/site-packages/topi-0.5.dev0-py3.6.egg/topi/x86/injective.py", line 26, in schedule_injective
    if len(s[x].op.axis) >= 5:
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/node.py", line 59, in __getattr__
    "'%s' object has no attribute '%s'" % (str(type(self)), name))
AttributeError: '<class 'tvm.tensor.PlaceholderOp'>' object has no attribute 'axis'
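For context, the failing guard in `schedule_injective` assumes every op in the schedule carries loop axes, but a `PlaceholderOp` (a bare input) does not. A minimal pure-python stand-in illustrating the failure mode and a defensive access pattern (class names here only mimic `tvm.tensor`; this is a sketch, not the real API):

```python
class PlaceholderOp:
    """Stand-in for tvm.tensor.PlaceholderOp: graph inputs carry no loop axes."""


class ComputeOp:
    """Stand-in for tvm.tensor.ComputeOp: computed stages do carry axes."""
    def __init__(self, axis):
        self.axis = axis


def op_axes(op):
    # Defensive access: placeholders have no 'axis' attribute, so a schedule
    # must skip them instead of raising AttributeError as in the log above.
    return list(getattr(op, "axis", []))
```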

@jroesch
Member

jroesch commented Jan 9, 2019

It looks like a bug in the schedule; maybe fusion is introducing a pattern the master schedule is not written for? Do you know which operator this is failing for? We should probably modify the compile engine to dump more debug information when an exception happens while scheduling.
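That suggestion could be sketched as a thin wrapper around the lowering call, attaching the Relay function text to the error (function name and signature are hypothetical; the actual change would live in `compile_engine.py`):

```python
def lower_with_debug(lower_fn, func_text):
    """Run a lowering callable; on failure, attach the offending Relay
    function's text to the error so the failing operator is visible."""
    try:
        return lower_fn()
    except Exception as err:
        msg = ("{}\n\nError during compile func\n"
               "--------------------------\n{}\n"
               "--------------------------".format(err, func_text))
        raise RuntimeError(msg)
```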

@srkreddy1238
Contributor Author

@jroesch
The situation above is caused by the Concatenate operator receiving a list with a single tensor. For now I have handled it in the frontend.
We could decide to make Concatenate handle this situation gracefully, or leave it as is.
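The frontend workaround described above can be sketched as follows (a minimal illustration of the idea, with a hypothetical helper name, not the actual PR code):

```python
def concat_or_passthrough(tensors, concat_fn):
    """Skip the concatenate op entirely when the input list holds a single
    tensor, so a lone placeholder never reaches the injective schedule."""
    if len(tensors) == 1:
        return tensors[0]
    return concat_fn(tensors)
```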

@srkreddy1238
Contributor Author

@jroesch
The Concatenate issue still persists. It's reproducible when placeholders are passed to concatenate.

Traceback (most recent call last):
  File "./tests/python/frontend/tensorflow/test_forward.py", line 1059, in <module>
    test_forward_ptb()
  File "./tests/python/frontend/tensorflow/test_forward.py", line 871, in test_forward_ptb
    params, m = _get_tvm_graph_module(graph_def)
  File "./tests/python/frontend/tensorflow/test_forward.py", line 817, in _get_tvm_graph_module
    graph, lib, params = relay.build(sym, target, params=params)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/relay/build_module.py", line 241, in build
    graph_json, lowered_funcs, params = graph_gen.codegen(func)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/relay/backend/graph_runtime_codegen.py", line 349, in codegen
    self.heads = self.visit(func.body)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/relay/expr_functor.py", line 27, in visit
    res = self.visit_call(expr)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/relay/backend/graph_runtime_codegen.py", line 235, in visit_call
    cached_func = self.compile_engine.lower(func, self.target)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/relay/backend/compile_engine.py", line 86, in lower
    raise RuntimeError(msg)
RuntimeError: Traceback (most recent call last):
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/relay/backend/compile_engine.py", line 78, in lower
    return _backend._CompileEngineLower(self, key)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 185, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/base.py", line 72, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: TVMCall CFunc Error:
Traceback (most recent call last):
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 55, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/relay/op/op.py", line 186, in schedule_injective
    return topi.generic.schedule_injective(outputs)
  File "<decorator-gen-62>", line 2, in schedule_injective
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/target.py", line 273, in dispatch_func
    return generic_func_node(*args)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/target.py", line 135, in __call__
    return _api_internal._GenericFuncCallFunc(self, *args)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 185, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/base.py", line 72, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: TVMCall CFunc Error:
Traceback (most recent call last):
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 55, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/srk/.local/lib/python3.6/site-packages/topi-0.5.dev0-py3.6.egg/topi/x86/injective.py", line 26, in schedule_injective
    if len(s[x].op.axis) >= 5:
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/node.py", line 59, in __getattr__
    "'%s' object has no attribute '%s'" % (str(type(self)), name))
AttributeError: '<class 'tvm.tensor.PlaceholderOp'>' object has no attribute 'axis'


Error during compile func
--------------------------
fn (%p0: Tensor[(1, 10000), float32],
    %p1: Tensor[(1, 2, 1, 200), float32],
    %p2: Tensor[(1, 2, 1, 200), float32])
    -> Tuple[Tensor[(1, 10000), float32], Tensor[(2, 2, 1, 200), float32]] {
  %0 = (%p1, %p2)
  %1 = concatenate(%0) # ty=Tensor[(2, 2, 1, 200), float32]
  %2 = (%p0, %1)
  %2
}
--------------------------

@srkreddy1238
Contributor Author

@jroesch #2412 fixes the above problem.

But LLVM hits the issue below when fusion is enabled. Disabling fusion works fine, though.

Traceback (most recent call last):
  File "tests/python/frontend/tensorflow/test_forward.py", line 1060, in <module>
    _test_forward_ptb()
  File "tests/python/frontend/tensorflow/test_forward.py", line 872, in _test_forward_ptb
    params, m = _get_tvm_graph_module(graph_def)
  File "tests/python/frontend/tensorflow/test_forward.py", line 818, in _get_tvm_graph_module
    graph, lib, params = relay.build(sym, target, params=params)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/relay/build_module.py", line 242, in build
    mod = _tvm_build_module(lowered_funcs, target=target, target_host=target_host)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/build_module.py", line 592, in build
    mhost = codegen.build_module(fhost_all, str(target_host))
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/codegen.py", line 20, in build_module
    return _Build(lowered_func, target)
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/_ctypes/function.py", line 185, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/srk/.local/lib/python3.6/site-packages/tvm-0.5.dev0-py3.6-linux-x86_64.egg/tvm/_ffi/base.py", line 72, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: [12:16:42] /home/srk/work/DMLC/tvm/src/codegen/llvm/llvm_module.cc:173: LLVM module verification failed with the following errors: 
Invalid operand types for FCmp instruction
  %293 = fcmp oeq i8* %34, %292

@srkreddy1238
Contributor Author

All test cases pass now. The issue above, specific to the PTB model, is unrelated to the frontend.
I have disabled optimisation only for that test case for now.
@Huyuwei @zhreshold welcome to review.

@srkreddy1238
Contributor Author

@jroesch, welcome to review further.

@tqchen
Member

tqchen commented Jan 18, 2019

@srkreddy1238
Contributor Author

@kazum thanks for the review. You may take another look and conclude.

@srkreddy1238
Contributor Author

@ZihengJiang & @yzhliu, please consider this for the 0.5 release if possible.

@tqchen
Member

tqchen commented Feb 3, 2019

@kazum
Contributor

kazum commented Feb 4, 2019

@srkreddy1238 My previous comments are not addressed in the latest version.

@srkreddy1238
Contributor Author

@kazum handled now :)

Contributor

@kazum kazum left a comment


Looks good, thanks!

Member

@yzhliu yzhliu left a comment


Sorry @srkreddy1238, I missed your message. I am personally very willing to bring this into the v0.5 release; on the other hand, we've already started voting on a commit id. What's your thought? @tqchen @ZihengJiang

@yongwww
Member

yongwww commented Feb 5, 2019

Good to see this PR approved; it would be great to see it merged soon.

@srkreddy1238
Contributor Author

@yzhliu no problem :)

@yzhliu
Member

yzhliu commented Feb 5, 2019

Thanks everyone for contributing and reviewing. Let's merge.

@yzhliu yzhliu added the status: accepted label and removed the status: need review and status: need update labels Feb 5, 2019
@yzhliu yzhliu merged commit 2f859d7 into apache:master Feb 5, 2019
libing4752 pushed a commit to libing4752/tvm that referenced this pull request Feb 18, 2019
* [RELAY][FRONTEND] Tensorflow frontend support.
* LSTM removed for a while.
* basic ops are good.
* nn wip
* wip
* python2.7 corrections.
* NN ops are good.
* e2e models working good
* all good except LSTM
* rebase, tutorials and CI trigger.
* CI errors.
* enable opt_level=3
* Docstrings cleanup. testing.tf utils moved to relay from nnvm.
* tutorials update.
* LSTM work good now.
* Rebase
* CI error
* enable PTB.
* rebase.
* tutorials
* Update python/tvm/relay/frontend/tensorflow.py

Co-Authored-By: srkreddy1238 <[email protected]>

* review comments.
* CI fix.
* review comments.
merrymercy pushed a commit to merrymercy/tvm that referenced this pull request Feb 18, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019
@yzhliu yzhliu mentioned this pull request Mar 2, 2019
@srkreddy1238 srkreddy1238 deleted the tf-relay branch January 24, 2020 04:38
FranckQC pushed a commit to FranckQC/tvm that referenced this pull request Jul 26, 2024
This solved the issue with LWP that appears with maxpool.

The problem was that the LWP handler was forgetting to save p0 (used by the handler). This predicate register needs to be saved too, just like r0-r5, as it had been decided that it was the responsibility of the handler to save everything (even these theoretically caller-saved registers).
Said differently, since it had been decided that calling the LWP handler would not follow the normal ABI, and that the LWP handler would save everything it touches (even normally caller-saved registers like r0-r15 and p0-3), then it absolutely needs to save the predicate registers too (in particular p0, which was causing the issue).

The issue appeared only with maxpool because it's the only one that had a state saved in p0 before calling the LWP handler. And this call destroyed the content of what it had saved, making it subsequently branch to different portions of the code.

Fix: Allocate 32 bytes (instead of 24 previously), in order to save p3:0, and I save those at the bottom of the stack. Restore it at the end of the LWP handler.
quic-sanirudh pushed a commit that referenced this pull request Jul 28, 2024
* Fix LWP assembly handler (predicate register) (#2216)


* Remove trailing spaces

---------

Co-authored-by: Slama, Franck <[email protected]>
7 participants