mlir in tensorflow training? #23

Cuttstage · 2021-12-09T12:14:28Z

Hi, Is it possible for using mlir to make LLVM IR in machine training like GPU support?
I can not find any code in tensroflow to use mlir turn back to tensorflow executor. Therefore, mlir is only useful for inference? I wonder if tensorflow could change make some ops into IR process and then merge IR result back to tensorflow process?

joker-eph · 2021-12-10T00:40:02Z

MLIR is used in Grappler now, which is plugged into the executor. MLIR is also used more and more to implement the XLA CPU/GPU compiler (which emits LLVM), and it is used for both inference and training (it powers JAX for example).

You'd have to be more specific about what you're trying to achieve though? MLIR has been used so far for fairly low-level pieces of infrastructure inside the TensorFlow/XLA ecosystem.

joker-eph · 2021-12-10T00:41:31Z

Seems like you have some misconfigured auto-reply here :)

Cuttstage · 2021-12-10T03:23:54Z

MLIR is used in Grappler now, which is plugged into the executor. MLIR is also used more and more to implement the XLA CPU/GPU compiler (which emits LLVM), and it is used for both inference and training (it powers JAX for example).

You'd have to be more specific about what you're trying to achieve though? MLIR has been used so far for fairly low-level pieces of infrastructure inside the TensorFlow/XLA ecosystem.

Thanks for your reply. I have found some code about tfg in the lastest tensorflow repo. I am a newer to MLIR and hope to use this feature in our machine train. :)

stellaraccident · 2021-12-10T04:22:07Z

MHLO (which is what this repo contains) is the native IR of JAX, which is used heavily for training (on both CPU/GPU/TPU via XLA). However, "training" can mean many things. For example, here is a prototype of a new API we are working on for saving off an entire Jax training program (in this case, a simple mnist model) so it can be run offline via IREE on single CPU/GPU systems (which happens to include a large swath of mobile and embedded devices):

Sample of saving a program/model: https://github.com/google/iree-samples/blob/main/iree-jax/examples/staged_mnist_export.py
Sample standalone trainer: https://github.com/google/iree-samples/blob/main/iree-jax/examples/run_trainer.py
Dump of the corresponding IR (which can be compiled via IREE): https://gist.github.com/stellaraccident/c2f1ae7841c2a7fbc07c778874c50655

This is but one example of a training setup. If looking for distributed training, that is a more advanced topic. Also, the above is just a prototype: we are looking to finish it for everyone to use in the coming months.

Cuttstage · 2021-12-12T08:27:04Z

MHLO (which is what this repo contains) is the native IR of JAX, which is used heavily for training (on both CPU/GPU/TPU via XLA). However, "training" can mean many things. For example, here is a prototype of a new API we are working on for saving off an entire Jax training program (in this case, a simple mnist model) so it can be run offline via IREE on single CPU/GPU systems (which happens to include a large swath of mobile and embedded devices):

Sample of saving a program/model: https://github.com/google/iree-samples/blob/main/iree-jax/examples/staged_mnist_export.py

Sample standalone trainer: https://github.com/google/iree-samples/blob/main/iree-jax/examples/run_trainer.py

Dump of the corresponding IR (which can be compiled via IREE): https://gist.github.com/stellaraccident/c2f1ae7841c2a7fbc07c778874c50655

This is but one example of a training setup. If looking for distributed training, that is a more advanced topic. Also, the above is just a prototype: we are looking to finish it for everyone to use in the coming months.

Thanks a lot. We will have a try. But right now our producing model is running on tf. It will be a great work to move tf to jax.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mlir in tensorflow training? #23

mlir in tensorflow training? #23

Cuttstage commented Dec 9, 2021

joker-eph commented Dec 10, 2021

joker-eph commented Dec 10, 2021

Cuttstage commented Dec 10, 2021

stellaraccident commented Dec 10, 2021 •

edited

Loading

Cuttstage commented Dec 12, 2021

mlir in tensorflow training? #23

mlir in tensorflow training? #23

Comments

Cuttstage commented Dec 9, 2021

joker-eph commented Dec 10, 2021

joker-eph commented Dec 10, 2021

Cuttstage commented Dec 10, 2021

stellaraccident commented Dec 10, 2021 • edited Loading

Cuttstage commented Dec 12, 2021

stellaraccident commented Dec 10, 2021 •

edited

Loading