TensorFlow Lite (introduction, motivation, and a few components in depth)
TensorFlow Lite was proposed as a lightweight solution that focuses on mobile and other embedded devices. Using TensorFlow Lite, on-device machine learning inference can be performed efficiently, where efficiency is measured in terms of latency, model file size, and code footprint. Different techniques have been explored for achieving low latency: optimizing and tailoring kernels specifically for mobile, a few pre-fused activations, and quantized kernels that allow smaller and faster models.
To support fast inference and, more generally, machine learning processing (training and inference) on mobile and embedded devices, a lightweight framework is a necessity. There is also a need for:
- On-device, real-time computer vision.
- On-device spoken language understanding.
- Support for stronger user data privacy paradigms (user data resides strictly on-device).
- Serving ‘offline’ use cases when the device isn't connected to the internet.
A framework like TensorFlow Lite is needed to support the use cases listed above.
TensorFlow Lite has the following new components on top of what TensorFlow supports:
- A set of core operators, specifically tuned for mobile platforms (iOS and Android). These operators support both quantized and float operations and incorporate a few pre-fused activations to enhance accuracy. They can be used to create and run custom models.
- Custom operations can be defined in kernels by the developer.
- A new model file format based on FlatBuffers. FlatBuffers is an open-source, cross-platform serialization library (similar to the protocol buffers used in TensorFlow). FlatBuffers offers a few advantages over protocol buffers: no parsing/unpacking step is needed before data can be accessed, no per-object memory allocation, and a smaller code footprint.
- A mobile-optimized interpreter that keeps the app small and fast. The interpreter uses static graph ordering, a custom memory allocator, and kernels designed to keep loading, initialization, and execution latency low.
- A converter that transforms TensorFlow models into the TensorFlow Lite model format (a minimal conversion sketch follows this list).
- An interface to the Android Neural Networks API, which can be leveraged for hardware acceleration if the device supports it.
- Smaller in size than TensorFlow Mobile. TensorFlow Lite is < 300KB in size when all operators are linked and <= 200KB when using only the operators needed for standard supported models (MobileNet and InceptionV3).
- Java and C++ API support
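As a concrete illustration of the converter bullet above, here is a minimal conversion sketch in Python, assuming a TensorFlow release that exposes the `tf.lite.TFLiteConverter` API; the SavedModel path is hypothetical:

```python
import tensorflow as tf

# Hypothetical path to a trained SavedModel export.
saved_model_dir = "/tmp/my_saved_model"

# Build a converter from the SavedModel; convert() returns the
# serialized TensorFlow Lite FlatBuffer as bytes.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# The resulting .tflite file is what gets bundled with the mobile app.
with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)
```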
The TensorFlow Lite architecture is as follows:
At the moment it supports both Android and iOS applications. The first step is to convert a trained TensorFlow model to the TensorFlow Lite file format (.tflite) using the TensorFlow Lite Converter, as mentioned above. The converted model file can then be used in the application.
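Since model size matters on device (see the size figures above), one option worth noting is post-training quantization, which newer TensorFlow releases expose through the converter's `optimizations` attribute. The sketch below reuses the hypothetical SavedModel path from the previous example and shows only one of the quantization paths TensorFlow Lite supports:

```python
import os
import tensorflow as tf

# Same hypothetical SavedModel as in the previous sketch.
saved_model_dir = "/tmp/my_saved_model"

# Enable the converter's default optimizations, which apply
# post-training weight quantization during conversion.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
with open("converted_model_quant.tflite", "wb") as f:
    f.write(converter.convert())

# Compare the quantized file against the float model produced earlier.
for path in ("converted_model.tflite", "converted_model_quant.tflite"):
    print(path, os.path.getsize(path), "bytes")
```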
For deploying the model, the important components are:
- Java API: A wrapper around the C++ API (for Android).
- C++ API: This component is responsible for loading the model file and invoking the interpreter for further processing and execution. The same library is used for Android and iOS.
- Interpreter: This component executes the model using the defined kernels. It supports selective kernel loading, and developers can also define their own custom kernels to use during execution (a minimal usage sketch, via the Python binding, follows this list).
- On select Android devices, the interpreter can use the Android Neural Networks API for hardware acceleration, or default to CPU execution.
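To make the interpreter's role concrete, here is a minimal inference sketch using the Python binding (`tf.lite.Interpreter`) as a stand-in for the Java/C++ APIs used on device; the model path and the random input are placeholders:

```python
import numpy as np
import tensorflow as tf

# Load the converted FlatBuffer model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed placeholder data with the expected input shape and dtype.
input_shape = input_details[0]["shape"]
dummy_input = np.random.random_sample(input_shape).astype(input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy_input)

# Run the model and read back the output tensor.
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]["index"])
print("output shape:", output.shape)
```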
References:
- Android Neural Networks API
- FlatBuffers
- TensorFlow for Poets
- MobileNets (kexinzhao)