TensorFlow Lite model inferencing: fast and lean!
This article takes a closer look at how TFLite achieves fast, lean inference across all the different types of edge devices.
Edge devices come in many forms: IoT devices, mobile phones, embedded boards, and more. How does TFLite run inference on all of them in a seamless and elegant way? To understand this, let us jump right in.
What is an interpreter?
As we know, TFLite is a set of tools built around two core components: the converter and the interpreter.
The converter helps us convert deep learning models into the TFLite format, and the interpreter makes our life easier when running inference.
The TensorFlow Lite interpreter runs specially optimized models on many different hardware types, including mobile phones, embedded Linux devices, and microcontrollers.
People often refer to the TFLite interpreter and inferencing interchangeably. The term inference refers to the process of executing a TensorFlow Lite model on an edge device in order to make predictions based on input data. To perform inference with a TensorFlow Lite model, you must run it through the interpreter.
The TFLite interpreter is designed to be lean and fast. To achieve this, it uses a static graph ordering and a custom memory allocator to ensure minimal load, initialization, and execution latency.
Steps of inferencing:
TensorFlow Lite inference APIs are provided for the most common mobile/embedded platforms such as Android, iOS, and Linux, in multiple programming languages. Across all libraries, the TensorFlow Lite API enables you to load models, feed inputs, and retrieve inference outputs.
In general, the TFLite interpreter follows the steps below:
1. Loading a model:-
The first and most essential step is to load the .tflite model into memory; it contains the model's execution graph.
2. Transforming data:-
The model doesn’t understand raw input data. To make the raw data compatible with the model, you need to transform it into a model-understandable format. For example, for a computer vision model you need to resize the input image before feeding it to the model.
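As a sketch of this transformation step, the snippet below resizes a raw image array and scales it to the `[0, 1]` range. The target size of 224×224 and 3-channel float32 input are assumptions about a hypothetical vision model; a real app should read the expected shape from `interpreter.get_input_details()`.

```python
import numpy as np

def preprocess(image, target_size=(224, 224)):
    """Nearest-neighbor resize, scale to [0, 1], and add a batch axis."""
    h, w = image.shape[:2]
    rows = np.arange(target_size[0]) * h // target_size[0]
    cols = np.arange(target_size[1]) * w // target_size[1]
    resized = image[rows][:, cols]          # pick source pixels per target cell
    scaled = resized.astype(np.float32) / 255.0
    return np.expand_dims(scaled, axis=0)   # shape: (1, 224, 224, 3)

# A stand-in for a decoded camera frame.
raw = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
batch = preprocess(raw)
print(batch.shape, batch.dtype)  # (1, 224, 224, 3) float32
```

The returned array can be passed directly to `interpreter.set_tensor(...)`, since it already matches a typical `(1, height, width, channels)` input layout.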
3. Running inference:-
Now that the model is in memory and the data is in the required format, let us run the inference. This involves a few steps, such as building the interpreter and allocating tensors.
4. Interpreting output:-
After the third step we get some output from the inference, but an end user won’t understand it. Most of the time, model results are probabilities or approximate values. We have to interpret these results into meaningful output.
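To illustrate this interpretation step, here is a minimal sketch that maps a probability vector to a human-readable label. The label list and the output values are made up for the example; a real app would load labels from a file shipped with the model and take `output_data` from `interpreter.get_tensor(...)`.

```python
import numpy as np

# Hypothetical class labels; a real app would load these from a label file.
labels = ["cat", "dog", "bird"]

# Stand-in for the raw model output of a 3-class classifier.
output_data = np.array([[0.1, 0.7, 0.2]], dtype=np.float32)

probs = output_data[0]            # drop the batch dimension
top = int(np.argmax(probs))       # index of the highest probability
print(f"Predicted: {labels[top]} ({probs[top]:.0%} confidence)")
# Predicted: dog (70% confidence)
```

This is the part of the pipeline the user actually sees, so it is worth spending time on: a raw `[0.1, 0.7, 0.2]` means nothing, while "dog (70% confidence)" does.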
Let us look at model inferencing using Python:
```python
import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensor details.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
```
Here is an example in C++. Even though the language or the underlying platform changes, the steps stay the same:
```cpp
// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if desired.
interpreter->AllocateTensors();

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

interpreter->Invoke();

// Read the output data.
float* output = interpreter->typed_output_tensor<float>(0);
```
In this article, we explored the TFLite interpreter, the steps involved in TFLite inferencing, and how to carry them out in practice.