TensorFlow Lite model inferencing: fast and lean!!

This article takes a closer look at how TFLite achieves fast and lean inference across all the different types of edge devices.


We have a wide range of edge devices such as IoT devices, mobile devices, embedded devices, etc. How does TFLite run inference on all of them in a seamless and elegant way? To understand this, let us jump into it.

What is an interpreter?

As we know, TFLite consists of a set of tools, and it has two core components:

  1. Converter
  2. Interpreter

The converter helps us convert deep learning models into the TFLite format, and the interpreter makes our life easier when running inference.
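For instance, here is a minimal sketch of the converter step in Python (it assumes you already have a trained Keras model; the file names are just placeholders):

import tensorflow as tf

# Minimal converter sketch: turn a trained Keras model into a .tflite file.
# "my_model.h5" and "converted_model.tflite" are placeholder file names.
model = tf.keras.models.load_model("my_model.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)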

The TensorFlow Lite interpreter runs specially optimized models on many different hardware types, including mobile phones, embedded Linux devices, and microcontrollers.

People often refer to running the TFLite interpreter and inferencing interchangeably. The term inference refers to the process of executing a TensorFlow Lite model on an edge device in order to make predictions based on user input. To perform inference with a TensorFlow Lite model, you must run it through the interpreter.

The TFLite interpreter is designed to be lean and fast. To achieve this, it uses a static graph ordering and a custom memory allocator to ensure minimal load, initialization, and execution latency.

Steps of inferencing:

TensorFlow Lite inference APIs are provided for the most common mobile/embedded platforms such as Android, iOS, and Linux, in multiple programming languages. Across all libraries, the TensorFlow Lite API enables you to load models, feed inputs, and retrieve inference outputs.

The TFLite interpreter generally follows the steps below:

  1. Loading a model:- 

The first and most essential step is to load the .tflite model into memory; the model file contains the execution graph.

  2. Transforming data:-

The model doesn't understand raw input data. To make raw data compatible, you need to transform it into a format the model understands. For example, for a computer vision model you need to resize the input image before providing it to the model.
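A minimal preprocessing sketch for an image model might look like this (the 224x224 input size, the [0, 1] scaling, and the file name are assumptions; use whatever your model's input details report):

import numpy as np
import tensorflow as tf

# Preprocessing sketch: decode, resize, and scale an image so it matches
# the model's expected input tensor. Sizes and scaling are assumptions.
image = tf.io.read_file("cat.jpg")            # placeholder input file
image = tf.io.decode_jpeg(image, channels=3)
image = tf.image.resize(image, [224, 224])    # resize to the model's input shape
input_data = np.expand_dims(image.numpy(), axis=0).astype(np.float32) / 255.0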

  3. Running inference:-

Now that the model is in memory and the data is in the required format, let us run the inference. This involves a few steps, such as building the interpreter and allocating tensors.

  4. Interpreting output:-

After the third step we get some output from the inference, but the end user won't understand it directly. Most of the time model results are probabilities or approximate values. We have to interpret this result into a meaningful output.
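For a classifier, for instance, the output is usually a vector of class probabilities, and interpreting it can be as simple as picking the most likely label (the label list and probabilities below are made up for illustration):

import numpy as np

# Interpreting-output sketch: map the highest-probability class to a label.
# The labels and the example output values are purely illustrative.
labels = ["cat", "dog", "bird"]
output_data = np.array([[0.05, 0.90, 0.05]])   # e.g. what get_tensor() returned

predicted_index = int(np.argmax(output_data[0]))
confidence = float(output_data[0][predicted_index])
print(f"{labels[predicted_index]} ({confidence:.0%})")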


Example:-

Let us walk through model inferencing using Python:

import numpy as np
import tensorflow as tf

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

Here is an example in C++; even though the language or the underlying platform changes, the steps remain the same:

// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if desired.
interpreter->AllocateTensors();

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

interpreter->Invoke();
// Read the output data
float* output = interpreter->typed_output_tensor<float>(0);

Conclusion:- 

In this article, we explored the TFLite interpreter, the steps involved in TFLite inferencing, and how to carry them out.

Reference:-

https://www.tensorflow.org/lite/guide/inference

Kafka Idempotent Producer!!

"Kafka idempotent producer" is just a term, but what exactly do we mean by an idempotent producer?

Let us first try to understand what is meant by idempotent.

“Denoting an element of a set which is unchanged in value when multiplied or otherwise operated on by itself”. — Google dictionary 

Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.
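A tiny Python illustration of the idea: applying the operation once or many times leaves the result unchanged.

# abs() is idempotent: applying it again does not change the result.
x = -7
print(abs(x) == abs(abs(x)))    # True

# Adding the same element to a set twice is also idempotent.
ids = set()
ids.add("order-42")
ids.add("order-42")             # no effect the second time
print(ids)                      # {'order-42'}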

We already know that, among HTTP verbs, GET is an idempotent operation.

Imagine a producer publishing data to the broker: sometimes, due to a network error, you will see duplicate messages in Kafka. When the producer sends a message to a Kafka topic, a retry after a network error can introduce a duplicate message.


Good request flow: the producer publishes the message to Kafka, Kafka says "I got the message and committed it" and sends back an acknowledgement (ack), and the producer receives the ack.

Failure case: the producer publishes the message to Kafka, Kafka says "I got the message and committed it" and sends back the ack, but due to a network error the producer never receives it. Here is the problem: the producer says "I am going to retry because I haven't received an ack from Kafka", publishes the same message again, and this duplicates the data.

  1. If the producer resends the message, it creates duplicate data.
  2. If the producer doesn't resend the message, the message is lost.
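Here is a hedged sketch of that dilemma with a plain (non-idempotent) producer using the confluent-kafka Python client; the topic name and payload are illustrative only:

from confluent_kafka import Producer

# Plain producer: no idempotence, so the application has to choose between
# resending (possible duplicate) and giving up (possible data loss).
producer = Producer({'bootstrap.servers': 'localhost:9092'})

def delivery_report(err, msg):
    if err is not None:
        # The ack was never received. The broker may still have committed the
        # message, so blindly resending it would risk a duplicate record,
        # while giving up would risk losing the message.
        print(f"delivery failed: {err}")

producer.produce('payments', value=b'charge user 42', callback=delivery_report)
producer.flush()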

How to solve it?

Kafka provides "at least once" delivery semantics by default. This means that a message that is sent may be delivered one or more times. From Kafka ≥ 0.11 (released in 2017), you can configure an "idempotent producer", which won't introduce duplicate data. To avoid processing a message multiple times, it must be persisted to the Kafka topic only once. During initialization, a unique ID is assigned to the producer; it is called the producer ID, or PID.

In this flow, even after a network failure Kafka doesn't end up with duplicate data: the producer may publish the message multiple times until it receives the ack, but the broker filters the duplicates using the producer ID (PID).


To achieve this, as programmers we don't have to do much:

# assuming the confluent-kafka Python client
from confluent_kafka import Producer

producer = Producer({
    'bootstrap.servers': 'localhost:9092',
    'message.send.max.retries': 10000000,
    'enable.idempotence': True,
})

Set enable.idempotence to True and the Kafka producer will take care of everything for you.

Here message.send.max.retries is set to a really huge number (effectively Integer.MAX_VALUE); just consider how many times you want to retry.

The early idempotent producer forced max.in.flight.requests.per.connection to 1, but in the latest releases it can be used with max.in.flight.requests.per.connection set to up to 5 and still keep its guarantees.

Idempotent delivery ensures that messages are delivered exactly once to a particular topic partition during the lifetime of a single producer.
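As a quick usage sketch (the topic name and payload are illustrative), sending a message with the idempotent producer looks exactly like sending with a normal producer; the client retries internally and the broker filters duplicates using the PID:

from confluent_kafka import Producer

# Idempotent producer usage sketch: produce and wait for the ack.
producer = Producer({
    'bootstrap.servers': 'localhost:9092',
    'enable.idempotence': True,
})

producer.produce('payments', key=b'user-42', value=b'charge 10 USD')
producer.flush()    # wait for delivery; retries won't create duplicates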

Reference document:- 

Apache Kafka documentation: https://kafka.apache.org