TensorFlow 2.0 HelloWorld using Google Colab.

In this article, we use the most popular deep learning framework, TensorFlow, and walk through a basic Hello World example. To follow this example, you do not need to set up a local environment on your machine.


We are using Google Colab. If you are not aware of what it is, here you go, check out my getting-started article on Colab:
Train deep neural network free using google colaboratory.medium.com

Now visit https://colab.research.google.com/.

Brief About Colab:

Google Colab is available with zero configuration, offers free access to GPUs, and, best of all, notebooks are easily sharable. Colaboratory is a free service for developers to try TensorFlow on CPU and GPU over a cloud instance from Google. The service is a completely free way to improve your Python programming skills: developers log in with their Google account and connect to the service. Once you open Colab, if you are already logged in to your Google account, the notebook is ready right away. Here developers can try deep learning applications using popular machine learning libraries such as Keras, TensorFlow, PyTorch, OpenCV, and others.

Sign in to Google Colab and create a new notebook for our HelloWorld example.

Go to File → New Notebook (Google sign-in is required).

Now the new notebook is ready. We want to use TF 2.0.0 for our example, and TensorFlow 2.0.0 has already been released as a production version. To install TensorFlow 2.0.0, run the following command.

!pip install tensorflow==2.0.0

After a successful installation, we can verify the installed version.

import tensorflow as tf
print(tf.__version__)  # 2.0.0

HelloWorld example:

Now everything is ready and looking promising. We have installed TensorFlow and verified the version, too. Now let us take a high-level overview and create a Hello World example.

To change the runtime: click Runtime → Change Runtime Type → a popup will open; choose the particular runtime and hardware accelerator, such as GPU or TPU.

There are a lot of changes between TF 1.x and TF 2.0.0. TF 2.0.0 comes with ease of development: it needs less code than earlier versions. TensorFlow 2.0.0 was developed to remove the issues and complexity of previous versions.

In TF 2.0, eager execution is enabled by default.

Eager execution evaluates operations immediately, without building a graph: each operation returns a concrete value instead of constructing a computational graph to execute later.

We will use the Hello World code from the TensorFlow 1.x version and observe the output.

# This code snippet is from the TensorFlow 1.x version
import tensorflow as tf

msg = tf.constant('Hello and welcome to Tensorflow world')

sess = tf.Session()

# print the message
print(sess.run(msg))

In this example, we are using TensorFlow 1.x code to print the message, but Session has been removed in TF 2.0.0, so this raises an exception:

AttributeError: module 'tensorflow' has no attribute 'Session'

We will use the same code snippet as above, removing the Session:

import tensorflow as tf

msg = tf.constant('Hello and welcome to Tensorflow world')

# print the message
print(msg)

# print using tf.print()
tf.print(msg)

Here we have two print statements; observe the output of both:

  1. tf.Tensor(b'Hello and welcome to Tensorflow world', shape=(), dtype=string)
  2. Hello and welcome to Tensorflow world

That is it for now; we will start exploring different TF APIs in the next article.


The code is available on GitHub; you can directly import it into Colab and run it.


More articles on TensorFlow:






Introduction to the Intel OpenVINO Toolkit!!!

I got the Udacity Intel AI edge scholarship, and in the introduction they provided an introduction to Intel OpenVINO. In this article, we are going to explore the basics of OpenVINO.

The OpenVINO toolkit can boost your inference applications across multiple deep neural networks with high throughput and efficiency.

OpenVINO stands for Open Visual Inference and Neural Network Optimization.



What is OpenVINO?

OpenVINO stands for Open Visual Inference and Neural Network Optimization. OpenVINO is a toolkit provided by Intel to facilitate faster inference of deep learning computer vision models. This toolkit helps developers to create cost-effective and robust computer vision applications.

Learn more about openvino here.

It enables deep learning inference at the edge and supports heterogeneous execution across computer vision accelerators — CPU, GPU, Intel® Movidius™ Neural Compute Stick, and FPGA.

OpenVINO provides a number of pretrained models here.

Download & Install OpenVINO:

There is very good official documentation of OpenVINO from Intel, which is descriptive and easy to understand. Please use the links below.

  1. To download

Choose & Download | Intel® Distribution of OpenVINO™ Toolkit
Download a version of the Intel® Distribution of OpenVINO™ toolkit for Linux, Windows, or macOS.software.intel.com

2. Get started

Get Started | Intel® Distribution of OpenVINO™ Toolkit
Get up-to-speed fast using resources and training materials for this computer vision toolkit.software.intel.com

Overview of OpenVINO:

The execution process is as follows — 

  • We have to feed a pre-trained model to the Model Optimizer. It optimizes the model and converts it into its intermediate representation (a .xml and a .bin file).
  • The Inference Engine helps in the proper execution of the model on different devices. It manages the libraries required to run the code properly on different platforms.

The two main components of the OpenVINO toolkit are the Model Optimizer and the Inference Engine. So, we will dive deep into their details to have a better understanding of what happens under the hood.

Model Optimizer:

The Model Optimizer is a cross-platform CLI tool that facilitates the transition between the training and deployment environments. It adjusts deep learning models for optimal execution on end-point target devices. If you want to understand the optimization techniques for TensorFlow, you can check out this article.

If you check the diagram carefully, the optimizer performs three steps:

  1. Converting
  2. Optimizing
  3. Preparing for inference.

OpenVINO is a toolkit, not a deep learning library; it will not help you train a model. It helps you optimize and serve the model on different devices.

There is detailed documentation of how this works under the hood, so I won't go into detail here.

Inference Engine:

Now our model is ready for inference. The Model Optimizer CLI has converted and optimized the model, producing its intermediate representation. This is the input for the Inference Engine, which runs inference over the input data.

The Inference Engine is a C++ library with a set of C++ classes to infer input data (images) and get a result. The C++ library provides an API to read the Intermediate Representation, set the input and output formats, and execute the model on devices.

The best thing about OpenVINO is heterogeneous execution of the model, and it is possible because of the Inference Engine: it uses different plug-ins for different devices.

Code sample Example:

We will take the sample store-aisle-monitor-python. This code sample has been provided by Intel.

We will go through some code snippets with brief descriptions.

# Initialize the class
infer_network = Network()

# Load the network to IE plugin to get shape of input layer

n, c, h, w = infer_network.load_model(args.model, args.device, 1, 1, 2, args.cpu_extension)[1]

The above code is self-explanatory:

  • We initialize the Network class and load the model using the load_model function.
  • The load_model function returns the plugin along with the input shape.
  • We only need the input shape; that is why we have specified [1] after the function call.
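The [1] indexing can be sketched in plain Python (load_model here is a hypothetical stand-in, not Intel's implementation):

```python
# Hypothetical stand-in for infer_network.load_model(), which returns a
# (plugin, input_shape) tuple; indexing with [1] keeps only the shape.
def load_model():
    plugin = "dummy-plugin"          # placeholder for the IE plugin object
    input_shape = (1, 3, 224, 224)   # n, c, h, w for a typical vision model
    return plugin, input_shape

n, c, h, w = load_model()[1]
print(n, c, h, w)  # → 1 3 224 224
```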

# The exec_net function will start an asynchronous inference request.
infer_network.exec_net(next_request_id, in_frame)

We need to pass a request ID and an input frame for inference.

res = infer_network.get_output(cur_request_id)
for obj in res[0][0]:
    if obj[2] > args.prob_threshold:
        xmin = int(obj[3] * initial_w)
        ymin = int(obj[4] * initial_h)
        xmax = int(obj[5] * initial_w)
        ymax = int(obj[6] * initial_h)
        class_id = int(obj[1])

The get_output function gives us the model's result.

You can clone the Git repo and start getting your hands dirty. Happy coding!


This reference implementation is also available in C++ This reference implementation counts the number of people…github.com

Some cool projects, just FYI: Intel® IoT Developer Kit
IoT Libraries & Code Samples from Intel. Intel® IoT Developer Kit has 102 repositories available. Follow their code on…github.com

Introduction to Intel® Distribution of OpenVINO™ toolkit for Computer Vision Applications |…
The Intel® Developer Zone offers tools and how-to information to enable cross-platform app development through platform…www.coursera.org

TensorFlow Lite model inferencing, fast and lean!!

This article talks about how TFLite achieves inference across all the different types of edge devices in a fast and lean way.

We have different kinds of edge devices, such as IoT devices, mobile devices, and embedded devices. How does TFLite take inference in a seamless and elegant way? To understand this, let us jump in.

What is an interpreter?

As we know, TFLite consists of a set of tools, and it has two core components:

  1. Converter
  2. Interpreter

The converter helps us convert deep learning models into the TFLite format, and the interpreter makes our life easier during inference.

The TensorFlow Lite interpreter runs specially optimized models on many different hardware types, including mobile phones, embedded Linux devices, and microcontrollers.

People often refer to running the TFLite interpreter interchangeably as inferencing. The term inference refers to the process of executing a TensorFlow Lite model on edge devices in order to make predictions based on user input. To perform inference with a TensorFlow Lite model, you must run it through the interpreter.

The TFLite interpreter is designed to be lean and fast. To achieve this, it uses a static graph ordering and a custom memory allocator to ensure minimal load, initialization, and execution latency.

Steps of inferencing:

TensorFlow Lite inference APIs are provided for most common mobile/embedded platforms such as Android, iOS, and Linux, in multiple programming languages. Across all libraries, the TensorFlow Lite API enables you to load models, feed inputs, and retrieve inference outputs.

In general, the TFLite interpreter follows the steps below:

  1. Loading a model:- 

The first and most important step is to load the .tflite model into memory, which contains the execution graph.

2. Transforming data:- 

The model doesn't understand raw input data. To make the raw input compatible with a model-understandable format, you need to transform the data. For example, for a computer vision model you need to resize the input image before providing it to the model.
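As a sketch of this transformation step (assuming, for illustration, a model that expects a float32 batch of 224x224 RGB images scaled to [0, 1]):

```python
import numpy as np

# A minimal sketch of the "transforming data" step for a vision model:
# scale uint8 pixels to [0, 1], cast to float32, and add a batch dimension.
def preprocess(raw_image):
    x = raw_image.astype(np.float32) / 255.0  # scale pixels to [0, 1]
    return np.expand_dims(x, axis=0)          # add the batch dimension

raw = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a decoded image
batch = preprocess(raw)
print(batch.shape, batch.dtype)  # → (1, 224, 224, 3) float32
```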

3. Running inference:- 

Now the model is in memory and the data is in the required format; let us take the inference. It involves a few steps, such as building the interpreter and allocating tensors.

4. Interpreting output:-

After the third step we get some output from inference, but the end user won't understand it. Model results are most of the time probabilities or approximate values. We have to interpret this result into meaningful output. 
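The interpretation step can be sketched with plain numpy (the labels and probabilities here are made-up stand-ins for a real label file and interpreter output):

```python
import numpy as np

# A minimal sketch of the "interpreting output" step, assuming the model
# emits class probabilities: map the top score to a human-readable label.
labels = ["cat", "dog", "bird"]          # hypothetical label file contents
probs = np.array([0.05, 0.90, 0.05])     # stand-in for interpreter output

top = int(np.argmax(probs))
print(labels[top], probs[top])  # → dog 0.9
```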


Let us run model inference using Python:

import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

# Run inference.
interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])

Example in C++. Even though the language or the underlying platform changes, the steps remain the same:

// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile("converted_model.tflite");

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if desired, then allocate them.
interpreter->AllocateTensors();

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

// Run inference
interpreter->Invoke();

// Read the output data
float* output = interpreter->typed_output_tensor<float>(0);


In this article, we explored the TFLite interpreter, the steps involved in TFLite inferencing, and how to do it.



Fast Inference: TFLite GPU Delegate!!

Running inference on edge devices, especially on mobile devices, is very demanding. When you have a really big machine learning model, taking inference with limited resources is a crucial task. 

Many edge devices, especially mobile devices, have hardware accelerators such as a GPU. The TensorFlow Lite Delegate is useful to optimize our trained model and leverage the benefits of hardware acceleration.

What is a TensorFlow Lite Delegate?

A delegate's job, in general, is to delegate or transfer your work to someone else. TensorFlow Lite supports several hardware accelerators.

A TensorFlow Lite delegate is a way to delegate part or all of graph execution to another executor.

Why should you use delegates?

Running inference on compute-heavy deep learning models on edge devices is resource-demanding due to mobile devices' limited processing, memory, and power. Instead of relying on the device CPU, some devices have hardware accelerators, such as a GPU or a DSP (Digital Signal Processor), that allow for better performance and higher energy efficiency.

How does a TFLite Delegate work?

How TFLite Delegate work. tensorflow.org

Let us consider the graph on the left side. It has an input node where we receive input for inference. The input goes through a convolution operation and then a mean operation, and the graph uses the output of these two operations to compute SquareDifference. 

Let us assume we have a hardware accelerator that can perform the Conv2d and mean operations very fast and efficiently; the graph above then becomes:

In this case, we delegate the conv2d and mean operations to the specialized hardware accelerator using the TFLite delegate. 

The TFLite GPU delegate will delegate the operations to the GPU if one is available.

TFLite allows us to provide delegates for specific operations, in which case the graph splits into multiple subgraphs, each handled by a delegate. Every subgraph that is handled by a delegate is replaced with a node that evaluates the subgraph when invoked. Depending on the model, the final graph can end up with one node, meaning all of the graph was delegated, or with many nodes handling the subgraphs. In general, you don't want multiple subgraphs handled by the delegate, since each time you switch between the delegate and the main graph, there is an overhead for passing the results from the subgraph to the main graph. 
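The splitting behavior can be sketched with a toy partitioner (a simplification for illustration; real TFLite partitioning works on the graph structure, not a flat op list):

```python
# A toy sketch of how delegation splits a graph: given an ordered op list and
# the set of ops a (hypothetical) delegate supports, contiguous supported runs
# become delegated subgraphs. More runs mean more delegate<->CPU switches.
def delegated_subgraphs(ops, supported):
    runs, in_run = 0, False
    for op in ops:
        if op in supported and not in_run:
            runs, in_run = runs + 1, True
        elif op not in supported:
            in_run = False
    return runs

ops = ["conv2d", "mean", "square_diff", "conv2d"]
print(delegated_subgraphs(ops, {"conv2d", "mean"}))  # → 2
```

With two delegated subgraphs, results must cross the delegate boundary twice, which is exactly the overhead the paragraph above describes.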

It’s not always safe to share memory.

How to add a delegate?

  1. Define a kernel node that is responsible for evaluating the delegate subgraph.
  2. Create an instance of TfLiteDelegate, which will register the kernel and claim the nodes that the delegate can execute.


Tensorflow has provided a demo app for android:

In your application, add the AAR as above, import the org.tensorflow.lite.gpu.GpuDelegate module, and use the addDelegate function to register the GPU delegate with the interpreter:

import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.gpu.GpuDelegate;

// Initialize interpreter with GPU delegate
GpuDelegate delegate = new GpuDelegate();
Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate);
Interpreter interpreter = new Interpreter(model, options);

// Run inference
while (true) {
  interpreter.run(input, output);
}

// Clean up
delegate.close();

Include the GPU delegate header and call the Interpreter::ModifyGraphWithDelegate function to register the GPU delegate to the interpreter:

#import "tensorflow/lite/delegates/gpu/metal_delegate.h"

// Initialize interpreter with GPU delegate
std::unique_ptr<Interpreter> interpreter;
InterpreterBuilder(*model, resolver)(&interpreter);
auto* delegate = NewGpuDelegate(nullptr);  // default config
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;

// Run inference
while (true) {
  if (interpreter->Invoke() != kTfLiteOk) return false;
}

// Clean up
interpreter = nullptr;
DeleteGpuDelegate(delegate);


Some operations that are trivial on the CPU may have a high cost for the GPU.

Reference Link:


For more such stories

Optimization techniques – TFLite!!

One of the most popular optimization techniques is called quantization.

Running a machine learning model and making inferences on mobile or embedded devices comes with certain challenges, such as limited resources (memory, power, and data storage), so it is critical to deploy optimized machine learning models on mobile and embedded devices so that they can run efficiently. There are several optimization techniques, and one of them is quantization. In the last article, we saw how to use the TFLite converter to optimize the model for edge devices without any modification in weights and activation types.

What is Quantization?

Quantization is generally used in mathematics and digital signal processing. Below is the Wikipedia definition.

Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes.

Quantization refers to the process of reducing the number of bits that represent a number. In the context of deep learning, the dominant numerical format used for research and deployment has so far been 32-bit floating point, or FP32. Quantization converts FP32 weights and output activations into the nearest 8-bit integers, sometimes 4/2/1-bit as well.
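To make this concrete, here is a small numpy sketch of symmetric 8-bit quantization (an illustration of the idea, not TFLite's exact kernel):

```python
import numpy as np

# A small sketch of symmetric 8-bit quantization: FP32 values are mapped
# onto int8 with a single scale factor, then dequantized to see the
# rounding error this introduces.
w = np.array([-1.0, -0.4, 0.0, 0.4, 1.0], dtype=np.float32)

scale = float(np.max(np.abs(w))) / 127.0       # one FP32 unit per int8 step
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
dq = q.astype(np.float32) * scale              # "dequantize" back to FP32

print(q.tolist())                              # → [-127, -51, 0, 51, 127]
print(bool(np.max(np.abs(w - dq)) <= scale))   # error within one step → True
```

Each weight now occupies 1 byte instead of 4, at the cost of an error bounded by one quantization step.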

Quantization optimizes the model by quantizing the weights and activation types. TFLite uses quantization to speed up inference on edge devices. The TFLite converter is the answer to whether we can manage a deep learning model with lower precision. Now that you know what quantization is, let us dive deeper:

Quantization dramatically reduces both the memory requirement and computational cost of using neural networks.

The quantizing deep learning model uses techniques that allow for reduced precision representations of weights and, optionally, activations for both storage and computation.

TFLite provides several levels of support for quantization:

  1. Post-training quantization
  2. Quantization aware training.

Below is a table that shows the benefits of model quantization for some CNN models. 

Benefits of model quantization for select CNN models. tensorflow.org

Post-training quantization:

As the name implies, this is a post-training technique: it is applied after your model is trained. Post-training quantization quantizes the weights and activation types. It can reduce the model size and also improve CPU and hardware-accelerator latency. There are different optimization options, such as weight-only or full integer quantization; we can choose based on our requirements. 

TensorFlow.org provides a decision tree that can help us in making the decision.


Weight Quantization:

The simplest post-training quantization quantizes only the weights from floating point to 8-bit precision. This option is available in the TFLite converter. At inference, weights are converted from 8 bits of precision to floating point and computed using floating-point kernels. This conversion is done once and cached to reduce latency. If you want to improve latency further, use hybrid operators. 

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()

At the time of conversion, set the optimizations flag to optimize for model size.

This optimization provides latencies close to fully fixed-point inference, but the outputs are still stored using floating point.

Full integer quantization:

We can get further latency improvements, reductions in peak memory usage, and access to integer-only hardware accelerators by making sure all model math is quantized. In full integer quantization, you need to measure the dynamic range of activations and inputs by supplying a representative dataset; you create this dataset using an input data generator.

import tensorflow as tf

def representative_dataset_gen():
  for _ in range(num_calibration_steps):
    yield [input]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()

The result of full integer quantization should be fully quantized; any ops that don't have a quantized implementation are left in FP. Full integer-only execution gets a model with even lower latency, smaller size, and compatibility with integer-only accelerators.

You can enforce full integer quantization for all ops and use integer input and output by adding the following lines before you convert.

The converter throws an error if it encounters an operation it cannot currently quantize.

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

Float16 quantization example:

IEEE 754 defines the standard for 16-bit floating-point numbers. We can reduce the size of a floating-point model by quantizing the weights to float16. This technique reduces the model size by half with minimal loss of accuracy compared to other techniques. A float16-quantized model will "dequantize" the weight values to float32 when run on the CPU.

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.lite.constants.FLOAT16]
tflite_quant_model = converter.convert()
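To see why float16 halves storage, here is a small numpy sketch (numpy stands in for the converter here; this is not the TFLite code path):

```python
import numpy as np

# Casting float32 weights to float16 halves the bytes while introducing
# only a small rounding error for values in a modest range like [-1, 1].
w32 = np.linspace(-1.0, 1.0, 1000, dtype=np.float32)
w16 = w32.astype(np.float16)

print(w32.nbytes, w16.nbytes)  # → 4000 2000
print(bool(np.max(np.abs(w32 - w16.astype(np.float32))) < 1e-3))  # → True
```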

We have seen different post-training quantization techniques. Float16 quantization may not be a good choice if you need maximum performance; full integer quantization to fixed-point math would be better in that case. Weight quantization is a very basic quantization. Since weights are quantized post-training, there can be an accuracy loss, particularly for smaller networks.

Tensorflow Lite model accuracy

Quantization aware Training:

There can be an accuracy loss with post-training model quantization; to avoid this, if you don't want to compromise model accuracy, do quantization-aware training. As we have learned, post-training quantization is applied after the model has been trained; to overcome its drawbacks, we have quantization-aware training. This technique ensures that the forward pass matches precision for both training and inference. In this technique, TensorFlow created a flow wherein, while constructing the graph, you insert fake quantization nodes in each layer to simulate the effect of quantization in the forward and backward passes, and to learn quantization ranges during training for each layer separately.

There are two aspects of this technique

  • Operator fusion at inference time is accurately modeled at training time.
  • Quantization effects at inference are modeled at training time.
tf.quantization.quantize(
    input,
    min_range,
    max_range,
    T,
    mode='MIN_COMBINED',
    round_mode='HALF_AWAY_FROM_ZERO',
    name=None
)

In 'MIN_COMBINED' mode, each value is computed as:

out[i] = (in[i] - min_range) * range(T) / (max_range - min_range)
if T == qint8: out[i] -= (range(T) + 1) / 2.0

In 'MIN_FIRST' mode:

num_discrete_values = 1 << (# of bits in T)
range_adjust = num_discrete_values / (num_discrete_values - 1)
range = (range_max - range_min) * range_adjust
range_scale = num_discrete_values / range
quantized = round(input * range_scale) - round(range_min * range_scale) + numeric_limits<T>::min()
quantized = max(quantized, numeric_limits<T>::min())
quantized = min(quantized, numeric_limits<T>::max())
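The MIN_COMBINED formula can be checked with a small numpy sketch (a direct transcription of the documented math, not TensorFlow's kernel):

```python
import numpy as np

# A numpy sketch of the MIN_COMBINED formula for an 8-bit target:
# range(T) = 255, and signed targets (qint8) get the extra shift.
def quantize_min_combined(x, min_range, max_range, bits=8, signed=False):
    levels = (1 << bits) - 1                    # range(T), e.g. 255
    out = (x - min_range) * levels / (max_range - min_range)
    if signed:                                  # the qint8 correction
        out -= (levels + 1) / 2.0
    return np.round(out)

x = np.array([0.0, 4.0, 10.0])
print(quantize_min_combined(x, 0.0, 10.0).tolist())               # → [0.0, 102.0, 255.0]
print(quantize_min_combined(x, 0.0, 10.0, signed=True).tolist())  # → [-128.0, -26.0, 127.0]
```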

Check the complete example here:




Tensorflow Lite Converter Example!!

Let us deploy a deep learning TensorFlow model on edge devices using TF Lite. 

There are three different ways we can use the TensorFlow Lite converter:

  1. Convert a TF SavedModel to TF Lite
  2. Convert a Keras prebuilt model to TF Lite
  3. Convert a concrete function to TF Lite

1. Convert TF SavedModel to TF Lite:- 

Let us create a simple model using TensorFlow and save it as a SavedModel. To develop this model we will use the TensorFlow API. In this example, we will show how to convert a SavedModel into a TF Lite FlatBuffer.

# we will train a basic TF model and save it
import pathlib
import tensorflow as tf

# Construct a basic TF model.
root = tf.train.Checkpoint()
root.v1 = tf.Variable(3.)
root.v2 = tf.Variable(2.)
root.f = tf.function(lambda x: root.v1 * root.v2 * x)

# Save the model into a temp directory
export_dir = "/tmp/test_saved_model"
input_data = tf.constant(1., shape=[1, 1])
to_save = root.f.get_concrete_function(input_data)
tf.saved_model.save(root, export_dir, to_save)

# Convert the model into TF Lite.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()

# save the model
tflite_model_file = pathlib.Path('/tmp/save_model_tflite.tflite')
tflite_model_file.write_bytes(tflite_model)

2. Convert Keras PreBuilt Model to TF Lite:-

In this section, we explore how to convert a prebuilt Keras model into a TF Lite model. We will convert a pre-trained tf.keras MobileNet model to TensorFlow Lite and run inference on it.

import pathlib

import numpy as np
import tensorflow as tf

# Load the MobileNet keras model.
# We create a tf.keras model by loading a model pretrained
# on the imagenet dataset.
model = tf.keras.applications.MobileNetV2(
    weights="imagenet", input_shape=(224, 224, 3))

# With a pretrained Keras model there is no need for a SavedModel;
# we pass the model directly to TFLiteConverter.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# If you want to save the TF Lite model, use the steps below; otherwise skip.
tflite_model_file = pathlib.Path('/tmp/pretrainedmodel.tflite')
tflite_model_file.write_bytes(tflite_model)

# Load the TFLite model using the interpreter and allocate tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

3. Concrete Function to TF Lite:- 

In order to convert TensorFlow 2.0 models to TensorFlow Lite, the model needs to be exported as a concrete function. If you have developed your model using TF 2.0, then this is for you. We will convert the concrete function into a TF Lite model. In this section we again use the Keras MobileNet model.

import tensorflow as tf
# load mobilenet model of keras 
model = tf.keras.applications.MobileNetV2(weights="imagenet", input_shape=(224, 224, 3))

We will use tf.function to create a callable TensorFlow graph of our model.

import pathlib

# get callable graph from model.
run_model = tf.function(lambda x: model(x))

# get the concrete function from the callable graph
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

# convert the concrete function into a TF Lite model using TFLiteConverter
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
tflite_model = converter.convert()

# save the model
tflite_model_file = pathlib.Path('/tmp/concretefunc_model.tflite')
tflite_model_file.write_bytes(tflite_model)

CLI TF Lite Converter:-

Apart from the Python API, we can also use a command-line interface to convert models, for example to convert a SavedModel to a TFLite model.

The TensorFlow Lite Converter has a command-line tool tflite_convert which supports basic models.

#!/usr/bin/env bash
tflite_convert --saved_model_dir=/tmp/mobilenet_saved_model \
               --output_file=/tmp/mobilenet.tflite

 --output_file. Type: string. Specifies the full path of the output file.

--saved_model_dir. Type: string. Specifies the full path to the directory containing the SavedModel generated in 1.X or 2.X.

 --keras_model_file. Type: string. Specifies the full path of the HDF5 file containing the tf.keras model generated in 1.X or 2.X.

#!/usr/bin/env bash
tflite_convert --keras_model_file=model.h5 \
               --output_file=model.tflite

The converter supports SavedModel directories, tf.keras models, and concrete functions.

For now, we will end with these options. In the next article we will explore converting RNN models and quantized models.

Tensorflow Lite Model Deployment!

Here you go: Introduction Story of Tensorflow Lite

In the above article, we introduced TensorFlow Lite: what it is, what its purpose is, and what TensorFlow Lite is not.

In this article, we will dig deeper into the steps involved in TensorFlow Lite model deployment. 

The above diagram shows the deployment flow of a TensorFlow Lite model on edge devices.

Let us go through the steps from the top of the diagram.

At a very high level, this diagram breaks into two pieces of functionality: the first step is the converter, and the second is the interpreter, which runs inference on the model.

  1. Train Model:- 

Train your model using TensorFlow. You can train your model using any high-level TensorFlow API such as Keras, using a low-level API, or starting from a legacy TensorFlow model. You can develop your own model or use a TensorFlow built-in model. 

If you have a model from another framework, you can also convert it to TensorFlow using ONNX and use it. Once the model is ready, you have to save it. We can save the model in different formats based on the API used, such as HDF5, SavedModel, or FrozenGraphDef.

2. Convert Model:- 

In this step, we use the TensorFlow Lite converter to convert the TensorFlow model into the TensorFlow Lite FlatBuffer format.

FlatBuffers is a special data serialization format that is optimized for performance; the TensorFlow Lite FlatBuffer is also known as the TF Lite model. The TensorFlow Lite converter takes a TensorFlow model and generates a TensorFlow Lite FlatBuffer file (.tflite). The converter supports SavedModel directories, tf.keras models, and concrete functions. After this step, our TFLite model is ready.

You can convert a model using the Python API or the command-line tool. The CLI supports only basic models.

Python API example:- 

# export_dir is the path where your TF model is saved.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()

CLI example 

bazel run //tensorflow/lite/python:tflite_convert -- \
  --saved_model_dir=/tmp/mobilenet_saved_model \
  --output_file=/tmp/mobilenet.tflite

3. Deploy Model:-

Now our model is ready and we have the '.tflite' file. We can deploy this to IoT devices, embedded devices, or mobile devices.

4. Run Inference:-

To perform inference with a TensorFlow Lite model, you must run it through an interpreter. A TensorFlow Lite model is served on a device using the interpreter, which provides a wide range of interfaces and supports a wide range of devices. The TensorFlow Lite interpreter is designed to be lean and fast, and it lets us run models locally on these devices. Once the model is loaded onto a device, such as an embedded, Android, or iOS device, it is deployed, and we can take inference. 

In general, inference goes through the steps below.

a. Loading a model:- Load the .tflite model file into memory.

b. Transforming data:- Raw input data generally does not match the input format expected by the model, so you need to transform it (for example, resize an image or normalize pixel values).

c. Running inference:- Execute the model over the transformed data.

d. Interpreting output:- When you receive results from model inference, interpret the output tensors in a way that is meaningful for your application.
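Put together, steps (a) to (d) can be sketched with the Python interpreter API. The tiny Doubler model below is a made-up placeholder, converted on the fly so the example is self-contained; in practice you would load an existing .tflite file:

```python
import numpy as np
import tensorflow as tf

# A made-up model, converted here only so the sketch is self-contained.
class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([1, 2], tf.float32)])
    def __call__(self, x):
        return 2.0 * x

doubler = Doubler()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [doubler.__call__.get_concrete_function()])
tflite_model = converter.convert()

# a. Load the model into memory
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# b. Transform the raw input into the shape/dtype the model expects
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
x = np.array([[1.0, 2.0]], dtype=np.float32)

# c. Run inference
interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()

# d. Interpret the output tensor
y = interpreter.get_tensor(output_details[0]["index"])
print(y)  # [[2. 4.]]
```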

Tensorflow Lite- machine learning at the edge!!

TensorFlow created a buzz in the AI and deep learning community and is the most popular framework among deep learning practitioners.



As we know, training deep learning models needs compute power, and this is the age of computation. We are now moving toward edge computing alongside cloud computing. Edge computing is a need of today's world: innovation in the IoT domain, together with compliance and data-protection laws, is pushing companies to do computation on the edge. Running the model in the cloud and sending results back to the client device is becoming the legacy approach.

As TensorFlow is the most popular deep learning framework, it comes with a lightweight version for edge computation. Nowadays mobile devices have good processing power, but edge devices have less.

Run deep learning models in less than 100 KB.

The official definition of Tensorflow Lite:

“TensorFlow Lite is an open-source deep learning framework for on-device inference.”

Deploy machine learning models on mobile and IoT devices.

Tensorflow Lite is a package of tools that helps developers run TensorFlow models on mobile, embedded, and IoT devices. It enables on-device machine learning inference with low latency and a small binary size.

Tensorflow Lite provides machine learning at the edge.

Edge computing means computing locally, on the device.

Deep Dive:-

This diagram illustrates the standard flow for deploying the model using TensorFlow Lite.

Deploying model using TensorFlow Lite at the edge devices

Tensorflow Lite is not a separate deep learning framework; it provides a set of tools that help developers run TensorFlow models (or models from other frameworks) on mobile, embedded, and IoT devices.


  1. Choose a model or develop your own model.
  2. Convert the model.
  3. Deploy the model.
  4. Run inference with the model.
  5. Optimize the model and repeat the above steps.

Tensorflow Lite consists of two main components:

  1. Converter:- The Tensorflow Lite Converter converts a TensorFlow model into the TensorFlow Lite model format.
  2. Interpreter:- It runs the converted model on-device; it supports a set of core operators optimized for on-device applications and has a small binary size. It is basically for inferencing the model.

Why Edge Computing?

Edge computing is best used alongside cloud computing. Cloud computing is hugely popular nowadays, but there are certain requirements where edge computation beats it. Why is edge computation important, and what advantages do you derive from it?

  1. Privacy:- No data needs to leave the device; everything stays local.
  2. Latency:- There's no back-and-forth request to a server.
  3. Connectivity:- No Internet connection is required.
  4. Power Consumption:- Connecting to a network requires power.

Tensorflow Lite is a one-stop solution to convert your deep learning model, deploy it efficiently, and enjoy inferencing. TensorFlow Lite supports both mobile devices and microcontrollers.

Colab getting started!!

Train deep neural networks for free using Google Colaboratory.

GPU and TPU compute for free? Are you kidding?

Google Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser. If you don't have money to procure a GPU and want to train a neural network or get your hands dirty with zero investment, then this is for you. Colab began as a Google internal research tool for data science.

You can use a GPU backend for free for up to 12 hours at a time.

It supports Python 2.7 and 3.6, but not R or Scala yet.

Many people want to train a machine learning or deep learning model, but playing with these requires GPU computation and huge resources, which blocks many people from trying them out and getting their hands dirty.

Google Colab is nothing but a cloud-hosted Jupyter notebook.

Colaboratory is a free Jupyter notebook environment provided by Google where you can use free GPUs and TPUs, which solves all these issues. The best thing about Colab is the TPU (tensor processing unit), special hardware designed by Google to process tensors.

Let’s Start:- 

To start, you should know Jupyter notebooks and have a Google account.


Click on the above link to access Google Colaboratory. It is not just a static page but an interactive environment that lets you write and execute code in Python and other languages. You can create a new Jupyter notebook via File → New Python 3 Notebook (or New Python 2 Notebook).

We will create one Python 3 notebook; Colab creates it for us and saves it on Google Drive.

Colab is an ideal way to start everything from improving your Python coding skills to working with deep learning frameworks like PyTorch, Keras, and TensorFlow, and you can install any Python package required for your coding, from simple sklearn and numpy to TensorFlow.

You can create notebooks in Colab, upload existing notebooks, store and share notebooks, mount your Google Drive and use whatever you've got stored there, import most of your directories, upload notebooks directly from GitHub, upload Kaggle files, download your notebooks, and do whatever you do with your local Jupyter notebook.

On the top right you can choose to connect to a hosted runtime or to a local runtime.

Set up GPU or TPU:-

It's very simple and straightforward: go to the “Runtime” dropdown menu, select “Change runtime type”, and select GPU or TPU in the hardware accelerator drop-down menu.

Now you can start coding and start executing your code !!

How to install a framework or libraries?

It's as simple as writing an import statement in Python!

!pip install fastai

Use the normal pip install command to install different packages like TensorFlow or PyTorch and start playing with them.


Implementing neural networks using Gluon API

In the previous chapter, we discussed the basics of deep learning and gave an overview of Gluon API and MxNet. This chapter explains how to use Gluon API to create different neural networks by exploring the API.

Gluon API is an abstraction over the mathematical computation engine of the deep learning framework MxNet. In the last chapter we discussed the different types of machine learning and the different algorithms implementing each method. In this chapter, we will look into linear regression, binary classification, and multiclass classification using Gluon.

Neural Network using Gluon:

Gluon takes a hybrid approach to deep learning programming: it supports both the symbolic and the imperative style. There are different machine learning algorithms to address different problems. As stated in the last chapter, an artificial neural network is a mathematical model of the human brain. The mathematics involves matrix and tensor manipulation, for which we have the Gluon and NDArray APIs. An artificial neural network contains nodes; each node has a weight and a bias, and the data is transformed layer by layer up to the output layer. The layers between the input layer and the output layer are called hidden layers. Let us explore.

Linear Regression:

Linear regression is a very basic algorithm in the field of machine learning; everyone comes across it, whether novice or expert machine learning engineer or data scientist. Linear regression is categorized under supervised machine learning. As the name states, linear regression is used to identify the relationship between two continuous variables: a predictor (independent) variable and a dependent (response) variable. We model the relationship between the two variables by fitting them to a linear equation.

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
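The fit can be demonstrated numerically; a small sketch using numpy on synthetic data (the numbers are illustrative only, not from this chapter):

```python
import numpy as np

# Synthetic data around the line Y = 4.2 + 2.0 * X, plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 4.2 + 2.0 * x + rng.normal(0.0, 0.1, size=100)

# Least-squares fit of Y = a + b*X; polyfit returns [slope, intercept]
b, a = np.polyfit(x, y, deg=1)
print(a, b)  # intercept close to 4.2, slope close to 2.0
```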

In the above diagram, you can see a linear equation fitted to the data points. The X-axis holds the independent variable and the Y-axis the dependent variable; the plotted points show the linear relationship between the two.

To understand this, take a small real-life example: predict the sale of products based on buying history, or predict a house price based on the size of the house, location of the property, amenities, demand, and historical records.

There are two different types of Linear regression.

  1. Simple Linear Regression:- There are two variables, one dependent and one independent; X is used to predict the dependent variable Y. For example, predicting the total fuel expense based on the distance in kilometers.
  2. Multiple (Multi-Variable) Linear Regression:- There is one dependent variable and two or more independent variables. In certain cases, two or more features could affect the dependent variable, as when a house price depends on the size of the house, location, construction year, etc. There are several independent variables (X1, X2, ...) to predict Y.

Let us take a small example: you are planning a road trip to Shimla (a city in India) with your two siblings, as you recently watched Tripling (an Indian web series). You start from Pune, and the total distance to travel is 1790 km. It's a long journey, so you have to plan every expense: fuel, meals, halts, etc. On a blank sheet you note when to start and stop, how much fuel is required, and how much money to reserve for meals and hotel charges. Based on your car's mileage and current fuel prices, you can predict the total paid for fuel. So it's a simple linear relationship between two variables: if I drive 1790 km, how much will I pay for fuel? If you want to predict the overall expense of the trip, you can extend this simple linear regression into a multiple linear regression model by adding more independent variables such as meal cost, lodging charges, other expenses, and historical data from past trips.

This is the way we can forecast the current trip's charges and plan accordingly. The core idea is to obtain a line that best fits the data. Linear regression is the simplest and by far the most popular method in machine learning for problem-solving.

Linear regression using Gluon:

Linear regression is the entry pass to the journey of machine learning: it is a very straightforward problem, and we can solve it using Gluon API. The linear equation is y = Wx + b; we learn the slope (W) and the bias (b) through a number of iterations. The target of each iteration is to reduce the loss between the actual y and the predicted y, and to achieve this we modify W and b so that the inputs x give us the y we want. Let us implement linear regression using Gluon API. In this example, we are not developing everything from scratch; we take advantage of the Gluon API.

# imports
import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon
# neural network layers and training
from mxnet.gluon import nn, Trainer
# data loading
from mxnet.gluon.data import DataLoader, ArrayDataset

In the above code block we have just imported the required modules. If you observe carefully, the gluon API is part of the mxnet package. We import nd (NDArray) for numerical tensor processing and autograd for automatic differentiation of a graph of NDArray operations. mxnet.gluon.data is the module containing APIs that help us load and process common public datasets such as MNIST.

from mxnet.gluon import nn, Trainer

Gluon provides the nn API to define the different layers of a neural network, and the Trainer API helps us train the defined network. Data is an important part, so let us build the dataset.

We start by generating a synthetic dataset.

# set context for computation
data_ctx = mx.cpu()
model_ctx = mx.cpu()
# dataset dimensions
number_inputs = 2
number_outputs = 1
number_examples = 10000
def real_fn(X):
    return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2
# generate 10000 random records
X = nd.random_normal(shape=(number_examples, number_inputs))
noise = 0.01 * nd.random_normal(shape=(number_examples,))
y = real_fn(X) + noise

The above code generates the dataset for the problem.

Now that the data is ready, load it using the DataLoader API.

batch_size = 4
train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y),
                                      batch_size=batch_size, shuffle=True)

Let us build a neural network with two inputs and one output, defined using nn.Dense(1, in_units=2). It's called a dense layer because every node in the input is connected to every node in the subsequent layer.

net = gluon.nn.Dense(1, in_units=2)
# dense layer with 2 inputs and 1 output
# print the weight and bias parameters of the network
print(net.weight)
print(net.bias)
# output of the above print statements
Parameter dense6_weight (shape=(1, 2), dtype=float32)
Parameter dense6_bias (shape=(1,), dtype=float32)

The weight and bias printed above are actually not NDArrays; they are instances of the Parameter class. We use Parameter rather than NDArray for distinct reasons: parameters can be associated with multiple contexts, unlike an NDArray. As discussed in the first chapter, Block is the basic building block of a neural network in Gluon; a Block takes input and generates output. We can collect all parameters using net.collect_params() irrespective of how complex the neural network is; this method returns a dictionary of parameters.

The next step is the initialization of the network's parameters. The initialization step is very important: in this step we assign contexts and can feed data to the neural network.

net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=model_ctx)
# deferred initialization: shapes are resolved on the first forward pass
example_data = nd.array([[4,7]])
net(example_data)
# access the weight and bias data
print(net.weight.data())
print(net.bias.data())
net = gluon.nn.Dense(1)
net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=model_ctx)

Let us observe the difference between net = gluon.nn.Dense(1) and the earlier net = gluon.nn.Dense(1, in_units=2): in the latter we told Gluon the input shape, while in the former Gluon infers the shape of the parameters from the data.

Next, define the loss function; for regression we use the L2 (squared error) loss.

square_loss = gluon.loss.L2Loss()

Now we need to optimize the neural network. Rather than implementing stochastic gradient descent from scratch every time, we can reuse gluon.Trainer, passing it a parameter dictionary to optimize the network.

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.0001})

SGD is the stochastic gradient descent implementation provided by Gluon; the learning rate is 0.0001, and we pass the dictionary of parameters to optimize the neural network. Now we have the actual y and the predicted y, and we want to know how far the predicted y is from our generated y. The difference between the two is measured by the loss function, and to reduce this loss we use SGD.
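Before using Gluon's implementation, here is what a single SGD update does, sketched by hand in plain numpy (the data and learning rate are illustrative, not from this chapter; Gluon's trainer.step performs the equivalent update for every parameter in the network):

```python
import numpy as np

# Hand-rolled SGD for y = w*x + b on synthetic, noise-free data
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + 4.2

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    i = rng.integers(len(x))        # pick one random sample: "stochastic"
    err = (w * x[i] + b) - y[i]     # prediction error on that sample
    w -= lr * err * x[i]            # gradient of 0.5*err**2 w.r.t. w
    b -= lr * err                   # gradient w.r.t. b
print(w, b)  # approaches the true w = 2.0, b = 4.2
```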

epochs = 10
loss_sequence = []
num_batches = number_examples / batch_size
for e in range(epochs):
    cumulative_loss = 0
    # inner loop over mini-batches
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(model_ctx)
        label = label.as_in_context(model_ctx)
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()              # compute gradients
        trainer.step(batch_size)     # update the parameters
        cumulative_loss += nd.mean(loss).asscalar()
    loss_sequence.append(cumulative_loss / number_examples)
    print("Epoch %s, loss: %s" % (e, cumulative_loss / number_examples))

Let us visualize the learning loss.

# plot the convergence of the estimated loss function
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.figure(num=None, figsize=(8, 6))
plt.plot(loss_sequence)
# Adding some bells and whistles to the plot
plt.grid(True, which="both")
plt.xlabel('epoch', fontsize=14)
plt.ylabel('average loss', fontsize=14)

Plotting the learning curve shows how SGD learns the linear regression model. The graph indicates the average loss over each epoch; the loss decreases with each iteration.

Now our model is ready and everything is working as expected, but we need to do some sanity testing for validation purposes.

params = net.collect_params()
print('The type of "params" is a ', type(params))
# A ParameterDict is a dictionary of Parameter class objects
# we will iterate over the dictionary and print the parameters.
for param in params.values():
    print(param.name, param.data())

From this example, we can say that Gluon helps us do quick, easy prototyping.

We used a few APIs that help build a neural network without writing everything from scratch. Gluon provides a concise way to express a model; the API is powerful enough to prototype and build models quickly and easily. We can use linear regression in many real-life scenarios:

  1. Predict the house price
  2. Predict the weather conditions
  3. Predict the stock price

These are just a few scenarios where you can apply linear regression to predict values. The predicted values in linear regression are continuous.

Binary Classification:

In the above section, we explored linear regression with sample code. In linear regression the output is a continuous value, but there are real-life problems that need classification instead: is an email spam or not, which party will be elected in the next election, should a customer buy an insurance policy or not. A classification problem may be binary or multiclass (more than two classes), in which case there are two or more output neurons. In classification problems the predicted values are categorical. Logistic regression is the machine learning technique used to solve such classification problems; basically, logistic regression is an algorithm to solve a binary classification problem.

Consider a problem where we provide an image as input to the neural network and the output is labeled as dog (1) or not dog (0). In supervised learning there are two types of problems: regression and classification. In regression problems the output is a rational number, whereas in classification problems the output is categorical. Different algorithms are available to solve such classification problems, such as support vector machines, discriminant analysis, naive Bayes, nearest neighbors, and logistic regression. Solving a classification problem means identifying the category to which a new observation belongs.

In the above diagram, you can easily categorize the data into two classes, one marked by circles and the other by crosses. This is called binary classification.

Binary classification using logistic regression:

Logistic regression is a very popular and powerful machine learning technique for classification. It measures the relationship between the categorical dependent variable and one or more independent variables, answering questions like "how likely is it?". Why not just use linear regression? Suppose we have a tumor dataset where each sample is malignant or not, denoted by one or zero. With linear regression we could fit a line y = wx + b and decide that all values left of the line are non-malignant and values to the right are malignant, based on a threshold (e.g. 0.5). But what if there is an outlier, pushing some positive-class values into the negative class? We need a way to deal with outliers, and logistic regression gives us that power. Logistic regression does not try to predict a rational value for a given set of inputs; instead, the output is the probability that the input point belongs to a certain category, and based on a threshold we can easily categorize the observation. Logistic regression is a classification algorithm involving a linear discriminant: the input space is separated into two regions by a linear boundary, and the model learns to differentiate between points belonging to different categories.
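To make this concrete, here is a tiny numpy sketch of the sigmoid plus a decision threshold; the weights w and b below are made-up values, not learned from any dataset:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters for a single-feature model
w, b = 1.5, -3.0

def predict(x, threshold=0.5):
    prob = sigmoid(w * x + b)          # probability of the positive class
    return int(prob >= threshold), prob

print(predict(1.0))  # low score  -> class 0
print(predict(4.0))  # high score -> class 1
```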

The logistic regression technique is useful when several independent variables bear on a single outcome variable. Suppose we are watching Cricket World Cup matches and want to predict whether a match will be scheduled or not based on weather conditions.


In the above dataset, the output is yes (1) or no (0). The output is categorical with two classes, which is why this is also known as binary classification.

Let us start with some code. For this example, we use the breast cancer dataset (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html) with 569 samples, 30 dimensions, and two classes.

Import the required modules. Here we need the sklearn Python library, which contains the breast cancer dataset built in; we can use this dataset and apply logistic regression for binary classification.

import mxnet as mx
from mxnet import gluon, autograd, ndarray
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Load the data set and use the pandas data frame to hold the data for further processing.

# the dataset is part of the module below
from sklearn.datasets import load_breast_cancer
# load data
data = load_breast_cancer()
# use a pandas data frame to hold the dataset
df = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X = data.data
# print the first five records
print(df.head())
# display the record shape: number of rows and columns
print(df.shape)
# number of dimensions
print(df.ndim)

Now the data is available, but it is in human-readable format, which is not useful for training a neural network directly. Before training, we need to normalize the data; here we use pandas, though Gluon can also normalize a dataset.

df_norm = (df - df.mean()) / (df.max() - df.min())

A critical step before training any machine learning algorithm is preparing the dataset: we need to split it into a training and a testing set. Let us do that.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=12345)

Tuning the hyperparameters is another important aspect in training the artificial neural network.

LEARNING_R = 0.001
EPOCHS = 150
BATCH_SIZE = 32  # assumed value, used by the DataLoader below

Let us prepare the data according to the Gluon API so that we can feed it to the network for training. To do that we use the mx.gluon.data module.

train_dataset = mx.gluon.data.ArrayDataset(X_train.astype('float32'), y_train.astype('float32'))
test_dataset = mx.gluon.data.ArrayDataset(X_test.astype('float32'), y_test.astype('float32'))
train_data = mx.gluon.data.DataLoader(train_dataset,
                                      batch_size=BATCH_SIZE, shuffle=True)
test_data = mx.gluon.data.DataLoader(test_dataset,
                                     batch_size=BATCH_SIZE, shuffle=False)

Let us use Gluon's plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers. It has predefined layers such as Dense, Sequential, etc.

net = gluon.nn.Sequential()
# Define the model architecture
with net.name_scope():
    net.add(gluon.nn.Dense(64, activation="relu"))
    net.add(gluon.nn.Dense(32, activation="relu"))
    net.add(gluon.nn.Dense(1, activation="sigmoid"))
# Initialize the parameters of the model
net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=mx.cpu())
# Binary loss function: sigmoid binary cross entropy
# (from_sigmoid=True because the output layer already applies sigmoid)
binary_cross_entropy = gluon.loss.SigmoidBinaryCrossEntropyLoss(from_sigmoid=True)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': LEARNING_R})

The neural network contains three dense layers. The first two use ‘relu’ as the activation function; ReLU (rectified linear unit) is an activation function also known as a ramp function. The output layer uses ‘sigmoid’, another activation function, with a characteristic S-shaped curve. For binary classification the loss function we use is binary cross entropy, which measures the performance of a model whose output is a probability between 0 and 1. The formula is loss = -(y·log(p) + (1 - y)·log(1 - p)), where y is the true label and p the predicted probability.
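As a small illustration of that formula (synthetic labels and probabilities, purely for intuition), binary cross entropy rewards confident correct probabilities and penalizes confident wrong ones:

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # Clip probabilities to avoid log(0)
    p = np.clip(y_prob, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y_true = np.array([1.0, 0.0, 1.0])
good = binary_cross_entropy(y_true, np.array([0.9, 0.1, 0.8]))
bad = binary_cross_entropy(y_true, np.array([0.2, 0.8, 0.3]))
print(good, bad)  # about 0.145 vs 1.474: correct predictions cost far less
```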

Then gluon.Trainer() to train the model.

Now let us train the model.

for e in range(EPOCHS):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(mx.cpu()).astype('float32')
        label = label.as_in_context(mx.cpu()).astype('float32')
        with autograd.record(): # Start recording the derivatives
            output = net(data) # the forward pass
            loss = binary_cross_entropy(output, label)
        loss.backward()              # backpropagate the error
        trainer.step(data.shape[0])  # update the parameters
        # Provide stats on the improvement of the model over each epoch
        curr_loss = ndarray.mean(loss).asscalar()
    if e % 20 == 0:
        print("Epoch {}. Current Loss: {}.".format(e, curr_loss))

Look at the sigmoid curve above: it is S-shaped. Let us compute predictions on the held-out test set and calculate the accuracy.

y_pred = net(mx.nd.array(X_test).astype('float32'))
y_pred_labels = (y_pred.asnumpy().ravel() > 0.5).astype(int)
print(accuracy_score(y_test, y_pred_labels))

This is the binary classification problem: we observed the breast cancer dataset, where the input is the feature set and the output is one of two categories, malignant or benign.

Multiclass classification:

So far we have discussed the linear regression problem, where the output is a single rational value, and then categorical problems, also known as classification problems. Classification problems generally come in two types:

  1. Binary Classification
  2. MultiClass Classification

A binary classification problem has two categories: an email is spam or not; based on weather conditions, a cricket match will be played or not. In these scenarios the output is one of two categories (yes/no), but there are real-life scenarios with more than two categories; those problems are classified as multiclass classification (more than two classes), also known as multinomial classification. In multiclass classification we classify an observation into one of three or more classes. Don't confuse multi-label classification with multiclass classification.

Say you went into a grocery shop, stopped at the fruit stall to buy some fruit, and picked up your phone to try your machine learning algorithm to identify a fruit based on color, shape, etc., classifying images of fruits that may be banana, apple, orange, guava, and so on. We will use the same logistic regression algorithm to address this multiclass classification problem. Logistic regression is the classic algorithm for classification in supervised learning. As we have seen, binary classification is quite useful when we have a dataset with two categories, such as spam vs. not spam or cancer vs. not cancer, but that does not fit every problem. Sometimes each observation could belong to one of n classes; for example, an image might depict a lion, a cat, a dog, a zebra, etc.

Let us dive deeper into multiclass classification using the MNIST (Modified National Institute of Standards and Technology) dataset of handwritten digits. The MNIST dataset contains 60,000 training images and 10,000 testing images; it is a nice toy dataset for testing new ideas and is widely used as the "hello world" of artificial neural networks.

Let us get our hands dirty with a Gluon multiclass classification implementation.

from __future__ import print_function
import mxnet as mx
from mxnet import nd, autograd
from mxnet import gluon
import numpy as np

Above we import the required modules: mxnet, gluon, NDArray, autograd for differentiation, and numpy.

Set the context. In all previous examples we set the CPU for simplicity; you can set the GPU if you want to execute the code on a GPU, for which you have to install the GPU-enabled mxnet Gluon API (e.g. model_ctx = mx.gpu()).

data_ctx = mx.cpu()
model_ctx = mx.cpu()

For multiclass classification we use the MNIST dataset; we will not explain MNIST itself here, and for more details you can use this link: https://en.wikipedia.org/wiki/MNIST_database.

batch_size = 64
num_inputs = 784
num_outputs = 10
num_examples = 60000
def transform(data, label):
    return data.astype(np.float32)/255, label.astype(np.float32)
train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform),
                                      batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),
                              batch_size, shuffle=False)

Load the dataset: the number of inputs is 784 (28×28 pixels), the number of outputs is 10 (digits 0 through 9), with 60,000 examples and a batch size of 64. The mx.gluon.data.vision.MNIST module, part of the Gluon API, provides the MNIST dataset. For training and validation purposes we split the dataset into a training set and a testing set.

The data is loaded successfully; the next step is to define our model. Recall the linear regression code, where we defined the Dense layer with the number of inputs and outputs. gluon.nn.Dense(num_outputs) defines the layer with the output shape, and Gluon infers the input shape from the input data.

net = gluon.nn.Dense(num_outputs)

Parameter initialization is the next step, but note that when we register an initializer for the parameters, Gluon doesn't yet know the shape of the input parameters because we only specified the output shape. The parameters are actually initialized during the first call to the forward method.

net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=model_ctx)

When you need the outputs as probabilities, the softmax cross entropy loss function is useful.

Softmax is an activation layer that allows us to interpret the outputs as probabilities, while cross entropy measures the error at the softmax layer.

Let us consider below softmax code snippet

# just for understanding.
def softmax(z):
    """Softmax function"""
    return np.exp(z) / np.sum(np.exp(z))

As the name suggests, the softmax function is a "soft" version of the max function. Instead of selecting the single maximum value, it splits the probability mass, with the maximal element getting the largest portion of the distribution; that is why it is so well suited to producing probabilities. From the above code you can see that softmax takes an N-dimensional vector of real numbers as input and transforms it into a vector of real numbers in the range (0, 1).
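A quick numerical check of those properties (the inputs are illustrative; subtracting the max first is a standard trick for numerical stability and leaves the result unchanged):

```python
import numpy as np

def softmax(z):
    # Shift by the max so np.exp never overflows; the ratio is unchanged
    z = z - np.max(z)
    return np.exp(z) / np.sum(np.exp(z))

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.sum())     # 1.0: the outputs form a probability distribution
print(probs.argmax())  # 0: the largest input gets the largest share
```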

softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
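gluon’s SoftmaxCrossEntropyLoss fuses the softmax activation and the cross entropy loss for numerical stability. Conceptually, for a one-hot label the cross entropy part reduces to the following (a plain-NumPy sketch for intuition, not gluon’s implementation):

```python
import numpy as np

def cross_entropy(yhat, y_one_hot):
    """Negative log-likelihood of the true class under predicted probabilities."""
    return -np.sum(y_one_hot * np.log(yhat))

probs = np.array([0.9, 0.05, 0.05])
# a confident, correct prediction gives a small loss
print(cross_entropy(probs, np.array([1, 0, 0])))
# the same prediction against the wrong label gives a large loss
print(cross_entropy(probs, np.array([0, 1, 0])))
```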

Now initiate an optimizer with learning rate 0.1, using sgd (stochastic gradient descent):

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

Before training, we need a way to evaluate the model’s accuracy. For this we use MXNet’s built-in metric package. Initially we should expect accuracy in the ballpark of 0.10, because the model is initialized randomly and is effectively guessing among 10 classes.

def evaluate_accuracy(data_iterator, net):
    acc = mx.metric.Accuracy()
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(model_ctx).reshape((-1,784))
        label = label.as_in_context(model_ctx)
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]
# call the above function with test data
evaluate_accuracy(test_data, net)

Now execute the training loop for 10 epochs:

epochs = 10
num_examples = 60000   # size of the MNIST training set
for e in range(epochs):
    cumulative_loss = 0
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(model_ctx).reshape((-1,784))
        label = label.as_in_context(model_ctx)
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()               # compute the gradients
        trainer.step(data.shape[0])   # update the parameters
        cumulative_loss += nd.sum(loss).asscalar()
    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    print("Epoch %s. Loss: %s, Train_acc %s, Test_acc %s" % (e, cumulative_loss/num_examples, train_accuracy, test_accuracy))
# output
Epoch 0. Loss: 2.1415544213612874, Train_acc 0.7918833333333334, Test_acc 0.8015
Epoch 1. Loss: 0.9146347909927368, Train_acc 0.8340666666666666, Test_acc 0.8429
Epoch 2. Loss: 0.7468763765970866, Train_acc 0.8524333333333334, Test_acc 0.861
Epoch 3. Loss: 0.65964135333697, Train_acc 0.8633333333333333, Test_acc 0.8696
Epoch 4. Loss: 0.6039828490893046, Train_acc 0.8695833333333334, Test_acc 0.8753
Epoch 5. Loss: 0.5642358363191287, Train_acc 0.8760166666666667, Test_acc 0.8819
Epoch 6. Loss: 0.5329904221892356, Train_acc 0.8797, Test_acc 0.8849
Epoch 7. Loss: 0.5082313110192617, Train_acc 0.8842166666666667, Test_acc 0.8866
Epoch 8. Loss: 0.4875676867882411, Train_acc 0.8860333333333333, Test_acc 0.8891
Epoch 9. Loss: 0.47050906361341477, Train_acc 0.8895333333333333, Test_acc 0.8902

Visualize the prediction

import matplotlib.pyplot as plt
def model_predict(net, data):
    output = net(data.as_in_context(model_ctx))
    return nd.argmax(output, axis=1)
# let's sample 10 random data points from the test set
sample_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),
                              10, shuffle=True)
for i, (data, label) in enumerate(sample_data):
    data = data.as_in_context(model_ctx)
    print(data.shape)
    im = nd.transpose(data, (1,0,2,3))
    im = nd.reshape(im, (28,10*28,1))
    imtiles = nd.tile(im, (1,1,3))
    plt.imshow(imtiles.asnumpy())
    plt.show()
    pred = model_predict(net, data.reshape((-1,784)))
    print('model predictions are:', pred)
    break   # one batch of 10 samples is enough

# output of the above code snippet

(10, 28, 28, 1)
model predictions are: 
[3. 6. 7. 8. 3. 8. 1. 8. 2. 1.]
<NDArray 10 @cpu(0)>

From the output above, we can see that our model is able to solve the multiclass classification problem. We solved it with a linear model topped by the softmax activation function, which forces each output into the range (0, 1); that allowed us to interpret the outputs as probabilities. This approach is also commonly called softmax regression or multinomial regression. In the example above we used sgd (stochastic gradient descent):

def SGD(params, lr):
    for param in params:
        param[:] = param - lr * param.grad
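The same update rule can be checked in plain NumPy, for a single parameter with a known gradient (illustrative values only):

```python
import numpy as np

def sgd_step(param, grad, lr):
    """One SGD update: move the parameter against the gradient."""
    return param - lr * grad

w = np.array([1.0, -2.0])
grad = np.array([0.5, -0.5])
w = sgd_step(w, grad, lr=0.1)
print(w)  # → [ 0.95 -1.95]
```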

Overfitting and regularization:


So far we have implemented regression and classification algorithms, and across three different datasets we achieved roughly 90% accuracy on the testing dataset. Sometimes a model fits a limited set of data points too closely; we then call this an overfitting error. The regression and classification algorithms above work fine in our examples, but on certain datasets they run into overfitting, which can cause them to perform very poorly. In this section I would like to explain what the overfitting problem is, and a regularization technique that will allow us to reduce overfitting and make the learning algorithm perform much better.

I find this joke from “Plato and a Platypus Walk Into a Bar” to be the best analogy for the overfitting problem.

“A man tries on a made-to-order suit and says to the tailor, “I need this sleeve taken in! It’s two inches too long!”

The tailor says, “No, just bend your elbow like this. See, it pulls up the sleeve.”

The man says, “Well, okay, but now look at the collar! When I bend my elbow, the collar goes halfway up the back of my head.”

The tailor says, “So? Raise your head up and back. Perfect.”

The man says, “But now the left shoulder is three inches lower than the right one!”

The tailor says, “No problem. Bend at the waist way over to the left and it evens out.”

The man leaves the store wearing the suit, his right elbow crooked and sticking out, his head up and back, all the while leaning down to the left. The only way he can walk is with a choppy, atonic walk.

This suit fits that man perfectly, but it has been overfitted: it would be useful neither to him nor to anyone else.

Overfitting and underfitting are also known as overtraining and undertraining. Overfitting occurs when an algorithm captures the noise of the data; underfitting occurs when the model does not fit the data well enough. Not every algorithm that performs well on training data will also perform well on test data. We identify overfitting and underfitting using validation and cross-validation datasets. Both lead to poor predictions on new observations.

Underfitting occurs when the model shows high bias and low variance; overfitting occurs when the model shows high variance. If we have too many features, the learned model may fit the training set very well but fail to predict new observations.
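Overfitting is easiest to see on a tiny synthetic problem. The sketch below (toy data and polynomial degrees chosen purely for illustration, not taken from this chapter) fits polynomials of increasing degree to noisy samples of a sine curve; the training error keeps shrinking as the model gets more flexible, while the test error need not follow:

```python
import numpy as np

# synthetic 1-D regression problem: noisy sine training data,
# clean sine values at held-out test points
rng = np.random.RandomState(0)
x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.randn(20)
x_test = np.linspace(0.025, 0.975, 20)
y_test = np.sin(2 * np.pi * x_test)

train_errs, test_errs = [], []
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_errs.append(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_errs.append(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    print(degree, train_errs[-1], test_errs[-1])
# raising the degree always drives the training error down;
# watch whether the test error keeps following it
```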

Let us revisit our MNIST dataset and see how things can go wrong.

from __future__ import print_function
import mxnet as mx
import mxnet.ndarray as nd
from mxnet import autograd
import numpy as np
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
ctx = mx.cpu()
# load the MNIST data set; keep only 1000 examples to make overfitting easy
mnist = mx.test_utils.get_mnist()
num_examples = 1000
batch_size = 64
train_data = mx.gluon.data.DataLoader(
    mx.gluon.data.ArrayDataset(mnist["train_data"][:num_examples],
                               mnist["train_label"][:num_examples].astype(np.float32)),
    batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(
    mx.gluon.data.ArrayDataset(mnist["test_data"][:num_examples],
                               mnist["test_label"][:num_examples].astype(np.float32)),
    batch_size, shuffle=False)

We are using a linear model with softmax. Allocate the parameters and define the model:

# weight
W = nd.random_normal(shape=(784,10))
# bias
b = nd.random_normal(shape=10)
params = [W, b]
for param in params:
    param.attach_grad()   # allocate space for each parameter's gradient
def net(X):
    y_linear = nd.dot(X, W) + b
    yhat = nd.softmax(y_linear, axis=1)
    return yhat

Define the loss function to calculate the average loss, and the optimizer to minimize it. We have already seen this cross entropy loss function and SGD in multiclass classification.

# cross entropy 
def cross_entropy(yhat, y):
    return - nd.sum(y * nd.log(yhat), axis=0, exclude=True)
# stochastic gradient descent 
def SGD(params, lr):
    for param in params:
        param[:] = param - lr * param.grad
def evaluate_accuracy(data_iterator, net):
    numerator = 0.
    denominator = 0.
    loss_avg = 0.
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(ctx).reshape((-1,784))
        label = label.as_in_context(ctx)
        label_one_hot = nd.one_hot(label, 10)
        output = net(data)
        loss = cross_entropy(output, label_one_hot)
        predictions = nd.argmax(output, axis=1)
        numerator += nd.sum(predictions == label)
        denominator += data.shape[0]
        loss_avg = loss_avg*i/(i+1) + nd.mean(loss).asscalar()/(i+1)
    return (numerator / denominator).asscalar(), loss_avg
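The accuracy computation itself is just argmax-and-compare. In plain NumPy (toy predictions, not actual model outputs):

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of rows whose argmax matches the integer label."""
    return np.mean(np.argmax(probs, axis=1) == labels)

probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 0])   # the third example is misclassified
print(accuracy(probs, labels))  # 2 of 3 correct
```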

Plot the loss functions and visualize the learning curves using matplotlib.

def plot_learningcurves(loss_tr,loss_ts, acc_tr,acc_ts):
    xs = list(range(len(loss_tr)))
    f = plt.figure(figsize=(12,6))
    fg1 = f.add_subplot(121)
    fg2 = f.add_subplot(122)
    fg1.set_title('Comparing loss functions')
    fg1.semilogy(xs, loss_tr)
    fg1.semilogy(xs, loss_ts)
    fg1.legend(['training loss', 'testing loss'],fontsize=14)
    fg2.set_title('Comparing accuracy')
    fg2.plot(xs, acc_tr)
    fg2.plot(xs, acc_ts)
    fg2.legend(['training accuracy', 'testing accuracy'],fontsize=14)

Let us iterate.

epochs = 1000
moving_loss = 0.
niter = 0
loss_seq_train = []
loss_seq_test = []
acc_seq_train = []
acc_seq_test = []

for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx).reshape((-1,784))
        label = label.as_in_context(ctx)
        label_one_hot = nd.one_hot(label, 10)
        with autograd.record():
            output = net(data)
            loss = cross_entropy(output, label_one_hot)
        loss.backward()
        SGD(params, .001)
        # keep a moving average of the losses
        niter += 1
        moving_loss = .99 * moving_loss + .01 * nd.mean(loss).asscalar()
        est_loss = moving_loss/(1-0.99**niter)
    test_accuracy, test_loss = evaluate_accuracy(test_data, net)
    train_accuracy, train_loss = evaluate_accuracy(train_data, net)
    # save them for later
    loss_seq_train.append(train_loss)
    loss_seq_test.append(test_loss)
    acc_seq_train.append(train_accuracy)
    acc_seq_test.append(test_accuracy)

    if e % 100 == 99:
        print("Completed epoch %s. Train Loss: %s, Test Loss %s, Train_acc %s, Test_acc %s" %
              (e+1, train_loss, test_loss, train_accuracy, test_accuracy))

# plotting the learning curves
plot_learningcurves(loss_seq_train, loss_seq_test, acc_seq_train, acc_seq_test)
# output
Completed epoch 100. Train Loss: 0.5582709927111864, Test Loss 1.4102623425424097, Train_acc 0.862, Test_acc 0.725
Completed epoch 200. Train Loss: 0.2390711386688053, Test Loss 1.2993220016360283, Train_acc 0.94, Test_acc 0.734
Completed epoch 300. Train Loss: 0.13671867409721014, Test Loss 1.2758532278239725, Train_acc 0.971, Test_acc 0.748
Completed epoch 400. Train Loss: 0.09426628216169773, Test Loss 1.2602066472172737, Train_acc 0.989, Test_acc 0.758
Completed epoch 500. Train Loss: 0.05988468159921467, Test Loss 1.2470015566796062, Train_acc 0.996, Test_acc 0.764
Completed epoch 600. Train Loss: 0.043480587191879756, Test Loss 1.2396155279129744, Train_acc 0.998, Test_acc 0.762
Completed epoch 700. Train Loss: 0.032956544135231525, Test Loss 1.234715297818184, Train_acc 0.999, Test_acc 0.764
Completed epoch 800. Train Loss: 0.0268415825557895, Test Loss 1.2299001738429072, Train_acc 1.0, Test_acc 0.768
Completed epoch 900. Train Loss: 0.022739565349183977, Test Loss 1.2265239153057337, Train_acc 1.0, Test_acc 0.77
Completed epoch 1000. Train Loss: 0.019902906555216763, Test Loss 1.2242997065186503, Train_acc 1.0, Test_acc 0.772

From the graph and the output above you can easily see how the model is performing: by around the 800th epoch the model reaches 100% accuracy on the training dataset, yet it classifies only about 77% of the test examples correctly. This gap is a clear case of high variance, i.e. overfitting. Methods to avoid overfitting:

  1. Cross-Validation
  2. Drop out
  3. Regularization


In the section above we identified the problem of overfitting. Now that we know the problem and its causes, let us talk about the solution. With regularisation we keep all the features but reduce the magnitude of the parameters. Regularisation keeps the weights small, keeping the model simpler and helping to avoid overfitting. A model that overfits will be less accurate on new data.

Suppose we have a linear regression to predict y, given plenty of x inputs:

y = a1x1 + a2x2  + a3x3 + a4x4 + a5x5.....

In the equation above, a1, a2, … are the coefficients and x1, x2, … are the independent variables used to predict the dependent variable y.

“Regularisation means generalising the model for the better.”

“Mastering the trade-off between bias and variance is necessary to become a machine learning champion.”

Regularization is a principled technique to discourage the complexity of the model (i.e. reduce the magnitude of its parameters). It does this by penalizing the loss function. What does penalizing the loss function mean? Penalizing the weights drives them towards zero, making those terms almost negligible and helping us simplify the model.

The loss function is the sum of the squared differences between the predicted and actual values. λ is the regularization parameter, which determines how strongly the weights are penalized; the right value of λ lies somewhere between 0 (zero) and a large value.
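Schematically, the penalized objective adds λ times the sum of squared weights to the data loss. A hypothetical NumPy sketch with made-up numbers (not the chapter's model):

```python
import numpy as np

def ridge_loss(y_true, y_pred, weights, lam):
    """Squared-error data loss plus an L2 penalty on the weights."""
    data_loss = np.sum((y_true - y_pred) ** 2)
    penalty = lam * np.sum(weights ** 2)
    return data_loss + penalty

y_true = np.array([1.0, 2.0])
y_pred = np.array([1.5, 1.5])
w = np.array([3.0, -4.0])
print(ridge_loss(y_true, y_pred, w, lam=0.0))  # 0.25 + 0.25 = 0.5, no penalty
print(ridge_loss(y_true, y_pred, w, lam=0.1))  # 0.5 + 0.1 * 25 = 3.0
```

Raising λ makes large weights more expensive, pushing the optimizer towards smaller, simpler solutions.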

There are a few regularisation techniques:

  1. L1 Regularization or Lasso Regularization
  2. L2 Regularization or Ridge Regularization
  3. Dropout
  4. Data Augmentation
  5. Early stopping

We will solve the overfitting problem above using the L2 regularisation technique. Let us implement it, starting by penalizing the coefficients.

# penalizes the coefficients
def l2_penalty(params):
    penalty = nd.zeros(shape=1)
    for param in params:
        penalty = penalty + nd.sum(param ** 2)
    return penalty
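The same penalty in plain NumPy, to check the arithmetic (toy parameter values, not the trained model’s):

```python
import numpy as np

def l2_penalty_np(params):
    """Sum of squared entries across all parameter arrays."""
    return sum(np.sum(p ** 2) for p in params)

W = np.array([[1.0, 2.0], [0.0, -1.0]])
b = np.array([3.0])
print(l2_penalty_np([W, b]))  # 1 + 4 + 0 + 1 + 9 = 15.0
```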

Reinitialize the parameters so that our measurements start from scratch.

for param in params:
    param[:] = nd.random_normal(shape=param.shape)

L2 regularised logistic regression:

L2 regularization adds the sum of the squares of all the feature weights to the loss, i.e. loss = cross_entropy + λ · Σ w². L2 regularization performs best when all the input features influence the output and the weights are of approximately equal size.

Let us implement this L2 regularisation.

epochs = 1000
moving_loss = 0.
niter = 0
l2_strength = .1
loss_seq_train = []
loss_seq_test = []
acc_seq_train = []
acc_seq_test = []

for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx).reshape((-1,784))
        label = label.as_in_context(ctx)
        label_one_hot = nd.one_hot(label, 10)
        with autograd.record():
            output = net(data)
            loss = nd.sum(cross_entropy(output, label_one_hot)) + l2_strength * l2_penalty(params)
        loss.backward()
        SGD(params, .001)
        # keep a moving average of the losses
        niter += 1
        moving_loss = .99 * moving_loss + .01 * nd.mean(loss).asscalar()
        est_loss = moving_loss/(1-0.99**niter)

    test_accuracy, test_loss = evaluate_accuracy(test_data, net)
    train_accuracy, train_loss = evaluate_accuracy(train_data, net)
    # save them for later
    loss_seq_train.append(train_loss)
    loss_seq_test.append(test_loss)
    acc_seq_train.append(train_accuracy)
    acc_seq_test.append(test_accuracy)
    if e % 100 == 99:
        print("Completed epoch %s. Train Loss: %s, Test Loss %s, Train_acc %s, Test_acc %s" %
              (e+1, train_loss, test_loss, train_accuracy, test_accuracy))

# plotting the learning curves
plot_learningcurves(loss_seq_train, loss_seq_test, acc_seq_train, acc_seq_test)

Let us look at the graph for more understanding. From it you can easily see how much smaller the gap between the training loss and the testing loss has become, and how much more closely the two curves track each other.


This chapter has given a bit of insight into the gluon API and NDArray, along with some of the built-in neural network modules from gluon. With the completion of this chapter, you now know how to create a simple artificial neural network using the gluon abstraction, and when to use a regression versus a classification technique, along with some real-world datasets.

As machine learning developers, the major problems we face are overfitting and underfitting, and this chapter gives us regularisation as a tool to address overfitting. Gluon is a very concise, powerful abstraction that helps us design, prototype, build, deploy and test machine learning models on GPU and CPU. We now know how to set the context (GPU or CPU), and we have solved classification problems such as binary and multiclass classification using the logistic regression technique. Let us move on to the next adventure.