TensorFlow 2.0 HelloWorld using Google Colab.

In this article, we will use the most popular deep learning framework, TensorFlow, to build a basic hello world example. To follow along, you do not need to set up a local environment on your machine.

Tensorflow.org

We are using Google Colab. If you are not aware of what it is, check out my getting-started article on Colab:
Train deep neural network free using google colaboratory.medium.com

Now visit https://colab.research.google.com/ and you will see the Colab welcome page.

Brief About Colab:

Once you open Colab, you can start right away if you are already signed in to your Google account.

Google Colab is available with zero configuration, gives free access to a GPU, and, best of all, notebooks are shareable. Colab is a free service that lets developers try TensorFlow on CPU and GPU over a Google cloud instance. The service is completely free for improving your Python programming skills; developers can log in with their Google account and connect to it. Here developers can try deep learning applications using popular machine learning libraries such as Keras, TensorFlow, PyTorch, OpenCV, and others.

Sign in to Google Colab and create a new notebook for our HelloWorld example.

Go to File → New Notebook (Google sign-in is required).

Now that the new notebook is ready, we want to use TF 2.0.0 for our example. TensorFlow 2.0.0 has already been released as a production version, so let us install it by running the following command.

!pip install tensorflow==2.0.0

After a successful installation, we can verify the installed version.

import tensorflow as tf
print(tf.__version__)

Helloworld example:

Now everything is ready and looking promising. We have installed TensorFlow and verified the version too. Now let us take a high-level overview and create the hello world example.

To change the runtime: click Runtime → Change Runtime Type → a popup will open; choose the desired runtime and a hardware accelerator such as GPU or TPU.

There are a lot of changes between TF 1.x and TF 2.0.0. TF 2.0.0 comes with ease of development and needs less code than earlier versions; it was developed to remove the issues and complexity of previous versions.

In TF 2.0, eager execution is enabled by default.

Eager execution evaluates operations immediately, without building a graph: operations return concrete values instead of constructing a computational graph to run later.
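A minimal sketch of what this means in practice (any TF 2.x install behaves the same way):

```python
import tensorflow as tf

# Eager execution is on by default in TF 2.x.
print(tf.executing_eagerly())  # True

# Operations run immediately and return concrete values;
# there is no graph-building or Session step.
result = tf.matmul(tf.constant([[2.0, 3.0]]), tf.constant([[4.0], [5.0]]))
print(result.numpy())  # [[23.]]
```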

We will use the same Hello World code from the TensorFlow 1.x version and observe the output.

#This code snippet is from tensorflow 1.X version
import tensorflow as tf

msg = tf.constant('Hello and welcome to Tensorflow world')

#session
sess = tf.Session()

#print the message
print(sess.run(msg))

In this example, we are using TensorFlow 1.x code to print the message, but Session has been removed in TF 2.0.0, so this code raises an exception:

AttributeError: module 'tensorflow' has no attribute 'Session'

We will use the same code snippet as above, with the Session removed:

import tensorflow as tf

msg = tf.constant('Hello and welcome to Tensorflow world')

#print the message
print(msg)

#print using tf.print()
tf.print(msg)

Here we have two print statements; observe the output of each:

  1. tf.Tensor(b’Hello and welcome to Tensorflow world’, shape=(), dtype=string) 
  2. Hello and welcome to Tensorflow world.

This is it for now; we will start exploring the different APIs of TF in the next article.

Code: 

The code is available on GitHub; you can directly import it into Colab and run it.

https://github.com/maheshwarLigade/GoogleColab/blob/master/HelloWorldTF2_0.ipynb

More articles on TensorFlow:

https://medium.com/analytics-vidhya/optimization-techniques-tflite-5f6d9ae676d5

https://medium.com/analytics-vidhya/tensorflow-lite-converter-dl-example-febe804b8673

https://medium.com/techwasti/tensorflow-lite-machine-learning-at-the-edge-26e8421ae661

https://medium.com/techwasti/dynamic-computation-graphs-dcg-with-tensorflow-fold-33638b2d5754

https://medium.com/techwasti/tensorflow-lite-deployment-523eec79c017

Introduction to the Intel OpenVINO Toolkit!!!

I received the Udacity Intel AI Edge scholarship, and in the introduction they provided an overview of Intel OpenVINO. In this article, we are going to explore the basics of OpenVINO.

The OpenVINO toolkit can boost your inference applications across multiple deep neural networks with high throughput and efficiency.

OpenVINO stands for Open Visual Inference and Neural Network Optimization.

Download link 

https://software.intel.com/en-us/openvino-toolkit/choose-download

What is OpenVINO?

OpenVINO is a toolkit provided by Intel to facilitate faster inference of deep learning computer vision models. This toolkit helps developers create cost-effective and robust computer vision applications.

Learn more about openvino here.

It enables deep learning inference at the edge and supports heterogeneous execution across computer vision accelerators — CPU, GPU, Intel® Movidius™ Neural Compute Stick, and FPGA.

OpenVINO provides a number of built-in pre-trained models here.

Download & Install the OpenVINO:

There is very good official documentation of OpenVINO from Intel, which is descriptive and easy to understand. Please use the links below.

  1. To download

Choose & Download | Intel® Distribution of OpenVINO™ Toolkit
Download a version of the Intel® Distribution of OpenVINO™ toolkit for Linux, Windows, or macOS.software.intel.com

2. Get started

Get Started | Intel® Distribution of OpenVINO™ Toolkit
Get up-to-speed fast using resources and training materials for this computer vision toolkit.software.intel.com

Overview of OpenVINO:

The execution process is as follows — 

  • We have to feed a pre-trained model to the Model Optimizer. It optimizes the model and converts it into its intermediate representation (.xml and .bin file).
  • The Inference Engine helps in the proper execution of the model on different devices. It manages the libraries required to run the code properly on different platforms.

The two main components of the OpenVINO toolkit are the Model Optimizer and the Inference Engine. We will deep dive into their details to get a better understanding of what happens under the hood.

Model Optimizer:

The Model Optimizer is a cross-platform CLI tool that facilitates the transition between the training and deployment environments. It adjusts deep learning models for optimal execution on end-point target devices. If you want to understand optimization techniques for TensorFlow, you can check out this article.

If you check the diagram carefully, the optimizer performs three steps:

  1. Converting
  2. Optimizing
  3. Preparing to inference.

OpenVINO is a toolkit, not a deep learning library that will help you train a model. It helps you optimize and serve the model on different devices.

There is detailed documentation of how this works under the hood, so I won't go into the details here.

Inference Engine:

Now our model is ready for inference: the Model Optimizer CLI has converted and optimized it. The Model Optimizer produces the intermediate representation of a model, which is the input the Inference Engine uses to run inference on the input data.

The Inference Engine is a C++ library with a set of C++ classes to infer input data (images) and get a result. The C++ library provides an API to read the Intermediate Representation, set the input and output formats, and execute the model on devices.

The best thing about OpenVINO is heterogeneous execution of the model, and it is possible because of the Inference Engine: it uses different plug-ins for different devices.

Code sample Example:

We will take the sample store-aisle-monitor-python, which has been provided by Intel.

We will go through some code snippets from the sample with brief descriptions.

# Initialize the class
infer_network = Network()

# Load the network to IE plugin to get shape of input layer

n, c, h, w = infer_network.load_model(args.model, args.device, 1, 1, 2, args.cpu_extension)[1]

The above code is self-explanatory. 

We are just initializing the Network class and loading the model using the load_model function.
The load_model function returns the plugin along with the input shape.
We only need the input shape, which is why we have specified [1] after the function call.

# The exec_net function will start an asynchronous inference request.
infer_network.exec_net(next_request_id, in_frame)

We need to pass the request ID and the input frame for inference.

res = infer_network.get_output(cur_request_id)
for obj in res[0][0]:
    if obj[2] > args.prob_threshold:
        xmin = int(obj[3] * initial_w)
        ymin = int(obj[4] * initial_h)
        xmax = int(obj[5] * initial_w)
        ymax = int(obj[6] * initial_h)
        class_id = int(obj[1])

The get_output function will give us the model's result.

You can clone the git repo and start getting your hands dirty. Happy coding!

References:

intel-iot-devkit/store-aisle-monitor-python
This reference implementation is also available in C++ This reference implementation counts the number of people…github.com

Some cool projects, just FYI: Intel® IoT Developer Kit
IoT Libraries & Code Samples from Intel. Intel® IoT Developer Kit has 102 repositories available. Follow their code on…github.com

Introduction to Intel® Distribution of OpenVINO™ toolkit for Computer Vision Applications |…
The Intel® Developer Zone offers tools and how-to information to enable cross-platform app development through platform…www.coursera.org

Tensorflow Lite model inferencing fast and lean!!

This article is intended to talk about how TFLite achieves inference over all the different types of edge devices in a fast and lean way.


We have different kinds of edge devices, such as IoT devices, mobile devices, and embedded devices. How does TFLite run inference on them in a seamless and elegant way? To understand this, let us jump in.

What is an interpreter?

As we know, TFLite is a set of tools, and it has two core components:

  1. Converter
  2. Interpreter

The converter helps us convert deep learning models into the TFLite format, and the interpreter makes our life easier while inferencing.

The TensorFlow Lite interpreter runs specially optimized models on many different hardware types, including mobile phones, embedded Linux devices, and microcontrollers.

People often refer to the TFLite interpreter and inferencing interchangeably. The term inference refers to the process of executing a TensorFlow Lite model on edge devices in order to make predictions based on user input. To perform inference with a TensorFlow Lite model, you must run it through the interpreter.

The TFLite interpreter is designed to be lean and fast. To achieve this, it uses a static graph ordering and a custom memory allocator to ensure minimal load, initialization, and execution latency.

Step of inferencing:

TensorFlow inference APIs are provided for most common mobile/embedded platforms such as Android, iOS, & Linux, in multiple programming languages. Across all libraries, the TensorFlow Lite API enables you to load models, feed inputs, and retrieve inference outputs.

TFLite interpreter follows below steps in general:

  1. Loading a model:- 

The first and most important step is to load the .tflite model into memory, which contains the execution graph.

2. Transforming data:- 

The model doesn't understand raw input data, so you need to transform the data into a format the model understands. For example, for a computer vision model you need to resize the input image before providing it to the model.

3. Running inference:- 

Now the model is in memory and the data is in the required format, so let us run inference. This involves a few steps, such as building the interpreter and allocating tensors.

4. Interpreting output:-

After the third step we get some output from inference, but the end user won't understand it directly. Model results are usually probabilities or approximate values. We have to interpret this result into meaningful output.
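The transform and interpret steps (2 and 4) can be sketched with plain numpy; the shapes, labels, and probabilities below are made up for illustration:

```python
import numpy as np

# Step 2: transform raw input into the model's expected format.
# Assume a vision model that wants a 1x4x4x3 float tensor in [0, 1].
raw_image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
input_tensor = (raw_image.astype(np.float32) / 255.0)[np.newaxis, ...]
print(input_tensor.shape)  # (1, 4, 4, 3)

# Step 4: interpret a raw probability vector as a human-readable label.
labels = ["cat", "dog", "bird"]           # hypothetical label set
probabilities = np.array([0.1, 0.7, 0.2])  # hypothetical model output
print(labels[int(np.argmax(probabilities))])  # dog
```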


Example:-

Let us take model inferencing using python 

import numpy as np
import tensorflow as tf

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

Here is an example in C++; even though the language or the underlying platform changes, the steps remain the same:

// Load the model
std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromFile(filename);

// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);

// Resize input tensors, if desired.
interpreter->AllocateTensors();

float* input = interpreter->typed_input_tensor<float>(0);
// Fill `input`.

interpreter->Invoke();
//output data
float* output = interpreter->typed_output_tensor<float>(0);

Conclusion:- 

In this article, we explored the TFLite interpreter, the steps involved in TFLite inferencing, and how to do it.

Reference:-

https://www.tensorflow.org/lite/guide/inference

Fast Inference: TFLite GPU Delegate!!

Running inference on edge devices, especially mobile devices, is very demanding. When you have a really big machine learning model, taking inference with limited resources is a crucial task.

Many edge devices, especially mobile devices, have hardware accelerators such as a GPU. A TensorFlow Lite delegate lets us optimize our trained model and leverage the benefits of hardware acceleration.

What is Tensorflow Lite Delegate?

A delegate's job, in general, is to hand your work off to someone else. TensorFlow Lite supports several hardware accelerators.

A TensorFlow Lite delegate is a way to delegate part or all of graph execution to another executor.

Why should you use delegates?

Running inference on compute-heavy deep learning models on edge devices is resource-demanding due to mobile devices' limited processing, memory, and power. Instead of relying on the device CPU, some devices have hardware accelerators, such as a GPU or DSP (Digital Signal Processor), that allow for better performance and higher energy efficiency.

How does a TFLite delegate work?

How a TFLite delegate works. tensorflow.org

Let us consider the graph on the left side. It has an input node where we receive the input for inference. The input goes through a convolution operation and a mean operation, and the graph uses the outputs of these two operations to compute SquareDifference.

Let us assume we have a hardware accelerator that can perform the Conv2D and mean operations very fast and efficiently; the graph above will then look like this:

In this case, we delegate the Conv2D and mean operations to the specialized hardware accelerator using the TFLite delegate.

The TFLite GPU delegate will delegate operations to the GPU if one is available.

TFLite allows us to provide delegates for specific operations, in which case the graph is split into multiple subgraphs, with each delegate-supported subgraph handled by the delegate. Every subgraph handled by a delegate is replaced with a node that evaluates that subgraph when it is invoked. Depending on the model, the final graph can end up with one node (meaning the whole graph was delegated) or with several nodes handling the subgraphs. In general, you don't want multiple subgraphs handled by the delegate, since each switch between the delegate and the main graph incurs an overhead for passing results from the subgraph to the main graph.

It’s not always safe to share memory.

How to add a delegate?

  1. Define a kernel node that is responsible for evaluating the delegate subgraph.
  2. Create an instance of TfLiteDelegate, which will register the kernel and claim the nodes that the delegate can execute.

Android:

Tensorflow has provided a demo app for android:

In your application, add the AAR as above, import the org.tensorflow.lite.gpu.GpuDelegate module, and use the addDelegate function to register the GPU delegate with the interpreter:

import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.gpu.GpuDelegate;

// Initialize interpreter with GPU delegate
GpuDelegate delegate = new GpuDelegate();
Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate);
Interpreter interpreter = new Interpreter(model, options);

// Run inference
while (true) {
  writeToInput(input);
  interpreter.run(input, output);
  readFromOutput(output);
}

// Clean up
delegate.close();

iOS:

Include the GPU delegate header and call the Interpreter::ModifyGraphWithDelegate function to register the GPU delegate to the interpreter:

#import "tensorflow/lite/delegates/gpu/metal_delegate.h"

// Initialize interpreter with GPU delegate
std::unique_ptr<Interpreter> interpreter;
InterpreterBuilder(*model, resolver)(&interpreter);
auto* delegate = NewGpuDelegate(nullptr);  // default config
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;

// Run inference
while (true) {
  WriteToInputTensor(interpreter->typed_input_tensor<float>(0));
  if (interpreter->Invoke() != kTfLiteOk) return false;
  ReadFromOutputTensor(interpreter->typed_output_tensor<float>(0));
}

// Clean up
interpreter = nullptr;
DeleteGpuDelegate(delegate);

Note:-

Some operations that are trivial on the CPU may have a high cost on the GPU.

Reference Link:

https://www.tensorflow.org/lite/performance/gpu

For more such stories

Optimization techniques – TFLite!!

One of the most popular optimization techniques is called quantization.


Running machine learning models and making inference on mobile or embedded devices comes with certain challenges, such as a limited amount of memory, power, and data storage, so it is crucial to deploy optimized ML models on edge devices.

It's critical to deploy optimized machine learning models on mobile and embedded devices so that they run efficiently. Quantization is one such optimization technique. In the last article, we saw how to use the TFLite converter to optimize the model for edge devices without any modification to weights and activation types.


What is Quantization?

Quantization is generally used in mathematics and digital signal processing. Below is the wiki definition.

Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set (often a continuous set) to output values in a (countable) smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes.

Quantization refers to the process of reducing the number of bits that represent a number. In the context of deep learning, the dominant numerical format for both research and deployment has so far been 32-bit floating point (FP32). Quantization converts FP32 weights and output activations into the nearest 8-bit (sometimes 4/2/1-bit) integer values.
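The mapping can be sketched in a few lines of numpy. This is an illustrative affine quantization to uint8, not TFLite's exact kernel:

```python
import numpy as np

def quantize_uint8(x):
    """Affine-quantize a float array to uint8, returning the pieces
    needed to reconstruct an approximation of the original values."""
    scale = (x.max() - x.min()) / 255.0
    zero_point = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

weights = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize_uint8(weights)
dequantized = (q.astype(np.float32) - zp) * scale

print(q)            # integers in [0, 255]
print(dequantized)  # close to the original weights, with small rounding error
```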

Quantization optimizes the model by quantizing the weights and activation types. TFLite uses quantization to speed up inference on edge devices. The TFLite converter is how we produce a deep learning model with lower precision. Now that you know what quantization is, let us deep dive:

Quantization dramatically reduces both the memory requirement and computational cost of using neural networks.

Quantizing a deep learning model uses techniques that allow reduced-precision representations of weights and, optionally, activations, for both storage and computation.

TFLite provides several levels of support for quantization:

  1. Post-training quantization
  2. Quantization aware training.

Below is a table that shows the benefits of model quantization for some CNN models. 

Benefits of model quantization for select CNN models. tensorflow.org

Post-training quantization:

As the name implies, this is a post-training technique applied after your model is trained. Post-training quantization quantizes the weights and activation types. This technique can reduce the model size and also improve CPU and hardware-accelerator latency. There are different optimization options, such as weight quantization and full integer quantization; we can choose based on our requirements.

TensorFlow.org provides a decision tree that can help us make this decision:

tensorflow.org

Weight Quantization:

The simplest post-training quantization quantizes only the weights from floating point to 8-bit precision. This option is available in the TFLite converter. At inference, weights are converted from 8-bit precision to floating point and computed using floating-point kernels. This conversion is done once and cached to reduce latency. If you want to improve latency further, use hybrid operators.

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tflite_quant_model = converter.convert()

At the time of conversion, set the optimizations flag to optimize for model size.

This optimization provides latencies close to fully fixed-point inference, but the outputs are still stored using floating point.

Full integer quantization:

We can get further latency improvements, reductions in peak memory usage, and access to integer-only hardware accelerators by making sure all model math is quantized. For full integer quantization, you need to measure the dynamic range of activations and inputs by supplying a representative dataset, created with an input data generator.

import tensorflow as tf

def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        yield [input]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()

The result of full integer quantization should be fully quantized; any ops that don't have a quantized implementation are left in floating point. Fully integer-only execution gives a model with even lower latency, smaller size, and compatibility with integer-only accelerators.

You can enforce full integer quantization for all ops, and use integer input and output, by adding the following lines before you convert. The converter will throw an error if it encounters an operation it cannot currently quantize:

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

Float16 quantization example:

Float16 is the IEEE standard for 16-bit floating-point numbers. We can reduce the size of a floating-point model by quantizing the weights to float16. This technique reduces the model size by half with minimal loss of accuracy compared to other techniques. A float16 model will "dequantize" the weight values to float32 when run on the CPU.

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.lite.constants.FLOAT16]
tflite_quant_model = converter.convert()
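To see where the "minimal loss of accuracy" comes from, compare how float32 and float16 store the same value (a plain numpy illustration, unrelated to any specific model):

```python
import numpy as np

w32 = np.float32(0.1234567)
w16 = np.float16(w32)  # half the storage: 2 bytes instead of 4

print(w32.nbytes, w16.nbytes)        # 4 2
print(abs(float(w32) - float(w16)))  # a small rounding error
```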

We have now seen the different post-training quantization techniques. Float16 quantization may not be a good choice if you need maximum performance; full integer quantization to fixed-point math would be better in that case. Weight quantization is the most basic form, and since weights are quantized post-training, there can be an accuracy loss, particularly for smaller networks.

Tensorflow Lite model accuracy

Quantization aware Training:

There can be an accuracy loss with post-training model quantization; if you don't want to compromise model accuracy, use quantization-aware training. As we have learned, post-training quantization is applied after the model has been trained; quantization-aware training overcomes its drawbacks. This technique ensures that the forward pass matches precision for both training and inference. TensorFlow created a flow wherein, while constructing the graph, you can insert fake quantization nodes in each layer to simulate the effect of quantization in the forward and backward passes, and to learn ranges during training, for each layer separately.

There are two aspects of this technique

  • Operator fusion at inference time is accurately modeled at training time.
  • Quantization effects at inference are modeled at training time.
tf.quantization.quantize(
    input,
    min_range,
    max_range,
    T,
    mode='MIN_COMBINED',
    round_mode='HALF_AWAY_FROM_ZERO',
    name=None)

out[i] = (in[i] - min_range) * range(T) / (max_range - min_range)
if T == qint8: out[i] -= (range(T) + 1) / 2.0

num_discrete_values = 1 << (# of bits in T)
range_adjust = num_discrete_values / (num_discrete_values - 1)
range = (range_max - range_min) * range_adjust
range_scale = num_discrete_values / range
quantized = round(input * range_scale) - round(range_min * range_scale) + numeric_limits<T>::min()
quantized = max(quantized, numeric_limits<T>::min())
quantized = min(quantized, numeric_limits<T>::max())

Check the complete example here:

References:-

https://www.tensorflow.org/lite/convert/quantization

https://github.com/tensorflow/tensorflow/tree/r1.13/tensorflow/contrib/quantize

Tensorflow Lite Converter Example!!

Let us deploy a deep learning TensorFlow model on edge devices using TF Lite.

There are three different ways we can use the TensorFlow Lite converter:

  1. Convert a TF SavedModel to TF Lite
  2. Convert a Keras prebuilt model to TF Lite
  3. Convert a concrete function to TF Lite

1. Convert TF SavedModel to TF Lite:- 

Let us create a simple model using TensorFlow and save it using the TF SavedModel format. We will develop this model using the TensorFlow API. In this example, we will show how to convert a SavedModel into a TF Lite FlatBuffer.

# we will train a basic TF model
import pathlib
import tensorflow as tf

# Construct a basic TF model.
root = tf.train.Checkpoint()
root.v1 = tf.Variable(3.)
root.v2 = tf.Variable(2.)
root.f = tf.function(lambda x: root.v1 * root.v2 * x)

# Save the model into a temp directory
export_dir = "/tmp/test_saved_model"
input_data = tf.constant(1., shape=[1, 1])
to_save = root.f.get_concrete_function(input_data)
tf.saved_model.save(root, export_dir, to_save)

# Convert the model into TF Lite.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()

# save the model
tflite_model_file = pathlib.Path('/tmp/save_model_tflite.tflite')
tflite_model_file.write_bytes(tflite_model)

2. Convert Keras PreBuilt Model to TF Lite:-

In this section, we explore how to convert a prebuilt Keras model into a TF Lite model. We will convert a pre-trained tf.keras MobileNet model to TensorFlow Lite and run inference with it.

import pathlib
import numpy as np
import tensorflow as tf

# Load the MobileNet keras model.
# We create a tf.keras model by loading weights pretrained on the imagenet dataset.
model = tf.keras.applications.MobileNetV2(
    weights="imagenet", input_shape=(224, 224, 3))

# Here we have a pretrained model, so there is no need to use SavedModel;
# we pass the model directly to TFLiteConverter.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# If you want to save the TF Lite model, use the steps below; otherwise skip them.
tflite_model_file = pathlib.Path('/tmp/pretrainedmodel.tflite')
tflite_model_file.write_bytes(tflite_model)

# Load the TFLite model using the interpreter and allocate tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

3. Concrete Function to TF Lite:- 

In order to convert TensorFlow 2.0 models to TensorFlow Lite, the model needs to be exported as a concrete function. If you have developed your model using TF 2.0, then this is for you. We will convert the concrete function into a TF Lite model. In this section we will again use the Keras MobileNet model.

import tensorflow as tf
# load mobilenet model of keras 
model = tf.keras.applications.MobileNetV2(weights="imagenet", input_shape=(224, 224, 3))

We will use tf.function to create a callable TensorFlow graph of our model.

import pathlib

# get a callable graph from the model.
run_model = tf.function(lambda x: model(x))

# get the concrete function from the callable graph
concrete_func = run_model.get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

# convert the concrete function into a TF Lite model using TFLiteConverter
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
tflite_model = converter.convert()

# save the model
tflite_model_file = pathlib.Path('/tmp/concretefunc_model.tflite')
tflite_model_file.write_bytes(tflite_model)

CLI TF Lite Converter:-

Apart from the Python API, we can also use the command-line interface to convert a model, for example to convert a SavedModel to a TFLite model.

The TensorFlow Lite Converter has a command-line tool tflite_convert which supports basic models.

#!/usr/bin/env bash
tflite_convert \
  --saved_model_dir=/tmp/mobilenet_saved_model \
  --output_file=/tmp/mobilenet.tflite

--output_file. Type: string. Specifies the full path of the output file.

--saved_model_dir. Type: string. Specifies the full path to the directory containing the SavedModel generated in 1.X or 2.X.

--keras_model_file. Type: string. Specifies the full path of the HDF5 file containing the tf.keras model generated in 1.X or 2.X.

#!/usr/bin/env bash
tflite_convert \
  --keras_model_file=model.h5 \
  --output_file=/tmp/mobilenet_keras.tflite

The converter supports SavedModel directories, tf.keras models, and concrete functions.

For now, we will end with these options. In the next article we will explore converting RNN models and quantized models.

Tensorflow Lite Model Deployment!

Here you go — Introduction Story of Tensorflow Lite

In the above article, we introduced TensorFlow Lite: what it is, what its purpose is, and what TensorFlow Lite is not.

In this article, we will dig deeper into the steps involved in TensorFlow Lite model deployment.

The above diagram shows the deployment flow of a TensorFlow Lite model on edge devices.

Let us go through the steps from the top of the diagram.

At a very high level, this diagram breaks down into two pieces of functionality: the first step is converting the model, and the second is interpreting, i.e. running inference with, the model.

  1. Train Model:- 

Train your model using TensorFlow. We can train the model using any high-level TensorFlow API such as Keras, or use the low-level API, or start from a legacy TensorFlow model. You can develop your own model or use a TensorFlow built-in model.

If you have a model from another framework, you can convert it to TensorFlow using ONNX and use it. Once the model is ready, you have to save it. We can save the model in different formats depending on the API, such as HDF5, SavedModel, or FrozenGraphDef.
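As a sketch, a tiny tf.keras model can be saved in the HDF5 format like this (the model and path here are arbitrary examples; passing a directory path, or using tf.saved_model.save, produces a SavedModel instead):

```python
import os
import tempfile
import tensorflow as tf

# A minimal model just to demonstrate saving; the architecture is arbitrary.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                             tf.keras.layers.Dense(1)])

# HDF5 format: the whole model in a single .h5 file.
h5_path = os.path.join(tempfile.mkdtemp(), "model.h5")
model.save(h5_path)
print(os.path.exists(h5_path))  # True
```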

2. Convert Model:- 

In this step, we use the TensorFlow Lite converter to convert the TensorFlow model into the TensorFlow Lite FlatBuffer format.

FlatBuffers is a data serialization format optimized for performance; the TensorFlow Lite FlatBuffer is also known as the TF Lite model. The TensorFlow Lite converter takes a TensorFlow model and generates a TensorFlow Lite FlatBuffer file (.tflite). The converter supports SavedModel directories, tf.keras models, and concrete functions. After conversion, our TFLite model is ready.

You can convert a model using the Python API or the command-line tool. The CLI supports only basic models.

Python API example:- 

# export_dir is the path where your TF model is saved.
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()

CLI example 

bazel run //tensorflow/lite/python:tflite_convert -- \
  --saved_model_dir=/tmp/mobilenet_saved_model \
  --output_file=/tmp/mobilenet.tflite

3. Deploy Model:-

Now our model is ready and we have a ‘.tflite’ file. We can deploy this to IoT devices, embedded devices, or mobile devices.

4. Run Inference:-

To perform inference with a TensorFlow Lite model, you must run it through an interpreter. A TensorFlow Lite model is served on a device using the interpreter, which provides a wide range of interfaces and supports a wide range of devices. The TensorFlow Lite interpreter is designed to be lean and fast, so we can run models locally on devices such as embedded boards, Android, or iOS. Once the model is loaded onto a device, we can take inference.

In general, inference with a model goes through the steps below.

a. Loading a model:- You must load the .tflite model file into memory.

b. Transforming data:- Raw input data generally does not match the input format expected by the model, so you need to transform the data.

c. Running inference:- Execute inference over transformed data.

d. Interpreting output:- When you receive results from the model inference, you must interpret the tensors in a meaningful way that’s useful in your application.
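The four steps above can be sketched end to end with the Python tf.lite.Interpreter API. This is a minimal illustration; the tiny squaring model is a hypothetical stand-in, not a real network:

```python
import numpy as np
import tensorflow as tf

# A trivial stand-in model expressed as a concrete function
# (the converter also accepts SavedModel dirs and tf.keras models).
@tf.function(input_signature=[tf.TensorSpec([1, 3], tf.float32)])
def model_fn(x):
    return x * x

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model_fn.get_concrete_function()])
tflite_model = converter.convert()

# a. Load the model into the interpreter
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# b. Transform raw input into the dtype/shape the model expects
inp = interpreter.get_input_details()[0]
x = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)

# c. Run inference
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()

# d. Interpret the output tensor
out = interpreter.get_output_details()[0]
y = interpreter.get_tensor(out["index"])
print(y)  # [[1. 4. 9.]]
```

On a device you would load the .tflite file from disk (model_path=...) instead of keeping the flatbuffer in memory.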

PreTrained Models

https://www.tensorflow.org/lite/models

https://www.tensorflow.org/lite/guide/roadmap

Some Examples

https://www.tensorflow.org/lite/examples

Tensorflow Lite- machine learning at the edge!!

TensorFlow created a buzz in the AI and deep learning community and is the most popular framework among deep learning practitioners.

tensorflow.org

Introduction:- 

As we know, training deep learning models needs compute power, and this is the age of computation. We are now moving toward edge computing alongside cloud computing. Edge computing is a need of today's world because of innovation in the IoT domain, and because compliance and data protection laws force companies to do computation on the edge; computing a model in the cloud and sending the result back to the client device is now the legacy approach.

As the most popular deep learning framework, TensorFlow comes with a lightweight version for edge computation. Nowadays mobile devices have good processing power, but edge devices have less.

Train a deep learning model in less than 100 KB.

The official definition of Tensorflow Lite:

“TensorFlow Lite is an open-source deep learning framework for on-device inference.”

Deploy machine learning models on mobile and IoT devices.

TensorFlow Lite is a set of tools that helps developers run TensorFlow models on mobile, embedded, and IoT devices. It enables on-device machine learning inference with low latency and a small binary size.

Tensorflow Lite is providing machine learning at the edge devices.

Edge computing means computing locally, on the device.

Deep Dive:-

This diagram illustrates the standard flow for deploying the model using TensorFlow Lite.

Deploying model using TensorFlow Lite at the edge devices

TensorFlow Lite is not a separate deep learning framework; it provides a set of tools that help developers run TensorFlow models (or other deep learning models) on mobile, embedded, and IoT devices.

Steps:-

  1. Choose a model or develop your own.
  2. Train the model.
  3. Convert the model.
  4. Deploy the model.
  5. Run inference with the model.
  6. Optimize the model and repeat the above steps.

Tensorflow Lite consists of two main components

  1. Converter:- The TensorFlow Lite Converter converts a TensorFlow model into the TensorFlow Lite model format.
  2. Interpreter:- It supports a set of core operators optimized for on-device applications with a small binary size. It is used for running inference with the model.

Why Edge Computing?

Edge computing is best used alongside cloud computing. Cloud computing is hugely popular nowadays, but there are certain requirements where edge computation beats it. Why is edge computation important, and what advantages do you derive from it?

  1. Privacy:- No data needs to leave the device; everything stays local.
  2. Latency:- There's no back-and-forth request to a server.
  3. Connectivity:- An internet connection is not required.
  4. Power Consumption:- Connecting to a network requires power.

TensorFlow Lite is a one-stop solution to convert your deep learning model, deploy it efficiently, and enjoy inferencing. TensorFlow Lite supports both mobile devices and microcontrollers.

Colab getting started!!

Train deep neural network free using google colaboratory.

GPU and TPU compute for free? Are you kidding?

Google Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser. If you don't have money to procure a GPU and want to train a neural network or get your hands dirty with zero investment, then this is for you. Colab started as a Google internal research tool for data science.

You can use GPU as a backend for free for 12 hours at a time.

It supports Python 2.7 and 3.6, but not R or Scala yet.

Many people want to train a machine learning or deep learning model, but playing with these requires GPU computation and huge resources, which blocks many people from trying these things out and getting their hands dirty.

Google Colab is nothing but cloud-hosted jupyter notebook.

Colaboratory is a free Jupyter notebook environment provided by Google where you can use free GPUs and TPUs, which solves all these issues. The best thing about Colab is the TPUs (tensor processing units), special hardware designed by Google to process tensors.

Let’s Start:- 

To start, you should know Jupyter notebooks and have a Google account.

http://colab.research.google.com/

Click the above link to access Google Colaboratory. It is not just a static page but an interactive environment that lets you write and execute code in Python and other languages. You can create a new Jupyter notebook via File → New Python 3 Notebook (or New Python 2 Notebook).

We will create a Python 3 notebook; Colab will create one for us and save it on Google Drive.

Colab is an ideal way to start everything from improving your Python coding skills to working with deep learning frameworks like PyTorch, Keras, and TensorFlow, and you can install any Python package required for your coding, from simple sklearn and numpy to TensorFlow.

You can create notebooks in Colab, upload existing notebooks, store notebooks, share notebooks with anyone, mount your Google Drive and use whatever you've got stored in there, import most of your directories, upload notebooks directly from GitHub, upload Kaggle files, download your notebooks, and do whatever you do with your local Jupyter notebook.

At the top right you can choose to connect to a hosted runtime or to a local runtime.

Set up GPU or TPU:-

It's as simple as going to the “Runtime” dropdown menu, selecting “Change runtime type”, and choosing GPU/TPU in the hardware accelerator drop-down menu.

Now you can start coding and start executing your code !!

How to install a framework or libraries?

It's as simple as writing an import statement in Python!

!pip install fastai

Use the normal pip install command to install different packages like TensorFlow or PyTorch and start playing with them.

For more details and information

https://colab.research.google.com/notebooks/welcome.ipynb#scrollTo=gJr_9dXGpJ05

https://colab.research.google.com/notebooks/welcome.ipynb#scrollTo=-Rh3-Vt9Nev9

CNN (Convolutional Neural network) using Gluon


Introduction:

Convolutional Neural Networks are deep learning networks that have achieved excellent results in image recognition, image classification, object detection, face recognition, etc. CNNs are everywhere, and it is the most popular deep learning architecture. CNNs are mostly used to solve image data challenges, as well as video analytics. Any data that has spatial relationships is ripe for applying CNNs.

In the previous chapter, we covered basic machine learning techniques and algorithms for solving regression and classification problems. In this chapter, we will explore a deep learning architecture, the CNN (Convolutional Neural Network). CNNs are a biologically inspired variant of MLPs. A CNN is also known as a ConvNet; in this chapter we will use the terms interchangeably. We will explore the points below.

  • Introduction of CNN
  • CNN architecture
  • Gluon API for CNN
  • CNN implementation with gluon
  • Image segmentation using CNN

CNN Architecture:

CNNs are a regularised version of multilayer perceptrons. MLPs are fully connected neural networks, meaning each neuron in one layer connects to every neuron in the next layer. The CNN design was inspired by the vision processing of living organisms. Without conscious effort, we make predictions about everything we see and act upon them. When we see something, we label every object based on what we have learned in the past.

In the 1950s and 1960s, Hubel and Wiesel showed how the cat's visual cortex works. The animal visual cortex is the most powerful visual processing system in existence. As we know, the visual cortex contains a complex arrangement of cells. These cells are sensitive to small sub-regions of the visual field, called receptive fields, and the sub-regions are tiled to cover the entire visual field. These cells act as local filters over the input space and are well suited to exploit the strong spatially local correlation present in natural images. This is just a high-level introduction to how the cortex works. CNNs are designed to recognize visual patterns directly from pixel images with minimal preprocessing.

Now let us make things simple and think about how our brain thinks; the human brain is a very powerful machine. Everyone works differently, and it's clear that we all have our own ways of learning and taking in new information. “A picture is worth a thousand words” is an English-language adage. It refers to the notion that a complex idea can be conveyed with just a single picture, which conveys its meaning or essence more effectively than a description does. We see plenty of images every day; our brain processes and stores them. But what about a machine: how can a machine understand, process, and store meaningful insight from an image? In simple terms, each image is an arrangement of pixels in a particular order; if the order or a color changes, the image changes as well. From this explanation you can understand that images in a machine are represented and processed in the form of pixels. Before CNNs came along, image processing was very hard. Scientists around the world have been trying to find ways to make computers extract meaning from visual data (images, video) for 60+ years now, and the history of CV (Computer Vision) is deeply fascinating.

The most fascinating paper was published by two neurophysiologists, David Hubel and Torsten Wiesel, in 1959; as mentioned above, the paper was titled “Receptive fields of single neurons in the cat's striate cortex”. The duo ran experiments on a cat: they placed electrodes into the primary visual cortex area of an anesthetized cat's brain and observed, or at least tried to observe, the neuronal activity in that region while showing the animal various images. Their first efforts were fruitless; they couldn't get the nerve cells to respond to anything. After a few months of research, they noticed by accident that one neuron fired as they were slipping a new slide into the projector. Hubel and Wiesel realized that what got the neuron excited was the movement of the line created by the shadow of the sharp edge of the glass slide.

[Image Source: https://commons.wikimedia.org/wiki/File:Human_visual_pathway.svg]

The researchers observed, through their experimentation, that there are simple and complex neurons in the primary visual cortex and that visual processing always starts with simple structures such as oriented edges. This is a much simpler and more familiar explanation. Inventions do not happen overnight; it took years of an evolutionary process to reach groundbreaking results.

After Hubel and Wiesel, nothing groundbreaking happened with their idea for a long time. In 1982, David Marr, a British neuroscientist, published another influential paper, “Vision: A computational investigation into the human representation and processing of visual information”. Marr gave us the next important insight: vision is hierarchical. He introduced a framework for vision where low-level algorithms that detect edges, curves, corners, etc., are used as stepping stones toward a high-level understanding of the image.

David Marr’s representational framework:

  • A Primal Sketch of an image, where edges, bars, boundaries, etc., are represented (inspired by Hubel and Wiesel’s research);
  • A 2½D sketch representation where surfaces, information about depth and discontinuities on an image are pieced together;
  • A 3D model that is hierarchically organized in terms of surface and volumetric primitives.

Marr's framework was very abstract and high-level, and no mathematical model was given that could be used in artificial learning; it was a hypothesis. Around the same time, the Japanese computer scientist Kunihiko Fukushima developed a framework also inspired by Hubel and Wiesel. His method is a self-organizing artificial network of simple and complex cells that could recognize patterns and be unaffected by position shifts. The network, the Neocognitron, included several convolutional layers whose receptive fields had weights. Fukushima's Neocognitron was the first deep neural network and is a grandfather of today's convnets. A few years later, in 1989, the French scientist Yann LeCun applied a backpropagation-style learning algorithm to Fukushima's Neocognitron architecture. After a few more trials and errors, LeCun released LeNet-5, and he applied his architecture to develop and release a commercial product for reading zip codes. Around 1999, scientists and researchers were trying to do visual data analysis using Marr's proposed method instead of feature-based object recognition.

This is just a brief overview of the important milestones that help us understand how CNNs evolved. Let us talk about CNN architecture: like every artificial neural network architecture, it has an input layer, hidden layers, and an output layer. The hidden layers consist of a series of convolutional layers that convolve with multiplication or another dot product. CNNs are a specialized kind of neural network for processing data that has a grid-like topology: time series data can be thought of as a one-dimensional grid (vector) of samples taken at regular time intervals, while image data can be thought of as a 2-D grid of pixels (matrix). The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution. How the image is arranged in the 2-D grid of pixels depends on whether we are looking at a black and white or color image: we might have either one or multiple numerical values corresponding to each pixel. CNN-based neural network architectures now dominate the field of computer vision to such a level that hardly anyone these days would develop a commercial application, or enter a competition or hackathon related to image recognition, object detection, or semantic segmentation, without basing their approach on them. Many modern CNN networks owe their designs to inspirations from biology. CNNs give strong predictive performance and tend to be computationally efficient, because they are easy to parallelize and have far fewer inputs per neuron than a dense layer. If we used a fully connected neural network for image recognition, we would need a huge number of parameters and hidden layers: even for a small 28*28*3 image, each neuron in the first hidden layer would need 2352 weights. This quickly leads to overfitting, which is why we do not use a fully connected neural network to process image data.

In a convolutional neural network, each neuron in a layer is connected to only a small region of the layer before it, instead of to all the neurons as in a fully connected network.

The above fig shows the general architecture of CNNs. A CNN is a type of feed-forward artificial neural network in which the connectivity pattern between neurons is inspired by the animal visual cortex. The basic idea is that some neurons in the cortex fire when exposed to horizontal edges, some when exposed to vertical edges, and some when exposed to diagonal edges; this is the motivation behind the connectivity pattern.

In general, CNN has four layers.

  1. Convolution layer
  2. Max Pooling layer
  3. ReLU layer
  4. Fully connected layer

The main problem with image data is that images won't always look the same; there can be certain deformations. It is similar to how a child recognizes objects: we can show a child a black dog and tell him this is a dog, and the next day, when some other four-legged black pet comes to our house, he may recognize it as a dog when it is actually a goat. Similarly, we have to show many samples to find a common pattern to identify objects. We have to show millions of pictures to an algorithm so it can understand the data and detect objects; with the help of these millions of records, the algorithm can generalize its inputs and make predictions for new observations.

Machines see in a different way than humans do; their world consists of only 0s and 1s. CNNs have a different architecture than regular artificial neural networks. In a regular fully connected neural network, we put the input through a series of hidden layers and reach a fully connected output layer that represents the predictions. CNNs follow a slightly different approach: all the layers are organized in 3 dimensions (width, height, and depth), neurons in one layer connect only to a small portion of the next layer rather than all of it, and the output layer reduces to a single vector of probability scores organized along the depth dimension. The fig below illustrates an NN (neural network) vs. a CNN.

As we said earlier, the output can be a single class or a probability of classes that best describes the image. Now, the hard part is understanding what each of these layers does. Let us understand this.

CNNs have two components

  1. Feature extraction part (the hidden layers): The hidden layers perform a series of convolution and pooling operations during which features are detected. If you had a picture of a human face, this is the part where the network would recognize the two eyes, nose, lips, etc.
  2. Classification part (fully connected output layer): As we said, the last classification layers are fully connected layers that serve as a classifier on top of the extracted features.

Convolution layer:

The convolution layer is the main building block of a CNN. As we said, convolution refers to the combination of two mathematical functions to produce a third function. Convolution is performed on the input data with the use of filters or kernels (the terms filter and kernel are used interchangeably). We apply filters over the input data to produce a feature map: the filter slides over the input, and at every location a matrix multiplication is performed and the result is summed into the feature map.

Note that in the above example the image is 2-dimensional, with width and height (a black and white image). A colored image has one more dimension for the RGB channels. For that reason, 2-D convolutions are usually used for black and white images, while 3-D convolutions are used for colored images. Let us start with a 5*5 input image with no padding and use a 3*3 convolution filter to get an output image. In the first step, the filter slides over the matrix, and each element of the filter is multiplied by the element in the corresponding location. Then you sum all the results, which gives one output value. You repeat this process, moving the filter by one column, to get the second output. The step size as the filter slides across the image is called a stride; here, the stride is 1. The same operation is repeated to get the third output. A stride greater than 1 will always downsize the image; with a stride of 1, the size of the image stays the same. We have shown the operation in 2-D, but in real-life applications convolutions are mostly performed on a 3-D matrix with dimensions for width, height, and depth, where depth corresponds to the color channels of an image (Red, Green, Blue).
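The 5*5-input, 3*3-filter walk-through above can be sketched in plain NumPy. This is a naive loop for illustration only; real frameworks use heavily optimized kernels:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Naive 2-D convolution (cross-correlation), no padding."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the patch element-wise with the kernel and sum
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # 5x5 input
kernel = np.ones((3, 3))                          # 3x3 filter
fmap = conv2d(image, kernel)
print(fmap.shape)  # (3, 3) -- a 5x5 input with a 3x3 filter, stride 1, no padding
```

Note how the output feature map is smaller than the input, which is exactly why padding is introduced below.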

We perform a number of convolutions on our input matrix, each operation using a different kernel (filter), and the results are stored in feature maps. All the feature maps together form the final output of the convolutional layer. CNNs use ReLU as the activation function, and the output of the convolution is passed through it. As mentioned earlier, the convolution filter slides over the input matrix in decisive steps in a specified direction: the stride is the size of the step the filter moves each time. People generally use a stride value of 1, meaning the filter slides pixel by pixel.

The animation above shows a stride of 1. With a larger stride, the filter slides over the input with bigger gaps and thus has less overlap between cells. The size of the feature map is always smaller than the input matrix, which shrinks our feature map. To prevent the feature map from shrinking, we use padding: a layer of zero-value pixels is added to surround the input. Padding helps improve performance, makes sure the kernel and stride fit the input, and keeps the spatial size constant after the convolution.

Max Pooling layer:

After the convolution operation, the next operation is the pooling layer. Max pooling is a sample-based discretization process; in the first diagram, you can see that after every convolution layer there is a max pooling layer. The max pooling layer helps control overfitting and shortens the training time. The pooling function continuously reduces the dimensionality to reduce the number of parameters and the amount of computation in the network. Max pooling is done by applying a max filter to (usually non-overlapping) subregions of the initial representation. It reduces the computational cost by reducing the number of parameters to learn and provides basic translation invariance to the internal representation.

Let’s say we have a 4×4 matrix representing our initial input.
Let’s say, as well, that we have a 2×2 filter that we’ll run over our input. We’ll have a stride of 2 (meaning the (dx, dy) for stepping over our input will be (2, 2)) and won’t overlap regions. For each of the regions represented by the filter, we will take the max of that region and create a new, output matrix where each element is the max of a region in the original input.

Max Pooling takes the maximum value in each window. These window sizes need to be specified beforehand. This decreases the feature map size while at the same time keeping the significant information.
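The 4×4-input, 2×2-window example above can be sketched in NumPy (illustrative values only):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Naive max pooling over non-overlapping windows."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Keep only the maximum value in each window
            out[i, j] = x[i*stride:i*stride+size,
                          j*stride:j*stride+size].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 8],
              [4, 1, 3, 5]], dtype=float)
pooled = max_pool(x)
print(pooled)  # [[6. 4.]
               #  [7. 9.]]
```

The 4×4 map shrinks to 2×2 while each window's largest activation survives.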

ReLU layer:

The Rectified Linear Unit (ReLU) has become very popular in the last few years. ReLU is an activation function, similar to the different activation functions used in other artificial neural networks (an activation function is also known as a transfer function). ReLU is the most used activation function in the world right now, since it is used in almost all convolutional neural networks and deep learning models.

The ReLU function is f(z) = max(0, z). As you can see, the ReLU is half rectified (from the bottom): f(z) is zero when z is less than zero, and f(z) equals z when z is above or equal to zero.

The range of ReLU is 0 to infinity. ReLUs improve neural networks by speeding up training. ReLU is idempotent. ReLU is the function max(x, 0) applied to an input x, e.g. a matrix from a convolved image: it sets all negative values in the matrix to zero and keeps all other values constant. ReLU is executed after the convolution and is therefore a nonlinear activation function, like tanh or sigmoid. Each activation function takes a single number and performs a fixed mathematical operation on it. In simple words, what the rectifier function does to an image is remove all the black elements from it, keeping only the positive values: any positive value is returned unchanged, whereas an input value of 0 or a negative value is turned into 0. ReLU allows your model to account for non-linearities and interactions well. In the Gluon API we can use the built-in ReLU implementation.

net.add(gluon.nn.Dense(64, activation="relu"))

We can also write simple sample code for the ReLU function.

# rectified linear function
def rectified(x):
  return max(0.0, x)

Fully connected layer:

The fully connected layer is a fully connected neural network layer, also referred to as the classification layer. After the convolutional, ReLU, and max-pooling layers, the classification part consists of a few fully connected layers. Fully connected layers can only accept 1-dimensional data, so to convert our 3-D data to 1-D we flatten it in Python; this essentially arranges our 3-D volume into a 1-D vector.

This layer returns the output, which is a probabilistic value.
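The flattening step described above can be illustrated with NumPy (the 7×7×64 volume here is a hypothetical example, not from the text):

```python
import numpy as np

volume = np.zeros((7, 7, 64))  # a hypothetical final feature-map volume
flat = volume.reshape(-1)       # arrange the 3-D volume into a 1-D vector
print(flat.shape)  # (3136,) since 7 * 7 * 64 = 3136
```

The resulting 1-D vector is what the fully connected classification layers consume.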

Types of CNN Architectures:

In the above section we explained the general CNN architecture, but there are different flavors of CNN based on different combinations of layers. Let us explore some useful and famous CNN architectural styles used to solve complex problems. CNNs are designed to recognize visual patterns with minimal preprocessing from pixel images. The ImageNet project is a large visual database designed for object recognition research. The project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where programmers and researchers compete to correctly detect objects. In this section, we explore the CNN architectures of the top ILSVRC competitors.

Let us look at this picture; it will give you a broad overview of how the evolution happened.

1. LeNet-5 — LeCun et al

LeNet-5 is a 7-layer convolutional neural network by LeCun et al from 1998. It was deployed in a real-life financial banking project to recognize handwritten digits on cheques, with images digitized into 32×32 pixel greyscale inputs. The ability to process higher-resolution images requires larger and more convolutional layers, so the technique was constrained by the availability of computing resources; at that time the computational capacity was limited, and hence the technique wasn't scalable to large images.

2. AlexNet — Krizhevsky et al

AlexNet is a convolutional neural network by Krizhevsky et al from 2012. It significantly outperformed all the prior competitors and won the ILSVRC challenge by reducing the top-5 error from 26% to 15.3%. The network was very similar to LeNet but was much deeper, with more filters per layer, and had around 60 million parameters.

It consisted of 11×11, 5×5, and 3×3 convolutions, max pooling, dropout, data augmentation, ReLU activations, and SGD with momentum. A ReLU activation is attached after every convolutional and fully connected layer except the last softmax layer. The figure certainly looks a bit scary; this is because the network was split into two halves, each trained simultaneously on a different GPU. AlexNet was trained for 6 days on two Nvidia GeForce GTX 580 GPUs. AlexNet was designed by the SuperVision group, consisting of Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever. A simpler picture:

AlexNet consists of 5 convolutional layers and 3 fully connected layers. These 8 layers, combined with two concepts that were new at the time, MaxPooling and ReLU activation, gave the model its edge results.

3. ZFNet –

The ILSVRC 2013 winner was also a CNN, known as ZFNet. It achieved a top-5 error rate of 14.8%, already half of the previously mentioned non-neural error rate. The authors achieved this by tweaking the hyper-parameters of AlexNet while maintaining the same structure, with additional deep learning elements such as dropout, augmentation, and stochastic gradient descent with momentum.

4. VGGNet — Simonyan et al

The runner-up of the 2014 ILSVRC challenge was VGGNet; because of the simplicity of its uniform architecture, it appeals as a simpler form of deep convolutional neural network. VGGNet was developed by Simonyan and Zisserman and consists of 16 convolutional layers; it is very appealing because of its very uniform architecture. The architecture is much like AlexNet but uses only 3×3 convolutions, with lots of filters. VGGNet was trained on 4 GPUs for 2–3 weeks. The weight configuration of VGGNet is publicly available and has been used in many other applications and challenges as a baseline feature extractor. VGGNet consists of 138 million parameters, which can be a bit challenging to handle. Because the weight configurations are publicly available, this network is one of the most used choices for feature extraction from images.

VGGNet has 2 simple rules

  1. Each Convolutional layer has configuration — kernel size = 3×3, stride = 1×1, padding = same. The only thing that differs is the number of filters.
  2. Each Max Pooling layer has configuration — windows size = 2×2 and stride = 2×2. Thus, we half the size of the image at every Pooling layer.
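Rule 2 implies the spatial size halves at every pooling stage. For example, assuming a 224×224 input and the five pooling stages of VGG-16, the feature map shrinks as follows:

```python
# Each 2x2 max-pooling stage with stride 2 halves the spatial size.
size = 224
sizes = [size]
for _ in range(5):  # five pooling stages in VGG-16
    size //= 2
    sizes.append(size)
print(sizes)  # [224, 112, 56, 28, 14, 7]
```

So a 224×224 image ends up as a 7×7 feature map before the fully connected layers.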

5. GoogLeNet/Inception –

The winner of the 2014 ILSVRC competition was GoogLeNet (Inception v1), which achieved a top-5 error rate of 6.67%. GoogLeNet used an inception module, a novel concept with smaller convolutions that reduced the number of parameters to a mere 4 million. GoogLeNet came very close to human-level performance, which the organizers of the challenge were then forced to evaluate. GoogLeNet was inspired by LeNet but implemented a novel element nicknamed the inception module. It used batch normalization, image distortions, and RMSprop.

The following two diagrams help to understand and visualize GoogLeNet.

6. ResNet — Kaiming He et al

The 2015 ILSVRC competition brought a top-5 error rate of 3.57%, which is lower than the human top-5 error. The winning model, ResNet (Residual Network) by Kaiming He et al., introduced a novel approach called skip connections (conceptually related to the gating in gated recurrent units). With this technique they were able to train a network with 152 layers while still having lower complexity than VGGNet.

ResNet's residual connections came out of an observation: deep neural networks perform worse as we keep adding layers. This led to the hypothesis that direct mappings are hard to learn. So instead of learning the mapping between a layer's input and its output, the network learns the difference between them, that is, it learns the residual.
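The idea fits in a few lines: a residual block learns a function F(x) and outputs F(x) + x, so the identity mapping is trivially representable by F ≡ 0. A toy sketch with plain Python numbers (not an actual network layer):

```python
def residual_block(x, f):
    """Output of a residual block: the learned residual f(x)
    plus the identity skip connection x."""
    return f(x) + x

# If the learned residual is zero, the block is exactly the identity map
print(residual_block(5.0, lambda x: 0.0))  # → 5.0
```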

The residual network uses 1×1 convolutions to increase and decrease the number of channels.
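A 1×1 convolution mixes channels at each pixel without touching the spatial dimensions, so its weight count is just out_channels × in_channels (plus biases). The channel counts below are illustrative, not taken from ResNet:

```python
def conv_params(in_channels, out_channels, kernel_size, bias=True):
    """Parameter count of a conv layer:
    out_channels * in_channels * k * k, plus out_channels biases."""
    params = out_channels * in_channels * kernel_size * kernel_size
    if bias:
        params += out_channels
    return params

print(conv_params(256, 64, 1))  # 1x1 bottleneck: 16448 parameters
print(conv_params(256, 64, 3))  # same channel change with 3x3: 147520
```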

CNN using Gluon:

As part of this example, we explore the MNIST data set using a CNN. This is a good example to get our hands dirty with the Gluon API for building CNNs. There are four important choices we always have to consider while building any CNN:

  1. The kernel size
  2. The filter count (i.e. how many filters we want to use)
  3. Stride (how big the steps of the filter are)
  4. Padding
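These four choices determine the output size of a convolution layer: with input size n, kernel k, stride s and padding p, the output spatial size is floor((n + 2p − k) / s) + 1. A quick helper to experiment with (the function name is ours, not part of Gluon):

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# A 5x5 kernel on a 28x28 MNIST image, stride 1, no padding
print(conv_output_size(28, kernel=5))             # → 24
# Followed by a 2x2 max pool with stride 2
print(conv_output_size(24, kernel=2, stride=2))   # → 12
```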

Let us dive deep into MNIST and recognize handwritten digits with CNNs using the Gluon API.

To start with the example, we need the MNIST data set and some Python and Gluon modules.

import mxnet as mx
import numpy as np
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn
# Select a fixed random seed for reproducibility
mx.random.seed(42)
def data_xform(data):
    """Move channel axis to the beginning, cast to float32, and normalize to [0, 1]."""
    return nd.moveaxis(data, 2, 0).astype('float32') / 255
train_data = mx.gluon.data.vision.MNIST(train=True).transform_first(data_xform)
val_data = mx.gluon.data.vision.MNIST(train=False).transform_first(data_xform)

The above code downloads the MNIST data set to the default location (.mxnet/datasets/mnist/ in the home directory) and creates Dataset objects for training (train_data) and validation (val_data); we need both. The transform_first() method moves the channel axis of the images to the beginning ((28, 28, 1) → (1, 28, 28)), casts them to float32, and rescales them from [0, 255] to [0, 1]. The MNIST dataset is small enough that we can load it in memory.

Next, set the compute context (GPU if available, otherwise CPU):

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu(0)

Then we wrap both datasets in DataLoader objects, shuffling the training set but not the validation set; the training and validation loops below iterate over these loaders. (A batch size of 256 is assumed here; tune it to taste.)

batch_size = 256
train_loader = mx.gluon.data.DataLoader(train_data, shuffle=True, batch_size=batch_size)
val_loader = mx.gluon.data.DataLoader(val_data, shuffle=False, batch_size=batch_size)

conv_layer = nn.Conv2D(kernel_size=(3, 3), channels=32, in_channels=16, activation='relu')
print(conv_layer.params)

Next we define a convolutional layer. Since we are working with 2-D image data, this is a 2-D convolution with a ReLU activation function. A CNN is a more structured weight representation: instead of connecting all inputs to all outputs, each output is connected only to a local patch of the input, and the same filter weights are reused at every position. (This layer is a standalone illustration; the LeNet model we build next defines its own layers.)

# Define the evaluation metric and the loss function
metric = mx.metric.Accuracy()
loss_function = gluon.loss.SoftmaxCrossEntropyLoss()

We are using softmax cross-entropy as the loss function and accuracy as the metric.
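For intuition, here is a minimal NumPy sketch of what softmax cross-entropy computes for a single sample; Gluon's SoftmaxCrossEntropyLoss fuses these steps in a numerically stable way:

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of the softmax distribution against a true class index."""
    shifted = logits - np.max(logits)  # shift for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

# Uniform logits over 2 classes -> loss is ln(2)
print(softmax_cross_entropy(np.array([0.0, 0.0]), 0))  # ≈ 0.6931
```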

lenet = nn.HybridSequential(prefix='LeNet_')
with lenet.name_scope():
    lenet.add(
        nn.Conv2D(channels=20, kernel_size=(5, 5), activation='tanh'),
        nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
        nn.Conv2D(channels=50, kernel_size=(5, 5), activation='tanh'),
        nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
        nn.Flatten(),
        nn.Dense(500, activation='tanh'),
        nn.Dense(10, activation=None),
    )

Early filters can learn to detect small local structures like edges, whereas later layers become sensitive to more and more global structures. Since images often contain a rich set of such features, it is customary to have each convolution layer employ and learn many different filters in parallel, so as to detect many different image features on their respective scales. The above code defines a CNN architecture called LeNet, a popular network known to work well on digit classification tasks. We use a version that differs slightly from the original in the usage of tanh activations instead of sigmoid.

Likewise, the input can already have multiple channels. In the earlier conv_layer example, the convolution layer takes an input image with 16 channels and maps it to an image with 32 channels by convolving each of the input channels with a different set of 32 filters and then summing over the 16 input channels. Therefore, the total number of filter parameters in the convolution layer is channels * in_channels * prod(kernel_size), which amounts to 4608 in that example. Another characteristic feature of CNNs is pooling, which means summarizing patches to a single number. This step lowers the computational burden of training the network, but the main motivation for pooling is the assumption that it makes the network less sensitive to small translations, rotations, or deformations of the image. Popular pooling strategies are max-pooling and average-pooling, and they are usually performed after convolution.
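The 4608 figure from the earlier conv_layer example can be checked by hand:

```python
# Filter parameters of the earlier conv_layer:
# channels * in_channels * prod(kernel_size)
channels, in_channels, kernel_size = 32, 16, (3, 3)
num_filter_params = channels * in_channels * kernel_size[0] * kernel_size[1]
print(num_filter_params)  # → 4608
```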

lenet.initialize(mx.init.Xavier(), ctx=ctx)
lenet.summary(nd.zeros((1, 1, 28, 28), ctx=ctx))

The summary() method can be a great help; it requires the network parameters to be initialized and an input array to infer the sizes.

Output:
--------------------------------------------------------------------------------
        Layer (type)                                Output Shape         Param #
================================================================================
               Input                              (1, 1, 28, 28)               0
        Activation-1               <Symbol LeNet_conv0_tanh_fwd>               0
        Activation-2                             (1, 20, 24, 24)               0
            Conv2D-3                             (1, 20, 24, 24)             520
         MaxPool2D-4                             (1, 20, 12, 12)               0
        Activation-5               <Symbol LeNet_conv1_tanh_fwd>               0
        Activation-6                               (1, 50, 8, 8)               0
            Conv2D-7                               (1, 50, 8, 8)           25050
         MaxPool2D-8                               (1, 50, 4, 4)               0
           Flatten-9                                    (1, 800)               0
       Activation-10              <Symbol LeNet_dense0_tanh_fwd>               0
       Activation-11                                    (1, 500)               0
            Dense-12                                    (1, 500)          400500
            Dense-13                                     (1, 10)            5010
================================================================================
Parameters in forward computation graph, duplicate included
   Total params: 431080
   Trainable params: 431080
   Non-trainable params: 0
Shared params in forward computation graph: 0
Unique parameters in model: 431080
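As a sanity check, the Param # column can be reproduced by hand (weights plus biases for each layer):

```python
# Conv2D: channels * in_channels * kH * kW + channels (biases)
conv1 = 20 * 1 * 5 * 5 + 20       # Conv2D-3:  520
conv2 = 50 * 20 * 5 * 5 + 50      # Conv2D-7:  25050
# Dense: inputs * outputs + outputs (biases)
dense1 = 800 * 500 + 500          # Dense-12:  400500
dense2 = 500 * 10 + 10            # Dense-13:  5010
print(conv1 + conv2 + dense1 + dense2)  # → 431080, the reported total
```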

First conv + pooling layer in LeNet.

Now we train LeNet with typical hyperparameters, such as a learning rate of 0.04. Note that it is advisable to use a GPU if possible, since this model is significantly more computationally demanding to evaluate and train.

trainer = gluon.Trainer(
    params=lenet.collect_params(),
    optimizer='sgd',
    optimizer_params={'learning_rate': 0.04},
)
metric = mx.metric.Accuracy()
num_epochs = 10
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        inputs = inputs.as_in_context(ctx)
        labels = labels.as_in_context(ctx)
        with autograd.record():
            outputs = lenet(inputs)
            loss = loss_function(outputs, labels)
        loss.backward()
        metric.update(labels, outputs)
        trainer.step(batch_size=inputs.shape[0])
    name, acc = metric.get()
    print('After epoch {}: {} = {}'.format(epoch + 1, name, acc))
    metric.reset()
for inputs, labels in val_loader:
    inputs = inputs.as_in_context(ctx)
    labels = labels.as_in_context(ctx)
    metric.update(labels, lenet(inputs))
print('Validation: {} = {}'.format(*metric.get()))
assert metric.get()[1] > 0.985

Let us visualize where the network goes wrong: some incorrect predictions on the training and validation sets.

def get_mislabeled(loader):
    """Return list of ``(input, pred_lbl, true_lbl)`` for mislabeled samples."""
    mislabeled = []
    for inputs, labels in loader:
        inputs = inputs.as_in_context(ctx)
        labels = labels.as_in_context(ctx)
        outputs = lenet(inputs)
        # The predicted label is the index where the output is maximal
        preds = nd.argmax(outputs, axis=1)
        for i, p, l in zip(inputs, preds, labels):
            p, l = int(p.asscalar()), int(l.asscalar())
            if p != l:
                mislabeled.append((i.asnumpy(), p, l))
    return mislabeled
import numpy as np
sample_size = 8
wrong_train = get_mislabeled(train_loader)
wrong_val = get_mislabeled(val_loader)
wrong_train_sample = [wrong_train[i] for i in np.random.randint(0, len(wrong_train), size=sample_size)]
wrong_val_sample = [wrong_val[i] for i in np.random.randint(0, len(wrong_val), size=sample_size)]
import matplotlib.pyplot as plt
fig, axs = plt.subplots(ncols=sample_size)
for ax, (img, pred, lbl) in zip(axs, wrong_train_sample):
    fig.set_size_inches(18, 4)
    fig.suptitle("Sample of wrong predictions in the training set", fontsize=20)
    ax.imshow(img[0], cmap="gray")
    ax.set_title("Predicted: {}\nActual: {}".format(pred, lbl))
    ax.xaxis.set_visible(False)
    ax.yaxis.set_visible(False)
fig, axs = plt.subplots(ncols=sample_size)
for ax, (img, pred, lbl) in zip(axs, wrong_val_sample):
    fig.set_size_inches(18, 4)
    fig.suptitle("Sample of wrong predictions in the validation set", fontsize=20)
    ax.imshow(img[0], cmap="gray")
    ax.set_title("Predicted: {}\nActual: {}".format(pred, lbl))
    ax.xaxis.set_visible(False)
    ax.yaxis.set_visible(False)