Micronaut Launch: The best way to get started.

Micronaut has launched a website that generates a Micronaut project for you, without installing the Micronaut CLI or SDK.

We have covered Micronaut in a couple of earlier blog posts; if you are new to it, you can start with those.

Micronaut is very similar to the Spring framework. Micronaut took its inspiration from Spring and most of the APIs are aligned, which is why adopting Micronaut is very easy for Spring developers. Just as we have start.spring.io to create a Spring or Spring Boot project, Micronaut has launched a similar website, Micronaut Launch.


As we know, we can generate a Micronaut project using the CLI, but we can get the same result from the website as well.

On the site we have different options; a few are listed below.

  1. Application type
  2. Java version
  3. Base Package
  4. Name of application
  5. etc

1. Application type:-

Here we specify which type of application we want: Application (web or any other application), CLI application, Serverless function, gRPC application, or Messaging application. The application type helps to organize the dependencies.

2. Java Version:-

Here we specify which JDK we want to develop our application on, e.g. Java 8, 11, 14, etc.

3. Base Package:-

Here we specify the base package of the application, under which we want to organise our classes and interfaces.

e.g com.techwasti.micronaut.demo

4. Name:-

Here we specify the name of the application.

e.g HelloworldLaunch.

5. Micronaut Version:-

Here we select the Micronaut version our application should be compatible with. The latest one at the time of writing this blog post is 2.0.0.

6. Language:-

Select the language in which you want to write your beautiful code. Right now Micronaut supports Java, Kotlin, and Groovy.

7. Build Tool:-

Select the build tool, either Maven or Gradle.

8. Test Framework:-

Here we can select a test framework from the list: JUnit, Spock, or KotlinTest.

9. Features:-

When you click on the features button, a popup will open.

In the popup, the features are organised into groups such as cache, config, database, etc.

10. Diff:-

This is an interesting option. It shows the changes that the selected features make compared with an application generated without any features selected.

11. Preview:-

Another great option this site provides is a preview of your project based on your selection.

The final option is to generate the project. Once you click on it, you get a zip file. After extracting the zip, you will get a structure like the one below.

Here we have a Dockerfile, a build file, and a .gitignore along with the source directory structure. Download and import this into any IDE (Eclipse, IntelliJ) and happy coding.

This is it for now. Let me know your findings on this, if any.

Micronaut with Graal native image example.

In the last couple of articles we have seen how to develop a simple Micronaut application and dockerize it. In this article, we are going to explore a Hello World Graal Micronaut application.

Here is the definition from Wikipedia. If you have come across this article, you are probably familiar with at least one of these topics.

GraalVM is a Java VM and JDK based on HotSpot/OpenJDK, implemented in Java. It supports additional programming languages and execution modes, like an ahead-of-time compilation of Java applications for fast startup and low memory footprint. The first production-ready version, GraalVM 19.0, was released in May 2019.

Let us start coding and simultaneously enjoy the topic.

Create a micronaut application using CLI:

$ mn create-app helloworld-graal --features=graal-native-image

Graal support is not added by default; we have to use the option --features=graal-native-image.

If you are using Java or Kotlin and IntelliJ IDEA make sure you have enabled annotation processing.

Now let us create one simple POJO class to hold Play name to make it simple.

import io.micronaut.core.annotation.Introspected;

// generates BeanIntrospection metadata at compile time
@Introspected
public class Play {

    private String name;

    public Play(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

The @Introspected annotation is used to generate BeanIntrospection metadata at compilation time. This information is used to render the POJO as JSON using Jackson without using reflection.

Now let us create the Singleton service class, which returns a play name at random.

(Note:- The play names are Marathi plays by the famous Pu La Deshpande)

import javax.inject.Singleton;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

@Singleton
public class PlayService {
    // create list of plays
    private static final List<Play> PLAYS = Arrays.asList(
            new Play("Tujhe Ahe Tujpashi"),
            new Play("Sundar Mee Honar"),
            new Play("Tee Phularani"),
            new Play("Teen Paishacha Tamasha"),
            new Play("Ek Jhunj Varyashi")
    );

    // to choose random play from PLAYS list
    public Play randomPlay() {
        return PLAYS.get(new Random().nextInt(PLAYS.size()));
    }
}

Now we need a controller to serve the request of random play name from service class.

import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;

@Controller
public class PlayController {

    private final PlayService playService;

    // the service is injected through the constructor
    public PlayController(PlayService playService) {
        this.playService = playService;
    }

    @Get("/randomplay")
    public Play randomPlay() {
        return playService.randomPlay();
    }
}

We created the controller, injected the service object using constructor injection, and mapped the GET method using @Get("/randomplay").

Now our application is ready. You can test it by executing the command below.

$ ./gradlew run


JSON output:

{
  "name": "Tee Phularani"
}


Let us create a Graal native image.

Micronaut supports Graal native images only for Java and Kotlin.

While creating the project we added --features=graal-native-image, which adds three important things.

  1. svm (Substrate VM) and graal dependencies in build.gradle.
compileOnly "org.graalvm.nativeimage:svm"
annotationProcessor "io.micronaut:micronaut-graal"

  2. A Dockerfile which can be used to construct the native image by executing docker-build.sh.

  3. A native-image.properties file in the resources directory.

Args = -H:IncludeResources=logback.xml|application.yml|bootstrap.yml \
       -H:Name=helloworld-graal \

This makes it very easy for a developer to create a native image inside Docker. Run the two commands below:

$ ./gradlew assemble
$ ./docker-build.sh

Once image is ready we can create a container to verify our understanding. 

$ docker run -p 8080:8080 helloworld-graal

To test the application you can use curl with time:

$ time curl localhost:8080/randomplay

That is it for now. You can compare the response time of the native image executable with that of the native image running in Docker.

Source code download or clone from github: https://github.com/maheshwarLigade/micronaut-examples/tree/master/helloworld-graal

Best resources to learn Go programming language!!

Golang, aka the Go programming language, is one of the fastest-growing and most loved programming languages.

If you think you have never used Go, directly or indirectly, then I think you are wrong. Have you heard of Docker, the containerization technology? If you use it, you are indirectly using the Go language on a day-to-day basis.

Docker is written in the Google Go programming language.

What is GoLang?

Go is an open-source language officially released by Google in 2009. It was developed by Robert Griesemer, Ken Thompson, and Rob Pike. It is a general-purpose programming language specially designed to build fast and scalable applications. It provides features like fast compilation, garbage collection, static typing with type inference, concurrency, standard libraries, and packages.

Let us take a tour of the best resources available to get started with this programming language.

1. Go Tour:-  

This is my favorite site to get started and get our hands dirty. This is the official Go Tour website: https://tour.golang.org. The best thing about this site is that the tour is also available offline, just by running go tool tour on your command line, if you have already installed Go locally. It provides an interactive tutorial where you can run your code snippets, and it gives you an overview of Go. The tour is organised into different sets of modules.

2. Go By Example:- 

Another effective way to start learning Go is by example. Go by Example is an online tutorial for learning Go through annotated example programs. Once you know the basics, go ahead and hit Go by Example (https://gobyexample.com). Start hacking on the examples and you will gain a moderate knowledge of Go.

3. Effective Go:-

This is another official resource for learning Go, and it is also available for free. Explore https://golang.org/doc/effective_go.html to learn more about the language. I found it very useful, especially because it is not just a syntax reference document but a more complete description of the Go features and constructs and how to use them effectively. This is where you will gain some level of expertise.


4. Golang Bootcamp:-

Golang Bootcamp is a mini book for getting started with Go. Hit this URL http://www.golangbootcamp.com/book/ to explore the book. It will open a window for you to start learning Go effectively. The best thing about this mini-book is that it lists the basic constructs and concepts, and all of them are linked to the Go playground.

5. Go-Playground:- 

Now you know the basics of the Go language and how to construct things. You do not need to install Go locally on your system to start: the online Go playground at https://play.golang.org/ lets you test your knowledge and constructs.

6. Go-Lang FAQ:-

The Go FAQ is a golden gate for understanding the core concepts and clarifying your biggest doubts. This is also an official website: https://golang.org/doc/faq.

7. Go-lang Bot:-

Golangbot is a fun and easy way to follow and learn Golang consistently and regularly. It can help you improve your coding and solve practical issues, with tutorials ranging from the basics of Golang to advanced topics. Here you will get a different learning experience.

Hit https://golangbot.com/ and work your way from hello world to complex programs, with quizzes too.

8. Tutorials Point:-

Tutorials Point is also one of the best resources to get familiar with Go. If you are an avid reader and learner, you probably already know Tutorials Point.


9. Go-Lang Tutorials:-

GoLang Tutorials offers some of the best free online lessons for learning Go. The lessons suit professionals as well as beginners and cover basic concepts, control flow, looping, interfaces, memory management, etc. The tutorials are organised into sections, and every section has examples. You can find them at golangtutorials.blogspot.com.

10. Reference Books:-

  1. Introducing Go by O’Reilly.
  2. Go in Action.
  3. Learning Functional Programming in Go.


These are my findings. Please let us know your own resources for learning Go: how you started, and which other resources you think are better for getting started.

CI and CD with GitHub Actions!!

GitHub Actions make it easy to automate all your software workflows, now with world-class CI/CD.


GitHub Actions is a tool to run workflows on any GitHub event.

Today is the era of DevOps, continuous integration, and continuous deployment. Every organization wants to become agile and to develop, build, and deploy features on a daily or even hourly basis. To do this, every enterprise uses its own set of tools to watch over source version control such as git, generate a build, and execute unit test cases followed by functional and integration test cases. After that, based on a threshold, it runs some monkey testing in a simulated environment, deploys to a lower environment, and then promotes the build.

The above is the general flow in any enterprise software. Nowadays everyone wants to fail fast, which has led to different tools such as Jenkins, Chef, Git, SonarQube, and many more. You have to choose the tools based on the coding language and deployment server, and if you are using Docker or containers, you need different tools again.

To make things simple yet powerful and efficient, GitHub came up with GitHub Actions. GitHub Actions features a powerful execution environment integrated into every step of your workflow. You can discover, create, and share actions to perform any job you'd like, and combine them to customize your workflow.

Build, test, and deploy your code right from GitHub. Make code reviews, branch management, and issue triaging work the way you want.

The best thing about this is that you can do it in your GitHub repository itself.

There are different sets of actions, such as assigning reviewers, reverting commits, merging, packaging, publishing, etc. For more details, visit the GitHub Marketplace (github.com), which lists actions to improve your workflow.

Some important points:-

  1. GitHub Actions support Node.js, Python, Java, Ruby, PHP, Go, Rust, .NET, and more.
  2. Save time with matrix workflows that simultaneously test across multiple operating systems and versions of your runtime.
  3. Run directly on a VM or inside a container. Use your own VMs, in the cloud or on-prem, with self-hosted runners. Hosted runners for every major OS.
  4. It’s one click to copy a link that highlights a specific line number to share a CI/CD failure. You will get live logs.
  5. Built-in secret store.
  6. Multi-container testing.
  7. Community-powered workflows.
  8. Write and Reuse the workflows.
  9. Built-in github package registry.
  10. Simple, pay-as-you-go pricing.


Free for open-source projects.


Actions allow us to easily test multiple versions of your project in parallel.

Getting started:-

Let us create a sample repository on GitHub. Go to that repository and, in the top section below your repository name, you will see "Actions" as a menu item. Click on it.

When you click on it you will see the Actions page, which shows a list of predefined workflow templates.

Choose the one which is suitable for your requirements and click on its "set up this workflow" button.

It will redirect you to the actual workflow page, where we have to define the workflow using "yml".

Define the workflow, add a workflow tool from the marketplace if required, and commit the code. If you want to preview your flow, click on the "Preview" button, which is next to the edit file tab. Once you commit, the workflow will appear under the "Actions" menu. Here we can see its status, and we can also define a new workflow. We may have different workflows based on the environment, like dev, QA, UAT, or PROD.

Check the status and enjoy coding.

Sample yml:-

jobs:
  test:   # job id is illustrative
    name: Test on node ${{ matrix.node_version }} and ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        node_version: [8, 10, 12]
        os: [ubuntu-latest, windows-latest, macos-latest]

    steps:
    # the @v1 tags are assumed here; pin whichever release you use
    - uses: actions/checkout@v1

    - name: Use Node.js ${{ matrix.node_version }}
      uses: actions/setup-node@v1
      with:
        version: ${{ matrix.node_version }}
    - name: npm install, build and test
      run: |
        npm install
        npm run build --if-present
        npm test
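
To get a feel for what the matrix in the sample does, here is a small Python sketch that expands the same two lists into the individual job combinations. It only illustrates the idea of a matrix build; it is not how GitHub implements it, and the function name expand_matrix is made up for this example.

```python
from itertools import product

# The two matrix axes from the sample workflow
node_versions = [8, 10, 12]
operating_systems = ["ubuntu-latest", "windows-latest", "macos-latest"]


def expand_matrix(node_versions, operating_systems):
    """Return one dict per (node version, OS) combination."""
    return [
        {"node_version": node, "os": os_name}
        for node, os_name in product(node_versions, operating_systems)
    ]


jobs = expand_matrix(node_versions, operating_systems)
for job in jobs:
    # Mirrors the job name template used in the workflow
    print("Test on node %s and %s" % (job["node_version"], job["os"]))
```

Three Node versions times three operating systems gives nine jobs, which can run in parallel.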


Automating your workflow with GitHub Actions (help.github.com)


Github Package Registry

Ship your software like a Pro!!!

GitHub recently announced GitHub Package Registry to publish and consume packages on GitHub: a one-stop solution for open source projects.

Why Github Package Registry?

GitHub Package Registry is a software package hosting service, similar to npmjs.org, rubygems.org, or hub.docker.com, that allows you to host your packages and code in one place. You can host software packages privately or publicly and use them as dependencies in your projects.

Over the last decade, we have been using GitHub to maintain open-source projects. There are millions of public and private repositories on GitHub. Software development is a collaborative activity; it is teamwork. Irrespective of the language, we have to publish source code as a bundle so that other users can consume it as a dependency, and to do this we have always relied on different registries such as Maven, Gradle, npm, Docker, etc. Now you can manage your source code as well as your different packages under one umbrella.

GitHub is committed to serving developers and giving them different tools to improve the developer experience.

It’s your code, your packages, and one login.

Developers collaborate in open source in one of two ways: they either commit code to a repository or import open source packages into their project. It is critical to find open source packages that we can trust and import into the dependency graph. We need someone we can rely on for that trust. While using open source packages we always consider different aspects such as trust, community, and support in terms of new features or compliance.

Github package Registry Goals:-

GitHub Package Registry launched with three main goals.

  1. Sharing:- You can share and manage your packages the same way you manage your code.
  2. Productivity:- Improve your productivity while managing the software development lifecycle.
  3. Trust:- Develop, maintain, and store your packages in the same secure environment with a single login.


GitHub Package Registry is free for all repositories during the beta. And it will always be free for public and open source repositories.

To explore more, please refer to this link.

Managing packages with GitHub Package Registry – GitHub Help (help.github.com)

Implementing neural networks using Gluon API

In the previous chapter, we discussed the basics of deep learning and gave an overview of the Gluon API and MXNet. This chapter explains how to use the Gluon API to create different neural networks.

The Gluon API is an abstraction over MXNet, the mathematical computation framework for deep learning. In the last chapter we discussed the different types of machine learning and the different algorithms that implement each method. In this chapter, we will look into linear regression, binary classification, and multiclass classification using Gluon.

Neural Network using Gluon:

Gluon takes a hybrid approach to deep learning programming. It supports both the symbolic and the imperative style of programming. There are different machine learning algorithms to address different problems. As we stated in the last chapter, an artificial neural network is a mathematical representation of the human brain. Because it is a mathematical computation, we need to do matrix or tensor manipulation, and for that we have the Gluon API and the NDArray API. An artificial neural network contains nodes, each node has a weight and a bias, and the data is transformed from layer to layer up to the output layer. The middle layers between the input layer and the output layer are called hidden layers. Let us explore:

Linear Regression:

Linear regression is a very basic algorithm in the field of machine learning. Everyone comes across this algorithm, whether a novice or an expert machine learning engineer or data scientist. Linear regression is categorized under supervised machine learning. As the name states, linear regression is used to identify the relationship between two continuous variables. In this case, there are two variables: one is the predictor (independent) variable and the other is the dependent (response) variable. We can model the relationship between the two variables by fitting them to a linear equation.

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0). This is the mathematical form.
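
To make the equation concrete, here is a minimal pure-Python sketch that fits a and b with the classic least-squares formulas. The function name fit_line and the four data points are made up for illustration; this is not the Gluon approach used later in the chapter.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope b = covariance(X, Y) / variance(X)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # the fitted line passes through the point of means
    a = mean_y - b * mean_x
    return a, b


# Illustrative points lying exactly on Y = 1 + 2X
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print("intercept a =", a, "slope b =", b)
```

With noisy real-world data the points will not lie exactly on a line, and the same formulas return the line with the smallest squared error.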

In the above diagram, you can see a linear equation fitted to the data points. The X-axis holds the independent variable and the Y-axis holds the dependent variable. The data points plotted in this diagram show the linear relationship between the two variables.

To understand this, let us take some real-life examples: predicting the sales of a product based on buying history, or predicting a house price based on the size of the house, the location of the property, amenities, demand, and historical records.

There are two different types of Linear regression.

  1. Simple Linear Regression:- In simple linear regression there are two variables: one dependent and one independent. X is used to predict the dependent variable Y, for example predicting the total fuel expense based on the distance in kilometers.
  2. Multiple (Multi-Variable) Linear Regression:- In multiple linear regression we have one dependent variable and two or more independent variables. In certain cases, two or more features can affect the dependent variable. For example, a predicted house price depends on the size of the house, its location, its construction year, etc. There are several independent variables (X1, X2, ...) used to predict Y.

When we were students, we used to predict our marks based on how well we had solved the paper. Let us take another small example. You are planning a road trip to Shimla (a city in India) with your two siblings, as you recently watched Tripling (an Indian web series). You start from Pune and the total distance you have to travel is 1790 km. It is a long journey, so you have to plan each and every expense, such as fuel, meals, and halts. You take a blank sheet of paper and note when to start and stop, how much fuel is required, and how much money to reserve for meals and hotel charges. Based on your car's mileage and the current fuel prices, you can predict the total you will pay for fuel. So it is a simple linear relationship between two variables: if I drive for 1790 km, how much will I pay for fuel? If you want to predict the overall expense of the trip, you can convert this simple linear regression into a multiple linear regression model by adding more independent variables, such as meal cost, lodging charges, other expenses, and historical data from previous trips.
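
The fuel part of the trip can be written as a tiny calculation. In the sketch below, the mileage (15 km per litre) and the fuel price (100 per litre) are hypothetical numbers chosen only to show that the cost grows linearly with distance.

```python
def fuel_cost(distance_km, mileage_km_per_litre, price_per_litre):
    """Fuel cost is linear in distance: cost = (price / mileage) * distance."""
    litres_needed = distance_km / mileage_km_per_litre
    return litres_needed * price_per_litre


# Hypothetical car and fuel price, not taken from real data
MILEAGE = 15.0   # km per litre (assumed)
PRICE = 100.0    # currency units per litre (assumed)

trip_cost = fuel_cost(1790, MILEAGE, PRICE)
print("Estimated fuel cost for 1790 km: %.2f" % trip_cost)
```

Here the slope of the line is price / mileage and the intercept is zero; meals and lodging would add further terms to the equation.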

This is how we can forecast the charges for the current trip and plan accordingly. The core idea is to obtain the line that best fits the data. Linear regression is one of the simplest and most popular methods in machine learning.

Linear regression using Gluon:

Linear regression is the entry pass to the journey of machine learning. It is a very straightforward problem, and we can solve it using the Gluon API. The linear equation is y = Wx + b, and the model learns the slope (W) and the bias (b) through a number of iterations. The target of each iteration is to reduce the loss between the actual y and the predicted y; to achieve this we modify W and b so that the inputs x give us the y we want. Let us take one small example and implement linear regression using the Gluon API. In this example, we are not developing everything from scratch; we will take advantage of the Gluon API for our implementation.

# let us import the required modules
import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon
# this is for neural network layers
from mxnet.gluon import nn, Trainer
# this is for data loading
from mxnet.gluon.data import DataLoader, ArrayDataset

In the above code block we have just imported the required modules. If you look carefully, the Gluon API is part of the mxnet package. We have imported nd (NDArray) for numerical tensor processing and autograd for automatic differentiation of a graph of NDArray operations. mxnet.gluon.data is the module containing the API that helps us load and process datasets, including common public datasets such as MNIST.

from mxnet.gluon import nn, Trainer

Gluon provides the nn API to define the different layers of a neural network, and the Trainer API helps us train the defined network. Data is an important part, so let us build the dataset.

We start by generating our dataset.

# set the context (device) for the data and the model
data_ctx = mx.cpu()
model_ctx = mx.cpu()
# to generate random data
number_inputs = 2
number_outputs = 1
number_examples = 10000
# the "true" function the model should learn
def real_fn(X):
    return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2
# generate 10000 random records and add a little noise to the labels
X = nd.random_normal(shape=(number_examples, number_inputs))
noise = 0.01 * nd.random_normal(shape=(number_examples,))
y = real_fn(X) + noise

The above code can generate the dataset for the problem.

Now data is ready, load the data using DataLoader API.

batch_size = 4
train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y),
                                      batch_size=batch_size, shuffle=True)

Let us build a neural network with two inputs and one output, which we define using nn.Dense(1, in_units=2). It is called a dense layer because every node in the input is connected to every node in the subsequent layer.

# dense layer with 2 inputs and 1 output
net = gluon.nn.Dense(1, in_units=2)
# print just weight and bias for neural network
print(net.weight)
print(net.bias)
# output of above print statements
# Parameter dense6_weight (shape=(1, 2), dtype=float32)
# Parameter dense6_bias (shape=(1,), dtype=float32)

The weight and bias printed here are not actually NDArrays. They are instances of the Parameter class. We use Parameter rather than NDArray for distinct reasons; for example, unlike an NDArray, a Parameter can be associated with multiple contexts. As we discussed in the first chapter, Block is the basic building block of a neural network in Gluon: a Block takes input and generates output. We can collect all parameters using net.collect_params(), irrespective of how complex the neural network is. This method returns a dictionary of parameters.

The next step is to initialize the parameters of the neural network. The initialization step is very important: once it is done, we can access the contexts and the parameter data, and we can feed data to the neural network.

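net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=model_ctx)
# Deferred initialization
example_data = nd.array([[4,7]])
# a first forward pass through the network
net(example_data)
# access the weight and bias data
print(net.weight.data())
print(net.bias.data())
# shape inference: the same layer defined without in_units
net = gluon.nn.Dense(1)
net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=model_ctx)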

Let us observe the difference between net = gluon.nn.Dense(1) and the first layer definition net = gluon.nn.Dense(1, in_units=2): when in_units is omitted, Gluon infers the shape of the parameters.

square_loss = gluon.loss.L2Loss()

Now we need to optimize the neural network. Instead of implementing stochastic gradient descent from scratch every time, we can reuse gluon.Trainer and pass it the parameter dictionary to optimize the network.

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.0001})

'sgd' selects the stochastic gradient descent implementation provided by Gluon, the learning rate is 0.0001, and we pass the dictionary of parameters to be optimized. Now we have the actual y and the predicted y, and we want to know how far the predicted y is from our generated y. The loss function measures the difference between these two values, and we use SGD to reduce this loss.
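
To see what SGD does under the hood, here is a minimal pure-Python sketch, independent of Gluon, that learns the same kind of function (2 * x1 - 3.4 * x2 + 4.2) one example at a time. The function name, learning rate, number of epochs, and dataset size are assumptions chosen for this illustration, not values from the Gluon example.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=20, seed=0):
    """Fit y = w.x + b with per-example SGD on the squared loss."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    b = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new random order
        for i in order:
            pred = sum(wj * xj for wj, xj in zip(w, xs[i])) + b
            err = pred - ys[i]
            # gradient of 0.5 * err**2 is err * x for w and err for b
            w = [wj - lr * err * xj for wj, xj in zip(w, xs[i])]
            b -= lr * err
    return w, b


# Synthetic data with the same form as real_fn, plus a little noise
data_rng = random.Random(42)
xs = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(500)]
ys = [2 * x1 - 3.4 * x2 + 4.2 + 0.01 * data_rng.gauss(0, 1) for x1, x2 in xs]

w, b = sgd_linear_regression(xs, ys)
print("learned weights:", w, "learned bias:", b)
```

The learned weights and bias should land close to 2, -3.4 and 4.2. Gluon's Trainer applies the same kind of update, but on mini-batches and with gradients computed by autograd.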

epochs = 10
loss_sequence = []
num_batches = number_examples / batch_size
for e in range(epochs):
    cumulative_loss = 0
    # inner loop
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(model_ctx)
        label = label.as_in_context(model_ctx)
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        # backpropagate and let the trainer update the parameters
        loss.backward()
        trainer.step(batch_size)
        cumulative_loss += nd.mean(loss).asscalar()
    print("Epoch %s, loss: %s" % (e, cumulative_loss / number_examples))
    # keep the loss of each epoch for plotting
    loss_sequence.append(cumulative_loss)

Let us visualize the learning loss.

# plot the convergence of the estimated loss function
# (%matplotlib inline is a Jupyter notebook magic)
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.figure(num=None,figsize=(8, 6))
plt.plot(loss_sequence)
# Adding some bells and whistles to the plot
plt.grid(True, which="both")
plt.xlabel('epoch',fontsize=14)
plt.ylabel('average loss',fontsize=14)

Plotting the learning curve shows how SGD learns the linear regression model. The graph indicates the average loss over each epoch, and the loss is reduced with each iteration.

Now our model is ready and everything is working as expected, but we need to do some sanity testing for validation purposes.

params = net.collect_params()
print('The type of "params" is a ',type(params))
# A ParameterDict is a dictionary of Parameter class objects
# we will iterate over the dictionary and print the parameters.
for param in params.values():
    print(param.name, param.data())
From this example, we can say that Gluon helps us to prototype quickly and easily.

In this example, we used a few APIs that help to build a neural network without writing everything from scratch. Gluon gives us a more concise way to express a model. The API is powerful enough to prototype and build models quickly and easily. We can use linear regression in many real-life scenarios:

  1. Predicting house prices
  2. Predicting weather conditions
  3. Predicting stock prices

These are just a few scenarios where you can apply linear regression to predict values. The predicted values in linear regression are continuous values.

Binary Classification:

In the above section, we explored linear regression with sample code. When we implemented linear regression the output was a continuous value, but there are real-life problems where we do not have continuous values and need classification instead: is an email spam or not, which party will be elected in the next elections, will a customer buy an insurance policy or not. The classification problem may be binary, or it may be multiclass, where you have more than two classes. In this type of problem, there are two or more output neurons. In a classification problem the predicted values are categorical. Logistic regression is the machine learning technique used to solve such classification problems. Basically, logistic regression is an algorithm to solve a binary classification problem.

Let us consider a problem where we provide an image as input to the neural network and the output is a label saying whether it is a dog (1) or not a dog (0). In supervised learning there are two types of problem: regression and classification. In regression problems the output is a real number, whereas in classification problems the output is categorical. There are different algorithms available to solve classification problems, such as support vector machines, discriminant analysis, naive Bayes, nearest neighbor, and logistic regression. Solving a classification problem means identifying which category a new observation belongs to.

In the above diagram, you can easily categorise the data into two classes: one is marked with circles and the other with crosses. This is called binary classification.

Binary classification using logistic regression:

Logistic regression is a very popular and powerful machine learning technique for solving classification problems. Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables. It answers questions like "how likely is it?". You may then ask: why not use linear regression? Suppose we have a tumor dataset where each sample is malignant or not, denoted by one or zero. If we use linear regression, we can construct a line that best fits the equation y = wx + b, and then decide, based on a threshold (e.g. 0.5), that all values to the left of the line are non-malignant and all values to the right are malignant. But what if there is an outlier, meaning some positive-class values fall among the negative class? We need a way to deal with outliers, and logistic regression gives us that power. Logistic regression does not try to predict a numeric value for a given set of inputs. Instead, the output is the probability that the given input point belongs to a certain category, and based on a threshold we can easily categorize the input observation. Logistic regression is a type of classification algorithm involving a linear discriminant. A linear discriminant means that the input space is separated into two regions by a linear boundary, so the model is able to differentiate between points belonging to different categories.
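
The idea of a probability plus a threshold can be sketched in a few lines of pure Python. The weight, the bias, and the two input values below are hypothetical; a trained model would learn the weight and bias from data.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Probability of class 1 for a linear model followed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def classify(weights, bias, features, threshold=0.5):
    """Turn the probability into a 0/1 label using the threshold."""
    return 1 if predict_probability(weights, bias, features) >= threshold else 0


# Hypothetical one-feature model (say, tumor size), not trained on real data
weights = [1.5]
bias = -3.0

small = classify(weights, bias, [1.0])   # z = -1.5, probability below 0.5
large = classify(weights, bias, [4.0])   # z = 3.0, probability above 0.5
print("small ->", small, "large ->", large)
```

Because the sigmoid flattens out for extreme inputs, a single far-away point changes the predicted probabilities much less than it would shift a straight regression line.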

Logistic regression technique is useful when several independent variables on a single outcome variable. Let us consider we are watching cricket world cup matches, we want to predicate whether the match will be getting scheduled or not based on weather conditions


In the above dataset, the output is yes (1) or no (0). The output is categorical with two classes, which is why this is known as binary classification.

Let us start with some code. For this example, we use the breast cancer dataset from scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html), which has 569 samples with 30 dimensions and two classes.

Import the required modules. We need the sklearn Python library, which ships with the breast cancer data built in; we can load this dataset and apply logistic regression for binary classification.

import mxnet as mx
from mxnet import gluon, autograd, ndarray
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Load the data set and use the pandas data frame to hold the data for further processing.

# the dataset is part of below module
from sklearn.datasets import load_breast_cancer
# load data
data = load_breast_cancer()
# use pandas data frame to hold the dataset
df = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X = data.data
# print first five records
print(df.head())
# display record shape means number of rows and columns
print(df.shape)
# number of dimensions
print(df.ndim)

Now the data is available, but it is in a human-readable format where the features sit on very different scales, which is not useful for training a neural network. Before we start training, we need to normalize the data. Here we use pandas to do it; we could also use gluon to normalize the dataset.

df_norm = (df - df.mean()) / (df.max() - df.min())
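The formula above is mean normalization: subtract the column mean and divide by the column range. As a small sketch of what it does to one column, here it is in plain Python on made-up values (the numbers are not taken from the breast cancer dataset):

```python
def mean_normalize(values):
    """Rescale a column as (x - mean) / (max - min)."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    return [(v - mean) / spread for v in values]

radius = [10.0, 15.0, 20.0, 30.0]   # made-up feature values
scaled = mean_normalize(radius)
print(scaled)
```

After scaling, the column is centred on zero and its range (max minus min) is exactly 1, so no single feature dominates the others just because of its units.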

Before training any machine learning algorithm, the critical part is the dataset. We need to split it into training and testing sets. Let us do that, using the normalized features.

X_train, X_test, y_train, y_test = train_test_split(df_norm, y, test_size = 0.2, random_state=12345)

Tuning the hyperparameters is another important aspect in training the artificial neural network.

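LEARNING_R = 0.001
EPOCHS = 150
BATCH_SIZE = 32  # assumed value: the data loaders below need a batch size, which the original does not define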

Let us prepare the data according to the gluon API so that we can feed it to the network and train. To do that we can use the mx.gluon.data module.

# np.asarray works whether the split returned a DataFrame or a NumPy array
train_dataset = mx.gluon.data.ArrayDataset(np.asarray(X_train, dtype='float32'), y_train)
test_dataset = mx.gluon.data.ArrayDataset(np.asarray(X_test, dtype='float32'), y_test)
train_data = mx.gluon.data.DataLoader(train_dataset,
                                      batch_size=BATCH_SIZE, shuffle=True)
test_data = mx.gluon.data.DataLoader(test_dataset,
                                     batch_size=BATCH_SIZE, shuffle=False)

Let us use gluon's plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers. It has predefined layers and containers such as Dense and Sequential.

net = gluon.nn.Sequential()
# Define the model architecture
with net.name_scope():
    net.add(gluon.nn.Dense(64, activation="relu"))
    net.add(gluon.nn.Dense(32, activation="relu"))
    net.add(gluon.nn.Dense(1, activation="sigmoid"))
# Initialize parameters of the model (default initializer; the original omits this call)
net.collect_params().initialize(ctx=mx.cpu())
# Add binary loss function, sigmoid binary cross entropy
# from_sigmoid=True because the last layer already applies the sigmoid
binary_cross_entropy = gluon.loss.SigmoidBinaryCrossEntropyLoss(from_sigmoid=True)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': LEARNING_R})

The neural network shown above contains three dense layers. The first two use 'relu' as the activation function; ReLU (rectified linear unit) is also known as the ramp function. A batch normalisation layer (gluon.nn.BatchNorm()) can optionally be inserted between the dense layers. The output layer uses 'sigmoid', a non-linear activation function with a characteristic S-shaped curve. For binary classification, the loss function we use is binary cross entropy. It measures the performance of a model whose output is a probability between 0 and 1. For N examples with labels y_i and predicted probabilities p_i, the binary cross entropy loss is:

L = -(1/N) * sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ]

We then use gluon.Trainer() to update the parameters and train the model.

Now it is time to train the model.
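As a rough illustration of how binary cross entropy behaves, the sketch below computes it by hand in plain Python. The function name and the sample probabilities are made up for this example; this is not gluon's implementation.

```python
import math

def binary_cross_entropy_by_hand(y_true, y_prob, eps=1e-12):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # made-up, mostly right and sure of it
unsure = [0.6, 0.4, 0.5, 0.5]      # made-up, hovering around 0.5

print(binary_cross_entropy_by_hand(labels, confident))
print(binary_cross_entropy_by_hand(labels, unsure))
```

Confident correct predictions give a small loss, while predictions near 0.5 are penalised more, which is exactly the signal the optimizer uses during training.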

for e in range(EPOCHS):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(mx.cpu()).astype('float32')
        label = label.as_in_context(mx.cpu()).astype('float32')
        with autograd.record(): # Start recording the derivatives
            output = net(data) # the forward iteration
            loss = binary_cross_entropy(output, label)
        loss.backward() # compute the gradients
        trainer.step(data.shape[0]) # update the parameters
        # Provide stats on the improvement of the model over each epoch
        curr_loss = ndarray.mean(loss).asscalar()
    if e % 20 == 0:
        print("Epoch {}. Current Loss: {}.".format(e, curr_loss))

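Look at the above loss function graph to see how the loss evolves during training. Let us now calculate the accuracy on the test data, thresholding the predicted probabilities at 0.5.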

y_pred_labels = (net(mx.nd.array(np.asarray(X_test, dtype='float32'))).asnumpy().ravel() > 0.5).astype(int)
print(accuracy_score(y_test, y_pred_labels))

This is a binary classification problem: we took the breast cancer dataset as input, and the output is one of two categories, malignant or benign.

Multiclass classification:

So far we have discussed the linear regression problem, where the output is a single real number, and then categorical problems, also known as classification problems. There are generally two types of classification problem.

  1. Binary Classification
  2. MultiClass Classification

A binary classification problem has two categories: an email is spam or not, a tumor is malignant or not, a cricket match will be played or not depending on the weather. In all these scenarios the output is one of two categories (yes/no). But there are real-life scenarios with more than two categories; those problems are called multiclass classification, also known as multinomial classification. In multiclass classification, we classify each observation into one of three or more classes. Don't confuse multiclass classification with multi-label classification, where a single observation can carry several labels at once.

Suppose you go into a grocery shop and stop at the fruit stall to buy some fruit. You pick up your phone and try your machine learning algorithm to identify a fruit based on color, shape, etc. It classifies images of fruits, which may be a banana, apple, orange, guava, and so on. We will use the same logistic regression algorithm to address this multiclass classification problem. Logistic regression is the classic algorithm for solving classification problems in supervised learning. As we have seen, binary classification is quite useful when we have a dataset with two categories, for example to predict spam vs. not spam, or cancer vs. no cancer. But it does not fit every problem. Sometimes each observation can belong to one of n classes. For example, an image might depict a lion, a cat, a dog, or a zebra.

Let us dive deeper into the multiclass classification problem. For this we will use the MNIST (Modified National Institute of Standards and Technology) dataset of handwritten digits. The MNIST dataset contains 60,000 training images and 10,000 testing images. It is a nice toy dataset for testing new ideas and is widely used as the hello world program of deep learning.

Let us get our hands dirty with a gluon multiclass classification implementation.
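The difference between multiclass and multi-label is easiest to see in how the targets are encoded. The following is a small sketch in plain Python; the class names and helper functions are hypothetical and used only for illustration.

```python
classes = ["apple", "banana", "orange"]   # hypothetical class list

def one_hot(label):
    """Multiclass target: exactly one class is switched on."""
    return [int(c == label) for c in classes]

def multi_hot(labels):
    """Multi-label target: any number of classes may be switched on."""
    return [int(c in labels) for c in classes]

print(one_hot("banana"))
print(multi_hot({"apple", "orange"}))
```

In the multiclass case the entries always sum to 1, while a multi-label target can have several entries set at the same time.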

from __future__ import print_function
import mxnet as mx
from mxnet import nd, autograd
from mxnet import gluon
import numpy as np

We import the modules we require: mxnet, gluon, ndarray, autograd for differentiation, and numpy.

Set the context. In all previous examples we used the CPU for simplicity; if you want to execute the code on a GPU, you have to install the GPU-enabled build of mxnet and set a GPU context (e.g. model_ctx = mx.gpu()).

data_ctx = mx.cpu()
model_ctx = mx.cpu()

For multiclass classification we are using the MNIST dataset. We do not explain the dataset in detail here; for more information see https://en.wikipedia.org/wiki/MNIST_database.

batch_size = 64
num_inputs = 784
num_outputs = 10
num_examples = 60000
def transform(data, label):
    return data.astype(np.float32)/255, label.astype(np.float32)
train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform),
                                      batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),
                              batch_size, shuffle=False)

This loads the dataset. The number of inputs is 784 (28 x 28 pixels) and the number of outputs is 10 (the digits 0, 1, ..., 9), with 60000 training examples and a batch size of 64. mx.gluon.data.vision.MNIST provides the MNIST dataset as part of the gluon API. For training and validation purposes the dataset is split into two parts: a training set and a testing set.

The data is loaded successfully; the next step is to define our model. Recall the earlier binary classification code, where we defined Dense layers by their number of outputs. gluon.nn.Dense(num_outputs) defines the layer with its output shape, and gluon infers the input shape from the input data.

net = gluon.nn.Dense(num_outputs)

Parameter initialization is the next step. When we register an initializer for the parameters, gluon does not yet know the shape of the input, because we have only specified the shape of the output. The parameters will actually get initialized during the first call to the forward method.

net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=model_ctx)

When you need the output as probabilities, the softmax cross entropy loss function is useful.

Softmax is an activation function that allows us to interpret the outputs as probabilities, while cross entropy is what we use to measure the error at the softmax layer.

Let us consider the softmax code snippet below.

# just for understanding.
def softmax(z):
    """Softmax function"""
    return np.exp(z) / np.sum(np.exp(z))

As the name suggests, the softmax function is a "soft" version of the max function. Instead of selecting the single maximum value, it spreads the total across all elements, with the maximal element getting the largest portion of the distribution; that is why it is well suited to producing probabilities. From the above code you can see that softmax takes an N-dimensional vector of real numbers as input and transforms it into a vector of real numbers in the range (0, 1) that sum to 1.
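One practical caveat with the plain formula is that exp() overflows for large inputs. A common remedy, sketched below in plain Python, is to subtract the maximum before exponentiating; this leaves the result unchanged because softmax only depends on differences between the inputs. The function name stable_softmax is my own, chosen for this illustration.

```python
import math

def stable_softmax(z):
    """Softmax that subtracts max(z) first to avoid overflow in exp()."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = stable_softmax([2.0, 1.0, 0.1])
large = stable_softmax([1000.0, 1001.0, 1002.0])  # plain exp(1000) would overflow
print(probs, sum(probs))
print(large)
```

The largest input still receives the largest share, every output lies between 0 and 1, and the outputs sum to 1.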

softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()

Now instantiate an optimizer, sgd (stochastic gradient descent), with learning rate 0.1.

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

Before training, we need a way to evaluate the model and measure its accuracy. To do this we use the MXNet built-in metric package. Because we initialized the model randomly, we should expect an initial accuracy in the ballpark of 0.10.
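Why roughly 0.10? With ten equally likely digits, a model that has learned nothing behaves much like a random guesser. The sketch below simulates pure random guessing in plain Python; it is an approximation of an untrained model, not the model itself.

```python
import random

random.seed(0)
num_classes = 10
num_samples = 10000

# random "true" labels and random guesses, both uniform over the 10 digits
labels = [random.randrange(num_classes) for _ in range(num_samples)]
guesses = [random.randrange(num_classes) for _ in range(num_samples)]

chance_accuracy = sum(g == y for g, y in zip(guesses, labels)) / num_samples
print(chance_accuracy)
```

The printed value lands close to 1/10, which is the baseline any trained model has to beat.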

def evaluate_accuracy(data_iterator, net):
    acc = mx.metric.Accuracy()
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(model_ctx).reshape((-1,784))
        label = label.as_in_context(model_ctx)
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]
# call the above function with test data
evaluate_accuracy(test_data, net)

Now execute the training loop for 10 epochs.

epochs = 10
moving_loss = 0.
for e in range(epochs):
    cumulative_loss = 0
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(model_ctx).reshape((-1,784))
        label = label.as_in_context(model_ctx)
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()  # compute the gradients
        trainer.step(data.shape[0])  # update the parameters
        cumulative_loss += nd.sum(loss).asscalar()
    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    print("Epoch %s. Loss: %s, Train_acc %s, Test_acc %s" % (e, cumulative_loss/num_examples, train_accuracy, test_accuracy))
# output
Epoch 0. Loss: 2.1415544213612874, Train_acc 0.7918833333333334, Test_acc 0.8015
Epoch 1. Loss: 0.9146347909927368, Train_acc 0.8340666666666666, Test_acc 0.8429
Epoch 2. Loss: 0.7468763765970866, Train_acc 0.8524333333333334, Test_acc 0.861
Epoch 3. Loss: 0.65964135333697, Train_acc 0.8633333333333333, Test_acc 0.8696
Epoch 4. Loss: 0.6039828490893046, Train_acc 0.8695833333333334, Test_acc 0.8753
Epoch 5. Loss: 0.5642358363191287, Train_acc 0.8760166666666667, Test_acc 0.8819
Epoch 6. Loss: 0.5329904221892356, Train_acc 0.8797, Test_acc 0.8849
Epoch 7. Loss: 0.5082313110192617, Train_acc 0.8842166666666667, Test_acc 0.8866
Epoch 8. Loss: 0.4875676867882411, Train_acc 0.8860333333333333, Test_acc 0.8891
Epoch 9. Loss: 0.47050906361341477, Train_acc 0.8895333333333333, Test_acc 0.8902

Visualize the prediction

import matplotlib.pyplot as plt
def model_predict(net,data):
    output = net(data.as_in_context(model_ctx))
    return nd.argmax(output, axis=1)
# let's sample 10 random data points from the test set
sample_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),
                              10, shuffle=True)
for i, (data, label) in enumerate(sample_data):
    data = data.as_in_context(model_ctx)
    print(data.shape)
    im = nd.transpose(data,(1,0,2,3))
    im = nd.reshape(im,(28,10*28,1))
    imtiles = nd.tile(im, (1,1,3))
    # show the ten digits side by side
    plt.imshow(imtiles.asnumpy())
    plt.show()
    pred = model_predict(net, data.reshape((-1,784)))
    print('model predictions are:', pred)
    break  # one batch of 10 samples is enough

# output of the above code snippet

(10, 28, 28, 1)
model predictions are: 
[3. 6. 7. 8. 3. 8. 1. 8. 2. 1.]
<NDArray 10 @cpu(0)>

From the output of the above program, we can see that our model is able to solve the multiclass classification problem. We solved it using multiclass logistic regression: a linear model followed by softmax. The softmax function forces the outputs into the range (0, 1), which allows us to interpret them as probabilities. Other common names for this model are softmax regression and multinomial regression. In the above example we used sgd (stochastic gradient descent), whose update step can be written as:

def SGD(params, lr):
    for param in params:
        param[:] = param - lr * param.grad
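To see this update rule at work without any framework, here is a minimal sketch in plain Python that fits a single weight w so that w * x matches y, visiting one example at a time. The toy data (y = 2x), the learning rate, and the helper name sgd_step are assumptions made for this illustration.

```python
import random

def sgd_step(w, grad, lr):
    """One SGD update: move against the gradient, scaled by the learning rate."""
    return w - lr * grad

random.seed(0)
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]  # toy data, true weight is 2
w = 0.0
lr = 0.01

for epoch in range(30):
    random.shuffle(data)                  # "stochastic": random order each epoch
    for x, y in data:
        grad = 2.0 * x * (w * x - y)      # derivative of (w*x - y)**2 w.r.t. w
        w = sgd_step(w, grad, lr)

print(w)
```

Each step only looks at one example, yet the weight steadily approaches the true value of 2.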

Overfitting and regularization:


So far we have solved regression and classification problems on three different datasets and achieved approximately 90% accuracy on the testing dataset. Sometimes a model fits a limited set of data points too closely; we then say the model is overfitting. The above regression and classification algorithms work fine in the examples above, but on certain datasets they run into overfitting, which can cause them to perform very poorly. In this section, I would like to explain what the overfitting problem is and the regularization technique that allows us to reduce it and make the learning algorithm perform much better.

I find that this joke from “Plato and a Platypus Walk Into a Bar” is the best analogy to explain the overfitting problem.

“A man tries on a made-to-order suit and says to the tailor, “I need this sleeve taken in! It’s two inches too long!”

The tailor says, “No, just bend your elbow like this. See, it pulls up the sleeve.”

The man says, “Well, okay, but now look at the collar! When I bend my elbow, the collar goes halfway up the back of my head.”

The tailor says, “So? Raise your head up and back. Perfect.”

The man says, “But now the left shoulder is three inches lower than the right one!”

The tailor says, “No problem. Bend at the waist way over to the left and it evens out.”

The man leaves the store wearing the suit, his right elbow crooked and sticking out, his head up and back, all the while leaning down to the left. The only way he can walk is with a choppy, atonic walk.

The suit fits that man perfectly, but it has been overfitted. It would be useful neither to him nor to anyone else.

Overfitting and underfitting are also known as overtraining and undertraining. Overfitting occurs when an algorithm captures the noise in the data. Underfitting occurs when the model does not fit the data well enough. Not every algorithm that performs well on training data will also perform well on test data. We can identify overfitting and underfitting using a validation set and cross-validation. Both overfitting and underfitting lead to poor predictions on new observations.

Underfitting occurs if the model shows high bias and low variance. Overfitting occurs if the model shows high variance and low bias. If we have too many features, the learned model may fit the training set very well but fail to predict new observations.

Let us revisit our MNIST dataset and see how things can go wrong.
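A tiny experiment can make the high-variance case tangible. The sketch below, in plain Python, compares a model that memorises the training set (1-nearest-neighbour) with a simple rule on noisy toy data. The data generator, noise level, and model names are all assumptions for this illustration and have nothing to do with MNIST.

```python
import random

random.seed(1)

def make_data(n):
    """x in [0, 1); true class is x > 0.5, with about 20% of labels flipped."""
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.2:
            y = 1 - y          # label noise
        data.append((x, y))
    return data

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

train, test = make_data(50), make_data(1000)

def memorizer(x):
    """Copies the label of the closest training point, noise included."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def simple_rule(x):
    return int(x > 0.5)

print("memorizer   train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("simple rule train/test:", accuracy(simple_rule, train), accuracy(simple_rule, test))
```

The memorizer is perfect on the data it has seen because it has also learned the noise, and it loses that advantage on new data; the simple rule is imperfect on the training set but typically holds up better on the test set.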

from __future__ import print_function
import mxnet as mx
import mxnet.ndarray as nd
from mxnet import autograd
import numpy as np
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
ctx = mx.cpu() 
# load the MNIST data set and split it into the training and testing
mnist = mx.test_utils.get_mnist()
num_examples = 1000
batch_size = 64
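# keep only the first num_examples samples so that overfitting shows up quickly
train_data = mx.gluon.data.DataLoader(
    mx.gluon.data.ArrayDataset(mnist["train_data"][:num_examples],
                               mnist["train_label"][:num_examples].astype(np.float32)),
                               batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(
    mx.gluon.data.ArrayDataset(mnist["test_data"][:num_examples],
                               mnist["test_label"][:num_examples].astype(np.float32)),
                               batch_size, shuffle=False)

We are using a linear model with softmax. Allocate the parameters and define the model.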

# weight
W = nd.random_normal(shape=(784,10))
# bias
b = nd.random_normal(shape=10)
params = [W, b]
for param in params:
    param.attach_grad()  # allocate space for the gradients
def net(X):
    y_linear = nd.dot(X, W) + b
    yhat = nd.softmax(y_linear, axis=1)
    return yhat

Define the loss function to calculate the average loss, and the optimizer to minimize it. We have already seen the cross entropy loss function and SGD in multiclass classification.

# cross entropy 
def cross_entropy(yhat, y):
    return - nd.sum(y * nd.log(yhat), axis=0, exclude=True)
# stochastic gradient descent 
def SGD(params, lr):
    for param in params:
        param[:] = param - lr * param.grad
def evaluate_accuracy(data_iterator, net):
    numerator = 0.
    denominator = 0.
    loss_avg = 0.
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(ctx).reshape((-1,784))
        label = label.as_in_context(ctx)
        label_one_hot = nd.one_hot(label, 10)
        output = net(data)
        loss = cross_entropy(output, label_one_hot)
        predictions = nd.argmax(output, axis=1)
        numerator += nd.sum(predictions == label)
        denominator += data.shape[0]
        loss_avg = loss_avg*i/(i+1) + nd.mean(loss).asscalar()/(i+1)
    return (numerator / denominator).asscalar(), loss_avg

Plot the loss function and visualize the model using matplotlib.

def plot_learningcurves(loss_tr,loss_ts, acc_tr,acc_ts):
    xs = list(range(len(loss_tr)))
    f = plt.figure(figsize=(12,6))
    fg1 = f.add_subplot(121)
    fg2 = f.add_subplot(122)
    fg1.set_title('Comparing loss functions')
    fg1.semilogy(xs, loss_tr)
    fg1.semilogy(xs, loss_ts)
    fg1.legend(['training loss', 'testing loss'],fontsize=14)
    fg2.set_title('Comparing accuracy')
    fg2.plot(xs, acc_tr)
    fg2.plot(xs, acc_ts)
    fg2.legend(['training accuracy', 'testing accuracy'],fontsize=14)

Let us iterate.

epochs = 1000
moving_loss = 0.
niter = 0
loss_seq_train = []
loss_seq_test = []
acc_seq_train = []
acc_seq_test = []

for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx).reshape((-1,784))
        label = label.as_in_context(ctx)
        label_one_hot = nd.one_hot(label, 10)
        with autograd.record():
            output = net(data)
            loss = cross_entropy(output, label_one_hot)
        loss.backward()
        SGD(params, .001)
        # Keep a moving average of the losses
        niter +=1
        moving_loss = .99 * moving_loss + .01 * nd.mean(loss).asscalar()
        est_loss = moving_loss/(1-0.99**niter)
    test_accuracy, test_loss = evaluate_accuracy(test_data, net)
    train_accuracy, train_loss = evaluate_accuracy(train_data, net)
    # save them for later
    loss_seq_train.append(train_loss)
    loss_seq_test.append(test_loss)
    acc_seq_train.append(train_accuracy)
    acc_seq_test.append(test_accuracy)

    if e % 100 == 99:
        print("Completed epoch %s. Train Loss: %s, Test Loss %s, Train_acc %s, Test_acc %s" %
              (e+1, train_loss, test_loss, train_accuracy, test_accuracy))

## Plotting the learning curves
plot_learningcurves(loss_seq_train, loss_seq_test, acc_seq_train, acc_seq_test)
# output
Completed epoch 100. Train Loss: 0.5582709927111864, Test Loss 1.4102623425424097, Train_acc 0.862, Test_acc 0.725
Completed epoch 200. Train Loss: 0.2390711386688053, Test Loss 1.2993220016360283, Train_acc 0.94, Test_acc 0.734
Completed epoch 300. Train Loss: 0.13671867409721014, Test Loss 1.2758532278239725, Train_acc 0.971, Test_acc 0.748
Completed epoch 400. Train Loss: 0.09426628216169773, Test Loss 1.2602066472172737, Train_acc 0.989, Test_acc 0.758
Completed epoch 500. Train Loss: 0.05988468159921467, Test Loss 1.2470015566796062, Train_acc 0.996, Test_acc 0.764
Completed epoch 600. Train Loss: 0.043480587191879756, Test Loss 1.2396155279129744, Train_acc 0.998, Test_acc 0.762
Completed epoch 700. Train Loss: 0.032956544135231525, Test Loss 1.234715297818184, Train_acc 0.999, Test_acc 0.764
Completed epoch 800. Train Loss: 0.0268415825557895, Test Loss 1.2299001738429072, Train_acc 1.0, Test_acc 0.768
Completed epoch 900. Train Loss: 0.022739565349183977, Test Loss 1.2265239153057337, Train_acc 1.0, Test_acc 0.77
Completed epoch 1000. Train Loss: 0.019902906555216763, Test Loss 1.2242997065186503, Train_acc 1.0, Test_acc 0.772

From the above graph, you can easily see how the model is performing. From the above output, you can see that by the 800th epoch the model reaches 100% accuracy on the training dataset, yet it classifies only about 77% of the test examples correctly and gets about 23% wrong. This is clearly high variance, which means overfitting.

Methods to avoid overfitting:

  1. Cross-Validation
  2. Drop out
  3. Regularization


In the above section, we were able to identify the problem of overfitting. Now we know the problem and its causes, so let us talk about the solution. With regularisation, we keep all the features but reduce the magnitude of the parameters. Regularisation keeps the weights small, which keeps the model simpler and helps avoid overfitting. An overfitted model is less accurate on new data.

Suppose we have a linear regression that predicts y from many inputs x.

y = a1x1 + a2x2  + a3x3 + a4x4 + a5x5.....

In the above equation, a1, a2, ... are the coefficients and x1, x2, ... are the independent variables used to predict the dependent variable y.

“Regularisation means generalize the model for the better.”

“Mastering the trade-off between bias and variance is necessary to become a machine learning champion.”

Regularization is a technique to discourage the complexity of the model (reduce the magnitude of its weights). It does this by penalizing the loss function. What is meant by penalizing the loss function? A penalty on the weights is added to the loss, which pushes the weights to be small, in some cases almost zero. Terms whose weights are near zero become almost negligible, which helps us simplify the model.

Here the loss function is the sum of the squared differences between the predicted and actual values, plus a penalty on the coefficients:

Loss = sum_i (y_i - yhat_i)^2 + λ * (penalty on the coefficients a1, a2, ...)

λ is the regularization parameter, which determines how much to penalize the weights; the right value of λ lies somewhere between 0 (zero) and a large value.

There are a few regularisation techniques.
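As a small worked example of a penalized loss, the sketch below uses the squared (L2) penalty in plain Python. The data, coefficients, and function names are invented for illustration; the coefficients are chosen so that the data is fitted exactly, which isolates the effect of the penalty.

```python
def predict(coeffs, x):
    """Linear model: y = a1*x1 + a2*x2 + ..."""
    return sum(a * xi for a, xi in zip(coeffs, x))

def regularized_loss(coeffs, X, y, lam):
    """Sum of squared errors plus lam times the sum of squared coefficients."""
    squared_error = sum((predict(coeffs, xi) - yi) ** 2 for xi, yi in zip(X, y))
    penalty = lam * sum(a ** 2 for a in coeffs)
    return squared_error + penalty

# made-up data that the coefficients [1, 2] fit exactly
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
y = [5.0, 4.0, 9.0]
coeffs = [1.0, 2.0]

print(regularized_loss(coeffs, X, y, lam=0.0))   # only the data term
print(regularized_loss(coeffs, X, y, lam=0.1))   # data term + penalty
```

With λ = 0 the loss is just the fitting error; with λ > 0 even a perfect fit pays a price proportional to the size of its coefficients, which is what nudges the optimizer towards smaller weights.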

  1. L1 Regularization or Lasso Regularization
  2. L2 Regularization or Ridge Regularization
  3. Dropout
  4. Data Augmentation
  5. Early stopping

We will solve the above overfitting problem using the L2 regularisation technique.

Let us implement it.

First, a function that penalizes the coefficients:

# penalizes the coefficients
def l2_penalty(params):
    penalty = nd.zeros(shape=1)
    for param in params:
        penalty = penalty + nd.sum(param ** 2)
    return penalty

Reinitialize the parameters, for good measure.

for param in params:
    param[:] = nd.random_normal(shape=param.shape)

L2 regularised logistic regression,

The L2 regularization term is the sum of the squares of all the feature weights, scaled by the regularization strength: λ * sum_j (w_j)^2. L2 regularization performs better when all the input features influence the output and all the weights are of approximately equal size.
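It can help to look at what the L2 term alone does during a gradient step. Its gradient with respect to each weight is 2 * λ * w, so every update shrinks each weight by a constant factor. The following plain Python sketch shows this on made-up weights; the helper names and values are assumptions for illustration only.

```python
def l2_value(weights):
    """Sum of squared weights (the L2 penalty without the strength factor)."""
    return sum(w ** 2 for w in weights)

def l2_gradient(weights, strength):
    """Gradient of strength * sum(w**2) with respect to each weight."""
    return [2.0 * strength * w for w in weights]

weights = [4.0, -2.0, 0.5]   # made-up starting weights
lr, strength = 0.1, 0.5
before = l2_value(weights)

for step in range(3):
    grads = l2_gradient(weights, strength)
    weights = [w - lr * g for w, g in zip(weights, grads)]

print(weights)               # each weight multiplied by 0.9 three times
print(before, l2_value(weights))
```

Every step multiplies each weight by (1 - 2 * lr * strength), here 0.9, which is why L2 regularization is also described as weight decay.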

Let us implement this L2 regularisation.

epochs = 1000
moving_loss = 0.
niter = 0
l2_strength = .1
loss_seq_train = []
loss_seq_test = []
acc_seq_train = []
acc_seq_test = []

for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx).reshape((-1,784))
        label = label.as_in_context(ctx)
        label_one_hot = nd.one_hot(label, 10)
        with autograd.record():
            output = net(data)
            loss = nd.sum(cross_entropy(output, label_one_hot)) + l2_strength * l2_penalty(params)
        loss.backward()
        SGD(params, .001)
        # Keep a moving average of the losses
        niter +=1
        moving_loss = .99 * moving_loss + .01 * nd.mean(loss).asscalar()
        est_loss = moving_loss/(1-0.99**niter)

    test_accuracy, test_loss = evaluate_accuracy(test_data, net)
    train_accuracy, train_loss = evaluate_accuracy(train_data, net)
    # save them for later
    loss_seq_train.append(train_loss)
    loss_seq_test.append(test_loss)
    acc_seq_train.append(train_accuracy)
    acc_seq_test.append(test_accuracy)
    if e % 100 == 99:
        print("Completed epoch %s. Train Loss: %s, Test Loss %s, Train_acc %s, Test_acc %s" %
              (e+1, train_loss, test_loss, train_accuracy, test_accuracy))

## Plotting the learning curves
plot_learningcurves(loss_seq_train, loss_seq_test, acc_seq_train, acc_seq_test)

Let us look at the graph for a better understanding. From the graph, you can easily see the difference between the training loss and the testing loss, and how much closer the two are this time.


This chapter gave a bit of insight into the gluon API and ndarray, along with some of the built-in neural network modules from gluon. Having completed this chapter, you now know how to create a simple artificial neural network using the gluon abstraction, and when to use regression and when to use classification, with some real-world datasets.

As machine learning developers, a major problem we face is overfitting and underfitting, and this chapter gave us regularisation as a tool to address overfitting. Gluon is a very concise, powerful abstraction that helps us design, prototype, build, deploy, and test machine learning models on GPU and CPU. We now know how to set the context (GPU, CPU). We have solved classification problems, both binary and multiclass, using the logistic regression technique. Let us move on to the next adventure.

CNN (Convolutional Neural network) using Gluon



A convolutional neural network (CNN) is a deep learning network that has achieved excellent results in image recognition, image classification, object detection, face recognition, etc. CNNs are everywhere and are among the most popular deep learning architectures. CNNs are mainly used for image data and also for video analytics. Any data that has spatial relationships is ripe for applying a CNN.

In the previous chapter, we covered basic machine learning techniques for solving regression and classification problems. In this chapter, we will explore a deep learning architecture, the CNN (Convolutional Neural Network). CNNs are a biologically inspired variant of MLPs (multilayer perceptrons). A CNN is also known as a ConvNet; in this chapter we will use the two terms interchangeably. We will explore the points below.

  • Introduction of CNN
  • CNN architecture
  • Gluon API for CNN
  • CNN implementation with gluon
  • Image segmentation using CNN

CNN Architecture:

CNNs are a regularised version of multilayer perceptrons. MLPs are fully connected neural networks, meaning each neuron in one layer is connected to all neurons in the next layer. The design of CNNs is inspired by the vision processing of living organisms. Without conscious effort, we make predictions about everything we see and act upon them. When we see something, we label every object based on what we have learned in the past.

In the 1950s and 1960s, Hubel and Wiesel showed how a cat’s visual cortex works. The animal visual cortex is the most powerful visual processing system in existence. The visual cortex contains a complex arrangement of cells. These cells are sensitive to small sub-regions of the visual field, called receptive fields. The sub-regions are tiled to cover the entire visual field. The cells act as local filters over the input space and are well suited to exploit the strong spatially local correlation present in natural images. This is just a high-level introduction to how the cortex works. A CNN is designed to recognize visual patterns directly from pixel images with minimal preprocessing.

Now let us make things simple and think about how our brain works; the human brain is a very powerful machine. Everyone’s brain works differently, and we all have our own ways of learning and taking in new information. “A picture is worth a thousand words” is an English-language adage. It refers to the notion that a complex idea can be conveyed with just a single picture, which conveys its meaning or essence more effectively than a description does. We see plenty of images every day; our brain processes and stores them. But what about a machine? How can a machine understand, process, and store meaningful insight from an image? In simple terms, each image is an arrangement of pixels in a particular order. If the order or the color of some pixels changes, the image changes as well. So in a machine, images are represented and processed in the form of pixels. Before CNNs came along, image processing was very hard. Scientists around the world have been trying for more than 60 years to find ways to make computers extract meaning from visual data (images, video), and the history of computer vision (CV) is deeply fascinating.

The most fascinating paper was published in 1959 by two neurophysiologists, David Hubel and Torsten Wiesel, whom I mentioned above; it was titled “Receptive fields of single neurons in the cat’s striate cortex”. The duo ran some pretty elaborate experiments on a cat. They placed electrodes into the primary visual cortex area of an anesthetized cat’s brain and observed, or at least tried to, the neuronal activity in that region while showing the animal various images. Their first efforts were fruitless; they couldn’t get the nerve cells to respond to anything. After a few months of research, they noticed, rather accidentally, that one neuron fired as they were slipping a new slide into the projector. Hubel and Wiesel realized that what got the neuron excited was the movement of the line created by the shadow of the sharp edge of the glass slide.
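The idea of a cell acting as a local filter over a small receptive field maps directly onto the sliding-window operation used in CNNs. Below is a minimal sketch in plain Python; the tiny image, the 2x2 vertical-edge kernel, and the function name are made up for illustration, and the operation is the cross-correlation that deep learning libraries commonly call convolution.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# 4x4 toy image: dark (0) on the left, bright (1) on the right
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# responds where brightness increases from left to right
kernel = [[-1, 1],
          [-1, 1]]

for row in correlate2d(image, kernel):
    print(row)
```

The filter only ever sees a 2x2 patch at a time, and its response is non-zero only in the column where the dark-to-bright edge sits, much like a cell that fires for an oriented edge in its receptive field.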

[Image Source: https://commons.wikimedia.org/wiki/File:Human_visual_pathway.svg]

The researchers established, through their experimentation, that there are simple and complex neurons in the primary visual cortex and that visual processing always starts with simple structures such as oriented edges. This is the simplified, familiar explanation. Inventions do not happen overnight; it took years of evolutionary progress to get to a groundbreaking result.

After Hubel and Wiesel, nothing groundbreaking happened with their idea for a long time. In 1982, David Marr, a British neuroscientist, published another influential work, “Vision: A computational investigation into the human representation and processing of visual information”. Marr gave us the next important insight: vision is hierarchical. He introduced a framework for vision in which low-level algorithms that detect edges, curves, corners, etc. are used as stepping stones towards a high-level understanding of the image.

David Marr’s representational framework:

  • A Primal Sketch of an image, where edges, bars, boundaries, etc., are represented (inspired by Hubel and Wiesel’s research);
  • A 2½D sketch representation where surfaces, information about depth and discontinuities on an image are pieced together;
  • A 3D model that is hierarchically organized in terms of surface and volumetric primitives.

Marr’s framework was very abstract and high-level, and it came with no mathematical model that could be used in artificial learning. It was a hypothesis. Around the same time, the Japanese computer scientist Kunihiko Fukushima also developed a framework inspired by Hubel and Wiesel: a self-organizing artificial network of simple and complex cells that could recognize patterns and was unaffected by position shifts. This network, the Neocognitron, included several convolutional layers whose receptive fields had weights. Fukushima’s Neocognitron is often regarded as the first deep neural network and as the grandfather of today’s convnets. A few years later, in 1989, the French scientist Yann LeCun applied a backpropagation-style learning algorithm to Fukushima’s Neocognitron architecture. After some more trial and error, LeCun released LeNet-5. He applied his architecture in a commercial product for reading zip codes. Around 1999, many scientists and researchers moved away from analysing visual data with Marr’s proposed method and turned to feature-based object recognition instead.

This is just a brief overview of the important milestones, but it helps us understand how CNNs evolved. Let us now talk about the CNN architecture. Like every artificial neural network, it has an input layer, hidden layers, and an output layer. The hidden layers consist of a series of convolutional layers that convolve their input with a multiplication or other dot product. CNNs are a specialized kind of neural network for processing data that has a grid-like topology. Time series data can be thought of as a one-dimensional grid (vector) of samples taken at regular time intervals, while image data can be thought of as a 2-D grid of pixels (matrix). The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution. Depending on whether we are looking at a black and white or a color image, each pixel in the 2-D grid has either one or several numerical values. CNN-based architectures now dominate the field of computer vision to such a degree that hardly anyone these days would develop a commercial application or enter a competition or hackathon related to image recognition, object detection, or semantic segmentation without basing their approach on them. Many modern CNNs owe their designs to inspiration from biology. CNNs offer strong predictive performance and tend to be computationally efficient, because they are easy to parallelize and have far fewer parameters than dense layers. If we used a fully connected neural network for image recognition, we would need a huge number of parameters. Consider an image of size 28*28*3: every single neuron in the first hidden layer would need 28*28*3 = 2352 weights, and a model with that many parameters overfits easily. That is why we do not use a fully connected neural network to process image data.
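To make the parameter arithmetic concrete, here is a small sketch. The hidden-layer width of 100 neurons and the bank of 32 filters of size 3×3 are illustrative assumptions, not values from the text; only the 28*28*3 image size comes from the paragraph above.

```python
# Weights needed by ONE fully connected neuron that sees a whole 28x28x3 image.
height, width, channels = 28, 28, 3
weights_per_neuron = height * width * channels
print(weights_per_neuron)  # 2352

# Assumed hidden-layer width, for illustration only.
hidden_neurons = 100
dense_weights = weights_per_neuron * hidden_neurons
print(dense_weights)  # 235200

# A convolutional layer reuses the same small filters at every position.
# Assumed: 32 filters of size 3x3 over the same 3 input channels.
num_filters, kernel_h, kernel_w = 32, 3, 3
conv_weights = num_filters * channels * kernel_h * kernel_w
print(conv_weights)  # 864
```

The dense layer's weight count grows with the image size, while the convolutional layer's count depends only on the filter size and the number of channels.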

In a convolutional neural network, a neuron in a layer is connected only to a small region of the layer before it, instead of to all the neurons as in a fully connected network.

The above figure shows the general architecture of CNNs. A CNN is a type of feed-forward artificial neural network in which the connectivity pattern between the neurons is inspired by the animal visual cortex. The basic idea is that some neurons in the cortex fire when exposed to horizontal edges, some fire when exposed to vertical edges, and others fire when exposed to diagonal edges. This is the motivation behind the connectivity pattern.

In general, a CNN has four types of layers.

  1. Convolution layer
  2. Max Pooling layer
  3. ReLU layer
  4. Fully connected

The main problem with image data is that the same object will not always look the same; there can be deformations. Think about how a child recognizes objects. We show a child a black dog and tell him that this is a dog. The next day, when some other black, four-legged animal comes to our house, he calls it a dog, although it is actually a goat. Similarly, an algorithm has to see many samples to find the common pattern that identifies an object. We have to show millions of pictures to an algorithm; with the help of these millions of records it can generalize from the inputs and make predictions for new observations.

Machines see in a different way than humans do. Their world consists of only 0s and 1s. CNNs have a different architecture than regular artificial neural networks. In a regular fully connected neural network, we pass the input through a series of hidden layers and reach the fully connected output layer that represents the predictions. CNNs follow a slightly different approach. All the layers of a CNN are organized in 3 dimensions: width, height, and depth. The neurons in one layer do not connect to all the neurons in the next layer but only to a small portion of it, and the output is reduced to a single vector of probability scores, organized along the depth dimension. The figure below illustrates NN (neural network) vs CNN.

As we said earlier, the output can be a single class or a probability distribution over the classes that best describe the image. Now, the hard part is understanding what each of these layers does. Let us understand this.

CNNs have two components

  1. Feature extraction part (the hidden layers): The hidden layers perform a series of convolution and pooling operations during which the features are detected. If you had a picture of a human face, this is the part where the network would recognize two eyes, a nose, lips, etc.
  2. Classification part (the fully connected output layers): As we said, the last layers are fully connected and serve as a classifier on top of the extracted features.

Convolution layer:

The convolution layer is the main building block of a CNN. As we said, convolution refers to the combination of two mathematical functions to produce a third function. Convolution is performed on the input data with the use of filters or kernels (people use the two terms interchangeably). The filters are applied over the input data to produce a feature map by sliding them over the input. At every location, an element-wise multiplication between the filter and the underlying patch is performed, and the sum of the results goes into the feature map.

Note that in the above example the image is 2-dimensional, with width and height (a black and white image). If the image is colored, it has one more dimension for the RGB color channels. For that reason, the filter for a black and white image is 2-D, while the filter for a colored image is 3-D, with a depth that matches the number of channels. Let us start with a (5*5) input image with no padding and a (3*3) convolution filter. In the first step, the filter is placed over the top-left corner of the matrix and each element of the filter is multiplied with the element of the image at the corresponding location. Then you sum all the results, which gives one output value. Next, you move the filter by one column and repeat the process to get the second output. The step size with which the filter slides across the image is called the stride. In this example, the stride is 1. The same operation is repeated to get the third output, and so on. A stride greater than 1 always downsizes the image. With a stride of 1, the output keeps the size of the input only if the input is padded; without padding, a (3*3) filter turns the (5*5) image into a (3*3) output. The operation above is shown in 2D, but in real-life applications convolutions are mostly performed on a 3D volume with width, height, and depth. Depth is a dimension because of the color channels used in an image (red, green, blue).
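The sliding-window procedure described above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function name `conv2d` and the pixel and filter values are assumptions chosen for the example, and, like most deep learning libraries, it does not flip the kernel.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (lists of lists) without padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply element by element and sum into one output value.
            total = 0
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output

# Illustrative 5x5 input and 3x3 filter (values are made up).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]

strided = conv2d(image, kernel, stride=2)
print(strided)  # [[4, 4], [2, 4]]
```

With stride 1 the 5×5 input gives a 3×3 feature map; with stride 2 the filter skips every other position and the result is only 2×2.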

We perform a number of convolutions on our input matrix, each with a different kernel (filter), and store the results in feature maps. All the feature maps are stacked together as the final output of the convolutional layer. CNNs use ReLU as the activation function, and the output of the convolution is passed through it. As mentioned earlier, the convolution filter slides over the input matrix. The stride is the size of the step the filter moves each time. A stride of 1 is the most common choice, meaning the filter slides pixel by pixel.

The animation above shows a stride of 1. With a larger stride, the filter slides over the input with a larger gap and thus has less overlap between the cells. Without padding, the feature map is always smaller than the input matrix, so it shrinks with every convolution. To prevent this shrinking we use padding. Padding means that a layer of zero-valued pixels is added around the input. Padding makes sure the kernel and stride fit the input and keeps the spatial size constant after the convolution, which also helps performance.
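The effect of stride and padding on the feature map size follows a standard formula, output = (input + 2 * padding - kernel) // stride + 1. A small sketch, where the helper name `conv_output_size` is hypothetical:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# 5x5 input, 3x3 filter, no padding: the feature map shrinks.
print(conv_output_size(5, 3))  # 3

# One ring of zero padding keeps the size constant.
print(conv_output_size(5, 3, padding=1))  # 5

# A stride of 2 downsizes the output further.
print(conv_output_size(5, 3, stride=2))  # 2
```

The same formula applies independently to width and height.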

Max Pooling layer:

After the convolution operation comes the pooling layer. Max pooling is a sample-based discretization process. As you can see in the first diagram, every convolution layer is followed by a max pooling layer. Max pooling helps to control overfitting and shortens the training time. The pooling function progressively reduces the dimensionality, which reduces the number of parameters and the amount of computation in the network. Max pooling is done by applying a max filter to (usually) non-overlapping subregions of the initial representation. It also provides basic translation invariance to the internal representation.

Let’s say we have a 4×4 matrix representing our initial input.
Let’s say, as well, that we have a 2×2 filter that we’ll run over our input. We’ll have a stride of 2 (meaning the (dx, dy) for stepping over our input will be (2, 2)) and won’t overlap regions. For each of the regions represented by the filter, we will take the max of that region and create a new, output matrix where each element is the max of a region in the original input.

Max Pooling takes the maximum value in each window. These window sizes need to be specified beforehand. This decreases the feature map size while at the same time keeping the significant information.
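The 4×4 example with a 2×2 window and a stride of 2 can be sketched as follows. The function name `max_pool2d` and the matrix values are assumptions made for illustration.

```python
def max_pool2d(matrix, size=2, stride=2):
    """Take the maximum of each size x size window of a 2-D list."""
    out_h = (len(matrix) - size) // stride + 1
    out_w = (len(matrix[0]) - size) // stride + 1
    return [
        [
            max(
                matrix[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Illustrative 4x4 input (values are made up).
matrix = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]

pooled = max_pool2d(matrix)
print(pooled)  # [[6, 5], [7, 9]]
```

Each of the four non-overlapping 2×2 regions contributes exactly one value, so the 4×4 input becomes a 2×2 output.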

ReLU layer:

The Rectified Linear Unit (ReLU) has become very popular in the last few years. ReLU is an activation function (also known as a transfer function), just like the other activation functions used in different artificial neural networks. It is currently the most widely used activation function, since it appears in almost all convolutional neural networks and deep learning models.

The ReLU function is f(z) = max(0, z). As you can see, the ReLU is half rectified (from the bottom): f(z) is zero when z is less than zero, and f(z) is equal to z when z is above or equal to zero.

The ReLU’s range is from 0 to infinity. ReLUs improve neural networks by speeding up training. ReLU is idempotent: applying it twice gives the same result as applying it once. ReLU is the function max(x, 0) with input x, e.g. a matrix from a convolved image. It sets all negative values in the matrix x to zero and keeps all other values unchanged. ReLU is applied after the convolution and is therefore a nonlinear activation function, like tanh or sigmoid. Each activation function takes a single number and performs a fixed mathematical operation on it. In simple words, the rectifier removes all the negative (black) elements from an image, keeping only the positive values. Any positive value is returned unchanged, whereas an input of 0 or a negative value is returned as 0. ReLU allows your model to account for non-linearities and interactions well. In the Gluon API we can use the built-in ReLU implementation.

net.add(gluon.nn.Dense(64, activation="relu"))

A simple implementation of the ReLU function looks like this.

# rectified linear function
def rectified(x):
  return max(0.0, x)

Fully connected layer:

The fully connected layer is a regular fully connected neural network layer. It is also referred to as the classification layer. After the convolutional, ReLU, and max-pooling layers, the classification part consists of a few fully connected layers. Fully connected layers can only accept 1-dimensional data. To convert our 3-D data to 1-D, we flatten it. This essentially arranges our 3-D volume into a 1-D vector.

This layer returns the output, which is a probability value for each class.
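As a rough sketch of these two steps, the snippet below flattens a small 3-D volume and turns raw scores into probabilities. The use of softmax for the final step is an assumption here (it matches the softmax layer mentioned for AlexNet later on); the helper names and the numbers are made up for illustration.

```python
import math

def flatten(volume):
    """Flatten a nested list of shape (channels, height, width) into 1-D."""
    return [value for channel in volume for row in channel for value in row]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# A tiny 2 x 2 x 2 volume (values are made up).
volume = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
vector = flatten(volume)
print(vector)  # [1, 2, 3, 4, 5, 6, 7, 8]

# Raw scores for three hypothetical classes.
probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)
```

The class with the highest raw score also gets the highest probability, and the probabilities add up to 1.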

Types of CNN Architectures:

In the above section, we explained the general CNN architecture, but there are different flavors of CNN based on different combinations of layers. Let us explore some useful and famous CNN architectures that were built to solve complex problems. CNNs are designed to recognize visual patterns from pixel images with minimal preprocessing. The ImageNet project is a large visual database designed for object recognition research. The project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where programmers and researchers compete to correctly detect objects. In this section, we explore the CNN architectures of the top ILSVRC competitors.

Let us look at this picture; it gives a broad overview of how the evolution happened.

1. LeNet-5 – LeCun et al

LeNet-5 is a 7-layer convolutional neural network by LeCun et al. from 1998. It was deployed in real-life banking applications to recognize handwritten digits on cheques, digitized as 32×32 pixel greyscale input images. Processing higher-resolution images requires larger and more numerous convolutional layers, so the technique is constrained by the availability of computing resources. At that time, computational capacity was limited, and hence the technique did not scale to large images.

2. AlexNet — Krizhevsky et al

AlexNet is a convolutional neural network by Krizhevsky et al. from 2012. It significantly outperformed all prior competitors and won the ILSVRC challenge by reducing the top-5 error from 26% to 15.3%. The network was very similar to LeNet but much deeper, with more filters per layer, and had around 60 million parameters.

It consisted of 11×11, 5×5, and 3×3 convolutions, max pooling, dropout, data augmentation, ReLU activations, and SGD with momentum. A ReLU activation is attached after every convolutional and fully connected layer except the last softmax layer. The figure certainly looks a bit scary. This is because the network was split into two halves, each trained simultaneously on a different GPU. AlexNet was trained for 6 days on two Nvidia GeForce GTX 580 GPUs. AlexNet was designed by the SuperVision group, consisting of Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever. A simpler picture:

AlexNet consists of 5 convolutional layers and 3 fully connected layers. These 8 layers, combined with two concepts that were new at that time, max pooling and ReLU activation, gave the model its edge.

3. ZFNet –

The ILSVRC 2013 winner was also a CNN, known as ZFNet. It achieved a top-5 error rate of 14.8%, which is already about half of the non-neural error rate mentioned earlier. The authors achieved this by tweaking the hyperparameters of AlexNet while maintaining the same structure, together with deep learning elements such as dropout, data augmentation, and stochastic gradient descent with momentum.

4. VGGNet — Simonyan et al

The runner-up of the 2014 ILSVRC challenge is named VGGNet. Because of the simplicity of its uniform architecture, it appeals as a simpler form of a deep convolutional neural network. VGGNet was developed by Simonyan and Zisserman. Its best-known variant, VGG-16, consists of 16 weight layers (13 convolutional and 3 fully connected). The architecture is very similar to AlexNet, but uses only 3×3 convolutions with lots of filters. VGGNet was trained on 4 GPUs for 2–3 weeks. It has 138 million parameters, which can be a bit challenging to handle. Since the weight configuration is publicly available, this network has been used in many other applications and challenges as a baseline feature extractor and is one of the most popular choices for extracting features from images.

VGGNet has 2 simple rules

  1. Each convolutional layer has the configuration kernel size = 3×3, stride = 1×1, padding = same. The only thing that differs is the number of filters.
  2. Each max pooling layer has the configuration window size = 2×2 and stride = 2×2. Thus, we halve the size of the image at every pooling layer.
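The two rules make the spatial size easy to follow: the convolutions with "same" padding leave it unchanged and every pooling layer halves it. A small sketch, assuming a 224×224 input and 5 pooling layers (both values are assumptions for illustration, and the helper name is hypothetical):

```python
def vgg_spatial_sizes(input_size, num_pools):
    """Spatial size after each 2x2, stride-2 max pooling layer.

    Convolutions with 3x3 kernels, stride 1 and 'same' padding keep the
    size unchanged, so only the pooling layers need to be tracked.
    """
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // 2)
    return sizes

sizes = vgg_spatial_sizes(224, 5)
print(sizes)  # [224, 112, 56, 28, 14, 7]
```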

5. GoogLeNet/Inception –

The winner of the 2014 ILSVRC competition was GoogLeNet (Inception v1), which achieved a top-5 error rate of 6.67%. GoogLeNet used an inception module, a novel concept, with smaller convolutions that allowed the number of parameters to be reduced to a mere 4 million. This result was very close to human-level performance, which the organizers of the challenge were now forced to evaluate. GoogLeNet was inspired by LeNet but implemented a novel element nicknamed the inception module. It used batch normalization, image distortions, and RMSprop.

There are two diagrams which are here to understand and visualize GoogleNet very well.

6. ResNet – Kaiming He et al

The 2015 ILSVRC competition brought about a top-5 error rate of 3.57%, which is lower than the human top-5 error on this dataset. It was achieved by the ResNet (Residual Network) model of Kaiming He et al. The network introduced a novel approach called skip connections, which resemble the gated units used in recurrent networks. Thanks to this technique, the authors were able to train a neural network with 152 layers while still having lower complexity than VGGNet.

ResNet has residual connections. The idea came out as a solution to an observation: deep neural networks perform worse as we keep on adding layers. The observation brought about a hypothesis: direct mappings are hard to learn. So instead of learning the mapping between the input of a layer and its output, the network learns the difference between them, the residual.
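The residual idea can be sketched as "output = F(x) + x", where F is whatever the layers in the block compute. The sketch below is a heavily simplified assumption: it uses a single dense layer with ReLU in place of the convolutional layers of a real ResNet block, and all helper names are hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def linear(values, weights, bias):
    """Compute weights @ values + bias for plain Python lists."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, bias)
    ]

def residual_block(x, weights, bias):
    """Return F(x) + x, where F is a small learned transformation."""
    residual = relu(linear(x, weights, bias))       # F(x): the learned residual
    return [r + xi for r, xi in zip(residual, x)]   # skip connection adds x back

x = [1.0, -2.0, 3.0]

# With all-zero weights the block learns "nothing" and simply passes x through.
zero_weights = [[0.0] * 3 for _ in range(3)]
zero_bias = [0.0] * 3
out = residual_block(x, zero_weights, zero_bias)
print(out)  # [1.0, -2.0, 3.0]
```

Because the identity mapping comes for free through the skip connection, adding such a block cannot easily make the network worse, which is the intuition behind training very deep networks this way.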

The Residual neural network uses 1×1 convolutions to increase and decrease the dimensionality of the number of channels.

CNN using Gluon:

As part of this example, we explore the MNIST data set using a CNN. This is a great example to get our hands dirty with the Gluon API and build CNNs. There are four important parameters we always have to consider while building any CNN.

  1. The kernel size
  2. The filter count (i.e how many filters do we want to use)
  3. Stride (how big steps of the filters)
  4. Padding

Let us dive into MNIST and recognize handwritten digits with a CNN built using the Gluon API.

To start with the example, we need the MNIST data set and have to import some Python and Gluon modules.

import mxnet as mx
import numpy as np
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn
# Select a fixed random seed for reproducibility (the value 42 is arbitrary)
mx.random.seed(42)
def data_xform(data):
    """Move channel axis to the beginning, cast to float32, and normalize to [0, 1]."""
    return nd.moveaxis(data, 2, 0).astype('float32') / 255
# Download MNIST (if needed) and apply the transform to the images only
train_data = mx.gluon.data.vision.MNIST(train=True).transform_first(data_xform)
val_data = mx.gluon.data.vision.MNIST(train=False).transform_first(data_xform)

The above code downloads the MNIST data set to the default location (this could be .mxnet/datasets/mnist/ in the home directory) and creates two Dataset objects, the training data set (train_data) and the validation data set (val_data); we need both for training and validation. We use the transform_first() method to move the channel axis of the images to the beginning ((28, 28, 1) → (1, 28, 28)), cast them to float32, and rescale them from [0, 255] to [0, 1]. The MNIST dataset is very small, which is why we can load it into memory.

set the context

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu(0)

Then we need data loaders for the training and validation data sets with batch size 1; the training set is shuffled and the validation set is not.

conv_layer = nn.Conv2D(kernel_size=(3, 3), channels=32, in_channels=16, activation='relu')

This defines the convolutional layer. In this example we are working with 2-D data, so this is a 2-D convolution with the ReLU activation function. A CNN is a more structured weight representation. Instead of connecting all inputs to all outputs, a convolution connects each output only to a small local region of the input.

# Accuracy metric and loss function used during training and validation
metric = mx.metric.Accuracy()
loss_function = gluon.loss.SoftmaxCrossEntropyLoss()

We are using softmax cross-entropy as a loss function.

lenet = nn.HybridSequential(prefix='LeNet_')
with lenet.name_scope():
    lenet.add(
        # Feature extraction: two conv + max-pooling stages
        nn.Conv2D(channels=20, kernel_size=(5, 5), activation='tanh'),
        nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
        nn.Conv2D(channels=50, kernel_size=(5, 5), activation='tanh'),
        nn.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
        # Flatten the 3-D volume before the fully connected layers
        nn.Flatten(),
        nn.Dense(500, activation='tanh'),
        nn.Dense(10, activation=None),
    )

Filters in the early layers can learn to detect small local structures like edges, whereas later layers become sensitive to more and more global structures. Since images often contain a rich set of such features, it is customary to have each convolution layer employ and learn many different filters in parallel, so as to detect many different image features on their respective scales. The above code defines a CNN architecture called LeNet. The LeNet architecture is a popular network known to work well on digit classification tasks. We use a version that differs slightly from the original in its usage of tanh activations instead of sigmoid.

Likewise, the input can already have multiple channels. In the conv_layer example above, the convolution layer takes an input image with 16 channels and maps it to an image with 32 channels by convolving each of the input channels with a different set of 32 filters and then summing over the 16 input channels. Therefore, the total number of filter parameters in the convolution layer is channels * in_channels * prod(kernel_size), which amounts to 4608 in the above example. Another characteristic feature of CNNs is the usage of pooling, which means summarizing patches into a single number. This step lowers the computational burden of training the network, but the main motivation for pooling is the assumption that it makes the network less sensitive to small translations, rotations, or deformations of the image. Popular pooling strategies are max-pooling and average-pooling, and they are usually performed after convolution.
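The parameter formula can be checked with a few lines of Python. The helper name `conv_weight_count` is hypothetical; the numbers are the ones from the conv_layer example (32 output channels, 16 input channels, a 3×3 kernel). Bias terms are left out, as in the formula.

```python
from math import prod

def conv_weight_count(channels, in_channels, kernel_size):
    """Number of filter weights: channels * in_channels * prod(kernel_size)."""
    return channels * in_channels * prod(kernel_size)

count = conv_weight_count(channels=32, in_channels=16, kernel_size=(3, 3))
print(count)  # 4608
```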

lenet.initialize(mx.init.Xavier(), ctx=ctx)
lenet.summary(nd.zeros((1, 1, 28, 28), ctx=ctx))

The summary() method can be a great help here. It requires the network parameters to be initialized and an input array from which it infers the sizes.

        Layer (type)                                Output Shape         Param #
               Input                              (1, 1, 28, 28)               0
        Activation-1                <Symbol eNet_conv0_tanh_fwd>               0
        Activation-2                             (1, 20, 24, 24)               0
            Conv2D-3                             (1, 20, 24, 24)             520
         MaxPool2D-4                             (1, 20, 12, 12)               0
        Activation-5                <Symbol eNet_conv1_tanh_fwd>               0
        Activation-6                               (1, 50, 8, 8)               0
            Conv2D-7                               (1, 50, 8, 8)           25050
         MaxPool2D-8                               (1, 50, 4, 4)               0
           Flatten-9                                    (1, 800)               0
       Activation-10               <Symbol eNet_dense0_tanh_fwd>               0
       Activation-11                                    (1, 500)               0
            Dense-12                                    (1, 500)          400500
            Dense-13                                     (1, 10)            5010
Parameters in forward computation graph, duplicate included
   Total params: 431080
   Trainable params: 431080
   Non-trainable params: 0
Shared params in forward computation graph: 0
Unique parameters in model: 431080
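The shapes and parameter counts in this summary can be reproduced by hand. The following sketch walks through the LeNet layers defined above using the standard output-size formula; the helper name `conv_out` is hypothetical, and each layer's count includes one bias per output channel or unit.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size, channels, params = 28, 1, 0

# Conv2D: 20 filters of 5x5 (weights + biases)
params += 20 * channels * 5 * 5 + 20        # 520
size, channels = conv_out(size, 5), 20      # 28 -> 24
size = conv_out(size, 2, stride=2)          # MaxPool2D: 24 -> 12

# Conv2D: 50 filters of 5x5
params += 50 * channels * 5 * 5 + 50        # 25050
size, channels = conv_out(size, 5), 50      # 12 -> 8
size = conv_out(size, 2, stride=2)          # MaxPool2D: 8 -> 4

# Flatten, then Dense(500) and Dense(10)
flat = channels * size * size               # 50 * 4 * 4 = 800
params += flat * 500 + 500                  # 400500
params += 500 * 10 + 10                     # 5010

print(flat)    # 800
print(params)  # 431080
```

The flattened size of 800 and the total of 431080 parameters match the Flatten-9 row and the "Total params" line of the summary.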

First conv + pooling layer in LeNet.

Now we train LeNet with a learning rate of 0.04. Note that it is advisable to use a GPU if possible, since this model is computationally demanding to evaluate and train.

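# Data loaders; batch size 1 follows the text above (larger batches train much faster)
batch_size = 1
train_loader = gluon.data.DataLoader(train_data, shuffle=True, batch_size=batch_size)
val_loader = gluon.data.DataLoader(val_data, shuffle=False, batch_size=batch_size)
trainer = gluon.Trainer(
    params=lenet.collect_params(),
    optimizer='sgd',  # assumed optimizer; the original listing does not show it
    optimizer_params={'learning_rate': 0.04},
)
metric = mx.metric.Accuracy()
num_epochs = 10
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        inputs = inputs.as_in_context(ctx)
        labels = labels.as_in_context(ctx)
        with autograd.record():
            outputs = lenet(inputs)
            loss = loss_function(outputs, labels)
        # Backpropagate and update the weights
        loss.backward()
        metric.update(labels, outputs)
        trainer.step(batch_size=inputs.shape[0])
    name, acc = metric.get()
    print('After epoch {}: {} = {}'.format(epoch + 1, name, acc))
    metric.reset()
# Evaluate on the validation set
for inputs, labels in val_loader:
    inputs = inputs.as_in_context(ctx)
    labels = labels.as_in_context(ctx)
    metric.update(labels, lenet(inputs))
print('Validation: {} = {}'.format(*metric.get()))
assert metric.get()[1] > 0.985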

Let us look at where the network goes wrong by visualizing some wrong predictions on the training and validation sets.

def get_mislabeled(loader):
    """Return list of ``(input, pred_lbl, true_lbl)`` for mislabeled samples."""
    mislabeled = []
    for inputs, labels in loader:
        inputs = inputs.as_in_context(ctx)
        labels = labels.as_in_context(ctx)
        outputs = lenet(inputs)
        # Predicted label is the index is where the output is maximal
        preds = nd.argmax(outputs, axis=1)
        for i, p, l in zip(inputs, preds, labels):
            p, l = int(p.asscalar()), int(l.asscalar())
            if p != l:
                mislabeled.append((i.asnumpy(), p, l))
    return mislabeled
import numpy as np
sample_size = 8
wrong_train = get_mislabeled(train_loader)
wrong_val = get_mislabeled(val_loader)
wrong_train_sample = [wrong_train[i] for i in np.random.randint(0, len(wrong_train), size=sample_size)]
wrong_val_sample = [wrong_val[i] for i in np.random.randint(0, len(wrong_val), size=sample_size)]
import matplotlib.pyplot as plt
fig, axs = plt.subplots(ncols=sample_size)
for ax, (img, pred, lbl) in zip(axs, wrong_train_sample):
    fig.set_size_inches(18, 4)
    fig.suptitle("Sample of wrong predictions in the training set", fontsize=20)
    ax.imshow(img[0], cmap="gray")
    ax.set_title("Predicted: {}\nActual: {}".format(pred, lbl))
fig, axs = plt.subplots(ncols=sample_size)
for ax, (img, pred, lbl) in zip(axs, wrong_val_sample):
    fig.set_size_inches(18, 4)
    fig.suptitle("Sample of wrong predictions in the validation set", fontsize=20)
    ax.imshow(img[0], cmap="gray")
    ax.set_title("Predicted: {}\nActual: {}".format(pred, lbl))

Overview of Deep learning with Gluon

This chapter introduces fundamental concepts and jargon that every machine learning engineer and data scientist should know. We will discuss some basic concepts of machine learning, deep learning, and AI. Then, in the subsequent chapters of this book, we will dive deeper and get our hands dirty. Deep learning (DL) has been a breakthrough technology for all industries and a booster for AI adoption.

Andrew Ng once said Artificial Intelligence is the new electricity!

AI, DL, and ML are often used interchangeably, but there is a substantial difference between the three. We will start with a brief definition of each one. This chapter covers the basics of machine learning, deep learning, and AI, along with some foundational terminology needed to understand deep learning; then we will take a glance at the Gluon API. We will also cover some parts of the MXNet deep learning framework along with the Gluon API. This book is for any technical person who wants to get up to speed on machine learning and deep learning quickly, and for anyone who is a novice to the technology but curious about how machines think and act. In this book, we dig deeper into deep learning neural networks using the Gluon API, with MXNet as the underlying deep learning framework. Gluon is packaged along with MXNet and is an abstraction layer over the Apache MXNet deep learning framework. Gluon is named after a subatomic particle: a gluon is an elementary particle that acts as the exchange particle for the strong force between quarks. This book is for data scientists, machine learning engineers, and aspiring data scientists.

This chapter contains below points,

  • Artificial Intelligence
  • Machine learning
  • Deep learning
  • Neural Network Architectures
  • Gluon API overview and environment setup

Artificial Intelligence(AI)

Artificial intelligence is where a machine can think, act, fail, learn, and react without human intervention. Artificial intelligence is the hype in the industry right now, and there are tons of articles available. They teach us, make us dream about the future, and scare us as well, but above all AI is a revolutionary technology. The progress made in the last couple of years has been remarkable, thanks to innovation in computation power and the vast amount of data available. At the very highest level, AI is about creating machines capable of solving problems like a human. As humans, we learn through reasoning, intuition, cognitive thinking, and creativity. There are several definitions of AI floating around; my favorite one is “the science and engineering of making intelligent machines”.

The history of AI:-

During the Second World War, the Germans built the Enigma machine to send military messages securely.
Alan Turing and his team built a machine that was used to decipher Enigma messages.
Cracking the Enigma code by hand was very challenging due to the number of possible permutations and combinations. The question of whether machines can think and act like a human goes back much further than that. In the early days of AI, machines were able to solve problems that were difficult for humans or to take over mundane industrial work.
There are different aspects of human intelligence and of AI. We simply want to learn how to mimic humans and build an intelligent machine.

In 1956, American computer scientist John McCarthy organized the Dartmouth Conference, at which the term ‘Artificial Intelligence’ was first coined. Researchers Allen Newell and Herbert Simon were instrumental in promoting AI as a field of computer science that could transform the world. McCarthy, often called the father of AI, developed the LISP programming language, which became important in AI. In 1951, a machine known as the Ferranti Mark 1 successfully used an algorithm to master checkers. Subsequently, Newell and Simon developed the General Problem Solver algorithm to solve mathematical problems. It was also in the late 1960s that the first mobile decision-making robot capable of various actions was made. Its name was Shakey.
Shakey could create a map of its surroundings prior to moving. The first ‘intelligent’ humanoid robot was built in Japan in 1972. In the early days of AI, researchers believed AI would be able to solve problems with a hard-coded, rule-based system such as a decision tree.
This kind of system is known as symbolic AI. It was very successful at solving well-defined logical problems, but it failed on complex problems such as natural language understanding, image detection, scene understanding, object detection, and time-based forecasting.
Over decades of well-funded global efforts, researchers found it incredibly difficult to create intelligent machines, for reasons such as the lack of computing power and the lack of data.
In 1997, IBM’s Deep Blue became the first computer to beat a reigning world chess champion, Garry Kasparov. AI technology continued its march, largely thanks to improvements in computer hardware, and people applied AI methods to narrow domains instead of general intelligence, which helped researchers solve some complex problems.
Exponential gains in computer processing power and storage allowed companies to store vast quantities of data. Today’s AI touches almost every aspect of human life, from the military and entertainment to our cell phones and driverless cars, from real-time voice translation to a vacuum that knows where and how to clean our floor without us, from our own computer to the doctor’s office. Autonomous (driverless) cars and facial recognition for authentication are examples. So where is AI going in the future? Is it scary or not? No one can tell you for sure.

AI-powered machines are usually classified into two groups: general and narrow. Narrow AI machines can perform specific tasks very well, sometimes better than humans.
The technology used for classifying images on Airbnb is an example of narrow AI.
AI, ML, and DL fit together: deep learning is a subset of machine learning, which in turn is a subset of AI.

Machine learning:-

Machine learning is a branch of computer science that deals with methods and techniques for building algorithms that learn from data. Machine learning is inferential learning from a descriptive data set.
This is the era of data mining. Data is the fuel of the 21st century. If you have data (fuel), you can develop an AI system that electrifies your business. In traditional programming we have data and rules, and we expect some result from them. This is one of the paradigms we follow as programmers. Suppose we want to write a program to convert a temperature from Fahrenheit to Celsius. To do this we need the values in Fahrenheit and the formula for the conversion; with these we write a code snippet that fulfills the requirement, and the result of the snippet is a temperature in Celsius.
Machine learning shifts this paradigm: it takes data and answers as input and returns the rules as a result. Consider the Fahrenheit to Celsius program again, this time in an ML context. We provide both Fahrenheit and Celsius values and ask the ML program to find the relation between them, that is, to find the formula. This is just a simple example, but many more complex problems are addressed with the help of ML.
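To make the two paradigms concrete, here is a minimal sketch in plain Python. The first function is the traditional program, where we supply the formula. The second part only sees example pairs and recovers the rule with an ordinary least-squares line fit. The function names (fahrenheit_to_celsius, fit_line, learned_conversion) and the sample values are my own illustrative choices, and a line fit is just one simple way to "learn" this particular rule.

```python
# Traditional programming: data + rule -> answer.
def fahrenheit_to_celsius(f):
    return (f - 32.0) * 5.0 / 9.0

# Machine learning: data + answers -> rule.
# Here the "rule" is a straight line found by ordinary least squares.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Example pairs: the inputs and their known answers.
fahrenheit = [-40.0, 14.0, 32.0, 50.0, 98.6, 212.0]
celsius = [fahrenheit_to_celsius(f) for f in fahrenheit]

slope, intercept = fit_line(fahrenheit, celsius)

def learned_conversion(f):
    # The learned rule, built only from the example pairs.
    return slope * f + intercept

print("learned slope:", slope)          # close to 5/9
print("learned intercept:", intercept)  # close to -160/9
print("100 F is about", learned_conversion(100.0), "C")
```

The fitted slope and intercept match the textbook formula because the data is exactly linear; with noisy real-world data the learned rule would only approximate it.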

There are plenty of definitions and articles available on the internet that explain what machine learning is. When I searched Google and Wikipedia, this was the simplest definition of machine learning I came across.

“Machine learning gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). It is a subfield of computer science. The idea came from work in artificial intelligence. Machine learning explores the study and construction of algorithms which can learn and make predictions on data.”

A more engineering-oriented definition:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
 — Tom Mitchell, 1997

In the example above, the program should identify the relationship between the Fahrenheit and Celsius values by itself and act accordingly in the future, instead of being given an explicit formula for the conversion. Machine learning is not only a science but also an art. In machine learning, data is the challenging part. If we have data for training an algorithm, we also need a testing data set to validate that algorithm, so in ML we need two data sets: one to train the algorithm and another to test it.
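The two data sets are usually carved out of one collection of examples. The following is a small sketch of such a split in plain Python; the helper name train_test_split, the 80/20 ratio, and the fixed seed are assumptions made for illustration, not part of any specific library discussed here.

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle the samples and hold out a fraction of them for testing."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(samples)       # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# (Fahrenheit, Celsius) example pairs
pairs = [(f, (f - 32.0) * 5.0 / 9.0) for f in range(0, 100, 5)]

train_set, test_set = train_test_split(pairs)
print(len(train_set), "training samples,", len(test_set), "testing samples")
```

The model is fitted on train_set only; test_set is kept aside so that the validation measures how well the learned rule generalizes to examples it has never seen.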

Types of Machine learning:


1. Supervised Machine learning:

Supervised machine learning is the technique of inferring a rule from a labeled data set. It means learning with some amount of human supervision: we have input data along with an output label. In supervised learning, the data set comes with the expected output, also known as the label. From the data, the algorithm learns which output belongs to which input. Typical supervised learning addresses classification and regression problems, such as spam filtering and house price prediction. Training these systems usually requires a large data set.

Below are some supervised machine learning algorithms:

  • Linear Regression
  • Logistic Regression
  • Support Vector Machines (SVMs)
  • Decision Trees and Random Forests
  • k-Nearest Neighbors
  • Neural networks
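As a small illustration of learning from labeled data, here is a sketch of k-Nearest Neighbors, one of the algorithms listed above, in plain Python. The two-dimensional points, the "ham"/"spam" labels, and the function name knn_predict are made up for this example.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by a majority vote among its k nearest labeled points."""
    def squared_distance(item):
        (x, y), _label = item
        return (x - query[0]) ** 2 + (y - query[1]) ** 2

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _point, label in nearest)
    return votes.most_common(1)[0][0]

# Labeled data set: each input comes with its expected output (the label).
labeled = [
    ((1.0, 1.0), "ham"), ((1.5, 2.0), "ham"), ((2.0, 1.0), "ham"),
    ((8.0, 8.0), "spam"), ((9.0, 7.5), "spam"), ((8.5, 9.0), "spam"),
]

print(knn_predict(labeled, (1.2, 1.5)))
print(knn_predict(labeled, (8.8, 8.0)))
```

The first query lands among the "ham" points and the second among the "spam" points, so the votes go to those labels respectively.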

2. Unsupervised Machine learning:

Unsupervised machine learning is the technique of inferring a rule and finding meaningful patterns in a data set. In this type of machine learning, the data set consists of input data without labeled results. It is learning without supervision, or learning without a teacher. The data used to train an unsupervised algorithm is not annotated, which means only input values are provided. This technique is useful for grouping data (clustering) and for finding common patterns in it.

Some unsupervised algorithms:

  • Clustering:
    Hierarchical Cluster Analysis (HCA)
    Expectation Maximization
  • Visualization and dimensionality reduction:
    Principal Component Analysis (PCA) — Kernel PCA
    Locally-Linear Embedding (LLE)
    t-distributed Stochastic Neighbor Embedding (t-SNE)
  • Association rule learning
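To show what grouping unlabeled data looks like, here is a minimal sketch of k-means clustering on one-dimensional points in plain Python. k-means is a common clustering method, used here only as an illustration; the function name kmeans_1d, the points, and the starting centroids are assumptions for this example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group 1-D points around centroids; no labels are involved."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

The algorithm is never told which point belongs to which group; it discovers the two groups from the input values alone.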

3. Self-supervised learning:

Self-supervised learning is a fairly recent machine learning technique. It is supervised learning in which, instead of humans providing labeled data as input, the data set is labeled automatically. Self-supervised learning has the potential to solve problems that supervised learning does not address. As I mentioned earlier, the data set is the challenging part of machine learning, and providing a huge amount of labeled data is a demanding task.

Self-supervised learning is autonomous supervised learning. It is a representation learning approach that eliminates the need for humans to label data. It is also close to how humans learn: we learn a few things in a supervised manner and a few in an unsupervised way, but we learn from very few examples and generalize exceptionally well.

4. Reinforcement Learning:

Reinforcement learning is another machine learning technique. Have you ever visited a circus and seen the ringmaster train a tiger? The ringmaster rewards the tiger for positive behavior and punishes it for negative behavior. It is much like the way we learn in school.
In reinforcement learning, an agent learns how to behave in a particular environment: it is rewarded for positive behavior and punished for negative behavior.
Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. It is widely used in gaming. It is goal-oriented learning, in which an agent learns how to behave in the environment by performing actions and accumulating as much reward as possible on the way to the goal.
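A tiny sketch can make the agent, environment, action, and reward vocabulary concrete. The example below uses tabular Q-learning, one common RL algorithm, on a made-up corridor of five cells where the agent is rewarded only for reaching the last cell. The environment, the constants, and the function names are all assumptions for illustration.

```python
import random

N_STATES = 5            # corridor cells 0..4; the goal is cell 4
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # step left, step right

def step(state, action):
    """Environment: move the agent and hand back a reward."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(len(ACTIONS))     # explore with random actions
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward now plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy policy: in every non-goal cell, pick the action with the highest value.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:GOAL]]
print(policy)
```

After training, the greedy policy steps right (+1) in every cell, because that is the shortest way to the reward; nobody told the agent the rule, it was reinforced by the rewards it collected.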

Yann LeCun used a very interesting analogy to explain this.

“ Most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. “

Deep Learning:

Deep learning is loosely inspired by the way human beings gain knowledge. Over the past couple of years, deep learning has revolutionized many areas of research and industry, including autonomous driving, healthcare, reinforcement learning, generative modeling, NLP, robotics, and fintech. Deep learning is part of the broader family of machine learning methods and takes its inspiration from how the human brain works. Deep neural networks are deeply structured and hierarchical, and each level of the hierarchy represents a different level of abstraction. Deep learning has taken off because of advances in hardware and software.

The artificial neural network (ANN) is the core of deep learning. The ANN is inspired by the neurons of the human brain. The human brain contains billions of interconnected neurons, and their structure and hierarchy are very deep and complex. Deep neural networks take inspiration from how humans understand a scene or how our visual cortex identifies an object; the CNN (Convolutional Neural Network) is the best example of this. Before deep learning, detecting an object or a human face was a laborious task: you had to extract features and create a template by hand, for example to detect the nose, the left eye, and the right eye, defining every single step needed to reach an outcome. With deep learning, scene understanding and object detection have become much easier. A deep neural network stacks many layers in a hierarchical way, each learning a more abstract representation, and finally combines the results.
Deep learning is a subset of machine learning that takes ML one step further in processing and understanding data and finding meaningful insight.

Our brain consists of a large network of interconnected neurons, which act as a roadway for information to be transmitted from point A to point B. To send different kinds of information from A to B, the brain activates a different set of neurons, and so essentially uses a different route to get from A to B. Biological neurons are interconnected, and they process information by sending signals to one another. A neuron consists of a cell body, with dendrites acting as connecting wires for other neurons to connect to. In most cases, a neuron has one axon capable of actively transmitting electric currents to other connecting cells. The connections between neurons are established by synapses located at the end of the axon. These synapses are responsible for a lot of the magic of computing and memory in the nervous system. The ANN is modeled after the biological neural network. Just as a biological neuron has dendrites to receive signals, a cell body to process them, and an axon to send signals out to other neurons, the artificial neuron has a number of input channels, a processing stage, and one output that can fan out to multiple other artificial neurons.
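The artificial neuron described above can be sketched in a few lines of plain Python: the inputs play the role of dendrites, the weighted sum plus an activation function is the processing stage, and the return value is the single output. The sigmoid activation and the particular weights are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Processing stage: weighted sum of the input channels plus a bias ...
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ... followed by a non-linear activation that produces the one output.
    return sigmoid(z)

output = neuron(inputs=[1.0, 0.0, 1.0], weights=[0.5, -0.6, 0.5], bias=-1.0)
print(output)
```

Here the weighted sum is 0.5 + 0.0 + 0.5 - 1.0 = 0, and the sigmoid of 0 is 0.5. In a network, this output would fan out as an input to neurons in the next layer.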

A bit of history:

Deep learning has transformed many industries, and in general it has addressed major problems such as speech recognition, image recognition, and object detection in images.

The term “deep learning” was first used in the context of Artificial Neural Networks (ANNs) by Igor Aizenberg and colleagues in or around 2000. In deep learning, “deep” typically refers to the number of layers.

1960s: Shallow neural networks
1960–70s: Backpropagation emerges
1974–80: First AI Winter
1980s: Convolution emerges
1987–93: Second AI Winter
1990s: Unsupervised deep learning
1990s-2000s: Supervised deep learning back in vogue
2006–present: Modern deep learning

Neural Network architectures:

A neural network is designed to solve complex tasks. Some tasks are complex but not impossible to solve by hand, such as writing a recommendation system based on shopping history. As programmers, we could write hard-coded rules to fulfill this requirement, but that is tedious work, so machine learning algorithms help us explore the data and find meaningful patterns instead. Machine learning or AI systems come into the picture where there is more uncertainty, such as:

  1. It’s hard to identify a fraudulent transaction in a digital money transfer, where the end user is not physically present and the interaction is virtual.
  2. It’s very hard for a machine to detect a pedestrian.

Artificial neural networks are first-class models for predicting such uncertain results. An ANN is inspired by the human brain and is a loose simulation of it. Neural network architectures are complex and adaptive, and they compute in parallel.

Neural network research is motivated by two desires:

  1. To understand the human brain better.
  2. To mimic human activity and intelligence in computers that can deal with complex problems.

Different neural network architectures address different domain-specific problems. Human intelligence is general, and it is very tough to develop an artificial general intelligence that addresses all, or even many, problems. Neural network architectures consist of three major kinds of layers: the input layer, hidden layers, and the output layer. The number of hidden layers defines the depth of the architecture.

Some well-known ANN architectures are:

  1. Perceptrons
  2. Hopfield Neural Network
  3. CNN
  4. Recurrent Neural Networks

Gluon API


The Gluon API is a high-level, simple, concise, and efficient deep learning API. Amazon Web Services and Microsoft jointly developed the Gluon API specification. It is the product of a joint effort by the two tech companies to make AI accessible to any developer. Gluon is an open source deep learning interface that lets developers “prototype, build, train and deploy sophisticated machine learning models for the cloud, devices at the edge and mobile apps.”

Gluon is an API, not another deep learning framework. It provides a concise and clear abstraction layer that improves the speed, flexibility, and accessibility of deep learning technology for all developers, regardless of their deep learning framework of choice. Gluon offers an interface that allows developers to prototype, build, and train deep learning models.

Developers who are new to machine learning will find this interface closer to traditional code, since machine learning models can be defined and manipulated just like any other data structure. Seasoned data scientists and researchers will value the ability to build prototypes quickly and utilize dynamic neural network graphs for entirely new model architectures, all without sacrificing training speed.

Gluon is imperative for developing but symbolic for deploying.

Before we dive deeper into the Gluon API, we should know at least one of the underlying frameworks that Gluon relies on. Gluon is an abstraction layer over deep learning frameworks such as MXNet.

Distinct Advantages:

  1. Friendly API: simple, easy-to-understand code
  2. Flexible, Imperative Structure
  3. Build graphs on the fly
  4. High-performance operators for training


MXNet is an open source deep learning library backed by Amazon. It originated with researchers at the University of Washington and Carnegie Mellon University. It is a portable, efficient, and scalable deep learning framework. It supports Python, JavaScript, Scala, Julia, and R. The best thing about MXNet is that it allows both imperative (define-by-run) and symbolic programming. It has a vibrant community.

Installing Gluon on MacOS:

The Gluon specification has already been implemented in Apache MXNet, so we need to install Apache MXNet to set up the environment. It’s easy to set up an environment for the Gluon API using different options such as Docker, pip, or a virtual environment. MXNet supports different languages and different OS platforms. Here I will show the installation for macOS.

You can refer to this link for the installation on your platform. (https://mxnet.incubator.apache.org/versions/master/install/index.html?platform=MacOS&language=Python&processor=CPU)

By default, MXNet is installed with CPU support only, but you can also install a GPU-enabled build.

Pip mode

$ pip install mxnet

MXNet offers MKL pip packages that will be much faster when running on Intel hardware.

Using Docker

Docker images with MXNet are available at Docker Hub(https://hub.docker.com/).

Step 1 Install Docker on your machine. For more detail (https://docs.docker.com/docker-for-mac/install/#install-and-run-docker-for-mac)

Step 2 Pull the MXNet docker image.

$ docker pull mxnet/python

Verify your docker pull command

$ docker images

I recommend using Python version 3.3 or greater and setting up the environment with a Jupyter Notebook.

# I used Miniconda and a virtual environment
# source activate gulons
# optional: update pip to the newest version
sudo pip install --upgrade pip
# install jupyter
pip install jupyter --user
# install the nightly build of mxnet
pip install mxnet --pre --user
# MXNet is CPU-only by default; install a GPU build if you have a GPU available
pip install mxnet-cu75 --pre --user  # for CUDA 7.5
# for CUDA 8.0 use: pip install mxnet-cu80 --pre --user
# start the notebook and enjoy coding
jupyter notebook

Validate the installation:

To validate the installation, follow these simple steps.

To validate a pip installation, open a terminal, start Python, and import the packages:

$ python
>>> import mxnet as mx
>>> from mxnet import gluon

In the same way, you can validate a Docker setup by starting the Docker image and executing a bash command.

MXNet should work on any cloud provider’s CPU-only instances. You can also set up Gluon and MXNet on any cloud platform. It’s easy to set up on Amazon AWS.

AWS Deep learning AMI (Amazon Machine Images) — Preinstalled Conda environments for Python 2 or 3 with MXNet and MKL-DNN.

Also, MxNet supports different edge platforms such as Raspberry Pi and NVIDIA Jetson Devices.

The architecture of MxNet:

The architecture diagram shows the key modules of the MXNet framework and the relations between them. In the diagram, solid arrows indicate concrete dependencies and dotted lines indicate light dependencies. The lower modules, shown in a bluish color, are the system modules, and the high-level modules are the user-facing ones: this is the actual API that the programmer interacts with. The modules are:

KVStore:- key-value store interface for parameter synchronization

Data Loading:- Efficient distributed data loading and augmentation

NDArray:- Dynamic, asynchronous n-dimensional arrays

Symbolic Execution:- Static symbolic graph executor

Symbolic Construction:- provides a way to construct a computation graph

Operators:-Operators that define static forward and gradient calculation

Storage Allocator:- Allocates and recycles memory blocks

Runtime Dependency Engine:-Schedules and executes the operations

Resource Manager:- Manages global resources

Gluon Package

The Gluon package comes with four key modules.

  1. Parameter:- A parameter is the basic component and holds the weights of blocks. There are two standard APIs: Parameter, and ParameterDict to manage a set of parameters.
  2. Containers:- Containers are the blocks that help you build a neural network; they hold the parameters.
  3. Trainer:- Trainer helps you optimize parameters. It applies an optimizer to the parameters held in the containers.
  4. Utilities:- Utilities contains small helpers for certain operations, such as splitting and rescaling a data set for data parallelism.
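To give a feel for what "applying an optimizer over parameters" means, here is a conceptual sketch in plain Python. It is not the Gluon Trainer API: the dictionary of parameters, the sgd_step helper, and the one-weight model are simplified stand-ins that I made up to show the idea of a gradient descent update.

```python
def sgd_step(params, grads, learning_rate):
    """Plain stochastic gradient descent: move each parameter against its gradient."""
    for name in params:
        params[name] -= learning_rate * grads[name]

# A one-weight "model": prediction = w * x. We want w * 3 to equal 6, so w should become 2.
params = {"w": 0.0}
x, y = 3.0, 6.0

for _ in range(100):
    error = params["w"] * x - y
    grads = {"w": 2.0 * error * x}   # derivative of (w*x - y)^2 with respect to w
    sgd_step(params, grads, learning_rate=0.01)

print(params["w"])   # approaches 2.0
```

In Gluon the same roles are split up: the blocks hold the parameters, automatic differentiation produces the gradients, and the Trainer performs the update step.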

Gluon APIs:

The Gluon API contains the APIs below.

  1. Gluon Neural Network Layers API:- This API provides the building blocks of a neural network. It contains APIs to add blocks directly to a network, such as Dense layers, Convolution layers, Activation layers, and Max Pooling layers.
  2. Gluon Recurrent Neural Network API:- This API provides building blocks for defining recurrent neural networks, for example an RNN with LSTM cells.
  3. Gluon Loss API:- This API contains the different loss functions required when building neural networks, for example to calculate mean squared loss or mean absolute loss.
  4. Gluon Data API:- This API is very useful for people who want to get their hands dirty but don’t have a data set. It contains data set utilities and common public data sets.
  5. Gluon Model Zoo:- The Gluon model zoo contains pre-defined and pre-trained models that help us bootstrap our development.
  6. Gluon Contrib API:- This is for those who have mastered Gluon and want to contribute to the Gluon API. It is for the community to try out new features and give feedback.
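The two loss functions mentioned under the Loss API are easy to write down by hand. The sketch below gives their textbook definitions in plain Python; it is not the Gluon implementation (library versions may apply their own scaling or batching), and the function names and sample numbers are only illustrative.

```python
def mean_squared_error(predictions, labels):
    """Average of squared differences; punishes large errors more heavily."""
    return sum((p - t) ** 2 for p, t in zip(predictions, labels)) / len(labels)

def mean_absolute_error(predictions, labels):
    """Average of absolute differences; every unit of error counts the same."""
    return sum(abs(p - t) for p, t in zip(predictions, labels)) / len(labels)

predictions = [2.5, 0.0, 2.0]
labels = [3.0, -0.5, 2.0]

mse = mean_squared_error(predictions, labels)
mae = mean_absolute_error(predictions, labels)
print(mse, mae)
```

For these numbers the errors are 0.5, 0.5, and 0, so the mean squared error is 0.5 / 3 and the mean absolute error is 1 / 3. During training, the optimizer tries to drive such a loss value down.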

Deep learning Programming style:

One of my favorite things about the Gluon API is that it offers multiple levels of abstraction, so you can choose the right one for your project. Gluon offers two styles for creating your neural network: the symbolic (declarative) style and the imperative style.
These are the two deep learning programming styles. Each has its own pros and cons, which is why almost all deep learning frameworks offer both.

Imperative Programming:

Imperative programming means define-by-run: the computation graph is constructed dynamically at run time. Imperative programming is flexible and straightforward. In this style, we can take advantage of native language features such as iteration, conditions, the debugger, etc.
The imperative style is nothing new: the way you write NumPy code is the imperative style of programming. Imperative programs perform operations directly.
Most Python code is imperative, for example the following NumPy code. In this style of programming, the state of the program changes as each statement executes.

import numpy as np
a = np.ones(20)
b = np.ones(20) * 2
c = b * a
d = c + 1

In the code snippet above, when the program reaches c = b * a, the actual operation is executed immediately.


Advantages:

  1. Straightforward and flexible, because execution follows the flow of the programming language.
  2. Takes advantage of native language features.

Disadvantages:

  1. Optimization must be done manually.
  2. Not efficient in terms of memory usage and speed.

Symbolic Style of programming:

Symbolic programming, also known as declarative programming, is the opposite of the imperative style. In this style, execution is performed only after the computation has been fully defined. You first define and then run: this is a static computation graph. The graph is immutable and does not change at run time. Symbolic-style programs include a compilation step, either explicit or implicit, which converts the graph into a function that can be called at any time. We can define a function with placeholder values, compile it, and then evaluate it with the actual input. Below, the imperative code above is converted to symbolic pseudocode. Symbolic programming generally requires three steps:

# Pseudocode: Variable, Constant and compile are illustrative, not a real library API
# Step 1:- Define the computation graph.
a = Variable('A')
b = Variable('B')
c = b * a
d = c + Constant(1)
# Step 2:- Compile the computation process into an executable program.
f = compile(d)
# Step 3:- Provide the required inputs and call the compiled program for execution.
g = f(a=np.ones(20), b=np.ones(20)*2)

In this code snippet, c = b * a does not actually perform the operation; instead, it generates a computation graph that represents the computation.
A computation graph is generated in the same way for operation d.


Advantages:

  1. Optimizations can be inferred automatically from the dependency graph.
  2. Memory reuse opportunities.
  3. More efficient and easier to port.

Disadvantages:

  1. Less flexible
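To see the define-then-run idea actually execute, here is a toy symbolic engine in plain Python. It is only a sketch of the concept, not how MXNet implements it: the classes Node, Variable, Constant, Op and the function compile_graph are hypothetical names invented for this illustration, and the example uses plain numbers instead of arrays.

```python
class Node:
    """Base class: arithmetic on nodes builds a graph instead of computing."""
    def __add__(self, other):
        return Op("add", self, other)

    def __mul__(self, other):
        return Op("mul", self, other)

class Variable(Node):
    def __init__(self, name):
        self.name = name            # placeholder, filled in at run time

class Constant(Node):
    def __init__(self, value):
        self.value = value

class Op(Node):
    def __init__(self, kind, left, right):
        self.kind, self.left, self.right = kind, left, right

def compile_graph(node):
    """Turn the finished graph into a callable function."""
    def run(**inputs):
        def evaluate(n):
            if isinstance(n, Variable):
                return inputs[n.name]
            if isinstance(n, Constant):
                return n.value
            left, right = evaluate(n.left), evaluate(n.right)
            return left + right if n.kind == "add" else left * right
        return evaluate(node)
    return run

# Step 1: define the computation graph (nothing is computed here).
a = Variable("a")
b = Variable("b")
c = b * a
d = c + Constant(1)

# Step 2: compile the graph into an executable function.
f = compile_graph(d)

# Step 3: provide the inputs and run.
result = f(a=1.0, b=2.0)
print(type(c).__name__, result)
```

Note that c is an Op node, not a number: the multiplication only happens inside f. Because the whole graph is known before it runs, a real framework can optimize it, reuse memory, and export it, which is where the advantages listed above come from.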

Hybrid Programming style:

Gluon comes with a hybrid programming style, which is one of its strong points, because from the descriptions above you cannot conclude that one style is always better for deep learning.
Gluon’s hybrid approach gives us the flexibility to harness the benefits of both imperative and symbolic programming. Use imperative programming to build and test a prototype on the fly; when deploying or serving in production, convert the program to symbolic form to achieve production-level computing performance.
This is possible thanks to the Gluon API’s hybrid programming.

In hybrid programming, we build models using either the HybridBlock or the HybridSequential Gluon API classes. By default, these behave like the Block and Sequential classes used in imperative programming. When we call the hybridize function, Gluon converts the program’s execution to the symbolic style.

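Let us take a small example of the two styles that hybrid programming combines: first imperative NDArray code, then the symbolic definition of a small network.

# Imperative style: every line executes immediately
import mxnet as mx
from mxnet import nd
a = mx.nd.zeros((120,60))
b = mx.nd.zeros((120,60))
c = a + b
c += 1
print(c)

# Symbolic style: the graph is defined first and executed later
import mxnet as mx
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=net, num_hidden=10)
# name='softmax' names the label variable 'softmax_label', which Module expects by default
net = mx.sym.SoftmaxOutput(data=net, name='softmax')
texec = mx.module.Module(net)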

The NDArray API:

In this section, we introduce the NDArray API. In MXNet, the NDArray API is the primary tool to store, transform, and manipulate data. It is the core data structure for all computation. An NDArray is a multi-dimensional array similar to a NumPy array: it represents a multi-dimensional, fixed-size, homogeneous array. Basically, NDArray provides the API for imperative tensor operations. mxnet.ndarray is similar to numpy.ndarray, but there are some differences.

Array creation:-

We can create an NDArray from a Python tuple or list with the array function.

import mxnet as mx 
from mxnet import nd
# create a 1D array with a python list 
x = mx.nd.array([4,3,9]) 
# create a 2D array with a nested python list 
z = mx.nd.array([[4,3,6], [5,1,8]]) 
# display the shapes of the arrays
{'x.shape':x.shape, 'z.shape':z.shape}

We can also create an NDArray from a numpy.ndarray object.

# import the numpy and mxnet packages
import numpy as np
import mxnet as mx
from mxnet import nd
# create numpy array
d = np.arange(15).reshape(3,5)
# create a 2D array from a numpy.ndarray object
y = mx.nd.array(d)
# display array
print(y)

We can specify the optional dtype when creating an NDArray. By default, float32 is used. We can also create an NDArray filled with placeholder values with the help of functions such as zeros and ones. NDArray also offers generally all the APIs required to manipulate data, such as slicing, indexing, reshaping, basic arithmetic, copying, and reduction.

# basic operations of NDArray
import mxnet as mx
import numpy as np
# float32 is used by default
a = mx.nd.array([1,2,3])
# create a 16-bit float array
c = mx.nd.array([1.2, 2.3], dtype=np.float16)
(a.dtype, c.dtype)
# create an empty (uninitialized) array
d = mx.nd.empty((2,3))
# create array with all zeros
e = mx.nd.zeros((2,3))
# create array with all 5
f = mx.nd.full((2,3),5)
# we can also perform some basic operations
# (the operands need matching shapes and dtypes, so we use e and f here)
# elementwise plus
g = e + f
# elementwise minus
h = f - e
# elementwise negation
i = -e
# we can use sum or mean
j = mx.nd.sum(e)
# exponential
k = mx.nd.exp(e)
# transpose matrix
t = f.T

NDArray has some key advantages. First, NDArrays support asynchronous mathematical computation on CPU, GPU, and distributed cloud architectures. Second, they provide support for automatic differentiation. These properties make NDArray a natural choice for deep learning. As we saw, we can create vectors, matrices, and tensors and manipulate them with NDArray.

If a scenario calls for a NumPy array instead of an NDArray, we can convert between the two easily.

Note:- the converted array does not share memory with the original.

# convert NDArray x into a NumPy array z
z = x.asnumpy()
# display type of z for verification
(type(z), z)
# convert the NumPy array back to an NDArray
mx.nd.array(z)

The Symbol API:

In the previous section, we learned to use NDArray to store and manipulate data. In this section, we explore the Symbol API, the basic interface for symbolic programming. The Symbol API follows a declarative approach: instead of executing the program step by step, you first define a computation graph, which contains placeholders for the inputs and the desired outputs. Gluon takes advantage of this approach under the hood after hybridization. The computation graph is a composition of symbols, operators, and network layers. With the Symbol API, the computation graph can be optimized, and the memory footprint stays small because memory from intermediate steps can be recycled. NDArray lets us write programs in an imperative fashion, whereas the Symbol API lets us write them in a declarative fashion, and most operators supported by NDArray are also supported by the Symbol API. A symbol represents a multi-output symbolic expression.

Let us build a simple example, 3 * a + b, with the Symbol API. We declare the placeholders with mx.sym.Variable and name them a and b respectively.

import mxnet as mx
a = mx.sym.Variable('a')
b = mx.sym.Variable('b')
c = 3 * a + b
# output: c is a Symbol; nothing is computed until the graph is bound to data
(a, b, c)

The Symbol API also supports a rich set of neural network operators, with which we can define neural networks as well.

First Gluon Example:

Create a simple neural network layer using the gluon nn package.

# import ndarray module from mxnet package
from mxnet import nd
# import gluon package
from mxnet.gluon import nn
# define a layer; Dense is a subclass of Block, here with 2 output units
layer = nn.Dense(2)
# initialise the weights; the default initializer draws uniformly from [-0.07, 0.07]
layer.initialize()
# random (3,4) matrix with values ranging from -1 to 1
x = nd.random.uniform(-1,1,shape=(3,4))
# run a forward pass; this also infers the input size of the weight
layer(x)
# print weight data
print(layer.weight.data())
# collect the parameters
params = layer.collect_params()
# type of params collected from layer
print(type(params))

In this example, we saw how to define a simple layer using the Gluon API.


In this chapter, we introduced some of the fundamental concepts such as Artificial intelligence, deep learning, machine learning, and Gluon API along with MxNet.

We covered the different types of machine learning, deep learning techniques, and recent research directions such as self-supervised learning. Deep learning is achieved by adding more hidden layers, which has become possible thanks to the availability of huge amounts of data and advances in computation. With the help of deep learning frameworks and cloud computing, these techniques are now at the fingertips of any software engineer.

We began our journey into deep learning with the Gluon API, introducing it along with the different deep learning programming paradigms. The chapter ended with the installation of Gluon, the environment setup, and a few small API examples. Let us get ready to conquer the deep learning world with the Gluon API.

Proguard for Android Kotlin

Something interesting for #AndroidDev

If you are new to Proguard, read these two articles of mine, which will help you enable Proguard for any Android project.

  1. Enabling Proguard for Android
  2. How to de-obfuscate stack Trace is here?
This is it.

Google has announced official Kotlin support for Android development. Kotlin is a really cool language; if you haven’t tried it yet, try it now.

Here is the link to the sample project, AndroidWithKotlin.

Now, in this post, we will look at how to use Proguard in a Kotlin Android project.

Before diving deep, I recommend reading my first article, Enabling Proguard for Android, and then moving on.

Enabling Proguard is the same process for any kind of Android project.

If you enable Proguard for an Android project that uses some of the Kotlin extension libraries, you will come across some issues.

One example is jackson-module-kotlin, which supports deserializing Kotlin classes and data classes, a really cool feature.

Here is the fix:-

These rules keep the Kotlin reflection internals and suppress the warnings for the reflection classes.

-dontwarn kotlin.**
-dontwarn kotlin.reflect.jvm.internal.**
-keep class kotlin.reflect.jvm.internal.** { *; }

If you have an issue with Kotlin Metadata, especially with the Jackson Kotlin module, add these rules:

-keep class kotlin.Metadata { *; }
-keepclassmembers public class com.mypackage.** {
    public synthetic <methods>;
}
-keepclassmembers class kotlin.Metadata {
    public <methods>;
}

For enums:

-keepclassmembers class **$WhenMappings {
    <fields>;
}

The consolidated rules for a Kotlin Android project:

-keep class kotlin.** { *; }
-keep class kotlin.Metadata { *; }
-dontwarn kotlin.**
-keepclassmembers class **$WhenMappings {
    <fields>;
}
-keepclassmembers class kotlin.Metadata {
    public <methods>;
}
-assumenosideeffects class kotlin.jvm.internal.Intrinsics {
    static void checkParameterIsNotNull(java.lang.Object, java.lang.String);
}

The rules above are sufficient for a Kotlin Android project. If you use other libraries in your project, you have to add their specific configuration based on the errors and warnings you see.

For Moshi:

-keep class kotlin.reflect.jvm.internal.impl.builtins.BuiltInsLoaderImpl

This keeps only the no-arg constructor of the service declared in META-INF/services/kotlin.reflect.jvm.internal.impl.builtins.BuiltInsLoader.

Thanks to Jake Wharton for this config.

To learn more about Kotlin,


I have been following this path, and I would love to hear more from you.

Let's connect on Stack Overflow, LinkedIn, Facebook & Twitter.

De-obfuscate stack traces!!?

In the article Enabling Proguard for Android, I explained how to obfuscate the code. To be clear, ProGuard obfuscates the code, not the data.

A good engineer thinks in reverse and asks himself about the stylistic consequences of the components and systems he proposes. — — Helmut Jahn

When we obfuscate the code, class names and method names are converted into short obfuscated names such as a, b, c, and so on. The problem arises when an exception is raised or the application crashes: how to decode that stack trace is the big question developers have.

Exceptions, errors, and bugs are a fact of a developer's life. Many developers have asked me how to de-obfuscate a stack trace. I answered a couple of them directly, but any email you’ve written twice should be a blog post.

Where there is a will, there is a way.

Once ProGuard shrinks and obfuscates your code, reading a stack trace is difficult because field, method, and class names are obfuscated. Fortunately, ProGuard provides a mapping file to help developers: every time it obfuscates the code, it generates a mapping.txt file.

ProGuard outputs the following files:

dump.txt:- Describes the internal structure of all the class files in the APK.

mapping.txt:- Provides a translation between the original and obfuscated class, method, and field names.

seeds.txt:- Lists the classes and members that were not obfuscated.

usage.txt:- Lists the code that was removed from the APK.

These files are saved at <module-name>/build/outputs/mapping/release/

What is the mapping.txt file?

The mapping file is a simple text file that maps the original field, method, and class names to their obfuscated names. You will find it at <module-name>/build/outputs/mapping/release/mapping.txt.

Note:- Each time you create a release build with ProGuard, the mapping.txt file is overwritten. Retain the mapping.txt file for every version; it is best to keep it along with the code of that release.
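
One way to retain the file for every version is to copy it into a per-version folder right after the release build. The sketch below is only an illustration: the class name MappingArchiver and the archive/<versionName>/mapping.txt layout are my own assumptions, not something ProGuard or the Android build does for you.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: copies a release's mapping.txt into a per-version archive folder. */
final class MappingArchiver {

    private MappingArchiver() {}

    /**
     * Copies mappingFile to archiveRoot/versionName/mapping.txt and returns the new path.
     * The folder layout is an assumption made for this example.
     */
    static Path archive(Path mappingFile, Path archiveRoot, String versionName) throws IOException {
        Path targetDir = archiveRoot.resolve(versionName);
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve("mapping.txt");
        // Replaces only the copy for this version; other versions stay untouched.
        return Files.copy(mappingFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: MappingArchiver <mapping.txt> <archive-dir> <version-name>");
            return;
        }
        Path saved = archive(Paths.get(args[0]), Paths.get(args[1]), args[2]);
        System.out.println("Archived mapping to " + saved);
    }
}
```

You could run it after each release build with the path to mapping.txt, an archive folder, and the version name, and then commit or back up the archive folder.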

Command to de-obfuscate

retrace.bat|retrace.sh [-verbose] mapping.txt [<stacktrace_file>]

For example:

retrace.bat -verbose mapping.txt obfuscated_stack_trace.txt
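
If you want to call the same command from a small tool, the argument list can be assembled as in the sketch below. The class name RetraceCommand is hypothetical, and I am assuming the retrace script is on your PATH; the sketch only follows the usage line shown above (script name per OS, optional -verbose, mapping file, optional stack trace file).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that assembles the retrace command line shown above. */
final class RetraceCommand {

    private RetraceCommand() {}

    /**
     * Builds: retrace.bat|retrace.sh [-verbose] mappingFile [stackTraceFile].
     * Pass null for stackTraceFile to let retrace read the trace from standard input.
     */
    static List<String> build(String osName, boolean verbose, String mappingFile, String stackTraceFile) {
        List<String> command = new ArrayList<>();
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        command.add(windows ? "retrace.bat" : "retrace.sh");
        if (verbose) {
            command.add("-verbose");
        }
        command.add(mappingFile);
        if (stackTraceFile != null) {
            command.add(stackTraceFile);
        }
        return command;
    }

    public static void main(String[] args) throws Exception {
        List<String> command = build(System.getProperty("os.name"), true,
                "mapping.txt", "obfuscated_stack_trace.txt");
        // Assumption: the retrace script is reachable through PATH.
        Process process = new ProcessBuilder(command).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```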

Sample mapping.txt file

com.apothesource.hidingpasswords.HidingUtil -> com.apothesource.hidingpasswords.a:
    java.lang.String hide(java.lang.String) -> a
    java.lang.String unhide(java.lang.String) -> b
    void doHiding(byte[],byte[],boolean) -> a
com.apothesource.hidingpasswords.MainActivity -> com.apothesource.hidingpasswords.MainActivity:
    byte[] mySlightlyCleverHidingKey -> a
    java.lang.String[] myCompositeKey -> b
    void onCreate(android.os.Bundle) -> onCreate
    boolean onCreateOptionsMenu(android.view.Menu) -> onCreateOptionsMenu
    boolean onOptionsItemSelected(android.view.MenuItem) -> onOptionsItemSelected
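
To get a feel for how such a file is used, here is a rough sketch of the lookup behind de-obfuscation, using part of the sample above. It is not the real retrace implementation: the class MappingLookup and its methods are hypothetical, it handles only class lines and method lines, and it ignores line numbers and fields. Note that two overloads (hide and doHiding) share the obfuscated name a, so a frame can resolve to more than one candidate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified reader for mapping.txt content (not the real retrace tool). */
final class MappingLookup {
    // obfuscated class name -> original class name
    private final Map<String, String> classes = new HashMap<>();
    // "obfuscatedClass#obfuscatedMethod" -> original method names (overloads may share a name)
    private final Map<String, List<String>> methods = new HashMap<>();

    static MappingLookup parse(String mappingText) {
        MappingLookup lookup = new MappingLookup();
        String currentClass = null;
        for (String rawLine : mappingText.split("\\R")) {
            String line = rawLine.trim();
            int arrow = line.indexOf(" -> ");
            if (line.startsWith("#") || arrow < 0) {
                continue; // skip comments, blank lines, and anything unexpected
            }
            String left = line.substring(0, arrow);
            String right = line.substring(arrow + 4);
            if (right.endsWith(":")) {
                // Class line: "original.Name -> obfuscated.name:"
                currentClass = right.substring(0, right.length() - 1);
                lookup.classes.put(currentClass, left);
            } else if (currentClass != null && left.indexOf('(') >= 0) {
                // Method line: "returnType name(args) -> obfuscatedName"
                String beforeArgs = left.substring(0, left.indexOf('('));
                String name = beforeArgs.substring(beforeArgs.lastIndexOf(' ') + 1);
                lookup.methods
                        .computeIfAbsent(currentClass + "#" + right, key -> new ArrayList<>())
                        .add(name);
            }
        }
        return lookup;
    }

    /** Resolves "obfuscated.Class.method" to every matching "original.Class.method". */
    List<String> resolveFrame(String obfuscatedFrame) {
        int dot = obfuscatedFrame.lastIndexOf('.');
        if (dot < 0) {
            return Collections.singletonList(obfuscatedFrame);
        }
        String obfuscatedClass = obfuscatedFrame.substring(0, dot);
        String obfuscatedMethod = obfuscatedFrame.substring(dot + 1);
        String originalClass = classes.get(obfuscatedClass);
        if (originalClass == null) {
            return Collections.singletonList(obfuscatedFrame); // unknown class: leave as-is
        }
        List<String> names = methods.getOrDefault(
                obfuscatedClass + "#" + obfuscatedMethod,
                Collections.singletonList(obfuscatedMethod));
        List<String> resolved = new ArrayList<>();
        for (String name : names) {
            resolved.add(originalClass + "." + name);
        }
        return resolved;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "com.apothesource.hidingpasswords.HidingUtil -> com.apothesource.hidingpasswords.a:",
                "    java.lang.String hide(java.lang.String) -> a",
                "    java.lang.String unhide(java.lang.String) -> b",
                "    void doHiding(byte[],byte[],boolean) -> a");
        MappingLookup lookup = parse(sample);
        System.out.println(lookup.resolveFrame("com.apothesource.hidingpasswords.a.b"));
        System.out.println(lookup.resolveFrame("com.apothesource.hidingpasswords.a.a"));
    }
}
```

The real tool does more than this (line numbers, fields, inlined frames), so use the sketch only to understand the idea and use retrace itself for actual crashes.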

How to use it in Production?

For every developer, everything works perfectly locally but not in production.

While uploading the APK file to the Play Store, you can also upload the mapping.txt file.

Important note from Google:- Once you’ve uploaded a mapping file for a version of your app, only future crashes for that version of your app will be deobfuscated. Crashes for a version of your app that happen before you’ve uploaded its respective mapping file won’t be deobfuscated.

  1. Sign in to your Play Console.
  2. Select your app.
  3. From the left menu, click Android vitals > Deobfuscation files.
  4. Next to a version of your app, click Upload.
  5. Upload the ProGuard mapping.txt file for the version of your app.

Once the upload succeeds, select a crash or ANR; on the ‘Stack Traces’ tab, you’ll see your deobfuscated stack traces.

You can achieve the same thing with the help of developer API as well, details are here.



Let me know how you de-obfuscate your stack traces.
