
D.Y. Patil College of Engineering, Akurdi
Department of Electronics & Telecommunication Engineering

Subject : Machine Learning Class: B.E.(Elective 3)


Date:-

Experiment No:- 01

Title: Simple logic network using MP neuron model.

Aim: Implement simple logic network using MP neuron model

Theory:

The structure of artificial neural networks was based on the present understanding of
biological neural systems. The structure of a biological neuron is shown in figure (1). The
computation is achieved by dense interconnection of simple processing units. To describe these
attributes of computing, artificial neural networks go by many names, such as
connectionist models, parallel distributed processors, or self-organizing systems. With such
features, an artificial neural system has great potential in performing applications such as
speech and image recognition where intense computation can be done in parallel and the
computational elements are connected by weighted links.

The artificial neuron, the most fundamental computational unit, is modeled based on the basic
property of a biological neuron. This type of processing unit performs in two stages:
weighted summation and some type of nonlinear function. It accepts a set of inputs to
generate the weighted sum, then passes the result to the nonlinear function to make an output.
Unlike conventional computing systems, which have fixed instructions to perform specific
computations, the artificial neural network needs to be taught and trained to function
correctly. The advantage is that the neural system can learn new input-output patterns and
adjust the system parameters. Such learning can eliminate specifying instructions to be
executed for computations. Instead, users simply supply appropriate sample input-output
patterns to the network.
The following table shows the associated terminologies of biological and artificial neural networks:

Biological Neural Network      Artificial Neural Network
Cell Body                      Neurons
Dendrite                       Weight interconnections
Soma                           Net input
Axon                           Output

The McCulloch-Pitts Model of Neuron:

The early model of an artificial neuron was introduced by Warren McCulloch and Walter Pitts
in 1943. The McCulloch-Pitts neural model is also known as the linear threshold gate. It is a
neuron with a set of inputs (I1, I2, I3, …, Im) and one output y. The linear threshold gate
simply classifies the set of inputs into two different classes; thus the output y is binary. Such
a function can be described mathematically using these equations:

sum = W1I1 + W2I2 + … + WmIm    and    y = f(sum)

W1, W2, W3, …, Wm are weight values normalized in the range of either (0,1) or (-1,1) and
associated with each input line; sum is the weighted sum; and T is a threshold constant. The
function f is a linear step function at threshold T, as shown in figure (2). The symbolic
representation of the linear threshold gate is shown in figure (3).

Figure 2: The Threshold function

Figure 3: Symbolic Illustration of Linear Threshold Gate

This model is simplistic: it generates only a binary output, and the weight and
threshold values are fixed.
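As a worked example (the weights and threshold here are chosen only for illustration): a two-input
AND gate can be realized with W1 = W2 = 1 and threshold T = 2. The input (I1, I2) = (1, 1) gives
sum = W1I1 + W2I2 = 2 ≥ T, so y = 1, while (1, 0) gives sum = 1 < T, so y = 0.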

Conclusion:-

Program:

import numpy as np

def AND(x1, x2):
    x = np.array([1, x1, x2])     # input vector with a constant 1 for the bias term
    w = np.array([-1.5, 1, 1])    # bias and weights
    y = np.sum(w * x)
    if y <= 0:
        return 0
    else:
        return 1

def OR(x1, x2):
    x = np.array([1, x1, x2])
    w = np.array([-0.5, 1, 1])
    y = np.sum(w * x)
    if y <= 0:
        return 0
    else:
        return 1

def NAND(x1, x2):
    x = np.array([1, x1, x2])
    w = np.array([1.5, -1, -1])
    y = np.sum(w * x)
    if y <= 0:
        return 0
    else:
        return 1

if __name__ == '__main__':
    inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]

    print("AND")
    for x in inputs:
        y = AND(x[0], x[1])
        print(str(x) + " -> " + str(y))

    print("OR")
    for x in inputs:
        y = OR(x[0], x[1])
        print(str(x) + " -> " + str(y))

    print("NAND")
    for x in inputs:
        y = NAND(x[0], x[1])
        print(str(x) + " -> " + str(y))

Output:
AND
(0, 0) -> 0
(1, 0) -> 0
(0, 1) -> 0
(1, 1) -> 1
OR
(0, 0) -> 0
(1, 0) -> 1
(0, 1) -> 1
(1, 1) -> 1
NAND
(0, 0) -> 1
(1, 0) -> 1
(0, 1) -> 1
(1, 1) -> 0

D.Y. Patil College of Engineering, Akurdi

Department of Electronics & Telecommunication Engineering

Subject : Machine Learning Class: B.E.(Elective 3)


Date:-

Experiment No:- 02

Title: Linear regressor with a single neuron model.

Aim: Implement a simple linear regressor with a single neuron model.

Theory:

A single layer neural net is shown in figure(1).

The net input is calculated as

yin = b + x1w1 + x2w2 + … + xmwm
The separating line, which forms the decision boundary so that the net gives a positive response
on one side and a negative response on the other side, is given as
b + x1w1 + x2w2 = 0
If weight w2 is not equal to 0, then we get
x2 = -(w1/w2) x1 - (b/w2)
Thus the requirement for a positive response of the net is given by
b + x1w1 + x2w2 > 0
During the training process, the values of w1, w2 and b are determined so that the net will produce a
positive response for the training data. If, on the other hand, a threshold value is being used,
then the condition for obtaining a positive response from the output unit is
net input received > θ
yin > θ
x1w1 + x2w2 > θ
and the separating line equation is given by
x1w1 + x2w2 = θ
x2 = -(w1/w2) x1 - (θ/w2)        (with w2 ≠ 0)

Figure (2): Decision Boundary Line
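As a supplementary illustrative sketch (separate from the program that follows, which uses
scikit-learn's LinearRegression), a linear regressor built from a single neuron with net input
yin = b + x·w can also be trained directly with gradient descent; the toy data, learning rate
and epoch count below are arbitrary choices.

import numpy as np

# toy data: one input feature x and a target y that is roughly linear in x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

w, b = 0.0, 0.0          # single neuron: yin = b + x*w (identity activation)
lr = 0.01                # learning rate
for epoch in range(1000):
    y_in = b + w * x                  # net input for every sample
    error = y - y_in
    # gradient-descent updates that reduce the mean squared error
    w += lr * np.mean(error * x)
    b += lr * np.mean(error)

print("learned weight:", w, "learned bias:", b)
print("prediction for x = 6:", b + w * 6)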

Conclusion:-

## Experiment no 2 ##
## implement simple linear regression model

# import the python libraries
import pandas as pd                                 # for making data frames
import numpy as np                                  # for numeric functions
import matplotlib.pyplot as plt                     # for plotting the graph
from sklearn.linear_model import LinearRegression  # linear regression model

mydata = pd.read_csv("linreg.csv")    # import CSV file
print(mydata)                         # print data from CSV file

x = mydata.iloc[:, :-1].values        # get feature column values
y = mydata.iloc[:, 1].values          # get target column values

plt.scatter(x, y, color='red')        # plot scatter of the data
plt.title('test data')
plt.xlabel('area in sq feet')
plt.ylabel('price of home')
plt.show()

model = LinearRegression()            # linear regression model
model.fit(x, y)                       # apply model to data

plt.scatter(x, y)
plt.plot(x, model.predict(x))
plt.show()

Output:

runfile('E:/ML/exp2/linea.py', wdir='E:/ML/exp2')
area price
0 500 32
1 570 36
2 590 38
3 682 41
4 729 46
5 680 50
6 825 53
7 900 58
8 850 61
9 1000 66

D.Y. Patil College of Engineering, Akurdi
Department of Electronics & Telecommunication Engineering

Subject : Machine Learning Class: B.E.(Elective 3)


Date:-

Experiment No:- 03

Title: Back-Propagation Algorithm.

Aim: Implement and test MLP trained with back-propagation algorithm.

Theory:

A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps
sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of
nodes in a directed graph, with each layer fully connected to the next one. Except for the
input nodes, each node is a neuron (or processing element) with a nonlinear activation
function. MLP utilizes a supervised learning technique called back-propagation for training
the network. MLP is a modification of the standard linear perceptron and can distinguish data
that are not linearly separable.
The architecture of a back-propagation network is shown below:

The Back-Propagation Network:


It is a common method of training artificial neural networks and used in conjunction with an
optimization method such as gradient descent. The algorithm repeats a two-phase cycle of
propagation and weight update. When an input vector is presented to the network, it is
propagated forward through the network, layer by layer, until it reaches the output layer. The
output of the network is then compared to the desired output, using a loss function, and an
error value is calculated for each of the neurons in the output layer. The error values are then
propagated backwards, starting from the output, until each neuron has an associated error
value which roughly represents its contribution to the original output.
Back propagation uses these error values to calculate the gradient of the loss function with
respect to the weights in the network. In the second phase, this gradient is fed to the
optimization method, which in turn uses it to update the weights, in an attempt to minimize
the loss function.
The importance of this process is that, as the network is trained, the neurons in the
intermediate layers organize themselves in such a way that the different neurons learn to
recognize different characteristics of the total input space. After training, when an arbitrary
input pattern is present which contains noise or is incomplete, neurons in the hidden layer of
the network will respond with an active output if the new input contains a pattern that
resembles a feature that the individual neurons have learned to recognize during their
training.
Back-propagation requires a known, desired output for each input value in order to calculate
the loss function gradient – it is therefore usually considered to be a supervised learning
method; nonetheless, it is also used in some unsupervised networks such as autoencoders. It
is a generalization of the delta rule to multi-layered feedforward networks, made possible by
using the chain rule to iteratively compute gradients for each layer. Backpropagation requires
that the activation function used by the artificial neurons (or "nodes") be differentiable.

Modes of learning:

There are two modes of learning to choose from: stochastic and batch.
In stochastic learning, each propagation is followed immediately by a weight update. In batch
learning many propagations occur before updating the weights, accumulating errors over the
samples within a batch.

Limitations

 Gradient descent with backpropagation is not guaranteed to find the global minimum
of the error function, but only a local minimum; it also has trouble crossing plateaux
in the error-function landscape.
 Backpropagation learning does not require normalization of input vectors; however,
normalization could improve performance.


Conclusion:-

Exp no 3

import numpy as np

def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))

def sigmoid_prime(x):
    # derivative expressed in terms of the sigmoid output (x = sigmoid(net))
    return x*(1.0 - x)

def tanh(x):
    return np.tanh(x)

def tanh_prime(x):
    # derivative expressed in terms of the tanh output (x = tanh(net))
    return 1.0 - x**2

class NeuralNetwork:

    def __init__(self, layers, activation='tanh'):
        if activation == 'sigmoid':
            self.activation = sigmoid
            self.activation_prime = sigmoid_prime
        elif activation == 'tanh':
            self.activation = tanh
            self.activation_prime = tanh_prime

        # Set weights
        self.weights = []
        # layers = [2,2,1]
        # range of weight values (-1,1)
        # input and hidden layers - random((2+1, 2+1)) : 3 x 3
        for i in range(1, len(layers) - 1):
            r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) - 1
            self.weights.append(r)
        # output layer - random((2+1, 1)) : 3 x 1
        r = 2*np.random.random((layers[i] + 1, layers[i+1])) - 1
        self.weights.append(r)

    def fit(self, X, y, learning_rate=0.2, epochs=100000):
        # Add column of ones to X
        # This is to add the bias unit to the input layer
        ones = np.atleast_2d(np.ones(X.shape[0]))
        X = np.concatenate((ones.T, X), axis=1)

        for k in range(epochs):
            i = np.random.randint(X.shape[0])
            a = [X[i]]

            # forward propagation, layer by layer
            for l in range(len(self.weights)):
                dot_value = np.dot(a[l], self.weights[l])
                activation = self.activation(dot_value)
                a.append(activation)

            # output layer
            error = y[i] - a[-1]
            deltas = [error * self.activation_prime(a[-1])]

            # we need to begin at the second to last layer
            # (a layer before the output layer)
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T)*self.activation_prime(a[l]))

            # reverse
            # [level3(output)->level2(hidden)] => [level2(hidden)->level3(output)]
            deltas.reverse()

            # backpropagation
            # 1. Multiply its output delta and input activation
            #    to get the gradient of the weight.
            # 2. Subtract a ratio (percentage) of the gradient from the weight.
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate * layer.T.dot(delta)

            if k % 10000 == 0:
                print('epochs:', k)

    def predict(self, x):
        # prepend the bias unit, then propagate forward through all layers
        a = np.concatenate((np.ones(1), np.array(x)))
        for l in range(0, len(self.weights)):
            a = self.activation(np.dot(a, self.weights[l]))
        return a

if __name__ == '__main__':

    nn = NeuralNetwork([2,2,1])
    X = np.array([[0, 0],
                  [0, 1],
                  [1, 0],
                  [1, 1]])
    y = np.array([0, 1, 1, 0])
    nn.fit(X, y)
    for e in X:
        print(e, nn.predict(e))

Output:

epochs: 0
epochs: 10000
epochs: 20000
epochs: 30000
epochs: 40000
epochs: 50000
epochs: 60000
epochs: 70000
epochs: 80000
epochs: 90000
(array([0, 0]), array([ 9.14891326e-05]))
(array([0, 1]), array([ 0.99557796]))
(array([1, 0]), array([ 0.99707463]))
(array([1, 1]), array([ 0.00090973]))

D.Y. Patil College of Engineering, Akurdi
Department of Electronics & Telecommunication Engineering

Subject : Machine Learning Class: B.E.(Elective 3)


Date:-

Experiment No:- 04

Title: Radial Basis function Network.

Aim: Implement and test Radial Basis function (RBF) Network.

Theory:

A radial basis function network is an artificial neural network that uses radial basis
functions as activation functions. The output of the network is a linear combination of radial
basis functions of the inputs and neuron parameters. Radial basis function networks have
many uses, including function approximation, time series prediction, classification, and
system control. They were first formulated in a 1988 paper by Broomhead and Lowe, both
researchers at the Royal Signals and Radar Establishment.
An RBF network is used for approximating functions and recognizing patterns. It uses Gaussian
potential functions. Powell used radial basis functions for exact interpolation. In
interpolation we have n data points xi ∈ R^d and n real-valued numbers ti ∈ R, where i = 1, …, n.
The task is to determine a function S in a linear space such that S(xi) = ti, i = 1, …, n. The
interpolation function is a linear combination of basis functions.

As basis functions bi, radial basis functions of the form

Vi(x) = Φ( || x − xi || )

are used, where Φ is a mapping R+ → R and the norm is the Euclidean distance. The network uses
common nonlinearities such as sigmoidal and Gaussian kernel functions. The Gaussian
functions are also used in regularization networks. The response of such a function is positive
for all values of y; the response decreases to 0 as |y| → ∞. The Gaussian function is generally
defined as

f(y) = e^(−y²)

and the derivative of this function is given by

f'(y) = −2y e^(−y²) = −2y f(y)

The graphical representation of this Gaussian function is shown in the figure below.

When the Gaussian potential functions are used, each node produces an
identical output for inputs lying within a fixed radial distance from the center of the
kernel; the functions are radially symmetric, and hence the name radial basis function
network. The entire network forms a linear combination of the nonlinear basis functions.
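As a quick numerical illustration (the values are chosen only as an example): taking a center
xi = 0 so that ||x − xi|| = |x|, the Gaussian basis function gives f(0) = e^0 = 1 at the center,
f(1) = e^(−1) ≈ 0.37, and f(2) = e^(−4) ≈ 0.018, showing how the response falls off rapidly
with distance from the center.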

Conclusion:-

Exp 04

import numpy as np
import matplotlib.pyplot as plt

def rbf(x, c, s):
    return np.exp(-1 / (2 * s**2) * (x-c)**2)
def kmeans(X, k):
    """Performs k-means clustering for 1D input
    
    Arguments:
        X {ndarray} -- A Mx1 array of inputs
        k {int} -- Number of clusters
    
    Returns:
        ndarray -- A kx1 array of final cluster centers
    """
 
    # randomly select initial clusters from input data
    clusters = np.random.choice(np.squeeze(X), size=k)
    prevClusters = clusters.copy()
    stds = np.zeros(k)
    converged = False
 
    while not converged:
        """
        compute distances for each cluster center to each point
        where (distances[i, j] represents the distance between the ith point and jth cluster)
        """
        distances = np.squeeze(np.abs(X[:, np.newaxis] - clusters[np.newaxis, :]))
 
        # find the cluster that's closest to each point
        closestCluster = np.argmin(distances, axis=1)
 
        # update clusters by taking the mean of all of the points assigned to that cluster
        for i in range(k):
            pointsForCluster = X[closestCluster == i]
            if len(pointsForCluster) > 0:
                clusters[i] = np.mean(pointsForCluster, axis=0)
 
        # converge if clusters haven't moved
        converged = np.linalg.norm(clusters - prevClusters) < 1e-6
        prevClusters = clusters.copy()
 
    distances = np.squeeze(np.abs(X[:, np.newaxis] - clusters[np.newaxis, :]))
    closestCluster = np.argmin(distances, axis=1)
 
    clustersWithNoPoints = []
    for i in range(k):
        pointsForCluster = X[closestCluster == i]
        if len(pointsForCluster) < 2:
            # keep track of clusters with no points or 1 point
            clustersWithNoPoints.append(i)
            continue
        else:
            stds[i] = np.std(X[closestCluster == i])
 
    # if there are clusters with 0 or 1 points, take the mean std of the other clusters
    if len(clustersWithNoPoints) > 0:
        pointsToAverage = []
        for i in range(k):
            if i not in clustersWithNoPoints:
                pointsToAverage.append(X[closestCluster == i])
        pointsToAverage = np.concatenate(pointsToAverage).ravel()
        stds[clustersWithNoPoints] = np.mean(np.std(pointsToAverage))
 
    return clusters, stds
class RBFNet(object):
    """Implementation of a Radial Basis Function Network"""
    def __init__(self, k=2, lr=0.01, epochs=100, rbf=rbf, inferStds=True):
        self.k = k
        self.lr = lr
        self.epochs = epochs
        self.rbf = rbf
        self.inferStds = inferStds
 
        self.w = np.random.randn(k)
        self.b = np.random.randn(1)
    def fit(self, X, y):
        if self.inferStds:
            # compute stds from data
            self.centers, self.stds = kmeans(X, self.k)
        else:
            # use a fixed std
            self.centers, _ = kmeans(X, self.k)
            dMax = max([np.abs(c1 - c2) for c1 in self.centers for c2 in self.centers])
            self.stds = np.repeat(dMax / np.sqrt(2*self.k), self.k)

        # training
        for epoch in range(self.epochs):
            for i in range(X.shape[0]):
                # forward pass
                a = np.array([self.rbf(X[i], c, s) for c, s in zip(self.centers, self.stds)])
                F = a.T.dot(self.w) + self.b

                loss = (y[i] - F).flatten() ** 2
                print('Loss: {0:.2f}'.format(loss[0]))

                # backward pass
                error = -(y[i] - F).flatten()

                # online update
                self.w = self.w - self.lr * a * error
                self.b = self.b - self.lr * error

    def predict(self, X):
        y_pred = []
        for i in range(X.shape[0]):
            a = np.array([self.rbf(X[i], c, s) for c, s in zip(self.centers, self.stds)])
            F = a.T.dot(self.w) + self.b
            y_pred.append(F)
        return np.array(y_pred)
# sample inputs and add noise
NUM_SAMPLES = 100
X = np.random.uniform(0., 1., NUM_SAMPLES)
X = np.sort(X, axis=0)
noise = np.random.uniform(-0.1, 0.1, NUM_SAMPLES)
y = np.sin(2 * np.pi * X)  + noise
 
rbfnet = RBFNet(lr=1e-2, k=2)
rbfnet.fit(X, y)

 
y_pred = rbfnet.predict(X)
 
plt.plot(X, y, '-o', label='true')
plt.plot(X, y_pred, '-o', label='RBF-Net')
plt.legend()
 
plt.tight_layout()
plt.show()

D.Y. Patil College of Engineering, Akurdi
Department of Electronics & Telecommunication Engineering

Subject : Machine Learning Class: B.E.(Elective 3)


Date:-

Experiment No:- 05

Title: Self Organizing Feature Map.

Aim: Implement Self Organizing Feature Map (SOFM) for character recognition.

Theory:

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of


artificial neural network (ANN) that is trained using unsupervised learning to produce a low-
dimensional (typically two-dimensional), discretized representation of the input space of the
training samples, called a map, and is therefore a method to do dimensionality reduction.
Self-organizing maps differ from other artificial neural networks as they apply competitive
learning as opposed to error-correction learning (such as backpropagation with gradient
descent), and in the sense that they use a neighborhood function to preserve the topological
properties of the input space.
The artificial neural network introduced by the Finnish professor Teuvo Kohonen in the
1980s is sometimes called a Kohonen map or network. The Kohonen net is a
computationally convenient abstraction building on work on biological neural models. Like
most artificial neural networks, SOMs operate in two modes: training and mapping.
"Training" builds the map using input examples (a competitive process, also called vector
quantization), while "mapping" automatically classifies a new input vector.

Methods used for determining the Winner:


1) This method uses the squared Euclidean distance between the input vector and the weight
vector, and chooses the unit whose weight vector has the smallest Euclidean distance from
the input vector.
2) This method uses the dot product of the input vector and the weight vector. The
largest dot product corresponds to the smallest angle between the input and weight
vectors if they are both of unit length.

The K-SOFM architecture is shown in figure:

The training utilizes competitive learning. When a training example is fed to the network, its
Euclidean distance to all weight vectors is computed. The neuron whose weight vector is
most similar to the input is called the best matching unit (BMU). The weights of the BMU
and neurons close to it in the SOM lattice are adjusted towards the input vector. The
magnitude of the change decreases with time and with distance (within the lattice) from the
BMU. The update formula for a neuron v with weight vector Wv(s) is
Wv(s+1) = Wv(s) + θ(u, v, s) · α(s) · (D(t) − Wv(s)),
where s is the step index, D(t) is the current input vector, u is the index of the BMU for D(t),
α(s) is a monotonically decreasing learning coefficient, and θ(u, v, s) is the neighborhood
function that attenuates the update with lattice distance from the BMU.
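A minimal illustrative sketch of this competitive-learning update is given below, written with
NumPy; the grid size, learning-rate and neighborhood schedules, and the use of random 5x5 binary
character patterns are illustrative assumptions, not prescribed values.

import numpy as np

def train_som(data, grid_rows=10, grid_cols=10, epochs=100, alpha0=0.5, sigma0=3.0):
    """Train a SOM on `data` (n_samples x n_features); returns the weight grid."""
    n_features = data.shape[1]
    rng = np.random.default_rng(0)
    weights = rng.random((grid_rows, grid_cols, n_features))

    # lattice coordinates of every node, used by the neighborhood function
    coords = np.stack(np.meshgrid(np.arange(grid_rows),
                                  np.arange(grid_cols), indexing='ij'), axis=-1)

    for s in range(epochs):
        alpha = alpha0 * np.exp(-s / epochs)   # decreasing learning rate
        sigma = sigma0 * np.exp(-s / epochs)   # shrinking neighborhood radius
        for x in data:
            # best matching unit: node whose weight vector is closest to x
            dist = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(dist), dist.shape)
            # Gaussian neighborhood around the BMU, measured in lattice space
            lattice_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
            theta = np.exp(-(lattice_dist**2) / (2 * sigma**2))
            # weight update: W(s+1) = W(s) + theta * alpha * (x - W(s))
            weights += alpha * theta[..., np.newaxis] * (x - weights)
    return weights

if __name__ == '__main__':
    # a few 5x5 binary "character" patterns, flattened to 25-element vectors
    patterns = np.array([np.random.randint(0, 2, 25) for _ in range(4)], dtype=float)
    som = train_som(patterns, grid_rows=6, grid_cols=6, epochs=50)
    print(som.shape)   # (6, 6, 25)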

Conclusion:-

D.Y. Patil College of Engineering, Akurdi
Department of Electronics & Telecommunication Engineering

Subject : Machine Learning Class: B.E.(Elective 3)


Date:-

Experiment No:- 06

Title: Implement SVM classifier for classification.

Aim: Implement an SVM classifier for classification of data into two classes; students can use a
dataset such as a flower-classification dataset.
Theory:

What is Support Vector Machine?


“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be
used for both classification and regression challenges. However, it is mostly
used in classification problems. In this algorithm, we plot each data item as a point in n-
dimensional space (where n is the number of features you have), with the value of each feature
being the value of a particular coordinate. Then, we perform classification by finding the
hyper-plane that differentiates the two classes very well (look at the snapshot below).

Support vectors are simply the co-ordinates of individual observations. The Support Vector
Machine is a frontier which best segregates the two classes (hyper-plane/line).
 
How does it work?
Above, we got accustomed to the process of segregating the two classes with a hyper-plane.
Now the burning question is “How can we identify the right hyper-plane?”. Don’t worry, it’s
not as hard as you think!
Let’s understand:
 Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B
and C). Now, identify the right hyper-plane to classify star and circle.

You need to remember
a thumb rule to identify the right hyper-plane: “Select the hyper-plane which
segregates the two classes better”. In this scenario, hyper-plane “B” has excellently
performed this job.
 Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B
and C) and all are segregating the classes well. Now, How can we identify the right
hyper-plane?

Here, maximizing the distances


between nearest data point (either class) and hyper-plane will help us to decide the right
hyper-plane. This distance is called the Margin. Let’s look at the snapshot below:

Above, you can see that the margin for hyper-plane C is high as compared to both A and B.
Hence, we name the right hyper-plane as C. Another reason for selecting the hyper-plane with
the higher margin is robustness: if we select a hyper-plane having a low margin, then there is a
high chance of mis-classification.
 Identify the right hyper-plane (Scenario-3): Hint: Use the rules discussed in the
previous section to identify the right hyper-plane.

Some of you may have selected hyper-plane B as it has a higher margin compared to A. But here
is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to
maximizing the margin. Here, hyper-plane B has a classification error and A has classified all
correctly. Therefore, the right hyper-plane is A.
 Can we classify two classes (Scenario-4)?: Below, I am unable to segregate the two
classes using a straight line, as one of the stars lies in the territory of the other (circle)
class as an outlier.

As I have already mentioned, one star at the other end is like an outlier for the star class.
SVM has a feature to ignore outliers and find the hyper-plane that has the maximum margin.
Hence, we can say that SVM is robust to outliers.

 Find the hyper-plane to segregate the classes (Scenario-5): In the scenario below, we
can’t have a linear hyper-plane between the two classes, so how does SVM classify
these two classes? Till now, we have only looked at the linear hyper-plane.

SVM can solve this problem easily! It solves it by introducing an additional feature. Here,
we will add a new feature z = x^2 + y^2. Now, let’s plot the data points on the x and z axes:

In the above plot, the points to consider are:

o All values of z will always be positive, because z is the squared sum of both
x and y.
o In the original plot, red circles appear close to the origin of the x and y axes,
leading to a lower value of z, while the stars lie relatively far from the origin,
resulting in a higher value of z.

In SVM, it is now easy to have a linear hyper-plane between these two classes. But
another burning question arises: do we need to add this feature manually to have
a hyper-plane? No, SVM has a technique called the kernel trick. Kernels are functions which
take a low-dimensional input space and transform it into a higher-dimensional space, i.e. they
convert a non-separable problem into a separable problem. The kernel trick is
mostly useful in non-linear separation problems.

Simply put, the kernel trick performs some fairly complex data transformations and then finds
the process to separate the data based on the labels or outputs you’ve defined. When we look at
the hyper-plane in the original input space, it looks like a circle.

Now, let’s look at how to apply the SVM algorithm in a data science problem.
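As a small illustrative sketch of the kernel trick described above (the dataset and parameter
values are chosen only for demonstration), an RBF-kernel SVM can separate scikit-learn's
make_circles data, which no straight line can split:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# two concentric rings: not linearly separable in (x, y)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# a linear SVM struggles here, while an RBF kernel separates the rings easily
linear_clf = SVC(kernel='linear').fit(X, y)
rbf_clf = SVC(kernel='rbf', gamma=2).fit(X, y)

print("linear kernel accuracy:", linear_clf.score(X, y))
print("rbf kernel accuracy:   ", rbf_clf.score(X, y))

# the same idea as the manual z = x^2 + y^2 feature: adding that feature
# explicitly also makes the data linearly separable
z = (X ** 2).sum(axis=1).reshape(-1, 1)
lifted_clf = SVC(kernel='linear').fit(np.hstack([X, z]), y)
print("linear kernel on lifted data:", lifted_clf.score(np.hstack([X, z]), y))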

Conclusion:-

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

X = np.array([
[0, 0],
[0, 1],
[1, 0],
[1, 1]
])
y = np.array([0, 0, 0, 1])
clf=svm.SVC(kernel='linear', C=1e6)
clf.fit(X, y)
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)

# plot the decision function


ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()

# create grid to evaluate model


xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

# plot decision boundary and margins


ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
# plot support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100, linewidth=1,
           facecolors='none')
plt.show()

Output:

D.Y. Patil College of Engineering, Akurdi
Department of Electronics & Telecommunication Engineering

Subject : Machine Learning Class: B.E.(Elective 3)


Date:-

Experiment No:- 07

Title: Implement and test multiclass SVM classifier

Aim: Implement and test multiclass Support Vector Machine classifier

Theory:

Module overview

This article describes how to use the Two-Class Support Vector Machine module in Azure
Machine Learning Studio, to create a model that is based on the support vector machine
algorithm.

Support vector machines (SVMs) are a well-researched class of supervised learning methods.
This particular implementation is suited to prediction of two possible outcomes, based on
either continuous or categorical variables.

After defining the model parameters, train the model by using one of the training modules,
and providing a tagged dataset that includes a label or outcome column.
More about support vector machines

Support vector machines are among the earliest of machine learning algorithms, and SVM
models have been used in many applications, from information retrieval to text and image
classification. SVMs can be used for both classification and regression tasks.

This SVM model is a supervised learning model that requires labeled data. In the training
process, the algorithm analyzes input data and recognizes patterns in a multi-dimensional
feature space called the hyperplane. All input examples are represented as points in this
space, and are mapped to output categories in such a way that categories are divided by as
wide and clear a gap as possible.

For prediction, the SVM algorithm assigns new examples into one category or the other,
mapping them into that same space.
How to configure Two-Class Support Vector Machine

For this model type, it is recommended that you normalize the dataset before using it to train
the classifier.

1. Add the Two-Class Support Vector Machine module to your experiment in Studio.


2. Specify how you want the model to be trained, by setting the Create trainer mode
option.

 Single Parameter: If you know how you want to configure the model, you
can provide a specific set of values as arguments.

 Parameter Range: If you are not sure of the best parameters, you can find the
optimal parameters by specifying multiple values and using the Tune Model
Hyperparameters module to find the optimal configuration. The trainer iterates
over multiple combinations of the settings and determines the combination of
values that produces the best model.

3. For Number of iterations, type a number that denotes the number of iterations used
when building the model.

This parameter can be used to control trade-off between training speed and accuracy.

4. For Lambda, type a value to use as the weight for L1 regularization.

This regularization coefficient can be used to tune the model. Larger values penalize
more complex models.

5. Select the option, Normalize features, if you want to normalize features before


training.

If you apply normalization, before training, data points are centered at the mean and
scaled to have one unit of standard deviation.

6. Select the option, Project to the unit sphere, to normalize coefficients.

Projecting values to unit space means that before training, data points are centered at 0
and scaled to have one unit of standard deviation.

7. In Random number seed, type an integer value to use as a seed if you want to ensure
reproducibility across runs. Otherwise, a system clock value is used as a seed, which
can result in slightly different results across runs.
8. Select the option, Allow unknown category, to create a group for unknown values in
the training or validation sets. In this case, the model might be less precise for known
values, but it can provide better predictions for new (unknown) values.

If you deselect it, the model can accept only the values that are contained in the training
data.

9. Connect a labeled dataset and one of the training modules:


 If you set Create trainer mode to Single Parameter, use the Train Model
module.

 If you set Create trainer mode to Parameter Range, use the Tune Model


Hyperparameters.

 Note

If you pass a parameter range to Train Model, it will use only the first value in the
parameter range list.

If you pass a single set of parameter values to the Tune Model
Hyperparameters module, when it expects a range of settings for each parameter, it
ignores the values and uses the default values for the learner.

If you select the Parameter Range option and enter a single value for any parameter,
that single value you specified will be used throughout the sweep, even if other
parameters change across a range of values.

10. Run the experiment.

Results

After training is complete:

 To see a summary of the model's parameters, together with the feature weights
learned from training, right-click the output of Train Model or Tune Model
Hyperparameters, and select Visualize.
 To use the trained models to make predictions, connect the trained model to the Score
Model module.

 To perform cross-validation against a labeled data set, connect the untrained model
and the dataset to Cross-Validate Model.

Examples

For examples of how this learning algorithm is used, see the Azure AI Gallery:

 Direct marketing: Uses an SVM model to classify customers by appetency.


 Credit risk prediction: Uses SVM for assessing credit risk.

 Compare Multiclass Classifiers: Uses an SVM model for handwriting recognition.

Technical notes

This section contains implementation details, tips, and answers to frequently asked questions.
Usage tips

For this model type, it is recommended that you normalize the dataset before using it to train
the classifier.

Although recent research has developed algorithms that have higher accuracy, this algorithm
can work well on simple data sets when your goal is speed over accuracy. If you do not get
the desired results by using Two-Class Support Vector Model, try one of these
classification methods:

 Multiclass Logistic Regression


 Two-Class Boosted Decision Tree

Module parameters

Name                               Range             Type     Default  Description
Number of iterations               >=1               Integer  1        The number of iterations.
Lambda                             >=double.Epsilon  Float    0.001    Weight for L1 regularization. Using a
                                                                       non-zero value avoids overfitting the
                                                                       model to the training dataset.
Normalize features                 Any               Boolean  True     If True, normalize the features.
Project to the unit-sphere         Any               Boolean  False    If True, project the features to a
                                                                       unit circle.
Random number seed                 Any               Integer           The seed for the random number
                                                                       generator used by the model. Leave it
                                                                       blank for the default.
Allow unknown categorical levels   Any               Boolean  True     If True, creates an additional level
                                                                       for each categorical column. Any levels
                                                                       in the test dataset that are not
                                                                       available in the training dataset are
                                                                       mapped to this additional level.

Output

Name              Type         Description
Untrained model   Data Table   An untrained binary classification model.

Conclusion:-

Exp no 07:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
bankdata = pd.read_csv("D:/Datasets/bill_authentication.csv")
bankdata.shape
bankdata.head()
X = bankdata.drop('Class', axis=1)
y = bankdata['Class']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)   # predict on the held-out test set
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

Output:

[[152   0]
 [  1 122]]
              precision    recall  f1-score   support

          0        0.99      1.00      1.00       152
          1        1.00      0.99      1.00       123

avg / total        1.00      1.00      1.00       275
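The program above is a two-class example; as an illustrative sketch of the multiclass case named
in the Aim, scikit-learn's SVC (which handles multiclass problems internally with a one-vs-one
scheme) can be trained on the built-in Iris flower dataset. The split ratio and kernel below are
demonstration choices, not prescribed values.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

# Iris has three classes, so this is a genuinely multiclass problem
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=0)

clf = SVC(kernel='rbf', gamma='scale')   # one-vs-one applied internally
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=iris.target_names))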

D.Y. Patil College of Engineering, Akurdi
Department of Electronics & Telecommunication Engineering

Subject : Machine Learning Class: B.E.(Elective 3)


Date:-

Experiment No:- 08

Title: Study and Implement and test CNN for object recognition

Aim: Implement and test Convolution Neural Network for object recognition.

Artificial Intelligence has been witnessing a monumental growth in bridging the gap
between the capabilities of humans and machines. Researchers and enthusiasts alike
work on numerous aspects of the field to make amazing things happen. One of many
such areas is the domain of Computer Vision.

The agenda for this field is to enable machines to view the world as humans do,
perceive it in a similar manner and even use the knowledge for a multitude of tasks
such as Image & Video recognition, Image Analysis & Classification, Media
Recreation, Recommendation Systems, Natural Language Processing, etc. The
advancements in Computer Vision with Deep Learning have been constructed and
perfected over time, primarily around one particular algorithm: the Convolutional Neural
Network.

Introduction

A CNN sequence to classify handwritten digits
A Convolutional Neural Network (ConvNet/CNN):

It is a Deep Learning algorithm which can take in an input image, assign importance
(learnable weights and biases) to various aspects/objects in the image and be able to
differentiate one from the other. The pre-processing required in a ConvNet is much lower as
compared to other classification algorithms. While in primitive methods filters are hand-
engineered, with enough training, ConvNets have the ability to learn these
filters/characteristics.

The architecture of a ConvNet is analogous to that of the connectivity pattern of Neurons in


the Human Brain and was inspired by the organization of the Visual Cortex. Individual
neurons respond to stimuli only in a restricted region of the visual field known as the
Receptive Field. A collection of such fields overlap to cover the entire visual area.

Why ConvNets over Feed-Forward Neural Nets?

Flattening of a 3x3 image matrix into a 9x1 vector


An image is nothing but a matrix of pixel values, right? So why not just flatten the image (e.g.
3x3 image matrix into a 9x1 vector) and feed it to a Multi-Level Perceptron for classification
purposes? Uh, not really. In cases of extremely basic binary images, the method might show
an average precision score while performing prediction of classes but would have little to no
accuracy when it comes to complex images having pixel dependencies throughout.
A ConvNet is able to successfully capture the Spatial and Temporal
dependencies in an image through the application of relevant filters. The architecture
performs a better fitting to the image dataset due to the reduction in the number of
parameters involved and reusability of weights. In other words, the network can be
trained to understand the sophistication of the image better.

Input Image

4x4x3 RGB Image

In the figure, we have an RGB image which has been separated by its three color
planes — Red, Green, and Blue. There are a number of such color spaces in which images
exist — Grayscale, RGB, HSV, CMYK, etc.
You can imagine how computationally intensive things would get once the images
reach dimensions, say 8K (7680×4320). The role of the ConvNet is to reduce the images into
a form which is easier to process, without losing features which are critical for getting a good
prediction. This is important when we are to design an architecture which is not only good at
learning features but also is scalable to massive datasets.
Convolution Layer — The Kernel

Convoluting a 5x5x1 image with a 3x3x1 kernel to get a 3x3x1 convolved feature
Image Dimensions = 5 (Height) x 5 (Breadth) x 1 (Number of channels, e.g. RGB). In the above
demonstration, the green section resembles our 5x5x1 input image, I. The element involved
in carrying out the convolution operation in the first part of a Convolutional Layer is called
the Kernel/Filter, K, represented in the color yellow. We have selected K as a 3x3x1 matrix. At
each position, the kernel is multiplied element-wise with the portion of the image over which it
is hovering, and the results are summed to give one element of the convolved feature.

Movement of the Kernel
The filter moves to the right with a certain Stride Value till it parses the complete
width. Moving on, it hops down to the beginning (left) of the image with the same Stride
Value and repeats the process until the entire image is traversed.

Convolution operation on a MxNx3 image matrix with a 3x3x3 Kernel


In the case of images with multiple channels (e.g. RGB), the Kernel has the same
depth as that of the input image. Matrix Multiplication is performed between Kn and In stack
([K1, I1]; [K2, I2]; [K3, I3]) and all the results are summed with the bias to give us a squashed
one-depth channel Convoluted Feature Output.

Convolution Operation with Stride Length = 2
The objective of the Convolution Operation is to extract the high-level features such
as edges, from the input image. ConvNets need not be limited to only one Convolutional
Layer. Conventionally, the first ConvLayer is responsible for capturing the Low-Level
features such as edges, color, gradient orientation, etc. With added layers, the architecture
adapts to the High-Level features as well, giving us a network which has the wholesome
understanding of images in the dataset, similar to how we would.
There are two types of results to the operation — one in which the convolved feature is
reduced in dimensionality as compared to the input, and the other in which the dimensionality
is either increased or remains the same. This is done by applying Valid Padding in case of the
former, or Same Padding in the case of the latter.

SAME padding: the 5x5x1 image is padded with 0s to create a 7x7x1 image

When we augment the 5x5x1 image into a 7x7x1 image and then apply the 3x3x1
kernel over it, we find that the convolved matrix turns out to be of dimensions 5x5x1. Hence
the name — Same Padding.

On the other hand, if we perform the same operation without padding, we are
presented with a matrix which has dimensions of the Kernel (3x3x1) itself — Valid Padding.
The following repository houses many such GIFs which would help you get a better
understanding of how Padding and Stride Length work together to achieve results relevant to
our needs.
vdumoulin/conv_arithmetic: a technical report on convolution arithmetic in the context of deep
learning (github.com/vdumoulin/conv_arithmetic).
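To make the valid/same-padding discussion concrete, the following NumPy sketch (illustrative
only; the array values are arbitrary) computes the 2-D convolution, in the cross-correlation form
used by CNNs, of a 5x5 input with a 3x3 kernel, with and without zero padding and with a stride
of 2:

import numpy as np

def conv2d(image, kernel, padding=0, stride=1):
    """2-D cross-correlation with optional zero padding and stride."""
    if padding > 0:
        image = np.pad(image, padding, mode='constant')
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum
    return out

img = np.arange(25, dtype=float).reshape(5, 5)                 # 5x5 "image"
k = np.array([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]])    # 3x3 edge kernel

print(conv2d(img, k).shape)             # (3, 3)  -> valid padding
print(conv2d(img, k, padding=1).shape)  # (5, 5)  -> same padding
print(conv2d(img, k, stride=2).shape)   # (2, 2)  -> stride length 2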
Pooling Layer

3x3 pooling over 5x5 convolved feature


Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the
spatial size of the Convolved Feature. This is to decrease the computational power required to
process the data through dimensionality reduction. Furthermore, it is useful for extracting
dominant features which are rotationally and positionally invariant, thus maintaining the process of
effectively training the model.
There are two types of Pooling: Max Pooling and Average Pooling. Max
Pooling returns the maximum value from the portion of the image covered by the Kernel. On
the other hand, Average Pooling returns the average of all the values from the portion of the
image covered by the Kernel.
Max Pooling also performs as a Noise Suppressant. It discards the noisy activations
altogether and also performs de-noising along with dimensionality reduction. On the other
hand, Average Pooling simply performs dimensionality reduction as a noise suppressing
mechanism. Hence, we can say that Max Pooling performs a lot better than Average Pooling.
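As a quick illustration of the two pooling operations just described (the feature-map values
below are arbitrary), 2x2 max pooling and average pooling of a 4x4 feature map can be computed
with a NumPy reshape:

import numpy as np

feature_map = np.array([[1., 3., 2., 1.],
                        [4., 6., 5., 2.],
                        [7., 9., 1., 0.],
                        [8., 2., 3., 4.]])

# group the 4x4 map into non-overlapping 2x2 windows, then reduce each window
windows = feature_map.reshape(2, 2, 2, 2)
max_pooled = windows.max(axis=(1, 3))     # keeps the strongest activation per window
avg_pooled = windows.mean(axis=(1, 3))    # keeps the average activation per window

print(max_pooled)   # [[6. 5.]
                    #  [9. 4.]]
print(avg_pooled)   # [[3.5 2.5]
                    #  [6.5 2. ]]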

The Convolutional Layer and the Pooling Layer, together form the i-th layer of a
Convolutional Neural Network. Depending on the complexities in the images, the number of
such layers may be increased for capturing low-levels details even further, but at the cost of
more computational power.
After going through the above process, we have successfully enabled the model to
understand the features. Moving on, we are going to flatten the final output and feed it to a
regular Neural Network for classification purposes.
Classification — Fully Connected Layer (FC Layer)

Adding a Fully-Connected layer is a (usually) cheap way of learning non-linear


combinations of the high-level features as represented by the output of the convolutional layer.
The Fully-Connected layer is learning a possibly non-linear function in that space.
Now that we have converted our input image into a suitable form for our Multi-Level
Perceptron, we shall flatten the image into a column vector. The flattened output is fed to a
feed-forward neural network and backpropagation applied to every iteration of training. Over
a series of epochs, the model is able to distinguish between dominating and certain low-level
features in images and classify them using the Softmax Classification technique.
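A minimal illustrative sketch of a ConvNet of the kind described above is given below, written
with TensorFlow/Keras (assumed available; the other programs in this manual use only NumPy and
scikit-learn) and trained on the built-in MNIST handwritten-digit data; the layer sizes and epoch
count are illustrative choices, not prescribed values.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# load and normalize the MNIST handwritten-digit images (28x28 grayscale)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis].astype("float32") / 255.0
x_test = x_test[..., np.newaxis].astype("float32") / 255.0

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # convolution layer (the Kernel)
    layers.MaxPooling2D(pool_size=2),                     # pooling layer
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                                     # flatten into a column vector
    layers.Dense(64, activation="relu"),                  # fully connected layer
    layers.Dense(10, activation="softmax"),               # softmax classification
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.1)
print("test accuracy:", model.evaluate(x_test, y_test, verbose=0)[1])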

Conclusion:-

