Machine Learning Lab Manual R20
In order to understand Find-S algorithm, you need to have a basic idea of the following concepts
as well:
1. Concept Learning
2. General Hypothesis
3. Specific Hypothesis
1. Concept Learning
Let's try to understand concept learning with a real-life example. Most human learning is based
on past instances or experiences. For example, we are able to identify any type of vehicle based
on a certain set of features, such as make and model, that are defined over a large set of features.
These special features differentiate the set of cars, trucks, etc. from the larger set of vehicles.
The features that define the set of cars, trucks, etc. are known as concepts.
Similar to this, machines can also learn from concepts to identify whether an object belongs to a
specific category or not. Any algorithm that supports concept learning requires the following:
Training Data
Target Concept
Actual Data Objects
2. General Hypothesis
A hypothesis, in general, is an explanation for something. The general hypothesis states the
general relationship between the major variables. For example, a general hypothesis for
ordering food would be "I want a burger."
3. Specific Hypothesis
The specific hypothesis fills in all the important details about the variables given in the general
hypothesis. A more specific version of the example above would be "I want a cheeseburger with
a chicken pepperoni filling and a lot of lettuce."
In Find-S notation, the most general hypothesis is written as G = {'?', '?', '?', ……, '?'} and the
most specific hypothesis as S = {'Φ', 'Φ', 'Φ', ……, 'Φ'}, where '?' accepts any value for an
attribute and 'Φ' rejects every value.
Now that we are done with the basic explanation of the Find-S algorithm, let us take a look at
how it works.
1. The process starts by initializing 'h' to the most specific hypothesis; in practice it is set
from the first positive example in the data set.
2. We then examine each example in turn. If the example is negative, we move on to the
next one; if it is positive, we consider it in the next step.
3. For a positive example, we check whether each attribute value is equal to the
corresponding hypothesis value.
4. If the value matches, no change is made.
5. If the value does not match, the hypothesis value is changed to '?'.
6. We repeat this until we reach the last positive example in the data set (a short sketch of
this update step is shown below).
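The per-attribute update in steps 3 to 5 can be illustrated with a tiny, self-contained sketch; the
attribute values here are made up for illustration and are not the lab data set:
# Minimal illustration of one Find-S generalization step (hypothetical values).
hypothesis = ['Sunny', 'Warm', 'Normal', 'Strong']   # current specific hypothesis
example = ['Sunny', 'Warm', 'High', 'Strong']        # next positive example
# generalize only where the positive example disagrees with the hypothesis
for i in range(len(hypothesis)):
    if example[i] != hypothesis[i]:
        hypothesis[i] = '?'
print(hypothesis)   # ['Sunny', 'Warm', '?', 'Strong']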
There are a few limitations of the Find-S algorithm, listed below:
1. There is no way to determine whether the hypothesis it finds is the only one consistent
with the data, or whether it is the correct target concept.
2. It ignores negative examples entirely, so inconsistent or noisy training data can go
undetected and mislead the algorithm.
3. It always chooses the most specific hypothesis; if there are several maximally specific
hypotheses, Find-S cannot represent or choose among them.
Now that we are aware of the limitations of the Find-S algorithm, let us take a look at a practical
implementation of the Find-S Algorithm.
Experiment-1:
Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file.
To understand the implementation, let us apply it to a smaller data set with a handful of
examples that decide whether a person wants to go for a walk.
The concept for this particular problem is: on which days does a person like to go for a walk?
Looking at the data set, we have six attributes and a final attribute that defines the positive or
negative example. In this case, yes is a positive example, which means the person will go for a
walk.
This is our initial hypothesis, and now we will consider each example one by one, but only the
positive examples.
For every positive example, we replace each attribute value that differs from the hypothesis with
'?' to obtain the resultant hypothesis. Now that we know how the Find-S algorithm works, let us
take a look at an implementation using Python.
Program:
import pandas as pd
import numpy as np

# read the training data from a CSV file
data = pd.read_csv("data.csv")
print(data, "\n")

# separate the attribute columns from the target column
d = np.array(data)[:, :-1]
target = np.array(data)[:, -1]

def train(c, t):
    # initialize the specific hypothesis with the first positive example
    for i, val in enumerate(t):
        if val == "Yes":
            specific_hypothesis = c[i].copy()
            break
    # generalize the hypothesis attribute by attribute for every positive example
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'
                else:
                    pass
    return specific_hypothesis

print("The final hypothesis is:", train(d, target))
Output:
Experiment-2:
For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.
Algorithm:
Step 1: Load the training data set.
Step 2: Initialize the general hypothesis G and the specific hypothesis S.
Step 3: For each training example:
Step 4: If the example is positive:
            if attribute_value == hypothesis_value:
                Do nothing
            else:
                replace the attribute value in S with '?' (generalize it)
Step 5: If the example is negative:
            for each attribute where the example differs from S, set that attribute in the
            corresponding general hypothesis to the value from S; otherwise keep '?'.
trainingdata.csv
Sunny  Warm  Normal  Strong  Warm  Same    Yes
Sunny  Warm  High    Strong  Warm  Same    Yes
Rainy  Cold  High    Strong  Warm  Change  No
Sunny  Warm  High    Strong  Cool  Change  Yes
prog2.py
import csv

# read the training data from the CSV file
with open("trainingdata.csv") as f:
    csv_file = csv.reader(f)
    data = list(csv_file)

# initialize the specific hypothesis from a positive example
s = data[1][:-1]
# initialize the general hypothesis with all '?'
g = [['?' for i in range(len(s))] for j in range(len(s))]

for i in data:
    if i[-1] == "Yes":
        # positive example: generalize the specific hypothesis
        for j in range(len(s)):
            if i[j] != s[j]:
                s[j] = '?'
                g[j][j] = '?'
    elif i[-1] == "No":
        # negative example: specialize the general hypothesis
        for j in range(len(s)):
            if i[j] != s[j]:
                g[j][j] = s[j]
            else:
                g[j][j] = '?'
    print("\nSteps of Candidate Elimination Algorithm", data.index(i) + 1)
    print(s)
    print(g)

# keep only the general hypotheses that constrain at least one attribute
gh = []
for i in g:
    for j in i:
        if j != '?':
            gh.append(i)
            break

print("\nFinal specific hypothesis:\n", s)
print("\nFinal general hypothesis:\n", gh)
Output
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
Experiment-3:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.
tennisdata.csv dataset:
Outlook   Temperature  Humidity  Windy  PlayTennis
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No
Program:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv('tennisdata.csv')
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# encode each categorical attribute as integers
le_outlook = LabelEncoder()
X.Outlook = le_outlook.fit_transform(X.Outlook)
le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)
le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)
le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)
le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)

# entropy (information gain) is the splitting criterion used by ID3
classifier = DecisionTreeClassifier(criterion='entropy')
classifier.fit(X, y)

# encode a new sample with the same label encoders before prediction
def labelEncoderForInput(list1):
    list1[0] = le_outlook.transform([list1[0]])[0]
    list1[1] = le_Temperature.transform([list1[1]])[0]
    list1[2] = le_Humidity.transform([list1[2]])[0]
    list1[3] = le_Windy.transform([list1[3]])[0]
    return [list1]

inp = ["Rainy", "Mild", "High", "False"]
inp1 = ["Rainy", "Cool", "High", "False"]
pred1 = labelEncoderForInput(inp1)
y_pred = classifier.predict(pred1)
print("Predicted class:", le_PlayTennis.inverse_transform(y_pred))
Experiment-4:
Exercises to solve real-world problems using the following machine learning methods:
a)Linear Regression :
The linear regression algorithm models a linear relationship between a dependent variable (y) and
one or more independent variables (x), hence the name linear regression. Because the relationship
is linear, the algorithm finds how the value of the dependent variable changes as the value of the
independent variable changes.
Linear regression uses the relationship between the data points to draw a straight line through
them.
Program:
import matplotlib.pyplot as plt
from scipy import stats
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(x):
return slope * x + intercept
mymodel = list(map(myfunc, x))
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
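As a quick follow-up (a sketch reusing the slope, intercept, and r computed above; the input
value 10 is purely illustrative), the fitted line can also be used to predict y for a new x:
# r indicates how well the line fits the data (already returned by linregress above)
print("r =", r)
# predict y for a new, illustrative x value
print("predicted y for x = 10:", myfunc(10))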
Output:
b)Logistic Regression:
Logistic regression aims to solve classification problems by predicting categorical outcomes,
unlike linear regression, which predicts a continuous outcome.
Program:
import numpy
from sklearn import linear_model

# feature values reshaped to a column vector; y holds the binary class labels
x = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(x, y)

# predict the class of a new observation
predicted = logr.predict(numpy.array([2.09]).reshape(-1, 1))
print(predicted)
Output:
[0]
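If the probability behind the predicted label is also of interest, the same fitted model exposes
predict_proba (a small sketch reusing logr from the program above, with the same observation):
# probability of class 0 and class 1 for the same observation
print(logr.predict_proba(numpy.array([2.09]).reshape(-1, 1)))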
Experiment-5:
Description:
In machine learning, an error is a measure of how accurately an algorithm can make predictions
on a previously unseen dataset. On the basis of these errors, we select the machine learning model
that performs best on the particular dataset. There are mainly two types of errors in machine
learning:
o Reducible errors: These errors can be reduced to improve the model accuracy. They can
be further classified into bias and variance.
o Irreducible errors: These errors are caused by noise inherent in the data and cannot be
reduced, whichever model is chosen.
Program:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score

np.random.seed(42)
# synthetic data; the target y is assumed here since the original listing omits its definition
X = np.random.rand(100, 10)
y = np.random.rand(100)
X = np.unique(X, axis=0)
y = y[:len(X)]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)

# k-fold cross-validation with negative MSE; the mean error is taken as a rough proxy for bias,
# and the spread of the fold scores as a proxy for variance
num_splits = 10
cv_scores = cross_val_score(model, X, y, cv=num_splits, scoring='neg_mean_squared_error')
bias = -np.mean(cv_scores)
variance = np.var(cv_scores)
print("Bias:", bias)
print("Variance:", variance)
Output:
Experiment-6:
One-hot encoding is used to convert categorical variables into a format that can be readily used
by machine learning algorithms.
The basic idea of one-hot encoding is to create new variables that take on values 0 and 1 to
represent the original categorical values.
For example, one-hot encoding can convert a categorical variable that contains team names into
new indicator variables that contain only 0 and 1 values, as the program below demonstrates:
Program:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# sample DataFrame (assumed values; the original listing omits its creation)
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'points': [25, 12, 15, 14, 23, 29]})
print(df)

# one-hot encode the 'team' column
encoder = OneHotEncoder(handle_unknown='ignore')
encoder_df = pd.DataFrame(encoder.fit_transform(df[['team']]).toarray())

# join the indicator columns, drop the original 'team' column, and rename
final_df = df.join(encoder_df)
print(final_df)
final_df = final_df.drop('team', axis=1)
final_df.columns = ['points', 'teamA', 'teamB', 'teamC']
print(final_df)
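For comparison, pandas provides get_dummies, which produces the same kind of indicator
columns in a single call (a short sketch assuming the same df as above):
# one-hot encode the 'team' column directly with pandas
dummies_df = pd.get_dummies(df, columns=['team'], prefix='team')
print(dummies_df)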
Experiment-7:
Build an Artificial Neural Network by implementing the Back propagation algorithm and
test the same using appropriate data sets.
Back Propagation:
Back propagation is a widely used algorithm in machine learning and neural networks that is
used to train artificial neural networks (ANNs) by adjusting the weights of the connections
between neurons. The goal of back propagation is to minimize the difference between the
predicted output of the neural network and the true output of the training data.
The back propagation algorithm works by first passing an input through the neural network to
obtain an output. The difference between the predicted output and the true output is then
calculated, and this difference is used to adjust the weights of the connections between neurons
in the network. This adjustment is done by calculating the gradient of the error function with
respect to the weights, which tells us how much each weight needs to be adjusted in order to
decrease the error.
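In symbols, each weight is adjusted by gradient descent, w ← w − η · ∂E/∂w, where η is the
learning rate (lr in the program below) and E is the error function. In the listing that follows, the
sign of the gradient is folded into the error term EO = Y − output, so the update appears as an
addition.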
Program:
import numpy as np

# input features and target values
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
Y = np.array(([92], [86], [89]), dtype=float)

# normalize the data
X = X / np.amax(X, axis=0)
Y = Y / 100

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    return x * (1 - x)

# network and training parameters
epoch = 7000
lr = 0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

# random weight and bias initialization
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # forward propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    # back propagation of error
    EO = Y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    # weight updates
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

print("input:\n" + str(X))
print("Actual output:\n" + str(Y))
print("predicted output:\n", output)
Output:
input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual output:
[[0.92]
[0.86]
[0.89]]
predicted output:
[[0.89699538]
[0.87715324]
[0.89533612]]
Experiment 8:
Write a program to implement the k-nearest neighbor algorithm to classify the iris data set.
Print both correct and wrong predictions.
Description:
o The K-NN algorithm assumes similarity between the new case/data and the available cases
and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on
this similarity. This means that when new data appears, it can easily be classified into a
well-suited category using K-NN.
o The K-NN algorithm can be used for regression as well as classification, but it is mostly
used for classification problems.
Suppose there are two categories, Category A and Category B, and we have a new data point x1;
which of these categories does it belong to? To solve this type of problem, we need the K-NN
algorithm. With the help of K-NN, we can easily identify the category or class of a particular
data point.
Program:
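A minimal sketch of this experiment using scikit-learn's KNeighborsClassifier on the iris data
set; the value k = 3, the 80/20 train/test split, and random_state = 1 are assumptions rather than
values taken from the original listing:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# load the iris data set and hold out 20% of it for testing (assumed split)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=1)

# fit a k-NN classifier with k = 3 neighbors (assumed value of k)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# classify the test samples and report each correct and wrong prediction
y_pred = knn.predict(X_test)
correct = 0
wrong = 0
for features, actual, predicted in zip(X_test, y_test, y_pred):
    result = "Correct" if actual == predicted else "Wrong"
    if actual == predicted:
        correct += 1
    else:
        wrong += 1
    print(result, features, "actual:", iris.target_names[actual], "predicted:", iris.target_names[predicted])

print("Number of correct predictions:", correct)
print("Number of wrong predictions:", wrong)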
Output:
Number of correct predictions: 30
Number of wrong predictions: 0
Experiment-9:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
Program:
from math import ceil
import numpy as np
from scipy import linalg

def lowess(x, y, f, iterations):
    n = len(x)
    r = int(ceil(f * n))
    # bandwidth for each point: distance to the r-th nearest neighbor
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    # tricube weights
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        # robustifying weights based on the residuals
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest

import math
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f = 0.25
iterations = 3
yest = lowess(x, y, f, iterations)

# plot the noisy data points and the locally weighted fit
import matplotlib.pyplot as plt
plt.scatter(x, y, label='data')
plt.plot(x, yest, 'r', label='lowess fit')
plt.legend()
plt.show()
Output:
Experiment-10:
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.
Document1.csv dataset
I love this sandwich,pos
This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
My boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,neg
I love to dance,pos
I am sick and tired of this place,neg
What a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,pos
I went to my enemy's house today,neg
Program:
import pandas as pd

# load the documents and their labels
msg = pd.read_csv('document1.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum

from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

# convert the text documents into a bag-of-words count matrix
from sklearn.feature_extraction.text import CountVectorizer
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())
print(df[0:5])

# train the multinomial naive Bayes classifier and predict the test documents
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
for doc, p in zip(Xtest, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
Output:
Total Instances of Dataset: 18
am amazing an and awesome bad boss can dance deal ... that the \
0 0 0 0 0 0 0 0 0 0 0 ... 0 0
1 0 0 0 0 0 1 0 0 0 0 ... 1 0
2 0 0 0 0 0 0 0 0 0 0 ... 0 0
3 0 0 0 0 0 0 0 1 0 1 ... 0 0
4 0 0 0 0 0 0 1 0 0 0 ... 0 0
Experiment-11:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in
the program.
Program:
from sklearn.cluster import KMeans
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dataset=load_iris()
X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')
# K-PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')
# GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')
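The experiment also asks for a comparison of the two clusterings. One possible way to quantify
it (a sketch reusing the sm alias and the labels computed above) is the adjusted Rand index,
which is insensitive to how the cluster numbers are permuted:
# a higher adjusted Rand index means closer agreement with the true iris classes
print("KMeans ARI:", sm.adjusted_rand_score(y.Targets, predY))
print("GMM ARI:", sm.adjusted_rand_score(y.Targets, y_cluster_gmm))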
Output:
Text(0.5, 1.0, 'GMM Classification')
Experiment-12:
Exploratory data analysis for classification using pandas or Matplotlib.
Description:
EDA is applied to investigate the data and summarize the key insights.
It gives you a basic understanding of your data: its distribution, null values, and much more.
You can explore the data either using graphs or through some Python functions.
In the non-graphical approach, you will use functions such as shape, describe, isnull, info,
dtypes, and more.
In the graphical approach, you will use plots such as scatter, box, bar, density, and correlation
plots.
Program:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('titanic.csv')

# Basic information
df.info()
df.describe()

# Find the duplicates
df.duplicated().sum()

# Unique values
df['Pclass'].unique()
df['Survived'].unique()
df['Sex'].unique()

# Count plot of passenger class
sns.countplot(x='Pclass', data=df)
plt.show()

# Datatypes
df.dtypes

# Filter data
df[df['Pclass'] == 1].head()

# Boxplot
df[['Fare']].boxplot()
plt.show()

# Correlation (numeric columns only)
df.corr(numeric_only=True)

# Correlation plot
sns.heatmap(df.corr(numeric_only=True))
plt.show()
Output: