Classification of Dry Bean

In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv('Dry_Bean.csv')

In [2]: df.head()

Out[2]:
    Area  Perimeter  MajorAxisLength  MinorAxisLength  AspectRation  Eccentricity  ConvexArea
0  28395    610.291       208.178117       173.888747      1.197191      0.549812       28715
1  28734    638.018       200.524796       182.734419      1.097356      0.411785       29172
2  29380    624.110       212.826130       175.931143      1.209713      0.562727       29690
3  30008    645.884       210.557999       182.516516      1.153638      0.498616       30724
4  30140    620.134       201.847882       190.279279      1.060798      0.333680       30417
(remaining columns not shown)

In [3]: df.shape

Out[3]: (13611, 17)

In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13611 entries, 0 to 13610
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Area 13611 non-null int64
1 Perimeter 13611 non-null float64
2 MajorAxisLength 13611 non-null float64
3 MinorAxisLength 13611 non-null float64
4 AspectRation 13611 non-null float64
5 Eccentricity 13611 non-null float64
6 ConvexArea 13611 non-null int64
7 EquivDiameter 13611 non-null float64
8 Extent 13611 non-null float64
9 Solidity 13611 non-null float64
10 roundness 13611 non-null float64
11 Compactness 13611 non-null float64
12 ShapeFactor1 13611 non-null float64
13 ShapeFactor2 13611 non-null float64
14 ShapeFactor3 13611 non-null float64
15 ShapeFactor4 13611 non-null float64
16 Class 13611 non-null object
dtypes: float64(14), int64(2), object(1)
memory usage: 1.8+ MB

In [5]: df.isnull().sum()

Out[5]:
Area               0
Perimeter          0
MajorAxisLength    0
MinorAxisLength    0
AspectRation       0
Eccentricity       0
ConvexArea         0
EquivDiameter      0
Extent             0
Solidity           0
roundness          0
Compactness        0
ShapeFactor1       0
ShapeFactor2       0
ShapeFactor3       0
ShapeFactor4       0
Class              0
dtype: int64

In [6]: df.describe()

Out[6]:
                Area     Perimeter  MajorAxisLength  MinorAxisLength  AspectRation  Eccentricity
count   13611.000000  13611.000000     13611.000000     13611.000000  13611.000000   13611.00000
mean    53048.284549    855.283459       320.141867       202.270714      1.583242       0.75089
std     29324.095717    214.289696        85.694186        44.970091      0.246678       0.09200
min     20420.000000    524.736000       183.601165       122.512653      1.024868       0.21895
25%     36328.000000    703.523500       253.303633       175.848170      1.432307       0.71592
50%     44652.000000    794.941000       296.883367       192.431733      1.551124       0.76444
75%     61332.000000    977.213000       376.495012       217.031741      1.707109       0.81046
max    254616.000000   1985.370000       738.860154       460.198497      2.430306       0.91142
(remaining columns not shown)

In [7]: print(df['Class'].value_counts())

DERMASON 3546
SIRA 2636
SEKER 2027
HOROZ 1928
CALI 1630
BARBUNYA 1322
BOMBAY 522
Name: Class, dtype: int64

In [8]: sns.histplot(x='Class', data=df)

Out[8]: <Axes: xlabel='Class', ylabel='Count'>
In [9]: df['Class'].value_counts().plot(kind='pie')

Out[9]: <Axes: ylabel='Class'>

In [10]: df['Class'].unique()

Out[10]: array(['SEKER', 'BARBUNYA', 'BOMBAY', 'CALI', 'HOROZ', 'SIRA', 'DERMASON'],
               dtype=object)

In [11]: df['Class'].value_counts()

Out[11]:
DERMASON    3546
SIRA        2636
SEKER       2027
HOROZ       1928
CALI        1630
BARBUNYA    1322
BOMBAY       522
Name: Class, dtype: int64

In [12]: sns.countplot(x='Class', data=df)

Out[12]: <Axes: xlabel='Class', ylabel='count'>

In [13]: # plt.figure(figsize=(20, 15))
heatmap = sns.heatmap(df.corr(numeric_only=True), vmin=-1, vmax=1, annot=True,
                      fmt=".2f", linewidths=0.5)
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 12}, pad=12)

Out[13]: Text(0.5, 1.0, 'Correlation Heatmap')
In [14]: sns.heatmap(df.corr(numeric_only=True))

Out[14]: <Axes: >
In [15]: df.hist(bins=30, figsize=(15, 15))

Out[15]:
array([[<Axes: title={'center': 'Area'}>,
        <Axes: title={'center': 'Perimeter'}>,
        <Axes: title={'center': 'MajorAxisLength'}>,
        <Axes: title={'center': 'MinorAxisLength'}>],
       [<Axes: title={'center': 'AspectRation'}>,
        <Axes: title={'center': 'Eccentricity'}>,
        <Axes: title={'center': 'ConvexArea'}>,
        <Axes: title={'center': 'EquivDiameter'}>],
       [<Axes: title={'center': 'Extent'}>,
        <Axes: title={'center': 'Solidity'}>,
        <Axes: title={'center': 'roundness'}>,
        <Axes: title={'center': 'Compactness'}>],
       [<Axes: title={'center': 'ShapeFactor1'}>,
        <Axes: title={'center': 'ShapeFactor2'}>,
        <Axes: title={'center': 'ShapeFactor3'}>,
        <Axes: title={'center': 'ShapeFactor4'}>]], dtype=object)
In [16]: x = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

In [17]: from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
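The class counts in In [7] are imbalanced (DERMASON has roughly seven times as many samples as BOMBAY), so a purely random split can shift the class proportions between train and test. A minimal sketch of a stratified variant, reusing the notebook's variables; stratify is a standard train_test_split parameter:

    # Stratified split: every bean class keeps the same proportion in the
    # train and test sets as in the full dataset.
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.25, random_state=0, stratify=y)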

In [18]: # corr_matrix = df.corr()
# plt.figure(figsize=(10, 15))
# sns.heatmap(corr_matrix, annot=True, cmap='crest')
# plt.show()

In [19]: from sklearn import metrics

from sklearn.metrics import precision_score, accuracy_score, recall_score, confusion_matrix
from sklearn.metrics import classification_report

In [20]: from sklearn.linear_model import LogisticRegression

regressor = LogisticRegression(random_state=0)
regressor.fit(x_train, y_train)

Out[20]: LogisticRegression(random_state=0)
In [21]: y_pred = regressor.predict(x_test)

In [22]: regressor.score(x_test, y_test)

Out[22]: 0.7064354980899207
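A score of about 0.71 is modest here, largely because the features span very different scales (Area is in the tens of thousands while ShapeFactor2 is near zero), which also tends to keep the default solver from converging within its 100-iteration limit. A hedged sketch that standardizes the features in a pipeline and raises max_iter; the exact gain will depend on the split:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Scale each feature to zero mean / unit variance before the linear model.
    lr_pipe = make_pipeline(StandardScaler(),
                            LogisticRegression(random_state=0, max_iter=1000))
    lr_pipe.fit(x_train, y_train)
    lr_pipe.score(x_test, y_test)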

In [23]: cm = confusion_matrix(y_test, y_pred)
print(cm)

[[169 0 91 0 53 1 5]
[ 0 110 1 0 0 0 0]
[ 88 0 311 0 15 2 3]
[ 0 0 0 811 12 28 54]
[ 9 0 16 20 290 5 151]
[ 0 0 0 104 10 279 77]
[ 0 0 0 56 111 87 434]]
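The raw matrix is hard to read without class labels on the axes. One way to plot it with labels, assuming scikit-learn 1.0 or later (where ConfusionMatrixDisplay.from_predictions is available):

    from sklearn.metrics import ConfusionMatrixDisplay

    # Rows are true classes, columns are predicted classes; the class
    # labels are inferred from y_test and y_pred.
    ConfusionMatrixDisplay.from_predictions(y_test, y_pred, xticks_rotation=45)
    plt.show()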

In [24]: report = classification_report(y_test, y_pred)
print(report)

              precision    recall  f1-score   support

    BARBUNYA       0.64      0.53      0.58       319
      BOMBAY       1.00      0.99      1.00       111
        CALI       0.74      0.74      0.74       419
    DERMASON       0.82      0.90      0.86       905
       HOROZ       0.59      0.59      0.59       491
       SEKER       0.69      0.59      0.64       470
        SIRA       0.60      0.63      0.61       688

    accuracy                           0.71      3403
   macro avg       0.73      0.71      0.72      3403
weighted avg       0.70      0.71      0.70      3403

In [25]: Evaluation = pd.DataFrame(['LR'], columns=['Algorithm'])

Evaluation.loc[0, 'Precision'] = metrics.precision_score(y_test, y_pred, average='micro')
Evaluation.loc[0, 'Recall'] = metrics.recall_score(y_test, y_pred, average='micro')
Evaluation.loc[0, 'F1 Score'] = metrics.f1_score(y_test, y_pred, average='micro')
Evaluation.loc[0, 'Accuracy'] = metrics.accuracy_score(y_test, y_pred)
Evaluation

Out[25]:
  Algorithm  Precision    Recall  F1 Score  Accuracy
0        LR   0.706435  0.706435  0.706435  0.706435

(With micro averaging, precision, recall, and F1 all reduce to overall accuracy in a single-label multiclass problem, which is why every row of this table holds four identical values.)

In [26]: from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(x_train, y_train)

Out[26]: KNeighborsClassifier()

In [27]: y_pred = classifier.predict(x_test)

In [28]: report = classification_report(y_test, y_pred)
print(report)

              precision    recall  f1-score   support

    BARBUNYA       0.48      0.48      0.48       319
      BOMBAY       1.00      0.99      1.00       111
        CALI       0.67      0.65      0.66       419
    DERMASON       0.81      0.90      0.85       905
       HOROZ       0.71      0.66      0.68       491
       SEKER       0.77      0.61      0.68       470
        SIRA       0.71      0.75      0.73       688

    accuracy                           0.73      3403
   macro avg       0.73      0.72      0.72      3403
weighted avg       0.73      0.73      0.72      3403

In [29]: classifier.score(x_test, y_test)

Out[29]: 0.7272994416691155
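KNN ranks neighbours by Euclidean distance, so on raw features the large-magnitude columns (Area, ConvexArea, Perimeter) dominate the distance and the near-zero shape factors contribute almost nothing. Standardizing first usually lifts the score noticeably; a sketch under that assumption:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Put all 16 features on a comparable scale before computing distances.
    knn_pipe = make_pipeline(StandardScaler(),
                             KNeighborsClassifier(n_neighbors=5))
    knn_pipe.fit(x_train, y_train)
    knn_pipe.score(x_test, y_test)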

In [30]: cm = confusion_matrix(y_test, y_pred)
print(cm)

[[152 0 113 0 45 0 9]
[ 0 110 1 0 0 0 0]
[116 0 273 0 28 1 1]
[ 0 0 0 816 2 54 33]
[ 44 0 21 15 322 0 89]
[ 0 0 0 94 5 289 82]
[ 4 0 1 84 54 32 513]]

In [31]: Evaluation.loc[1, 'Algorithm'] = 'KNN'
Evaluation.loc[1, 'Precision'] = metrics.precision_score(y_test, y_pred, average='micro')
Evaluation.loc[1, 'Recall'] = metrics.recall_score(y_test, y_pred, average='micro')
Evaluation.loc[1, 'F1 Score'] = metrics.f1_score(y_test, y_pred, average='micro')
Evaluation.loc[1, 'Accuracy'] = metrics.accuracy_score(y_test, y_pred)
Evaluation

Out[31]:
  Algorithm  Precision    Recall  F1 Score  Accuracy
0        LR   0.706435  0.706435  0.706435  0.706435
1       KNN   0.727299  0.727299  0.727299  0.727299

In [32]: from sklearn.tree import DecisionTreeClassifier

dtc_model = DecisionTreeClassifier()
dtc_model.fit(x_train, y_train)

Out[32]: DecisionTreeClassifier()

In [33]: y_pred = dtc_model.predict(x_test)

In [34]: report = classification_report(y_test, y_pred)
print(report)

              precision    recall  f1-score   support

    BARBUNYA       0.90      0.90      0.90       319
      BOMBAY       1.00      0.99      1.00       111
        CALI       0.93      0.92      0.93       419
    DERMASON       0.90      0.89      0.89       905
       HOROZ       0.92      0.92      0.92       491
       SEKER       0.91      0.92      0.92       470
        SIRA       0.82      0.83      0.83       688

    accuracy                           0.90      3403
   macro avg       0.91      0.91      0.91      3403
weighted avg       0.90      0.90      0.90      3403

In [35]: dtc_model.score(x_test, y_test)

Out[35]: 0.8959741404642962
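An unconstrained decision tree fits the training data exactly, so this single 0.896 test score can move around with the split (and with the tree's own tie-breaking, since no random_state was set). Cross-validation gives a more stable estimate; a sketch using the full x and y built in In [16]:

    from sklearn.model_selection import cross_val_score

    # Mean accuracy over 5 folds; the std shows how split-dependent it is.
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), x, y, cv=5)
    print(scores.mean(), scores.std())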

In [36]: cm = confusion_matrix(y_test, y_pred)
print(cm)

[[286 0 13 0 3 7 10]
[ 1 110 0 0 0 0 0]
[ 19 0 387 0 10 1 2]
[ 0 0 0 805 8 21 71]
[ 4 0 12 3 453 0 19]
[ 3 0 0 13 0 434 20]
[ 6 0 3 76 17 12 574]]

In [37]: Evaluation.loc[2, 'Algorithm'] = 'Decision Tree'
Evaluation.loc[2, 'Precision'] = metrics.precision_score(y_test, y_pred, average='micro')
Evaluation.loc[2, 'Recall'] = metrics.recall_score(y_test, y_pred, average='micro')
Evaluation.loc[2, 'F1 Score'] = metrics.f1_score(y_test, y_pred, average='micro')
Evaluation.loc[2, 'Accuracy'] = metrics.accuracy_score(y_test, y_pred)
Evaluation

Out[37]:
       Algorithm  Precision    Recall  F1 Score  Accuracy
0             LR   0.706435  0.706435  0.706435  0.706435
1            KNN   0.727299  0.727299  0.727299  0.727299
2  Decision Tree   0.895974  0.895974  0.895974  0.895974

In [38]: from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()
classifier.fit(x_train, y_train)

Out[38]: GaussianNB()

In [39]: # Note: in the original run order, y_pred still holds the Decision Tree
# predictions at this point (the GaussianNB predict call comes in In [40]),
# so this report simply repeats the numbers from In [34].
report = classification_report(y_test, y_pred)
print(report)

              precision    recall  f1-score   support

    BARBUNYA       0.90      0.90      0.90       319
      BOMBAY       1.00      0.99      1.00       111
        CALI       0.93      0.92      0.93       419
    DERMASON       0.90      0.89      0.89       905
       HOROZ       0.92      0.92      0.92       491
       SEKER       0.91      0.92      0.92       470
        SIRA       0.82      0.83      0.83       688

    accuracy                           0.90      3403
   macro avg       0.91      0.91      0.91      3403
weighted avg       0.90      0.90      0.90      3403

In [40]: y_pred = classifier.predict(x_test)

In [41]: classifier.score(x_test, y_test)

Out[41]: 0.7637378783426388

In [42]: cm = confusion_matrix(y_test, y_pred)
print(cm)

[[150   0 120   0  39   0  10]
 [  0 111   0   0   0   0   0]
 [ 77   0 322   0  18   0   2]
 [  0   0   0 782   0  91  32]
 [ 18   0  21  10 372   0  70]
 [  3   0   0  73   3 330  61]
 [  0   0   0  40  53  63 532]]

In [43]: Evaluation.loc[3, 'Algorithm'] = 'Naive Bayes'
Evaluation.loc[3, 'Precision'] = metrics.precision_score(y_test, y_pred, average='micro')
Evaluation.loc[3, 'Recall'] = metrics.recall_score(y_test, y_pred, average='micro')
Evaluation.loc[3, 'F1 Score'] = metrics.f1_score(y_test, y_pred, average='micro')
Evaluation.loc[3, 'Accuracy'] = metrics.accuracy_score(y_test, y_pred)
Evaluation

Out[43]:
       Algorithm  Precision    Recall  F1 Score  Accuracy
0             LR   0.706435  0.706435  0.706435  0.706435
1            KNN   0.727299  0.727299  0.727299  0.727299
2  Decision Tree   0.895974  0.895974  0.895974  0.895974
3    Naive Bayes   0.763738  0.763738  0.763738  0.763738

In [44]: from sklearn.svm import SVC

classifier = SVC(kernel='linear', random_state=0)
classifier.fit(x_train, y_train)

Out[44]: SVC(kernel='linear', random_state=0)

In [45]: y_pred = classifier.predict(x_test)

In [46]: report = classification_report(y_test, y_pred)
print(report)

              precision    recall  f1-score   support

    BARBUNYA       0.95      0.86      0.90       319
      BOMBAY       1.00      0.99      1.00       111
        CALI       0.91      0.94      0.92       419
    DERMASON       0.91      0.93      0.92       905
       HOROZ       0.94      0.95      0.94       491
       SEKER       0.94      0.95      0.95       470
        SIRA       0.86      0.85      0.85       688

    accuracy                           0.91      3403
   macro avg       0.93      0.92      0.93      3403
weighted avg       0.91      0.91      0.91      3403

In [47]: classifier.score(x_test, y_test)

Out[47]: 0.9138995004407875

In [48]: cm = confusion_matrix(y_test, y_pred)
print(cm)

[[275 0 26 0 3 4 11]
[ 0 110 1 0 0 0 0]
[ 13 0 393 0 7 1 5]
[ 0 0 0 840 2 12 51]
[ 1 0 12 5 464 0 9]
[ 1 0 0 2 0 445 22]
[ 1 0 2 76 17 9 583]]

In [49]: Evaluation.loc[4, 'Algorithm'] = 'SVM_linear'
Evaluation.loc[4, 'Precision'] = metrics.precision_score(y_test, y_pred, average='micro')
Evaluation.loc[4, 'Recall'] = metrics.recall_score(y_test, y_pred, average='micro')
Evaluation.loc[4, 'F1 Score'] = metrics.f1_score(y_test, y_pred, average='micro')
Evaluation.loc[4, 'Accuracy'] = metrics.accuracy_score(y_test, y_pred)
Evaluation

Out[49]:
       Algorithm  Precision    Recall  F1 Score  Accuracy
0             LR   0.706435  0.706435  0.706435  0.706435
1            KNN   0.727299  0.727299  0.727299  0.727299
2  Decision Tree   0.895974  0.895974  0.895974  0.895974
3    Naive Bayes   0.763738  0.763738  0.763738  0.763738
4     SVM_linear   0.913900  0.913900  0.913900  0.913900

In [50]: from sklearn.svm import SVC

classifier = SVC(kernel='rbf')
classifier.fit(x_train, y_train)
classifier.score(x_test, y_test)

Out[50]: 0.6394357919482809
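The RBF kernel is the most scale-sensitive model in this notebook: on raw features, the squared distances inside exp(-gamma * ||x - x'||^2) are dominated by Area and ConvexArea, which is a plausible reason it drops to 0.64 while the linear kernel reaches 0.91. Standardizing before the RBF SVM typically restores competitive accuracy; a sketch:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Same RBF SVM, but fitted on standardized features.
    rbf_pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
    rbf_pipe.fit(x_train, y_train)
    rbf_pipe.score(x_test, y_test)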

In [51]: y_pred = classifier.predict(x_test)

In [52]: report = classification_report(y_test, y_pred)
print(report)

              precision    recall  f1-score   support

    BARBUNYA       0.37      0.07      0.12       319
      BOMBAY       1.00      0.99      1.00       111
        CALI       0.61      0.85      0.71       419
    DERMASON       0.78      0.87      0.82       905
       HOROZ       0.61      0.57      0.59       491
       SEKER       0.38      0.26      0.31       470
        SIRA       0.59      0.72      0.65       688

    accuracy                           0.64      3403
   macro avg       0.62      0.62      0.60      3403
weighted avg       0.61      0.64      0.61      3403

In [53]: cm = confusion_matrix(y_test, y_pred)
print(cm)

[[ 23 0 209 0 77 0 10]
[ 1 110 0 0 0 0 0]
[ 23 0 357 0 37 0 2]
[ 0 0 0 787 0 90 28]
[ 16 0 23 9 280 19 144]
[ 0 0 0 174 9 123 164]
[ 0 0 0 39 58 95 496]]

In [54]: Evaluation.loc[5, 'Algorithm'] = 'SVM_RBF'
Evaluation.loc[5, 'Precision'] = metrics.precision_score(y_test, y_pred, average='micro')
Evaluation.loc[5, 'Recall'] = metrics.recall_score(y_test, y_pred, average='micro')
Evaluation.loc[5, 'F1 Score'] = metrics.f1_score(y_test, y_pred, average='micro')
Evaluation.loc[5, 'Accuracy'] = metrics.accuracy_score(y_test, y_pred)
Evaluation

Out[54]:
       Algorithm  Precision    Recall  F1 Score  Accuracy
0             LR   0.706435  0.706435  0.706435  0.706435
1            KNN   0.727299  0.727299  0.727299  0.727299
2  Decision Tree   0.895974  0.895974  0.895974  0.895974
3    Naive Bayes   0.763738  0.763738  0.763738  0.763738
4     SVM_linear   0.913900  0.913900  0.913900  0.913900
5        SVM_RBF   0.639436  0.639436  0.639436  0.639436

In [55]: # plt.figure(figsize=(10, 5))
sns.barplot(x='Algorithm', y='Precision', data=Evaluation)

Out[55]: <Axes: xlabel='Algorithm', ylabel='Precision'>
In [56]: sns.barplot(x='Algorithm', y='Accuracy', data=Evaluation)

Out[56]: <Axes: xlabel='Algorithm', ylabel='Accuracy'>

In [57]: sns.barplot(x='Algorithm', y='Recall', data=Evaluation)

Out[57]: <Axes: xlabel='Algorithm', ylabel='Recall'>
In [58]: sns.barplot(x='Algorithm', y='F1 Score', data=Evaluation)

Out[58]: <Axes: xlabel='Algorithm', ylabel='F1 Score'>

In [59]: Evaluation[Evaluation.Recall == Evaluation.Recall.min()]

Out[59]:
  Algorithm  Precision    Recall  F1 Score  Accuracy
5   SVM_RBF   0.639436  0.639436  0.639436  0.639436

In [60]: Evaluation[Evaluation.Recall == Evaluation.Recall.max()]

Out[60]:
    Algorithm  Precision  Recall  F1 Score  Accuracy
4  SVM_linear     0.9139  0.9139    0.9139    0.9139

In [61]: Evaluation[Evaluation.Precision == Evaluation.Precision.min()]

Out[61]:
  Algorithm  Precision    Recall  F1 Score  Accuracy
5   SVM_RBF   0.639436  0.639436  0.639436  0.639436

In [62]: Evaluation[Evaluation.Precision == Evaluation.Precision.max()]

Out[62]:
    Algorithm  Precision  Recall  F1 Score  Accuracy
4  SVM_linear     0.9139  0.9139    0.9139    0.9139

In [63]: Evaluation[Evaluation.Accuracy == Evaluation.Accuracy.min()]

Out[63]:
  Algorithm  Precision    Recall  F1 Score  Accuracy
5   SVM_RBF   0.639436  0.639436  0.639436  0.639436

In [64]: Evaluation[Evaluation.Accuracy == Evaluation.Accuracy.max()]

Out[64]:
    Algorithm  Precision  Recall  F1 Score  Accuracy
4  SVM_linear     0.9139  0.9139    0.9139    0.9139

In [65]: Evaluation

Out[65]:
       Algorithm  Precision    Recall  F1 Score  Accuracy
0             LR   0.706435  0.706435  0.706435  0.706435
1            KNN   0.727299  0.727299  0.727299  0.727299
2  Decision Tree   0.895974  0.895974  0.895974  0.895974
3    Naive Bayes   0.763738  0.763738  0.763738  0.763738
4     SVM_linear   0.913900  0.913900  0.913900  0.913900
5        SVM_RBF   0.639436  0.639436  0.639436  0.639436
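This table compares the six models on a single fixed split. A more robust comparison would cross-validate each configuration on the full data; a hedged sketch looping over the same six models (unscaled, to mirror the notebook, although the scaled pipelines sketched above would be fairer to KNN and the SVMs):

    from sklearn.model_selection import cross_val_score

    models = {'LR': LogisticRegression(random_state=0, max_iter=1000),
              'KNN': KNeighborsClassifier(n_neighbors=5),
              'Decision Tree': DecisionTreeClassifier(random_state=0),
              'Naive Bayes': GaussianNB(),
              'SVM_linear': SVC(kernel='linear', random_state=0),
              'SVM_RBF': SVC(kernel='rbf')}
    for name, model in models.items():
        scores = cross_val_score(model, x, y, cv=5)
        print(f'{name}: {scores.mean():.3f} +/- {scores.std():.3f}')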

In [ ]:
