
Term Project Report:

Customer Loan Prediction

Submitted by:
Amit Chawla
Anusha Bhatnagar
Bishwajeet Sisodiya
Ketan Sharma
Siddhant Cally
Executive MBA (2018-21)

Under the able guidance of -


Prof. Amlendu Dubey

Index

1. Abstract
2. Introduction
3. Methodology
4. Data Introduction
5. Comparison Criteria
6. Workflow of Project
7. Visualization
8. Data Cleaning
9. Analysis Using GLM
10. Analysis Using Random Forest
11. Result and Comparison
12. References

Abstract

The process of providing loans can be tedious and time consuming. A standard set of defined rules is needed that can be applied to an entire population to determine whether a person is eligible for a loan. In this project we describe an effective approach to customer loan prediction. Our main interest is to decide whether a customer's loan will be approved or not based on several factors. We aim to automate the loan eligibility process (in real time) based on the customer details provided in an online application form. We applied Logistic Regression and Random Forest to analyse the data and make predictions; both models give us the probability that a customer should get a loan. Depending on the accuracy of these models, we select the one that best fits our data.

1. Introduction

Over the past decade, data mining has become very efficient for the extraction and manipulation of data, in order to discover patterns and make accurate decisions. As we already know, to decrease randomness we must increase information. Data mining has proven to be a very effective method of accumulating and analysing data.
In 1997, Berry proposed six data mining tasks for any human problem, which can be stated as:
1. Classification
2. Estimation
3. Prediction
4. Affinity grouping
5. Clustering
6. Description
The whole process is called "Knowledge Discovery", which goes hand in hand with the statement of decreasing randomness by increasing data. In 1998, Weiss classified data mining into two parts: knowledge discovery and prediction. The first part includes classification and regression, whereas the second covers association rules and summarization.

Knowledge Discovery in Databases (KDD) has three stages:

• Data Pre-Processing
• Data Mining
• Data Post-Processing

In the initial stage, data pre-processing covers data collection, data smoothing, data transformation, data cleansing and data reduction. The second stage, data mining, involves data classification, commonly termed prediction. The third and final stage, data post-processing, presents the conclusions drawn from the analysis in the previous stage, on the basis of which we devise our further course of action.

Predictive analytics is the use of data, mathematical algorithms and machine learning to identify the likelihood of future events based on historical data. The main goal of predictive analytics is to use knowledge of what has happened to provide the best estimate of what will happen. In other words, predictive analytics can offer a complete view of what is going on and the information we need to succeed.

Thanks to the spread of text analytics, which has made the analysis of unstructured data less time consuming, the use of predictive analysis is growing. Today we increasingly look to machines that can take past and current information to forecast future trends, such as sales for the coming months or years, anticipated customer behaviour or, as in our case, loan eligibility.

2. Methodology
We divided the data set into two parts, taking the odd-numbered data points as the "training set" and the even-numbered data points as the "test/validation set". The main purpose of the training data is model building. To build a model, a predictive data mining technique is used, and the various methods of that technique are engaged. The model's accuracy is checked by uploading the predictions to the competition website. In this paper, the study of the loan data set is limited to model validation on this data set. Finally, the GLM technique is described and compared with the random forest result, and the best result is shown.
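As a small illustration (a sketch, not code from the report itself), the odd/even split can be written in R as follows, assuming the full data set has been read into a data frame called loan (the file name is hypothetical):

# Odd-numbered rows form the training set, even-numbered rows the test set.
loan <- read.csv("loan.csv")
odd_idx <- seq(1, nrow(loan), by = 2)
train <- loan[odd_idx, ]
test <- loan[-odd_idx, ]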

A. Data Acquisition
The data set used for this research is a loan data set. RStudio is used for all of the analyses (GLM and random forest). Before either technique is applied to the data set, an introductory analysis is done on it to gain knowledge of the data.

B. Data Description and Preprocessing

Each variable in the data set is plotted against its indices to see how the dispersion differs between variables. The input variables are also plotted against the output variable to understand the relationship between them.

C. Generalized logistic model

In statistics, logistic regression, or the logit model, is a regression model where the dependent (output) variable is categorical; it is a classification technique. It can be binomial, ordinal or multinomial depending on the outcome of the dependent variable. Binary logistic regression is used where the outcome variable has two possible values, such as "1/0", "Yes/No" or "True/False", whereas multinomial is used with more than two outcomes. Logistic regression can also be thought of as a special case of linear regression where the outcome variable is categorical and the log of the odds is used as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting the data to a logit function.
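As a brief illustration of the logit function (a sketch with made-up coefficients, not the report's fitted model):

# The logistic (inverse logit) function maps any linear combination of
# predictors onto a probability between 0 and 1.
logistic <- function(z) 1 / (1 + exp(-z))

# With hypothetical coefficients b0 = -1.5 and b1 = 2 on a single predictor x = 1:
b0 <- -1.5; b1 <- 2
logistic(b0 + b1 * 1)   # about 0.62, i.e. a 62% predicted chance of approval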

D. Random Forest
The random forest algorithm is a supervised learning algorithm. As the name suggests, it builds a forest of many decision trees. The random forest classifier can be used for both classification and regression tasks, can handle missing values, and corrects for decision trees' habit of overfitting to their training set.

E. Procedure
The data set is first introduced and pre-processed in order to gain insight into its properties. By plotting the inputs against the output of the raw data set, a relationship check is made. To reduce the level of dispersion between the variables in the data set, the data is pre-processed. Preprocessing is done by scaling or standardizing the data set; this is also known as data preparation.
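A minimal sketch of this standardization step, assuming train is the training data frame and that ApplicantIncome and LoanAmount are among its numeric columns:

# Centre each numeric column to mean 0 and scale it to standard deviation 1.
num_cols <- c("ApplicantIncome", "LoanAmount")
train[num_cols] <- scale(train[num_cols])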

3. Data Introduction
We took the data set from Kaggle; it was collected from customers as they filled out applications while applying for loans. It has 12 variables, all of which play an important role in deciding whether a customer should get loan approval or not. The data set is divided into test and train parts, which helped us build our model on the train data and then apply it to the test data. All the predictor variables in the data set are independent.

4. Comparison Criteria
Mean Squared Error (MSE), a criterion for model comparison, is used in this research. MSE is among the most significant criteria for determining and comparing different data mining techniques. It measures the difference between the actual test outputs and the predicted test outputs: the MSE of the predictions is the mean of the squared differences between the observed and the predicted values. A smaller MSE is better; large MSE values indicate poor prediction.
R-squared, also termed R-Sq or R², is used to measure the percentage of variability in the data that is accounted for by the built model. An R-Sq value closer to 1 indicates a better prediction.
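Both criteria are simple to compute in R; a minimal sketch, assuming actual and predicted are numeric vectors of observed test outputs and model predictions:

# Mean Squared Error: smaller is better.
mse <- mean((actual - predicted)^2)
# R-squared: the closer to 1, the better the prediction.
r2 <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)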

WORKFLOW OF PROJECT

[Workflow diagram]

5. Visualization
Many factors decide whether the bank will give a customer a loan or not. From the loan data available we can visualise which factors affect this decision. A histogram of loan status for each gender shows that, of 775 male customers, 625 were approved for a loan, an approval rate of 80.6%; for female customers, 145 out of 182 were approved, an approval rate of 79.6%. This shows that male customers are more likely to apply for a loan than females, but the probability of getting a loan approved does not depend on sex, as the probabilities are nearly equal.
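A sketch of how such a plot can be produced, assuming the data frame loan has Gender and Loan_Status columns as in the Kaggle data set:

# Side-by-side bars of loan status counts for each gender.
library(ggplot2)
ggplot(loan, aes(x = Gender, fill = Loan_Status)) +
  geom_bar(position = "dodge") +
  labs(title = "Loan status by gender", y = "Number of applicants")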

To check whether the approval rate depends on marital status, a histogram again helps. 631 married customers applied for a loan while only 347 unmarried customers applied, which tells us that married customers are nearly twice as likely to apply for a loan as unmarried customers. This makes sense, because married customers need to keep pace with children's education as well as the needs of the family. The approval rate for married customers is 82.1%, while for unmarried customers it is 77.23%, so marital status plays a meaningful role in a successful loan approval. Let us also analyse how loan approval relates to the number of dependents of the customer. From the loan data we see that loans are mostly applied for by customers with no dependents, as many as 545 out of 981 customers, with an approval rate of 80.4%; the approval rate for customers with one dependent is 77.5%.

Of the customers who apply for a loan, 77.78% are graduates versus 22.23% who are not, so we can infer that applying for a loan depends strongly on the education level of the customer. A graduate customer is also more likely to get his or her loan approved, with an approval rate of 81.7%, than one who has not graduated.

Another interesting factor on which loan approval depends is the employment status of the applicant. Self-employed customers are much less likely to take a loan: of the total population of 981, 809 customers who applied for a loan were NOT self-employed, only 119 were self-employed, and for 55 customers the employment status is missing. There is an 80.5% chance of getting the loan approved if the customer is not self-employed. The few missing values in this column are shown in the first block of the plot.

A customer's credit history plays a very significant role in loan approval. Customers who do not have a credit history have only a 45% chance of getting the loan approved by the bank, while customers who do have a credit history have chances as high as 88%. As the gap is so large, it shows that credit history plays a significant role in the loan approval decision, but we cannot say what credit score drives this decision, as we do not have that data.
For the data taken and analysed we also created and studied box plots. A box plot reveals outliers with respect to different factors; essentially, it helps us study why some loans were not approved despite a satisfying set of characteristics, and so identify the more decisive factors while analysing the data.
The plots show the outliers in loan amount with respect to loan status, and the outliers in applicant income with respect to loan status.

A stacked bar chart also suggests that the number of females with three or more dependents getting a loan approved is much lower than the number of males.

The analysis also shows a relation between the income of the applicant and the loan amount. The majority of borrowers are in the lower-income class and take a lower loan amount. The scatter plot may also indicate some abnormalities in the data: people who have a high income and a low loan amount, and some who have a low income but meet enough other criteria to be granted a high loan.
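A sketch of such a scatter plot in base R, assuming the same loan data frame with ApplicantIncome and LoanAmount columns:

# Each point is one applicant; abnormal cases stand out at the extremes.
plot(loan$ApplicantIncome, loan$LoanAmount,
     xlab = "Applicant income", ylab = "Loan amount",
     main = "Applicant income vs. loan amount")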

6. Data Cleaning
Data cleaning is the process of removing data in a database or data set that is incorrect, incomplete, improperly formatted, duplicated or missing. The process of removing errors and resolving inconsistencies in source data before loading it into targets is called data cleaning. In this loan data set 13 variables are present, of which 3 are error free: Loan_ID, ApplicantIncome and CoApplicantIncome. The other variables have either missing values or N/A values. We can visualise this by using the Amelia package.

# Before cleaning of the data set
library(Amelia)
missmap(com, main = 'Main', col = c('yellow', 'black'), legend = FALSE)

With the help of the Amelia package and the missmap function one can observe that Credit_History, LoanAmount and Loan_Amount_Term have N/A values. The other variables have missing values that cannot be displayed by this function. Removing N/A and missing values and imputing the right values in their place is a critical task.

Loan Amount is imputed using ranges of Applicant Income. Most applicants' incomes range from 0 to 10,000, so ranges are taken at intervals of 2,000 from 0 to 10,000, plus one more range for 10,000 and above. The mean of Loan Amount within the applicant's income range is computed and used to replace the N/A values.
Credit History also has N/A values; these are replaced by taking the mean of applicant income for credit history '1' and '0' respectively, and then imputing on the basis of the applicant's income range. N/A values in Loan_Amount_Term are imputed using the mode of Loan_Amount_Term, which is 360.
Other variables have missing values that are not shown by the missmap function, such as Gender. Gender has many missing values; these are filled using the mean of ApplicantIncome for males and females separately (the mean male income was higher than the female mean), with the imputation based on the applicant's income range. The same procedure was followed for Married, which also has missing values. Employment status also has missing values, which are imputed by taking the mode of the values: employment status has two levels, 'Yes' and 'No', and since 85% of the values were 'No', all missing values are replaced with 'No'.
Dependents contains a lot of missing values. A trimmed mean of ApplicantIncome is computed for every possible combination of marital status and number of dependents, and the imputation is based on these trimmed means.
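A minimal sketch of two of these imputation steps, assuming the combined data frame is called com (as in the missmap call above) and that the columns follow the Kaggle naming (Loan_Amount_Term, Self_Employed):

# Loan_Amount_Term: replace N/A values with the mode (360 in this data).
term_mode <- as.numeric(names(which.max(table(com$Loan_Amount_Term))))
com$Loan_Amount_Term[is.na(com$Loan_Amount_Term)] <- term_mode

# Self_Employed: about 85% of values are 'No', so missing entries become 'No'.
com$Self_Employed[is.na(com$Self_Employed) | com$Self_Employed == ""] <- "No"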

Post cleaning of data (missingness map after cleaning)

Cleaning the data leads to more accuracy and makes the algorithms work better.

7. Analysis Using GLM

We can now fit a logistic regression model to the data using the glm function. We fit the model on all the independent variables. The code to fit the model is:

log.model <- glm(formula = Loan_Status ~ ., family = binomial(link = 'logit'), data = train[,-1])
summary(log.model)

From the results in the figure below we see that Credit_history and Property_suburban are the most significant features, as their p-values are less than 0.05.

Now, after this, we predict the values for the test data with the fitted GLM model. The code for that is:

df1$Loan_Status <- predict(log.model, type = "response", newdata = test)

The model returns continuous probability values, but we need binomial outcomes in the form of 0 and 1, so we convert values with probability greater than 0.5 to 1 and the rest to 0:

fitted.results <- ifelse(df1$Loan_Status > 0.5, 1, 0)
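The accuracy quoted in the results section can then be obtained by comparing these 0/1 predictions with the true labels; a sketch, assuming test$Loan_Status holds the actual 0/1 outcomes for the test rows:

# Proportion of test rows predicted correctly.
accuracy <- mean(fitted.results == test$Loan_Status)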

8. Analysis Using Random Forest

We can now fit a random forest model to the data using the randomForest function, again on all the independent variables. The code to fit the model, predict on the test data and inspect variable importance is:

library(randomForest)
y_pred <- randomForest(x = train[, c(-1, -13)], y = train$Loan_Status, ntree = 10)
fitted.results <- predict(y_pred, newdata = test[-1])
importance(y_pred)
varImpPlot(y_pred)

From the results in the figure we see that ApplicantIncome, Credit_history and LoanAmount are the most significant features, as their MeanDecreaseGini values are higher than the others.

9. Result and Comparison

In this project, analysis of the data set is carried out. First GLM, a predictive data mining technique, is applied to the data set; then random forest, another predictive data mining technique, is applied; and the two are compared in order to check the effectiveness of the models. In both techniques the resulting probabilities are converted to 0 and 1: probabilities of 0.5 or greater are assigned 1 and the rest 0. After applying the GLM model we got 79.24% accuracy, whereas random forest gave us approximately 77%. We can see the GLM model is more effective here.

References

[1] Berry, Michael J. A., et al., Data Mining Techniques for Marketing, Sales and Customer Support. USA: John Wiley and Sons (1997).
[2] Weiss, Sholom M., et al., Predictive Data Mining: A Practical Guide. San Francisco: Morgan Kaufmann (1998).
[3] Jolliffe, I. T., Principal Component Analysis. New York: Springer-Verlag (1986).
[4] Naes, T., and H. Martens, "Principal Component Regression in NIR Analysis: Viewpoints, Background Details and Selection of Components," J. Chemom. 2 (1988).
[5] Sun, J., "A Correlation Principal Component Regression Analysis of NIR Data," J. Chemom. 9 (1995).
[6] Practice Problem: Loan Prediction III | Knowledge and Learning. (n.d.). Retrieved February 16, 2018.
