
KIIT Deemed To Be University: A Project Report

This document is a project report on housing price prediction submitted to KIIT Deemed to be University by 5 students under the guidance of Prof. Bhaswati Sahoo. It includes an introduction to the project, literature survey of previous work on housing price prediction using machine learning algorithms, software requirements specification, system design, testing, implementation details using a housing dataset from Ames, Iowa, screenshots of the project, and conclusions. The objective is to identify important variables and define the best regression model to predict housing prices by analyzing over 1500 property sales between 2006-2012 described by 26 explanatory variables.

Uploaded by

SHADAB NADEEM

A PROJECT REPORT

on

“HOUSING PRICE
PREDICTION”

Submitted to
KIIT Deemed to be University

In Partial Fulfillment of the Requirement for the Award of

BACHELOR’S DEGREE IN COMPUTER SCIENCE & ENGINEERING
BY

SOMYA RAJ SINHA 1606309
DEEPESH RATHORE 1606349
SWATI LALL 1606397
TUSHAR 1606398
NIDHI AGRAWAL 1606443
UNDER THE GUIDANCE OF
PROF. BHASWATI SAHOO

SCHOOL OF COMPUTER ENGINEERING


KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY
BHUBANESWAR, ODISHA - 751024
July 2019
KIIT Deemed to be University
School of Computer Engineering
Bhubaneswar, ODISHA 751024

CERTIFICATE
This is to certify that the project entitled
“HOUSING PRICE
PREDICTION”

submitted by

SOMYA RAJ SINHA 1606309


DEEPESH RATHORE 1606349
SWATI LALL 1606397
TUSHAR 1606398
NIDHI AGRAWAL 1606443

is a record of bona fide work carried out by them in partial fulfillment of the
requirement for the award of the Degree of Bachelor of Engineering (Computer Science
& Engineering OR Information Technology) at KIIT Deemed to be University,
Bhubaneswar. This work was done during the year 2018-2019, under our guidance.

Date: 05 /07 /19

(Prof. BHASWATI
SAHOO)
Acknowledgement
We are profoundly grateful to Prof. Bhaswati Sahoo for her expert guidance and
continuous encouragement throughout the project, from its commencement to
its completion.

SOMYA RAJ SINHA


DEEPESH RATHORE
SWATI LALL
TUSHAR
NIDHI AGRAWAL

School of Computer Engineering, KIIT, BBSR


ABSTRACT
Buying a house is a regular activity rather than a seasonal one, so describing
homes and how their prices vary is of considerable interest and importance.
Using a linear regression model in Python, we analyze pricing patterns, identify
the features that affect the price of a house, and predict the prices of houses
in a city.

Broadly, this paper addresses the question of how house prices are affected by
housing characteristics, both internal (such as the number of bathrooms and
bedrooms) and external (such as schools or parks in the neighbourhood). Using
data from Kaggle, a prominent dataset website, this paper uses the Linear
Regression model to predict house prices. It also identifies the important
attributes in housing price prediction, such as comparable houses, sold price,
price per square foot, the year in which the house was sold, building type and
number of bedrooms.

It also examines, through six graphs, how the sale price varies with the existing
and derived features, so that the prediction is concise and easy to understand. We
explore which questions the data set can answer and, through a scatter plot,
compare the actual and predicted prices and arrive at the price range that sees
the largest number of sales.



Contents:
1. Introduction
2. Literature Survey
3. Software Requirements Specification
3.1 Introduction
3.2 Objective
3.3 Problem Statement
3.4 System Overview
3.5 Hardware Requirement
3.6 Software Requirement
3.6.1 Technical Specification
4. System Design
5. System Testing
6. Project Planning
7. Implementation
7.1 Data set and features
7.2 Output label
7.3 Data preprocessing and cleaning
7.4 Data visualization and feature engineering
7.5 Applying the model and checking accuracy
8. Screenshots of Project
8.1 Graphs for data visualization
8.2 Feature Selection
8.3 Checking correlation
8.4 Model and Prediction
9. Conclusion and future scope
References
Appendix



Chapter 1

Introduction
Buying a house is a regular activity rather than a seasonal one, so describing
homes and how their prices vary is of considerable interest and importance.
Using a linear regression model in Python, we analyze pricing patterns, identify
the features that affect the price of a house, and predict the prices of houses
in a city.

The objective of this project is to identify the most important variables and to
define the best regression model for predicting housing prices in Ames, Iowa. The
data set used for the project describes 1500 residential property sales in Ames,
Iowa between 2006 and 2012. It contains 26 explanatory variables describing
every aspect of the home. Continuous variables give the various area dimensions,
such as the size of the living area and the basement, while discrete variables
quantify the number of rooms, baths, kitchens, parking spots, etc. Nominal
variables typically describe the various types or classes of dwellings,
materials and locations, such as the name of the neighborhood, the garage type
and the sale type. Ordinal variables typically rate the quality and condition of
different parts of the house and its utilities. Because the data set was
over-parameterized and heterogeneous, the analysis was more difficult.

Chapter 2

Literature Survey
This section focuses on the most popular and relevant methods used for
predicting housing prices. Much research has been done on predicting housing
prices in different cities, considering different attributes for each city.
Methods such as Linear Regression, Random Forest, SVM and other machine
learning algorithms are used to predict house prices.

One well-known research paper was written by An Nguyen. It explores the
question of how house prices in five different counties are affected by housing
characteristics, both internal (such as the number of bathrooms and bedrooms)
and external (such as public schools’ scores or the walkability score of the
neighborhood). The paper also identifies the four most important attributes in
housing price prediction across the counties: assessment, comparable houses’
sold price, listed price and number of bathrooms.

The machine learning algorithms used in that paper are Random Forest and
Support Vector Machine (SVM), applied to houses listed on Zillow, Trulia and
Redfin.
Using a data set of 1,457 houses from 5 different counties scraped from Zillow,
Trulia and Redfin, the paper addresses the following questions:
1. Can the models proposed in the paper outperform or get close to Zillow’s
prediction score baseline?
2. Can the ratio of overestimated to underestimated houses be reduced?
3. What are the most important attributes that affect the sold price?

For Hunt (TX), SVM outperforms the baseline by 3.2%. Random Forest produces
prediction scores close to the baseline on the data sets from Cowlitz (WA) and
Montgomery (IL). Moreover, the results suggest that using a single set of 10
attributes for all counties does not change the models’ accuracy scores much
compared with using different sets of attributes for different counties.



Chapter 3
Software Requirements Specification

3.1 Introduction:
Through housing price prediction, a user can predict the price of a house by
providing information about it, such as the number of bedrooms, number of
bathrooms, kitchen area, living area, parking lot and various other attributes.
Given this information, we analyse the data, perform data engineering and
select the relevant features to predict the price of the house.
3.2 Objective:
The objective of this project is to identify the most important variables and to
define the best regression model for predicting housing prices in Ames, Iowa. The
data set used for the project describes 1500 residential property sales in Ames,
Iowa between 2006 and 2012. It contains 26 explanatory variables describing
every aspect of the home. Continuous variables give the various area dimensions,
such as the size of the living area and the basement, while discrete variables
quantify the number of rooms, baths, kitchens, parking spots, etc. Nominal
variables typically describe the various types or classes of dwellings,
materials and locations, such as the name of the neighborhood, the garage type
and the sale type. Ordinal variables typically rate the quality and condition of
different parts of the house and its utilities. Because the data set was
over-parameterized and heterogeneous, the analysis was more difficult.
3.3 Problem Statement:
Consider a real estate company that has a dataset containing the prices of
properties. It wants to use the data to optimise the sale prices of the
properties based on important features.
Essentially, the company wants to:
• Identify the variables affecting house prices.
• Design a linear model that quantitatively relates house prices to variables
or factors such as number of rooms, area, number of bathrooms, etc.
• Know the accuracy of the model, i.e. how well these features can predict
house prices.
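
The linear model in the second point can be sketched in the usual multiple-regression form. The symbols are notation introduced here for illustration and do not appear in the report: $y$ is the sale price, $x_1, \dots, x_p$ are the chosen features (rooms, area, bathrooms and so on), $\beta_0, \dots, \beta_p$ are the coefficients to be estimated, and $\varepsilon$ is the error term.

```latex
y = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \varepsilon
```

Under this form, the accuracy in the third point measures how closely the fitted values track observed prices on data held out from fitting.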



3.4 System Overview:

Figure 1 : Rough System Architecture

3.5 Hardware Requirement

RAM : 8 GB
SYSTEM TYPE : 64-bit operating system, x64-based processor

3.6 Software Requirement

Operating System : Windows 10

3.6.1 TECHNICAL SPECIFICATION

The technical tools used in making this project include the following:

Python3: Python is an interpreted, high-level programming language for
general-purpose programming. Created by Guido van Rossum and first
released in 1991, Python has a design philosophy that emphasizes code
readability, notably using significant whitespace. It provides constructs that
enable clear programming on both small and large scales.

Anaconda: Anaconda is a free and open-source distribution of the Python and R
programming languages for data science and machine learning applications
(large-scale data processing, predictive analytics, scientific computing) that
aims to simplify package management and deployment. Package versions are
managed by the package management system conda, which makes it simple to
install, run and update complex data science and machine learning libraries
such as Scikit-learn, TensorFlow and SciPy.

Jupyter Notebook: The Jupyter Notebook is an open-source web application
that allows you to create and share documents that contain live code, equations,
visualizations and narrative text. Uses include data cleaning and
transformation, numerical simulation, statistical modeling, data visualization,
machine learning, and much more.


Chapter 4

System Design



Chapter 5

System Testing
Test Cases and Test Results

Test ID | Test Case Title | Test Condition     | System Behavior | Expected Result
T01     | Accuracy        | Check on test data | 76%             | 90%



Chapter 6
Project Planning
6.1 DATA COLLECTION :
We collected the data set on Ames housing prices from Kaggle.

6.2 DATA CLEANING :
6.2.1 Removing null values.
6.2.2 Imputing missing values.

6.3 DATA PRE PROCESSING :
6.3.1 Handling categorical values.
6.3.2 Converting to proper data types.
6.3.3 Eliminating constant and quasi-constant columns.

6.4 DATA VISUALIZATION :
6.4.1 Plotting graphs between different columns to visualize the relation or trend between them.
6.4.2 Plotting the heat map, visualizing the correlation between different columns, and
eliminating the columns with high correlation (>0.8).

6.5 APPLYING LINEAR REGRESSION :
Calculating the accuracy of the model.


Chapter 7

Implementation

7.1 DataSet and Features


Sale Price, Bedrooms, Bathrooms, Kitchens, Ground Living Area, Year Sold, Year Built and
Garages are the main attributes we use to train our model and to predict prices.
All of this data was obtained from Kaggle.

7.1.1 HOUSE AGE


The age of a house has a great impact on its price. We derived a House Age feature
by subtracting Year Built from Year Sold. This feature shows how prices vary as the
age of the house increases, which helps the model's predictions.
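
The derivation above can be sketched in pandas as follows. This is a minimal illustration, not the authors' code: the column names `YrSold` and `YearBuilt` are assumed from the Kaggle release of the Ames data, and the three rows are invented toy values.

```python
import pandas as pd

# Toy rows standing in for the Ames data. The column names are an
# assumption based on the Kaggle release; the values are invented.
houses = pd.DataFrame({
    "YrSold": [2008, 2010, 2006],
    "YearBuilt": [2003, 1976, 2006],
})

# House age in years at the time of sale: Year Sold minus Year Built.
houses["HouseAge"] = houses["YrSold"] - houses["YearBuilt"]

print(houses)
```

A house sold in the year it was built gets an age of 0, which is a valid value rather than a missing one.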

7.1.2 HOUSE TYPE


This feature is derived from the Building Type and Garages attributes. It is used to see
how prices vary with the type of building (1BHK, Duplex, 3BHK) and the number of cars
that can be parked in the building’s garage.

7.1.3 BATHROOMS

The dataset records different types of bathroom for a house: full bathrooms, half
bathrooms and basement bathrooms. We combined these into one column, as they all
count as bathrooms. Bathrooms are something every buyer looks for in a house, so
this feature is very helpful in predicting house prices.
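
One way to combine the bathroom columns is a plain row-wise sum, sketched below. The column names (`FullBath`, `HalfBath`, `BsmtFullBath`, `BsmtHalfBath`) are assumed from the Kaggle release, and the values are invented. The report does not say whether half bathrooms were weighted, so the unweighted count here is also an assumption.

```python
import pandas as pd

# Toy rows; column names assumed from the Kaggle Ames data, values invented.
houses = pd.DataFrame({
    "FullBath": [2, 1, 2],
    "HalfBath": [1, 0, 1],
    "BsmtFullBath": [1, 0, 0],
    "BsmtHalfBath": [0, 1, 0],
})

bath_columns = ["FullBath", "HalfBath", "BsmtFullBath", "BsmtHalfBath"]

# Unweighted total: every bathroom, full or half, counts as one.
houses["Bathrooms"] = houses[bath_columns].sum(axis=1)

print(houses)
```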

7.1.4 Ground Living Area

The Ground Living Area is the area, in square feet, on which the house is built. The
price of a house depends strongly on this area, and the two are roughly proportional.

7.2 OUTPUT LABEL

7.2.1 Sale Price

Sale Price is the output label: we have to predict the sale price of each house
from the attributes and features in the given dataset.

7.3 DATA PREPROCESSING AND CLEANING

We used the Ames housing price data set taken from Kaggle. We cleaned and
preprocessed the data by checking for NULL values and for correlation (>=0.8)
between the columns. We also had to deal with the categorical variables, and the
attributes that were not significant in predicting the price were removed.

The data contains many null values, which we replace. It also contains some
constant and quasi-constant attributes, which we remove because they do not help
prediction.
We then copy all the useful attributes into another data set in pandas (Python).
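
The cleaning steps described here might look like the sketch below. The report does not state how missing values were filled or where the quasi-constant cut-off lies, so the median/mode imputation and the 0.99 threshold are assumptions, as are the hypothetical helper `clean` and the toy data.

```python
import pandas as pd

def clean(df: pd.DataFrame, dominance: float = 0.99) -> pd.DataFrame:
    """Impute missing values, then drop constant and quasi-constant columns.

    `dominance` is an assumed threshold: a column is dropped when its most
    frequent value covers at least this share of the rows.
    """
    out = df.copy()
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                # Numeric column: fill gaps with the median.
                out[col] = out[col].fillna(out[col].median())
            else:
                # Categorical column: fill gaps with the most frequent value.
                out[col] = out[col].fillna(out[col].mode().iloc[0])
    # Share of rows taken by the most frequent value in each column.
    top_share = out.apply(lambda s: s.value_counts(normalize=True).max())
    return out.loc[:, top_share < dominance]

# Toy data: "Street" is constant, the other two columns have one gap each.
raw = pd.DataFrame({
    "GrLivArea": [1710, 1262, None, 1717],
    "Street": ["Pave", "Pave", "Pave", "Pave"],
    "GarageType": ["Attchd", None, "Attchd", "Detchd"],
})
cleaned = clean(raw)
print(cleaned)
```

A column that is entirely empty is not handled by this sketch and would need to be dropped before imputation.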

Next, we check for correlation between the attributes and deal with it.

From the correlation matrix we can see that OverAllQual is highly correlated with Sale Price.
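
A common way to act on a correlation matrix is to drop one feature from every pair whose absolute correlation exceeds the threshold. The sketch below uses the 0.8 threshold from the report, but the helper `drop_correlated`, the column names and the values are illustrative assumptions. It is meant to be applied to the input features only, with the target (Sale Price) set aside first, since a feature that correlates strongly with the target is one to keep.

```python
import numpy as np
import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Drop the later column of each feature pair with |correlation| > threshold."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)

# Toy features: garage capacity and garage area move almost together.
features = pd.DataFrame({
    "GarageCars": [1, 2, 2, 3, 1],
    "GarageArea": [250, 480, 500, 750, 240],
    "HouseAge": [30, 5, 40, 12, 8],
})
reduced = drop_correlated(features)
print(list(reduced.columns))
```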

7.4 DATA VISUALIZATION AND FEATURE ENGINEERING

7.4.1 Plotting the graph of Sales vs Sales Price

This graph shows the number of sales that occur in each price range.
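
The counts behind such a histogram can be computed directly, as in this sketch. The sale prices and the 50,000-wide bins are invented for illustration and are not taken from the report's data.

```python
import numpy as np

# Invented sale prices, for illustration only.
sale_prices = np.array([129000, 145000, 155000, 162000, 178000,
                        181500, 208500, 250000, 335000])

# Bin edges every 50,000 from 100,000 to 400,000 (an assumed bin width).
bin_edges = np.arange(100_000, 400_001, 50_000)
counts, edges = np.histogram(sale_prices, bins=bin_edges)

# Index of the price range with the largest number of sales.
busiest = int(np.argmax(counts))
print(f"Most sales between {edges[busiest]} and {edges[busiest + 1]}")
```

Plotting libraries such as Matplotlib compute the same counts internally when drawing a histogram.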

7.4.2 Plotting the graph of Ground Living Area vs Sales Price

This scatter plot shows the outliers and how prices vary with the living area,
measured in square feet.
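
Outliers visible in a scatter plot can also be flagged numerically. The report does not say which rule, if any, was used, so the interquartile-range fence below is a swapped-in, commonly used technique, applied to invented values with an assumed column name.

```python
import pandas as pd

# Invented values; the last house is very large but cheaply sold.
houses = pd.DataFrame({
    "GrLivArea": [1200, 1500, 1700, 1800, 2100, 5600],
    "SalePrice": [140000, 165000, 180000, 195000, 230000, 160000],
})

# Interquartile-range rule: flag areas above Q3 + 1.5 * IQR.
q1, q3 = houses["GrLivArea"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = houses[houses["GrLivArea"] > upper_fence]

print(outliers)
```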

7.4.3 Plotting the graph of House Age vs Sales Price

We created a new feature called House Age (Year Sold - Year Built) to see the
price trend of houses. This graph shows how the price varies as the age of the
house increases.

7.4.4 Plotting the graph of HouseType vs Sales Price

We created a new feature from Building Type and Garage Cars to see the price trend of houses
depending on the type of building and the number of cars the garage can hold.

We encoded Building Type to convert it from a categorical variable to a numerical one.
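
The report does not name the encoding used, so the integer coding below is one possible reading. The column name `BldgType` and its category labels are assumed from the Kaggle release of the data.

```python
import pandas as pd

# Toy building types; names and labels are assumptions, not the report's data.
houses = pd.DataFrame({"BldgType": ["1Fam", "Duplex", "TwnhsE", "1Fam"]})

# Categories are sorted alphabetically and numbered from 0.
houses["BldgTypeCode"] = houses["BldgType"].astype("category").cat.codes

print(houses)
```

Integer codes impose an ordering that a linear model treats as numeric. One-hot encoding (for example with `pd.get_dummies`) avoids that at the cost of extra columns.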

7.4.5 Plotting the graph of Bathroom vs Sales Price

The dataset records different types of bathroom for a house: full bathrooms, half
bathrooms and basement bathrooms. We combined these into one column, as they all
count as bathrooms. Bathrooms are something every buyer looks for in a house, so
this feature is very helpful in predicting house prices.

7.5 APPLYING THE MODEL AND CHECKING ACCURACY

Using the Linear Regression model, we obtain an accuracy of about 76%.
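
The fit-and-score step can be sketched as below. Several assumptions apply: the report does not name the library or the metric, so this sketch solves ordinary least squares with NumPy and reads "accuracy" as the coefficient of determination (R²) on held-out rows. The data is synthetic, and the helper names `fit_linear`, `predict` and `r2_score` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to new feature rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data: price rises with living area and falls with age, plus noise.
rng = np.random.default_rng(0)
n = 200
living_area = rng.uniform(800, 3000, n)
house_age = rng.uniform(0, 60, n)
X = np.column_stack([living_area, house_age])
y = 20_000 + 90 * living_area - 600 * house_age + rng.normal(0, 25_000, n)

# Fit on the first 75% of rows and score on the remaining 25%.
split = int(0.75 * n)
coef = fit_linear(X[:split], y[:split])
test_r2 = r2_score(y[split:], predict(coef, X[split:]))
print(f"R^2 on held-out rows: {test_r2:.2f}")
```

Scoring on rows the model did not see during fitting is what makes the figure comparable to the "check on test data" condition in the test case table.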


Chapter 8

Screenshots of Project

8.1 Graphs for Data Visualization

8.2 Feature Selection

8.3 CHECKING CORRELATION

8.4 MODEL AND PREDICTION

Chapter 9
Conclusion and Future Scope

9.1 Conclusion
By analysing the pricing patterns and identifying the features affecting the
price of a house, we predict the prices of houses in that city.
A scatter plot of the actual price against the predicted price shows that
houses costing between $1000000 and $2000000 are predicted quite
accurately, and that houses costing between $1500000 and $200000 are
sold the most.

9.2 Future Scope


Once our model has been trained on a given set of data, it can now be
used to make predictions on new sets of input data. The model has learned
what the best questions to ask about the input data are, and can respond
with a prediction for the target variable. We can use these predictions to
gain information about data where the value of the target variable is
unknown — such as data the model was not trained on.

References
• Kaggle.com
• Wikipedia.com
• Google.com
• Balsamiq.cloud : wireframe used as image
• Towardsdatascience.com

Appendix-I
STUDENT'S CONTRIBUTION TO THE PROJECT

NAME OF STUDENT: Somya Raj Sinha
ROLL NO: 1606309
PROJECT TITLE: Housing price prediction
ABSTRACT OF THE PROJECT (WITHIN 80 WORDS): Buying a house is a regular activity rather than a seasonal one, so describing homes and how their prices vary is of considerable interest and importance. Using a linear regression model in Python, we analyze pricing patterns, identify the features that affect the price of a house, and predict the prices of houses in a city.

CONTRIBUTION
1. CONTRIBUTION TO THE PROJECT REPORT: Contributed to the report sections on project planning and implementation, along with screenshots of the project.
2. CONTRIBUTION DURING IMPLEMENTATION: Derived features from the existing set of features and plotted the bar graph of house age (in years) vs sales price ($).
3. CONTRIBUTION FOR THE PROJECT DEMONSTRATION / PRESENTATION: Histogram, scatterplot and bar graph of house age (in years) vs sales price ($).

SIGNATURE OF STUDENT
SIGNATURE OF GUIDE
Appendix-II
STUDENT'S CONTRIBUTION TO THE PROJECT

NAME OF STUDENT: Deepesh Rathore
ROLL NO: 1606349
PROJECT TITLE: Housing price prediction
ABSTRACT OF THE PROJECT (WITHIN 80 WORDS): Buying a house is a regular activity rather than a seasonal one, so describing homes and how their prices vary is of considerable interest and importance. Using a linear regression model in Python, we analyze pricing patterns, identify the features that affect the price of a house, and predict the prices of houses in a city.

CONTRIBUTION
1. CONTRIBUTION TO THE PROJECT REPORT: Contributed to system design and testing, along with a section of the project screenshots.
2. CONTRIBUTION DURING IMPLEMENTATION: Plotted the scatterplot of sale price against living area and bar graphs of different features against sale price.
3. CONTRIBUTION FOR THE PROJECT DEMONSTRATION / PRESENTATION: Two bar graphs, of house type and number of bathrooms vs sales price, along with a scatterplot comparing predicted price and actual price.

SIGNATURE OF STUDENT
SIGNATURE OF GUIDE
Appendix-III
STUDENT'S CONTRIBUTION TO THE PROJECT

NAME OF STUDENT: Swati Lall
ROLL NO: 1606397
PROJECT TITLE: Housing price prediction
ABSTRACT OF THE PROJECT (WITHIN 80 WORDS): Buying a house is a regular activity rather than a seasonal one, so describing homes and how their prices vary is of considerable interest and importance. Using a linear regression model in Python, we analyze pricing patterns, identify the features that affect the price of a house, and predict the prices of houses in a city.

CONTRIBUTION
1. CONTRIBUTION TO THE PROJECT REPORT: Contributed to the introduction, the software requirements specification and a section of the project screenshots.
2. CONTRIBUTION DURING IMPLEMENTATION: Plotted the scatter plot comparing actual and predicted price and the histogram of sale price against number of sales.
3. CONTRIBUTION FOR THE PROJECT DEMONSTRATION / PRESENTATION: Problem statement, motivation for the same and Matplotlib.

SIGNATURE OF STUDENT
SIGNATURE OF GUIDE
Appendix-IV
STUDENT'S CONTRIBUTION TO THE PROJECT

NAME OF STUDENT: Tushar
ROLL NO: 1606398
PROJECT TITLE: Housing price prediction
ABSTRACT OF THE PROJECT (WITHIN 80 WORDS): Buying a house is a regular activity rather than a seasonal one, so describing homes and how their prices vary is of considerable interest and importance. Using a linear regression model in Python, we analyze pricing patterns, identify the features that affect the price of a house, and predict the prices of houses in a city.

CONTRIBUTION
1. CONTRIBUTION TO THE PROJECT REPORT: Contributed to the conclusion and future scope, along with screenshots of the project.
2. CONTRIBUTION DURING IMPLEMENTATION: Applied the Linear Regression model.
3. CONTRIBUTION FOR THE PROJECT DEMONSTRATION / PRESENTATION: Applicability of the data, conclusion of the prediction and future outcomes expected from this project.

SIGNATURE OF STUDENT
SIGNATURE OF GUIDE
Appendix-V
STUDENT'S CONTRIBUTION TO THE PROJECT

NAME OF STUDENT: Nidhi Agrawal
ROLL NO: 1606443
PROJECT TITLE: Housing price prediction
ABSTRACT OF THE PROJECT (WITHIN 80 WORDS): Buying a house is a regular activity rather than a seasonal one, so describing homes and how their prices vary is of considerable interest and importance. Using a linear regression model in Python, we analyze pricing patterns, identify the features that affect the price of a house, and predict the prices of houses in a city.

CONTRIBUTION
1. CONTRIBUTION TO THE PROJECT REPORT: Contributed to data pre-processing, data cleaning and a section of the project screenshots.
2. CONTRIBUTION DURING IMPLEMENTATION: Data cleaning and pre-processing.
3. CONTRIBUTION FOR THE PROJECT DEMONSTRATION / PRESENTATION: Tools used for the project along with the reasons, and the Linear Regression model in Python.

SIGNATURE OF STUDENT
SIGNATURE OF GUIDE