
© 2017 IJSRSET | Volume 3 | Issue 6 | Print ISSN: 2395-1990 | Online ISSN : 2394-4099

Themed Section: Engineering and Technology

A Novel Method for Rainfall Prediction using Machine Learning
Priti Pandey, Pankaj Richhariya
Bhopal Institute of Technology and Science, Bhopal, Madhya Pradesh, India

ABSTRACT

Surges are regarded as catastrophic events that can cause setbacks and destroy infrastructure. Uncertainty of rainfall also creates problems: both a reduced amount and a high amount of rainfall are undesirable, so water resource management is necessary in both cases. Prediction of rainfall can play an important role in WRM (Water Resource Management). A review of the literature shows that this work can be carried out using data mining techniques and machine learning models. In this paper we propose a rainfall prediction model that integrates a clustering data mining technique with multiple regression to make efficient and accurate predictions. The proposed algorithm uses k-nearest neighbor regression, and we have also implemented k-medoid clustering. Finally, the predicted data are passed to a classifier that generates a confusion matrix, from which two values are reported: TPR (True Positive Rate) and FNR (False Negative Rate).
Keywords: WRM, TPR, FNR
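The last step of the method reads two rates off the classifier's confusion matrix. The sketch below is a hedged Python illustration, not the authors' MATLAB code: it assumes a binary labelling (1 = rain, 0 = no rain, a hypothetical encoding) and uses the standard definitions TPR = TP / (TP + FN) and FNR = FN / (TP + FN) = 1 - TPR. The helper names `confusion_counts` and `tpr_fnr` are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, false negatives, false positives and true negatives."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1  # actual positive that the classifier missed
        elif p == positive:
            fp += 1  # actual negative labelled positive
        else:
            tn += 1
    return tp, fn, fp, tn


def tpr_fnr(actual, predicted, positive=1):
    """Return (TPR, FNR); both are fractions of the actual positive cases."""
    tp, fn, _, _ = confusion_counts(actual, predicted, positive)
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive cases in `actual`; TPR is undefined")
    return tp / positives, fn / positives


if __name__ == "__main__":
    # Hypothetical labels: 1 = rain, 0 = no rain.
    actual = [1, 1, 1, 1, 0, 0, 1, 0]
    predicted = [1, 1, 0, 1, 0, 1, 1, 0]
    tpr, fnr = tpr_fnr(actual, predicted)
    print(f"TPR = {tpr:.2f}, FNR = {fnr:.2f}")  # TPR = 0.80, FNR = 0.20
```

Because TPR and FNR share the same denominator (the actual positive cases), they always sum to 1; neither reflects false positives, which would need a separate measure such as the false positive rate.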

I. INTRODUCTION

Forecasting is the process of estimating or predicting the future based on past and present data. Forecasting provides information about likely future events and their consequences for the organization. It cannot remove the difficulties and uncertainty of the future, but it increases management's confidence in making important decisions. Forecasting is the foundation of planning premises. Forecasting uses various statistical data; consequently, it is also called statistical analysis. The significance of forecasting involves the following points:

• Forecasting provides reliable and relevant information about present and past events and probable future events. This is essential for sound planning.
• It gives managers confidence in making important decisions.
• It is the basis for establishing planning premises.
• It keeps managers alert and active in facing the challenges of future events and changes in the environment.

The limitations of forecasting involve the following points:

• Collecting and analyzing data about the past, present and future takes considerable time and capital. Consequently, managers have to balance the cost of forecasting against its benefits. Many small firms do not forecast because of the high cost.
• Forecasting can only approximate future events; it cannot guarantee that these events will take place. Long-term forecasts are less accurate than short-range forecasts.
• Prediction is based on certain assumptions. If these assumptions are wrong, the forecast will be incorrect. Forecasting depends on past events, but the past may not always repeat itself.
• Forecasting requires proper skill and judgment on the part of managers. Forecasts may go wrong because of poor judgment and skill on the part of some managers; consequently, forecasts are subject to human error.

IJSRSET173647 | Received : 13 Sep 2017 | Accepted : 26 Sep 2017 | September-October-2017 [(3)6: 347-352] 347
Forecast is merely a prediction about the future values of data. However, most extrapolative forecasting models assume that the past is a proxy for the future. There are many traditional forecasting models, such as exponential smoothing, regression, time series, and composite model forecasts, which often involve expert forecasts. Regression analysis is a statistical technique for analyzing quantitative data to estimate model parameters and make forecasts.

Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). When the data are plotted, the horizontal line is called the X-axis and the vertical line the Y-axis. Regression analysis looks for a relationship between the X variable (sometimes called the "independent" or "explanatory" variable) and the Y variable (the "dependent" variable).

Fig.-1 Linear Regression

In simple regression analysis, one seeks to measure the statistical association between two variables, X and Y. Regression analysis is generally used to measure how changes in the independent variable X influence changes in the dependent variable Y. Regression analysis shows a statistical association or correlation among variables rather than a causal relationship. The case of simple, linear, least squares regression may be written in the form:

$$Y = \alpha + \beta X + e$$

where Y, the dependent variable, is a linear function of X, the independent variable. The parameters α and β characterize the population regression line, and e is the randomly distributed error term. The regression estimates of α and β are derived from the principle of least squares.

II. Literature Survey

Andrew Kusiak et al. noted that rainfall affects local water quantity and quality. A data-mining approach is applied to predict rainfall in a watershed basin at Oxford, Iowa, based on radar reflectivity and tipping-bucket (TB) data. Five data-mining algorithms, namely neural network, random forest, classification and regression tree, support vector machine, and k-nearest neighbor, are employed to build prediction models. The algorithm offering the highest accuracy is selected for further study. Model I is the baseline model constructed from radar data covering Oxford. Model II predicts rainfall from radar and TB data collected at Oxford. Model III is constructed from the radar and TB data collected at South Amana (16 km west of Oxford) and Iowa City (25 km east of Oxford). The computational results indicate that the three models offer similar accuracy when predicting rainfall at the current time, while Model II performs better than the other two models when predicting rainfall at future time horizons [IEEE 2013].

Pinky Saikia Dutta et al. stated that meteorological data mining is a form of data mining concerned with finding hidden patterns in widely available meteorological data, so that the retrieved information can be transformed into usable knowledge. Weather data are among the meteorological data richest in important knowledge. The most important climatic element affecting the agricultural sector is rainfall, so rainfall prediction is an important issue in an agricultural country like India. The authors used a data mining technique to forecast the monthly rainfall of Assam, carried out with the traditional statistical technique of multiple linear regression. The data cover a six-year period [2007-2012] and were collected locally from the Regional Meteorological Center, Guwahati, Assam, India. The performance of the model is measured by adjusted R-squared, and their experimental results show that the prediction model based on multiple linear regression achieves acceptable accuracy [IJCSE 2014].

M. Kannan et al. concluded that using regression to predict rainfall time series may be unfounded. The topic of monsoon-rainfall data series is highly complex; the role that multiple linear regression might play in it is a matter for future research, since from the evidence presented it appears not to be useful as a predictive model. Whether it might be useful for offering an approximate value of future monsoon rainfall remains to be seen. Using this regression

International Journal of Scientific Research in Science, Engineering and Technology (ijsrset.com)


method, rainfall can also be forecast for our state [IJET 2010].

Ravinesh C. Deo et al. stated that the prediction of drought events is a topic of significant interest for the management of water resources, agriculture, facilities maintenance, control and infrastructure (floodgates, airports, motor roads, etc.). Their study attempted to determine an effective data-driven machine learning model for predicting the monthly Effective Drought Index (Byun and Wilhite, 1999) using meteorological datasets from eastern Australia for the first time. A newer machine learning model, the extreme learning machine (ELM), an improved version of the single hidden-layer feedforward network (SLFN) architecture, was investigated, and its prediction skill was compared with that of a conventional ANN model trained with the back-propagation algorithm. The monthly variables used as inputs to both models were the mean rainfall; the mean, maximum and minimum temperatures; and the climate mode indices (Southern Oscillation Index, Pacific Decadal Oscillation, Indian Ocean Dipole and Southern Annular Mode) [Elsevier 2014].

S. No. | Author / Title / Publication / Year | Method Used | Description
1. | Shubhendu Trivedi et al. / The Utility of Clustering in Prediction Tasks / Centre for Mathematics and Cognition gran / 2011 | K-Means | Observed that using a predictor in conjunction with clustering improved the prediction accuracy on most datasets.
2. | Hakan Tongal et al. / Phase-space reconstruction and self-exciting threshold modeling approach to forecast lake water levels / Springer-Verlag Berlin Heidelberg / 2013 | k-nearest neighbour (k-NN) model and SETAR model | A comparison of two nonlinear modeling approaches was made. The authors used the k-NN approach and the SETAR model to predict the water levels of the three largest lakes in Sweden.
3. | Andrew Kusiak et al. / Modeling and Prediction of Rainfall Using Radar Reflectivity Data: A Data-Mining Approach / IEEE / 2013 | k-NN, SVM, MLP, Random forest | Among the five data-mining algorithms tested, the MLP performed best and was selected to predict rainfall for the three models at all future time horizons. The baseline Model I was constructed with radar reflectivity data only. The proposed methodology demonstrated high-accuracy rainfall predictions in Oxford, Iowa.
4. | Pinky Saikia Dutta et al. / Prediction of Rainfall Using Datamining Technique Over Assam / IJCSE / 2014 | Multiple linear regression | Uses a data mining technique to forecast the monthly rainfall of Assam, carried out with the traditional statistical technique of multiple linear regression. The data cover a six-year period [2007-2012] collected locally from the Regional Meteorological Center, Guwahati, Assam, India. Model performance is measured by adjusted R-squared.
5. | M. Kannan et al. / Rainfall Forecasting Using Data Mining Technique / IJET / 2010 | Regression | Rainfall prediction is a significant factor in agricultural countries like India. Rainfall forecasting has been one of the most scientifically and technologically challenging problems around the world in the last century. The regression technique provides significant accuracy.
6. | Ravinesh C. Deo et al. / Application of the extreme learning machine algorithm for the prediction of monthly Effective Drought Index in eastern Australia / Elsevier / 2014 | ANN model | The ELM model is seen to enhance the prediction skill for the monthly Effective Drought Index over the ANN model and can therefore overcome deficiencies in prediction when applied to climate analysis, which typically requires thousands of training data points and a time-efficient modeling framework.
7. | Jae-Hyun Seo et al. / Feature Selection for Very Short-Term Heavy Rainfall Prediction Using Evolutionary Computation / Hindawi / 2013 | k-NN, k-VNN, SVM | In comparative tests of evolutionary algorithms, the genetic algorithm was considerably superior to differential evolution. The equitable threat score of SVM with a polynomial kernel was, on average, the highest among the experiments. k-VNN outperformed k-NN but was dominated by SVM with a polynomial kernel.

III. Problem Identification

Jae-Hyun Seo et al. [Hindawi 2013] developed a method to predict heavy rainfall in South Korea that uses k-NN and a variant of k-NN (k-VNN) as prediction models.

K-nearest neighbours - Algorithm

Step-1. Training: store all the examples.
Step-2. Prediction: compute h(x_new). Let x1, . . . , xk be the k examples most similar to x_new; then h(x_new) = combine_predictions(x1, . . . , xk).
Step-3. The parameters of the algorithm are the number k of neighbours and the procedure for combining the predictions of the k examples.
Step-4. The value of k has to be adjusted (cross-validation).

The bottlenecks of k-nearest neighbor prediction are as follows:

1. The straightforward algorithm has a cost of O(n log k) per query, which is not good if the dataset is large.
2. The model cannot be interpreted (there is no description of the learned concepts).
3. It is computationally expensive to find the k nearest neighbors when the dataset is very large.
4. Performance depends on the number of dimensions (curse of dimensionality) ⇒ attribute selection is needed.
5. The more dimensions we have, the more examples we need to approximate a hypothesis.
6. This is especially bad for k-nearest neighbors: if the number of dimensions is very high, the nearest neighbors can be very far away.
7. The number of examples in a given volume of space decreases exponentially with the number of dimensions.
8. K-means has problems when clusters differ in
   a. sizes,
   b. densities,
   c. non-globular shapes.
9. K-means also has problems with outliers.
10. K-means is slow and scales poorly, in terms of running time, for a large number of points.
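Steps 1 to 4 of the k-nearest neighbours algorithm can be sketched as below. This is a minimal Python illustration under stated assumptions, not the authors' implementation: features are numeric vectors, similarity is Euclidean distance, the combine-predictions rule of Step-2 is taken to be the plain mean (the text leaves it open), and leave-one-out cross-validation is one possible way to adjust k in Step-4. The names `KNNRegressor` and `choose_k_loocv` are hypothetical. The linear scan with a size-k heap is what gives the O(n log k) per-query cost listed as the first bottleneck.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class KNNRegressor:
    """Plain k-NN regression (hypothetical helper, not the authors' code)."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.X = []
        self.y = []

    def fit(self, X, y):
        # Step-1 (training): simply store all the examples.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x_new):
        # Step-2 (prediction): linear scan for the k most similar examples.
        # heapq.nsmallest keeps a heap of size k, so the scan costs O(n log k) comparisons.
        nearest = heapq.nsmallest(
            self.k, range(len(self.X)), key=lambda i: euclidean(self.X[i], x_new)
        )
        # combine_predictions: here, the unweighted mean of the neighbours' targets.
        return sum(self.y[i] for i in nearest) / len(nearest)

    def predict(self, X_new):
        return [self.predict_one(x) for x in X_new]


def choose_k_loocv(X, y, candidates):
    """Step-4: adjust k by leave-one-out cross-validation (lowest mean squared error)."""
    X, y = list(X), list(y)
    best_k, best_mse = None, math.inf
    for k in candidates:
        if not 1 <= k <= len(X) - 1:
            continue  # skip k values that cannot be evaluated on n - 1 examples
        total = 0.0
        for i in range(len(X)):
            model = KNNRegressor(k).fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
            total += (model.predict_one(X[i]) - y[i]) ** 2
        mse = total / len(X)
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k
```

For example, with one-feature points `[[0], [1], ..., [5]]` and targets `y = 2x`, `KNNRegressor(2).fit(X, y).predict_one([2.4])` averages the targets of x = 2 and x = 3 and returns 5.0. Leave-one-out is chosen here only because it needs no extra data split; any other cross-validation scheme would fit Step-4 equally well.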

IV. Solution Methodology

As discussed in the Problem Identification section, to overcome the drawbacks of the k-means clustering algorithm we use the K-nearest neighbours regression algorithm; our proposed scheme layout is shown in Fig. 2.

In our proposed algorithm, k-NN classification and regression are integrated to overcome the bottleneck of the existing k-NN algorithm.

Further, we have also compared the performance of the earlier prediction algorithm with that of our proposed algorithm.

Fig. 2 Proposed Scheme Layout (diagram blocks: Apply KNN Regression Prediction Model; Rainfall Forecasting)

Algorithm: K-Medoid

1. Initialize: randomly select (without replacement) k of the n data points as the medoids.
2. Associate each data point with the closest medoid.
3. While the cost of the configuration decreases:
   1. For each medoid m and each non-medoid data point o:
      1. Swap m and o, and recompute the cost (sum of distances of points to their medoid).
      2. If the total cost of the configuration increased in the previous step, undo the swap.

Proposed Algorithm

Regression(featTrain, classTrain, featTest, classTest, featName, classifier)
/* featTrain  - a NUMERIC matrix of training features (N x M)
   classTrain - a NUMERIC vector of the dependent-variable values of the training data (N x 1)
   featTest   - a NUMERIC matrix of testing features (Nts x M)
   classTest  - a NUMERIC vector of the dependent-variable values of the testing data (Nts x 1)
   featName   - a CELL vector of strings holding the label of each feature (1 x M cell) */
// classifier: KNN Regression
NNBestFeat = floor(Datapoints() / 10)   // number of nearest neighbours
trainModel = KNN Regression model
NNSearch = initialize the search function for KNNReg as linear search
// set the distance measure for NNSearch
distFunc = Euclidean distance (or similarity) function
trainModel.setNearestNeighbourSearchAlgorithm(NNSearch)
trainModel.setKNN(NNBestFeat)
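The K-Medoid steps listed in this section can be sketched as follows. This is a minimal, unoptimised Python illustration, not the authors' MATLAB implementation: the distance function is passed in as a parameter, the fixed random seed is an assumption added for reproducibility, and a swap is kept only when it strictly lowers the total cost, which is equivalent to undoing it otherwise. The names `total_cost` and `k_medoids` are hypothetical. Each full pass tries k·(n - k) swaps, and each cost evaluation is O(n·k), so this plain version is only practical for small datasets.

```python
import random


def total_cost(points, medoids, dist):
    """Sum of distances from every point to its closest medoid (the configuration cost)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, k, dist, seed=0):
    """Swap-based k-medoids; returns (medoid indices, cluster label of each point)."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # 1. Initialize: pick k distinct points (sampling without replacement) as medoids.
    medoids = rng.sample(range(n), k)
    # 2. Association with the closest medoid is implicit in the cost computation.
    cost = total_cost(points, medoids, dist)
    # 3. Repeat full passes while some swap still lowers the configuration cost.
    improved = True
    while improved:
        improved = False
        for mi in range(k):
            for o in range(n):
                if o in medoids:
                    continue  # only non-medoid points are swap candidates
                candidate = medoids.copy()
                candidate[mi] = o  # swap medoid m with non-medoid point o
                new_cost = total_cost(points, candidate, dist)
                if new_cost < cost:
                    medoids, cost, improved = candidate, new_cost, True
                # otherwise the swap is "undone" by discarding `candidate`
    # Final association: label each point with the index of its closest medoid.
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points]
    return medoids, labels
```

As a usage example, `k_medoids([0, 1, 2, 10, 11, 12], 2, lambda a, b: abs(a - b))` separates the two groups and ends with the points 1 and 11 (indices 1 and 4) as medoids, at a total cost of 4. Because medoids are actual data points, a single extreme value shifts them less than it would shift a k-means centroid.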

K-NN linear regression fits the best line between the neighbors: a linear (least squares) regression problem has to be solved for each query.

Fig. 3 k-nn Regression (diagram blocks: Rainfall Data; K-Medoid Clustering; Apply KNN Linear Search; Classifier)
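k-NN linear regression solves a separate least-squares problem for each query. The sketch below is a hedged one-feature Python illustration, assuming a single numeric predictor and absolute distance: for each query it takes the k nearest training points and fits the line α + βx to them with the closed-form least-squares estimates β = Sxy / Sxx and α = ȳ - βx̄. The function names are hypothetical; the default k mirrors the NNBestFeat = floor(N / 10) rule of the pseudocode, with an assumed lower bound of two neighbours because a line needs at least two points.

```python
import heapq


def least_squares_line(xs, ys):
    """Closed-form least-squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0  # all x identical: fall back to a flat line through the mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def knn_local_linear_predict(train_x, train_y, x_query, k=None):
    """Fit a least-squares line to the k nearest neighbours of x_query and evaluate it there."""
    n = len(train_x)
    if k is None:
        # Mirrors NNBestFeat = floor(N / 10) from the pseudocode, but never below 2.
        k = max(2, n // 10)
    k = min(k, n)
    # Linear search for the k nearest neighbours (absolute distance in one dimension).
    nearest = heapq.nsmallest(k, range(n), key=lambda i: abs(train_x[i] - x_query))
    alpha, beta = least_squares_line(
        [train_x[i] for i in nearest], [train_y[i] for i in nearest]
    )
    return alpha + beta * x_query
```

For instance, with training points x = 0, 1, ..., 9 and targets y = x², a query at 4.4 uses the neighbours x = 4 and x = 5, fits the line through (4, 16) and (5, 25), and predicts about 19.6. Compared with averaging the neighbours' targets, the local line follows trends inside the neighbourhood, at the cost of one small regression per query.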

V. Result and Discussion

For the implementation of our proposed algorithm we used MATLAB 2015b. We used the rainfall dataset from the Department of Agricultural Meteorology, Indira Gandhi Agricultural University, Raipur (Station: Labhandi), Monthly Meteorological Data: 2015. An example is as follows:

Table-1 Meteorological Data
Month | Max. Temp. (°C) | Min. Temp. (°C) | Rainfall (mm) | Relative Humidity I (%) | Relative Humidity II (%) | Wind Velocity (km/h)
Jan. | 26.5 | 11.4 | 9.4 | 91 | 37 | 2.8
Feb. | 30.9 | 14.4 | 2.2 | 85 | 33 | 2.9
Mar. | 33.6 | 19.1 | 19.3 | 75 | 34 | 3.8
Apr. | 37.3 | 23.1 | 51.4 | 73 | 35 | 7.2
May | 41.9 | 27.4 | 13.4 | 58 | 28 | 7.1
Jun. | 36 | 26 | 271.6 | 80 | 54 | 8.4
Jul. | 31.9 | 25.3 | 173.2 | 87 | 71 | 8.4
Aug. | 31.3 | 25.2 | 267.4 | 91 | 73 | 7
Sep. | 32.2 | 25.2 | 219.6 | 93 | 66 | 4.4
Oct. | 33.3 | 22.3 | 0 | 91 | 47 | 2.7
Nov. | 31.3 | 17.2 | 0 | 89 | 37 | 2.8
Dec. | 29.1 | 14.9 | 13.8 | 85 | 39 | 2.6
Total | | | 1041.3 | | |
Average | 32.9 | 21 | | 83 | 46 | 5

Fig. 4 Main GUI of Proposed System

S. No. | Methodology | TPR (True Positive Rate)
1. | Earlier | 85%
2. | Proposed | 92%

VI. CONCLUSION

Forecast is merely a prediction about the future values of data. After experimental evaluation, we conclude that our proposed algorithm achieves a TPR of 92%, compared with 85% for the earlier algorithm; hence the proposed algorithm performs better on this measure.

In future, the proposed algorithm can be applied to image datasets, and other prediction models or deep learning can be applied to increase the prediction accuracy.

VII. REFERENCES

[1]. Shubhendu Trivedi et al., The Utility of Clustering in Prediction Tasks, Centre for Mathematics and Cognition, gran 2011.
[2]. Hakan Tongal et al., Phase-space reconstruction and self-exciting threshold modeling approach to forecast lake water levels, Springer-Verlag Berlin Heidelberg, 2013.
[3]. Andrew Kusiak et al., Modeling and Prediction of Rainfall Using Radar Reflectivity Data: A Data-Mining Approach, IEEE, 2013.
[4]. Pinky Saikia Dutta et al., Prediction of Rainfall Using Datamining Technique Over Assam, IJCSE, 2014.
[5]. M. Kannan et al., Rainfall Forecasting Using Data Mining Technique, IJET, 2010.
[6]. Ravinesh C. Deo et al., Application of the extreme learning machine algorithm for the prediction of monthly Effective Drought Index in eastern Australia, Elsevier, 2014.
[7]. Jae-Hyun Seo et al., Feature Selection for Very Short-Term Heavy Rainfall Prediction Using Evolutionary Computation, Hindawi, 2013.
[8]. Meghali A. Kalyankar, Prof. S. J. Alaspurkar, Data Mining Technique to analyse Meterological Data, IEEE Paper.
[9]. E. H. Habib, E. A. Meselhe, and A. V. Aduvala, "Effect of local errors of tipping-bucket rain gauges on rainfall-runoff simulations," J. Hydrol. Eng., vol. 13, no. 6, pp. 488-496, Jun. 2008.