Prediction_Of_Rainfall_Using_Machine_Lea
Abstract: Rainfall prediction is important because heavy rainfall can lead to many disasters. Prediction helps people take preventive measures, so it should be accurate. There are two types of prediction: short-term and long-term rainfall prediction. Short-term prediction mostly gives accurate results; the main challenge is to build a model for long-term rainfall prediction. Heavy precipitation prediction is a major problem for earth science departments because it is closely associated with the economy and with human life. Rainfall extremes cause natural disasters such as floods and droughts, which people across the world face every year. The accuracy of rainfall forecasts is of great importance for countries like India, whose economy depends largely on agriculture. Because of the dynamic nature of the atmosphere, statistical techniques fail to provide good accuracy for precipitation forecasts. Precipitation can instead be predicted with machine learning techniques such as regression. The intention of this project is to offer non-experts easy access to the techniques and approaches used in the field of precipitation prediction, and to provide a comparative study of various machine learning techniques.
1 INTRODUCTION
Rainfall forecasting is very important because heavy and irregular rainfall can have many impacts, such as destruction of crops and farms and damage to property. A better forecasting model is therefore essential for early warning, which can minimize risks to life and property and help manage agricultural farms better. Prediction mainly helps farmers, and it also allows water resources to be utilized efficiently. Rainfall prediction is a challenging task, and the results should be accurate. There are many hardware devices that predict rainfall from weather conditions such as temperature, humidity and pressure. These traditional methods are not efficient, so we use machine learning techniques to produce more accurate results. We analyze historical rainfall data and predict the rainfall for future seasons. Techniques such as classification and regression can be applied according to the requirements, and the error between the actual and predicted values, as well as the accuracy, can be calculated. Different techniques produce different accuracies, so it is important to choose the right algorithm and model it according to the requirements.

Regression analysis:
Regression analysis deals with the dependence of one variable (called the dependent variable) on one or more other variables (called independent variables). It is useful for estimating and/or predicting the mean or average value of the former in terms of known or fixed values of the latter. For example, the salary of a person is based on his/her experience; here, experience is the independent variable and salary is the dependent variable. Simple linear regression defines the relationship between a single dependent variable and a single independent variable. The general form of simple linear regression is:

$y = \beta_0 + \beta_1 x + \varepsilon$

where $\beta_0$ and $\beta_1$ are parameters, and $\varepsilon$ is a probabilistic error term. Regression analysis is a vital tool for modeling and analyzing data. It is used for predictive analysis, such as forecasting rainfall or weather and predicting trends in business, finance, and marketing. It can also be used for correcting errors and for providing quantitative support.

The advantages of regression analysis are:
1. It is a powerful technique for testing the relationship between one dependent variable and many independent variables.
2. It allows researchers to control extraneous factors.
3. Regression assesses the cumulative effect of multiple factors.
4. It also helps to attain the measure of error using the regression line as a base for estimations.

_______________________
Dr. Moulana Mohammed is currently working as Associate Professor in Computer Science and Engineering in Koneru Lakshmaiah Education Foundation, India, E-mail: [email protected]
Kolapalli Roshitha, Niharika Golla and Siva Sai Maturi are currently pursuing Bachelor degree program in Computer Science and Engineering in Koneru Lakshmaiah Education Foundation, India, E-mail: [email protected]

2 LITERATURE REVIEW
Thirumalai, Chandrasegar, et al. [1] discuss the amount of rainfall in past years according to crop seasons and predict the rainfall for future years. The crop seasons are Rabi, Kharif and Zaid. The linear regression method is applied for early prediction. Rabi and Kharif were taken as variables: if one variable is given, the other can be predicted using linear regression. The standard deviation and mean were also calculated for future prediction of crop seasons. This implementation can give farmers an idea of which crop to harvest according to the crop seasons.

Geetha, A., and G. M. Nasira [2] implement a model that predicts weather conditions such as rainfall, fog, thunderstorms and cyclones, which helps people take preventive measures. Data mining techniques were used, and the data mining tool RapidMiner was used to model decision trees. The data set is from Trivandrum, with attributes such as day, temperature, dew point and pressure. The dataset is divided into a training set and a testing set, and the decision tree algorithm is applied. The accuracy is calculated by comparing actual and predicted values. The accuracy is 80.67, and to achieve a higher value the work can be extended by applying soft computing techniques such as fuzzy logic and genetic algorithms.

Parmar, Aakash, Kinjal Mistree, and Mithila Sompura [3] discuss the different methods used for rainfall prediction for weather forecasting
IJSTR©2020
www.ijstr.org
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 9, ISSUE 01, JANUARY 2020 ISSN 2277-8616
with their limitations. Various neural network algorithms used for prediction are discussed step by step, and the approaches and algorithms used for rainfall prediction by different researchers are categorized. The background work covers machine learning models such as the ARIMA model, artificial neural networks and their types (Back-Propagation Neural Network, Cascade Forward Back Propagation Network, Layer Recurrent Network), Self-Organizing Map and Support Vector Machine. A table presents the categorization of the surveyed approaches to rainfall prediction, and the paper closes with its conclusions.

Dash, Yajnaseni, Saroj K. Mishra, and Bijaya K. Panigrahi [4] applied artificial intelligence techniques, namely Artificial Neural Network (ANN), Extreme Learning Machine (ELM) and K-nearest neighbor (KNN), to predict summer monsoon and post-monsoon rainfall. The dataset is the time series data of Kerala from 1871 to 2016, taken from the Indian Institute of Tropical Meteorology (IITM). The data is pre-processed and normalized, then divided so that the data up to 2010 forms the training set and the data from 2011-2016 forms the test set. The algorithms were applied and their performance was measured using MAE, RMSE, and MASE. The ELM algorithm gave more accurate results than the others.

Singh, Gurpreet, and Deepak Kumar [5] state that many machine learning algorithms have been applied to rainfall prediction, and use a hybrid approach that combines two techniques: Random Forest and Gradient Boosting are combined with machine learning techniques such as AdaBoost, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Neural Network (NN). These were applied to the rainfall data of North Carolina from 2007 to 2017, and performance was measured with the metrics F-score, precision, accuracy and recall. Eight hybrid models were proposed, and Gradient Boosting-AdaBoost was the best performer.

Kar, Kaveri, Neelima Thakur, and Prerika Sanghvi [6] used a fuzzy logic approach to predict rainfall from temperature data for a geographic location. Because other climatic factors make a prediction from temperature alone inaccurate, they also considered other influencing factors such as humidity, and analyzed the advantages of a fuzzy system over other techniques.

Sardeshpande, Kaushik D., and Vijaya R. Thool [7] used artificial neural networks, namely back propagation (BPNN), radial basis function (RBFNN) and generalized regression (GRNN) networks, on rainfall data of India, mainly the Nanded district of Maharashtra. The data is normalized between 0 and 1, the algorithms are applied, and their performance is calculated and compared. BPNN and RBFNN gave good results compared to GRNN.

Chen, Binghong, et al. [8] focus on non-linear machine learning approaches, namely a gradient boosting decision tree model and deep neural networks, for short-term prediction of rainfall. The algorithms were built on Alibaba Cloud, with data collected from different sites. Effectiveness is measured with the classification metrics AUC, F1 score, precision and accuracy, and with the regression metrics RMSE and correlation. It was observed that the DNN showed better results than ECData.

Moon, Seung-Hyun, et al. [9] implement an early warning system (EWS) that produces a signal when a threshold limit is reached, giving a warning 3 hours in advance. This was done using machine learning techniques. South Korea data from 2007 to 2012 was used, performance was measured by several criteria, and a confusion matrix was produced. Logistic regression with feature selection and PCA was proposed. The F-measure is calculated to estimate the efficiency of the model.

3 PROPOSED METHOD
The predictive model is used to predict precipitation. The first step is to convert the data into the correct format for experiments, then to analyze the data and observe variation in the patterns of rainfall. We predict rainfall by separating the dataset into a training set and a testing set, then apply different machine learning approaches (MLR, SVR, etc.) and statistical techniques, and compare and analyze the approaches used. With the help of these approaches we attempt to minimize the error.

Dataset Description:
The dataset [10] consists of rainfall measurements from 1901 to 2015 for each state.
• The data consists of 19 attributes (individual months, annual, and combinations of 3 consecutive months) for 36 subdivisions.
• For some of the subdivisions, the data is available only from 1950 to 2015.
• The attributes are the amount of rainfall measured in mm.

As the dataset is very large, feature reduction is performed to improve accuracy and to reduce computation time and storage. Principal Component Analysis (PCA) is a technique for extracting the important variables from a large set of variables. It extracts a low-dimensional set of features with the aim of capturing as much information as possible. With fewer variables, visualization also becomes more meaningful. PCA is carried out by computing the covariance matrix and obtaining its eigenvalues. In our dataset, PCA reduced the attributes to the rainfall data for combinations of three consecutive months and the annual data for every subdivision.

Techniques used:

Multiple Linear Regression:
Multiple linear regression models the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. It is simply an extension of simple linear regression. The general form of the multivariable linear regression model is:

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$

where $y$ is the dependent variable, $x_1, x_2, \dots, x_k$ are the independent variables, and $\alpha$ and $\beta_1, \dots, \beta_k$ are coefficients. Multiple regression can model more complex relationships that arise from several features together. It should be used in cases where a single variable is not sufficient to capture the relationship between the independent variables and the dependent variable.

Support Vector Regression:
Support Vector Regression (SVR) is related to the Support Vector Machine (SVM) that is widely known in machine learning and data science, but it differs from it: as the name suggests, SVR is a regression algorithm, so it is used to predict continuous values, whereas SVM is used for classification. Support Vector Machines support linear and nonlinear regression, which is referred to as Support Vector Regression. Instead of trying to fit the largest possible street between two classes while limiting margin violations, Support Vector Regression tries to fit as many instances as possible on the street while limiting margin violations (that is, instances off the street).
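The difference between fitting a margin for classification and fitting a tube for regression can be made concrete with a small sketch. The code below is an illustration only and not the implementation used in this work: it trains a linear SVR by subgradient descent on the epsilon-insensitive loss (standard solvers work on a dual quadratic program instead) and scores the fit with the MAE and R2 measures. The function names, the toy data and the hyperparameter values (epsilon, C, lr, epochs) are assumptions chosen for the example.

```python
"""Illustrative sketch: linear epsilon-insensitive SVR via subgradient descent."""


def eps_insensitive_loss(residual, epsilon):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(residual) - epsilon)


def predict(X, w, b):
    """Linear prediction w . x + b for every row of X."""
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]


def fit_linear_svr(X, y, epsilon=0.5, C=10.0, lr=0.001, epochs=20000):
    """Minimise 0.5*||w||^2 + C * mean(eps-insensitive loss) (assumed objective)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regulariser 0.5*||w||^2
        grad_b = 0.0
        for row, pred, target in zip(X, predict(X, w, b), y):
            residual = pred - target
            if abs(residual) > epsilon:  # only points outside the tube push the fit
                sign = 1.0 if residual > 0 else -1.0
                for j in range(d):
                    grad_w[j] += C * sign * row[j] / n
                grad_b += C * sign / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data (made up for illustration): y = 2x + 1 exactly.
X_demo = [[float(x)] for x in range(10)]
y_demo = [2.0 * x + 1.0 for x in range(10)]

w_demo, b_demo = fit_linear_svr(X_demo, y_demo)
pred_demo = predict(X_demo, w_demo, b_demo)
print("weights:", w_demo, "bias:", b_demo)
print("MAE:", mean_absolute_error(y_demo, pred_demo))
print("R2:", r2_score(y_demo, pred_demo))
```

Because the regulariser prefers a flatter line, the fitted slope may end slightly below the true slope while the points still lie within, or very close to, the epsilon tube. Errors smaller than epsilon are not penalised at all, which is the main difference from a least-squares fit.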
Fig 3. Line graph for distribution of rainfall from the year 1901-2015.
The bar graph below shows the amount of rainfall for all months in the subdivisions. It is observed that the volume of rainfall is reasonably high in Eastern India in the months of March, April and May.
The plot below is the line graph of the amount of rainfall over the years. It is observed that there was a high volume of rainfall in the 1950s.
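A sketch of how such summaries might be computed before plotting is given below. It is an illustration only: the row layout, the subdivision names and the rainfall values are made up for the example and are not taken from the dataset [10].

```python
from collections import defaultdict

# Hypothetical rows: (subdivision, year, {month: rainfall in mm}); values are made up.
rows = [
    ("Subdivision A", 1951, {"MAR": 10.0, "APR": 20.0, "MAY": 30.0}),
    ("Subdivision B", 1951, {"MAR": 5.0, "APR": 5.0, "MAY": 10.0}),
    ("Subdivision A", 1962, {"MAR": 1.0, "APR": 2.0, "MAY": 3.0}),
]


def rainfall_by_year(rows):
    """Total rainfall per year, summed over all subdivisions and months."""
    totals = defaultdict(float)
    for _subdivision, year, monthly in rows:
        totals[year] += sum(monthly.values())
    return dict(totals)


def mean_monthly_rainfall(rows):
    """Mean rainfall per (subdivision, month), averaged over the available years."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for subdivision, _year, monthly in rows:
        for month, mm in monthly.items():
            sums[(subdivision, month)] += mm
            counts[(subdivision, month)] += 1
    return {key: sums[key] / counts[key] for key in sums}


yearly = rainfall_by_year(rows)              # data behind a year-wise line graph
monthly_mean = mean_monthly_rainfall(rows)   # data behind a month-wise bar graph
wettest_year = max(yearly, key=yearly.get)
print(yearly, wettest_year)
```

The yearly totals correspond to the series a line graph over the years would show, and the per-month means correspond to the bars of a month-wise graph for each subdivision.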
Fig 5. Scatter plot between the predictions and testing set

Table 1 Comparative results

Prediction Model              Mean Absolute Error    R2 score
Multiple Linear Regression    10.95375724150944      0.995778395500872
Support Vector Regression     4.3506984199111525     0.995899174760731
Lasso Regression              11.716073498072355     0.995750795102249

Then, for each regression model, the MAE and R2 score are calculated and compared, and a graph is plotted.

5 CONCLUSION
This project concentrated on the estimation of rainfall. We find that SVR is a valuable and flexible technique that helps the user deal with the limitations relating to the distributional properties of the underlying variables, the geometry of the data and the common problem of model overfitting. The choice of kernel function is critical for SVR modeling. We recommend that beginners use the linear and RBF kernels for linear and non-linear relationships respectively. We observe that SVR is superior to MLR as a prediction method: MLR cannot capture the non-linearity in a data set, and SVR proves useful in such situations. We additionally compute Mean

REFERENCES
[1] Thirumalai, Chandrasegar, et al. "Heuristic prediction of rainfall using machine learning techniques." 2017 International Conference on Trends in Electronics and Informatics (ICEI). IEEE, 2017.
[2] Geetha, A., and G. M. Nasira. "Data mining for meteorological applications: Decision trees for modeling rainfall prediction." 2014 IEEE International Conference on Computational Intelligence and Computing Research. IEEE, 2014.
[3] Parmar, Aakash, Kinjal Mistree, and Mithila Sompura. "Machine learning techniques for rainfall prediction: A review." 2017 International Conference on Innovations in information Embedded and Communication Systems. 2017.
[4] Dash, Yajnaseni, Saroj K. Mishra, and Bijaya K. Panigrahi. "Rainfall prediction for the Kerala state of India using artificial intelligence approaches." Computers & Electrical Engineering 70 (2018): 66-73.
[5] Singh, Gurpreet, and Deepak Kumar. "Hybrid Prediction Models for Rainfall Forecasting." 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence). IEEE, 2019.
[6] Kar, Kaveri, Neelima Thakur, and Prerika Sanghvi. "Prediction of Rainfall Using Fuzzy Dataset." (2019).
[7] Sardeshpande, Kaushik D., and Vijaya R. Thool. "Rainfall Prediction: A Comparative Study of Neural Network Architectures." Emerging Technologies in Data Mining and Information Security. Springer, Singapore, 2019. 19-28.
[8] Chen, Binghong, et al. "Non-Linear Machine Learning Approach to Short-Term Precipitation Forecasting." (2018).
[9] Moon, Seung-Hyun, et al. "Application of machine learning to an early warning system for very short-term heavy rainfall." Journal of Hydrology 568 (2019): 1042-1054.
[10] https://data.gov.in/resources/subdivision-wise-rainfall-and-its-departure-1901-2015