Time Series Analysis With ARIMA Model: Business Analytics Project (Group 5)
Table of Contents
Sr no Topic
1 Introduction
2 Methodology
3 Theory
4 Challenges
5 Process
6 Conclusion
7 Reference
Introduction:
Analytics is now used extensively across industries to identify weak points and to extract
the most from a company’s available resources. As the margin for error has shrunk, data
analytics has become an integral part of business.
Time series analysis is one of the concepts under the broad umbrella of data analytics. It
is used to analyse time series data in order to extract meaningful statistics and other
characteristics of the data. Time series forecasting is the use of a model to predict future
values based on previously observed values. Whether we wish to predict the trend in
financial markets or electricity consumption, time is an important factor that must be
considered in our models. For example, it is useful to know not only that a stock will move
up in price, but also when it will move up.
Data Set:
The dataset used for modelling is rainfall data for Kerala, obtained from Kaggle.
Link: https://www.kaggle.com/rajanand/rainfall-in-india
The dataset is released by the India Meteorological Department (IMD), Govt. of India,
under the Government Open Data License - India. It contains monthly rainfall data for 36
meteorological subdivisions in India. The time period is 1901 to 2015 and rainfall is
measured in mm.
Methodology:
The methodology used for estimating and forecasting the univariate rainfall series is the
Box-Jenkins (B-J) method. The model adopted is ARIMA (Autoregressive Integrated Moving
Average). The flow of the entire process is as follows:
1. Check whether the series is white noise
2. Check whether the series is stationary; if not, apply differencing to transform it into
a stationary series
3. Identify the p and q values for the autoregressive and moving average terms using
the autocorrelation function (ACF) and partial autocorrelation function (PACF)
4. Estimate the appropriate model
5. Perform residual diagnostics
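Steps 2 and 3 of this flow can be illustrated with a minimal Python sketch. The project itself was carried out in SPSS, so this is not the team's procedure; the function names (difference, acf, pacf) and the short demo series are hypothetical and chosen only for illustration. The sketch differences a trending series and computes the sample ACF and PACF (the latter through the Durbin-Levinson recursion) that would be inspected to choose p and q.

```python
def difference(series, lag=1):
    """Return the differenced series y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ]
        phi_new.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi_new
    return result


# Made-up numbers with an upward trend (NOT the rainfall data).
demo_series = [2, 4, 7, 9, 12, 14, 17, 19, 22, 24]
demo_diff = difference(demo_series)  # one round of differencing (d = 1)
print("differenced:", demo_diff)
print("ACF :", [round(v, 3) for v in acf(demo_diff, 3)])
print("PACF:", [round(v, 3) for v in pacf(demo_diff, 3)])
```

In the usual Box-Jenkins reading, a PACF that cuts off after lag p suggests an AR(p) term, and an ACF that cuts off after lag q suggests an MA(q) term.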
Theory:
The ARIMA model can be used to describe the trend in a time series and to forecast it. The
ARIMA (p,d,q) model is a linear model built from three components. The acronym is
descriptive, capturing the key aspects of the model itself. Briefly, they are:
AR: Auto-regression- A model that uses the dependent relationship between an observation
and some number of lagged observations.
I: Integrated- The use of differencing of raw observations (i.e. subtracting an observation
from an observation at the previous time step) in order to make the time series stationary.
MA: Moving Average- A model that uses the dependency between an observation
and residual errors from a moving average model applied to lagged observations.
Each of these components is explicitly specified in the model as a parameter.
The standard notation ARIMA (p,d,q) is used, where the parameters are replaced with
integer values to indicate the specific ARIMA model being used.
The parameters of the ARIMA model are defined as follows:
p: The number of lag observations included in the model, also called the lag order.
d: The number of times that the raw observations are differenced, also called the degree of
differencing.
q: The size of the moving average window, also called the order of moving average.
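For reference, the three parameters can be tied together in one equation. The notation below is the common textbook form and is not taken from the project itself; the symbols (backshift operator B with B y_t = y_{t-1}, constant c, AR coefficients \phi_i, MA coefficients \theta_j and white-noise error \varepsilon_t) are assumptions introduced here for illustration.

```latex
% General ARIMA(p,d,q) model written with the backshift operator B
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right) (1 - B)^{d}\, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right) \varepsilon_t

% Example: ARIMA(1,1,1), with the once-differenced series y'_t = y_t - y_{t-1}
y'_t = c + \phi_1\, y'_{t-1} + \varepsilon_t + \theta_1\, \varepsilon_{t-1}
```

Here p counts the \phi terms, d is the power of the differencing factor (1 - B), and q counts the \theta terms.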
Challenges:
The biggest challenge the team faced was that none of the members had an analytical
background or experience with any analytical tools. The team therefore had to start
from scratch. We divided the project into three phases:
The first phase was to understand the concept of time series analysis. We referred
to various research papers and online tutorials.
The second phase involved exploring different tools and selecting one. After
exploring R, SPSS, Tableau and Python, the team decided to use SPSS for its
ease of use, its similarity to Excel and its convenience.
The third phase was execution, in which the dataset was processed using ARIMA in
SPSS and the model was developed.
A. Diagnosis:
The idea of diagnostic checking is to look for evidence that the model is not a
good fit for the data. The tool used for this check is the residual error. A
review of the distribution of the errors can help reveal bias in the model. The
errors from an ideal model would resemble white noise, that is, they would be
uncorrelated and follow a Gaussian distribution with a mean of zero and a
constant variance.
[Table: Autocorrelations and Box-Ljung statistic]
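The Box-Ljung statistic reported in this kind of output can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPSS computation: the function name ljung_box and the demo residuals are hypothetical. The statistic is Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k), where r_k is the lag-k autocorrelation of the residuals and n is the number of residuals.

```python
def ljung_box(residuals, max_lag):
    """Box-Ljung Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Lag-k sample autocorrelation of the residuals.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Made-up residuals that alternate in sign, i.e. clearly NOT white noise.
demo_residuals = [1, -1] * 10
print("Q(1) =", round(ljung_box(demo_residuals, 1), 3))
```

Q is compared with a chi-square critical value (degrees of freedom equal to the number of lags, reduced by the number of fitted AR and MA parameters). A large Q indicates that autocorrelation remains in the residuals; a small Q is consistent with white noise.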
To check that the model is adequate and that its residuals contain no remaining
autocorrelation, we compute the ratio of the lag-1 autocorrelation to its standard
error. If the ratio is less than 1.25, the selected model is accepted.
In our case the ratio is 0.076/0.094 = 0.8085. This is less than 1.25, so the
estimated model is considered adequate. The ACF and PACF graphs obtained are:
[Figure: ACF and PACF graphs]
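The lag-1 check described above is simple arithmetic and can be reproduced as follows. The two input values and the 1.25 cut-off are the ones quoted in this report; the names lag_ratio and THRESHOLD are hypothetical. Note that the more common convention flags an autocorrelation only when it exceeds roughly 2 standard errors, so the 1.25 rule used here is the stricter of the two.

```python
THRESHOLD = 1.25  # cut-off used in this report


def lag_ratio(autocorrelation, std_error):
    """Ratio of the absolute autocorrelation to its standard error."""
    return abs(autocorrelation) / std_error


ratio = lag_ratio(0.076, 0.094)  # lag-1 values quoted in the report
is_adequate = ratio < THRESHOLD
print(round(ratio, 4), is_adequate)  # prints: 0.8085 True
```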
Conclusion:
By working on this project, the team studied the basics of time series
analysis with the ARIMA model. The team also learned the prerequisites, rules and
checks that need to be performed before developing a model for a data series.
The project was carried out on the SPSS platform. The Data, Output and Syntax
windows were explored during the activity.
Reference:
https://people.duke.edu/~rnau/411arim3.htm
https://www.youtube.com/watch?v=_7jivAiwZGw
https://www.youtube.com/watch?v=erlRfau80PM
https://www.youtube.com/watch?v=NoAUprEguoY
https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/The_Box-Jenkins_Method.pdf
https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf