Time Series Analysis With ARIMA Model: Business Analytics Project (Group 5)

The document discusses using an ARIMA model for time series analysis and forecasting of rainfall data from Kerala, India. It describes the methodology used, which includes checking for stationarity, identifying the p and q values using ACF and PACF plots, estimating the best ARIMA model, and performing residual diagnostics. The main challenges were the team's lack of analytical experience and the need to learn time series concepts and tools. SPSS was used to process the data and develop an ARIMA model that was judged to fit the data adequately because the ratio of the lag-1 residual autocorrelation to its standard error was less than 1.25.

TIME SERIES ANALYSIS WITH ARIMA MODEL
Business Analytics Project (Group 5)
Table of Contents

Sr no   Topic
1       Introduction
2       Methodology
3       Theory
4       Challenges
5       Process
6       Conclusion
7       Reference

Introduction:
Analytics is now used extensively in every industry to identify weak points and to
extract the maximum value from a company's available resources. As the margin for error
has shrunk drastically, data analytics has become an integral part of any industry.
Time series analysis is one of the concepts under the broad umbrella of data analytics.
It is used to analyse time series data in order to extract meaningful statistics and other
characteristics of the data. Time series forecasting is the use of a model to predict future
values based on previously observed values. Whether we wish to predict the trend in
financial markets or in electricity consumption, time is an important factor that must be
considered in our models. For example, it is useful to know not only that a stock will
move up in price, but also when it will move up.

Data Set:
The dataset used in the modelling is rainfall data for Kerala, taken from Kaggle.
Link: https://www.kaggle.com/rajanand/rainfall-in-india
This dataset is released by the India Meteorological Department (IMD), Govt. of India,
under the Govt. Open Data License - India. It contains monthly rainfall data for 36
meteorological subdivisions in India. The time period is from 1901 to 2015 and the unit of
measurement of rainfall is mm.

Tool Used: SPSS

SPSS (Statistical Package for the Social Sciences) was first launched in 1968. It was
acquired by IBM in 2009 and has since been officially known as IBM SPSS Statistics. SPSS is
a software package for analysing many types of data. It can read the sources commonly used
for structured data, such as MS Excel, text files (.txt, .csv), SQL databases, Stata and SAS.

Methodology:
The methodology used for estimating and forecasting the univariate series of rainfall
data is the Box-Jenkins (B-J) method. The model adopted for this case is ARIMA (Auto
Regressive Integrated Moving Average). The flow of the entire process is as follows:
1. Check for white noise.
2. Check whether the data series is stationary; if not, use differencing to transform it
into a stationary data series.
3. Identify the p and q values for the autoregressive and moving average terms using
the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
4. Estimate the appropriate model.
5. Perform residual diagnostics.
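
Steps 2 and 3 can be illustrated outside SPSS with a minimal Python sketch. It is not the
procedure the team ran: the function names difference and sample_acf are hypothetical, and
the toy series is made up purely for illustration (it is not the Kerala rainfall data).

```python
from typing import List, Sequence


def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply first-order differencing d times: y[t] - y[t-1]."""
    values = list(series)
    for _ in range(d):
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


def sample_acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation r_k for k = 1..max_lag (series must not be constant)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for k in range(1, max_lag + 1):
        num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


# Made-up toy series: a linear trend plus an alternating pattern.
toy = [2.0 * t + (3.0 if t % 2 else -3.0) for t in range(20)]

# One round of differencing removes the linear trend (step 2).
stationary = difference(toy, d=1)

# The ACF of the differenced series is what would be inspected in step 3.
print(sample_acf(stationary, max_lag=3))
```

In practice the ACF is read together with the PACF, which this sketch does not compute.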

Theory:
The trend and prediction of a time series can be computed using an ARIMA model. The ARIMA
(p,d,q) model is a complex linear model. The acronym is descriptive, capturing the key
aspects of the model itself. Briefly, they are:
AR: Auto-regression - A model that uses the dependent relationship between an observation
and some number of lagged observations.
I: Integrated - The use of differencing of raw observations (i.e. subtracting the observation
at the previous time step from the current observation) in order to make the time series
stationary.
MA: Moving Average - A model that uses the dependency between an observation
and residual errors from a moving average model applied to lagged observations.
Each of these components is explicitly specified in the model as a parameter.
The standard notation ARIMA (p,d,q) is used, where the parameters are substituted with
integer values to indicate the specific ARIMA model being used.
The parameters of the ARIMA model are defined as follows:
p: The number of lag observations included in the model, also called the lag order.
d: The number of times that the raw observations are differenced, also called the degree of
differencing.
q: The size of the moving average window, also called the order of the moving average.
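
For reference, the three parameters can be tied together in one equation. The form below is
a sketch under one common sign convention; the symbols c (constant), phi_i (AR coefficients),
theta_j (MA coefficients), epsilon_t (error term) and B (backshift operator) are introduced
here for illustration and do not appear in the original report.

```latex
\begin{aligned}
w_t &= (1 - B)^d \, y_t, \qquad B \, y_t = y_{t-1} \\
w_t &= c + \sum_{i=1}^{p} \phi_i \, w_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
\end{aligned}
```

For example, with p = 1, d = 1 and q = 0 this reduces to
y_t - y_{t-1} = c + phi_1 (y_{t-1} - y_{t-2}) + epsilon_t.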

Challenges:
The biggest challenge the team faced was that none of the members had
an analytical background or any experience with analytical tools. The team
therefore had to start from scratch. We divided the entire project into 3 phases:
The first phase was to understand the concept of time series analysis. We referred
to various research papers and online tutorials to understand it.
The second phase involved understanding different tools and selecting one. After
exploring R, SPSS, Tableau and Python, the team decided to use SPSS based on
its ease of use, its familiar Excel-like feel and its convenience.
The third phase was execution, in which the dataset was processed using ARIMA in
SPSS and the model was developed.

A. Diagnosis:

The idea of diagnostic checking is to look for evidence that the model is not a
good fit for the data. The quantity examined in this case is the residual error. A
review of the distribution of errors can help tease out bias in the model. The
errors from an ideal model would resemble white noise: uncorrelated values drawn
from a Gaussian distribution with a mean of zero and a constant variance.

The residual diagnostic is carried out by computing the autocorrelation function of
the 'Error' column that was added to the data set during processing.
The output consists of several tables and graphs, of which the autocorrelation
table shown below is the most important.

Autocorrelations
Series: Error for JAN from ARIMA, MOD_1, CON

                                        Box-Ljung Statistic
Lag   Autocorrelation   Std. Error     Value     df    Sig.
  1        -.076           .094          .670     1    .413
  2        -.160           .094         3.684     2    .159
  3        -.276           .097        12.764     3    .005
  4        -.020           .103        12.814     4    .012
  5         .113           .103        14.368     5    .013
  6        -.072           .104        15.006     6    .020
  7        -.099           .105        16.226     7    .023
  8        -.030           .106        16.336     8    .038
  9         .249           .106        24.170     9    .004
 10        -.080           .111        24.991    10    .005
 11        -.036           .111        25.160    11    .009
 12        -.153           .111        28.211    12    .005
 13         .081           .113        29.067    13    .006
 14         .123           .114        31.073    14    .005
 15         .059           .115        31.538    15    .007
 16        -.077           .115        32.332    16    .009
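
The Box-Ljung "Value" column can be approximately reproduced from the autocorrelations with
the standard cumulative formula Q_h = n(n+2) * sum over k <= h of r_k^2 / (n - k). The sketch
below is a hedged illustration, not SPSS's own code: the function name ljung_box is
hypothetical, and the residual count n = 114 is an assumption (the report does not state it;
it is inferred from the lag-1 standard error of .094 being roughly 1/sqrt(n)).

```python
from typing import List, Sequence


def ljung_box(acf: Sequence[float], n: int) -> List[float]:
    """Cumulative Ljung-Box Q for lags 1..len(acf), given autocorrelations r_1, r_2, ..."""
    q_values = []
    running = 0.0
    for k, r in enumerate(acf, start=1):
        running += r * r / (n - k)          # add the lag-k term r_k^2 / (n - k)
        q_values.append(n * (n + 2) * running)
    return q_values


# First three autocorrelations, copied from the table (rounded to 3 decimals).
table_acf = [-0.076, -0.160, -0.276]

# ASSUMPTION: number of residuals, not stated in the report.
N_RESIDUALS = 114

q = ljung_box(table_acf, N_RESIDUALS)
print([round(v, 2) for v in q])
```

Under that assumption the results land close to the tabulated .670, 3.684 and 12.764; small
differences are expected because the table reports the autocorrelations to three decimals only.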

To check whether the model is adequate and its residuals contain no remaining
structure, we compute the ratio of the autocorrelation to its standard error at
lag 1. If the absolute value of this ratio is less than 1.25, the selected model
is considered acceptable.
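
The check described above amounts to a single division. A minimal sketch follows, using the
lag-1 row of the autocorrelation table; the function name lag_ratio is hypothetical, and the
1.25 cut-off is the one adopted in this report rather than a general statistical standard.

```python
def lag_ratio(autocorrelation: float, std_error: float) -> float:
    """Absolute autocorrelation divided by its standard error."""
    return abs(autocorrelation) / std_error


THRESHOLD = 1.25  # cut-off used in this report

# Lag-1 row of the autocorrelation table: autocorrelation -.076, std. error .094
ratio = lag_ratio(-0.076, 0.094)
print(round(ratio, 4), ratio < THRESHOLD)  # prints: 0.8085 True
```

The same function can be applied to any other row of the table by passing that row's
autocorrelation and standard error.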

In our case the value is 0.076/0.094 = 0.8085. Since this is less than 1.25,
the estimated model is considered adequate. The ACF and PACF graphs
obtained are:
[Figure: ACF and PACF graphs]

Conclusion:
By working on this project, the team studied the basics of time series
analysis with the ARIMA model. The team came to understand the prerequisites,
rules and checks that need to be performed before developing a model for a data
series. The project was carried out on the SPSS platform, and its Data, Output
and Syntax windows were explored in the process.

Reference:
• https://people.duke.edu/~rnau/411arim3.htm
• https://www.youtube.com/watch?v=_7jivAiwZGw
• https://www.youtube.com/watch?v=erlRfau80PM
• https://www.youtube.com/watch?v=NoAUprEguoY
• https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/The_Box-Jenkins_Method.pdf
• https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/
• https://arxiv.org/ftp/arxiv/papers/1302/1302.6613.pdf
