Introduction To Data Mining: Business Analytics, 1e


Chapter 8: Introduction to Data Mining

Business Analytics, 1e
By Sanjiv Jaggia, Alison Kelly, Kevin Lertwachara, and Leida Chen

Copyright © 2021 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
8/26/2020
Chapter 8 Learning Objectives (LOs)
LO 8.1 Describe the data mining process.
LO 8.4 Conduct principal component analysis.

Introductory Case: Social Media Marketing
• Alissa Bridges is the marketing director of FashionTech. Alissa has hired a
social media marketing firm, MarketWiz, to develop predictive models that
would help FashionTech acquire new customers as well as increase sales from
existing customers.
• Using FashionTech’s historical social media marketing and sales data,
MarketWiz develops two types of predictive models.
– A classification model that predicts potential customers’ purchase probability from FashionTech
within 30 days of receiving a promotional message in their social media account.
– Two prediction models that predict the one-year purchase amounts of customers acquired
through social media channels.
• In order to assess the performance of the predictive models, Alissa’s team
would like to use the validation data set to:
– Evaluate how accurately the classification model classifies potential customers into the purchase
and no-purchase classes.
– Compare the performance of prediction models to estimate the one-year purchase amounts of
customers acquired through social media channels.
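The two validation tasks above can be sketched in a few lines. This is a minimal illustration, assuming accuracy as the classification measure and root mean squared error (RMSE) as the prediction measure; the slide names neither. All data values and the names `model_a_amount` and `model_b_amount` are hypothetical.

```python
import math

def accuracy(actual, predicted):
    """Share of validation cases whose predicted class matches the actual class."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted numeric targets."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Hypothetical validation data: 1 = purchase, 0 = no purchase.
actual_class = [1, 0, 0, 1, 1, 0, 0, 1]
predicted_class = [1, 0, 1, 1, 0, 0, 0, 1]

# Hypothetical one-year purchase amounts and two competing models' predictions.
actual_amount = [120.0, 80.0, 200.0, 150.0]
model_a_amount = [110.0, 90.0, 190.0, 160.0]
model_b_amount = [100.0, 100.0, 230.0, 150.0]

classification_accuracy = accuracy(actual_class, predicted_class)
rmse_a = rmse(actual_amount, model_a_amount)
rmse_b = rmse(actual_amount, model_b_amount)

# The model with the lower validation RMSE is preferred under this measure.
better_model = "A" if rmse_a < rmse_b else "B"
```

Both measures are computed on the validation data only, so they reflect how the models behave on cases that were not used to build them.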

8.1: Data Mining Overview (1/8)
• Applications of computer software used to obtain insightful solutions that
traditional data analysis techniques may not be able to achieve.
• Artificial Intelligence (AI)
– Demonstrate human-like intelligence and cognitive functions
– Deduction, pattern recognition, and interpretation of complex data
– Examples: Deep Blue playing chess, Watson
• Machine Learning
– Application of AI that allows the computer to learn automatically
– Uncover hidden patterns and relationships
– Use self-learning algorithms to evaluate results and improve performance over time
– Example: predict rider demand to strategically dispatch drivers for Uber
• Data Mining
– Process of applying a set of analytical techniques for machine learning and AI
– Uncover hidden patterns and relationships in data
– Data segmentation, pattern recognition, classification, prediction
– Example: group customers into segments for customized promotions
• These are often grouped together or used interchangeably.

8.1: Data Mining Overview (2/8)
• Data mining is a complex process of examining large
sets of data for identifying patterns and then using them
for valuable business insights.
• Established standards with defined steps
– Cross-Industry Standard Process for Data Mining (CRISP-DM): developed by a
consortium that included SPSS, Teradata, Daimler AG, NCR, and OHRA
– Sample, Explore, Modify, Model, and Assess (SEMMA): developed by SAS
– CRISP-DM is found to be the more popular of the two
– Not every step is needed in every application
– Data preparation and modification are the most important steps
– Data-related work can take up roughly 80% of project time
• It is always important to fully understand the
socioeconomic climate, business goals, and underlying
issues.
8.1: Data Mining Overview (3/8)
CRISP-DM

8.1: Data Mining Overview (4/8)
• Business understanding: situational context, specific objectives,
project schedule, deliverables
• Data understanding: collecting raw data, preliminary results,
potential hypotheses
• Data preparation: record and variable selection, wrangling,
cleaning
• Modeling: selection and execution of data mining techniques,
convert or transform data to formats/types needed for certain
analyses, document assumptions, cross-validation
• Evaluation: evaluate performance of competing models, select
best models, review and interpret results, develop
recommendations
• Deployment: develop a set of actionable insights and a strategy
for deployment/monitoring/feedback
8.1: Data Mining Overview (5/8)
SEMMA

• Sample: identify appropriate variables, merging and/or
dividing, sample
• Explore: exploratory data analysis
• Modify: variables are selected, created, and/or
transformed
• Model: analysis techniques and models are chosen and
applied
• Assess: results from different models are presented to end
users, compare outcomes and performance, new
observations are scored

8.1: Data Mining Overview (6/8)
• Data mining algorithms are classified into two types of
techniques depending on the way they learn about data
• Supervised
– Target is known/exists
– Regression, k-Nearest Neighbors, naïve Bayes, Decision Trees
– Classification model: target is categorical, objective is to predict
class
– Prediction model: target is numeric, objective is to predict the target
for a new case
• Unsupervised
– Target is not known
– Principal components, clustering
– Dimension reduction: convert high-dimensional data to lower-dimensional data
– Pattern recognition: recognize patterns using machine learning
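As an illustration of a supervised classification model with a categorical target, here is a minimal k-Nearest Neighbors sketch. It assumes Euclidean distance and a simple majority vote; the predictors (age, monthly site visits), the data, and the function name `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, new_point, k=3):
    """Predict the class of new_point by majority vote among its k nearest training cases."""
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: (age, monthly site visits) with a known target class.
train_points = [(25, 2), (30, 3), (28, 1), (45, 12), (50, 10), (48, 15)]
train_labels = ["no purchase", "no purchase", "no purchase",
                "purchase", "purchase", "purchase"]

# Classify a new case whose target is unknown.
predicted = knn_classify(train_points, train_labels, (47, 11), k=3)
```

In practice the predictors would usually be standardized first so that no single variable dominates the distance calculation; that step is omitted here to keep the sketch short.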

8.1: Data Mining Overview (7/8)
• In supervised data mining, the target (response) is known.
• Historical values of the target variable exist, so the impact of
the predictor variables on the target can be examined.
• Common algorithms are based on statistical techniques.
• Example: linear and logistic regression
– Predict the target (response) based on several predictor variables
– Data are denoted as $y, x_1, x_2, \ldots, x_k$
– Trained or supervised because of the known target
• Classification example: classify a stock as buy, hold, or sell
• Prediction example: spending of a customer
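The prediction case can be sketched with a simple linear regression fitted by least squares. This is a minimal illustration with a single predictor; the variables (promotional messages received, customer spending) and the data are hypothetical and deliberately exact so the fitted line is easy to check by hand.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept and slope for a single predictor x and numeric target y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical historical data: the target y is known for every past case.
messages = [1, 2, 3, 4, 5]               # predictor x
spending = [12.0, 14.0, 16.0, 18.0, 20.0]  # target y

intercept, slope = fit_simple_regression(messages, spending)

# Predict the target for a new case with x = 6.
new_case_prediction = intercept + slope * 6
```

The model is "trained" on cases where the target is known, then applied to a new case where it is not, which is what makes the approach supervised.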

8.1: Data Mining Overview (8/8)
• In unsupervised data mining, the target (response) is not known.
• Allow the computer to identify complex processes without guidance
• Used in exploratory data analysis and descriptive analytics
• Used prior to supervised learning
– Understand the data
– Formulate questions
– Summarize data
• Dimension reduction
– Convert a set of high-dimensional data (large number of variables) into data
with fewer dimensions without losing much of the information
– Reduce information redundancy
– Improve model stability
• Pattern recognition
– Recognize patterns using machine learning
– Recurring sequences, frequent combinations, recognizable features, common
characteristics
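One simple form of pattern recognition, finding frequent combinations, can be sketched by counting how often pairs of items appear together. There is no target variable here. The transactions, the item names, and the function name `frequent_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_count=2):
    """Return item pairs that occur together in at least min_count transactions."""
    pair_counts = Counter()
    for items in transactions:
        # Sort so that each pair is always recorded in the same order.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return {pair: count for pair, count in pair_counts.items() if count >= min_count}

# Hypothetical purchase records.
transactions = [
    ["jeans", "belt", "shirt"],
    ["jeans", "belt"],
    ["shirt", "scarf"],
    ["jeans", "shirt", "belt"],
]

pairs = frequent_pairs(transactions, min_count=2)
```

The result keeps only the combinations that recur, such as `("belt", "jeans")`, and drops pairs seen just once.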

8.4: Principal Component Analysis (1)
• High-dimensional data
– Variables are often redundant or highly correlated, which can cause model instability
• Principal component analysis (PCA) is a dimension reduction
technique used to reduce the number of variables without losing
important information.
– Transforms a large number of possibly correlated variables into a smaller
number of uncorrelated variables (principal components).
– Summarize the data with fewer variables that collectively explain most of the
variation in the data.
– Use the principal components in other unsupervised learning methods such
as clustering.
– Use the principal components in supervised learning methods such as
regression (as predictors).
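The idea behind PCA can be sketched without any statistical package. The example below assumes the variables are standardized first (a common choice that the slide does not state) and uses power iteration as a stand-in for the full eigen-decomposition that statistical software performs, so it extracts only the first principal component. The data and all function names are hypothetical.

```python
import math

def standardize(columns):
    """Convert each variable (a list of values) to z-scores."""
    z_columns = []
    for column in columns:
        n = len(column)
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
        z_columns.append([(v - mean) / std for v in column])
    return z_columns

def correlation_matrix(z_columns):
    """Correlation matrix computed from standardized variables."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_columns]
            for ci in z_columns]

def multiply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def first_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length weight vector."""
    size = len(matrix)
    weights = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        product = multiply(matrix, weights)
        norm = math.sqrt(sum(p * p for p in product))
        weights = [p / norm for p in product]
    eigenvalue = sum(w * p for w, p in zip(weights, multiply(matrix, weights)))
    return eigenvalue, weights

# Hypothetical data: x1 and x2 are highly correlated, x3 much less so.
x1 = [2, 4, 6, 8, 10]
x2 = [1, 3, 5, 9, 12]
x3 = [5, 3, 6, 2, 4]

z = standardize([x1, x2, x3])
eigenvalue, weights = first_component(correlation_matrix(z))

# With standardized variables, total variance equals the number of variables,
# so this is the share of total variation explained by the first component.
share_explained = eigenvalue / len(z)

# Component scores: one value per observation, usable in place of x1..x3
# in a later clustering or regression step.
scores = [sum(w * col[i] for w, col in zip(weights, z)) for i in range(len(x1))]
```

Because x1 and x2 carry nearly the same information, a single component captures most of the variation that the three original variables contain.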

Use TAM
• Put all variables that you want to group
• How many components?
• Are they all on separate columns?
• Take a look at BI1. Where should it be
placed?

Use Health
• Put all variables that you want to group
• How many components?
• Are they all on separate columns?

Exploratory Factor Analysis
• Similar to PCA
• Input all variables
• How many components?
• Choose Maximum Likelihood for
Extraction
• Choose Promax for Rotation
• You can change number of factors (if you
already have an idea)
Confirmatory Factor Analysis
• You already have a set of scales given the
model.
• Create one factor at a time.
• Take a look at the final model.

Reliability Analysis
• For Likert Scale questions only
• Use TAM (compute reliability for each
major variable)
• Use Online Gaming Study (Q1 to Q8)
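The slide does not say which reliability statistic to compute. A common choice for Likert-scale items is Cronbach's alpha, so the sketch below assumes that measure. The three items and five respondents are hypothetical, and `cronbach_alpha` is an illustrative name rather than a function from any package.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores: list of items, each a list of respondents' scores
    (all items must have the same respondents in the same order).
    """
    k = len(item_scores)
    sum_item_variances = sum(variance(item) for item in item_scores)
    # Total scale score for each respondent.
    totals = [sum(responses) for responses in zip(*item_scores)]
    total_variance = variance(totals)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical 5-point Likert responses from five respondents to three items
# that are meant to measure the same underlying variable.
item1 = [4, 5, 3, 2, 4]
item2 = [4, 4, 3, 2, 5]
item3 = [5, 5, 2, 3, 4]

alpha = cronbach_alpha([item1, item2, item3])
```

For these made-up responses alpha is about 0.89. The calculation is repeated separately for each major variable, using only the items that belong to it.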

Exercises
• Please use only continuous variables.
• Baseball Players (PCA, EFA, CFA)
• Internet Addiction (PCA, EFA, CFA)
• Happiness (PCA, EFA, CFA)
• For the CFA, use the results of PCA for the
number of factors (just call them Factor 1, Factor 2,
etc.)
