Week 6 Practical

The document outlines a Week 6 in-class practical focused on k-Means clustering techniques using the Mall_Customers.csv dataset. Students will preprocess data, determine the optimal number of clusters through the Elbow Method and Davies-Bouldin Index, and interpret clustering results. The practical includes tasks such as data exploration, training the k-Means algorithm, visualizing clusters, and deriving business insights from the clustering analysis.

Uploaded by

tpnvi95

Week 6 In-Class Practical

Objective:

This practical aims to introduce students to clustering techniques, particularly k-Means
clustering. Students will learn to preprocess data, determine the optimal number of clusters using
the Elbow Method and the Davies-Bouldin Index (DBI), and interpret clustering results.

Dataset:

The dataset Mall_Customers.csv contains customer information, including:

• CustomerID
• Gender
• Age
• Annual Income (in $1000s)
• Spending Score (1-100)

Tasks:

Part 1: K-Means Clustering using the Elbow Method

1. Data Exploration and Preprocessing:
o Load the dataset and inspect its structure.
o Convert categorical columns (e.g., Gender) to factors.
o Rename columns for consistency.
o Select relevant features (Annual Income and Spending Score) and normalize
them.
2. Determine Optimal Clusters:
o Implement the Elbow Method by computing the Within-Cluster Sum of Squares
(WSS) for k = 1 to 10.
o Visualize the WSS values and identify the optimal number of clusters.
o Discussion Question: Based on the Elbow Method plot, what is the optimal k?
Justify your choice.
3. Train the K-Means Algorithm:
o Apply k-Means clustering using the chosen number of clusters.
o Extract cluster centroids and cluster assignments.
4. Visualize the Clusters:
o Use ggplot2 to create a scatter plot with clusters colored differently.
o Discussion Question: How do you interpret the centroids given that the data was
normalized?
5. Interpret the Clusters:
o Compute summary statistics for each cluster (mean Annual Income, Spending
Score, and gender distribution).
o Assign meaningful names to clusters (e.g., "Low Spenders - Low Income",
"Impulsive Buyers").
o Discussion Question: Based on your results, how would you use this information
for business decisions?
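
As a rough illustration of steps 1 to 3 above, the sketch below standardizes two features, runs a basic k-Means (Lloyd's algorithm with random restarts), and records the WSS for k = 1 to 10. It is written in plain Python only to make each computation explicit; the practical itself expects R (factors, ggplot2). The twelve income/score pairs are made-up toy values, not rows from Mall_Customers.csv, and the helper names (standardize, kmeans, best_run, to_original) are hypothetical.

```python
import math
import random

# Toy values invented for illustration; NOT taken from Mall_Customers.csv.
income = [15, 17, 19, 16, 18, 20, 85, 88, 90, 86, 89, 92]
score = [10, 14, 8, 80, 75, 85, 12, 15, 9, 82, 78, 88]

def standardize(column):
    """Return z-scores plus the mean and (population) standard deviation used."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column], mean, sd

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm; returns (centroids, labels, wss)."""
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wss

def best_run(points, k, restarts=25, seed=0):
    """Repeat k-Means from several random starts and keep the lowest-WSS run."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(restarts)),
               key=lambda run: run[2])

income_z, income_mean, income_sd = standardize(income)
score_z, score_mean, score_sd = standardize(score)
points = list(zip(income_z, score_z))

def to_original(centroid):
    """Map a centroid from standardized units back to income/score units."""
    return (centroid[0] * income_sd + income_mean,
            centroid[1] * score_sd + score_mean)

# Elbow Method input: WSS for k = 1 to 10.
wss_by_k = {k: best_run(points, k)[2] for k in range(1, 11)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 3))

# Centroids for one candidate k, reported in the original units.
centroids, labels, _ = best_run(points, 4)
print([tuple(round(v, 1) for v in to_original(c)) for c in centroids])
```

Plotting wss_by_k against k gives the elbow curve. Because the centroids are computed on standardized features, to_original is one way to express them in income and spending-score units before interpreting them.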

Part 2: K-Means Clustering using the Davies-Bouldin Index (DBI)

1. Compute DBI for Different k values:
o Implement k-Means clustering for k = 2 to 10.
o Compute the DBI for each k and identify the optimal number of clusters (lower
DBI is better).
o Discussion Question: How does DBI compare to the Elbow Method for
determining k?
2. Train K-Means with Optimal k (based on DBI):
o Apply k-Means clustering using the best k obtained from DBI.
o Visualize the clusters using Principal Component Analysis (PCA) for
dimensionality reduction.
3. Interpretation and Business Insights:
o Compute and analyze cluster summary statistics.
o Assign meaningful cluster names.
o Discussion Question: How does the clustering result using DBI compare to that
of the Elbow Method?
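
The DBI in Part 2 can be computed directly from a set of points and their cluster labels. The sketch below assumes the standard definition: each cluster's scatter S_i is the mean Euclidean distance of its members to the centroid, M_ij is the distance between centroids i and j, and the index averages each cluster's worst ratio (S_i + S_j) / M_ij. It is plain Python rather than R, the function name davies_bouldin is hypothetical, and the four points are invented so the result can be checked by hand. The index needs at least two clusters, which is why k starts at 2.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin Index for a labelled set of points (lower is better)."""
    clusters = sorted(set(labels))
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        # S_c: mean distance of the cluster's members to its centroid.
        scatter[c] = sum(math.dist(p, centroids[c]) for p in members) / len(members)
    total = 0.0
    for i in clusters:
        # Worst (largest) similarity ratio between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)

# Toy points invented for illustration: two tight pairs far apart on a line.
toy_points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]

good = davies_bouldin(toy_points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = davies_bouldin(toy_points, [0, 1, 0, 1])   # each cluster straddles both pairs

print(good)  # 0.2
print(bad)   # 5.0
```

In the practical, the labels would come from k-Means runs for k = 2 to 10, and the k with the lowest index would be the candidate to compare against the elbow choice.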
