Lecture9 ML Introduction.pptx
Machine Learning
🡪 Machine learning is a subset of AI that focuses on the
ability of machines to take in data and learn from it by
themselves, adjusting their models as they learn more about
the information they process.
Machine Learning definition
🡪 Arthur Samuel (1959). Machine Learning: Field
of study that gives computers the ability to
learn without being explicitly programmed.
Machine Learning definition
🡪 Tom Mitchell (1998) Well-posed Learning
Problem: A computer program is said to learn
from experience E with respect to some
task T and some performance measure P, if
its performance on T, as measured by P,
improves with experience E.
Types of Machine Learning
🡪 Supervised learning
🡪 Unsupervised learning
🡪 Reinforcement learning
Supervised learning
🡪 A supervised learning algorithm learns from labeled
training data (data that is already tagged with the
correct answer) and uses it to predict outcomes for
unseen data.
🡪 Machines are repeatedly fed data describing objects, people,
or situations (for example, their patterns, dimensions, color,
or height) until they are able to make accurate predictions
or classifications.
Supervised Learning
🡪 1- Regression
🡪 Predict a continuous-valued output.
● Examples: weather prediction (e.g., tomorrow's temperature), predicting house prices
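The regression idea can be sketched in plain Python with simple linear regression fitted by ordinary least squares, which is only one of many regression methods. The house sizes and prices are made-up illustration data, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return slope w and intercept b of the least-squares line y ≈ w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for one input feature
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

# Hypothetical labeled data: house size (m²) -> price (in $1000s)
sizes = [50, 70, 90, 110]
prices = [150, 200, 250, 300]

w, b = fit_line(sizes, prices)
print(f"price ≈ {w:.2f} * size + {b:.2f}")  # price ≈ 2.50 * size + 25.00
print(w * 100 + b)  # predicted price for a 100 m² house: 275.0
```

The output is a real number (a price), which is what makes this a regression problem rather than a classification one.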
Supervised Learning
🡪 2-Classification
🡪 Predict a discrete-valued output.
🡪 Examples of problems you would address using classification:
🡪 Given emails labeled as spam/not spam, learn a spam filter.
🡪 Given a dataset of patients diagnosed as either having diabetes or not,
learn to classify new patients as having diabetes or not.
Supervised learning: classification
[Figure: Malignant? (1 = Y, 0 = N) plotted against Tumor Size]
Supervised learning
[Figure: Malignant? (1 = Y, 0 = N) plotted against Tumor Size]
Other features:
- Clump Thickness
- Uniformity of Cell Size
- Uniformity of Cell Shape
- …
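A minimal sketch of this binary classification setup, assuming logistic regression trained by batch gradient descent. That algorithm is swapped in here as a standard choice; the slides do not name one. The tumor sizes and labels are hypothetical, and `train_logistic` and `predict` are illustrative names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.3, epochs=10_000):
    """Fit p(malignant | size) = sigmoid(w * size + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error (p - y) for every training example
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the average log-loss with respect to w and b
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, size, threshold=0.5):
    """Return 1 (malignant) if the predicted probability reaches the threshold."""
    return int(sigmoid(w * size + b) >= threshold)

# Hypothetical labeled data: tumor size (cm) -> 1 = malignant (Y), 0 = benign (N)
sizes = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(sizes, labels)
for size in (1.2, 4.8):
    print(size, "->", predict(w, b, size))
# 1.2 -> 0
# 4.8 -> 1
```

With more features (clump thickness, cell uniformity, …) the same idea applies, with one weight per feature instead of a single `w`.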
Supervised learning
🡪 Cancer Diagnosis
🡪 It is a classification problem.
🡪 Discrete-valued output (0 or 1): two classes.
🡪 The output can also have more than two options or classes:
◻ 0 🡪 benign
◻ 1 🡪 Type 1 Cancer
◻ 2 🡪 Type 2 Cancer
◻ 3 🡪 Type 3 Cancer
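Multi-class prediction can be sketched with a nearest-centroid classifier, a deliberately simple method chosen here for illustration (not one named in the slides). Each class is summarized by the mean of its training examples, and a new example receives the label of the closest mean. The two numeric features and all values are hypothetical.

```python
import math

# Class labels from the slide
CLASSES = {0: "benign", 1: "Type 1 Cancer", 2: "Type 2 Cancer", 3: "Type 3 Cancer"}

def fit_centroids(X, y):
    """Average the feature vectors of each class to get one centroid per class."""
    grouped = {}
    for features, label in zip(X, y):
        grouped.setdefault(label, []).append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()}

def nearest_centroid(centroids, features):
    """Predict the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Hypothetical training data: two made-up numeric features per patient
X = [(1.0, 1.0), (1.5, 1.5), (3.0, 4.0), (3.5, 4.5),
     (5.0, 2.0), (5.5, 2.5), (7.0, 6.0), (7.5, 6.5)]
y = [0, 0, 1, 1, 2, 2, 3, 3]

centroids = fit_centroids(X, y)
label = nearest_centroid(centroids, (5.1, 2.4))
print(label, CLASSES[label])  # 2 Type 2 Cancer
```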
Example
🡪 You’re running a company, and you want to develop learning algorithms
to address each of the following problems.
🡪 Should you treat these as classification or as regression problems?
🡪 Problem 1: You have a large inventory of identical items. You want to
predict how many of these items will sell over the next 3 months.
Market segmentation
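Market segmentation is a typical unsupervised task: customers are grouped by similarity without any labels. A minimal sketch using k-means clustering (Lloyd's algorithm), which is our choice of algorithm since the slides do not name one. The customer features and values are hypothetical, and `kmeans` is an illustrative helper, not a library API.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Group points into k clusters with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (annual spend in $1000s, store visits per month)
customers = [(2, 1), (3, 2), (2.5, 1.5), (20, 10), (22, 11), (21, 9)]
centroids, segments = kmeans(customers, k=2)
for centroid, members in zip(centroids, segments):
    print(centroid, members)
```

Unlike the supervised examples, no "correct answer" is given; the algorithm discovers the two groups (low and high spenders) on its own.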
Supervised vs. Unsupervised machine learning
Reinforcement Learning
● Reinforcement learning (RL) involves training an agent to make a sequence of
decisions by interacting with an environment.
● The agent learns to achieve a goal by maximizing
cumulative rewards through trial and error.
● Applications:
− Training a robot to navigate a maze.
− Developing a game-playing AI (e.g., AlphaGo).
− Optimizing strategies in dynamic pricing.
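Learning by trial and error from rewards can be sketched with tabular Q-learning, one classic RL algorithm chosen here for illustration (the slides do not name one). The five-cell corridor, its rewards, and all hyperparameters are made-up assumptions, a one-dimensional stand-in for the maze example.

```python
import random

# Hypothetical environment: a corridor of cells 0..4; the goal is cell 4
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)  # move left or right

def step(state, action):
    """Return (next_state, reward, done) for this toy corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, -0.01, False  # small cost per move favors short paths

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward reward + discounted best value of the next state
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}  (always move right, toward the goal)
```

The agent is never told the correct move; it discovers the policy by maximizing cumulative reward.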
● Hybrid Approaches
Feature extraction
- It is the process of reducing a large amount of
information down to a smaller set of more useful
variables (data reduction).
- It makes working with data easier and can
improve the accuracy of predictions.
- Apply techniques like Principal Component Analysis
(PCA) to reduce the number of features while retaining
most of the important information; t-SNE is also used,
mainly to visualize high-dimensional data in 2 or 3 dimensions.
Feature Creation (Feature Engineering)
● Feature creation is the process of adding extra
information to your dataset where none existed
previously.
● Feature creation is sometimes described as a type of data
augmentation, although it adds new attributes (columns) rather
than new examples (rows). It's a common technique in machine
learning: it makes use of data that would otherwise be ignored,
and it can improve the accuracy of predictions.
Feature Creation
● For instance, you might have a dataset of photos, but
there’s no information about when each photo was taken.
● Feature creation can be used to add this information to
the dataset. This might be done by looking at the EXIF
data of each photo (this is the data that’s automatically
added by the camera when a photo is taken), or by
cross-referencing with scraped web data.
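Once a capture time is available, it can be turned into several new features. A minimal sketch, assuming the timestamps have already been read and stored as text in the EXIF DateTimeOriginal format "YYYY:MM:DD HH:MM:SS". The photo records, the `add_time_features` helper, and the daytime cut-off are hypothetical; reading the tag from an image file would need an imaging library and is not shown.

```python
from datetime import datetime

# Hypothetical photo records with EXIF-style capture timestamps
photos = [
    {"file": "img_001.jpg", "taken": "2023:07:14 18:42:05"},
    {"file": "img_002.jpg", "taken": "2023:12:24 09:15:30"},
]

def add_time_features(record):
    """Return a copy of the record with new features derived from its timestamp."""
    taken = datetime.strptime(record["taken"], "%Y:%m:%d %H:%M:%S")
    return {
        **record,
        "year": taken.year,
        "month": taken.month,
        "weekday": taken.weekday(),            # 0 = Monday ... 6 = Sunday
        "is_daytime": 6 <= taken.hour < 18,    # assumed cut-off: 06:00 to 17:59
    }

enriched = [add_time_features(p) for p in photos]
for row in enriched:
    print(row)
```

One raw string becomes four columns a model can use directly, for example to learn that certain scenes tend to be photographed on weekends.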
Data normalization
● Data normalization is the process of making sure all
values in your dataset are on the same scale. It’s a
common data transformation technique, and it’s often
used when working with numerical data.
● For instance, you might have a dataset with values that are
measured in inches and values that are measured in
centimeters.
● Another column might have a metric that ranges from 0
to 100, and another column might have a metric that
ranges from 0 to 1.
Examples of Data normalization Techniques
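Two widely used normalization techniques are min-max scaling (rescale each column to the range [0, 1]) and z-score standardization (shift and scale each column to mean 0 and standard deviation 1). The sketch below implements both in plain Python; the column values are hypothetical and the function names are illustrative. Columns measured in different units (inches vs. centimeters) would normally be converted to a common unit first.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to rescale
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant column: nothing to standardize
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

# Hypothetical columns on different scales (0-100 and 0-1)
scores = [20, 50, 80, 100]
ratios = [0.1, 0.4, 0.7, 0.9]

print(min_max_scale(scores))  # [0.0, 0.375, 0.75, 1.0]
print(min_max_scale(ratios))  # both columns now span [0, 1]
print(z_score(scores))
```

Min-max scaling keeps values in a fixed range but is sensitive to outliers; z-scores are unbounded but less affected by a single extreme value.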
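Data aggregation
● Data aggregation is the process of combining multiple datasets into one. It's a common
data transformation technique, and it's often used when working with data from multiple sources.
● For instance, you might have data from two different surveys, each with its own dataset.
You can aggregate the two datasets into one. This way, you can analyze the data from both surveys
together.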
Data disaggregation
● Data disaggregation is the opposite of data aggregation. It's the process of splitting
one dataset into smaller datasets, for example one for each country. This way, you can
analyze each subset separately.
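Aggregation and disaggregation can be sketched with plain Python lists of records. The survey rows, the "source" tag, and the `aggregate`/`disaggregate` helpers are hypothetical illustrations, not a specific library's API.

```python
from collections import defaultdict

# Hypothetical responses from two separate surveys
survey_a = [{"country": "Egypt", "score": 4}, {"country": "France", "score": 3}]
survey_b = [{"country": "Egypt", "score": 5}, {"country": "Japan", "score": 2}]

def aggregate(named_datasets):
    """Combine several datasets into one, tagging each row with its source."""
    combined = []
    for source, rows in named_datasets.items():
        combined.extend({**row, "source": source} for row in rows)
    return combined

def disaggregate(rows, key):
    """Split one dataset into smaller datasets, one per distinct value of `key`."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)

combined = aggregate({"survey_a": survey_a, "survey_b": survey_b})
by_country = disaggregate(combined, "country")
print(len(combined))       # 4
print(sorted(by_country))  # ['Egypt', 'France', 'Japan']
print(by_country["Egypt"])
```

Tagging each row with its source during aggregation keeps it possible to tell the surveys apart later.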
Data splitting
● Split data into training, validation, and test sets to evaluate model
performance properly.
● Common splits are 70-20-10 or 80-10-10 for training, validation,
and testing, respectively.
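● Why split?
- Prevent overfitting: the validation set, which the model never trains on, shows
when the model fits the training data well but generalizes poorly, and guides tuning.
- Model evaluation: a separate test set provides an unbiased estimate of how well
the final model performs on unseen data.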
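An 80-10-10 split can be sketched in plain Python by shuffling once and slicing. The helper name and the fixed seed are illustrative assumptions; the sketch does a plain random split and does not stratify by class.

```python
import random

def train_val_test_split(rows, train=0.8, val=0.1, seed=42):
    """Shuffle a copy of rows and cut it into train/validation/test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train_set, val_set, test_set

tr, va, te = train_val_test_split(range(100))
print(len(tr), len(va), len(te))  # 80 10 10
```

Shuffling first matters when the data is ordered (for example, by date or by class); otherwise the test set may not resemble the training data.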
Online Datasets
● Kaggle website
● UCI Machine Learning Repository
● Google Dataset Search
● Data.gov
● AWS Public Datasets
Education Data