Lecture9 ML Introduction.pptx

The document provides an overview of Machine Learning (ML), a subset of Artificial Intelligence (AI), detailing its definitions, types (supervised, unsupervised, reinforcement), and applications. It emphasizes the importance of data transformation and preparation in ML projects, outlining techniques such as data cleaning, normalization, and feature extraction. Additionally, it discusses the lifecycle of ML, including data splitting and dataset selection criteria.

Uploaded by

Alaaeee
Copyright
© © All Rights Reserved
Artificial Intelligence (AI):

Machine Learning
Machine learning
🡪 Machine learning is a subset of AI and focuses on the
ability of machines to receive a set of data and learn for
themselves, changing algorithms as they learn more about
the information they are processing.
Machine Learning definition
🡪 Arthur Samuel (1959). Machine Learning: Field
of study that gives computers the ability to
learn without being explicitly programmed.
Machine Learning definition
🡪 Tom Mitchell (1998) Well-posed Learning
Problem: A computer program is said to learn
from experience E with respect to some
task T and some performance measure P, if
its performance on T, as measured by P,
improves with experience E.
Suppose your email program watches which emails you do or do
not mark as spam, and based on that learns how to better filter
spam. What are the task T, the experience E, and the performance
measure P in this setting?
🡪 Classifying emails as spam or not spam. 🡪 (T)
🡪 Watching you label emails as spam or not spam. 🡪 (E)
🡪 The number (or fraction) of emails correctly classified as
spam/not spam. 🡪 (P)
Machine Learning Applications
🡪 Examples:
🡪 Database mining, email filtering, web search engines
🡪 Handwriting recognition, most of Natural Language
Processing (NLP), computer vision
🡪 Self-customizing programs
🡪 E.g., Amazon, Netflix product recommendations
Machine learning algorithms:

🡪 Supervised learning
🡪 Unsupervised learning
🡪 Reinforcement learning

Supervised learning
🡪 A supervised learning algorithm learns from labeled
training data (data already tagged with the correct
answer) and uses it to predict outcomes for unseen data.
🡪 Machines are fed data such as the characteristics,
patterns, dimensions, color and height of objects, people
or situations repeatedly until they can make accurate
predictions or classifications.
Supervised Learning
🡪 1- Regression
🡪 Predict a continuous-valued output.
● Examples: weather prediction, predicting house prices
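To make the regression idea concrete, here is a minimal sketch that fits a straight line to a handful of house sizes and prices and then predicts a price for an unseen size. The data, the function name `fit_line`, and the choice of ordinary least squares are assumptions for illustration; the slides do not prescribe a particular method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: house size (square meters) and price (thousands).
# The prices are exactly 3 * size so the result is easy to check by hand.
sizes = [50, 80, 100, 120, 150]
prices = [150, 240, 300, 360, 450]

slope, intercept = fit_line(sizes, prices)

# Continuous-valued prediction for a house size that was not in the training data.
predicted_price = slope * 110 + intercept
print(slope, intercept, predicted_price)
```

The output is a real number rather than a category, which is what separates regression from classification.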
Supervised Learning
🡪 2- Classification
🡪 Estimate a discrete-valued output.
🡪 Examples of problems you would address using classification:
🡪 Given email labeled as spam/not spam, learn a spam filter.
🡪 Given a dataset of patients diagnosed as either having diabetes or not,
learn to classify new patients as having diabetes or not.
Supervised learning: classification
Example 2: Cancer Diagnosis (malignant, benign)
🡪 One feature or variable (tumor size)
[Figure: Malignant? (1 = yes, 0 = no) plotted against tumor size]
🡪 Other features:
🡪 Clump thickness
🡪 Uniformity of cell size
🡪 Uniformity of cell shape
🡪 …
Supervised learning
🡪 Cancer Diagnosis
🡪 It is a classification problem
🡪 Discrete valued output (0 or 1) two classes
🡪 The output can have more than two options or classes
◻ 0 🡪 benign
◻ 1 🡪 Type 1 Cancer
◻ 2 🡪 Type 2 Cancer
◻ 3 🡪 Type 3 Cancer
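As a rough illustration of classification with a single feature, the sketch below learns a tumor-size cutoff from labeled examples and uses it to label new cases as 0 (benign) or 1 (malignant). The data values and the function names are hypothetical, and a one-feature threshold rule is only one simple way to build a classifier, not the method of the lecture.

```python
def best_threshold(sizes, labels):
    """Pick the size cutoff that gives the highest accuracy on the training data."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(sizes)):
        predictions = [1 if s >= cut else 0 for s in sizes]
        acc = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc


# Hypothetical labeled training data: tumor size and whether it was malignant.
tumor_sizes = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
malignant = [0, 0, 0, 0, 1, 1, 1, 1]

cut, accuracy = best_threshold(tumor_sizes, malignant)


def classify(size):
    """Discrete-valued output: 1 (malignant) or 0 (benign)."""
    return 1 if size >= cut else 0


print(cut, accuracy, classify(4.2), classify(2.0))
```

Real diagnostic data rarely separates this cleanly, which is why additional features such as clump thickness are usually added.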
Example
🡪 You’re running a company, and you want to develop learning algorithms
to address each of the following problems.
🡪 Should you treat these as classification or as regression problems?
🡪 Problem 1: You have a large inventory of identical items. You want to
predict how many of these items will sell over the next 3 months.
🡪 (Regression)
🡪 Problem 2: You’d like software to examine individual customer
accounts, and for each account decide if it has been
hacked/compromised.
🡪 (Classification)
Unsupervised learning
🡪 Unsupervised learning is modeling the underlying or
hidden structure or distribution in the data in order to
learn more about the data.
🡪 Unsupervised learning is where you only have input data
and no corresponding output variables (unlabelled
data).
🡪 You need to allow the model to work on its own to
discover information.
Unsupervised learning
🡪 Examples:
🡪 Given a database of customer data, automatically discover
market segments and group customers into different market
segments.
🡪 Given a set of news articles found on the web, group them
into set of articles about the same story.
🡪 Other applications: organizing computing clusters, social
network analysis, market segmentation.
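Grouping customers into segments without any labels can be sketched with k-means clustering. The customer points, the starting centroids, and the helper names below are assumptions for illustration; k-means is one common clustering method, chosen here only because it is short to write.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iters=100):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    for _ in range(max_iters):
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep a centroid that lost all points
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (purchases per month, average basket size). No labels given.
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

# Assumed starting centroids: the first and last customer.
segments, centers = kmeans(customers, [customers[0], customers[-1]])
print(segments)
```

The algorithm is never told which customers belong together; the two segments come only from the structure of the data.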
Supervised vs. Unsupervised machine learning
Reinforcement Learning
● Reinforcement learning (RL) involves training an agent to make a sequence of
decisions by interacting with an environment.
● The agent learns to achieve a goal by maximizing
cumulative rewards through trial and error.
● Applications:
− Training a robot to navigate a maze.
− Developing a game-playing AI (e.g., AlphaGo).
− Optimizing strategies in dynamic pricing.
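The trial-and-error loop described above can be sketched with tabular Q-learning on a tiny corridor, a stand-in for the maze example. The corridor layout, the reward of 1 at the goal, the hyperparameters, and the purely random exploration are all assumptions made for this illustration.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # action index 0 = step left, index 1 = step right


def step(state, action_index):
    """Environment: move one cell, stay inside the corridor, reward 1 at the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Q-learning: learn action values from interaction with the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            action = rng.randrange(2)  # explore by trying random actions
            next_state, reward, done = step(state, action)
            # Target is the reward plus the discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy: in each non-goal cell, pick the action with the highest value.
policy = [row.index(max(row)) for row in q_table[:-1]]
print(policy)
```

Nobody tells the agent that moving right is correct; it infers this from the reward it eventually collects at the goal.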
Hybrid Approaches
● Many real-world applications combine supervised and
unsupervised learning:
● Semi-supervised learning: uses a small amount of
labeled data with a large amount of unlabeled data to
improve learning accuracy.
● Reinforcement learning: may use supervised learning
for policy learning, while the agent explores the
environment without labeled data.
ML Lifecycle
Data
🡪 Data is the heart of every machine learning algorithm.
🡪 Data comes in all shapes and sizes: from images to text
to time series data.
🡪 A simple Excel spreadsheet might have data in a few
columns, while a more complex BigQuery dataset could
have millions of rows and thousands of columns.
🡪 No matter the format, though, data has to be
transformed before it can be used in a machine learning
(ML) project.
Data transformation
🡪 Data transformation is also known as data preparation
or data preprocessing.
🡪 It makes sure that your data is clean and ready to be
used by your machine learning algorithm. Without data
transformation, your AI won’t be able to make accurate
predictions.
Data transformation
🡪 There are many different types of data transformation,
depending on what kind of data you have and what you
want to do with it. Some common types include:
🡪 Data cleaning (remove irrelevant data)
🡪 Feature extraction
🡪 Feature creation
🡪 Data normalization
🡪 Data aggregation/disaggregation
1- Data cleaning
● Removing incorrect or incomplete information from your dataset.
● Adding or fixing missing values.
● Dealing with outliers or extreme values.
● Putting each data column in a proper format (e.g., dates).
● It’s often the most time-consuming step.
● Errors in the data can happen for a number of reasons, including
human error, software bugs, or simply because data is missing in
the original source.
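A minimal sketch of these cleaning steps on a few records might look like the following. The field names, the plausible age range of 0 to 120, the choice to drop rows with unusable dates, and the median fill are all assumptions; a real project would pick rules that suit its own data.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records, with typical problems marked.
raw_rows = [
    {"age": "34", "joined": "2023-01-15"},
    {"age": "", "joined": "2023-02-01"},     # missing value
    {"age": "29", "joined": "not a date"},   # column not in the proper format
    {"age": "410", "joined": "2023-03-09"},  # implausible outlier
    {"age": "41", "joined": "2023-03-20"},
]


def parse_age(text, low=0, high=120):
    """Return the age as an int, or None if it is missing or outside [low, high]."""
    try:
        age = int(text)
    except ValueError:
        return None
    return age if low <= age <= high else None


def parse_date(text):
    """Return a date object, or None if the text is not in YYYY-MM-DD form."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


def clean(rows):
    parsed = [
        {"age": parse_age(r["age"]), "joined": parse_date(r["joined"])} for r in rows
    ]
    # Remove rows whose date could not be parsed.
    parsed = [r for r in parsed if r["joined"] is not None]
    # Fill missing or rejected ages with the median of the remaining valid ages.
    fill = median(r["age"] for r in parsed if r["age"] is not None)
    for r in parsed:
        if r["age"] is None:
            r["age"] = fill
    return parsed


cleaned = clean(raw_rows)
print(cleaned)
```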


Feature extraction
● It is the process of reducing a large amount of
information down to a smaller set of more useful
variables (data reduction).
● It’s used to make working with data easier and to
improve the accuracy of predictions.
● Techniques like Principal Component Analysis
(PCA) or t-SNE reduce the number of features while
retaining important information.
Feature Creation (Feature Engineering)
● Feature creation is the process of adding extra
information to your dataset where none existed
previously.
● Feature creation is a type of data augmentation, and
it’s a common technique in machine learning. It’s used to
make use of data that would otherwise be ignored, and it
can improve the accuracy of predictions.
Feature Creation
● For instance, you might have a dataset of photos, but
there’s no information about when each photo was taken.
● Feature creation can be used to add this information to
the dataset. This might be done by looking at the EXIF
data of each photo (this is the data that’s automatically
added by the camera when a photo is taken), or by
cross-referencing with scraped web data.
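As a small sketch of feature creation, suppose the capture time has already been read from each photo's metadata as a string in the usual EXIF layout (`YYYY:MM:DD HH:MM:SS`). The records, the function name, and the two derived features below are hypothetical; reading the EXIF data itself is outside this example.

```python
from datetime import datetime

# Hypothetical photo records with a capture-time string taken from the metadata.
photos = [
    {"file": "a.jpg", "taken": "2024:06:01 09:30:00"},
    {"file": "b.jpg", "taken": "2024:06:03 22:15:00"},
]


def add_time_features(photo):
    """Return a copy of the record with two new columns derived from the timestamp."""
    taken = datetime.strptime(photo["taken"], "%Y:%m:%d %H:%M:%S")
    return {
        **photo,
        "hour": taken.hour,                  # hour of day the photo was taken
        "is_weekend": taken.weekday() >= 5,  # Saturday is 5, Sunday is 6
    }


enriched = [add_time_features(p) for p in photos]
print(enriched)
```

The new columns were not in the dataset before, yet a model can now use them directly.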
Data normalization
● Data normalization is the process of making sure all
values in your dataset are on the same scale. It’s a
common data transformation technique, and it’s often
used when working with numerical data.
● For instance, you might have a dataset with values that are
measured in inches and values that are measured in
centimeters.
● Another column might have a metric that ranges from 0
to 100, and another column might have a metric that
ranges from 0 to 1.
Examples of Data normalization Techniques

1. Min-Max normalization: This technique scales the values of a feature to a
range between 0 and 1. This is done by subtracting the minimum value of
the feature from each value, and then dividing by the range of the feature.
2. Z-score normalization: This technique scales the values of a feature to
have a mean of 0 and a standard deviation of 1. This is done by subtracting
the mean of the feature from each value, and then dividing by the standard
deviation.
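Both techniques can be written in a few lines. The sketch below is a direct reading of the two descriptions above; the sample column and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def min_max(values):
    """Scale values to [0, 1]: subtract the minimum, divide by the range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # a constant column would divide by zero


def z_score(values):
    """Scale values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)  # population standard deviation (assumed)
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column: heights in centimeters.
heights_cm = [150, 160, 170, 180, 190]

scaled = min_max(heights_cm)
standardized = z_score(heights_cm)
print(scaled)
print(standardized)
```

After either transformation, a column measured from 0 to 100 and one measured from 0 to 1 end up on comparable scales.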
Data aggregation
● It is the process of combining multiple datasets into one. It’s a common
data transformation technique, and it’s often used when working with data
from different sources.
● For instance, you might have data from two different surveys, each with
different questions. Data aggregation can be used to combine the two
datasets into one. This way, you can analyze the data from both surveys
together.
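Combining two surveys can be sketched as a merge on a shared respondent identifier. The `id` key, the field names, and the sample answers are hypothetical, and the sketch assumes both surveys identify respondents the same way.

```python
# Hypothetical surveys that asked different questions of overlapping respondents.
survey_a = [{"id": 1, "age": 30}, {"id": 2, "age": 45}]
survey_b = [{"id": 2, "city": "Cairo"}, {"id": 3, "city": "Giza"}]


def combine(*surveys):
    """Merge rows that share an id into one record per respondent."""
    merged = {}
    for survey in surveys:
        for row in survey:
            merged.setdefault(row["id"], {}).update(row)
    return [merged[key] for key in sorted(merged)]


combined = combine(survey_a, survey_b)
print(combined)
```

Respondents who answered only one survey keep just the fields they have, so the combined data still has missing values to handle.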
Data disaggregation
● Data disaggregation is the opposite of data aggregation. It’s the
process of splitting one large dataset into several smaller ones.
● For instance, you might have data that’s been aggregated by
country. Data disaggregation can be used to split this dataset
into smaller datasets, one for each country. This way, you can
analyze the data for each country separately.
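Splitting one dataset into a smaller dataset per country can be sketched as a group-by on the country column. The rows and the function name `split_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical combined dataset with a country column.
rows = [
    {"country": "EG", "sales": 10},
    {"country": "US", "sales": 7},
    {"country": "EG", "sales": 4},
]


def split_by(records, key):
    """Return one list of records per distinct value of the given column."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


per_country = split_by(rows, "country")
print(per_country)
```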


Data Augmentation
● Increase the diversity of the training data without collecting
new data.
● Common in image processing (e.g., rotating, flipping,
cropping images).
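The image operations mentioned above can be sketched on a tiny image stored as a list of pixel rows. The image values and function names are hypothetical; real pipelines normally use an image library rather than nested lists.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Rotate the image a quarter turn clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def crop(img, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


# Hypothetical 2 x 3 grayscale image.
image = [
    [1, 2, 3],
    [4, 5, 6],
]

# Each transformed copy is an extra training example derived from the same photo.
augmented = [flip_horizontal(image), rotate_90_clockwise(image), crop(image, 0, 1, 2, 2)]
print(augmented)
```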
Data Splitting

● Split data into training, validation, and test sets to evaluate model
performance properly.
● Common splits are 70-20-10 or 80-10-10 for training, validation,
and testing, respectively.
● Why split the data:
- Prevent overfitting
- Model evaluation:
● A separate test set provides an unbiased evaluation
metric to understand
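A 70-20-10 split can be sketched as a shuffle followed by slicing. The function name, the fixed seed, and the stand-in dataset are assumptions; the proportions come from the common splits listed above.

```python
import random


def split_dataset(data, train=0.7, val=0.2, seed=42):
    """Shuffle the data, then slice it into training, validation, and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    train_set = items[:n_train]
    val_set = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]  # whatever remains (10% here)
    return train_set, val_set, test_set


# Hypothetical dataset of 100 examples, represented by their indices.
examples = range(100)
train_set, val_set, test_set = split_dataset(examples)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling first matters when the data is stored in some order, such as by date or by class, because an unshuffled slice would give the three sets different distributions.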
Online Datasets

● Kaggle website
● UCI Machine Learning Repository
● Google Dataset Search
● Data.gov
● AWS Public Datasets
Education Data

● National Center for Education Statistics (NCES): The primary federal
entity for collecting and analyzing education-related data in the U.S.
● edX and Coursera Datasets: Both platforms provide datasets for
educational purposes and research.
Choosing Dataset
When selecting a dataset, consider the following factors:

● Relevance: Ensure the dataset aligns with your research or project
goals.
● Quality: Check for completeness, accuracy, and the presence of
necessary metadata.
● Size: Consider whether the dataset size is manageable with your
available computational resources.
● Accessibility: Verify that the dataset is accessible and that you have
the necessary permissions to use it.
Python for Machine Learning (Key Libraries and Frameworks)
Deep Learning
Machine Learning
Based on the Machine Learning course on Coursera by Andrew Ng
On YouTube:
https://youtu.be/gb262LDH1So?si=oRXw28ir6vMGQMFX (original course)
https://www.youtube.com/watch?v=vStJoetOxJg&list=PLkDaE6sCZn6FNC6YRfRQc_FbeQrF8BwGI (current course)
