Week 12 Intro to DS and ML

Uploaded by laplluve

INTRODUCTION TO DATA SCIENCE AND MACHINE LEARNING

OUTLINES
➢Introduction
➢What can data do for us?
➢What is data science?
➢Main data science stages
➢What is Machine Learning (ML)?
➢Types of ML algorithms
➢Measuring the performance of ML
➢Data science is not machine learning!
➢Job descriptions related to DS and ML
INTRODUCTION
Data is being collected all around us. Every like, click, email,
credit card swipe, or tweet is a new piece of data that can be
used to better describe the present or predict the future.
Types of Data:
❖Structured: Organized into rows and columns (e.g., databases,
spreadsheets).
❖Unstructured: Unorganized, often textual or multimedia (e.g.,
emails, videos).
❖Semi-structured: Falls between structured and unstructured
(e.g., JSON, XML).
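To make the difference between semi-structured and structured data concrete, here is a minimal Python sketch that flattens a semi-structured JSON record into a single structured row (column name → value). The customer record and the helper name `flatten` are hypothetical, chosen only for illustration. Nested objects become dotted column names; lists are left as they are, which is one reason semi-structured data does not always fit neatly into rows and columns.

```python
import json

# Hypothetical semi-structured record (JSON), made up for illustration.
semi_structured = '{"id": 7, "name": "Lina", "contact": {"email": "lina@example.com"}, "tags": ["vip"]}'

def flatten(record, prefix=""):
    """Turn nested JSON objects into one flat row with dotted column names."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and prefix its keys, e.g. "contact.email".
            row.update(flatten(value, prefix=f"{name}."))
        else:
            # Scalars (and lists, left untouched) become a single column.
            row[name] = value
    return row

row = flatten(json.loads(semi_structured))
print(row)
```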
INTRODUCTION
Big Data refers to extremely large datasets that are too complex or
voluminous for traditional data processing tools to handle efficiently. These
datasets can come from a variety of sources such as social media, sensors,
transactions, and more. Big Data typically involves three main characteristics,
often referred to as the "Three Vs":
1. Volume: The sheer amount of data being generated and stored.
2. Velocity: The speed at which data is generated, processed, and analyzed.
3. Variety: The different types and formats of data, including structured
(databases), semi-structured (XML, JSON), and unstructured (text, images,
videos).
DATA EXPLOSION
The world generates 2.5 quintillion bytes of data daily, enough to fill millions of DVDs!
WHAT CAN DATA DO FOR US?
1. Describe our current state, such as our energy consumption.
2. Diagnose the causes of observed events and behaviors.
3. Detect anomalous events, such as fraudulent purchases.
4. Predict future events.
WHAT IS DATA SCIENCE?
“At a high level, data science is a set of fundamental principles that guide the
extraction of knowledge from data.”
•Principles can be statistical, computational, algorithmic, visual, etc.
•DS is a set of methodologies for taking in data in thousands of forms and then using it to
draw meaningful conclusions.
•Data science is interdisciplinary because its goal is to aid discovery and decision making; it
draws on fields such as statistics and mathematics, computer science, and domain expertise.
•It is applicable to many domains (e.g., sciences, finance, healthcare).
WHY IS DATA SCIENCE IMPORTANT?
• Data science unlocks the potential of data to solve societal
challenges and large-scale, complex problems across
domains, from business, technology, science, and engineering
to healthcare, government, and many more.
• As data continues to grow in volume, velocity, and
complexity, there is strong demand for data science
talent to help design the best solutions.
MODELLING PROCESS IN DATA SCIENCE

12/3/2024
MAIN DATA SCIENCE STAGES
Problem definition
1. Data collection
2. Data preprocessing
3. Exploration / visualization
4. Experimentation, prediction and evaluation
Insight / Policy Decisions

1. DATA COLLECTION
➢Gathering data from various sources, such as sensors, databases, APIs, or surveys. This is the first
step to begin the data science process.
➢There are two general types of data:
1. Quantitative data can be expressed in numbers. For example, the fridge is 60 inches tall, has two
apples in the basket, and costs 1000 dollars.
2. Qualitative data are things that can be observed but not measured. For example, the fridge is
red, was built in Italy, and might need to be cleaned out because it smells like fish.
➢Beyond traditional quantitative and qualitative data, there are other types, such as image data,
text data, geospatial data, network data, and many more.
➢To select the storage type, we need to determine: where we want to store the data, what kind of
data we are storing, and how we can retrieve our data from storage.
➢The storage could be a single computer, parallel storage, cloud-based, etc.
2. DATA PREPROCESSING
A. Data Cleaning
oHandle missing values (e.g., imputation, deletion).
oCorrect errors (e.g., typos, outliers).
oStandardize formats (e.g., dates, currencies).
oHandle outliers: Remove or adjust anomalous data points.
oExample: Filling missing "age" with the median or mean.

B. Data Integration
oCombine data from multiple sources into a single dataset.
oResolve inconsistencies between datasets.
oExample: Joining tables using a common key.

C. Data Transformation
oNormalize or scale features to bring them into a uniform range (e.g., Min-Max Scaling, Z-score).
oEncode categorical data (e.g., one-hot encoding, label encoding).
oCreate new features (feature engineering) to enhance predictive power.
oExample: Converting "Date of Birth" to "Age."

D. Data Reduction
oRemove irrelevant or redundant features.
oReduce dimensionality using techniques like PCA.
oExample: Dropping columns with low variance.
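The four preprocessing steps can be sketched in plain Python, one small function per example named above. This is a minimal, standard-library-only sketch under stated assumptions: tables are lists of dicts, columns are lists of numbers, and all function names are hypothetical. In practice a library such as pandas or scikit-learn would usually do this work.

```python
from datetime import date
from statistics import mean, median, pstdev, pvariance

# A. Data cleaning: fill missing values (None) with the median of the observed values.
def impute_median(values):
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

# B. Data integration: inner-join two tables (lists of dicts) on a common key.
def inner_join(left, right, key):
    lookup = {row[key]: row for row in right}  # assumes keys in `right` are unique
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]

# C. Data transformation: Min-Max scaling into the range [0, 1].
def min_max_scale(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

# C. Data transformation: Z-score standardization (mean 0, standard deviation 1).
def z_score(values):
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# C. Data transformation: one-hot encoding of a categorical column.
def one_hot(labels):
    categories = sorted(set(labels))
    return categories, [[1 if label == c else 0 for c in categories] for label in labels]

# C. Feature engineering: derive "Age" from "Date of Birth".
def age_from_dob(dob, today):
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

# D. Data reduction: keep only columns whose variance is above a threshold.
def drop_low_variance(columns, threshold=0.0):
    return {name: col for name, col in columns.items() if pvariance(col) > threshold}
```

For example, `impute_median([25, None, 35, 30])` fills the gap with 30, the median of the three observed ages.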
3. EXPLORATION / VISUALIZATION
Analyzing the data through visualizations, like graphs
and charts, to understand patterns and relationships
within the data. This step helps to reveal trends or
anomalies.
Exploratory Data Analysis (EDA) consists of exploring
the data, formulating hypotheses about it, and
assessing its main characteristics, with a strong emphasis
on visualization.
EDA happens after data preparation, but EDA can
reveal new things that need cleaning.
Histograms: Show the distribution of a single variable,
useful for understanding the frequency of values in
numerical data.
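What a histogram actually counts can be shown with a small standard-library sketch: it splits the range of the values into equal-width bins and counts how many values fall into each one. The function names and sample values are hypothetical; a plotting library such as matplotlib would normally draw the bars, and this text version only prints them.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

def text_histogram(values, bins=4):
    """Print one row of '#' characters per bin."""
    edges, counts = histogram_counts(values, bins)
    for left, right, n in zip(edges, edges[1:], counts):
        print(f"{left:6.2f} - {right:6.2f} | {'#' * n}")

text_histogram([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])
```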
Box Plots: Help identify the spread and outliers in a dataset.
Bar Charts: Useful for comparing quantities across different categories in categorical data.
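A box plot is drawn from a handful of summary numbers. The sketch below computes them with `statistics.quantiles` and flags outliers with the 1.5 × IQR rule, which is the common convention for box-plot whiskers (an assumption here; the slides do not name a rule). The function name and sample data are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Quartiles, whisker ends, and outliers under the common 1.5 x IQR rule."""
    q1, q2, q3 = quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1  # interquartile range: the height of the box
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),   # whiskers stop at the most extreme non-outliers
        "whisker_high": max(inside),
        "outliers": outliers,
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```

Here the value 100 lies far above the upper fence, so it would be drawn as a separate point rather than stretching the whisker.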
Scatter Plots: Show relationships between two continuous variables, helping to identify trends,
correlations, or clusters.
Heatmaps: Represent data values using colors on a map. A heatmap could be used to show the
correlation matrix between different numerical variables to identify which variables are correlated.
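The numbers behind a correlation heatmap are pairwise correlation coefficients. This sketch assumes Pearson correlation (the usual default) and builds the full matrix that a heatmap would color; the column names and values are hypothetical. Python 3.10+ also ships `statistics.correlation`, but the formula is written out here so the computation is visible.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Return column names and the matrix of pairwise correlations (values in [-1, 1])."""
    names = list(columns)
    return names, [[pearson(columns[a], columns[b]) for b in names] for a in names]

# Hypothetical numerical columns.
columns = {"size": [1, 2, 3, 4], "price": [2, 4, 6, 8], "age": [4, 3, 2, 1]}
names, matrix = correlation_matrix(columns)
for name, row in zip(names, matrix):
    print(f"{name:>6}", " ".join(f"{r:+.2f}" for r in row))
```

In a heatmap, each cell of this matrix becomes a colored square, so strongly positive and strongly negative pairs stand out at a glance.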
4. EXPERIMENTATION, PREDICTION AND EVALUATION

WHAT IS MACHINE LEARNING (ML)?
“Field of study that gives computers the ability to learn without being explicitly programmed.”
Arthur Samuel (1959)
(Figure: a checkers game between a human player and an electronic player)
WHAT IS MACHINE LEARNING (ML)?
Definition by Tom Mitchell (1998)
Machine Learning is the study of algorithms that:
• improve their performance P
• at some task T
• with experience E
A well-defined learning task is given by <P, T, E>.
WHAT IS MACHINE LEARNING (ML)?

❑ Task T: Playing checkers
❑ Performance P: Percentage of games won against an arbitrary opponent
❑ Training Experience E: Playing practice games against itself
WHAT IS MACHINE LEARNING (ML)?
Handwriting recognition learning problem

❑ Task T: Recognizing and classifying handwritten words within images
❑ Performance P: Percent of words correctly classified
❑ Training experience E: A dataset of handwritten words with given classifications
WHAT IS MACHINE LEARNING (ML)?
A robot driving learning problem

❑ Task T: Driving on highways using vision sensors
❑ Performance P: Average distance travelled before an error
❑ Training experience E: A sequence of images and steering commands recorded while observing a human driver
TRADITIONAL PROGRAMMING VS ML
Traditional Programming: Data + Rules → Computer → Output
Machine Learning: Data + Output → Computer → Rules
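The contrast can be shown in a few lines. In the first function a person writes the rule; in the second, the rule (here just a size threshold) is produced from data plus known outputs. This is a minimal sketch assuming a one-feature "decision stump" as the learner; the tumor sizes, labels, and the hand-picked 5.0 cm threshold are hypothetical.

```python
# Traditional programming: a human writes the rule.
def handwritten_rule(tumor_size_cm):
    return 1 if tumor_size_cm > 5.0 else 0  # threshold chosen by a person

# Machine learning: the rule (a threshold) is learned from data + outputs.
def learn_threshold(sizes, labels):
    """Try midpoints between sorted sizes; keep the one that makes the fewest errors."""
    points = sorted(sizes)
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def errors(t):
        return sum((1 if s > t else 0) != y for s, y in zip(sizes, labels))

    return min(candidates, key=errors)

# Hypothetical training data: tumor sizes (cm) and labels (0 = benign, 1 = malignant).
sizes = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]
threshold = learn_threshold(sizes, labels)
learned_rule = lambda size: 1 if size > threshold else 0
print(f"learned threshold: {threshold} cm")
```

The learned threshold (4.5 cm here) came from the examples rather than from the programmer, which is the "Data + Output → Rules" direction in the diagram.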
WHAT DOES LEARNING MEAN?
oImagine teaching a child the difference between dogs and cats using
flashcards.
oAs the child practices, his performance improves.
oHuman cognition has built-in classification mechanisms.
oAfter the child is proficient with the flashcards, he’ll be able to classify
not only the images on the flashcards, but also any other image of a cat or dog.
oThis ability to generalize, to apply knowledge gained through
training to new, unseen examples, is a key characteristic of both human
and machine learning.
TYPES OF ML ALGORITHMS
1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Reinforcement learning
1. SUPERVISED LEARNING

(Figure: input x → output label y)

Supervised learning is where you have both the input variables X and the
output variable Y, and you use an algorithm to learn the mapping function from
the input to the output: Y = f(X).

It learns from being given “right answers.”

TYPES OF SUPERVISED LEARNING
1. Regression: Used to predict a continuous numerical output.
For example, a regression algorithm could be used to predict the price of
a house based on its size, location, and other features.
2. Classification: Used to predict a categorical output.
For example, a classification algorithm could be used to predict
whether an email is spam or not.
REGRESSION: HOUSE PRICE PREDICTION
(Figure: house price in $1000's, from 0 to 400, versus house size in feet², from 0 to 2500)
Let's say a friend wants to know the price of their 750-square-foot house. How can the
learning algorithm help you?
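One way the learning algorithm can help is to fit a straight line to the labeled examples and read off the line's value at 750 ft². The sketch below uses ordinary least squares, the standard way to fit such a line. The five (size, price) pairs are made up for illustration; they are not the data in the slide's plot.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means.
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - w * mx
    return w, b

# Hypothetical training data: sizes in feet^2, prices in $1000's.
sizes = [500, 1000, 1500, 2000, 2500]
prices = [110, 190, 260, 350, 420]

w, b = fit_line(sizes, prices)
predicted = w * 750 + b
print(f"Predicted price for 750 ft^2: about ${predicted:.0f}k")
```

With these made-up numbers the line is roughly price = 0.156 × size + 32, which gives about $149k for the friend's house.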
REGRESSION: HOUSE PRICE PREDICTION
- Fitting a straight line isn't the only learning algorithm you can use.
- There are others that could work better for this application.
- For example, you might decide that it's better to fit a curve, a function that's
slightly more complicated than a straight line.
(Figure: house price in $1000's versus house size in feet²)
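Fitting a curve instead of a line can be done with the same least-squares idea, using powers of x as extra features. This sketch assumes a polynomial curve and solves the normal equations (AᵀA)c = Aᵀy with Gaussian elimination; the function names are hypothetical. With raw house sizes in the thousands, the powers of x become very large, so in practice you would scale the feature first or use a library routine such as `numpy.polyfit`.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit of c0 + c1*x + ... + c_degree*x**degree via the normal equations."""
    size = degree + 1
    # Normal equations built from power sums: (A^T A)[i][j] = sum x^(i+j), (A^T y)[i] = sum y*x^i.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    m = [row[:] + [rhs] for row, rhs in zip(ata, aty)]  # augmented matrix
    # Forward elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * size
    for r in range(size - 1, -1, -1):
        acc = m[r][size] - sum(m[r][c] * coeffs[c] for c in range(r + 1, size))
        coeffs[r] = acc / m[r][r]
    return coeffs

def evaluate(coeffs, x):
    """Value of the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
```

Choosing degree 1 gives back the straight line; higher degrees give more flexible curves, at the risk of fitting noise.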
REGRESSION: HOUSE PRICE PREDICTION
•This is an example of supervised learning, because we gave the algorithm a dataset
with labels: the correct price y is given for every house on the plot.
•The task of the learning algorithm is to predict the likely price of other
houses, like your friend's house.
•This housing price prediction is a particular type of supervised learning called
regression.
•In regression, we try to predict a number from infinitely many possible
numbers, such as the house prices in our example, which could be 150,000 or 70,000
or 183,000 or any other number in between.
CLASSIFICATION: CANCER DETECTION
An ML system to diagnose whether a lump is malignant or benign.
The dataset has tumors of different sizes, each labeled either 0 (benign) or 1
(malignant).
CLASSIFICATION: CANCER DETECTION
Now we plot the data:
The x-axis shows the tumor size x (diameter in cm).
The y-axis shows the type of the tumor: 0 (benign) or 1 (malignant).
CLASSIFICATION: CANCER DETECTION
(Figure: tumor size x (diameter in cm) on a single axis, with benign and malignant examples marked)
CLASSIFICATION: CANCER DETECTION

benign
Malignant type 1
0cm diameter(cm) 10cm
Malignant type 2

Remember: Classification predict categories

In classification we may have more than 2 classes.
CLASSIFICATION: CANCER DETECTION
We can use two or more inputs: not just the tumor size, but also the age of the patient.
(Figure: age versus tumor size)
CLASSIFICATION: CANCER DETECTION
(Figure: age versus tumor size, with a decision boundary separating the two classes)
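The slides do not say which algorithm draws the decision boundary, so here is one simple stand-in, a nearest-centroid classifier: it averages the (tumor size, age) points of each class and assigns a new patient to the closer average. Its decision boundary is a straight line halfway between the two centroids. The patient data are hypothetical, and in practice the features would be scaled first, because age values are much larger than tumor sizes and dominate the distance.

```python
from math import dist  # Euclidean distance, Python 3.8+

def fit_centroids(points, labels):
    """Average (tumor size, age) point of each class."""
    centroids = {}
    for label in set(labels):
        members = [p for p, y in zip(points, labels) if y == label]
        centroids[label] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids

def predict(centroids, point):
    """Assign the class whose centroid is nearest to the point."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))

# Hypothetical patients: (tumor size in cm, age in years); 0 = benign, 1 = malignant.
points = [(1.0, 30), (2.0, 35), (1.5, 40), (5.0, 60), (6.0, 55), (7.0, 65)]
labels = [0, 0, 0, 1, 1, 1]

centroids = fit_centroids(points, labels)
print(centroids)
print(predict(centroids, (1.8, 33)), predict(centroids, (6.5, 62)))
```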
SUPERVISED LEARNING (RECAP)
2. UNSUPERVISED LEARNING
Supervised learning learns from data labeled with the “right answers.”
Unsupervised learning finds something interesting in unlabeled data.
(Figure: two age versus tumor size plots, one with labeled examples and one with unlabeled examples)

UNSUPERVISED LEARNING
➢You're given data on patients' tumor sizes and ages,
but not whether each tumor was benign or malignant.
➢We're not asked to diagnose whether the tumor is benign
or malignant, because we're not given any labels.
➢Our job is to find some structure or pattern, or just
something interesting, in the data.
UNSUPERVISED LEARNING

Data only comes with inputs x, but not output labels y. Algorithm has to find structure in the data.
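K-means clustering (listed later among the unsupervised examples) is one way an algorithm can find structure in inputs without labels. The sketch below alternates the two standard k-means steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. It uses a naive initialization (the first k points) for simplicity; real implementations usually use random or k-means++ starts. The 2-D points are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def k_means(points, k, max_iterations=100):
    centroids = [tuple(p) for p in points[:k]]  # naive initialization: first k points
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            new_centroids.append(mean_point(members) if members else centroids[i])
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical unlabeled 2-D data (no y values).
points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 1.5), (8.5, 9)]
centroids, labels = k_means(points, k=2)
print(labels)
```

The algorithm never sees a label, yet it separates the two groups of nearby points.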
UNSUPERVISED LEARNING
(CLUSTERING: GOOGLE NEWS)
UNSUPERVISED LEARNING
(CLUSTERING: DNA MICROARRAY)
Unsupervised learning algorithms can analyze genetic data to identify patterns and
relationships, leading to insights in personalized medicine and genetic research.
(Figure: DNA microarray data, with genes as rows and individuals as columns)
3. SEMI-SUPERVISED LEARNING
➢Semi-supervised learning is a type of machine learning that falls in between supervised and
unsupervised learning.
➢It is a method that uses a small amount of labeled data and a large amount of unlabeled data
to train a model.
➢First stage: train the model on the small labeled dataset to learn a function that can accurately
predict the output variable from the input variables, as in supervised learning.
➢Second stage: use the labeled data together with the unlabeled data in one of several ways, such as:
1. Self-training: the model trained on the labeled data is used to predict labels for the
unlabeled data.
2. Clustering: the small labeled dataset informs and guides pattern discovery in the
unlabeled data.
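The self-training loop can be sketched in a few lines. This is a minimal version under stated assumptions: one numeric feature, a very simple base model (predict the class whose mean is closest), and a hypothetical `min_margin` rule that only accepts a pseudo-label when a point is clearly closer to one class mean than to the other. Real systems typically use a proper classifier and its predicted probabilities as the confidence.

```python
from statistics import mean

def fit_means(xs, ys):
    """Base model: the mean feature value of each class label."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}

def predict(means, x):
    return min(means, key=lambda c: abs(x - means[c]))

def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        means = fit_means(xs, ys)  # stage 1: train on the (growing) labeled set
        confident = []
        for x in pool:
            d = sorted(abs(x - m) for m in means.values())
            if d[1] - d[0] >= min_margin:  # clearly closer to one class mean
                confident.append(x)
        if not confident:
            break
        for x in confident:  # stage 2: adopt confident predictions as pseudo-labels
            xs.append(x)
            ys.append(predict(means, x))
            pool.remove(x)
    return fit_means(xs, ys), pool

# Hypothetical data: two labeled tumor sizes and five unlabeled ones.
means, still_unlabeled = self_train([1.0, 8.0], [0, 1], [1.5, 2.0, 7.0, 7.5, 4.6])
print(means, still_unlabeled)
```

The ambiguous point 4.6 stays unlabeled, which is the usual safeguard against reinforcing the model's own mistakes.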
3. SEMI-SUPERVISED LEARNING

➢Why use semi-supervised learning?

1. Cost of labeling data
2. Better performance with less labeling
➢It is used in text classification, speech recognition, image
classification, etc.
4. REINFORCEMENT LEARNING
➢Reinforcement learning is the problem of getting
an agent to take actions that maximize reward in
a particular situation.
➢Unlike most forms of machine learning, the learner
is not told which actions to take; instead, it must
discover which actions yield the most reward by
trying them.
➢For example, consider teaching a dog a new
trick: we cannot tell it what to do or what not to do,
but we can reward or punish it when it does the
right or wrong thing.
4. REINFORCEMENT LEARNING
The problem is as follows: we have an agent and a
reward, with many hurdles in between. The agent is
supposed to find the best possible path to reach the
reward.
The robot learns by trying all the possible paths and
then choosing the path that gives it the reward with
the fewest hurdles.
Each right step gives the robot a reward, and each
wrong step subtracts from its reward.
The total reward is calculated when it reaches the
final reward, the diamond.
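The slides do not name an algorithm, so here is one common choice, tabular Q-learning, on a deliberately tiny hypothetical environment: a corridor of five cells where the diamond sits in the last cell. Every step costs 1 point (a "hurdle") and reaching the diamond gives 10 points. The agent is never told to go right; it discovers that by trial and error, trying random actions with probability `epsilon`. The environment, rewards, and hyperparameters are all assumptions made for illustration.

```python
import random

N_STATES = 5        # corridor cells 0..4 (hypothetical environment)
GOAL = 4            # the cell holding the "diamond"
ACTIONS = (-1, +1)  # move left, move right

def step(state, action):
    """Move within the corridor; the diamond ends the episode."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == GOAL:
        return next_state, 10.0, True
    return next_state, -1.0, False  # every ordinary step costs a point

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q

def greedy_policy(q):
    """Best known action in each non-goal cell."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES) if s != GOAL}

policy = greedy_policy(q_learning())
print(policy)  # +1 means "move right"
```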
TYPES OF ML ALGORITHMS
1. Supervised learning: Models learn from labeled data (input-output pairs).
Given: training data + desired outputs (labels)
Example: Logistic regression, linear regression, random forests, decision trees, and
Naive Bayes
2. Unsupervised learning: Models find hidden patterns in unlabeled data.
Given: training data (without desired outputs)
Example: K-means clustering and principal component analysis (PCA)
TYPES OF ML ALGORITHMS
3. Semi-supervised learning:
Given: training data + a few desired outputs
4. Reinforcement learning: Models learn by interacting with the environment and
receiving rewards.
Example: Gaming (AlphaGo), robotics.
What type of ML was Arthur Samuel's
checkers program?
DS & ML
Training set (70%): Features (X), Label / Target (Y)
Test set (30%): Features (X), Label / Target (Y)
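A 70/30 split like the one above can be sketched as follows: shuffle the row indices with a fixed seed, then take the first 70% for training and the rest for testing, keeping each feature row paired with its label. The function name `split_train_test` and the toy data are hypothetical; scikit-learn offers a ready-made `sklearn.model_selection.train_test_split(X, y, test_size=0.3)` for the same purpose.

```python
import random

def split_train_test(features, labels, train_fraction=0.7, seed=42):
    """Shuffle row indices, then split them into training and test sets."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # shuffle so the split does not depend on row order
    cut = round(train_fraction * len(indices))
    train_idx, test_idx = indices[:cut], indices[cut:]
    X_train = [features[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Hypothetical data: 10 examples, where the label is simply twice the feature.
X = list(range(10))
y = [2 * x for x in X]
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))
```

The fixed seed makes the split reproducible, so the model is always evaluated on the same held-out 30%.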
TERMINOLOGIES OF MACHINE LEARNING
Model: A model is a specific representation learned from data by applying some
machine learning algorithm.

Training: We give the algorithm a set of inputs (the training set) and their expected outputs
(labels); after training, we have a model that maps new data to one of
the categories it was trained on.

Prediction: Once our model is ready, it can be fed a set of inputs (the test set), for which it
will provide predicted outputs (labels).
MEASURING THE PERFORMANCE OF ML
CLASSIFICATION
1. Confusion Matrix
A table summarizing the performance of a classification model by
showing the actual vs. predicted classifications. It includes:
True Positives (TP): Correctly predicted positive instances
True Negatives (TN): Correctly predicted negative instances
False Positives (FP): Incorrectly predicted positive instances
False Negatives (FN): Incorrectly predicted negative instances

                     Actual value
                     1       0
Predicted value 1    TP      FP
Predicted value 0    FN      TN
2. Accuracy
The proportion of correctly classified instances over the total instances.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
3. Precision
The proportion of true positive predictions out of all the instances that
were predicted as positive.
$$\text{Precision} = \frac{TP}{TP + FP}$$
4. Recall (Sensitivity)
The proportion of true positive predictions out of all the actual positive
instances.
$$\text{Recall} = \frac{TP}{TP + FN}$$
5. F1-Score
The harmonic mean of precision and recall. It balances the trade-off between
precision and recall.
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
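The classification formulas translate directly into code. This sketch computes all of them from the four confusion-matrix counts. Returning 0.0 when a denominator is zero is an assumed convention (scikit-learn exposes a similar choice through its `zero_division` parameter); the counts in the usage line are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for 100 test instances.
metrics = classification_metrics(tp=40, tn=45, fp=10, fn=5)
print(metrics)
```

A useful identity for checking results: F1 also equals 2TP / (2TP + FP + FN), which here is 80/95 ≈ 0.842.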
Example: Email spam (Classification)

Example   Target y   Prediction ŷ
1         1          1
2         1          0
3         0          1
4         0          0
5         1          1

Accuracy: 3/5 = 0.6 = 60%
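The spam example can be checked by counting the four confusion-matrix cells from the table and applying the formulas. The targets and predictions below are copied from the table; precision and recall are extra values derived from the same five examples, shown only to connect the example to the other metrics.

```python
# Targets y and predictions y_hat from the email spam table (1 = spam).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # spam correctly flagged
tn = sum(t == 0 and p == 0 for t, p in pairs)  # normal email correctly passed
fp = sum(t == 0 and p == 1 for t, p in pairs)  # normal email wrongly flagged
fn = sum(t == 1 and p == 0 for t, p in pairs)  # spam that slipped through

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"TP={tp} TN={tn} FP={fp} FN={fn} accuracy={accuracy:.0%}")
```

Examples 1, 4, and 5 are correct, which gives the 60% accuracy stated above.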
MEASURING THE PERFORMANCE OF ML
REGRESSION
1. Mean Absolute Error (MAE)
The average of the absolute differences between the predicted values and the actual values.
$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{\text{true},i} - y_{\text{pred},i}\right|$$
2. Mean Squared Error (MSE)
The average of the squared differences between the predicted values and the actual values. It
penalizes larger errors more heavily than MAE.
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{\text{true},i} - y_{\text{pred},i}\right)^2$$

MSE emphasizes larger errors because the differences are squared.

Example: house pricing

Example   Target y   Prediction ŷ
1         1          0.8
2         2          1.9
3         3          2.9
4         4          4.1
5         5          5.2

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \frac{1}{5}\left[(1-0.8)^2 + (2-1.9)^2 + (3-2.9)^2 + (4-4.1)^2 + (5-5.2)^2\right] = \frac{0.11}{5} = 0.022$$
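Both regression metrics are one-line averages. This sketch implements them and applies them to the house-pricing table; the MSE matches the 0.022 worked out above, and the MAE for the same data is 0.7 / 5 = 0.14. Comparing the two shows the squaring effect: the small errors of 0.1 shrink to 0.01 when squared, while the larger 0.2 errors dominate the MSE.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of (y_true - y_pred) squared; large errors count more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Targets and predictions from the house pricing table.
y_true = [1, 2, 3, 4, 5]
y_pred = [0.8, 1.9, 2.9, 4.1, 5.2]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")
```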
DATA SCIENCE IS NOT MACHINE LEARNING!
Even though DS and ML are closely related,
they are not the same!
Machine learning focuses heavily on
sophisticated, complex algorithms and involves
computation and statistics.
These algorithms need clean, ready-to-use
datasets from the data science workflow to test
them.
But sometimes the best way to solve a
problem is simply to visualize the data.
JOB DESCRIPTIONS

1. Data engineer
2. Data analyst
3. Data scientist
4. Machine learning scientist/engineer
1. DATA ENGINEER
➢Data engineers control the flow of data and build custom data pipelines and
storage systems.
➢They design infrastructure so that data is not only collected, but easy to
obtain and process.
➢Within the data science workflow, they focus on the first stage: data collection
and storage.
➢Data engineering tools: SQL, Java, Scala, or Python to process data, and
cloud computing to ingest and store large amounts of data.
2. DATA ANALYST
➢Data analysts describe the present via data.
➢They do this by exploring the data and creating visualizations and
dashboards.
➢To do these tasks, they often have to clean data first.
➢Within the workflow, they focus on the middle two stages: data preparation,
and exploration and visualization.
➢Data analyst tools: SQL, spreadsheets, Business Intelligence (BI) tools such as
Tableau, Power BI, or Looker, to create dashboards and share their analyses.
3. DATA SCIENTIST
➢Data Scientists have a strong background in statistics, enabling them to find
new insights from data, rather than solely describing data.
➢They also use traditional machine learning for prediction and forecasting.
➢Within the workflow, they focus on the last three stages: data preparation,
exploration and visualization, and experimentation and prediction.
➢Data scientist tools: SQL, Python, and R.
4. MACHINE LEARNING SCIENTIST/ ENGINEER
➢Machine learning scientists are similar to data scientists but with a machine
learning specialization.
➢They focus on developing, training, and deploying machine learning models in
production environments.
➢They go beyond traditional machine learning with deep learning.
➢Within the workflow, they do the last three stages, with a strong focus on
prediction.
➢Machine learning tools: Python or R to create their predictive models.
