Lec 1,2

This document discusses machine learning and related topics. It begins with an introduction to machine learning and deep learning, noting that deep learning has become the most visible face of machine learning. It then covers several new developments in AI/ML, including privacy-preserving ML, edge AI, explainable AI, multimodal learning, and bias in AI systems, as well as new application domains such as biodiversity, earth observation, and autonomous vehicles. It then goes into more depth on the role of ML in autonomous vehicles, examples of bias in AI, and the history and role of mathematics in machine learning.


Prof. Navneet Goyal
Department of Computer Science, BITS-Pilani, Pilani Campus, India
Introduction
• Machine Learning is the most visible face of AI!
• Now, Deep Learning has become the most visible face of
Machine Learning!
• Some new developments in AI/ML
– Privacy-preserving ML/Federated Learning
– Edge AI/Private AI
– Explainable AI (XAI)
– Multimodal Learning
– Bias in AI Systems
• Some new application domains
– Biodiversity/Bioacoustics
– Earth Observation
– Geo Tagging
– Social Good
– Autonomous Cars
Autonomous Cars
• What role does Machine Learning have to play?
Bias In AI
• A child wearing sunglasses is labeled as a “failure,
loser, nonstarter, unsuccessful person.” This is just
one of the many systemic biases exposed by
ImageNet Roulette, an art project that applies labels
to user-submitted photos by sourcing its
identification system from the original ImageNet
database.
• ImageNet, which has been one of the instrumental
datasets for advancing AI, has deleted more than half
a million images from its “person” category since this
instance was reported in late 2019.
Bias In AI
• Earlier in 2019, researchers showed how
Facebook’s ad-serving algorithm for deciding who
is shown a given ad exhibits discrimination based
on race, gender, and religion of users.
• There have been reports of commercial facial-
recognition software (notably Amazon’s
Rekognition, among others) being biased against
darker-skinned women.
Introduction
• What exactly is Machine Learning?
• Why do we need it?
• If at all we need it, how can we make the machines learn?
– in the same way humans (animals) learn?
• What is Deep Learning?
Introduction: Early Days of ML
• During World War II, noted British computer scientist Alan
Turing worked to crack the ‘Enigma’ code, which was
used by German forces to send messages securely.
• Turing and his team created the Bombe machine that
was used to decipher Enigma’s messages.
• The Enigma and Bombe Machines laid the foundations
for Machine Learning.
• Turing Test - a machine that could converse with
humans without the humans knowing that it is a
machine would win the “imitation game” and could be
said to be “intelligent”.
Recent Developments in ML
• Google open-sourced TensorFlow
• Microsoft open-sourced CNTK (Cognitive Toolkit)
• Baidu open-sourced its Deep Learning platform, PaddlePaddle
• Amazon will back MXNet (Apache) – a Deep Learning Framework
in their new AWS ML platform
• Facebook supporting the development of 2 Deep Learning
frameworks: Torch (Open Source ML Lib. – Sci. Comp. framework
based on Lua PL) & Caffe (Deep Learning framework by Berkeley
AI Research – BAIR)
• Google is also supporting Keras (NN API which can run on
TensorFlow, CNTK, Theano)
• WaveNet’s audio generation using Deep Learning
– Outperforms Google’s TTS (Text-to-Speech)
• Lip reading – application of video recognition
• Machine Translation
Source: Forbes article, “Best ML breakthroughs of 2016”
Author: Xavier Amatriain, VP Engineering @ Quora; former Netflix recommendations researcher & professor
Recent Developments in ML
Deep Learning has taken ML to the next level

HPC & AI
Recent Developments in ML: AlphaGo Zero
• Mastering the game of Go without human knowledge*
– Much progress towards AI has been made using
supervised learning systems that are trained to replicate
the decisions of human experts
– expert data sets are often expensive, unreliable or simply
unavailable.
– Even when reliable data sets are available, they may
impose a ceiling on the performance of systems trained
in this manner
– Reinforcement Learning systems learn from their own
experience, allowing them to exceed human capabilities
– AlphaGo was the first program to achieve superhuman
performance in Go

Reinforcement Learning - Exploitation vs. Exploration

* David Silver et al., 2017, Nature


Recent Success Stories of ML/DL
• Aftershock Prediction
• Arrhythmia Detection
• Poker Game
Introduction: Role of Mathematics in ML
“There is no branch of mathematics, however abstract, which may not some day be applied to phenomena of the real world.”
- Nikolai Lobachevsky
Role of Mathematics in ML
• Before you can learn Machine Learning, you need
to learn a lot!!
– Linear Algebra
– Optimization
– Probability & Statistics (Bayes’ Theorem)
– High dimensional vectors: tensors

Disappointed??
Prerequisites for ML
Introduction
List the tasks that we humans can do better than
machines!!
Introduction
Let’s look at these incredible things that humans can do:
1. Identifying a song by just listening to a very small part of it
2. Identifying a movie by looking at a very short clip
3. Identifying a person
4. Identifying a person even when you see them after many, many years
5. Recollecting memories
6. Identifying a person from a distance
7. Identifying a person by just listening to his/her voice
8. Identifying a person by his chat/message signature
9. Our own GPS!
10. Identifying spam mails
11. Object identification in images/videos
12. Image/document tagging
13. Suspicious activity or person
14. Medical diagnosis
15. Handwriting recognition
16. Conversation/discussion
17. …
Introduction
Ever wondered how we can do all this with such accuracy
and efficiency?

1. Pattern recognition
2. Information retrieval

Human Brain!!
Neurons!!

Ever wondered how we can make machines learn to do all
such tasks, and that too with the efficiency and accuracy of
humans?
What is Machine Learning?
• Machines DO → Machines LEARN
• Shift in paradigm!
• Machines can be made to learn!
• How and for what purpose?
• How? By writing algorithms!
• Purpose: Mainly to Predict and to take Decisions!
Types of Learning
• Supervised
• Unsupervised
• Semi-supervised
• Reinforced
• Active
• Deep
• Federated
• Transfer
Types of Learning
• Supervised
– Most common setting is supervised learning of predictive
models. Typical tasks are classification and regression
– Needs labelled data
Types of Learning
• Unsupervised
– Descriptive models can be learned in an unsupervised
setting. Clustering & Association rule mining are examples
– Unsupervised learning of a predictive model occurs when
we cluster data with the intention of using the clusters to
assign class labels to new data.
– Predictive clustering
Supervised vs. Unsupervised
Types of Learning
• Semi-supervised
– Semi-supervised learning for predictive models
– In many situations, data is cheap, but labelled data is
expensive
• Web documents, images
– Use a small labelled training set to build an initial model,
which is then refined using the unlabelled data
– Use the initial model to make predictions on the unlabelled
data, and use the most confident predictions as new
training data (a minimal sketch follows below)
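
A minimal sketch of this self-training loop, assuming scikit-learn; the synthetic dataset, the 0.95 confidence threshold, and the five rounds are illustrative assumptions, not part of the lecture:

```python
# Self-training sketch: train on a few labels, pseudo-label confident points.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:50] = True                        # only 50 labels are "affordable"

model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
for _ in range(5):                         # a few self-training rounds
    proba = model.predict_proba(X[~labeled])
    confident = proba.max(axis=1) > 0.95   # keep only confident predictions
    if not confident.any():
        break
    idx = np.where(~labeled)[0][confident]
    y[idx] = model.predict(X[idx])         # pseudo-labels become training data
    labeled[idx] = True
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
```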
Types of Learning
• Reinforced
– Reinforcement learning fills the gap between supervised
learning and unsupervised learning
– Middle ground – information is provided whether or not
the answer is correct, but not how to improve it
– Reinforcement learner has to try out different strategies
and figure out which works best
– “Trying out” is like search
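
A minimal sketch of “trying out different strategies”, assuming only the Python standard library: an epsilon-greedy bandit that explores with probability epsilon and otherwise exploits its current best estimate. The payout probabilities and epsilon are invented for illustration; AlphaGo-style RL is far more elaborate.

```python
# Epsilon-greedy bandit: balance exploration against exploitation.
import random

true_payout = [0.3, 0.5, 0.7]             # unknown to the learner
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
epsilon = 0.1                              # fraction of "trying out"

for t in range(10_000):
    if random.random() < epsilon:          # explore: try a random strategy
        arm = random.randrange(3)
    else:                                  # exploit: use the best so far
        arm = estimates.index(max(estimates))
    reward = 1 if random.random() < true_payout[arm] else 0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean

print("learned estimates:", [round(e, 2) for e in estimates])
```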
Types of Learning
• Active
– If I tell you that you can achieve better accuracy with less
training, would you believe me?
– NO!!
– It is possible when the learning algorithm is:
• Allowed to be “curious”
• Allowed to choose the data from which it learns
– It is possible with ACTIVE LEARNING!
Some applications where Active Learning is useful:
– Speech Recognition
– Document Classification
– Image & Video annotation
Active Learning

An active learner differs from a passive learner, which simply
receives a random data set from the world and then outputs
a classifier or model.

[Figures: active vs. passive learning, taken from Simon Tong’s PhD thesis]

Introduction

• Zoologists study learning in animals
• Psychologists study learning in humans
• In this course, we focus on “Learning in Machines”
• Course Objective
– Study of approaches and algorithms that can make a machine learn
Introduction

• Machine Learning
– Subarea of AI that is concerned with algorithms/programs that can make a machine learn
– Improve automatically with experience
– For example: doctors learning from experience
– Faculty learning how to control the class and be effective
– We all learn from experience

Imagine computers learning from medical records and
suggesting treatment (automated diagnosis & prescription)
Machine Learning

Definition (Tom Mitchell, 1997):
A computer program is said to learn from experience
E with respect to some class of tasks T and
performance measure P, if its performance at tasks in
T, as measured by P, improves with experience E.
What is Machine Learning?
• To solve a problem, we need an algorithm!
• For example: sorting a list of numbers
• Input: list of numbers
• Output: sorted list of numbers
• For some tasks, like filtering spam mails
• Input: an email
• Output: Y/N
• We do not know how to transform Input to Output
• Definition of Spam changes with time and from one
individual to another
• What to DO?
Reference: E. Alpaydin, Introduction to Machine Learning, MIT Press, 2010
What is Machine Learning?
• Collect lots of emails (both genuine and spam)
• “Learn” what constitutes a spam mail (or for that
matter a genuine mail)
• Learn from DATA!!
• For many similar problems, we may not have
algorithm(s), but we do have example data (called
Training Data)
• Ability to process training data has been made
possible by advances in computer technology
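
As a concrete (hypothetical) illustration of learning a spam filter from data, assuming scikit-learn; the four-mail corpus and its labels are invented:

```python
# Learn what constitutes spam from example data, not hand-written rules.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

mails = ["win a free prize now", "meeting at 10 tomorrow",
         "free lottery winner claim now", "project report attached"]
labels = [1, 0, 1, 0]                       # 1 = spam, 0 = genuine

vec = CountVectorizer()
X = vec.fit_transform(mails)                # word counts as features
clf = MultinomialNB().fit(X, labels)        # parameters learned from data

print(clf.predict(vec.transform(["claim your free prize"])))  # -> [1]
```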
What is Machine Learning?
• Face Recognition!!!
• We humans are so good at it!!!
• Ever thought how we do it, despite
– Different light conditions, pose, hair style, make-up,
glasses, ageing, etc.
• Since we do not know how we do it, we cannot
write a program to do it
• ML is about making inference from a sample
Machine Learning Applications
• What kind of data would I require for learning?
– Credit card transactions
– Face Recognition
– Spam filter
– Handwriting/Character Recognition
Handwriting Recognition
• Task T
– recognizing and classifying handwritten words within images
• Performance measure P
– percent of words correctly classified
• Training experience E
– a database of handwritten words with given classifications
Handwriting Recognition
Pattern Recognition Example
• Handwriting Digit Recognition

Reference: Christopher M. Bishop, Pattern Recognition & Machine Learning, Springer, 2006
Pattern Recognition Example
• Handwriting Digit Recognition
– Non-trivial problem due to variability in handwriting
– What about using handcrafted rules or heuristics for
distinguishing the digits based on shapes of strokes?
– Not such a good idea!!
– Proliferation of rules
– Exceptions of rules and so on…
– Adopt a ML approach!!

Reference: Christopher M. Bishop, Pattern Recognition & Machine Learning, Springer, 2006
Pattern Recognition Example
• Handwriting Digit Recognition
– Each digit is represented by a 28x28 pixel image
– Can be represented by a vector of 784 real numbers
– Objective: to have an algorithm that will take such a vector as
input and identify the digit it represents
– Take images of a large number of digits (N) – the training set
– Use the training set to tune the parameters of an adaptive model
– Each digit in the training set has been identified by a target
vector t, which represents the identity of the corresponding digit
– The result of running a ML algorithm can be expressed as a
function y(x) which takes a new digit x as input and outputs a
vector y, encoded in the same way as t (see the sketch below)
– The form of y(x) is determined through the learning (training)
phase
Reference: Christopher M. Bishop, Pattern Recognition & Machine Learning, Springer, 2006
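
A minimal sketch of this training phase, assuming scikit-learn. Its built-in digits are 8x8 images (64-dimensional vectors) rather than Bishop’s 28x28 (784-dimensional), but the idea is the same: tune a model on (x, t) pairs, then apply the learned y(x) to new digits.

```python
# Learn y(x) from a training set of digit vectors and target labels.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()                      # each image flattened to a vector x
X_train, X_test, t_train, t_test = train_test_split(
    digits.data, digits.target, random_state=0)

y = LogisticRegression(max_iter=5000).fit(X_train, t_train)  # training phase
print("test accuracy:", y.score(X_test, t_test))             # generalization
```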
Pattern Recognition Example
• Generalization
– The ability to correctly categorize new examples that differ
from those in the training set
– Generalization is a central goal in pattern recognition
• Preprocessing
– Input variables are preprocessed to transform them into some
new space of variables where it is hoped that the problem will
be easier to solve (see fig.)
– Images of digits are translated and scaled so that each digit is
contained within a box of fixed size. This reduces variability.
– Preprocessing stage is referred to as feature extraction
– New test data must be preprocessed using the same steps as
training data

Reference: Christopher M. Bishop, Pattern Recognition & Machine Learning, Springer, 2006
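
A minimal sketch of the rule that test data must go through the same preprocessing steps as training data, assuming scikit-learn; standardization stands in here for the translation/scaling described above:

```python
# Fit the preprocessing on training data only, then reuse it on test data.
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, _, _ = train_test_split(digits.data, digits.target,
                                         random_state=0)

scaler = StandardScaler().fit(X_train)      # learn scaling from training data
X_train_p = scaler.transform(X_train)
X_test_p = scaler.transform(X_test)         # identical steps for test data
```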
A word about Preprocessing!!
• Preprocessing
– Can also speed up computations
– For example: face detection in a high-resolution video stream
– Find useful features that are fast to compute and yet preserve
useful discriminatory information, enabling faces to be
distinguished from non-faces
– The average value of the image intensity in a rectangular
sub-region can be evaluated extremely efficiently, and a set of
such features is very effective in fast face detection (see the
sketch below)
– Because such features are smaller in number than the pixels,
this is referred to as a form of Dimensionality Reduction
– Care must be taken so that important information is not
discarded during preprocessing
Reference: Christopher M. Bishop, Pattern Recognition & Machine Learning, Springer, 2006
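
A minimal sketch of such a rectangle feature, assuming only NumPy: with an integral image, the average intensity of any rectangular sub-region costs four array lookups (a random array stands in for a video frame).

```python
# Average intensity over a rectangle in O(1) via an integral image.
import numpy as np

img = np.random.rand(480, 640)              # stand-in for one video frame
ii = img.cumsum(axis=0).cumsum(axis=1)      # integral image: one pass

def rect_mean(r0, c0, r1, c1):
    """Mean of img[r0:r1, c0:c1] from four integral-image lookups."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0: total -= ii[r0 - 1, c1 - 1]
    if c0 > 0: total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total / ((r1 - r0) * (c1 - c0))

print(rect_mean(100, 200, 150, 260))        # ~= img[100:150, 200:260].mean()
```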
Curse of Dimensionality!!
[Figure: oil-flow data plotted along dimensions x6 and x7, with an unlabeled point x]

• Poses serious challenges!
• Important factor influencing the design of pattern recognition techniques
• Mixture of oil, water & gas (homogeneous, annular & laminar)
• Each data point is a point in a 12-dimensional space
• 100 points shown along only two dimensions, x6 & x7
• x – predict its class?
Reference: Christopher M. Bishop, Pattern Recognition & Machine Learning, Springer, 2006
Curse of Dimensionality!!
[Figure: the same plot, with x surrounded mostly by red points]

• Unlikely that it belongs to the blue class!
• It is surrounded by a lot of red points
• Also, many green points nearby
• Intuition: the identity of x should be determined strongly by nearby points
and less strongly by more distant points
• How can we turn this intuition into a learning algorithm?
Reference: Christopher M. Bishop, Pattern Recognition & Machine Learning, Springer, 2006
Curse of Dimensionality!!
[Figure: the input space divided into a regular grid of cells]

• Make grid lines!
• Use majority voting within each cell
• Problems??
Reference: Christopher M. Bishop, Pattern Recognition & Machine Learning, Springer, 2006
Curse of Dimensionality

• The number of cells grows exponentially with D
• We need an exponentially large number of training data points
• Not a good approach for more than a few dimensions!
Curse of Dimensionality

Source:https://www.opendatascience.com/blog/curse-of-dimensionality-explained/
Curse of Dimensionality
• As the number of dimensions tends to infinity, the
volume of a unit hypersphere tends to zero!!
(think about it!! – a numerical check follows below)
• Data becomes increasingly sparse with increasing
dimensions
• As the number of dimensions in a dataset increases,
distance measures become increasingly meaningless
– In very high dimensions, points are almost equidistant from each
other (the relative contrast between distances tends to zero)
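
The hypersphere claim can be checked numerically. The volume of the unit ball in d dimensions is V_d = pi^(d/2) / Gamma(d/2 + 1); a standard-library sketch:

```python
# Volume of the unit hypersphere as dimension grows.
from math import pi, gamma

for d in (1, 2, 3, 5, 10, 20, 50):
    print(d, pi ** (d / 2) / gamma(d / 2 + 1))
# Volume peaks near d = 5, then collapses towards 0: data becomes sparse.
```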
Different Aspects of the “Curse”*
• Optimization Problem
• Concentration Effect of Lp-Norms
• Irrelevant Attributes
• Correlated Attributes
• Intrinsic dimensionality < Embedding dimensionality
• Varying relative volume of an ε-Hypersphere

* Data Clustering: Algorithms and Applications, by Charu C. Aggarwal and Chandan K. Reddy
Different Aspects of the “Curse”*
Optimization Problem
• The difficulty of any global optimization approach
increases exponentially with an increasing number of
variables (dimensions).
• General relation to clustering: fitting of functions (each
function explaining one cluster) becomes more difficult
with more degrees of freedom.
• Direct relation to subspace clustering: number of
possible subspaces increases dramatically with
increasing number of dimensions

Zimek: Data Mining and the 'Curse of Dimensionality' (iDB Workshop 2011)
Different Aspects of the “Curse”*
Concentration effect of Lp-norms
• The ratio of (Dmax_d – Dmin_d) to Dmin_d converges to zero
with increasing dimensionality d
• Dmin_d = distance to the nearest neighbor in d dimensions
• Dmax_d = distance to the farthest neighbor in d dimensions
Formally: lim_{d→∞} E[(Dmax_d – Dmin_d) / Dmin_d] = 0

Zimek: Data Mining and the 'Curse of Dimensionality' (iDB Workshop 2011)
Different Aspects of the “Curse”*
Concentration effect of Lp-norms
• Distances to near and to far neighbors become more
and more similar with increasing data dimensionality
(loss of relative contrast or concentration effect of
distances).

Zimek: Data Mining and the 'Curse of Dimensionality' (iDB Workshop 2011)
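
A quick simulation of this concentration effect, assuming NumPy; uniformly distributed random points stand in for real data:

```python
# Relative contrast (Dmax_d - Dmin_d) / Dmin_d shrinks as d grows.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((1000, d))
    dist = np.linalg.norm(X - X[0], axis=1)[1:]   # distances from one point
    print(d, (dist.max() - dist.min()) / dist.min())
```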
Different Aspects of the “Curse”*
Relevant and Irrelevant attributes
• A subset of the features may be relevant for clustering
• Groups of similar (“dense”) points may be identified when
considering these features only
• Different subsets of attributes may be relevant for different
clusters
• Separation of clusters relates to relevant attributes (helpful to
discern between clusters) as opposed to irrelevant attributes
(indistinguishable distribution of attribute values for different
clusters)

Zimek: Data Mining and the 'Curse of Dimensionality' (iDB Workshop 2011)
Different Aspects of the “Curse”*
Relevant and Irrelevant attributes

Zimek: Data Mining and the 'Curse of Dimensionality' (iDB Workshop 2011)
Curse of Dimensionality
• Solutions
– Linear Dimensionality Reduction Techniques
• Principal Component Analysis
• Singular Value Decomposition
– Non-Linear Dimensionality Reduction Techniques
• Manifold Learning
– Isomap
– Kernel PCA
Brush up your Linear Algebra…
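
A minimal sketch of linear dimensionality reduction with PCA, assuming scikit-learn (KernelPCA and Isomap expose the same fit/transform pattern):

```python
# Project 64-dimensional digit vectors onto the top 10 principal components.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                       # 64-dimensional digit vectors
pca = PCA(n_components=10).fit(X)            # top 10 principal components
X_reduced = pca.transform(X)
print(X.shape, "->", X_reduced.shape)        # (1797, 64) -> (1797, 10)
print("variance retained:", pca.explained_variance_ratio_.sum())
```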
A word about DATA
• If data had mass, the Earth would be a black hole!!
• Data is the new Oil!!
• Expected to reach 40 ZB by 2020!!
• In 2012, we had about 2.8 ZB*
– Only 1/4th of this data could produce useful information
– Only 3% of it was tagged
– Only 0.5% of it was actually used for some kind of analysis
(*Report by John Gantz & David Reinsel – sponsored by EMC)
A word about DATA
• Comes in different sizes, shapes, and colors
• BIG DATA!!
– Characterized by 5Vs!

Sources: http://bigdata.black/featured/what-is-big-data/ and http://iihtofficialblog.blogspot.in/2014/07/5-vs-of-hadoop-big-data.html
A word about DATA

Source: Foundations of Data Science, by John Hopcroft & Ravi Kannan
Patterns
• Curve Fitting
• Partitioning of space
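
For the curve-fitting pattern above, a minimal NumPy sketch: fit a polynomial to noisy samples of sin(2πx). The degree, noise level, and sample size are illustrative choices.

```python
# Classic curve-fitting example: polynomial fit to noisy sin(2*pi*x).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)  # noisy targets

coeffs = np.polyfit(x, t, deg=3)             # fit a cubic y(x, w)
y = np.polyval(coeffs, x)                    # predictions on the inputs
print("RMS error:", np.sqrt(np.mean((y - t) ** 2)))
```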
Linear Classifiers in High-Dimensional Spaces

[Figure: data that is not linearly separable in the original variables (Var1, Var2) becomes separable after mapping to constructed features (Feature 1, Feature 2)]

Find a function F(x) to map the data to a different space
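
A minimal sketch of this mapping idea, assuming scikit-learn: circular 2-D data is not linearly separable, but after one illustrative constructed mapping, F(x) = (x1^2, x2^2), a linear classifier separates it.

```python
# Map to a constructed feature space where a linear classifier works.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear = LogisticRegression().fit(X, y)          # original space
mapped = LogisticRegression().fit(X ** 2, y)     # constructed features F(x)

print("original space:", linear.score(X, y))           # ~0.5: not separable
print("constructed space:", mapped.score(X ** 2, y))   # ~1.0: separable
```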
Getting Started…
