Unit 1
Introduction- Artificial Intelligence, Machine Learning, Deep learning, Types of Machine Learning Systems,
Main Challenges of Machine Learning.
Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss, Tradeoffs in
Statistical Learning, Estimating Risk Statistics, Sampling distribution of an estimator, Empirical Risk
Minimization
------------------------------------------------------------------------------------------------------------
The term Machine Learning was first coined by Arthur Samuel in 1959, a year that in hindsight marked a significant milestone for the field.
If you search the web for ‘what is Machine Learning’, you will find at least 100 different definitions. However, the best-known formal definition was given by Tom M. Mitchell:
“A computer program is said to learn from experience E with respect to some class of tasks T and performance
measure P if its performance at tasks in T, as measured by P, improves with experience E.”
In simple terms, Machine Learning is a subset of Artificial Intelligence (AI) which gives machines the ability to learn automatically and improve from experience without being explicitly programmed to do so. In essence, it is the practice of getting machines to solve problems by gaining the ability to ‘think’.
But wait, can a machine think or make decisions? Well, if you feed a machine a good amount of data, it will learn how to interpret, process and analyze this data by using Machine Learning algorithms, in order to solve real-world problems.
Algorithm: A Machine Learning algorithm is a set of rules and statistical techniques used
to learn patterns from data and draw significant information from it. It is the logic behind
a Machine Learning model. An example of a Machine Learning algorithm is the Linear
Regression algorithm.
Predictor Variable: A feature (or features) of the data that can be used to predict the output.
Response Variable: The feature or output variable that needs to be predicted by using the predictor variable(s).
Training Data: The Machine Learning model is built using the training data. The
training data helps the model to identify key trends and patterns essential to predict the
output.
Testing Data: After the model is trained, it must be tested to evaluate how accurately it
can predict an outcome. This is done by the testing data set.
Once you know the types of data that are required, you must understand how you can derive this data. Data collection can be done manually or by web scraping. However, if you're a beginner just looking to learn Machine Learning, you don't have to worry about collecting the data: there are thousands of data sets available on the web, so you can just download one and get going.
Coming back to the problem at hand, the data needed for weather forecasting includes measures such as humidity level, temperature, pressure, locality, whether or not you live in a hill station, etc. Such data must be collected and stored for analysis.
The data you collect is almost never in the right format. You will encounter a lot of inconsistencies in the data set, such as missing values, redundant variables, duplicate values, etc. Removing such inconsistencies is essential because they might lead to wrongful computations and predictions. Therefore, at this stage, you scan the data set for any inconsistencies and fix them then and there.
Step 4: Exploratory Data Analysis
Grab your detective glasses because this stage is all about diving deep into data and finding
all the hidden data mysteries. EDA or Exploratory Data Analysis is the brainstorming
stage of Machine Learning. Data Exploration involves understanding the patterns and
trends in the data. At this stage, all the useful insights are drawn and correlations
between the variables are understood.
For example, in the case of predicting rainfall, we know that there is a strong possibility
of rain if the temperature has fallen low. Such correlations must be understood and
mapped at this stage.
Step 5: Building a Machine Learning Model
All the insights and patterns derived during Data Exploration are used to build the Machine Learning model. This stage always begins by splitting the data set into two parts: training data and testing data. The training data is used to build and analyze the model. The logic of the model is based on the Machine Learning algorithm that is being implemented.
In the case of predicting rainfall, since the output will be in the form of True (if it will rain
tomorrow) or False (no rain tomorrow), we can use a Classification Algorithm such as
Logistic Regression.
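As a rough illustration of this step, the sketch below fits a Logistic Regression classifier to a tiny, made-up table of weather features; the feature names, values, and labels are all assumptions chosen for illustration, not real forecasting data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: humidity (%), temperature (°C), pressure (hPa)
X = np.array([[85, 22, 1008],
              [40, 35, 1020],
              [90, 18, 1002],
              [30, 33, 1018]])
y = np.array([1, 0, 1, 0])          # 1 = rain tomorrow (True), 0 = no rain (False)

model = LogisticRegression()        # a classification algorithm for True/False outputs
model.fit(X, y)                     # learn the relationship between features and labels
print(model.predict([[80, 20, 1005]]))   # predicted class for a new day
```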
Choosing the right algorithm depends on the type of problem you're trying to solve, the data set, and the level of complexity of the problem. In the upcoming sections, we will discuss the different types of problems that can be solved by using Machine Learning.
Step 6: Model Evaluation & Optimization
After building a model using the training data set, it is finally time to put the model to the test. The testing data set is used to check the efficiency of the model and how accurately it can predict the outcome. Once the accuracy is calculated, any further improvements to the model can be implemented at this stage. Methods like parameter tuning and cross-validation can be used to improve the performance of the model.
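The sketch below shows one common way to do such parameter tuning with cross-validation, using scikit-learn's GridSearchCV on synthetic data; the parameter grid and the data are illustrative assumptions only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic labelled data standing in for a real problem
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Try several values of the regularization parameter C with 5-fold cross-validation
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1, 10]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)   # best parameter and its cross-validated accuracy
```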
Step 7: Predictions
Once the model is evaluated and improved, it is finally used to make predictions. The final output can be a categorical variable (e.g., True or False) or a continuous quantity (e.g., the predicted value of a stock).
In our case, for predicting the occurrence of rainfall, the output will be a categorical
variable.
So that was the entire Machine Learning process. Now it's time to learn about the different ways in which machines can learn.
2) Machine Learning Types
A machine can learn to solve a problem by following any one of the following three
approaches. These are the ways in which a machine can learn:
1) Supervised Learning
2) Unsupervised Learning
3) Reinforcement Learning
Supervised Learning
In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data. Once the training process is completed, the model is tested on test data (a held-out subset of the labelled dataset), and it then predicts the output.
The working of supervised learning can be easily understood by the following example:
Suppose we have a dataset of different types of shapes, including squares, rectangles, triangles, and polygons. The first step is to train the model on each shape:
o If the given shape has four sides, and all the sides are equal, then it will be labelled as a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of the model is to
identify the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of its number of sides and predicts the output.
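A minimal sketch of this shape example, assuming the only feature available is the number of sides (so squares and rectangles are not distinguished), might look like this:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labelled shapes, described only by their number of sides
X = [[4], [3], [6], [4], [3], [6]]
y = ["square", "triangle", "hexagon", "square", "triangle", "hexagon"]

model = DecisionTreeClassifier()
model.fit(X, y)                       # train the model on each shape
print(model.predict([[3], [6]]))      # classify new shapes by their number of sides
```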
Steps Involved in Supervised Learning:
o First, determine the type of training dataset.
o Collect/gather the labelled training data.
o Split the dataset into a training set, a test set, and a validation set (see the sketch after this list).
o Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.
o Determine a suitable algorithm for the model, such as a support vector machine, a decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes we need a validation set to tune the control parameters; it is a subset of the training data.
o Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model is accurate.
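A minimal sketch of the splitting step mentioned above, assuming a 60/20/20 split of a standard labelled dataset (the proportions are an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off 40% of the data, then split that part half-and-half into validation and test sets
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 90, 30, 30 examples
```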
1. Regression
Regression algorithms are used when there is a relationship between the input variables and the output variable and the output to be predicted is a continuous value, as in weather forecasting, market trends, etc.
2. Classification
Classification algorithms are used when the output variable is categorical, which means there are two (or more) classes such as Yes or No, True or False, Spam or Not Spam, etc.
Unsupervised Learning
In unsupervised learning, models are trained on unlabelled data. Once a suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects.
The unsupervised learning algorithm can be further categorized into two types of problems:
o Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in the same group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between the data objects and categorizes them according to the presence or absence of those commonalities (see the clustering sketch after this list).
o Association: An association rule is an unsupervised learning method used to find relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules make marketing strategies more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical example of association rules is Market Basket Analysis.
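A minimal clustering sketch, assuming synthetic unlabelled data and three clusters (both are illustrative assumptions), could look like this:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabelled points; the labels returned by make_blobs are ignored
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)          # each point gets the index of its cluster
print(labels[:10])                      # cluster assignments of the first few points
print(kmeans.cluster_centers_)          # the centre of each discovered group
```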
Reinforcement Learning
"Reinforcement learning is a type of machine learning method where an intelligent
agent (computer program) interacts with the environment and learns to act within
that."
o Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
o In Reinforcement Learning, the agent learns automatically using feedback, without any labelled data, unlike supervised learning.
o Since there is no labelled data, the agent is bound to learn from its experience only.
Example: Suppose there is an AI agent present within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing some actions, and based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.
o The agent keeps doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing so, it learns and explores the environment.
o The agent learns which actions lead to positive feedback (rewards) and which actions lead to negative feedback (penalties). As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point. A minimal sketch of this loop follows below.
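The sketch below is a toy, hypothetical version of this reward/penalty loop: tabular Q-learning on a one-dimensional "maze" of five states where the diamond sits in the last state. The states, actions, and reward values are all assumptions chosen for illustration.

```python
import numpy as np

n_states, n_actions, goal = 5, 2, 4          # states 0..4, actions: 0 = left, 1 = right

def step(state, action):
    """Apply an action and return (next_state, reward)."""
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == goal else -0.01   # positive point at the diamond, small penalty otherwise
    return next_state, reward

Q = np.zeros((n_states, n_actions))          # expected return for each (state, action) pair
alpha, gamma, epsilon = 0.1, 0.9, 0.2        # learning rate, discount factor, exploration rate

for episode in range(500):
    state = 0
    while state != goal:
        # epsilon-greedy: sometimes explore, otherwise exploit the current Q-table
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward = step(state, action)
        # Q-learning update: move Q(s, a) toward reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q)   # after training, "move right" should score higher in every state
```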
As these technologies look similar, many people have the misconception that Deep Learning, Machine Learning, and Artificial Intelligence are all the same. In reality, although all these technologies are used to build intelligent machines or applications that behave like humans, they differ in their functionality and scope.
These three terms are often used interchangeably, but they do not refer to quite the same things. Let's understand the fundamental difference between deep learning, machine learning, and Artificial Intelligence.
Artificial Intelligence is a branch of computer science that helps us create smart, intelligent machines. ML is a subfield of AI that helps teach machines and build AI-driven applications. Deep Learning, in turn, is a sub-branch of ML that trains models on huge amounts of data using complex algorithms and mainly works with neural networks.
What is Artificial Intelligence?
"A computer system able to perform tasks that normally require human intelligence, such as
visual perception, speech recognition, decision-making, and translation between
languages."
What is Deep Learning?
"Deep learning is defined as the subset of machine learning and artificial intelligence that
is based on artificial neural networks". In deep learning, the deep word refers to the
number of layers in a neural network.
Deep Learning is a set of algorithms inspired by the structure and function of the human
brain. It uses a huge amount of structured as well as unstructured data to teach
computers and predicts accurate results. The maindifference between machine learning
and deep learning technologies is of presentation of data. Machine learning uses
structured/unstructured data for learning, while deep learning uses neural networks for
learningmodels.
Main Challenges of Machine Learning
1. Insufficient quantity of training data:
Machine Learning is not quite there yet; it takes a lot of data for most Machine Learning algorithms to work properly. Even for very simple problems you typically need thousands of examples, and for complex problems such as image or speech recognition you may need millions of examples.
2. Poor quality of data:
Obviously, if your training data has lots of errors, outliers, and noise, it will be impossible for your machine learning model to detect the proper underlying pattern, and hence it will not perform well.
So put every ounce of effort into cleaning up your training data. No matter how good you are at selecting and tuning the model, this part plays a major role in building an accurate machine learning model.
“Most Data Scientists spend a significant part of their time simply cleaning up training data.”
If you see that some instances are clear outliers, simply discard them or fix them manually.
If some instances are missing a feature (e.g., 2% of users did not specify their age), you can either ignore those instances, fill the missing values with the median age, or train one model with the feature and one without it and compare.
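A minimal sketch of these two fixes, using pandas on a made-up table (the column names and values are assumptions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 31, np.nan, 42, np.nan, 230],   # 230 is an obvious data-entry outlier
    "bought": [0, 1, 0, 1, 1, 0],
})

# Discard the clear outlier but keep rows whose age is merely missing
df = df[df["age"].isna() | (df["age"] < 120)]

# Fill the missing ages with the median age of the remaining rows
df["age"] = df["age"].fillna(df["age"].median())
print(df)
```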
3. Irrelevant Features:
Remove Garbage Data
Much of the credit for a successful machine learning project goes to coming up with a good set of features to train on (often referred to as feature engineering), which includes feature selection, feature extraction, and creating new features.
4. Non-representative training data:
To make sure that our model generalizes well, we have to make sure that our training data is representative of the new cases that we want to generalize to.
For example, let us say you are trying to build a model that recognizes the genre of a piece of music. One way to build your training set is to search for music on YouTube and use the resulting data. Here we assume that YouTube's search engine provides representative data, but in reality the search will be biased towards popular artists, and maybe even artists that are popular in your location (if you live in India, you will mostly get the music of Arijit Singh, Sonu Nigam, etc.).
So use representative data during training, so that your model won't be biased towards one or two classes when it works on testing data.
5. Overfitting the training data:
Overfitting happens when the model is too complex relative to the amount and noisiness of the training data (a minimal sketch comparing a simple and an overly complex model follows this list). The possible solutions are:
• To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model
• To reduce the noise in the training data (e.g., fix data errors and remove outliers)
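A minimal sketch of overfitting on synthetic data, comparing a simple (degree-1) model with an overly complex (degree-15) polynomial model; the data, degrees, and noise level are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy data that is roughly linear in x
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0, 1, 60)).reshape(-1, 1)
y = 2 * x.ravel() + rng.normal(scale=0.2, size=60)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (1, 15):   # a simple model vs an overly complex one
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(x_train))
    test_err = mean_squared_error(y_test, model.predict(x_test))
    print(degree, train_err, test_err)   # the complex model typically has low train error but higher test error
```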
Tradeoffs in Statistical Learning
The tradeoffs in statistical learning arise mostly from the bias-variance tradeoff, which leads to overfitting and underfitting of data.
Bias: The bias is the difference between the predictions of the Machine Learning model and the correct values. A model with high bias gives a large error on training as well as testing data. It is recommended that an algorithm be low-biased to avoid the problem of underfitting. With high bias, the predictions follow an overly simple (e.g., straight-line) pattern and thus do not fit the data in the data set accurately. Such fitting is known as underfitting of the data.
Variance: The variability of the model's predictions for a given data point, which tells us the spread of our predictions, is called the variance of the model. A model with high variance fits the training data very closely (an overly complex fit) and is thus not able to fit accurately data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data. When a model has high variance, it is said to overfit the data.
Bias Variance Tradeoff
If the algorithm is too simple (a hypothesis with a linear equation), it may have high bias and low variance and thus be error-prone. If the algorithm is too complex (a hypothesis with a high-degree equation), it may have high variance and low bias; in this case it will not perform well on new entries. There is a sweet spot between these two conditions, known as the Trade-off or Bias-Variance Trade-off. This tradeoff in complexity is why there is a tradeoff between bias and variance: an algorithm cannot be more complex and less complex at the same time. On an error-versus-complexity graph, the perfect tradeoff lies at the point where the total error is lowest.
We try to minimize the total error of the model by balancing bias and variance. The best fit is given by the hypothesis at the tradeoff point on the error-versus-complexity curve. This point is the best choice for training the algorithm, as it gives low error on training as well as testing data.
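For reference, the standard decomposition of this total (expected) error at a point x, written in common textbook notation (not taken verbatim from this text), is:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathrm{Bias}[\hat{f}(x)]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{Var}[\hat{f}(x)]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```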
Sampling Distribution:
The sampling distribution of an estimator is a theoretical probability distribution that
shows the possible values that the estimator can take when calculated from different
random samples of the same size from the population.
Example:
Suppose a researcher wants to analyze the number of teens (aged 13-18) riding a bicycle in two regions.
o Instead of surveying every individual aged 13-18 in the two regions, she randomly selects 200 samples from each area.
o Here, the average count of bicycle usage in a sample is the sample mean.
o Each chosen sample has its own mean, and the distribution of these sample means is the sampling distribution.
o The spread (standard deviation) of this distribution is termed the standard error.
o She plots the sample means gathered from the samples on a graph to get a clear view of the finite-sample distribution (a small simulation of this idea follows below).
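A small simulation of this idea, with a made-up population (the distribution, sample size, and number of samples are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.poisson(lam=3.0, size=100_000)   # hypothetical "bicycle rides per teen"

# Draw 1,000 random samples of size 200 and record the mean of each
sample_means = [rng.choice(population, size=200, replace=False).mean()
                for _ in range(1_000)]

print(np.mean(sample_means))   # close to the population mean
print(np.std(sample_means))    # the standard error of the sample mean
```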
The Empirical Risk Minimization (ERM) principle is a learning paradigm which consists
in selecting the model with minimal average error over the training set. This so-called
training error can be seen as an estimate of the risk (due to the law of large numbers),
hence the alternative name of empirical risk.
By minimizing the empirical risk, we hope to obtain a model with a low value of the true risk. The larger the training set, the closer the empirical risk is to the true risk.
Let p(x,y) be the probability density of the distribution from which training samples are drawn.
Trained Machine: a machine for which f(·) has been selected; given an input x, it produces the output ŷ = f(x):
x → f(·) → ŷ
There may be multiple options available for f(·). We need to select the optimal f(·), the one that minimizes the loss in the predictions.
Loss function L(y, f(x,w)): measures the error between the actual output y and the predicted output ŷ = f(x,w).
Risk function R(w): the risk (expected loss) associated with the decision f(x,w), i.e. R(w) = ∫ L(y, f(x,w)) p(x,y) dx dy.
Our goal is to train a machine with decision function f(x,w) against p(x,y) that minimizes the risk function R(w).
Here the issue is that the joint probability density function p(x,y) is not known explicitly. The true risk function R(w) is therefore approximated by the empirical risk function Remp(w) computed from the training samples {xi, yi}:
Remp(w) = (1/N) Σ_{i=1}^{N} L(yi, f(xi, w))
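A minimal sketch of empirical risk minimization, assuming a squared-error loss and a linear decision function f(x, w) = w0 + w1·x (both choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 3 * x + 1 + rng.normal(scale=0.1, size=100)   # hypothetical training samples {x_i, y_i}

def empirical_risk(w, x, y):
    """Remp(w) = (1/N) * sum_i L(y_i, f(x_i, w)) with squared-error loss."""
    predictions = w[0] + w[1] * x
    return np.mean((y - predictions) ** 2)

# For squared loss and a linear f, minimizing Remp over w is ordinary least squares
design = np.column_stack([np.ones_like(x), x])
w_opt, *_ = np.linalg.lstsq(design, y, rcond=None)
print(w_opt, empirical_risk(w_opt, x, y))
```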