
Introduction to ML

Uploaded by

priyankabhatele

Unit -1

Machine learning (ML):


It is the scientific study of algorithms and statistical models that computer systems use to
perform a specific task without using explicit instructions, relying on patterns
and inference instead.
It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical
model based on sample data, known as "training data", in order to make predictions or decisions
without being explicitly programmed to perform the task. [1][2]:2 Machine learning algorithms are
used in a wide variety of applications, such as email filtering and computer vision, where it is
difficult or infeasible to develop a conventional algorithm for effectively performing the task.
Although a machine learning model may apply a mix of different techniques, the
methods for learning can typically be categorized as three general types:

 Supervised learning: The learning algorithm is given labeled data and the
desired output. For example, pictures of dogs labeled “dog” will help the
algorithm identify the rules to classify pictures of dogs.
 Unsupervised learning: The data given to the learning algorithm is
unlabeled, and the algorithm is asked to identify patterns in the input data. For
example, the recommendation system of an e-commerce website where the
learning algorithm discovers similar items often bought together.
 Reinforcement learning: The algorithm interacts with a dynamic
environment that provides feedback in terms of rewards and punishments. For
example, self-driving cars being rewarded for staying on the road.
Supervised Learning
 Supervised learning performs function approximation: we train an algorithm and, at the end
of the process, pick the function that best describes the input data, the one that for a
given X makes the best estimate of y (X -> y). Most of the time we cannot find the true
function that always makes correct predictions. One reason is that the algorithm relies on
assumptions made by humans about how the computer should learn, and these assumptions
introduce a bias.

 Here the input dataset acts as a teacher: we feed the computer training data containing
the inputs (predictors) and show it the correct answers (the output, or label, for each
input). From the training dataset, the model learns the mapping function between the
input predictors and the output variable.

 Supervised learning algorithms try to model the relationships and dependencies between the
target output and the input features, so that we can predict output values for new data
based on the relationships learned from previous data sets.

 Supervised learning models are predictive models. They predict either the value of a
continuous variable (such as temperature or stock price), which we call regression, or a
class (for example, whether an input image shows a dog or a cat), which we call classification.

List of Common Algorithms of Supervised learning


 Nearest Neighbor
 Naive Bayes
 Decision Trees
 Linear Regression
 Support Vector Machines (SVM)
 Neural Networks

Classification and Regression in supervised Learning:


Classification algorithms and regression algorithms are types of supervised learning.
Classification algorithms are used when the value of the output variable is restricted to a limited
set of values i.e. class numbers. For a classification algorithm that filters emails, the input would
be an incoming email, and the output would be the name of the folder in which to file the email.
For an algorithm that identifies spam emails, the output would be the prediction of either "spam"
or "not spam", represented by the Boolean values true and false.
Regression algorithms are named for their continuous outputs, meaning they may have any value
within a range. Examples of a continuous value are the temperature, length, or price of an object.

In the case of semi-supervised learning algorithms, some of the training examples are missing
training labels, but they can nevertheless be used to improve the quality of a model. In weakly
supervised learning, the training labels are noisy, limited, or imprecise; however, these labels are
often cheaper to obtain, resulting in larger effective training sets.

Unsupervised learning
Unsupervised learning algorithms take a set of data that contains only inputs, and find structure
in the data, like grouping or clustering of data points. The algorithms, therefore, learn from
data that has not been labeled, classified or categorized. Instead of responding to feedback,
unsupervised learning algorithms identify commonalities in the data and react based on the
presence or absence of such commonalities in each new piece of data. A central application of
unsupervised learning is in the field of density estimation in statistics, though unsupervised
learning encompasses other domains involving summarizing and explaining data features.
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that
observations within the same cluster are similar according to one or more predesignated criteria,
while observations drawn from different clusters are dissimilar. Different clustering techniques
make different assumptions on the structure of the data, often defined by some similarity
metric and evaluated, for example, by internal compactness, or the similarity between members
of the same cluster, and separation, the difference between clusters. Other methods are based
on estimated density and graph connectivity.
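
As an illustration of cluster analysis, the sketch below uses k-means, which is one possible clustering technique and is chosen here as an assumption (the text above does not name a specific method). The one-dimensional data points and the starting centers are made up.

```python
# Minimal 1-D k-means sketch (k-means is an assumed choice of clustering method).

def k_means_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-centering."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to the cluster with the closest center.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]   # made-up observations
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
print(centers)   # [1.5, 9.5]
print(clusters)  # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```

Here similarity is measured by distance to the cluster center, so the two groups end up compact internally and well separated from each other.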

Semi-supervised learning
Semi-supervised learning falls between unsupervised learning (without any labeled training data)
and supervised learning (with completely labeled training data). Many machine-learning
researchers have found that unlabeled data, when used in conjunction with a small amount of
labeled data, can produce a considerable improvement in learning accuracy.

Reinforcement learning
Reinforcement learning is an area of machine learning concerned with how software
agents ought to take actions in an environment so as to maximize some notion of cumulative
reward. Due to its generality, the field is studied in many other disciplines, such as game
theory, control theory, operations research, information theory, simulation-based
optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In
machine learning, the environment is typically represented as a Markov Decision
Process (MDP). Many reinforcement learning algorithms use dynamic
programming techniques. Reinforcement learning algorithms do not assume knowledge of an
exact mathematical model of the MDP, and are used when exact models are infeasible.
Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a game
against a human opponent.
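
The reward-driven loop can be sketched with tabular Q-learning, a common reinforcement learning algorithm used here as an assumed example. The corridor environment, its reward, and the values of the learning rate and discount factor are all invented for illustration. The agent never reads the transition rules directly; it only observes the next state and the reward.

```python
import random

# Tabular Q-learning on a made-up 5-state corridor (all numbers are assumptions).
# States 0..4; entering state 4 gives reward 1 and ends the episode.
N_STATES, GOAL = 5, 4
ACTIONS = ("left", "right")
ALPHA, GAMMA = 0.5, 0.9          # learning rate and discount factor

def step(state, action):
    """Environment dynamics: the agent only sees the result, not this model."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

rng = random.Random(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):                       # episodes
    state = 0
    for _ in range(100):                   # cap on steps per episode
        action = rng.choice(ACTIONS)       # purely random exploration
        nxt, reward = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # Move the estimate toward reward + discounted best future value.
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = nxt
        if state == GOAL:
            break

# Greedy policy: in every non-terminal state pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # ['right', 'right', 'right', 'right']
```

The learned values decrease with distance from the goal, which is how the cumulative (discounted) reward shapes the policy.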
Application of Machine Learning:

Limitations of Machine Learning:

Lack of Data : Many machine learning algorithms require large amounts of data
before they begin to give useful results. A good example of this is a neural
network. Neural networks are data-eating machines that require copious amounts
of training data. The larger the architecture, the more data is needed to produce
viable results. Reusing data is a bad idea, and data augmentation is useful to some
extent, but having more data is always the preferred solution. If you can get the
data, then use it.

Lack of Good Data: Despite the appearance, this is not the same as the above
comment. Let’s imagine you think you can cheat by generating ten thousand fake
data points to put in your neural network. What happens when you put it in?

1. It will train itself, and then when you come to test it on an unseen data set, it
will not perform well. You had the data but the quality of the data was not up
to scratch.
2. In the same way that having a lack of good features can cause your algorithm
to perform poorly, having a lack of good ground truth data can also limit the
capabilities of your model. No company is going to implement a machine
learning model that performs worse than human-level error.
3. Similarly, applying a model that was trained on a set of data in one situation
may not necessarily apply as well to a second situation. The best example of
this I have found so far is in breast cancer prediction.
4. Mammography databases have a lot of images in them, but they suffer from
one problem that has caused significant issues in recent years — almost all of
the x-rays are from white women. This may not sound like a big deal, but
actually, black women have been shown to be 42 percent more likely to die
from breast cancer due to a wide range of factors that may include
differences in detection and access to health care. Thus, training an algorithm
primarily on white women adversely impacts black women in this case.
5. What is needed in this specific case is a larger number of x-rays of black
patients in the training database, more features relevant to the cause of this 42
percent increased likelihood, and for the algorithm to be more equitable by
stratifying the dataset along the relevant axes.

Data Augmentation

Data augmentation is a method by which you can virtually increase the number of samples in
your dataset using data you already have. For image augmentation, it can be achieved
by performing geometric transformations, changes to color, brightness, contrast or by adding
some noise. Currently there are ongoing studies on interesting new methods in data
augmentation using Generative Adversarial Networks or by pairing samples.

Data Augmentation in image Processing:

 Position augmentation
     Scaling
     Cropping
     Flipping
     Padding
     Rotation
     Translation
     Affine transformation

 Color augmentation
     Brightness
     Contrast
     Saturation
     Hue

Scaling
In scaling or resizing, the image is resized to the given size e.g. the width of the image can be
doubled.

Cropping
In cropping, a portion of the image is selected, e.g. a center crop returns the central region
of the image.

Flipping
In flipping, the image is flipped horizontally or vertically.

Padding
In padding, the image is padded with a given value on all sides.

Rotation
In rotation, the image is rotated by a randomly chosen angle.

Translation
In translation, the image is moved either along the x-axis or y-axis.
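
The position augmentations described above can be sketched on a tiny grayscale image stored as a list of rows. This is only an illustration under simplifying assumptions: the pixel values and function names are made up, scaling uses pixel repetition, and a fixed 90-degree rotation stands in for a random angle. Real pipelines typically rely on an image library.

```python
# Position augmentations on a tiny grayscale "image" stored as a list of rows.
# The 3x3 pixel values are made up for illustration.

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

def scale_width_2x(img):
    """Double the width by repeating every pixel (nearest-neighbor scaling)."""
    return [[p for p in row for _ in range(2)] for row in img]

def center_crop(img, size):
    """Keep the central size x size region."""
    top = (len(img) - size) // 2
    left = (len(img[0]) - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def pad(img, width, value=0):
    """Surround the image with `width` pixels of `value` on all sides."""
    new_width = len(img[0]) + 2 * width
    top = [[value] * new_width for _ in range(width)]
    bottom = [[value] * new_width for _ in range(width)]
    middle = [[value] * width + row + [value] * width for row in img]
    return top + middle + bottom

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def translate_x(img, shift, fill=0):
    """Shift the image `shift` pixels to the right, filling the gap with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in img]

print(scale_width_2x(image)[0])   # [1, 1, 2, 2, 3, 3]
print(center_crop(image, 1))      # [[5]]
print(flip_horizontal(image))     # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(pad(image, 1)[1])           # [0, 1, 2, 3, 0]
print(rotate_90(image))           # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
print(translate_x(image, 1))      # [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
```

Each function returns a new image, so several augmented samples can be derived from one original.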

Color augmentation
Color augmentation or color jittering deals with altering the color properties of an image by changing
its pixel values.

Brightness
One way to augment is to change the brightness of the image. The resultant image becomes darker
or lighter compared to the original one.

Contrast
The contrast is defined as the degree of separation between the darkest and brightest areas of an
image. The contrast of the image can also be changed.

Saturation
Saturation is the intensity or purity of the colors in an image. Lowering the saturation moves
the image toward grayscale.

Hue
Hue can be described as the shade of the colors in an image.
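
The four color augmentations can be sketched on a single RGB pixel. The sketch assumes channel values in the range 0 to 1 and implements contrast as scaling around mid-gray (0.5), which is one common convention rather than the only one. The function names are made up; the hue and saturation changes use Python's standard `colorsys` module.

```python
import colorsys

def clamp(x):
    """Keep a channel value inside the valid [0, 1] range."""
    return min(1.0, max(0.0, x))

def adjust_brightness(pixel, factor):
    """Scale every channel: factor > 1 brightens, factor < 1 darkens."""
    return tuple(clamp(c * factor) for c in pixel)

def adjust_contrast(pixel, factor, midpoint=0.5):
    """Push channel values away from (factor > 1) or toward (factor < 1) mid-gray."""
    return tuple(clamp(midpoint + (c - midpoint) * factor) for c in pixel)

def adjust_saturation(pixel, factor):
    """Scale the saturation component in HSV space; factor 0 removes all color."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    return colorsys.hsv_to_rgb(h, clamp(s * factor), v)

def shift_hue(pixel, delta):
    """Rotate the hue component in HSV space; delta is a fraction of a full turn."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    return colorsys.hsv_to_rgb((h + delta) % 1.0, s, v)

red = (1.0, 0.0, 0.0)
print(adjust_brightness(red, 0.5))               # (0.5, 0.0, 0.0)
print(adjust_contrast((0.75, 0.5, 0.25), 2.0))   # (1.0, 0.5, 0.0)
print(adjust_saturation(red, 0.0))               # (1.0, 1.0, 1.0)
print(shift_hue(red, 0.5))                       # (0.0, 1.0, 1.0)
```

Applying the same function to every pixel of an image gives the augmented image.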

Topic: Eigenvectors and Eigenvalues
Eigenvectors and eigenvalues have many important applications in computer vision and machine
learning in general. Well known examples are PCA (Principal Component Analysis) for
dimensionality reduction or EigenFaces for face recognition. Eigenvectors and eigenvalues are
also used to construct error ellipses. Furthermore, eigendecomposition forms the basis of the
geometric interpretation of covariance matrices. This section provides a gentle introduction to
this mathematical concept and shows how to manually obtain the eigendecomposition of a 2D
square matrix.

An eigenvector is a vector whose direction remains unchanged when a linear transformation is
applied to it. Consider the image below in which three vectors are shown. The green square is only
drawn to illustrate the linear transformation that is applied to each of these three vectors.

Eigenvectors (red) do not change direction when a linear transformation (e.g. scaling) is applied to
them. Other vectors (yellow) do.

The transformation in this case is a simple scaling with factor 2 in the horizontal direction and factor
0.5 in the vertical direction, such that the transformation matrix $A$ is defined as:

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}$$

A vector $\vec{v} = (x, y)$ is then scaled by applying this transformation as $\vec{v}' = A\vec{v}$. The above
figure shows that the direction of some vectors (shown in red) is not affected by this linear
transformation. These vectors are called eigenvectors of the transformation. Together with their
eigenvalues, they characterize the square matrix $A$. This characteristic relation is the reason that
those vectors are called ‘eigenvectors’ (‘eigen’ means ‘own’ or ‘characteristic’ in German).
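
The claim can be checked numerically for the scaling described above (factor 2 horizontally, factor 0.5 vertically). The helper names `apply` and `is_parallel` are made up for this sketch, and the three test vectors are arbitrary choices.

```python
def apply(matrix, vector):
    """Multiply a 2x2 matrix by a 2-D vector."""
    (a, b), (c, d) = matrix
    x, y = vector
    return (a * x + b * y, c * x + d * y)

def is_parallel(u, v):
    """Two 2-D vectors are parallel when their cross product is zero."""
    return abs(u[0] * v[1] - u[1] * v[0]) < 1e-12

A = ((2.0, 0.0), (0.0, 0.5))   # scale x by 2 and y by 0.5

for v in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
    transformed = apply(A, v)
    print(v, "->", transformed, "direction kept:", is_parallel(v, transformed))
# (1.0, 0.0) -> (2.0, 0.0) direction kept: True
# (0.0, 1.0) -> (0.0, 0.5) direction kept: True
# (1.0, 1.0) -> (2.0, 0.5) direction kept: False
```

The horizontal and vertical vectors are only stretched or shrunk, so they are eigenvectors of this transformation; the diagonal vector changes direction and is not.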

In general, the eigenvector $\vec{v}$ of a matrix $A$ is the vector for which the following holds:

$$A\vec{v} = \lambda\vec{v} \qquad (1)$$

where $\lambda$ is a scalar value called the ‘eigenvalue’. This means that the linear transformation $A$ on
vector $\vec{v}$ is completely defined by $\lambda$.

We can rewrite equation (1) as follows:

$$A\vec{v} - \lambda\vec{v} = 0 \;\Rightarrow\; (A - \lambda I)\vec{v} = 0 \qquad (2)$$

where $I$ is the identity matrix of the same dimensions as $A$.

However, assuming that $\vec{v}$ is not the null-vector, equation (2) can only hold if $A - \lambda I$ is
not invertible. If a square matrix is not invertible, that means that its determinant must equal zero.
Therefore, to find the eigenvalues of $A$ (and from them the eigenvectors), we simply have to solve the
following equation:

$$\det(A - \lambda I) = 0 \qquad (3)$$

In the following sections we will determine the eigenvectors and eigenvalues of a matrix $A$, by
solving equation (3). Matrix $A$ in this example is defined by:

Calculating the eigenvalues


To determine the eigenvalues for this example, we substitute $A$ in equation (3) by equation (4) and
obtain:

Calculating the determinant gives:

(6)

To solve this quadratic equation in $\lambda$, we find the discriminant:

Since the discriminant is strictly positive, this means that two different values for $\lambda$ exist:

We have now determined the two eigenvalues $\lambda_1$ and $\lambda_2$. Note that a square matrix of
size $N \times N$ always has exactly $N$ eigenvalues (counted with multiplicity, and possibly complex),
each with a corresponding eigenvector. The eigenvalue specifies the factor by which the
eigenvector is scaled.

Calculating the first eigenvector


We can now determine the eigenvectors by plugging the eigenvalues from equation (7) into equation
(1) that originally defined the problem. The eigenvectors are then found by solving this system of
equations.

We first do this for eigenvalue $\lambda_1$, in order to find the corresponding first eigenvector:

Since this is simply the matrix notation for a system of equations, we can write it in its equivalent
form:

and solve the first equation for one component of the eigenvector in terms of the other, resulting in:

Since an eigenvector simply represents an orientation (the corresponding eigenvalue represents the
magnitude), all scalar multiples of the eigenvector are vectors that are parallel to this eigenvector,
and are therefore equivalent (if we normalized the vectors, they would all be equal up to sign). Thus,
instead of further solving the above system of equations, we can freely choose a nonzero real value for
either component, and determine the other one by using equation (9).

For this example, we arbitrarily choose a value for one component, which fixes the other. Therefore, the
eigenvector that corresponds to eigenvalue $\lambda_1$ is

Calculating the second eigenvector

Calculations for the second eigenvector are similar to those needed for the first eigenvector.
We now substitute eigenvalue $\lambda_2$ into equation (1), yielding:

Written as a system of equations, this is equivalent to:

Solving the first equation for one component in terms of the other results in:

We then arbitrarily choose a value for one component and find the other. Therefore, the eigenvector that
corresponds to eigenvalue $\lambda_2$ is
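
The whole procedure (characteristic equation, discriminant, then one eigenvector per eigenvalue) can be sketched in code. The matrix [[2, 3], [2, 1]] used below is an assumed example chosen for illustration and is not necessarily the matrix of equation (4). The function name `eigen_2x2` is made up, and the sketch only covers the case of two distinct real eigenvalues with a nonzero top-right entry.

```python
import math

def eigen_2x2(a, b, c, d):
    """Eigenvalues and eigenvectors of [[a, b], [c, d]].

    Sketch only: assumes two distinct real eigenvalues and b != 0.
    """
    # det(A - lambda*I) = lambda^2 - (a + d)*lambda + (a*d - b*c) = 0
    trace, det = a + d, a * d - b * c
    discriminant = trace ** 2 - 4 * det
    if discriminant <= 0 or b == 0:
        raise ValueError("sketch only handles distinct real eigenvalues with b != 0")
    root = math.sqrt(discriminant)
    eigenvalues = ((trace - root) / 2, (trace + root) / 2)
    # First row of (A - lambda*I) v = 0 reads (a - lambda)*x + b*y = 0.
    # Choosing x = b (an arbitrary nonzero value) gives y = lambda - a.
    eigenvectors = [(b, lam - a) for lam in eigenvalues]
    return eigenvalues, eigenvectors

values, vectors = eigen_2x2(2, 3, 2, 1)   # assumed example matrix [[2, 3], [2, 1]]
print(values)   # (-1.0, 4.0)
print(vectors)  # [(3, -3.0), (3, 2.0)]
```

Any scalar multiple of a returned vector is an equally valid eigenvector, so (3, -3) describes the same direction as (-1, 1).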
Topic : Gradient Descent Based Linear Regression

It is a kind of supervised learning that can be used to predict the value of a
continuous variable such as temperature, pressure, or stock price.

The training dataset is divided into two parts: the set of independent variables (the
input features) and the dependent variable that is to be predicted. For example, in a
dataset for mobile phone price prediction, the input feature set is CPU speed, RAM,
camera pixels, and battery, and the output is the price, which depends on the input
feature set.

Error Function

Steps to calculate Linear Function

1. Assume random values for m and b (slope and intercept)
2. For each iteration (epoch) i, repeat steps 3 to 6
3. Evaluate the gradients (Gm, Gb) for m and b using the error from each sample
according to eq.1
4. Update m and b
5. m = m - (learning rate * Gm)
6. b = b - (learning rate * Gb)
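
The steps above can be sketched as follows. Since the error function itself is not shown here, the sketch assumes mean squared error, E = (1/n) * sum((y - (m*x + b))^2), which is a common choice. The starting values are fixed at zero instead of random so that the run is reproducible, and the data, learning rate, and epoch count are made up.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = m*x + b by gradient descent on mean squared error (assumed error function)."""
    m, b = 0.0, 0.0                      # step 1: starting values (fixed here, not random)
    n = len(xs)
    for _ in range(epochs):              # step 2: repeat for each epoch
        # step 3: gradients of (1/n) * sum((y - (m*x + b))^2) with respect to m and b
        errors = [y - (m * x + b) for x, y in zip(xs, ys)]
        gm = (-2 / n) * sum(x * e for x, e in zip(xs, errors))
        gb = (-2 / n) * sum(errors)
        # steps 4 to 6: move m and b against their gradients
        m = m - learning_rate * gm
        b = b - learning_rate * gb
    return m, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]                # generated from y = 2x + 1
m, b = gradient_descent(xs, ys)
print(round(m, 3), round(b, 3))          # 2.0 1.0
```

If the learning rate is too large the updates overshoot and the error grows instead of shrinking, so in practice it is tuned for the data at hand.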
