Module 4

Uploaded by

sushma-icb
Copyright
© © All Rights Reserved

Module 4

Machine learning (ML)

● Machine learning is now widely adopted, with businesses around the world applying it to their
operations. It has advanced to the point where a model can predict outcomes without being explicitly
programmed with rules for the task.

● This field of study uses data and algorithms to mimic human learning, allowing machines to improve over time, becoming
increasingly accurate when making predictions or classifications or uncovering data-driven insights.
● Machine learning, as the name suggests, gives machines the ability to learn autonomously
from experience, observation and patterns in a given data set, without being explicitly
programmed.

● When we write a program for a specific purpose, we write a definite set of
instructions that the machine follows.

● In machine learning, by contrast, we supply a data set and the machine learns by identifying and
analysing the patterns in it. The machine then makes decisions autonomously based on
what it has learned from the data set.
● Machine learning plays an important role in enterprises because it reduces manual
effort. The model initially learns with human help, but eventually it takes over
the learnt task.

● A minimum level of human intervention is still needed to make sure that no machine-related glitch arises and to
update the input data.

● Leading companies such as Google, Amazon, Facebook and Tesla already use these
technologies, and machine learning is becoming a core part of their operations.
Components of machine learning

Every machine learning algorithm has three components:

● Representation
● Evaluation
● Optimization
Representation
● Representation in machine learning refers to how a model is structured so that a computer can understand and
use it.

● Different types of models, like decision trees, support vector machines (SVMs), and neural networks, each have their own way of organizing
data and making predictions.

● Imagine you're building a house. The blueprint you choose (decision tree, SVM, neural network) determines the layout and design
possibilities (classifiers) you can build. Each type of blueprint (representation) has its strengths and limitations. The range of all possible
designs you can create based on that blueprint is called the hypothesis space. It's like the total set of ideas or designs your model can come up
with.

● So, in essence, the representation you choose for your machine learning model sets the boundaries for what kinds of classifiers (models) it can
learn. Different representations offer different strengths and ways of understanding data, influencing how well your model can solve the
problem at hand.
Evaluation
● When we talk about evaluating how well a machine learning model is performing, we use
evaluation functions or metrics. These are essentially tools that measure different aspects of
the model's predictions compared to the actual outcomes.

● There needs to be a function that measures the performance to know which classifiers are
good and which are bad. This is where the evaluation function comes into play.

● Some examples are accuracy, error rate, precision, recall, F-score, squared error, and
information gain. These functions are also referred to as the objective function or scoring
function.
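As a rough illustration of the evaluation functions listed above, the sketch below computes accuracy, error rate and mean squared error in plain Python. The function names and the sample labels are made up for this example and do not come from any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual outcomes."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of predictions that do not match the actual outcomes."""
    return 1.0 - accuracy(y_true, y_pred)


def mean_squared_error(y_true, y_pred):
    """Average squared difference, used for numeric predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical class labels (1 = positive, 0 = negative).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Hypothetical numeric targets for the squared-error example.
actual_values = [3.0, 5.0, 2.0]
predicted_values = [2.0, 5.0, 4.0]

print("accuracy:", accuracy(actual, predicted))
print("error rate:", error_rate(actual, predicted))
print("squared error:", mean_squared_error(actual_values, predicted_values))
```

Six of the eight sample predictions match, so the accuracy is 0.75 and the error rate is 0.25.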
Optimization
● Machine learning models can usually be configured or "tuned" in many different ways. These
configurations are like different settings or choices the model can make to try to solve a
problem, and there are often far too many of them to test one by one.

● So, instead of trying every single possibility, we use optimization methods. These methods
are like smart strategies that help the model find the best settings more efficiently.

● Choosing the right optimization method is crucial because it determines how quickly and
effectively the model learns from data. Think of it like finding the best route to a destination:
you could wander randomly or use a map and GPS to find the fastest way.
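A minimal sketch of how the three components fit together, under assumptions chosen only for illustration: the representation is a one-parameter linear model y = w * x, the evaluation function is mean squared error, and the optimization method is plain gradient descent. The data, the learning rate and the number of steps are all hypothetical.

```python
# Toy data generated by y = 2x, so the best setting is w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mse(w):
    """Evaluation: mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Optimization: start from an arbitrary setting and follow the slope
# downhill instead of testing every possible value of w.
w = 0.0
learning_rate = 0.05  # assumed step size, small enough to converge here
for step in range(100):
    w -= learning_rate * mse_gradient(w)

print("learned w:", round(w, 4))
print("final error:", mse(w))
```

Each step moves w towards the value that lowers the error, which is the "map and GPS" route rather than random wandering.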
In machine learning projects, we generally divide the original dataset into training data
and test data. We train the model on one subset of the original dataset, the training
dataset, and then evaluate whether it generalizes well to new or unseen data, the
test dataset.
The training data is the largest subset of the original dataset and is used to train or fit the machine
learning model. The training data is fed to the ML algorithm first, which lets it learn how to make
predictions for the given task.

Once the model has been trained with the training dataset, we test it with the test dataset. This dataset
is used to evaluate the model's performance and to check that it generalizes well to new or unseen
data. The test dataset is another subset of the original data, independent of the training dataset.
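The split described above can be sketched in a few lines of plain Python. The helper name `train_test_split`, the 80/20 ratio and the fixed seed are assumptions made for this example; the source does not prescribe a ratio.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(records)    # copy, so the original order is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))          # stand-in for ten labelled examples
train, test = train_test_split(data, test_fraction=0.2)

print("training set size:", len(train))
print("test set size:", len(test))
```

The two subsets never share a record, which is what keeps the test set independent of the training set.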
Generalization
● Real-world data is inherently complex, encompassing variations, noise, and unpredictable
factors. In the realm of machine learning and data science, the ultimate objective is to develop
models capable of delivering accurate predictions and valuable insights when confronted with
new and unseen data.

● Generalization in machine learning refers to the ability of a trained model to accurately make
predictions on new, unseen data. Generalization is important because the true test of a model's
effectiveness is not how well it performs on the training data, but rather how well it generalizes
to new and unseen data.
A spam email classifier is a good example of generalization in machine learning. Suppose you have a training
dataset containing emails labeled as either spam or not spam, and your goal is to build a model that can accurately
classify incoming emails as spam or legitimate based on their content. The model generalizes well if it also
classifies correctly the incoming emails that were not part of the training dataset.
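The spam example can be made concrete with a deliberately simple sketch. The emails, the labels and the word-matching rule below are all invented for illustration; a real spam filter would use a proper learning algorithm.

```python
def words(text):
    """Lower-cased set of words in an email."""
    return set(text.lower().split())


def learn_spam_words(labeled_emails):
    """Collect words that appear in spam but never in legitimate emails."""
    spam_words, ham_words = set(), set()
    for text, is_spam in labeled_emails:
        (spam_words if is_spam else ham_words).update(words(text))
    return spam_words - ham_words


def predict(spam_only_words, text):
    """Flag an email as spam if it contains any spam-only word."""
    return bool(words(text) & spam_only_words)


def accuracy(spam_only_words, labeled_emails):
    correct = sum(
        1 for text, is_spam in labeled_emails
        if predict(spam_only_words, text) == is_spam
    )
    return correct / len(labeled_emails)


# Hypothetical labelled emails: (content, is_spam).
training_emails = [
    ("win free prize now", True),
    ("free money offer", True),
    ("meeting at noon", False),
    ("project report attached", False),
]
unseen_emails = [
    ("claim your free prize", True),
    ("lunch meeting tomorrow", False),
    ("report for the project", False),
    ("exclusive offer inside", True),
    ("cheap pills online", True),   # contains no word seen in training spam
]

model = learn_spam_words(training_emails)
train_accuracy = accuracy(model, training_emails)
unseen_accuracy = accuracy(model, unseen_emails)
print("accuracy on training emails:", train_accuracy)
print("accuracy on unseen emails:", unseen_accuracy)
```

The toy model is perfect on the emails it was trained on but misses one unseen spam email, so the second figure is the one that measures generalization.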
Feature Engineering
If we train machine learning models using irrelevant data, even the best machine learning
algorithms won't help much. Conversely, using well-engineered, meaningful features can
achieve superior performance even with a simple machine learning algorithm.

Feature engineering is especially important when working with traditional
machine learning algorithms, such as regressions, decision trees, support vector machines,
and others that require numeric inputs.
We can divide feature engineering into two components: 1) creating new features
and 2) processing these features to make them work optimally with the machine
learning algorithm under consideration.

Feature engineering is the process of extracting and organizing the important
features from raw data in such a way that it fits the purpose of the machine
learning model. It can be thought of as the art of selecting the important features
and transforming them into refined and meaningful features that suit the needs of
the model.
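The two components of feature engineering can be sketched as follows. The raw records, the derived `area` feature and the choice of min-max scaling are assumptions for this example only; other features and other scaling methods are equally valid.

```python
# Hypothetical raw data: plot dimensions in metres.
raw_records = [
    {"length_m": 10.0, "width_m": 5.0},
    {"length_m": 20.0, "width_m": 10.0},
    {"length_m": 15.0, "width_m": 4.0},
]

# Component 1: create a new feature from the existing ones.
areas = [r["length_m"] * r["width_m"] for r in raw_records]


# Component 2: process the feature so it suits the algorithm,
# here by rescaling it to the range [0, 1].
def min_max_scale(values):
    lo, hi = min(values), max(values)
    if hi == lo:                # all values equal: nothing to spread out
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled_areas = min_max_scale(areas)

print("areas:", areas)
print("scaled areas:", scaled_areas)
```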
Validation Methods
● Following the principle of mistrust, no model is considered acceptable until it
has been tested against data it has not seen before. This process is called
validation.

● Example: cross-validation, for instance k-fold validation.

● The caret package in R makes it easy to incorporate cross-validation into your
ML process. The ML process is commonly referred to as a pipeline.
K-fold validation
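Dataset (12 values): 53 62 47 50 36 21 25 28 60 32 10 9

k = 4
Fold size = 12 (total dataset) / 4 (k) = 3

Each fold of 3 values is held out once for testing while the model is trained on the
remaining 9 values, which gives four outcomes o1, o2, o3 and o4.
Overall performance = (o1 + o2 + o3 + o4) / k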
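The k-fold procedure for the 12 values above can be sketched in plain Python. The source does not say what model or score produces each outcome, so the `fold_outcome` function below is a hypothetical stand-in: it predicts the mean of the training values and reports the mean absolute error on the held-out fold.

```python
values = [53, 62, 47, 50, 36, 21, 25, 28, 60, 32, 10, 9]
k = 4


def k_fold_splits(data, k):
    """Yield (training, held_out) pairs; assumes len(data) is divisible by k."""
    fold_size = len(data) // k
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        held_out = data[start:stop]
        training = data[:start] + data[stop:]
        yield training, held_out


def fold_outcome(training, held_out):
    """Hypothetical score: error of predicting the training mean."""
    prediction = sum(training) / len(training)
    return sum(abs(v - prediction) for v in held_out) / len(held_out)


splits = list(k_fold_splits(values, k))
outcomes = [fold_outcome(training, held_out) for training, held_out in splits]
average_outcome = sum(outcomes) / k   # (o1 + o2 + o3 + o4) / k

for i, (training, held_out) in enumerate(splits, start=1):
    print(f"fold {i}: held out {held_out}, trained on {len(training)} values")
print("average outcome:", average_outcome)
```

Every value is held out exactly once, so the averaged outcome reflects performance on data the model did not see during training.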
Performance metrics
1) Confusion Matrix
Precision, recall, and specificity
True Positive: If a person actually has a disease and the model accurately predicts that they have the disease, then it is
called a true positive. (0)

True Negative: If a person does not have the disease and the model predicts “no,” then this is a true negative. (8)
False Positive: If a person does not have the disease (no) but the model predicts “yes,” then this is a false positive. (0)
False Negative: If a person has the disease (yes) but the model predicts “no,” then this is a false negative. (2)
Precision (Positive Predictive Value, PPV) = True Positives / (True Positives + False Positives)
Recall (True Positive Rate, TPR) = True Positives / (True Positives + False Negatives)
Specificity (True Negative Rate, TNR) = True Negatives / (True Negatives + False Positives)
[Figure: confusion matrix format and filled confusion matrix, with the precision and recall calculations]
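The formulas above can be checked with a short sketch. It assumes that the numbers in parentheses after each definition (0, 8, 0 and 2) are the counts from the filled confusion matrix; that reading is an interpretation, not something the source states outright. The helper `safe_divide` is introduced here to handle a zero denominator.

```python
def safe_divide(numerator, denominator):
    """Return None when the ratio is undefined (denominator of zero)."""
    return numerator / denominator if denominator != 0 else None


# Assumed counts, read from the parenthesised numbers in the definitions.
true_positives = 0
true_negatives = 8
false_positives = 0
false_negatives = 2

precision = safe_divide(true_positives, true_positives + false_positives)
recall = safe_divide(true_positives, true_positives + false_negatives)
specificity = safe_divide(true_negatives, true_negatives + false_positives)
total = true_positives + true_negatives + false_positives + false_negatives
accuracy = (true_positives + true_negatives) / total

print("precision:", precision)      # undefined: no positive predictions
print("recall:", recall)
print("specificity:", specificity)
print("accuracy:", accuracy)
```

With these counts the model never predicts "yes", so precision is undefined and recall is 0 even though accuracy is 0.8, which is why accuracy alone can be misleading.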