
Amrutvahini College of Engineering, Sangamner

Department of Mechanical Engineering


Question Bank and Answers
Unit 4: Development of ML Model
Prepared by Dr. P.N. Nagare
Q1 What are four typical problems to be solved using machine learning approach?
Four typical problems solved using a machine learning approach are:

1. Regression - If the value to be predicted is continuous, the problem falls under the
regression type in machine learning. Example: given the area name, size of the land, etc. as
features, predicting the expected cost of the land.

2. Classification - If the value to be predicted is a category/discrete label such as yes/no or
positive/negative, the problem falls under the classification type in machine learning.
Example: given a sentence, predicting whether it is a negative or positive review.

3. Clustering - Grouping a set of points into a given number of clusters for a given unlabeled
dataset (unsupervised learning).

4. Ranking - Constructs a ranker from a set of labelled examples. This example set consists of
instance groups that can be scored with a given criterion. The ranking labels are {0, 1, 2, 3, 4}
for each instance. The ranker is trained to rank new instance groups with unknown scores for
each instance.
Q2 Explain the steps involved in the development of the ML (Classification or Regression) model.
Following are the steps to be considered in the development of a classification or regression model; a minimal end-to-end code sketch follows the list.

Steps in Building a Machine Learning Model

1. Define the Problem

 Identify whether it's a classification (predict category) or regression (predict number) task.
 Understand the objective and desired outcome.

2. Collect Data

 Gather data from sources like files, databases, APIs, or web scraping.
 The quality and quantity of data are key to model success.

3. Explore and Analyze the Data (EDA)

 Visualize and summarize the data.


 Understand distributions, patterns, correlations, and potential problems like missing values
or outliers.

4. Preprocess the Data

 Handle missing values


 Encode categorical variables
 Normalize or standardize features
 Remove outliers (if necessary)
 Split into features (X) and target (y)
5. Split the Dataset

 Typically split into:


o Training Set (e.g., 70-80%)
o Testing Set (e.g., 20-30%)
 Optional: also use a Validation Set or cross-validation.

6. Choose a Model

 Regression Examples: Linear Regression, Decision Tree Regressor, Random Forest Regressor.
 Classification Examples: Logistic Regression, K-Nearest Neighbors, Decision Trees,
SVM, Random Forest.

7. Train the Model

 Fit the model to the training data using .fit() in most libraries like Scikit-learn.

8. Evaluate the Model

 Use the test set to assess how well your model generalizes.
 Regression Metrics:
o Mean Absolute Error (MAE)
o Mean Squared Error (MSE)
o R-squared (R²)
 Classification Metrics:
o Accuracy
o Precision, Recall, F1 Score
o Confusion Matrix
o ROC-AUC

9. Tune Hyperparameters

 Use Grid Search or Randomized Search with cross-validation to find the best parameters.
 Improves performance and reduces overfitting.

10. Deploy the Model (Make predictions)

 Save and serve the model using tools like:


o Pickle / Joblib (Python)
o Flask / FastAPI for APIs
o Cloud services (AWS, GCP, Azure)

11. Monitor and Maintain

 Check model performance over time.


 Retrain with new data if needed.
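
The steps above can be illustrated with a minimal end-to-end sketch in Python using scikit-learn. The synthetic dataset and the parameter values below are illustrative assumptions, not part of the original question; treat this as a sketch rather than a prescribed implementation.

# A minimal end-to-end sketch of the steps above (illustrative only).
# Assumes scikit-learn is available; the synthetic dataset stands in for real collected data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Steps 1-3: define the problem (binary classification) and "collect" data (synthetic here)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Step 5: split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: preprocess - standardize features (scaler fitted on the training data only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Steps 6-7: choose a model and fit it to the training data
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Step 8: evaluate on the unseen test set
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Step 9: tune hyperparameters with grid search and cross-validation
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    {"n_estimators": [100, 200], "max_depth": [5, 10]}, cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
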
Q3 Why is data preprocessing required? Explain techniques of preprocessing data
Data preprocessing is a crucial step in the machine learning pipeline because raw data is often
incomplete, inconsistent, noisy, or in an unsuitable format for analysis. Preprocessing transforms
this raw data into a clean, structured form that improves the performance, accuracy, and reliability
of machine learning models.

 Why is Data Preprocessing Required?


1. Handles Missing Values: Missing data can skew the analysis or make algorithms fail.
Preprocessing fills or removes missing entries.
2. Improves Accuracy: Clean and well-prepared data allows algorithms to learn patterns more
effectively.
3. Reduces Noise and Redundancy: Irrelevant or repetitive data can confuse models and
increase training time.
4. Standardizes Data: Algorithms like KNN or SVM are sensitive to data scales;
preprocessing ensures features contribute equally.
5. Prepares Data for Specific Algorithms: Some algorithms require numerical input, so text
or categorical data must be converted.

 Common Data Preprocessing Techniques

1. Data Cleaning
o Handling Missing Values:
 Remove rows/columns with missing values.
 Impute with mean, median, mode, or use interpolation.
o Handling Outliers:
 Detect using statistical methods (e.g., Z-score, IQR).
 Remove or cap/floor extreme values.
2. Data Transformation
o Normalization (Min-Max Scaling): Scales data to a range (usually 0 to 1):
x_scaled = (x - x_min) / (x_max - x_min)
o Standardization (Z-score Scaling): Centers data to mean 0 and standard deviation 1:
z = (x - mean) / standard deviation
o Log Transformation: Reduces skewness in data.
3. Encoding Categorical Variables
o Label Encoding: Assigns each category a unique number.
o One-Hot Encoding: Creates binary columns for each category.
o Ordinal Encoding: Converts ordered categories into numerical values based on
their order.
4. Feature Extraction
o Extracts meaningful information from raw data (e.g., date-time features, text
features using TF-IDF).
5. Feature Selection
o Removes irrelevant or redundant features using statistical tests or model-based
importance.
6. Dimensionality Reduction
o Techniques like PCA (Principal Component Analysis) reduce the number of
features while retaining essential information.
7. Data Integration

o Combines data from multiple sources into a unified dataset.

8. Data Discretization

o Converts continuous data into categorical bins (e.g., age into "young", "adult",
"senior").

Q4 Explain the difference between training data and Testing data in a Dataset. How is it useful
in a Machine Learning Model?
Training Data vs Testing Data

Feature            Training Data                                 Testing Data
Purpose            To teach (train) the model to learn           To evaluate how well the model
                   patterns.                                     performs.
Usage              Used to fit the model (model learns           Used to test the model's predictions.
                   from this).
Seen by Model?     Yes, during training.                         No, kept hidden during training.
Size (typically)   70–80% of the dataset.                        20–30% of the dataset.
Role               Helps in building the model.                  Checks the model’s generalization ability.

Why Are They Useful?

 Training Data
 This is where the model learns relationships between inputs and outputs.
 For example, in a regression task, the model uses training data to learn the best-fit line.
 In classification, it learns to assign labels to features.

 Testing Data
 Acts like a final exam for the model.
 It helps evaluate how well the model will perform on new, unseen data in the real world.
 It helps detect issues like overfitting (model memorizes training data but fails on new data).
Example:
Imagine building a spam email classifier:
 Training data: Emails labeled as spam or not-spam that the model uses to learn.
 Testing data: New emails the model hasn't seen, used to test if it correctly classifies them
as spam or not.

Best Practice:

 Always keep training and testing data separate.


 You can also use cross-validation to make better use of limited data and ensure consistent
performance.
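
A minimal sketch of keeping training and testing data separate with scikit-learn's train_test_split; the Iris dataset and the 70/30 proportion are assumed here for illustration.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Hold back 30% of the data as the unseen "final exam" (testing data)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                      # the model sees only the training data

print("Training accuracy:", model.score(X_train, y_train))
print("Testing accuracy: ", model.score(X_test, y_test))   # generalization check
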

Q5 What is training data, labeled data and unlabeled data? What are key steps involved in
developing training data?

 What is Training Data?


Training data is the part of your dataset that is used to train a machine learning model. It contains
input features (X) and, in supervised learning, the correct output labels (y).
 Example (for spam detection):
o Input (email text) → "Win a prize now!"
o Label → Spam (1)
The model looks at many examples like this to learn patterns and make predictions.

 What is Labeled Data?


Labeled data means that each data point includes both:
 Input features (like text, image, or numbers)
 Target/output labels (like a category or value)
Used in supervised learning where the model learns from known answers.
Example:
Review Sentiment

"I love this movie!" Positive

"Terrible experience." Negative

 What is Unlabeled Data?


 Unlabeled data only includes the input features, without known output labels.
 Used in unsupervised learning, like clustering or dimensionality reduction, or in semi-
supervised learning.
Example:
Review

"The product is okay."

"Worst service ever."

The model tries to group or understand structure without knowing the actual sentiment.

 Key Steps in Developing Training Data


Creating quality training data is a vital step for model success. Here's how it typically goes:
1. Data Collection
 Gather raw data from various sources (e.g., surveys, logs, APIs, databases).
2. Data Labeling
 Tag each example with the correct output label (manually or with tools).
 May involve domain experts (e.g., labeling medical scans).
3. Data Cleaning
 Remove duplicates, fix errors, handle missing values, and eliminate irrelevant data.
4. Preprocessing
 Normalize/scale numeric data.
 Encode categorical data.
 Tokenize or vectorize text.
 Resize or normalize images.
5. Data Augmentation (optional)
 Especially useful in image/text data to create variations and improve model robustness.
6. Splitting the Data
 Divide the dataset into:
o Training set (for learning)
o Validation set (optional, for tuning)
o Testing set (for evaluation)
7. Quality Checks
 Ensure labels are accurate.
 Balance classes to avoid bias (e.g., equal number of positive and negative examples).
 Check for label leakage (no hints from input that directly give away the label).
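
Step 6 above (splitting the data) can be sketched as a two-stage train_test_split; the 70/15/15 proportions and the synthetic dataset are assumptions made only for illustration.

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Labeled data: features X and known labels y (synthetic stand-in)
X, y = make_classification(n_samples=1000, random_state=0)

# First split off the test set (15%), then carve a validation set (15%) out of the rest
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.15 / 0.85, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # roughly 700 / 150 / 150
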
Q6 Explain the following terms: i) Overfitted model ii) Underfitted model iii) Good model
 Overfitting:
When a model is overfitted, it learns the training data with high accuracy, but struggles to generalize
to new data. This happens when the model is too complex and learns the specific details and noise
in the training set, rather than the underlying relationships. For example, a model might learn that a
specific set of pixels in an image indicates a cat, even though those pixels aren't a reliable indicator
in general. This leads to a model that performs very well on the training data but poorly on data it
hasn't seen before.

Reasons for Overfitting:


1. High variance and low bias.
2. The model is too complex.
3. The size of the training data is not sufficient.
 Underfitting:
An underfitted model is too simple to capture the underlying patterns in the data. It fails to learn the
relationships between the inputs and outputs, resulting in poor performance on both the training data
and new data. For example, a linear regression model might be underfitted if the relationship
between the inputs and outputs is not linear.

Reasons for Underfitting:


1. The model is too simple, so it is not capable of representing the complexities in the data.
2. The input features used to train the model are not adequate representations of the
underlying factors influencing the target variable.
3. The size of the training dataset used is not enough.
4. Excessive regularization is used to prevent overfitting, which constrains the model from
capturing the data well.
5. Features are not scaled.
 Good Model:
A good model strikes a balance between simplicity and complexity. It learns the underlying
relationships in the data without overfitting to the noise or specific details of the training set. A good
model is one that generalizes well to new, unseen data, meaning it can make accurate predictions
on data it hasn't seen before.
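
One common way to spot these three situations is to compare training and testing scores. In this sketch the synthetic dataset and the decision-tree depths are assumed values chosen only to illustrate the contrast.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

for depth in [1, 5, None]:   # too shallow, moderate, unlimited depth
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_train, y_train)
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")

# Low train and test scores   -> underfitting (model too simple)
# High train, low test score  -> overfitting (model memorizes noise)
# Both scores high and close  -> a good, well-generalizing model
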
Q7 What are the different cross validation techniques? Explain K-fold cross validation with neat
sketch.
Cross-validation splits the data into multiple parts (called folds) and trains/tests the model on
different combinations of these parts. This helps ensure the model isn’t just performing well on one
specific train-test split.

 Different Cross-Validation Techniques


1. Hold-Out Validation
 Split the dataset into two sets: training and testing (e.g., 80% train, 20% test).
 Simple but may lead to high variance if the split is not representative.
📌 Use When: You have a large dataset.

2. K-Fold Cross-Validation
 Divide the dataset into K equal parts (folds).
 Train the model on K-1 folds and test it on the remaining one.
 Repeat this K times, each time changing the test fold.
 Average the results for final evaluation.
📌 Commonly used K = 5 or 10.
Example (K = 5):
Run 1: [Train | Train | Train | Train | Test ]
Run 2: [Train | Train | Train | Test | Train]

3. Stratified K-Fold Cross-Validation


 Similar to K-Fold, but maintains the same class distribution in each fold (important for
classification tasks with imbalanced data).
📌 Use When: Dealing with classification problems.

4. Leave-One-Out Cross-Validation (LOOCV)


 Each fold contains only one data point for testing, and the rest for training.
 For n data points, the model is trained n times.
📌 Very accurate but computationally expensive, especially on large datasets.

5. Leave-P-Out Cross-Validation
 Similar to LOOCV, but instead of 1, P data points are used for testing.
 Repeat this for every possible combination.
📌 Very rarely used due to high computational cost.

6. Time Series Cross-Validation (Rolling/Forward Chaining)


 For time series data, we cannot shuffle the data.
 Train on the past, test on the future.
Example:
Fold 1: Train [1,2,3], Test [4]
Fold 2: Train [1,2,3,4], Test [5]
Fold 3: Train [1,2,3,4,5], Test [6]

📌 Use When: Working with time-dependent data (e.g., stock prices, weather, logs).

 K Fold Cross Validation Method

 The classifier model can be designed/trained and its performance can be evaluated based on
K-fold cross-validation mode, training mode and test mode.
 The main idea behind K-fold cross-validation is that each sample in the dataset has the
opportunity of being tested. It is a special case of cross-validation where we iterate over
the dataset k times. In each round, we split the dataset into k parts: one part is used
for validation, and the remaining k-1 parts are merged into a training subset for model
evaluation.

Advantages of K-fold cross-validation mode

• Computation time is reduced, as the process is repeated only k times (e.g., 10 times when
k is 10).
• Reduced bias.
• Every data point gets to be tested exactly once and is used in training k-1 times.
• The variance of the resulting estimate is reduced as k increases.
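
A minimal K-fold cross-validation sketch with scikit-learn follows; k = 5, the Iris dataset, and logistic regression are assumed here for illustration.

from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)   # 5 folds
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kfold)

print("Fold accuracies:", scores)
print("Mean accuracy:  ", scores.mean())   # averaged over the 5 test folds
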
Q8 Explain use of Confusion matrix in Machine Learning Model with suitable example.
Confusion matrix:
 A Confusion matrix is an N x N matrix used for evaluating the performance of a
classification model, where N is the number of target classes. The matrix compares the
actual target values with those predicted by the machine learning model.
 A confusion matrix is a table that describes the performance of a classification model by
comparing the actual values (true labels) with the predicted values.
 It gives a detailed breakdown of how well the model is classifying each class.
Structure of a Confusion Matrix (Binary Classification)

                       Predicted: Positive (1)    Predicted: Negative (0)
Actual: Positive (1)   ✅ True Positive (TP)       ❌ False Negative (FN)
Actual: Negative (0)   ❌ False Positive (FP)      ✅ True Negative (TN)

 Diagonal values in Confusion Matrix are Truly Classified Values (Correctly Classified)

 Off diagonal Values are Falsely Classified Values (Incorrectly classified)


What can we learn from this matrix?

 There are two possible predicted classes: “yes" and "no". If we were predicting the
presence of a disease, for example, "yes" would mean they have the disease, and "no" would
mean they don't have the disease.

 The classifier made a total of 165 predictions (e.g., 165 patients were being tested for
the presence of that disease).

 Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.

 In reality, 105 patients in the sample have the disease, and 60 patients do not.
 True positives (TP): these are cases in which we predicted yes (they have the disease), and
they do have the disease.
 True negatives (TN): we predicted no, and they don't have the disease.
 False positives (FP): we predicted yes, but they don't actually
have the disease. (Also known as a "type I error.")
 False negatives (FN): we predicted no, but they actually do have the disease. (Also known
as a "type II error.")

Q9 Explain different performance evaluators used for interpretation/assessment of classification
models. Explain 2×2 confusion matrix and explain its terminology.
i) Accuracy
ii) Precision
iii) Recall
iv) F1 score
v) Cohen’s Kappa
vi) ROC
Different performance evaluators used for interpretation/ assessment are Accuracy, Precision,
Recall, F1 score, Cohen’s Kappa coefficient, ROC Curve.
2 x 2 Confusion Matrix (Binary Classification) :

                       Predicted: Positive (1)    Predicted: Negative (0)
Actual: Positive (1)   ✅ True Positive (TP)       ❌ False Negative (FN)
Actual: Negative (0)   ❌ False Positive (FP)      ✅ True Negative (TN)

 Diagonal values in Confusion Matrix are Truly Classified Values (Correctly Classified)

 Off diagonal Values are Falsely Classified Values (Incorrectly classified)


What can we learn from this matrix?

 There are two possible predicted classes: “yes" and "no". If we were predicting the
presence of a disease, for example, "yes" would mean they have the disease, and "no" would
mean they don't have the disease.

 The classifier made a total of 165 predictions (e.g., 165 patients were being tested for
the presence of that disease).

 Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.

 In reality, 105 patients in the sample have the disease, and 60 patients do not.

 True Positives (TP): these are cases in which we predicted yes (they have the disease), and
they do have the disease.

 True Negatives (TN): we predicted no, and they don't have the disease.

 False Positives (FP): we predicted yes, but they don't actually have the disease. (Also
known as a "type I error.")

 False Negatives (FN): we predicted no, but they actually do have the disease. (Also known
as a "type II error.")

1. Accuracy: Accuracy explains how often the model is correct overall.
Accuracy = (TP + TN) / Total

2. Misclassification Rate: Overall, how often is the model wrong? It is equivalent to 1 minus
Accuracy.
Misclassification Rate = (FP + FN) / Total

3. Precision: Precision explains how many of the predicted positives are actually positive.
Precision = TP / (TP + FP) = TP / predicted yes

4. True Positive Rate (TP Rate) or Recall or Sensitivity: When it's actually yes, how often
does it predict yes?
TP Rate or Recall or Sensitivity = TP / actual yes

5. False Positive Rate (FP Rate): When it's actually no, how often does it predict yes?
FP Rate = FP / actual no

6. True Negative Rate (TN Rate): When it's actually no, how often does it predict no?
TN Rate = TN / actual no

7. False Negative Rate (FN Rate): When it's actually yes, how often does it predict no?
FN Rate = FN / actual yes

8. F1 Score: Harmonic mean of precision and recall. Useful when you want a balance between
the two.
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
9. Cohen’s Kappa: It is a very useful metric when you want to measure how much agreement
exists between two raters or classifiers, beyond chance. Cohen’s Kappa (κ) measures the
agreement between two sets of categorical data (e.g., actual vs predicted labels), correcting
for the agreement that could happen by chance. It’s commonly used in:

 Classification model evaluation


 Inter-rater reliability (e.g., in medical diagnoses, labeling tasks)
This is essentially a measure of how well the classifier performed as compared to how
well it would have performed simply by chance. In other words, a model will have a
high Kappa score if there is a big difference between the accuracy and the null error rate.
Formula for Cohen’s Kappa:

κ = (Po − Pe) / (1 − Pe)

Where:
 Po = Observed agreement (how often the raters agree)
 Pe = Expected agreement by chance
Interpretation Scale of Cohen’s Kappa Value:

Kappa (κ) Value Level of Agreement

≤0 None or Poor

0.01 – 0.20 Slight

0.21 – 0.40 Fair

0.41 – 0.60 Moderate

0.61 – 0.80 Substantial

0.81 – 1.00 Almost perfect

10. ROC Curve: It is a key tool in evaluating the performance of classification models,
especially binary classifiers. ROC stands for Receiver Operating Characteristic.
It is a graph that shows the performance of a classification model across different
threshold values. It plots the following:

 X-axis → False Positive Rate (FPR)


 Y-axis → True Positive Rate (TPR) = Recall

How ROC curve Works :

1. Your model returns probabilities instead of just binary predictions (like 0 or 1).
2. You sweep through thresholds (e.g., 0.0 to 1.0).
3. For each threshold, calculate TPR and FPR.
4. Plot TPR vs. FPR.

Interpretation of ROC Curve:

1. A perfect model: ROC curve passes through the top-left corner (TPR = 1, FPR = 0).
2. A random model: ROC curve is a diagonal line from (0,0) to (1,1).
3. Better model: The closer the curve follows the top-left border, the better.
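
All of these evaluators are available as ready-made functions in scikit-learn. A brief sketch follows; the label lists and predicted probabilities are assumed purely for illustration.

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score)

# Hypothetical true labels, hard predictions, and predicted probabilities
y_true  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
y_proba = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.85, 0.05]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("Kappa    :", cohen_kappa_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_proba))   # area under the ROC curve
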

Q10 From the below confusion matrix, determine accuracy, recall, precision, F1-score, True
Positive Rate (TPR), False Positive Rate (FPR), True Negative Rate (TNR), and False Negative
Rate (FNR). Interpret the result:
                  Actual: 1    Actual: 0
Predicted: 1      540          150
Predicted: 0      110          200

Refer Class Note Book for Solution
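
The full worked solution is in the class notebook. As a rough cross-check, the requested quantities can be computed directly from the matrix cells, reading them as TP = 540, FP = 150, FN = 110, TN = 200:

# Cross-check sketch: metrics from the confusion-matrix cells given in Q10
TP, FP, FN, TN = 540, 150, 110, 200
total = TP + FP + FN + TN                      # 1000

accuracy  = (TP + TN) / total                  # 0.74
precision = TP / (TP + FP)                     # ≈ 0.783
recall    = TP / (TP + FN)                     # TPR ≈ 0.831
f1        = 2 * precision * recall / (precision + recall)   # ≈ 0.806
fpr       = FP / (FP + TN)                     # ≈ 0.429
tnr       = TN / (TN + FP)                     # ≈ 0.571
fnr       = FN / (FN + TP)                     # ≈ 0.169
print(accuracy, precision, recall, f1, fpr, tnr, fnr)
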


Q11 A quality engineer wants to solve a two-class classification problem for predicting whether a
product is defective. The actual number of products containing no defect is 950 (truly predicted
positives = 900), and the actual number of defective products is 150 (truly predicted negatives =
130). Calculate accuracy, precision, recall and F1 score.
Refer Class Note Book for Solution
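
The full worked solution is in the class notebook. As a rough cross-check, the sketch below assumes, as the wording of the question suggests, that "no defect" is treated as the positive class, so TP = 900, FN = 50, TN = 130, FP = 20.

# Cross-check sketch for Q11, under the class-labelling assumption stated above
TP, FN = 900, 950 - 900      # actual no-defect products: 950, correctly predicted: 900
TN, FP = 130, 150 - 130      # actual defective products: 150, correctly predicted: 130
total = TP + FN + TN + FP    # 1100

accuracy  = (TP + TN) / total                  # ≈ 0.936
precision = TP / (TP + FP)                     # ≈ 0.978
recall    = TP / (TP + FN)                     # ≈ 0.947
f1        = 2 * precision * recall / (precision + recall)   # ≈ 0.963
print(accuracy, precision, recall, f1)
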
Q12 What is meant by “Hyper parameter tuning” and how it is used to make a machine learning
algorithm work better? Explain tuning of Hyper parameters for any one specific algorithm.
What is a Hyperparameter in Machine Learning?
In machine learning, a hyperparameter is a setting or configuration that is set before the learning
process begins. These values are not learned from the data — you manually define them or use
search techniques to choose them.

Examples of Hyperparameters:
Algorithm Hyperparameter Example

Decision Trees max_depth, min_samples_split

K-Nearest Neighbors k (number of neighbors)

SVM (Support Vector) C, kernel, gamma

Neural Networks learning_rate, batch_size, epochs

Why are Hyperparameters Important?


Hyper parameters directly control the behavior and performance of a model.
They affect how fast and how well a model learns, and they help balance between underfitting
and overfitting.

🔍 What is Hyperparameter Tuning?


Hyperparameter tuning is the process of searching for the best combination of hyperparameter
values to improve model performance.

Why Tune Hyperparameters?


 A poor choice of hyperparameters can make a model perform badly.
 Proper tuning can lead to a model that:
o Is more accurate
o Generalizes well to new data
o Trains faster

How is Hyperparameter Tuning Done?


1. Manual Search
Try different values and compare performance.
2. Grid Search
 Test all combinations of hyperparameters from a predefined set.
 Works well but can be slow.
3. Random Search
 Randomly pick combinations to test.
 Faster and surprisingly effective.
4. Bayesian Optimization / AutoML
 Uses smarter algorithms to find the best settings (e.g., with Optuna, Hyperopt).
 More efficient than exhaustive search.

Example of hyperparameter tuning of the Decision Tree algorithm:


 Hyperparameters: max_depth, min_samples_split
 Try different combinations:
o max_depth = 5, min_samples_split = 2
o max_depth = 10, min_samples_split = 4
 Use cross-validation to check performance.
 Pick the best combination that gives the highest accuracy or lowest error.
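
A minimal grid-search sketch for the decision-tree example above, using scikit-learn's GridSearchCV; the Iris dataset is an assumed stand-in and the grid values match the combinations named in the example.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameter grid: the combinations named in the example above
param_grid = {"max_depth": [5, 10], "min_samples_split": [2, 4]}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)                                 # tries every combination with 5-fold CV

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
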
Q13 Explain hyperparameter tuning in decision tree. Why is it required?
In a Decision Tree, hyperparameter tuning means finding the best values for key settings that
control how the tree is built — like how deep it can grow or how many samples are needed to split
a node. These hyperparameters affect model complexity, training time, and accuracy.

Key Hyperparameters in a Decision Tree:


Hyperparameter Description

max_depth Maximum depth of the tree (limits how many splits)

min_samples_split Minimum samples required to split an internal node

min_samples_leaf Minimum samples required to be at a leaf node

max_features Number of features to consider when splitting

criterion Function to measure quality of split (gini or entropy)

Why Is Tuning Required?


Without tuning:
 Tree may become too deep → Overfitting (memorizes the training data)
 Tree may be too shallow → Underfitting (misses patterns in the data)
 Model might be slow or inaccurate
With proper tuning:
 You control model complexity
 You improve accuracy and generalization
 You avoid overfitting and underfitting
Example of hyperparameter tuning of the Decision Tree algorithm:
 Hyperparameters: max_depth, min_samples_split
 Try different combinations:
o max_depth = 5, min_samples_split = 2
o max_depth = 10, min_samples_split = 4
 Use cross-validation to check performance.
Pick the best combination that gives the highest accuracy or lowest error.
Q14 What is hyper parameter tuning? Explain any two hyper parameters in the
Random Forest algorithm.
What is hyper parameter tuning? Refer Que. 13
Two Hyper parameters in Random Forest algorithm :
1. n_estimators (Number of Trees)
What it does:
 Controls how many decision trees the Random Forest builds.
 Each tree is trained on a random sample of the data.
Why it matters:
 More trees → Better predictions (up to a point).
 Too few trees → Low accuracy or unstable results.
 Too many trees → Longer training time, higher memory use.
Typical values:
 Start with 100, and tune higher if needed (e.g., 200, 500, 1000).

2. max_depth (Maximum Tree Depth)


What it does:
 Limits how deep each tree can grow.
 Deep trees = more splits = more complex decisions.
Why it matters:
 Controls overfitting:
o High depth → Overfitting (model memorizes training data)
o Low depth → Underfitting (model too simple)
How to tune:
 Try values like 5, 10, 20, or None (unlimited depth)
 Use cross-validation to find the best value
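
A brief sketch of tuning these two Random Forest hyperparameters together with cross-validation; the synthetic dataset and grid values are assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=600, n_features=15, random_state=0)

param_grid = {
    "n_estimators": [100, 200, 500],     # number of trees in the forest
    "max_depth": [5, 10, None],          # None lets each tree grow to full depth
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
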
Q15 What is hyper parameter tuning? Explain any three hyper parameters tuned in SVM?
What is hyper parameter tuning? Refer Que. 13
Three Hyper parameters in SVM algorithm :
1. C (Regularization Parameter)
What it does:
 Controls the trade-off between achieving a low training error and a large margin.
 It determines how much the model tries to avoid misclassifying each training example.
Why it matters:
 Small C → Allows more margin violations (more generalization, but can underfit)
 Large C → Tries to fit the training data closely (less margin, may overfit)

2. kernel (Kernel Function)


What it does:
 Defines how the data is transformed into higher dimensions to make it linearly separable.
Common Kernels:
 'linear' → For linearly separable data
 'poly' → Polynomial kernel (good for non-linear but structured data)
 'rbf' (Radial Basis Function) → Most popular for non-linear problems
 'sigmoid' → Similar to a neural network’s activation
Why it matters:
 Choosing the right kernel can dramatically improve performance.

3. gamma (Kernel Coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’)


What it does:
 Defines how far the influence of a single training example reaches.
 Affects the shape of the decision boundary.
Why it matters:
 Low gamma → Far reach → More general, smoother boundary
 High gamma → Close reach → Very complex boundary, may overfit
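
A brief sketch of tuning C, kernel, and gamma for scikit-learn's SVC with cross-validation; the grid values and the synthetic dataset are assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

param_grid = {
    "C": [0.1, 1, 10],                   # regularization strength (margin vs training error)
    "kernel": ["linear", "rbf"],         # how the data is transformed
    "gamma": ["scale", 0.01, 0.1],       # reach of each training example (used by the rbf kernel)
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)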
