
B.E./B.TECH. END SEMESTER THEORY EXAMINATIONS, APRIL/MAY 2025

CS3491-ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING


(REGULATIONS 2021)
ANSWER KEYS
PART A 10 X 2 = 20 MARKS
1. List the different types of agents.
The different types of agents in AI are:
• Simple Reflex Agents
• Model-based Reflex Agents
• Goal-based Agents
• Utility-based Agents
• Learning Agents
2. Outline various applications of AI / What can AI do today?
Applications of AI include:
• Healthcare (e.g., disease diagnosis, drug discovery)
• Finance (fraud detection, trading algorithms)
• Transportation (autonomous vehicles, traffic management)
• Customer Service (chatbots, virtual assistants)
• Agriculture (yield prediction, pest control)
• Entertainment (recommendation systems, gaming AI)
3. What is the need for probability theory in uncertainty?
Probability theory helps AI systems deal with:
• Uncertainty in perception and action
• Incomplete or noisy data
• Decision-making under risk
It allows reasoning with uncertain knowledge and making rational predictions.
4. State Bayes' Theorem in Artificial Intelligence.
Bayes’ Theorem:
P(H|E) = P(E|H) · P(H) / P(E)
Where:
• P(H|E) = Posterior probability
• P(E|H) = Likelihood
• P(H) = Prior probability
• P(E) = Evidence
Used in probabilistic reasoning and classification (e.g., Naive Bayes).
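A minimal Python sketch of the theorem in action (the numbers below are assumed purely for illustration, e.g., a simple diagnostic test):

# Bayes' Theorem: P(H|E) = P(E|H) * P(H) / P(E)
p_h = 0.01                # prior P(H): hypothesis is true
p_e_given_h = 0.95        # likelihood P(E|H)
p_e_given_not_h = 0.05    # P(E | not H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # evidence P(E), by total probability
posterior = p_e_given_h * p_h / p_e
print(round(posterior, 3))   # 0.161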
5. What is Machine Learning?
Machine Learning is a subset of AI that enables systems to learn from data and improve performance
without being explicitly programmed. It involves algorithms that detect patterns and make predictions.
6. Show the different Algorithm techniques in Machine Learning.
Main types:
• Supervised Learning (e.g., Linear Regression, Decision Trees)
• Unsupervised Learning (e.g., K-Means, PCA)
• Reinforcement Learning (e.g., Q-learning, DQN)
• Semi-supervised Learning
• Self-supervised Learning
7. Summarize stacking in ensemble learning.
Stacking combines multiple models (base learners) using a meta-learner. Base models make
predictions, and the meta-learner uses those predictions as inputs to produce the final output. It enhances
performance by leveraging model diversity.
8. Show the principle of maximum likelihood?
Maximum Likelihood Estimation (MLE) chooses parameters that maximize the likelihood of the
observed data:
θ̂ = arg max_θ P(Data | θ)
It's widely used in probabilistic models like logistic regression and Gaussian models.
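As a quick worked illustration (data assumed), the MLE of a coin's head probability is simply the observed fraction of heads, since θ^k (1−θ)^(n−k) is maximized at k/n:

# MLE for a Bernoulli parameter: theta_hat = k / n
flips = ['H', 'H', 'T', 'H', 'T']            # assumed observations (k = 3 heads, n = 5)
theta_hat = flips.count('H') / len(flips)
print(theta_hat)                              # 0.6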
9. Recall perceptron and its types.
Perceptron is a linear binary classifier.
Types:
• Single-layer Perceptron: Can classify linearly separable data
• Multi-layer Perceptron (MLP): Has hidden layers and can classify non-linear data
10. What do you mean by activation function?
An activation function decides whether a neuron should be activated. It introduces non-linearity in
neural networks.
Examples:
• Sigmoid
• ReLU (Rectified Linear Unit)
• Tanh
They help networks learn complex patterns.

PART B 5 X 13 = 65 MARKS
11. a) Water Jug Problem
Problem Statement:
You are given two jugs with capacities 4 liters and 3 liters. Neither has any measuring markers. The goal
is to obtain exactly 2 liters in the 4-liter jug.
State Space:
Each state is represented as a pair (X, Y) where
• X = amount of water in 4-liter jug
• Y = amount in 3-liter jug
Initial state: (0, 0)
Goal state: (2, y) for any y
Possible operations:
• Fill a jug completely
• Empty a jug
• Pour water from one jug to another until one is full or the other is empty
Solution steps:
1. (0, 0) → Fill 3-liter → (0, 3)
2. (0, 3) → Pour into 4-liter → (3, 0)
3. (3, 0) → Fill 3-liter → (3, 3)
4. (3, 3) → Pour into 4-liter → (4, 2)
5. (4, 2) → Empty 4-liter → (0, 2)
6. (0, 2) → Pour into 4-liter → (2, 0) ← Goal reached
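A short breadth-first search sketch in Python (not part of the original key) that derives such a sequence from the state space automatically:

from collections import deque

def water_jug_bfs(cap4=4, cap3=3, target=2):
    # States are pairs (x, y): water in the 4-litre and 3-litre jugs.
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if x == target:                          # goal: exactly 2 litres in the 4-litre jug
            path, state = [], (x, y)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        to4 = min(y, cap4 - x)                   # amount pourable from 3L into 4L
        to3 = min(x, cap3 - y)                   # amount pourable from 4L into 3L
        successors = [(cap4, y), (x, cap3),      # fill a jug
                      (0, y), (x, 0),            # empty a jug
                      (x + to4, y - to4),        # pour 3L -> 4L
                      (x - to3, y + to3)]        # pour 4L -> 3L
        for s in successors:
            if s not in parent:
                parent[s] = (x, y)
                queue.append(s)

print(water_jug_bfs())   # prints one shortest sequence of states ending with 2 litres in the 4L jug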
(OR)
11. b) Uninformed Search Methods
Uninformed search methods do not use domain-specific knowledge.
Types:
1. Breadth-First Search (BFS): Explores all nodes at current depth before going deeper
o Complete, optimal (if step cost = 1), time: O(b^d), space: O(b^d)
2. Depth-First Search (DFS): Explores as far as possible along a branch
o Not complete in infinite space, not optimal, time: O(b^m), space: O(bm)
3. Uniform Cost Search (UCS): Expands least-cost node
o Complete, optimal, time and space: O(b^(1 + C*/ε)) where C* is optimal cost
Example:
Pathfinding from city A to city B without using distance as heuristic.
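A minimal uniform cost search sketch in Python for that pathfinding example (the road map below is hypothetical):

import heapq

def uniform_cost_search(graph, start, goal):
    # Expand the least-cost node first; returns (total cost, path).
    frontier = [(0, start, [start])]
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for neighbour, step_cost in graph.get(node, []):
            heapq.heappush(frontier, (cost + step_cost, neighbour, path + [neighbour]))
    return None

roads = {'A': [('B', 5), ('C', 1)], 'C': [('B', 2)]}   # hypothetical road map
print(uniform_cost_search(roads, 'A', 'B'))            # (3, ['A', 'C', 'B'])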
12. a) Approximate Inference in Bayesian Networks
Why Approximate?
Exact inference is NP-hard in general for large networks.
Methods:
1. Sampling Methods
o Likelihood Weighting
o Gibbs Sampling
o Rejection Sampling
2. Variational Inference
o Converts inference into optimization problem
o Approximates true distribution with simpler one
Likelihood Weighting Example:
Sample values for non-evidence variables based on conditional probabilities, while fixing evidence.
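A small likelihood-weighting sketch in Python for a toy two-node network C → X (the prior and CPT values below are assumed for illustration), estimating P(C | X = evidence):

import random

P_C = {'rain': 0.3, 'dry': 0.7}                           # assumed prior on C
P_X_given_C = {'rain': {'wet': 0.9, 'dry_ground': 0.1},
               'dry':  {'wet': 0.2, 'dry_ground': 0.8}}   # assumed CPT for X

def likelihood_weighting(evidence_x='wet', n_samples=10000):
    weights = {'rain': 0.0, 'dry': 0.0}
    for _ in range(n_samples):
        c = 'rain' if random.random() < P_C['rain'] else 'dry'   # sample the non-evidence variable
        weights[c] += P_X_given_C[c][evidence_x]                 # fix the evidence, weight by its likelihood
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

print(likelihood_weighting())   # roughly {'rain': 0.66, 'dry': 0.34}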
(OR)
12. b) Bayesian Network – Biased Coin Problem
Given:
• 3 coins: a (P(H)=0.2), b (P(H)=0.6), c (P(H)=0.8)
• One coin is chosen at random: P(a)=P(b)=P(c)=1/3
• The coin is flipped 3 times → X1, X2, X3
• Observed outcome: HHT
a) Bayesian Network and CPTs
Network Structure:
        C (Coin)
       /   |   \
     X1   X2   X3   (coin flips)
Variables:
• C ∈ {a, b, c}
• Xi ∈ {H, T} (i=1 to 3)
CPTs:
• P(C) = {a: 1/3, b: 1/3, c: 1/3}
• P(Xi | C):
o For a: P(H)=0.2, P(T)=0.8
o For b: P(H)=0.6, P(T)=0.4
o For c: P(H)=0.8, P(T)=0.2
b) Compute Posterior for the Coin given outcome HHT
We want to find:
P(C = cᵢ | X₁ = H, X₂ = H, X₃ = T) ∝ P(C = cᵢ) · P(H | cᵢ)² · P(T | cᵢ)
For coin a (P(H)=0.2, P(T)=0.8):
P_a ∝ (1/3) · (0.2)² · 0.8 = (1/3) · 0.032 ≈ 0.0107
For coin b (P(H)=0.6, P(T)=0.4):
P_b ∝ (1/3) · (0.6)² · 0.4 = (1/3) · 0.144 = 0.048
For coin c (P(H)=0.8, P(T)=0.2):
P_c ∝ (1/3) · (0.8)² · 0.2 = (1/3) · 0.128 ≈ 0.0427
Normalize:
Total = 0.0107 + 0.048 + 0.0427 = 0.1014
Final probabilities:
• P(a|HHT) ≈ 0.0107 / 0.1014 ≈ 10.6%
• P(b|HHT) ≈ 0.048 / 0.1014 ≈ 47.3%
• P(c|HHT) ≈ 0.0427 / 0.1014 ≈ 42.1%
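The same posterior computation, reproduced as a short Python check:

coins = {'a': 0.2, 'b': 0.6, 'c': 0.8}                  # P(Heads) for each coin
prior = 1 / 3
unnorm = {c: prior * p_h**2 * (1 - p_h)                 # P(C) * P(H|C)^2 * P(T|C)
          for c, p_h in coins.items()}
total = sum(unnorm.values())
posterior = {c: round(v / total, 3) for c, v in unnorm.items()}
print(posterior)   # {'a': 0.105, 'b': 0.474, 'c': 0.421}, matching the percentages above up to rounding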
13.a) Principle of the Gradient Descent Algorithm
Definition:
Gradient Descent is an optimization algorithm used to minimize a cost/loss function by iteratively
updating model parameters (like weights in a neural network) in the opposite direction of the gradient (or
slope) of the function.
Mathematical Formulation:
Let:
• θ = parameter (e.g., weight)
• J(θ) = cost/loss function
• α = learning rate (step size)
The update rule is:
θ := θ − α · ∂J(θ)/∂θ
This means we take small steps in the direction opposite to the gradient to reach the minimum of
the cost function.
Terms Explained:

Term | Description | Valid Range
θ | Parameter being optimized (e.g., weight in linear regression) | Any real number (ℝ)
J(θ) | Cost function – measures how far predictions are from actual results | ≥ 0 (commonly)
α | Learning rate – controls step size in each iteration | 0 < α < 1 typically
∂J/∂θ | Gradient (slope) of the cost function with respect to θ | Any real number (ℝ)

Types of Gradient Descent:


1. Batch Gradient Descent:
o Uses the entire dataset to compute the gradient.
o Stable but slow for large datasets.
2. Stochastic Gradient Descent (SGD):
o Uses a single data point at each iteration.
o Faster, but noisier convergence.
3. Mini-Batch Gradient Descent:
o Uses a small batch of data points.
o Compromise between batch and SGD.
Diagram (description): a plot of the cost function J(θ) against θ, with the starting point on the slope and arrows showing θ being updated step by step down the curve until it reaches the global minimum.
Effect of Learning Rate (α):
• If α is too small → slow convergence.
• If α is too large → may overshoot or even diverge.
• A good learning rate ensures faster and stable convergence.
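A minimal gradient descent sketch in Python for fitting y ≈ θ·x with a squared-error cost (the data points below are assumed for illustration):

# Minimise J(theta) = (1/2n) * sum((theta*x_i - y_i)^2) by gradient descent
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]        # roughly y = 2x

theta = 0.0                      # initial parameter
alpha = 0.01                     # learning rate

for step in range(1000):
    grad = sum((theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)   # dJ/dtheta
    theta = theta - alpha * grad          # update rule: theta := theta - alpha * dJ/dtheta

print(round(theta, 2))           # converges to about 2.03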
(OR)
13. b) Explain SVM Algorithm in Detail
Support Vector Machine (SVM):
SVM is a supervised learning algorithm used for classification and regression tasks.
Core Idea:
SVM finds the optimal hyperplane that maximally separates data points of different classes in the feature
space.
Key Concepts:
• Hyperplane: A decision boundary that separates classes.
• Support Vectors: Data points closest to the hyperplane, which influence its position.
• Margin: Distance between the hyperplane and the nearest support vectors.
• Objective: Maximize margin.
Mathematics:
For binary classification:
f(x) = w · x + b
We want:
yᵢ(w · xᵢ + b) ≥ 1, for all i
Minimize:
(1/2)‖w‖²
Using Lagrange multipliers for optimization (Convex Quadratic Programming).
Soft Margin SVM:
Introduces slack variable ξ for non-linearly separable data:
yᵢ(w · xᵢ + b) ≥ 1 − ξᵢ
Kernel Trick:
Used to map data to higher dimensions:
• Linear Kernel
• Polynomial Kernel
• RBF (Gaussian) Kernel
• Sigmoid Kernel
Diagram:
[Include diagram of 2D plane with margin, support vectors, and hyperplane.]
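A brief scikit-learn sketch (library availability and the toy dataset are assumptions, not part of the original answer) showing a soft-margin SVM with the RBF kernel:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)   # toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel='rbf', C=1.0, gamma='scale')   # C controls the soft margin; kernel trick via 'rbf'
clf.fit(X_train, y_train)

print("number of support vectors:", len(clf.support_vectors_))
print("test accuracy:", clf.score(X_test, y_test))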
14. a) Unsupervised Learning Structure
Definition:
Learning from unlabelled data to discover patterns or structure.
Characteristics:
• No output labels.
• Learning by grouping or dimensionality reduction.
Algorithms:
• Clustering (e.g., K-Means, DBSCAN)
• Dimensionality Reduction (e.g., PCA, t-SNE)
Use Cases:
• Customer segmentation
• Anomaly detection
• Recommender systems
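A short scikit-learn sketch (scikit-learn assumed available) combining both families listed above — PCA for dimensionality reduction followed by K-Means clustering:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = load_iris().data                              # used unlabelled: targets are ignored

X_2d = PCA(n_components=2).fit_transform(X)       # project 4 features down to 2 components
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_2d)

print(kmeans.labels_[:10])                        # cluster index assigned to the first 10 points
print(kmeans.cluster_centers_)                    # the 3 centroids in the reduced 2-D space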
(OR)
14.b) Clustering Approaches and Difference from Classification
Clustering – Concept
Clustering is an unsupervised learning method in machine learning used to group a set of data points
into clusters, such that data points in the same cluster are more similar to each other than to those in other
clusters.
Types of Clustering Approaches
1. Partitioning Methods
o Example: K-Means Clustering
o Divides data into k non-overlapping subsets (clusters).
o Each point belongs to one cluster.
o Tries to minimize intra-cluster variance.
2. Hierarchical Clustering
o Builds a hierarchy of clusters.
o Types:
▪ Agglomerative (bottom-up)
▪ Divisive (top-down)
o Results can be visualized using a dendrogram.
3. Density-Based Clustering
o Example: DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
o Groups together points that are closely packed.
o Can identify outliers as noise.
o Suitable for arbitrarily shaped clusters.
4. Model-Based Clustering
o Example: Gaussian Mixture Models (GMM)
o Assumes the data is generated from a mixture of distributions.
o Uses statistical models to assign probabilities of cluster membership.
5. Fuzzy Clustering
o Example: Fuzzy C-Means
o Allows one data point to belong to multiple clusters with varying degrees of membership.
Difference between Clustering and Classification

Feature | Clustering | Classification
Type of Learning | Unsupervised learning | Supervised learning
Label Availability | No labeled data used | Requires labeled training data
Output | Groups or clusters | Predefined classes or labels
Goal | Discover structure or patterns in data | Predict class labels for new data
Example Applications | Market segmentation, anomaly detection | Email spam detection, disease prediction

15.a) Model the Architecture of a Multilayer Perceptron (MLP), Explain Its Operation, and
Mention Its Advantages and Disadvantages
1. Architecture of MLP:
A Multilayer Perceptron (MLP) is a type of feedforward artificial neural network that consists of
three main types of layers:
• Input Layer: Receives input features.
• Hidden Layer(s): One or more layers that perform computations.
• Output Layer: Produces the final output.
MLP Structure:
• Fully Connected: Every neuron in one layer is connected to every neuron in the next.
• Activation Functions: Used in hidden and output layers (e.g., ReLU, sigmoid, softmax).
Diagram (Text Representation):
Input Layer            Hidden Layer(s)        Output Layer
[x1] [x2] [x3] ...  →  [h1] [h2] ... [hn]  →  [y1] [y2] ... [yn]
(every neuron in one layer is connected to every neuron in the next)
2. Operation of MLP:
a) Forward Propagation:
• Each neuron receives input, applies a weighted sum and a bias, then passes the result through an
activation function:
z_j = Σ_{i=1..n} w_ij · x_i + b_j,    h_j = f(z_j)
• Output of one layer becomes input to the next.
b) Output Calculation:
• Final layer computes the output using activation suitable for the task:
o Sigmoid for binary classification
o Softmax for multi-class classification
o Linear for regression
c) Loss Calculation:
• Compute loss using functions like Mean Squared Error (MSE) or Cross-Entropy.
d) Backpropagation:
• Gradients are calculated and weights are updated using algorithms like Stochastic Gradient
Descent (SGD) or Adam.
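A tiny NumPy sketch of the forward pass just described, for one hidden layer (the weights are random and the layer sizes are assumed for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))                           # 3 input features

W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))    # assumed sizes: 3 inputs -> 4 hidden
W2, b2 = rng.normal(size=(2, 4)), np.zeros((2, 1))    # 4 hidden -> 2 outputs

h = sigmoid(W1 @ x + b1)                              # weighted sum + bias, then activation
scores = W2 @ h + b2
y = np.exp(scores) / np.exp(scores).sum()             # softmax for a multi-class output
print(y.ravel())                                      # class probabilities summing to 1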
3. Advantages of MLP:
1. Can model non-linear and complex relationships
2. Works well for both classification and regression problems
3. General-purpose architecture
4. Supports multi-class output
5. Learns feature representations from raw data
4. Disadvantages of MLP:
1. Computationally expensive for deep networks
2. Requires large datasets for training
3. Prone to overfitting without regularization
4. Black-box model: difficult to interpret
5. Sensitive to hyperparameters (e.g., learning rate, number of neurons/layers)
(OR)
15. b) Backpropagation in MLP (from first principles)
Overview:
Backpropagation is a supervised learning algorithm used to train feedforward neural networks by
minimizing loss.
Network Setup:
• 1 Input layer, 1 Hidden layer, 1 Output layer
• Activation function: Sigmoid or ReLU
Steps:
1. Forward Propagation
o Calculate activations at each layer.
2. Loss Calculation
o Example: Mean Squared Error or Cross-Entropy
3. Backward Propagation
o Use chain rule to compute gradient of loss w.r.t weights.
4. Weight Update
o Using Gradient Descent:
w = w − η · ∂L/∂w
Key Formulas:
• Output layer error:
δ^(L) = (y^(L) − t) · f′(z^(L))
• Hidden layer error:
δ^(l) = ((W^(l+1))ᵀ δ^(l+1)) · f′(z^(l))
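A compact NumPy sketch of one backpropagation step for a 1-hidden-layer network with sigmoid activations and squared-error loss (the sizes, the data, and the omission of bias terms are simplifying assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=(2, 1))                      # one training example
t = np.array([[1.0]])                            # its target
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
eta = 0.1                                        # learning rate

# Forward propagation
z1 = W1 @ x;  a1 = sigmoid(z1)
z2 = W2 @ a1; y = sigmoid(z2)

# Backward propagation (chain rule)
delta2 = (y - t) * y * (1 - y)                   # output-layer error: (y - t) * f'(z2)
delta1 = (W2.T @ delta2) * a1 * (1 - a1)         # hidden-layer error

# Gradient descent updates: w := w - eta * dL/dw
W2 -= eta * delta2 @ a1.T
W1 -= eta * delta1 @ x.T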
PART C 1 X 15 = 15 MARKS
16. a) Constraint Satisfaction Problem + Crypt Arithmetic Example
Definition:
CSP is a problem where variables must be assigned values satisfying constraints.
Formulation:
• Variables: Letters in a puzzle (e.g., S, E, N, D, M, O, R, Y)
• Domain: Digits 0-9
• Constraints:
o Unique digit per letter
o Arithmetic constraint of the puzzle
o Leading digit ≠ 0
Example:
SEND + MORE = MONEY
Apply backtracking + constraint propagation:
• Check constraints at each step.
• Prune domains violating constraints.
Algorithms:
• Backtracking Search
• Forward Checking
• Arc Consistency (AC-3)
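A brute-force Python sketch (plain permutation search rather than the backtracking/AC-3 methods listed above) that verifies the SEND + MORE = MONEY assignment:

from itertools import permutations

letters = 'SENDMORY'                              # the 8 distinct letters of the puzzle
for digits in permutations(range(10), len(letters)):
    s, e, n, d, m, o, r, y = digits
    if s == 0 or m == 0:                          # leading digits may not be zero
        continue
    send  = 1000*s + 100*e + 10*n + d
    more  = 1000*m + 100*o + 10*r + e
    money = 10000*m + 1000*o + 100*n + 10*e + y
    if send + more == money:
        print(send, '+', more, '=', money)        # 9567 + 1085 = 10652
        break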
(OR)
16. b) Categorize Linear vs Logistic Regression
Feature | Linear Regression | Logistic Regression
Type | Regression | Classification
Output | Continuous value | Probability (0–1)
Activation Function | None | Sigmoid
Equation | y = w · x + b | P(y=1) = σ(w · x + b)
Cost Function | Mean Squared Error (MSE) | Cross-Entropy
Use Case | Predict salary, price, etc. | Predict binary outcomes (spam, disease)
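A short scikit-learn sketch (toy data assumed) contrasting the two models' outputs:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10).reshape(-1, 1)                  # one feature, values 0..9

y_cont = 3 * X.ravel() + 2                        # continuous target for linear regression
lin = LinearRegression().fit(X, y_cont)
print(lin.predict([[12]]))                        # continuous prediction (~38)

y_bin = (X.ravel() >= 5).astype(int)              # binary target for logistic regression
log_reg = LogisticRegression().fit(X, y_bin)
print(log_reg.predict_proba([[4]]))               # class probabilities in (0, 1), summing to 1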
