Answer Key
PART B 5 X 13 = 65 MARKS
11. a) Water Jug Problem
Problem Statement:
You are given two jugs with capacities of 4 liters and 3 liters. Neither jug has any measuring marks. The goal
is to obtain exactly 2 liters of water in the 4-liter jug.
State Space:
Each state is represented as a pair (X, Y) where
• X = amount of water in 4-liter jug
• Y = amount in 3-liter jug
Initial state: (0, 0)
Goal state: (2, y) for any y
Possible operations:
• Fill a jug completely
• Empty a jug
• Pour water from one jug to another until one is full or the other is empty
Solution steps:
1. (0, 0) → Fill 3-liter → (0, 3)
2. (0, 3) → Pour into 4-liter → (3, 0)
3. (3, 0) → Fill 3-liter → (3, 3)
4. (3, 3) → Pour into 4-liter → (4, 2)
5. (4, 2) → Empty 4-liter → (0, 2)
6. (0, 2) → Pour into 4-liter → (2, 0) ← Goal reached
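As a minimal illustration, this state space can be searched with breadth-first search; the following Python sketch (the function and its structure are our own, not prescribed by the question) finds one shortest solution path.

from collections import deque

def water_jug_bfs(cap4=4, cap3=3, target=2):
    """Breadth-first search over states (x, y): x = 4-liter jug, y = 3-liter jug."""
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if x == target:                      # goal: exactly 2 liters in the 4-liter jug
            path, state = [], (x, y)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        successors = [
            (cap4, y), (x, cap3),            # fill a jug
            (0, y), (x, 0),                  # empty a jug
            (x - min(x, cap3 - y), y + min(x, cap3 - y)),   # pour 4L -> 3L
            (x + min(y, cap4 - x), y - min(y, cap4 - x)),   # pour 3L -> 4L
        ]
        for s in successors:
            if s not in parent:
                parent[s] = (x, y)
                queue.append(s)
    return None

print(water_jug_bfs())   # prints one shortest sequence of states reaching the goal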
(OR)
11. b) Uninformed Search Methods
Uninformed (blind) search methods do not use any domain-specific knowledge; they rely only on the information given in the problem definition.
Types:
1. Breadth-First Search (BFS): Explores all nodes at the current depth before going deeper
o Complete; optimal if all step costs are equal; time: O(b^d), space: O(b^d)
2. Depth-First Search (DFS): Explores as far as possible along one branch before backtracking
o Not complete in infinite spaces, not optimal; time: O(b^m), space: O(bm)
3. Uniform Cost Search (UCS): Expands the least-cost node first
o Complete and optimal (for non-negative step costs); time and space: O(b^(1 + ⌊C*/ε⌋)), where C* is the optimal cost and ε the minimum step cost
(Here b = branching factor, d = depth of the shallowest goal, m = maximum depth.)
Example:
Pathfinding from city A to city B without using straight-line distance (or any other heuristic) to guide the search.
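A short illustrative Python sketch contrasting BFS and UCS on a small made-up road map (the cities and distances below are invented for the example):

import heapq
from collections import deque

# Hypothetical road map: city -> list of (neighbour, distance)
GRAPH = {
    'A': [('B', 4), ('C', 2)],
    'B': [('D', 5)],
    'C': [('B', 1), ('D', 8)],
    'D': [],
}

def bfs(start, goal):
    """Fewest edges, ignoring distances (optimal only when all step costs are equal)."""
    frontier, visited = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt, _ in GRAPH[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])

def ucs(start, goal):
    """Expands the least-cost node first; optimal for non-negative step costs."""
    frontier = [(0, [start])]
    explored = set()
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for nxt, step in GRAPH[node]:
            heapq.heappush(frontier, (cost + step, path + [nxt]))

print(bfs('A', 'D'))   # ['A', 'B', 'D'] - fewest hops
print(ucs('A', 'D'))   # (8, ['A', 'C', 'B', 'D']) - cheapest route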
12. a) Approximate Inference in Bayesian Networks
Why Approximate?
Exact inference in Bayesian networks is NP-hard in general, so it becomes intractable for large networks.
Methods:
1. Sampling Methods
o Likelihood Weighting
o Gibbs Sampling
o Rejection Sampling
2. Variational Inference
o Converts inference into an optimization problem
o Approximates the true posterior with a simpler, tractable distribution
Likelihood Weighting Example:
Fix the evidence variables, sample each non-evidence variable from its conditional distribution given its parents, and weight every sample by the likelihood of the observed evidence.
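A small illustrative sketch of likelihood weighting on a hypothetical two-variable network Rain → WetGrass (the probabilities below are invented for the example):

import random

# Hypothetical CPTs: P(Rain) and P(WetGrass | Rain)
P_RAIN = 0.3
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.2}

def likelihood_weighting(evidence_wet=True, n_samples=100_000):
    """Estimate P(Rain | WetGrass = evidence_wet) by likelihood weighting."""
    weighted_rain, total_weight = 0.0, 0.0
    for _ in range(n_samples):
        rain = random.random() < P_RAIN                 # sample non-evidence variable from its prior
        p_wet = P_WET_GIVEN_RAIN[rain]
        weight = p_wet if evidence_wet else 1 - p_wet   # weight = likelihood of the fixed evidence
        total_weight += weight
        if rain:
            weighted_rain += weight
    return weighted_rain / total_weight

print(likelihood_weighting())   # ≈ 0.66  (exact: 0.27 / (0.27 + 0.14) ≈ 0.659)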
(OR)
12. b) Bayesian Network – Biased Coin Problem
Given:
• 3 coins: a (P(H)=0.2), b (P(H)=0.6), c (P(H)=0.8)
• One coin is chosen at random: P(a)=P(b)=P(c)=1/3
• The coin is flipped 3 times → X1, X2, X3
• Observed outcome: HHT
a) Bayesian Network and CPTs
Network Structure:
        C (Coin)
       /    |    \
     X1    X2    X3   (coin flips)
Variables:
• C ∈ {a, b, c}
• Xi ∈ {H, T} (i=1 to 3)
CPTs:
• P(C) = {a: 1/3, b: 1/3, c: 1/3}
• P(Xi | C):
o For a: P(H)=0.2, P(T)=0.8
o For b: P(H)=0.6, P(T)=0.4
o For c: P(H)=0.8, P(T)=0.2
b) Compute the Posterior over the Coin given Outcome HHT
We want to find:
P(C = ci | X1 = H, X2 = H, X3 = T) ∝ P(C = ci) · P(H | ci)² · P(T | ci)
For coin a (P(H) = 0.2, P(T) = 0.8):
P_a ∝ (1/3) · (0.2)² · 0.8 = (1/3) · 0.032 ≈ 0.0107
For coin b (P(H) = 0.6, P(T) = 0.4):
P_b ∝ (1/3) · (0.6)² · 0.4 = (1/3) · 0.144 = 0.048
For coin c (P(H) = 0.8, P(T) = 0.2):
P_c ∝ (1/3) · (0.8)² · 0.2 = (1/3) · 0.128 ≈ 0.0427
Normalize:
Total ∝ (1/3) · (0.032 + 0.144 + 0.128) = (1/3) · 0.304 ≈ 0.1013
Final probabilities:
• P(a | HHT) = 0.032 / 0.304 ≈ 10.5%
• P(b | HHT) = 0.144 / 0.304 ≈ 47.4%
• P(c | HHT) = 0.128 / 0.304 ≈ 42.1%
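The same calculation can be verified with a few lines of Python (a direct transcription of the arithmetic above):

# Prior over coins and their head probabilities, as given in the problem
coins = {'a': 0.2, 'b': 0.6, 'c': 0.8}
prior = 1 / 3
outcome = ['H', 'H', 'T']

# Unnormalised posterior: P(C) * P(H|C)^2 * P(T|C)
unnorm = {c: prior * p_h**outcome.count('H') * (1 - p_h)**outcome.count('T')
          for c, p_h in coins.items()}
total = sum(unnorm.values())
posterior = {c: v / total for c, v in unnorm.items()}

print(posterior)   # ≈ {'a': 0.105, 'b': 0.474, 'c': 0.421}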
13. a) Principle of the Gradient Descent Algorithm
Definition:
Gradient Descent is an optimization algorithm used to minimize a cost/loss function by iteratively
updating model parameters (like weights in a neural network) in the opposite direction of the gradient (or
slope) of the function.
Mathematical Formulation:
Let:
• θ = parameter to be learned (e.g., a weight)
• J(θ) = cost/loss function
• α = learning rate (step size)
The update rule is:
θ := θ − α · ∂J(θ)/∂θ
This means we take small steps in the direction opposite to the gradient to reach the minimum of
the cost function.
Terms Explained:
• θ: the model parameter being optimized (e.g., a network weight)
• J(θ): the cost/loss function measuring prediction error
• α: the learning rate; too large a value overshoots the minimum, too small a value slows convergence
• ∂J(θ)/∂θ: the gradient (slope) of the cost function at the current value of θ
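A minimal illustrative sketch of the update rule on a one-parameter example, assuming the toy loss J(θ) = (θ − 3)², whose gradient is 2(θ − 3):

def gradient_descent(grad, theta0=0.0, alpha=0.1, steps=50):
    """Iteratively apply theta := theta - alpha * dJ/dtheta."""
    theta = theta0
    for _ in range(steps):
        theta -= alpha * grad(theta)
    return theta

# Example loss J(theta) = (theta - 3)^2, so dJ/dtheta = 2 * (theta - 3)
theta_min = gradient_descent(lambda t: 2 * (t - 3))
print(round(theta_min, 4))   # ≈ 3.0, the minimiser of J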
15.a) Model the Architecture of a Multilayer Perceptron (MLP), Explain Its Operation, and
Mention Its Advantages and Disadvantages
1. Architecture of MLP:
A Multilayer Perceptron (MLP) is a type of feedforward artificial neural network that consists of
three main types of layers:
• Input Layer: Receives input features.
• Hidden Layer(s): One or more layers that perform computations.
• Output Layer: Produces the final output.
MLP Structure:
• Fully Connected: Every neuron in one layer is connected to every neuron in the next.
• Activation Functions: Used in hidden and output layers (e.g., ReLU, sigmoid, softmax).
Diagram (Text Representation):
Input Layer        Hidden Layer(s)        Output Layer
[x1] ----\         [h1] ----\             [y1]
[x2] ----->------- [h2] ----->----------- [y2]
[x3] ----/         ...   ----/            ...
                   [hn]                   [yn]
2. Operation of MLP:
a) Forward Propagation:
• Each neuron receives input, applies a weighted sum and a bias, then passes the result through an
activation function:
z_j = Σ (i = 1 to n) w_ij · x_i + b_j,   h_j = f(z_j)
• Output of one layer becomes input to the next.
b) Output Calculation:
• Final layer computes the output using activation suitable for the task:
o Sigmoid for binary classification
o Softmax for multi-class classification
o Linear for regression
c) Loss Calculation:
• Compute loss using functions like Mean Squared Error (MSE) or Cross-Entropy.
d) Backpropagation:
• Gradients are calculated and weights are updated using algorithms like Stochastic Gradient
Descent (SGD) or Adam.
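A compact NumPy sketch of steps (a)–(c) for a one-hidden-layer MLP on toy data (the layer sizes, data, and binary-classification setup are illustrative assumptions; the weight update of step (d) is detailed under 15. b):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))            # 4 samples, 3 input features (toy data)
t = rng.integers(0, 2, size=(4, 1))    # binary targets

W1, b1 = rng.normal(size=(3, 5)) * 0.1, np.zeros(5)   # input -> hidden (5 units)
W2, b2 = rng.normal(size=(5, 1)) * 0.1, np.zeros(1)   # hidden -> output

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# a) Forward propagation: weighted sum + bias, then activation
h = sigmoid(x @ W1 + b1)

# b) Output: sigmoid for binary classification
y = sigmoid(h @ W2 + b2)

# c) Loss: binary cross-entropy
loss = -np.mean(t * np.log(y) + (1 - t) * np.log(1 - y))
print(round(loss, 4))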
3. Advantages of MLP:
1. Can model non-linear and complex relationships
2. Works well for both classification and regression problems
3. General-purpose architecture
4. Supports multi-class output
5. Learns feature representations from raw data
4. Disadvantages of MLP:
1. Computationally expensive for deep networks
2. Requires large datasets for training
3. Prone to overfitting without regularization
4. Black-box model: difficult to interpret
5. Sensitive to hyperparameters (e.g., learning rate, number of neurons/layers)
(OR)
15. b) Backpropagation in MLP (from first principles)
Overview:
Backpropagation is a supervised learning algorithm used to train feedforward neural networks by
minimizing loss.
Network Setup:
• 1 Input layer, 1 Hidden layer, 1 Output layer
• Activation function: Sigmoid or ReLU
Steps:
1. Forward Propagation
o Calculate activations at each layer.
2. Loss Calculation
o Example: Mean Squared Error or Cross-Entropy
3. Backward Propagation
o Use the chain rule to compute the gradient of the loss w.r.t. each weight.
4. Weight Update
o Using Gradient Descent:
w := w − η · ∂L/∂w
Key Formulas:
• Output layer error:
δ^(L) = (y^(L) − t) · f′(z^(L))
• Hidden layer error:
δ^(l) = ((W^(l+1))ᵀ δ^(l+1)) ⊙ f′(z^(l)), where ⊙ denotes element-wise multiplication
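A from-scratch illustrative sketch applying these formulas to a one-hidden-layer network with sigmoid activations and squared-error loss (the sizes, data, learning rate η, and epoch count are arbitrary choices for the example):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 2))                 # toy inputs
t = rng.uniform(size=(8, 1))                # toy targets in (0, 1)

W1, b1 = rng.normal(size=(2, 4)) * 0.5, np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)) * 0.5, np.zeros(1)

f = lambda z: 1.0 / (1.0 + np.exp(-z))      # sigmoid
df = lambda z: f(z) * (1 - f(z))            # its derivative f'(z)
eta = 0.1

for epoch in range(200):
    # Forward propagation
    z1 = x @ W1 + b1; a1 = f(z1)
    z2 = a1 @ W2 + b2; y = f(z2)

    # Output-layer error: delta_L = (y - t) * f'(z_L)
    delta2 = (y - t) * df(z2)
    # Hidden-layer error: delta_l = (delta_{l+1} W_{l+1}^T) ⊙ f'(z_l)
    delta1 = (delta2 @ W2.T) * df(z1)

    # Weight update: w := w - eta * dL/dw
    W2 -= eta * a1.T @ delta2; b2 -= eta * delta2.sum(axis=0)
    W1 -= eta * x.T @ delta1;  b1 -= eta * delta1.sum(axis=0)

y = f(f(x @ W1 + b1) @ W2 + b2)
print(round(np.mean((y - t) ** 2), 4))      # MSE on the toy data after training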
PART C 1 X 15 = 15 MARKS
16. a) Constraint Satisfaction Problem + Crypt Arithmetic Example
Definition:
A Constraint Satisfaction Problem (CSP) consists of a set of variables, a domain of possible values for each variable, and a set of constraints; a solution assigns every variable a value from its domain such that all constraints are satisfied.
Formulation:
• Variables: Letters in a puzzle (e.g., S, E, N, D, M, O, R, Y)
• Domain: Digits 0-9
• Constraints:
o Unique digit per letter
o Arithmetic constraint of the puzzle
o Leading digit ≠ 0
Example:
SEND + MORE = MONEY
Apply backtracking + constraint propagation:
• Check constraints at each step.
• Prune domains violating constraints.
Algorithms:
• Backtracking Search
• Forward Checking
• Arc Consistency (AC-3)
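An illustrative brute-force sketch of this formulation in Python; it enumerates digit assignments and checks the constraints directly (a real CSP solver would prune the search with forward checking or AC-3):

from itertools import permutations

LETTERS = 'SENDMORY'   # 8 distinct letters, each bound to a distinct digit

def solve():
    for digits in permutations(range(10), len(LETTERS)):
        a = dict(zip(LETTERS, digits))
        if a['S'] == 0 or a['M'] == 0:          # leading digits must be non-zero
            continue
        if a['M'] != 1:                         # constraint propagation would deduce M = 1; checking it early prunes the search
            continue
        send  = int(''.join(str(a[ch]) for ch in 'SEND'))
        more  = int(''.join(str(a[ch]) for ch in 'MORE'))
        money = int(''.join(str(a[ch]) for ch in 'MONEY'))
        if send + more == money:                # arithmetic constraint of the puzzle
            return send, more, money, a
    return None

print(solve()[:3])   # (9567, 1085, 10652)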
(OR)
16. b) Categorize Linear vs Logistic Regression
Feature    | Linear Regression              | Logistic Regression
Use Case   | Predict salary, price, etc.    | Predict binary outcomes (spam, disease)