April May 2024
o Avoidance of Local Minima: The noisy updates of SGD can help the
optimization process escape local minima and find better solutions.
10. Why is ReLU better than Softmax? Give the equation for both.
ReLU (Rectified Linear Unit):
o Equation: f(x) = max(0, x)
o Advantages: computationally cheap; does not saturate for positive inputs, which mitigates the vanishing-gradient problem; produces sparse activations.
Softmax:
o Equation: f(x_i) = e^(x_i) / Σ_j e^(x_j)
o Softmax converts a vector of scores into a probability distribution, so it is used in the output layer of a multi-class classifier. ReLU is "better" only in the sense that it is the appropriate, cheaper choice for hidden-layer activations; the two functions serve different roles.
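Both functions can be sketched in plain Python (a minimal illustration; deep-learning frameworks provide optimized, vectorized versions):

```python
import math

def relu(x):
    # ReLU: pass positive values through, clamp negatives to zero
    return max(0.0, x)

def softmax(scores):
    # Subtract the max score before exponentiating for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0))                # 0.0
print(relu(3.5))                 # 3.5
print(softmax([1.0, 2.0, 3.0]))  # three probabilities summing to 1
```

Note that softmax needs the whole score vector, while ReLU acts element-wise; this is another reason they occupy different places in a network.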
PART B
11. (a) Differentiate Blind Search and Heuristic Search.
o General (Blind) Search Strategies:
o These strategies (e.g., breadth-first search, depth-first search, uniform-cost search) explore the search space systematically, using no problem-specific knowledge beyond the problem definition itself.
o Heuristic Search:
o It uses "heuristics," which are rules of thumb or educated guesses, to guide the
search.
o A heuristic function estimates how close a current state is to the goal, allowing the
algorithm to prioritize more promising paths.
o Examples include A* search, greedy best-first search, and hill climbing.
o Heuristic search aims to find a solution more efficiently, but it doesn't always
guarantee finding the optimal solution.
o Key Differences:
o Information: General search strategies may or may not use information about the problem; heuristic search relies on a heuristic function that estimates the distance to the goal.
o Efficiency: Heuristic search typically expands far fewer nodes, so it scales better on large search spaces than blind strategies.
o Optimality: General search strategies can guarantee finding the optimal solution (depending on the strategy, e.g., breadth-first search with uniform step costs, or uniform-cost search).
o Heuristic search does not always guarantee finding the optimal solution (A* does, provided its heuristic is admissible).
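As a concrete illustration of heuristic search, here is a minimal A* on a small grid, using Manhattan distance as the (admissible) heuristic:

```python
import heapq

def a_star(grid, start, goal):
    """A* search on a 4-connected grid; 1 = wall, 0 = free cell.

    Manhattan distance is admissible here, so the returned
    path cost is optimal."""
    def h(p):  # heuristic: Manhattan distance to the goal
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    frontier = [(h(start), 0, start)]   # (f = g + h, g, node)
    best_g = {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g                    # cost of the optimal path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0:
                ng = g + 1
                if ng < best_g.get((r, c), float("inf")):
                    best_g[(r, c)] = ng
                    heapq.heappush(frontier, (ng + h((r, c)), ng, (r, c)))
    return None                         # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(a_star(grid, (0, 0), (2, 0)))  # 6
```

With h(p) = 0 for all p this degenerates into uniform-cost (blind) search, which makes the blind-vs-heuristic contrast above concrete: the heuristic only changes the order in which nodes are expanded.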
12. (a) Consider the following set of propositions... [Bayesian Network Creation]
Propositions:
o Patient has spots
o Patient has fever
o Patient has muscle aches
o Patient was bitten by a tick
o Patient has an allergy
o Patient was inoculated against measles
o Patient has Rocky Mountain Spotted Fever (RMSF)
Network Creation:
1. Identify Dependencies:
Rocky Mountain Spotted Fever (RMSF) causes spots, fever, and muscle
aches.
Tick bites can cause RMSF.
Allergies can cause spots.
Inoculation against measles can reduce the likelihood of fever.
2. Create Nodes: Create a node for each proposition.
3. Draw Edges: Connect nodes based on dependencies. For example:
Tick Bite -> RMSF
RMSF -> Spots
RMSF -> Fever
RMSF -> Muscle Aches
Allergy -> Spots
Measles Inoculation -> Fever
4. Conditional Probability Tables (CPTs): For each node, create a CPT that specifies
the probability of that node's state given the states of its parent nodes.
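A network of this shape can be sketched as plain dictionaries: one mapping each node to its parents, and one CPT per node indexed by parent states. The probability values below are illustrative placeholders, not clinical data:

```python
# Parents of each node, following the edges listed above
parents = {
    "RMSF": ["TickBite"],
    "Spots": ["RMSF", "Allergy"],
    "Fever": ["RMSF", "Inoculation"],
    "MuscleAches": ["RMSF"],
}

# Example CPT for Fever, indexed by (rmsf, inoculated);
# the numbers are made up for illustration only
p_fever = {
    (True, True): 0.4,
    (True, False): 0.9,
    (False, True): 0.01,
    (False, False): 0.1,
}

# Look up P(Fever | RMSF = true, Inoculation = false)
print(p_fever[(True, False)])  # 0.9
```

Each CPT row must cover one combination of parent values, so a node with k binary parents needs 2^k entries, which is why keeping the number of parents small matters in practice.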
(b) Construct a Bayesian Network and define the necessary CPT... [Coin Flip Problem]
Variables:
o Coin (A, B, C)
o X1 (First flip)
o X2 (Second flip)
o X3 (Third flip)
Network Structure:
o Coin -> X1
o Coin -> X2
o Coin -> X3
CPTs:
o P(Coin = A) = 0.1
o P(Coin = B) = 0.4
o P(Coin = C) = 0.5
o P(Xi = Heads | Coin) for i = 1, 2, 3: the heads probability of the chosen coin. The same CPT is shared by X1, X2, and X3, since the flips are conditionally independent given the coin.
Calculating Most Likely Coin: Use Bayes' Theorem to calculate P(Coin | X1, X2, X3) for
each coin, given the observed sequence of flips. The coin with the highest posterior
probability is the most likely.
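The posterior computation can be sketched as follows. Only the priors come from the text; the per-coin heads probabilities below are hypothetical placeholders standing in for the question's unstated CPT values:

```python
priors = {"A": 0.1, "B": 0.4, "C": 0.5}      # P(Coin), from the text
heads_prob = {"A": 0.2, "B": 0.6, "C": 0.8}  # hypothetical P(X = H | Coin)

def posterior(flips):
    """P(Coin | X1, X2, X3) for a flip string like 'HHT', via Bayes' theorem."""
    joint = {}
    for coin, prior in priors.items():
        likelihood = 1.0
        for f in flips:
            p = heads_prob[coin]
            likelihood *= p if f == "H" else (1.0 - p)
        joint[coin] = prior * likelihood     # P(Coin) * P(flips | Coin)
    z = sum(joint.values())                  # normalizing constant P(flips)
    return {coin: v / z for coin, v in joint.items()}

post = posterior("HHH")
print(max(post, key=post.get))  # most likely coin after three heads
```

Under these placeholder numbers, three heads favor coin C, since it has both the largest prior and the largest heads probability; the same code answers the question once the real CPT values are substituted.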
13. (a) State when and why you would use random forests vs SVM.
Random Forests:
o When to Use:
Tabular data with many samples and a mix of numerical and categorical features.
When robustness to noise, outliers, and missing values matters.
When feature-importance estimates are useful.
o Why: An ensemble of decision trees averages out the errors of individual trees, which reduces overfitting and requires little feature scaling or preprocessing.
SVM (Support Vector Machine):
o When to Use:
High-dimensional data.
Clear margin of separation between classes.
When memory efficiency is important.
When interpretability is needed (for linear SVM).
o Why: SVMs maximize the margin between classes and, through the kernel trick, can model complex non-linear boundaries effectively.
o Related Variants for Anomaly Detection: Identifying data points that deviate significantly from the norm; examples include Isolation Forest and One-Class SVM.
(b) List the applications of clustering and identify advantages and disadvantages of
clustering algorithms.
Applications:
o Customer segmentation
o Image segmentation
o Document clustering
o Anomaly detection
o Bioinformatics
Advantages:
o Discovers hidden patterns in data without requiring labels.
o Useful for data exploration and as a preprocessing step before supervised modeling.
Disadvantages:
o The number of clusters (e.g., k in k-means) often must be chosen in advance.
o Results are sensitive to initialization, feature scaling, and outliers.
o Cluster quality is hard to evaluate without ground-truth labels.
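A toy k-means on one-dimensional data illustrates the basic assign-then-update loop (a minimal sketch; real applications should use a library implementation with better initialization):

```python
def kmeans_1d(points, k=2, iters=10):
    """Minimal 1-D k-means: alternate assignment and mean-update steps."""
    centers = sorted(points)[:k]        # naive init: the k smallest points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center
            i = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 10, 11])
print(sorted(centers))  # [1.5, 10.5]
```

The example also demonstrates the disadvantages listed above: k is fixed beforehand, and the naive initialization could converge to a poor split on less well-separated data.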
15. (a) Draw the architecture of a single layer perceptron (SLP) and explain the operation.
Mention its advantages and disadvantages.
ANS:
Architecture:
o Input Layer: A set of input nodes (x1, x2, ..., xn) representing the features of the
input data.
o Weights: Each input node is connected to the output node by a weighted
connection (w1, w2, ..., wn).
o Summation/Aggregation: The weighted sum of the inputs is calculated: z = w1*x1
+ w2*x2 + ... + wn*xn + b (where 'b' is the bias).
o Activation Function: The result 'z' is passed through an activation function (e.g.,
step function, sigmoid, ReLU) to produce the output 'y'.
o Output Node: A single output node 'y' representing the prediction.
Diagram:
x1 --- w1 ---> (+) --- Activation Function ---> y
x2 --- w2 ---> (+)
...
xn --- wn ---> (+)
b ---> (+)
Operation:
1. The inputs are multiplied by their respective weights.
2. The weighted inputs are summed together, along with the bias term.
3. The result is passed through an activation function, which determines the final
output.
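The operation and the perceptron learning rule can be sketched in a few lines of Python, here learning the (linearly separable) AND function with a step activation:

```python
def train_perceptron(samples, lr=0.1, epochs=20):
    """Train a single-layer perceptron with the step activation.

    `samples` is a list of (inputs, target) pairs with 0/1 targets.
    The learning rule converges for linearly separable data."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, t in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum + bias
            y = 1 if z >= 0 else 0                        # step activation
            err = t - y
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

# Learn the AND gate
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
print([predict(w, b, x) for x, _ in data])  # [0, 0, 0, 1]
```

Swapping the targets for XOR makes the same code fail to converge, which is exactly the non-linear-separability limitation discussed below.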
Advantages:
o Simplicity: Easy to understand and implement.
o Guaranteed Convergence: The perceptron learning rule converges in finitely many updates when the training data is linearly separable.
Disadvantages:
o Limited Capability: Can only learn linear decision boundaries.
o Cannot Solve Non-Linear Problems: Fails to solve problems where the data is not
linearly separable (e.g., XOR problem).
o Sensitivity to Input Scaling: Performance can be affected by the scaling of input
features.
(b) How do you tune hyperparameters for better neural network performance? Explain in
detail.
ANS:
Hyperparameter Tuning: The process of finding the optimal values for hyperparameters
(parameters set before training, e.g., learning rate, batch size, number of layers and
units per layer, dropout rate, weight decay) to maximize the neural network's performance.
Methods:
o Manual Tuning: Experimenting with different hyperparameter values based on
experience and intuition.
o Grid Search: Trying all possible combinations of hyperparameter values within a
specified range.
o Random Search: Randomly sampling hyperparameter values from a distribution;
often finds good settings faster than grid search when only a few hyperparameters matter.
o Bayesian Optimization: Building a probabilistic model of the validation score to
choose the most promising hyperparameters to try next.
Evaluation:
o Cross-Validation: Using techniques like k-fold cross-validation to evaluate model
performance on different subsets of the data.
o Validation Set: Using a separate validation set to monitor performance during
training and prevent overfitting.
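The grid-search method above can be sketched as a short loop. The `validation_loss` function below is a toy stand-in for "train the network and return its validation loss" (a real tuner would fit a model at this point):

```python
import itertools

# Toy objective: pretends lr = 0.01 and batch_size = 64 are best
def validation_loss(lr, batch_size):
    return (lr - 0.01) ** 2 + 0.001 * abs(batch_size - 64)

grid = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}

best = None
for lr, bs in itertools.product(grid["lr"], grid["batch_size"]):
    loss = validation_loss(lr, bs)       # evaluate one combination
    if best is None or loss < best[0]:
        best = (loss, {"lr": lr, "batch_size": bs})

print(best[1])  # {'lr': 0.01, 'batch_size': 64}
```

Random search replaces the `itertools.product` loop with random draws from each range; in both cases the loss should be measured on a validation set (or via cross-validation), never on the test set.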
PART C
16. (a) Discuss constraint satisfaction problems with an algorithm for solving crypt
arithmetic. Solve the following:
```
 CROSS
 ROADS
------
DANGER
```
ANS:
Constraint Satisfaction Problems (CSPs): Problems where the goal is to find a set of
values for variables that satisfy a set of constraints.
Crypt Arithmetic: A type of CSP where letters represent digits and the goal is to find the
digit assignment that makes the arithmetic equation true.
Algorithm (Backtracking Search):
1. Assign Variables: Assign a digit to each unique letter.
2. Check Constraints: Check if the assignment satisfies the constraints (e.g., each
letter has a unique digit, the arithmetic equation is correct).
3. Backtrack: If a constraint is violated, backtrack to the previous assignment and try
a different value.
4. Repeat: Repeat steps 2 and 3 until a solution is found or all possibilities have been
explored.
Solution for CROSS + ROADS = DANGER:
With C=9, R=6, O=2, S=3, A=5, D=1, N=8, G=7, E=4:
 96233
 62513
------
158746
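The backtracking idea above can be sketched as a pruned brute-force search (a minimal, self-contained version; a full backtracking solver would assign digits column by column):

```python
from itertools import permutations

def value(word, assign):
    """Convert a word to its number under a letter-to-digit assignment."""
    return int("".join(str(assign[ch]) for ch in word))

def solve_cross_roads():
    """Solve CROSS + ROADS = DANGER.

    Two 5-digit numbers sum to less than 200000, so the leading
    digit D must be 1; fixing it shrinks the search considerably."""
    letters = "CROSANGE"                  # the eight remaining letters
    digits = [0, 2, 3, 4, 5, 6, 7, 8, 9]  # 1 is reserved for D
    for perm in permutations(digits, len(letters)):
        assign = dict(zip(letters, perm))
        assign["D"] = 1
        if assign["C"] == 0 or assign["R"] == 0:
            continue                      # no leading zeros
        if (2 * assign["S"]) % 10 != assign["R"]:
            continue                      # units column: S + S must end in R
        if value("CROSS", assign) + value("ROADS", assign) == value("DANGER", assign):
            return assign
    return None

sol = solve_cross_roads()
print(sol)
```

The units-column check is a small taste of constraint propagation: ruling out assignments from a single column before evaluating the full sum prunes most of the search space.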
ANS:
Calculations (using Information Gain for demonstration; you could also use Gain Ratio):
Entropy(S): Entropy of the entire dataset (S) for "Play Golf."
o Entropy(S) = - (9/14) * log2(9/14) - (5/14) * log2(5/14) ≈ 0.940
Information Gain for Each Attribute: We'll calculate the information gain for each
attribute to determine the best split.
o Gain(S, Outlook):
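The figures above can be reproduced with a short script. The per-value Outlook counts below assume the classic 14-row Play Golf dataset (Sunny: 2 yes / 3 no, Overcast: 4 yes / 0 no, Rain: 3 yes / 2 no), which matches the 9/14 and 5/14 totals but is not reproduced in the text:

```python
import math

def entropy(counts):
    """Shannon entropy of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# Entropy of the full dataset: 9 "yes" and 5 "no" examples
print(round(entropy([9, 5]), 3))  # 0.94

# Information gain of Outlook = Entropy(S) minus the weighted
# average entropy of the subsets produced by the split
splits = [[2, 3], [4, 0], [3, 2]]  # assumed Sunny / Overcast / Rain counts
remainder = sum(sum(s) / 14 * entropy(s) for s in splits)
print(round(entropy([9, 5]) - remainder, 3))  # 0.247
```

Repeating the `remainder` computation for the other attributes and picking the largest gain selects the root split; under the assumed counts, Outlook's gain of about 0.247 is the usual winner for this dataset.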