April May 2024

The document provides an answer key for a B.E/B.Tech degree examination in Artificial Intelligence and Data Science, covering various topics such as applications of AI, performance measurement, probabilistic reasoning, ensemble learning, and intelligent agents. It includes detailed explanations of concepts like stochastic gradient descent, random forests vs SVM, and unsupervised learning techniques. Additionally, it discusses the architecture of a single layer perceptron and its operation, along with advantages and disadvantages.

Uploaded by

abernakumari87
Copyright
© All Rights Reserved

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

B.E/ B.TECH. DEGREE EXAMINATIONS, APRIL/MAY 2024


ANSWER KEY
PART A
1. What are the various applications of AI?
 Natural Language Processing (NLP): Chatbots, translation, sentiment analysis, speech
recognition.
 Computer Vision: Image recognition, object detection, facial recognition, medical image
analysis.
 Robotics: Autonomous vehicles, industrial automation, surgical robots.
 Expert Systems: Medical diagnosis, financial forecasting, troubleshooting.
 Recommendation Systems: Product recommendations, content personalization.
 Gaming: Game AI, strategy simulation.
2. How will you measure the performance of an AI application?
 Accuracy: The proportion of correct predictions.
 Precision: The proportion of correctly predicted positives out of all predicted positives.
 Recall (Sensitivity): The proportion of correctly predicted positives out of all actual
positives.
 F1-Score: The harmonic mean of precision and recall.
 AUC-ROC: Area under the Receiver Operating Characteristic curve (for classification
tasks).
 Mean Squared Error (MSE): For regression tasks, measures the average squared
difference between predicted and actual values.
 Response Time/Latency: How quickly the system provides a response.
 Throughput: The amount of data the system can process in a given time.
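As a minimal sketch of the classification metrics listed above, the following computes accuracy, precision, recall, and F1 from binary labels. The function name `classification_metrics` and the sample labels are illustrative assumptions, not part of the answer key.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, chosen only to exercise the function.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these labels there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics come out to 0.75.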
3. Mention the needs of probabilistic reasoning in AI.
 Handling Uncertainty: Real-world scenarios are often uncertain and incomplete.
Probabilistic reasoning allows AI to make decisions with limited information.
 Dealing with Noisy Data: Probabilistic models can handle noisy or erroneous data.
 Making Predictions: Probabilities help in making predictions about future events or
outcomes.
 Updating Beliefs: Probabilistic reasoning enables the AI to update its beliefs based on
new evidence.
 Learning from Data: Probabilistic models can learn patterns and relationships from
data.
4. Given that P(A) = 0.3, P(A|B) = 0.4, and P(B) = 0.5, Compute P(B|A).
We can use Bayes' Theorem:
P(B|A) = [P(A|B) * P(B)] / P(A)
P(B|A) = (0.4 * 0.5) / 0.3
P(B|A) = 0.2 / 0.3
P(B|A) = 2/3 ≈ 0.667
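The same calculation can be checked with exact fractions. This is only a small sketch; the helper name `bayes` is an assumption made for illustration.

```python
from fractions import Fraction


def bayes(p_a_given_b, p_b, p_a):
    """Bayes' Theorem: P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a


# Values from the question: P(A|B) = 0.4, P(B) = 0.5, P(A) = 0.3.
p_b_given_a = bayes(Fraction(4, 10), Fraction(5, 10), Fraction(3, 10))
print(p_b_given_a, float(p_b_given_a))
```

Using `Fraction` avoids floating-point rounding, so the result is exactly 2/3.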
5. How can overfitting be avoided?
 More Data: Increase the size of the training dataset.
 Cross-Validation: Use techniques like k-fold cross-validation to evaluate model
performance on different subsets of the data.
 Regularization: Add penalties to the model complexity (e.g., L1 or L2 regularization).
 Early Stopping: Stop training the model when performance on a validation set starts to
degrade.
 Feature Selection/Reduction: Choose only the most relevant features or reduce the
dimensionality of the data.
 Dropout: Randomly drop units (neurons) during training to prevent co-adaptation.
6. Assume a disease so rare that it is seen in only one person out of every million... [Bayes'
Theorem Problem]
Positive result doesn't guarantee the disease:
 Because so few people actually have the disease, most positive test results are likely to be
incorrect.
Bayes' Theorem helps:
 It's a way to calculate the real probability of having the disease after a positive test, taking
into account how rare the disease is.
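To make the effect concrete, here is a hedged sketch of the posterior calculation. The prevalence of one in a million comes from the question; the test's sensitivity (99%) and false positive rate (1%) are hypothetical values assumed only for illustration, since the full question text is not reproduced here.

```python
def posterior_disease(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' Theorem."""
    # Total probability of a positive result: true positives plus false positives.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


prior = 1e-6                # from the question: one person in a million
sensitivity = 0.99          # ASSUMED: P(positive | disease)
false_positive_rate = 0.01  # ASSUMED: P(positive | no disease)

posterior = posterior_disease(prior, sensitivity, false_positive_rate)
print(f"P(disease | positive) = {posterior:.6f}")
```

Under these assumed rates the posterior is roughly one in ten thousand, which is why a single positive result is far from conclusive for a very rare disease.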

7. Write the three types of ensemble learning.


 Bagging (Bootstrap Aggregating): Trains multiple models on different subsets of the
training data (bootstrapped samples) and combines their predictions (e.g., Random
Forest).
 Boosting: Trains models sequentially, where each model focuses on correcting the errors
of the previous models (e.g., AdaBoost, Gradient Boosting).
 Stacking (Stacked Generalization): Trains multiple base models and then uses another
model (a meta-learner) to combine their predictions.
8. How is expectation maximization used in Gaussian mixture models?
Expectation Maximization (EM) is used to estimate the parameters (means, covariances, and
mixing coefficients) of a Gaussian Mixture Model (GMM).
 E-step (Expectation): Calculate the probability that each data point belongs to each
Gaussian component in the mixture.
 M-step (Maximization): Update the parameters of each Gaussian component based on
the probabilities calculated in the E-step, maximizing the likelihood of the data.
The E and M steps are repeated iteratively until convergence.
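The two steps can be sketched for a one-dimensional mixture in plain Python. This is a minimal illustration under stated assumptions: the function names, the synthetic data (two Gaussians centred at 0 and 6), the initial guesses, and the fixed iteration count are all hypothetical choices, and a real implementation would usually test convergence of the log-likelihood instead.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture with EM; returns (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate parameters from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing component
            weights[j] = nj / len(data)
    return means, variances, weights


# Synthetic data: an assumption made purely for demonstration.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(6.0, 1.0) for _ in range(200)]

means, variances, weights = em_gmm_1d(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)
```

The estimated means should move from the initial guesses toward the two cluster centres, and the mixing coefficients always sum to one.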
9. What is stochastic gradient descent and why is it used in the training of neural
networks?
 Stochastic Gradient Descent (SGD): An iterative optimization algorithm used to
minimize a loss function. Unlike standard gradient descent, which calculates the gradient
using the entire training dataset, SGD calculates the gradient using only a single
randomly selected data point (or a small batch of data points) at each iteration.
 Why used in Neural Networks:
o Efficiency: SGD is computationally efficient, especially for large datasets, as it
avoids calculating the gradient over the entire dataset.
o Scalability: It scales well to large and high-dimensional datasets.

o Avoidance of Local Minima: The noisy updates of SGD can help the
optimization process escape local minima and find better solutions.
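A minimal sketch of SGD on a one-variable linear model, updating after every single sample. The model, the data (generated from y = 2x + 1), the learning rate, and the function name are assumptions chosen for illustration rather than anything prescribed by the question.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b with single-sample stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a fresh random order
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of the per-sample loss 0.5 * error**2.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free targets from y = 2x + 1
w, b = sgd_linear_regression(xs, ys)
print(w, b)
```

Each update touches one sample only, which is what keeps the per-step cost independent of the dataset size.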

10. Why is ReLU better than Softmax? Give the equation for both.
 ReLU (Rectified Linear Unit):
o Equation: f(x) = max(0, x)

o Advantages:
 Reduces Vanishing Gradient Problem: Doesn't saturate for positive
inputs, avoiding the vanishing gradient problem in deep networks.
 Computational Efficiency: Simple thresholding operation,
computationally efficient.
 Sparsity: Forces activations to be zero for negative inputs, leading to
sparsity.
 Softmax:
o Equation: σ(z)_i = e^(z_i) / Σ_j e^(z_j) (where z is a vector of inputs)

o Used for: The output layer of multi-class classification problems, where it produces
a probability distribution over the classes.
o Disadvantages (compared to ReLU for hidden layers):
 Computational Cost: Exponential calculations are more computationally
expensive.
 Vanishing Gradient (in some cases): Can saturate for very large or very
small inputs.
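Both equations translate directly into code. The sketch below is illustrative; subtracting the maximum inside `softmax` is a common numerical-stability step that is added here as an assumption and does not change the result.

```python
import math


def relu(x):
    """f(x) = max(0, x)"""
    return max(0.0, x)


def softmax(z):
    """sigma(z)_i = e^(z_i) / sum_j e^(z_j), computed stably."""
    m = max(z)  # shifting by the max avoids overflow in exp()
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


print([relu(x) for x in (-2.0, 0.0, 3.0)])
probs = softmax([1.0, 2.0, 3.0])
print(probs)
```

ReLU acts on each value independently, while softmax normalizes a whole vector so that its outputs are positive and sum to one.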

PART B
11. (a) Differentiate Blind Search and Heuristic Search.
 General Search Strategies:
o These are algorithms that explore a problem space to find a solution.
o They can be "uninformed" (blind) or "informed."
o Uninformed search strategies (like breadth-first search or depth-first search)
explore the search space systematically without any knowledge of how close they
are to the goal.
o Complete strategies such as breadth-first search guarantee finding a solution if
one exists, but can be very inefficient for large problem spaces.
 Heuristic Search:
o Heuristic search is a type of "informed" search strategy.
o It uses "heuristics," which are rules of thumb or educated guesses, to guide the
search.
o A heuristic function estimates how close a current state is to the goal, allowing the
algorithm to prioritize more promising paths.
o Examples include A* search, greedy best-first search, and hill climbing.
o Heuristic search aims to find a solution more efficiently, but it doesn't always
guarantee the optimal solution. A* does when its heuristic is admissible;
greedy best-first search and hill climbing do not.
 Key Differences:
o Information: General search strategies may or may not use information about the
problem. Heuristic search always uses information in the form of a heuristic
function.
o Efficiency: General search strategies can be inefficient for large problem spaces.
Heuristic search is designed to be more efficient.
o Optimality: General search strategies can guarantee finding the optimal solution
(depending on the strategy). Heuristic search does not always guarantee finding
the optimal solution.
 In essence, heuristic search is a specialized type of search strategy that leverages
problem-specific knowledge to improve efficiency.
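The contrast can be sketched with one best-first routine that orders the frontier by g + h. With h = 0 it behaves as an uninformed uniform-cost search; with a Manhattan-distance heuristic it becomes A*. The grid, the function names, and the choice of heuristic are assumptions made for this illustration.

```python
import heapq

# Hypothetical 5x5 grid: '.' is free, '#' is a wall.
GRID = [
    ".....",
    ".##..",
    "..#..",
    "..#..",
    ".....",
]


def search(grid, start, goal, heuristic):
    """Best-first search ordered by g + heuristic; returns (path_cost, nodes_expanded)."""
    frontier = [(heuristic(start), 0, start)]
    best_g = {start: 0}
    expanded = set()
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in expanded:
            continue  # stale queue entry
        expanded.add(cell)
        if cell == goal:
            return g, len(expanded)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + heuristic((nr, nc)), ng, (nr, nc)))
    return None, len(expanded)


START, GOAL = (0, 0), (4, 4)


def manhattan(cell):
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])


uninformed = search(GRID, START, GOAL, lambda cell: 0)  # no goal knowledge
informed = search(GRID, START, GOAL, manhattan)         # A* with Manhattan distance
print("uninformed:", uninformed)
print("informed:  ", informed)
```

Both runs return the same path cost because the Manhattan heuristic never overestimates on a unit-cost grid; the difference shows up in how many nodes each one expands.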

(b) Explain characteristics of intelligent agents.


Core Characteristics:
 Autonomy:
o This is a fundamental trait. Intelligent agents operate independently, making
decisions and taking actions without constant human intervention.
o They can function in dynamic and unpredictable environments, adapting as
needed.
 Reactivity:
o Intelligent agents can perceive their environment through sensors (which can be
anything from simple data inputs to complex vision systems).
o They react to changes in the environment in a timely manner.
 Proactive Behavior (Goal-Oriented):
o Beyond simply reacting, intelligent agents exhibit goal-directed behavior.
o They take initiative, anticipating needs and working towards achieving specific
objectives.
o This involves planning and problem-solving.
 Learning Capabilities:
o Many intelligent agents can learn from their experiences, improving their
performance over time.
o This often involves machine learning techniques, allowing agents to adapt to new
situations and refine their decision-making processes.
 Rationality:
o A rational agent strives to do the "right thing," meaning it aims to maximize its
performance based on available information.
o This involves making decisions that are expected to lead to the best possible
outcome.
Expanding on the Concepts:
 Perception:
o Agents must be able to perceive their environment. This involves gathering data
through sensors and interpreting that data.
 Decision-Making:
o Based on their perception and goals, agents must make decisions about what
actions to take. This often involves reasoning, planning, and problem-solving.
 Action:
o Agents interact with their environment by taking actions. These actions can range
from simple movements to complex communications.
 Environment:
o The environment the agent operates in also shapes its design. It can be:
 Deterministic or stochastic.
 Static or dynamic.
 Fully observable or partially observable.
 Single-agent or multi-agent.

12. (a) Consider the following set of propositions... [Bayesian Network Creation]
 Propositions:
o Patient has spots

o Patient has high fever

o Patient has muscle aches

o Patient has previously been inoculated against measles

o Patient has Rocky Mountain Spotted Fever

o Patient has an allergy

o Patient was recently bitten by a tick

Network Creation:
1. Identify Dependencies:
 Rocky Mountain Spotted Fever (RMSF) causes spots, fever, and muscle
aches.
 Tick bites can cause RMSF.
 Allergies can cause spots.
 Inoculation against measles can reduce the likelihood of fever.
2. Create Nodes: Create a node for each proposition.
3. Draw Edges: Connect nodes based on dependencies. For example:
 Tick Bite -> RMSF
 RMSF -> Spots
 RMSF -> Fever
 RMSF -> Muscle Aches
 Allergy -> Spots
4. Conditional Probability Tables (CPTs): For each node, create a CPT that specifies
the probability of that node's state given the states of its parent nodes.

(b) Construct a Bayesian Network and define the necessary CPT... [Coin Flip Problem]
 Variables:
o Coin (A, B, C)

o X1 (First flip)

o X2 (Second flip)

o X3 (Third flip)

 Network Structure:
o Coin -> X1

o Coin -> X2

o Coin -> X3

 CPTs:
o P(Coin = A) = 0.1

o P(Coin = B) = 0.4

o P(Coin = C) = 0.5

o P(X1 = Heads | Coin = A) = 0.2 (and so on for X1 = Tails, X2, X3)

o P(X1 = Heads | Coin = B) = 0.5 (and so on for X1 = Tails, X2, X3)

o P(X1 = Heads | Coin = C) = 0.8 (and so on for X1 = Tails, X2, X3)

 Calculating Most Likely Coin: Use Bayes' Theorem to calculate P(Coin | X1, X2, X3) for
each coin, given the observed sequence of flips. The coin with the highest posterior
probability is the most likely.
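The posterior calculation can be sketched as follows. The priors and head probabilities are the CPT values listed above; the observed sequence (Heads, Heads, Tails) is a hypothetical example, since the flips are not specified here.

```python
PRIOR = {"A": 0.1, "B": 0.4, "C": 0.5}    # P(Coin), as listed above
P_HEADS = {"A": 0.2, "B": 0.5, "C": 0.8}  # P(Xi = Heads | Coin)


def coin_posterior(flips):
    """Return P(Coin | flips); flips are conditionally independent given the coin."""
    joint = {}
    for coin, prior in PRIOR.items():
        likelihood = 1.0
        for flip in flips:
            likelihood *= P_HEADS[coin] if flip == "H" else 1.0 - P_HEADS[coin]
        joint[coin] = prior * likelihood
    evidence = sum(joint.values())  # P(X1, X2, X3)
    return {coin: value / evidence for coin, value in joint.items()}


observed = ["H", "H", "T"]  # ASSUMED observation, for illustration only
posterior = coin_posterior(observed)
most_likely = max(posterior, key=posterior.get)
print(posterior, most_likely)
```

For this assumed sequence, coin C has the largest posterior; a different sequence of flips could favour another coin.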
13. (a) State when and why you would use random forests vs SVM.
 Random Forests:
o When to Use:
 Large datasets with many features.
 When feature importance is needed.
 When robustness to outliers is important.
 When interpretability is not a primary concern.
o Why:
 Handles non-linear data well.
 Less prone to overfitting.
 Provides feature importance estimates.
 Support Vector Machines (SVM):
o When to Use:
 High-dimensional data.
 Clear margin of separation between classes.
 When memory efficiency is important.
 When interpretability is needed (for linear SVM).
o Why:
 Effective in high-dimensional spaces.
 Can use kernel trick for non-linear data.
 Good generalization performance.
(b) Explain the principle of the gradient descent algorithm...
 Gradient Descent: An iterative optimization algorithm used to find the minimum of a
function (typically a loss function).
o Principle:
1. Start with an initial guess for the parameters.
2. Calculate the gradient of the loss function at the current parameter values.
3. Update the parameters in the direction opposite to the gradient (to
minimize the loss), scaled by a learning rate: θ = θ - η * ∇L(θ).
4. Repeat steps 2 and 3 until convergence (the loss stops decreasing
significantly).
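The four steps can be sketched on a one-parameter loss. The loss L(w) = (w - 3)^2, the learning rate, and the stopping tolerance are assumptions picked for illustration.

```python
def gradient_descent(grad, w0, lr=0.1, tol=1e-8, max_iters=1000):
    """Minimize a function of one variable given its gradient."""
    w = w0                      # step 1: initial guess
    for _ in range(max_iters):
        g = grad(w)             # step 2: gradient at the current parameter
        if abs(g) < tol:        # step 4: stop once the gradient is negligible
            break
        w -= lr * g             # step 3: move against the gradient
    return w


# Example loss L(w) = (w - 3)**2 with gradient 2 * (w - 3); the minimum is at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)
```

With a learning rate of 0.1 the distance to the minimum shrinks by a factor of 0.8 on every step; a rate that is too large would make the iterates diverge instead.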

14. (a) Explain various learning techniques involved in unsupervised learning.


ANS:
 Clustering: Grouping similar data points together.
o Examples: K-means, Hierarchical Clustering, DBSCAN.
 Dimensionality Reduction: Reducing the number of features while preserving important
information.
o Examples: Principal Component Analysis (PCA), t-SNE.
 Association Rule Learning: Discovering interesting relationships or associations between
variables.
o Examples: Apriori algorithm.
 Anomaly Detection: Identifying data points that deviate significantly from the norm.
o Examples: Isolation Forest, One-Class SVM.

(b) List the applications of clustering and identify advantages and disadvantages of
clustering algorithms.
 Applications:
o Customer segmentation

o Image segmentation

o Document clustering

o Anomaly detection

o Bioinformatics

 Advantages:
o Discovers hidden patterns in data.

o Useful for exploratory data analysis.

o Can be used for data preprocessing.


 Disadvantages:
o Can be sensitive to initialization and parameter choices.

o Can be computationally expensive for large datasets.

o May not find meaningful clusters if the data is not well-structured.

15. (a) Draw the architecture of a single layer perceptron (SLP) and explain the operation.
Mention its advantages and disadvantages.
ANS:
 Architecture:
o Input Layer: A set of input nodes (x1, x2, ..., xn) representing the features of the
input data.
o Weights: Each input node is connected to the output node by a weighted
connection (w1, w2, ..., wn).
o Summation/Aggregation: The weighted sum of the inputs is calculated:
z = w1*x1 + w2*x2 + ... + wn*xn + b (where 'b' is the bias).
o Activation Function: The result 'z' is passed through an activation function (e.g.,
step function, sigmoid, ReLU) to produce the output 'y'.
o Output Node: A single output node 'y' representing the prediction.

Diagram:
x1 --- w1 ---> (+) --- Activation Function ---> y
x2 --- w2 ---> (+)
...
xn --- wn ---> (+)
b ---> (+)
 Operation:
1. The inputs are multiplied by their respective weights.
2. The weighted inputs are summed together, along with the bias term.
3. The result is passed through an activation function, which determines the final
output.
 Advantages:
o Simplicity: Easy to understand and implement.

o Computational Efficiency: Relatively fast computation.

o Linear Separability: Can effectively solve linearly separable problems.

 Disadvantages:
o Limited Capability: Can only learn linear decision boundaries.

o Cannot Solve Non-Linear Problems: Fails to solve problems where the data is not
linearly separable (e.g., XOR problem).
o Sensitivity to Input Scaling: Performance can be affected by the scaling of input
features.
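The operation and the linear-separability limit can be illustrated with a short sketch. It assumes a step activation and the classic perceptron learning rule (weights nudged by the prediction error); the training procedure, learning rate, and epoch count are assumptions, since the answer above describes only the forward pass.

```python
def step(z):
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    # Weighted sum of the inputs plus bias, passed through the step activation.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return step(z)


def train_perceptron(samples, lr=1.0, epochs=20):
    """Perceptron learning rule: w += lr * (target - output) * x."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_w, and_b = train_perceptron(AND)
xor_w, xor_b = train_perceptron(XOR)
and_correct = sum(predict(and_w, and_b, x) == t for x, t in AND)
xor_correct = sum(predict(xor_w, xor_b, x) == t for x, t in XOR)
print("AND correct:", and_correct, "of 4")
print("XOR correct:", xor_correct, "of 4")
```

AND is linearly separable, so the rule settles on weights that classify all four inputs. No single line separates the XOR classes, so at least one XOR input is always misclassified.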

(b) How do you tune hyperparameters for better neural network performance? Explain in
detail.
ANS:
 Hyperparameter Tuning: The process of finding the optimal values for hyperparameters
(parameters that are set before training) to maximize the neural network's performance.
 Methods:
o Manual Tuning: Experimenting with different hyperparameter values based on
experience and intuition.
o Grid Search: Trying all possible combinations of hyperparameter values within a
specified range.
o Random Search: Randomly sampling hyperparameter values from a distribution.

o Bayesian Optimization: Using a probabilistic model to guide the search for
optimal hyperparameters.
o Automated Machine Learning (AutoML): Using automated tools to search for
optimal hyperparameters.
 Hyperparameters to Tune:
o Learning Rate: Controls the step size during gradient descent.

o Number of Layers: Depth of the neural network.

o Number of Neurons per Layer: Width of the neural network.


o Activation Functions: Choice of activation functions for hidden layers.

o Batch Size: Number of samples used in each training iteration.

o Regularization Parameters: Controls the amount of regularization (e.g., L1, L2,
dropout).
o Optimizer: Algorithm used for optimization (e.g., Adam, SGD, RMSprop).

o Initialization: Method used to initialize the weights.

 Evaluation:
o Cross-Validation: Using techniques like k-fold cross-validation to evaluate model
performance on different subsets of the data.
o Validation Set: Using a separate validation set to monitor performance during
training and prevent overfitting.
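Grid search can be sketched in a few lines. Everything specific here is hypothetical: `validation_loss` is a stand-in formula for "train the network with these settings and return its validation loss", and the candidate values in `grid` are arbitrary.

```python
import itertools
import math


def validation_loss(learning_rate, batch_size):
    """HYPOTHETICAL stand-in for training a network and measuring validation loss."""
    return (math.log10(learning_rate) + 2) ** 2 + ((batch_size - 64) / 64) ** 2


def grid_search(grid, objective):
    """Evaluate every combination in the grid and keep the lowest score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [16, 32, 64, 128],
}
best_params, best_score = grid_search(grid, validation_loss)
print(best_params, best_score)
```

The number of evaluations is the product of the list lengths (16 here), which is why random search or Bayesian optimization is often preferred once there are many hyperparameters.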

PART C

16. (a) Discuss constraint satisfaction problems with an algorithm for solving
cryptarithmetic. Solve the following:
```
  CROSS
+ ROADS
-------
 DANGER
```
ANS:
 Constraint Satisfaction Problems (CSPs): Problems where the goal is to find a set of
values for variables that satisfy a set of constraints.
 Cryptarithmetic: A type of CSP where letters represent digits and the goal is to find the
digit assignment that makes the arithmetic equation true.
 Algorithm (Backtracking Search):
1. Assign Variables: Assign a digit to each unique letter.
2. Check Constraints: Check if the assignment satisfies the constraints (e.g., each
letter has a unique digit, the arithmetic equation is correct).
3. Backtrack: If a constraint is violated, backtrack to the previous assignment and try
a different value.
4. Repeat: Repeat steps 2 and 3 until a solution is found or all possibilities have been
explored.
 Solution for CROSS + ROADS = DANGER:
C = 9, R = 6, O = 2, S = 3, A = 5, D = 1, N = 8, G = 7, E = 4
  96233
+ 62513
-------
 158746

(b) Construct the decision tree for the given dataset.


Dataset:

Day  Outlook   Temperature  Humidity  Wind    Play Golf
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

ANS:
Calculations (using Information Gain for demonstration; you could also use Gain Ratio):
 Entropy(S): Entropy of the entire dataset (S) for "Play Golf" (9 Yes, 5 No).
o Entropy(S) = - (9/14) * log2(9/14) - (5/14) * log2(5/14) ≈ 0.940
 Information Gain for Each Attribute: We'll calculate the information gain for each
attribute to determine the best split.
o Gain(S, Outlook):
 Outlook = Sunny: [2 Yes, 3 No]
 Outlook = Overcast: [4 Yes]
 Outlook = Rain: [3 Yes, 2 No]
 Gain(S, Outlook) ≈ 0.246
o Gain(S, Temperature):
 Temperature = Hot: [2 Yes, 2 No]
 Temperature = Mild: [4 Yes, 2 No]
 Temperature = Cool: [3 Yes, 1 No]
 Gain(S, Temperature) ≈ 0.029
o Gain(S, Humidity):
 Humidity = High: [3 Yes, 4 No]
 Humidity = Normal: [6 Yes, 1 No]
 Gain(S, Humidity) ≈ 0.151
o Gain(S, Wind):
 Wind = Weak: [6 Yes, 2 No]
 Wind = Strong: [3 Yes, 3 No]
 Gain(S, Wind) ≈ 0.048
 Root Node:
o The attribute with the highest information gain is "Outlook" (Gain ≈ 0.246).

o The root node of the decision tree is "Outlook."

 Branches and Subsets:
o Outlook = Sunny: [2 Yes, 3 No] (further splitting needed)
o Outlook = Overcast: [4 Yes] (leaf node: Play Golf = Yes)
o Outlook = Rain: [3 Yes, 2 No] (further splitting needed)
 Further Splitting:
o We repeat the gain calculation for each impure subset using the remaining
attributes.
o Outlook = Sunny: "Humidity" gives the highest gain. High gives [3 No] and Normal
gives [2 Yes], so both branches are pure leaves.
o Outlook = Rain: "Wind" gives the highest gain. Weak gives [3 Yes] and Strong
gives [2 No], so both branches are pure leaves.

 Decision Tree Structure (Tabular Representation):

NODE              ATTRIBUTE  VALUE     PLAY GOLF  NEXT NODE/LEAF
Root              Outlook    Sunny     -          Humidity (Sunny)
                             Overcast  Yes        Leaf (Yes)
                             Rain      -          Wind (Rain)
Humidity (Sunny)  Humidity   High      No         Leaf (No)
                             Normal    Yes        Leaf (Yes)
Wind (Rain)       Wind       Weak      Yes        Leaf (Yes)
                             Strong    No         Leaf (No)
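The entropy and information-gain figures can be recomputed from the dataset with a short script. This is a sketch with illustrative function names; the unrounded gains it prints may differ in the third decimal place from hand calculations that round intermediate values.

```python
import math
from collections import Counter

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = [  # D1..D14; the last field is Play Golf
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(rows):
    counts = Counter(row[-1] for row in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def information_gain(rows, attr_index):
    # Weighted entropy of the subsets produced by splitting on the attribute.
    remainder = 0.0
    for value in {row[attr_index] for row in rows}:
        subset = [row for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRS)}
root = max(gains, key=gains.get)
print("Entropy(S) =", round(entropy(ROWS), 3))
print("Gains:", {name: round(g, 3) for name, g in gains.items()})
print("Root attribute:", root)

# Repeat the calculation on the Outlook = Sunny subset with the remaining attributes.
sunny = [row for row in ROWS if row[0] == "Sunny"]
sunny_gains = {ATTRS[i]: information_gain(sunny, i) for i in (1, 2, 3)}
sunny_split = max(sunny_gains, key=sunny_gains.get)
print("Best split for Sunny:", sunny_split)
```

The same two functions can be applied recursively to each impure subset, which is how ID3-style tree construction proceeds.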
