AIML QUESTION PAPER
healthy person. Assume that a new patient arrives and the test result is positive. What is the probability that the patient has the disease?
Given a positive test result, we use Bayes' theorem:
* Probability of having the disease (prior): P(D) = 10^-6
* Sensitivity (true positive rate): P(T+ | D) = 0.99
* False positive rate: P(T+ | H) = 0.001
The probability of having the disease given a positive result:
P(D | T+) = (0.99 x 10^-6) / ((0.99 x 10^-6) + (0.001 x 0.999999)) ≈ 0.00099
Thus, the probability that the patient has the disease is about 0.099%.
7. Write the three types of ensemble learning.
A few simple but powerful techniques, namely:
* Max Voting
* Averaging
* Weighted Averaging
8. How is expectation maximization used in Gaussian mixture models?
Expectation-Maximization refers to a two-step, iterative process that is often used when latent or unobserved variables are present underlying a data generation process. It provides the framework used to fit a Gaussian Mixture Model, which has wide application in unsupervised learning contexts.
where the weighted sum is w_0 x_0 + w_1 x_1 + ... + w_n x_n = Σ_{i=0}^{n} w_i x_i = W^T X
PART B - (5 x 13 = 65 Marks)
11. (a) Differentiate Blind Search and Heuristic Search.
Ans: Refer Section No: 1.5.3 Page No: 1.45
[OR]
(b) Explain characteristics of intelligent agents.
Intelligent agents in artificial intelligence (AI) have several characteristics, including:
* Autonomy: They can perform tasks independently
* Learning: They can learn from their experiences
* Interaction: They can interact with other agents, humans, and systems
* Goal-oriented: Their behaviour is oriented towards goals
* Adaptation: They can adapt based on their experiences
* Problem-solving: They can solve problems in real time
* Error analysis: They can analyze their success and error rates
* Memory: They can use memory-based storage and retrieval
An intelligent agent is an autonomous entity that uses sensors to perceive its environment and actuators to interact with it. The agent's decision-making mechanism
processes information from the sensors and makes decisions based on rules and algorithms. These decisions determine the actions the agent takes to achieve its goals.
* Some types of intelligent agents include: simple reflex agents, model-based agents, goal-based agents, utility-based agents, and learning agents.
Solved Anna University Question Paper: Artificial Intelligence and Machine Learning
12. (a) Consider the following set of propositions.
* Patient has spots.
* Patient has measles.
* Patient has high fever.
* Patient has Rocky Mountain spotted fever.
* Patient has previously been inoculated against measles.
* Patient was recently bitten by a tick.
* Patient has an allergy.
(i) Create a network that defines the causal connections among these nodes. (5)
(ii) Make it a Bayesian network by constructing the necessary conditional probability matrix. (8)
Step 1: Define Variables
Define the following binary random variables:
* S: Patient has spots (True/False)
* M: Patient has measles (True/False)
* F: Patient has high fever (True/False)
* R: Patient has Rocky Mountain spotted fever (True/False)
* I: Patient has been inoculated against measles (True/False)
* T: Patient was recently bitten by a tick (True/False)
* A: Patient has an allergy (True/False)
Step 2: Define the Network Structure
Establish the directed acyclic graph (DAG) representing the causal relationships:
* I → M
* M → S
* M → F
* T → R
* R → S
* R → F
* A has no edges (it is independent of the other variables)
Step 3: Specify Conditional Probability Distributions
Define the conditional probability distributions (CPDs) for each variable given its parents:
* P(M | I) - Probability of measles given inoculation status.
* P(S | M, R) - Probability of spots given measles and Rocky Mountain spotted fever status.
* P(F | M, R) - Probability of high fever given measles and Rocky Mountain spotted fever status.
* P(R | T) - Probability of Rocky Mountain spotted fever given tick bite status.
* P(A) - Probability of allergy (A is independent).
Step 4: Represent the Bayesian Network
The Bayesian network is represented by the DAG from Step 2 and the CPDs from Step 3. This fully specifies the joint probability distribution over all variables.
Final Answer
The Bayesian network consists of the variables {S, M, F, R, I, T, A} with the directed acyclic graph and conditional probability distributions as defined in Steps 2 and 3. The joint probability distribution can be calculated using the chain rule and the specified CPDs, together with the priors P(I) and P(T) of the root nodes I and T:
P(S, M, F, R, I, T, A) = P(I) P(T) P(A) P(M | I) P(R | T) P(S | M, R) P(F | M, R)
[OR]
(b) Construct a Bayesian Network and define the necessary CPTs for the given scenario. We have a bag of three biased coins a, b and c with probabilities of coming up heads of 20%, 60% and 80% respectively. One coin is drawn randomly from the bag (with equal likelihood of drawing each of the three coins) and then the coin is flipped three times to generate the outcomes X1, X2 and X3.
(i) Draw a Bayesian network corresponding to this setup and define the relevant CPTs. (7)
(ii) Calculate which coin is most likely to have been drawn if the flips come up HHT. (6)
Bayesian Network
The Bayesian network for this setup can be represented as follows:
Coin Type (C) → Flip 1 (X1)
Coin Type (C) → Flip 2 (X2)
Coin Type (C) → Flip 3 (X3)
Conditional Probability Tables (CPTs)
The necessary Conditional Probability Tables (CPTs) for this Bayesian network are as follows:
1. CPT for Coin Type (C):
Coin Type (C)    Probability
a                1/3
b                1/3
c                1/3
2. CPT for Flip 1 (X1), Flip 2 (X2), and Flip 3 (X3), giving the probability of heads:
Coin Type (C)    Flip 1 (X1)    Flip 2 (X2)    Flip 3 (X3)
a                0.2            0.2            0.2
b                0.6            0.6            0.6
c                0.8            0.8            0.8
Calculation of Most Likely Coin
To calculate which coin was most likely to have been drawn from the bag given the observed flips, we can use Bayes' theorem. Let's denote the event of drawing coin a, b, and c as A, B, and C, respectively. We want to find the probability of each coin given that we observed two heads and one tail: P(A|HHT), P(B|HHT), and P(C|HHT).
According to Bayes' theorem:
P(A|HHT) = (P(HHT|A) * P(A)) / P(HHT)
P(B|HHT) = (P(HHT|B) * P(B)) / P(HHT)
P(C|HHT) = (P(HHT|C) * P(C)) / P(HHT)
We can calculate the probabilities as follows:
P(HHT|A) = P(X1=H, X2=H, X3=T|A) = P(X1=H|A) * P(X2=H|A) * P(X3=T|A) = 0.2 * 0.2 * 0.8 = 0.032
P(HHT|B) = P(X1=H, X2=H, X3=T|B) = P(X1=H|B) * P(X2=H|B) * P(X3=T|B) = 0.6 * 0.6 * 0.4 = 0.144
P(HHT|C) = P(X1=H, X2=H, X3=T|C) = P(X1=H|C) * P(X2=H|C) * P(X3=T|C) = 0.8 * 0.8 * 0.2 = 0.128
Since the prior probabilities P(A), P(B), and P(C) are equal (1/3), we can ignore them in the comparison. Therefore, the coin most likely to have been drawn from the bag given the observed flips is coin B, as it has the highest likelihood P(HHT|B) = 0.144, which gives the posterior P(B|HHT) = 0.144 / (0.032 + 0.144 + 0.128) ≈ 0.474.
13. (a) State when and why you would use random forests vs SVM?
Ans: Refer Section No: 3.12, 3.10 Page No: 3.67 and 3.56
[OR]
(b) Explain the principle of the gradient descent algorithm. Accompany your explanation with a diagram.
Gradient Descent is one of the most commonly used optimization algorithms to train machine learning models by minimizing the errors between actual and expected results. Gradient descent is also used to train Neural Networks. In mathematical terminology, optimization refers to the task of minimizing or maximizing an objective function f(x) parameterized by x. Similarly, in machine learning, optimization is the task of minimizing the cost function parameterized by the model's parameters. The main objective of gradient descent is to minimize a convex function through iterative parameter updates. Once these machine learning models are optimized, they can be used as powerful tools for Artificial Intelligence and various computer science applications.
The local minimum or local maximum of a function is found using the gradient as follows:
* If we move towards a negative gradient, or away from the gradient of the function at the current point, it will give the local minimum of that function.
* Whenever we move towards a positive gradient, or towards the gradient of the function at the current point, we will get the local maximum of that function.
Fig. SQ.1. (labels: Initial Weight, Gradient, Incremental Step, Minimum Cost, Derivative of Cost; horizontal axis: Weight)
Moving against the gradient is known as gradient descent, also called steepest descent; moving along the gradient is known as gradient ascent. The main objective of using a gradient descent algorithm is to minimize the cost function using iteration. To achieve this goal, it performs two steps iteratively:
* Calculates the first-order derivative of the function to compute the gradient or slope of that function.
* Moves in the direction opposite to the gradient, stepping from the current point by alpha times the gradient, where alpha is defined as the Learning Rate. It is a tuning parameter in the optimization process which helps to decide the length of the steps.
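The two iterative steps of gradient descent (compute the gradient, then step against it by alpha times the gradient) can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's own code: the cost function J(w) = (w - 3)^2, the starting weight, the learning rates and the names `gradient_descent`, `cost` and `cost_gradient` are all assumptions chosen for the example.

```python
def gradient_descent(grad, w0, alpha, steps):
    """Repeat w <- w - alpha * grad(w) and return every visited weight."""
    history = [w0]
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)  # step against the gradient
        history.append(w)
    return history


# Hypothetical convex cost J(w) = (w - 3)^2, whose minimum is at w = 3.
def cost(w):
    return (w - 3.0) ** 2


def cost_gradient(w):
    return 2.0 * (w - 3.0)  # first-order derivative dJ/dw


# A small learning rate approaches the minimum steadily.
path = gradient_descent(cost_gradient, w0=0.0, alpha=0.1, steps=100)
w_final = path[-1]
print(f"alpha=0.1 -> final weight {w_final:.6f}, cost {cost(w_final):.2e}")

# A learning rate that is too large overshoots the minimum on every step.
overshoot = gradient_descent(cost_gradient, w0=0.0, alpha=1.1, steps=20)
print(f"alpha=1.1 -> final weight {overshoot[-1]:.2f}")
```

With alpha = 0.1 each update shrinks the distance to the minimum by a constant factor, whereas with alpha = 1.1 the distance grows on every step, so the iterates move away from w = 3.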
How does Gradient Descent work?
Before starting the working principle of gradient descent, we should know some basic concepts to find out the slope of a line from linear regression. The equation for simple linear regression is given as:
Y = mX + c
Where 'm' represents the slope of the line, and 'c' represents the intercept on the y-axis.
[Figure: loss plotted against the value of weight, marking the starting point and the point of convergence, i.e. where the cost function is at its minimum]
The starting point (shown in the above figure) is used to evaluate the performance, as it is considered just an arbitrary point. At this starting point, we derive the first derivative or slope and then use a tangent line to calculate the steepness of this slope. Further, this slope will inform the updates to the parameters (weights and bias).
The slope is steepest at the starting or arbitrary point; whenever new parameters are generated, the steepness gradually reduces until the curve approaches its lowest point, which is called the point of convergence.
The main objective of gradient descent is to minimize the cost function, or the error between expected and actual results. To minimize the cost function, two factors are required:
Direction & Learning Rate
These two factors determine the partial derivative calculation of each future iteration and allow it to reach the point of convergence, i.e. a local or global minimum. Let's discuss the learning rate in brief:
Learning Rate:
It is defined as the step size taken to reach the minimum or lowest point. This is typically a small value that is evaluated and updated based on the behavior of the cost function. If the learning rate is high, it results in larger steps but also risks overshooting the minimum. At the same time, a low learning rate gives small step sizes, which compromises overall efficiency but gives the advantage of more precision.
Fig. SQ.2. Loss against value of weight for a small learning rate and a large learning rate.
14. (a) Explain various learning techniques involved in unsupervised learning.
Ans: Refer Section No: 4.11 Page No: 4.43
[OR]
(b) List the applications of clustering and identify advantages and disadvantages of clustering algorithms.
Ans: Refer Section No: 4.12.3 Page No: 4.52
15. (a) Draw the architecture of a single layer perceptron (SLP) and explain its operation. Mention its advantages and disadvantages.
Ans: Refer Section No: 5.2 Page No: 5.11
[OR]
(b) How do you tune hyperparameters for better neural network performance? Explain in detail.
Ans: Refer Section No: 5.11 Page No: 5.76
PART C - (1 x 15 = 15 Marks)
16. (a) Discuss constraint satisfaction problems with an algorithm for solving cryptarithmetic. Trace the algorithm for the following:
  CROSS
+ ROADS
-------
 DANGER
1. The last (leftmost) column generates a carry of 1, so D = 1.
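The remaining letters of CROSS + ROADS = DANGER can be checked mechanically. The sketch below is a plain exhaustive search in Python, not the constraint-propagation trace the question asks for. It fixes D = 1 as derived in step 1 and tries every assignment of distinct digits to the other eight letters. The helper names `word_value` and `solve_cross_roads` are made up for this example.

```python
from itertools import permutations


def word_value(word, digits):
    """Read a word as a base-10 number under a letter -> digit mapping."""
    value = 0
    for letter in word:
        value = value * 10 + digits[letter]
    return value


def solve_cross_roads():
    """Return every (CROSS, ROADS, DANGER) triple that satisfies the puzzle."""
    # D = 1 follows from the carry out of the leftmost column, so it is fixed.
    free_letters = "CROSANGE"
    free_digits = [0, 2, 3, 4, 5, 6, 7, 8, 9]
    solutions = []
    for perm in permutations(free_digits, len(free_letters)):
        digits = dict(zip(free_letters, perm))
        digits["D"] = 1
        if digits["C"] == 0 or digits["R"] == 0:  # no leading zeros
            continue
        cross = word_value("CROSS", digits)
        roads = word_value("ROADS", digits)
        if cross + roads == word_value("DANGER", digits):
            solutions.append((cross, roads, cross + roads))
    return solutions


solutions = solve_cross_roads()
for cross, roads, danger in solutions:
    print(f"{cross} + {roads} = {danger}")
```

One assignment the search finds is 96233 + 62513 = 158746, i.e. C = 9, R = 6, O = 2, S = 3, A = 5, D = 1, N = 8, G = 7, E = 4. A backtracking CSP solver would reach the same assignment column by column while pruning on the carries.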
Fig. SQ.4.
Decision Trees
* At every node, test the corresponding attribute
* Send the instance down the appropriate branch of the tree
* If at a leaf, output the corresponding classification
Outlook
  Sunny    → Humidity
               High   → No
               Normal → Yes
  Overcast → Yes
  Rain     → Wind
               Strong → No
               Weak   → Yes
Fig. SQ.3.
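The classification procedure for the tree in Fig. SQ.3 (test the attribute at each node, follow the matching branch, output the label at the leaf) can be sketched in Python. The nested-dictionary encoding and the names `TREE` and `classify` are assumptions made for this illustration. The tree is read as Outlook at the root, Humidity under Sunny, Wind under Rain, and Overcast leading directly to Yes.

```python
# An internal node is {"attribute": name, "branches": {value: subtree}};
# a leaf is simply the class label string.
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {
            "attribute": "Humidity",
            "branches": {"High": "No", "Normal": "Yes"},
        },
        "Overcast": "Yes",
        "Rain": {
            "attribute": "Wind",
            "branches": {"Strong": "No", "Weak": "Yes"},
        },
    },
}


def classify(tree, instance):
    """Walk from the root to a leaf and return the leaf's label."""
    node = tree
    while isinstance(node, dict):
        value = instance[node["attribute"]]  # test the node's attribute
        node = node["branches"][value]       # follow the matching branch
    return node


example = {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}
print(classify(TREE, example))
```

Only the attributes on the path from root to leaf are consulted, so for a Sunny day the Wind value is never read.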