ALML QUESTION PAPER

The document is an examination paper for a course on Artificial Intelligence and Machine Learning, detailing various topics such as definitions, applications, and techniques in AI. It includes questions on adversarial search, probabilistic reasoning, supervised vs. unsupervised learning, and performance metrics for AI applications. Additionally, it covers advanced concepts like Bayesian networks, gradient descent, and strategies to avoid overfitting in machine learning models.

Uploaded by

ganesh3032005

SQ.20 Artificial Intelligence and Machine Learning

Authors: S.N. Sangeethaa, S.Jothimani


B.E/B.Tech DEGREE EXAMINATIONS, APRIL/MAY 2023
Fourth Semester, Computer Science and Engineering
CS 3491 - ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
(Common to: Computer and Communication Engineering and Information Technology)
(Regulations 2021)
Time: Three Hours
Maximum: 100 Marks
Answer ALL Questions
PART A - (10 × 2 = 20 Marks)
1. Define artificial intelligence.
Artificial intelligence, or AI, is technology that enables computers and machines to simulate human intelligence and problem-solving capabilities.
2. What is adversarial search?
Adversarial search in artificial intelligence is a problem-solving technique that focuses on making decisions in competitive or adversarial scenarios. It is employed to find optimal strategies when multiple agents, often referred to as players, have opposing or conflicting objectives. Adversarial search aims to determine the best course of action for a given player, considering the possible moves and counter-moves of the opponent(s).
3. Define uncertainty.
Uncertainty in AI refers to the idea that AI systems can operate in environments with incomplete or uncertain information.
4. State Bayes' rule.
Bayes' theorem states that the conditional probability of an event, given the occurrence of another event, is equal to the likelihood of the second event given the first event, multiplied by the probability of the first event and divided by the probability of the second event: P(A|B) = P(B|A) P(A) / P(B), where P(B) > 0.
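As a small illustration of the rule, the following Python sketch evaluates P(A|B) from the three quantities on the right-hand side. The function name `bayes_posterior` and the numbers in the example are assumptions made for illustration; they do not come from the question paper.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    likelihood -- P(B|A)
    prior      -- P(A)
    evidence   -- P(B), must be positive
    """
    if evidence <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * prior / evidence


# Illustrative (assumed) values: P(B|A) = 0.8, P(A) = 0.1, P(B) = 0.2
posterior = bayes_posterior(likelihood=0.8, prior=0.1, evidence=0.2)
print(round(posterior, 3))
```

With these assumed values the posterior is 0.8 × 0.1 / 0.2 = 0.4.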
5. Outline the difference between supervised learning and unsupervised learning.
Difference between Supervised Learning and Unsupervised Learning
Supervised Learning        Unsupervised Learning
Input data is labeled      Input data is unlabelled
Uses training dataset      Uses just input dataset
Solved Anna University Question Paper
QUESTION PAPER CODE: 50904
B.E/B.TECH DEGREE EXAMINATIONS, APRIL/MAY 2024
FOURTH/SIXTH SEMESTER
Computer Science and Engineering
CS 3491 - ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
(Common to: Biomedical Engineering/Computer Science and Design/Computer Science and Engineering (Cyber Security)/Computer and Communication Engineering/Electronics and Communication Engineering/Electronics and Telecommunication Engineering/Medical Electronics/Information Technology)
(Regulations 2021)
Time: Three Hours
Maximum: 100 Marks
Answer ALL questions.
PART A - (10 x 2 = 20 Marks)
1. What are the various applications of AI?
Game Playing
AI is widely used in gaming. Strategic games such as chess, where the machine needs to think logically, and video games that provide real-time experiences use Artificial Intelligence.
Robotics
Artificial Intelligence is commonly used in the field of robotics to develop intelligent robots. AI-implemented robots use real-time updates to sense any obstacle in their path and can change the path instantly. AI robots can be used for carrying goods in hospitals and industries and can also be used for other purposes.
Healthcare
In the healthcare sector, AI has diverse uses. In this field, AI can be used to detect diseases and cancer cells. It also helps in finding new drugs with the use of historical data and medical intelligence.
2. How will you measure the performance of AI application?
Metrics such as accuracy, precision, recall, and F1 score for classification tasks, and mean squared error (MSE) and R-squared for regression tasks, can be used to evaluate the performance of an AI model.
3. Mention the needs of probabilistic reasoning in AI.
* Probabilistic reasoning is a form of knowledge representation in which the concept of probability is used to indicate the degree of uncertainty in knowledge. In AI, probabilistic models are used to examine data using statistical codes.
* Probabilistic reasoning is using logic and probability to handle uncertain situations.
* An example of probabilistic reasoning is using past situations and statistics to predict an outcome.
[Figure: agent and environment. Labels: Abilities, Goals/Preferences, Prior knowledge, Observations, Past experiences, Agent, Actions, Environment.]
4. Given that P(A) = 0.3, P(A | B) = 0.4 and P(B) = 0.5, Compute P(B | A).
Given:
P(A) = 0.3
P(A | B) = 0.4
P(B) = 0.5
P(B | A) = ?
Solution:
P(B | A) = P(A ∩ B) / P(A)
P(A ∩ B) = P(A | B) P(B) = 0.4 x 0.5
P(A ∩ B) = 0.2
P(B | A) = 0.2 / 0.3
P(B | A) = 0.667
5. How can overfitting be avoided?
* Use more data: This is the simplest way to prevent overfitting. Using more data makes it harder for the model to memorize exact patterns, and forces it to find more flexible solutions. However, adding more noisy data won't help.
* Use cross-validation: This technique generates multiple train-test splits from your training data to tune your model. Cross-validation helps the model generalize better to unseen data.
* Use regularization: This technique reduces the complexity of a model by adding a penalty term to the loss function. Regularization grades features based on importance, eliminating those that don't impact the prediction outcomes.
* Start with a small model: A model with a small number of learnable parameters is less likely to overfit.
* Use early stopping: This is a strategy to avoid overfitting.
* Use data augmentation: This is a strategy to avoid overfitting.
* Choose the right model: This is a strategy to avoid overfitting with transfer learning.
* Freeze or fine-tune the layers: This is a strategy to avoid overfitting with transfer learning.
6. Assume a disease so rare that it is seen in only one person out of every million. Also assume that we have a test that is effective in that if a person has the disease, there is a 99 percent chance that the test result will be positive; however, the test is not perfect, and there is a one in a thousand chance that the test will be positive on a healthy person. Assume that a new patient arrives and the test result is positive. What is the probability that the patient has the disease?
Given a positive test result, we use Bayes' theorem:
* Probability of having the disease (prior): P(D) = 10^-6
* Sensitivity (true positive rate): P(T+ | D) = 0.99
* False positive rate: P(T+ | H) = 0.001
The probability of having the disease given a positive result:
P(D | T+) = (0.99 x 10^-6) / ((0.99 x 10^-6) + (0.001 x 0.999999)) ≈ 0.00099
Thus, the probability the patient has the disease is about 0.099%, roughly one in a thousand.
7. Write the three types of ensemble learning.
A few simple but powerful techniques, namely:
* Max Voting
* Averaging
* Weighted Averaging
8. How expectation maximization is used in Gaussian mixture models?
Expectation-Maximization refers to a two-step, iterative process that is often used when latent or unobserved variables are present underlying a data generation process. It provides the framework used to fit a Gaussian Mixture Model, which has wide application in unsupervised learning contexts.
9. What is stochastic gradient descent and why is it used in the training of neural networks?
In Stochastic Gradient Descent, a few samples are selected randomly instead of the whole data set for each iteration. In Gradient Descent, there is a term called "batch" which denotes the total number of samples from a dataset that is used for calculating the gradient for each iteration. In typical Gradient Descent optimization, like Batch Gradient Descent, the batch is taken to be the whole dataset.
10. Why is ReLU better than Softmax? Give the equation for both.
ReLU is often better than Softmax when the number of key-value slots is large. This is because ReLU is able to alleviate the problem of activation weights being highly centralized in a small number of slots.
ReLU formula is: f(x) = max(0, x)
Softmax formula is: $P(y = j \mid \Theta^{(i)}) = \dfrac{e^{\Theta^{(i)}_j}}{\sum_{k=0}^{K} e^{\Theta^{(i)}_k}}$
Where $\Theta = w_0 x_0 + w_1 x_1 + \dots + w_m x_m = \sum_{i=0}^{m} w_i x_i = w^T x$
PART B - (5 x 13 = 65 Marks)
11. (a) Differentiate Blind Search and Heuristic Search
Ans: Refer Section No: 1.5.3 Page No: 1.45
[OR]
(b) Explain characteristics of intelligent agents.
Intelligent agents in artificial intelligence (AI) have several characteristics, including:
* Autonomy: They can perform tasks independently
* Learning: They can learn from their experiences
* Interaction: They can interact with other agents, humans, and systems
* Goal-oriented: They have habits that are oriented towards goals
* Adaptation: They can adapt based on their experiences
* Problem-solving: They can solve problems in real time
* Error analysis: They can analyze their success and error rates
* Memory: They can use memory-based storage and retrieval
An intelligent agent is an autonomous entity that uses sensors to perceive its environment and actuators to interact with it. The agent's decision-making mechanism
processes information from the sensors and makes decisions based on rules and algorithms. These decisions determine the actions the agent takes to achieve its goals.
* Some types of intelligent agents include: simple reflex agents, model-based agents, goal-based agents, utility-based agents, and learning agents.
12. (a) Consider the following set of propositions.
* Patient has spots.
* Patient has measles.
* Patient has high fever.
* Patient has Rocky Mountain spotted fever.
* Patient has previously been inoculated against measles.
* Patient was recently bitten by a tick.
* Patient has an allergy.
(i) Create a network that defines the causal connections among these nodes. (5)
(ii) Make it a Bayesian network by constructing the necessary conditional probability matrix. (8)
Step 1: Define Variables
Define the following binary random variables:
* S: Patient has spots (True/False)
* M: Patient has measles (True/False)
* F: Patient has high fever (True/False)
* R: Patient has Rocky Mountain spotted fever (True/False)
* I: Patient has been inoculated against measles (True/False)
* T: Patient was recently bitten by a tick (True/False)
* A: Patient has an allergy (True/False)
Step 2: Define the Network Structure
Establish the directed acyclic graph (DAG) representing the causal relationships:
I → M
M → S
M → F
T → R
R → S
R → F
Step 3: Specify Conditional Probability Distributions
Define the conditional probability distributions (CPDs) for each variable given its parents:
* P(M|I) - Probability of measles given inoculation status.
* P(S|M,R) - Probability of spots given measles and Rocky Mountain spotted fever status.
* P(F|M,R) - Probability of high fever given measles and Rocky Mountain spotted fever status.
* P(R|T) - Probability of Rocky Mountain spotted fever given tick bite status.
* P(A) - Probability of allergy (A is independent).
Step 4: Represent the Bayesian Network
The Bayesian network is represented by the DAG from Step 2 and the CPDs from Step 3. This fully specifies the joint probability distribution over all variables.
Final Answer
The Bayesian network consists of the variables {S, M, F, R, I, T, A} with the directed acyclic graph and conditional probability distributions as defined in Steps 2 and 3. The joint probability distribution can be calculated using the chain rule and the specified CPDs.
[OR]
(b) Construct a Bayesian Network and define the necessary CPTs for the given scenario. We have a bag of three biased coins a, b and c with probabilities of coming up heads of 20%, 60% and 80% respectively. One coin is drawn randomly from the bag (with equal likelihood of drawing each of the three coins) and then the coin is flipped three times to generate the outcomes X1, X2 and X3.
(i) Draw a Bayesian network corresponding to this setup and define the relevant CPTs. (7)
(ii) Calculate which coin is most likely to have been drawn if the flips come up HHT. (6)
Bayesian Network
The Bayesian network for this setup can be represented as follows:
Coin Type (C) → Flip 1 (X1)
Coin Type (C) → Flip 2 (X2)
Coin Type (C) → Flip 3 (X3)
Conditional Probability Tables (CPTs)
The necessary Conditional Probability Tables (CPTs) for this Bayesian network are as follows:
1. CPT for Coin Type (C):
Coin Type (C)    Probability
a                1/3
b                1/3
c                1/3
2. CPT for Flip 1 (X1), Flip 2 (X2), and Flip 3 (X3), giving the probability of heads:
Coin Type (C)    Flip 1 (X1)    Flip 2 (X2)    Flip 3 (X3)
a                0.2            0.2            0.2
b                0.6            0.6            0.6
c                0.8            0.8            0.8
Calculation of Most Likely Coin
To calculate which coin was most likely to have been drawn from the bag given the observed flips, we can use Bayes' theorem. Let's denote the event of drawing coin a, b, and c as A, B, and C, respectively. We want to find the probability of each coin given that we observed two heads and one tail: P(A|HHT), P(B|HHT), and P(C|HHT).
According to Bayes' theorem:
P(A|HHT) = (P(HHT|A) * P(A)) / P(HHT)
P(B|HHT) = (P(HHT|B) * P(B)) / P(HHT)
P(C|HHT) = (P(HHT|C) * P(C)) / P(HHT)
We can calculate the likelihoods as follows:
P(HHT|A) = P(X1=H, X2=H, X3=T|A) = P(X1=H|A) * P(X2=H|A) * P(X3=T|A) = 0.2 * 0.2 * 0.8 = 0.032
P(HHT|B) = P(X1=H, X2=H, X3=T|B) = P(X1=H|B) * P(X2=H|B) * P(X3=T|B) = 0.6 * 0.6 * 0.4 = 0.144
P(HHT|C) = P(X1=H, X2=H, X3=T|C) = P(X1=H|C) * P(X2=H|C) * P(X3=T|C) = 0.8 * 0.8 * 0.2 = 0.128
Since the prior probabilities P(A), P(B), and P(C) are equal (1/3), we can ignore them in the comparison. Therefore, the coin most likely to have been drawn from the bag given the observed flips is coin B, as it has the highest likelihood, P(HHT|B) = 0.144. Normalizing over the three coins gives P(B|HHT) = 0.144 / (0.032 + 0.144 + 0.128) = 0.144 / 0.304 ≈ 0.474.
13. (a) State when and why you would use random forests vs SVM?
Ans: Refer Section No: 3.12, 3.10 Page No: 3.67 and 3.56
[OR]
(b) Explain the principle of the gradient descent algorithm. Accompany your explanation with a diagram.
Gradient Descent is known as one of the most commonly used optimization algorithms to train machine learning models by means of minimizing errors between actual and expected results. Further, gradient descent is also used to train Neural Networks. In mathematical terminology, optimization refers to the task of minimizing/maximizing an objective function f(x) parameterized by x. Similarly, in machine learning, optimization is the task of minimizing the cost function parameterized by the model's parameters. The main objective of gradient descent is to minimize the convex function using iteration of parameter updates. Once these machine learning models are optimized, they can be used as powerful tools for Artificial Intelligence and various computer science applications.
The best way to define the local minimum or local maximum of a function using gradient descent is as follows:
* If we move towards a negative gradient, or away from the gradient of the function at the current point, it will give the local minimum of that function.
* Whenever we move towards a positive gradient, or towards the gradient of the function at the current point, we will get the local maximum of that function.
[Fig. SQ.1. Cost versus weight, showing the initial weight, the gradient, the incremental steps, the derivative of the cost and the minimum cost.]
Moving towards the gradient is known as gradient ascent; moving against the gradient is gradient descent, which is also known as steepest descent. The main objective of using a gradient descent algorithm is to minimize the cost function using iteration. To achieve this goal, it performs two steps iteratively:
* Calculate the first-order derivative of the function to compute the gradient or slope of that function.
* Move away from the direction of the gradient, that is, step from the current point by alpha times the gradient in the opposite direction, where alpha is defined as the learning rate. It is a tuning parameter in the optimization process which helps to decide the length of the steps.
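The two iterative steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the textbook's implementation: the function name `gradient_descent`, the quadratic cost (w - 3)^2, the learning rate of 0.1 and the step count are all chosen only for the example.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeat: (1) evaluate the gradient, (2) step against it by learning_rate."""
    w = w0
    for _ in range(steps):
        slope = grad(w)                  # step 1: first-order derivative at w
        w = w - learning_rate * slope    # step 2: move opposite to the gradient
    return w


# Assumed example cost: J(w) = (w - 3)^2, a convex function with its minimum at w = 3.
def cost(w):
    return (w - 3.0) ** 2


def cost_gradient(w):
    return 2.0 * (w - 3.0)


w_min = gradient_descent(cost_gradient, w0=0.0)
print(round(w_min, 4), round(cost(w_min), 8))
```

For this cost, each update shrinks the distance to the minimum by a factor of 0.8, so the weight converges towards 3. A much larger learning rate (above 1.0 for this particular cost) would make the iterates diverge instead.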
How does Gradient Descent work?
Before starting the working principle of gradient descent, we should know some basic concepts to find out the slope of a line from linear regression. The equation for simple linear regression is given as:
Y = mX + c
Where 'm' represents the slope of the line, and 'c' represents the intercept on the y-axis.
[Figure: loss versus value of weight, marking the starting point and the point of convergence, i.e. where the cost function is at its minimum.]
The starting point (shown in the above figure) is used to evaluate the performance, as it is considered just an arbitrary point. At this starting point, we derive the first derivative or slope and then use a tangent line to calculate the steepness of this slope. Further, this slope will inform the updates to the parameters (weights and bias).
The slope is steepest at the starting point or arbitrary point; whenever new parameters are generated, the steepness gradually reduces, and at the lowest point the slope approaches zero. This lowest point is called the point of convergence.
The main objective of gradient descent is to minimize the cost function, or the error between expected and actual. To minimize the cost function, two quantities are required:
* Direction & Learning Rate
These two factors determine the partial derivative calculations of future iterations and allow the algorithm to reach the point of convergence, that is, a local minimum or global minimum. Let's discuss the learning rate in brief:
Learning Rate:
It is defined as the step size taken to reach the minimum or lowest point. This is typically a small value that is evaluated and updated based on the behavior of the cost function. If the learning rate is high, it results in larger steps but also leads to the risk of overshooting the minimum. At the same time, a low learning rate gives small step sizes, which compromises overall efficiency but gives the advantage of more precision.
[Fig. SQ.2. Loss versus value of weight for a small learning rate and a large learning rate.]
14. (a) Explain various learning techniques involved in unsupervised learning.
Ans: Refer Section No: 4.11 Page No: 4.43
[OR]
(b) List the applications of clustering and identify advantages and disadvantages of clustering algorithms.
Ans: Refer Section No: 4.12.3 Page No: 4.52
15. (a) Draw the architecture of a single layer perceptron (SLP) and explain its operation. Mention its advantages and disadvantages.
Ans: Refer Section No: 5.2 Page No: 5.11
[OR]
(b) How do you tune hyperparameters for better neural network performance? Explain in detail.
Ans: Refer Section No: 5.11 Page No: 5.76
PART C - (1 × 15 = 15 Marks)
16. (a) Discuss constraint satisfaction problems with an algorithm for solving cryptarithmetic. Trace the algorithm for the following:
      CROSS
    + ROADS
    -------
     DANGER
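Before following the manual trace, it can help to check the puzzle mechanically. The sketch below is a plain brute-force search, not the constraint-propagation trace the question asks for; the function name `solve_cryptarithm` and its structure are assumptions made for illustration. It tries every assignment of distinct digits to the nine letters and keeps those for which CROSS + ROADS = DANGER holds with no leading zeros.

```python
from itertools import permutations


def solve_cryptarithm(addend1, addend2, total):
    """Return every digit assignment with addend1 + addend2 == total."""
    letters = sorted(set(addend1 + addend2 + total))
    if len(letters) > 10:
        raise ValueError("more than 10 distinct letters")

    # Net place-value weight of each letter: positive in the addends,
    # negative in the total. A valid assignment makes the weighted sum zero.
    weight = dict.fromkeys(letters, 0)
    for word, sign in ((addend1, 1), (addend2, 1), (total, -1)):
        for power, ch in enumerate(reversed(word)):
            weight[ch] += sign * 10 ** power
    weights = [weight[ch] for ch in letters]

    # Positions (in `letters`) of the leading letters, which cannot be zero.
    leading = [letters.index(word[0]) for word in (addend1, addend2, total)]

    solutions = []
    for digits in permutations(range(10), len(letters)):
        if any(digits[i] == 0 for i in leading):
            continue
        if sum(w * d for w, d in zip(weights, digits)) == 0:
            solutions.append(dict(zip(letters, digits)))
    return solutions


solutions = solve_cryptarithm("CROSS", "ROADS", "DANGER")
for assignment in solutions:
    print(assignment)
```

The search visits about 3.6 million permutations, so it takes a few seconds in pure Python. The assignment derived by hand in the trace (96233 + 62513 = 158746) should appear among the printed solutions.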
Let C1, C2, C3, C4 and C5 denote the carries out of the columns, counted from the rightmost column.
1. The last (leftmost) column is generating a carry C5 = 1. So D = 1.
2. As we know, the sum of a number with itself is even. S + S = R, so R is even: R = 0, 2, 4, 6 or 8. But R ≠ 0, since then S is also zero and we can't assign the same digit to different letters. If R = 2 then S = 1; but D = 1, so S ≠ 1 and R ≠ 2. If R = 4 then S = 2, and the equation S + S = R does not generate any carry since 2 + 2 = 4. So C1 = 0.
3. C + R + C4 = A + 10, since the carry C5 = 1 is generated by this equation.
C + R + C4 > 9
C + 4 + C4 > 9
So C + C4 > 5, and C > 5 if C4 = 0.
So C = 6 or 7 or 8 or 9. Let C = 9.
4. Let us consider the equation S + D + C1 = E:
2 + 1 + 0 = E
E = 3 and C2 = 0, since 2 + 1 = 3.
Let us consider the equation C + R + C4 = A + 10 again:
9 + 4 + 0 = A + 10
13 = A + 10, so A = 3.
But E = 3, so A ≠ 3.
So again consider R = 6. So S = 3 and C1 = 0.
C + R + C4 = A + 10
9 + 6 + 0 = A + 10
15 = A + 10
A = 5
and S + D + C1 = E
3 + 1 + 0 = E
E = 4, C2 = 0
5. Let's consider the equation R + O + C3 = N. Since C4 = 0, this column is not generating a carry.
6 + O + C3 = N
6 + O + C3 ≤ 9
O + C3 ≤ 3. Let C3 = 1; then O ≤ 2, i.e. O = 0, 1, 2.
Let O = 2.
Again consider R + O + C3 = N:
6 + 2 + 1 = N
N = 9
But C = 9, so N ≠ 9.
So let N = 8 and let C3 = 0.
6. Let us consider the equation O + A + C2 = G. Since C2 = 0:
2 + 5 + 0 = G
G = 7
So D = 1, S = 3, A = 5, G = 7, C = 9, O = 2, E = 4, R = 6, N = 8.
And C1 = 0, C2 = 0, C3 = 0, C4 = 0, C5 = 1.
Verification
      C R O S S          9 6 2 3 3
    + R O A D S        + 6 2 5 1 3
    -----------        -----------
    D A N G E R        1 5 8 7 4 6
[OR]
(b) Construct the decision tree for the below dataset.
Day   Outlook    Temperature   Humidity   Wind     Play Golf
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No
Decision Trees
[Fig. SQ.3. Decision tree with root Outlook. Sunny leads to Humidity (High: No, Normal: Yes); Overcast leads to Yes; Rain leads to Wind (Strong: No, Weak: Yes).]
A decision tree consists of:
* A set of nodes, where each node tests the value of an attribute and branches on all possible values.
* A set of leaves, where each leaf gives a class value.
Using Decision Trees for Classification
Suppose we get a new instance:
Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong
How do we classify it?
[Fig. SQ.4. The same decision tree (Outlook at the root, Humidity under Sunny, Wind under Rain), used to classify the new instance.]
* At every node, test the corresponding attribute.
* Send the instance down the appropriate branch of the tree.
* If at a leaf, output the corresponding classification.
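The classification procedure can be sketched in Python as follows. The nested-dictionary encoding of the tree and the names `TREE` and `classify` are assumptions made for this illustration; the tree itself is the one in Fig. SQ.3 (Outlook at the root, Humidity under Sunny, Wind under Rain).

```python
# Internal nodes are dicts naming the attribute to test; leaves are class labels.
TREE = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {
            "attribute": "Humidity",
            "branches": {"High": "No", "Normal": "Yes"},
        },
        "Overcast": "Yes",
        "Rain": {
            "attribute": "Wind",
            "branches": {"Strong": "No", "Weak": "Yes"},
        },
    },
}


def classify(tree, instance):
    """Walk from the root to a leaf, following the branch for each tested attribute."""
    node = tree
    while isinstance(node, dict):                 # still at an internal node
        value = instance[node["attribute"]]       # test the corresponding attribute
        node = node["branches"][value]            # send the instance down that branch
    return node                                   # at a leaf: output its class


new_instance = {
    "Outlook": "Sunny",
    "Temperature": "Hot",
    "Humidity": "High",
    "Wind": "Strong",
}
print(classify(TREE, new_instance))  # prints "No"
```

For the new instance, the tree tests Outlook (Sunny), then Humidity (High), and reaches the leaf No. Temperature is never tested because it does not appear in this tree.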
