
ARTIFICIAL INTELLIGENCE

(22MCA0304)
Unit-2

Heuristic search strategies-heuristic functions:


Informed Search (Heuristic Search)
Informed search algorithms use domain knowledge.
In an informed search, problem information is available which can guide the search. Informed search
strategies can find a solution more efficiently than an uninformed search strategy.
Informed search is also called a Heuristic search.
A heuristic is a technique that is not guaranteed to find the best solution, but is designed to find a
good solution in a reasonable time.
Heuristic function:
A heuristic is a function used in informed search to identify the most promising path. It takes the current state of the agent as input and produces an estimate of how close the agent is to the goal. The heuristic method might not always give the best solution, but it is designed to find a good solution in a reasonable time. The heuristic function estimates how close a state is to the goal. It is represented by h(n), and it estimates the cost of the cheapest path from state n to a goal state. The value of the heuristic function is always non-negative.
Admissibility of the heuristic function is given as:
1. h(n) <= h*(n)
Here h(n) is the heuristic (estimated) cost, and h*(n) is the actual cost of an optimal path from n to the goal.
Hence an admissible heuristic never overestimates the actual cost.
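To make this concrete (this example is not from the original notes), a commonly used admissible heuristic for the 8-puzzle is the Manhattan distance of each tile from its goal position. A minimal Python sketch, assuming states are encoded as 9-tuples read row by row:

```python
# Hypothetical sketch: Manhattan-distance heuristic h(n) for the 8-puzzle.
# A state is a tuple of 9 entries (0 = blank), read row by row.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def h(state, goal=GOAL):
    """Estimated cost to the goal: sum of Manhattan distances of the tiles."""
    total = 0
    for index, tile in enumerate(state):
        if tile == 0:                      # the blank does not count
            continue
        goal_index = goal.index(tile)
        row, col = divmod(index, 3)
        goal_row, goal_col = divmod(goal_index, 3)
        total += abs(row - goal_row) + abs(col - goal_col)
    return total                           # never overestimates, so h(n) <= h*(n)
```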

Pure Heuristic Search:


Pure heuristic search is the simplest form of heuristic search algorithms. It expands nodes based on their
heuristic value h(n). It maintains two lists, OPEN and CLOSED list. In the CLOSED list, it places those
nodes which have already expanded and in the OPEN list, it places nodes which have yet not been
expanded.
On each iteration, the node n with the lowest heuristic value is expanded, all its successors are generated,
and n is placed in the CLOSED list. The algorithm continues until a goal state is found.
In informed search we will discuss two main algorithms, which are given below:
Best First Search Algorithm (Greedy search)

A* Search Algorithm
Best-first Search Algorithm (Greedy Search):
The greedy best-first search algorithm always selects the path which appears best at that moment. It is a
combination of depth-first search and breadth-first search: it uses a heuristic function to guide the search,
which allows us to take advantage of both algorithms. With the help of best-first search, at each step we can
choose the most promising node. In the best-first search algorithm, we expand the node which is closest to
the goal node, and the closeness is estimated by the heuristic function, i.e.
f(n) = h(n)
Where h(n) = estimated cost from node n to the goal.
The greedy best-first algorithm is implemented using a priority queue.

Best first search algorithm


Step 1: Place the starting node into the OPEN list.
Step 2: If the OPEN list is empty, Stop and return failure.
Step 3: Remove the node n from the OPEN list which has the lowest value of h(n), and place it in the
CLOSED list.
Step 4: Expand the node n, and generate the successors of node n.
Step 5: Check each successor of node n to find whether any of them is a goal node. If any successor is a
goal node, return success and terminate the search; otherwise, proceed to Step 6.
Step 6: For each successor node, the algorithm computes the evaluation function f(n) and then checks
whether the node is already in the OPEN or CLOSED list. If the node is in neither list, add it to the OPEN
list.
Step 7: Return to Step 2.
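The steps above can be sketched in Python with a priority queue ordered by h(n). This is an illustrative sketch rather than the exact procedure from the notes; the `successors`, `goal_test`, and `h` functions are assumed to be supplied by the problem definition:

```python
import heapq
from itertools import count

def greedy_best_first_search(start, goal_test, successors, h):
    """Greedy best-first search: always expand the OPEN node with the lowest h(n).

    successors(state) -> iterable of neighboring states
    h(state)          -> heuristic estimate of the distance to the goal
    Returns a list of states from start to a goal, or None on failure.
    """
    tie = count()                              # tie-breaker so states are never compared
    open_list = [(h(start), next(tie), start, [start])]
    closed = set()
    while open_list:                           # empty OPEN list means failure
        _, _, state, path = heapq.heappop(open_list)   # lowest h(n)
        if state in closed:
            continue
        closed.add(state)
        if goal_test(state):                   # goal test (done on removal here)
            return path
        for succ in successors(state):         # expand n
            if succ not in closed:
                heapq.heappush(open_list, (h(succ), next(tie), succ, path + [succ]))
    return None
```

For simplicity this sketch tests for the goal when a node is removed from OPEN, whereas the steps above test the successors in Step 5; both variants appear in textbooks.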
Advantages:
Best first search can switch between BFS and DFS by gaining the advantages of both the algorithms.
This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
It can behave as an unguided depth-first search in the worst case scenario.
It can get stuck in a loop, like DFS.
This algorithm is not optimal.
Example
Consider the search problem below, which we will traverse using greedy best-first search. At each iteration,
each node is expanded using the evaluation function f(n) = h(n), which is given in the table below.

In this search example, we are using two lists, OPEN and CLOSED. Following are the iterations for
traversing the above example.

Expand the node S and put it in the CLOSED list. Initialization: Open [A, B], Closed [S]
Iteration 1: Open [A], Closed [S, B]
Iteration 2: Open [E, F, A], Closed [S, B]
: Open [E, A], Closed [S, B, F]
Iteration 3: Open [I, G, E, A], Closed [S, B, F]

: Open [I, E, A], Closed [S, B, F, G]
Hence the final solution path will be: S --> B --> F --> G
Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).
Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m), where m is the
maximum depth of the search space and b is the branching factor.
Complete: Greedy best-first search is incomplete; it can get stuck in loops even if the given state space is finite.
Optimal: Greedy best first search algorithm is not optimal.

A* Search Algorithm:
A* search is the most commonly known form of best-first search. It uses the heuristic function h(n) and the
cost to reach node n from the start state, g(n). It combines features of UCS and greedy best-first search,
by which it solves the problem efficiently. The A* search algorithm finds the shortest path through the search
space using the heuristic function. This search algorithm expands fewer nodes and provides an optimal
result faster. The A* algorithm is similar to UCS except that it uses g(n) + h(n) instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we can
combine both costs as f(n) = g(n) + h(n); this sum is called the fitness number.

Algorithm of A* search:
Step1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then return failure and stops.
Step 3: Select the node from the OPEN list which has the smallest value of the evaluation function (g + h).
If node n is the goal node, return success and stop; otherwise go to Step 4.
Step 4: Expand node n and generate all of its successors, and put n into the CLOSED list. For each
successor n', check whether n' is already in the OPEN or CLOSED list; if not, compute the evaluation
function for n' and place it into the OPEN list.
Step 5: Otherwise, if node n' is already in OPEN or CLOSED, it should be attached to the back pointer
which reflects the lowest g(n') value.
Step 6: Return to Step 2.
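A corresponding Python sketch of A*, again only an illustration: `successors` is assumed to return (state, step-cost) pairs, and `h` is assumed admissible:

```python
import heapq
from itertools import count

def a_star_search(start, goal_test, successors, h):
    """A* search sketch: order the OPEN list by f(n) = g(n) + h(n).

    successors(state) -> iterable of (next_state, step_cost) pairs
    Returns (path, cost) for a goal, or None if no solution is found.
    """
    tie = count()
    open_list = [(h(start), next(tie), 0, start, [start])]   # (f, tie, g, state, path)
    best_g = {start: 0}
    while open_list:
        f, _, g, state, path = heapq.heappop(open_list)
        if goal_test(state):
            return path, g
        if g > best_g.get(state, float("inf")):
            continue                                         # stale queue entry
        for succ, step_cost in successors(state):
            g2 = g + step_cost
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2                            # keep only the cheapest known path
                heapq.heappush(open_list,
                               (g2 + h(succ), next(tie), g2, succ, path + [succ]))
    return None
```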

Advantages:
The A* search algorithm is better than other search algorithms.
A* search algorithm is optimal and complete.
This algorithm can solve very complex problems.
Disadvantages:
It does not always produce the shortest path, as it is mostly based on heuristics and approximation.
A* search algorithm has some complexity issues.
The main drawback of A* is memory requirement as it keeps all generated nodes in the memory, so it is
not practical for various large-scale problems.

Example
In this example, we will traverse the given graph using the A* algorithm. The heuristic value of all states is
given in the table below, so we will calculate the f(n) of each state using the formula f(n) = g(n) + h(n), where
g(n) is the cost to reach any node from the start state.
Here we will use the OPEN and CLOSED lists.



Solution:

Initialization: {(S, 5)}


Iteration 1: {(S --> A, 4), (S --> G, 10)}
Iteration 2: {(S --> A --> C, 4), (S --> A --> B, 7), (S --> G, 10)}
Iteration 3: {(S --> A --> C --> G, 6), (S --> A --> C --> D, 11), (S --> A --> B, 7), (S --> G, 10)}
Iteration 4 will give the final result: S --> A --> C --> G; it provides the optimal path with cost 6.

If the heuristic function is admissible, then A* tree search will always find the least cost path.
Time Complexity: The time complexity of the A* search algorithm depends on the heuristic function, and
the number of nodes expanded is exponential in the depth of the solution d. So the time complexity is
O(b^d), where b is the branching factor.

Space Complexity: The space complexity of the A* search algorithm is O(b^d).

LOCAL SEARCH ALGORITHMS AND OPTIMIZATION PROBLEMS:


In many optimization problems, the path to the goal is irrelevant; the goal state itself is the solution.
For example, in the 8-queens problem, what matters is the final configuration of queens, not the order in
which they are added.

In such cases, we can use local search algorithms. They operate using a single current state (rather
than multiple paths) and generally move only to neighbors of that state.
Important applications of this class of problems include (a) integrated-circuit design, (b) factory-floor
layout, (c) job-shop scheduling, (d) automatic programming, (e) telecommunications network
optimization, (f) vehicle routing, and (g) portfolio management.
Key advantages of Local Search Algorithms
They use very little memory – usually a constant amount; and
They can often find reasonable solutions in large or infinite (continuous) state spaces for which
systematic algorithms are unsuitable.



OPTIMIZATION PROBLEMS:
In addition to finding goals, local search algorithms are useful for solving pure optimization problems, in
which the aim is to find the best state according to an objective function.
State Space Landscape
To understand local search, it is helpful to consider the state-space landscape, as shown in the figure
below.
A landscape has both “location” (defined by the state) and “elevation” (defined by the value of the heuristic
cost function or objective function).
If elevation corresponds to cost, then the aim is to find the lowest valley – a global minimum;
if elevation corresponds to an objective function, then the aim is to find the highest peak – a global
maximum.
Local search algorithms explore this landscape. A complete local search algorithm always finds a goal if
one exists.

Local search can be used on problems that can be formulated as finding a solution maximizing a criterion
among a number of candidate solutions.
Local search algorithms move from solution to solution in the space of candidate solutions (the search
space) until a solution deemed optimal is found or a time bound is elapsed.
For example, in the travelling salesman problem a candidate solution is a cycle containing all nodes of the
graph, and the target is to minimize the total length of the cycle.
A local search algorithm starts from a candidate solution and then iteratively moves to a neighbor
solution.
It terminates when a time bound elapses or when the solution has not improved after a given number of steps.

HILL CLIMBING SEARCH:


Hill climbing continually moves in the direction of increasing value (i.e., uphill). It terminates when it reaches a
“peak” where no neighbor has a higher value. It records only the current state and its objective-function value,
and does not look ahead beyond the immediate neighbors of the current state; for this reason it is sometimes
called greedy local search.

Problem: it can get stuck in local maxima.

Its success depends very much on the shape of the state-space landscape: if there are few local
maxima, random-restart hill climbing will find a “good” solution very quickly. The hill-climbing search
algorithm is the most basic local search technique. At each step the current node is replaced by
the best neighbor; in this version, that means the neighbor with the highest VALUE, but if a heuristic cost
estimate h is used, we would find the neighbor with the lowest h.
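A minimal Python sketch of steepest-ascent hill climbing, assuming `neighbors` and `value` are supplied by the problem (use the negative of a cost estimate h if the problem is phrased as minimization):

```python
def hill_climbing(start, neighbors, value):
    """Steepest-ascent hill climbing sketch.

    neighbors(state) -> iterable of neighboring states
    value(state)     -> objective value to be maximized
    Returns a local maximum, which may not be the global maximum.
    """
    current = start
    while True:
        best = max(neighbors(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current          # no neighbor is better: a peak (or plateau)
        current = best
```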



Example: 4 queens problem
Put 4 queens on a 4×4 board with no two queens on the same row, column, or diagonal. Move a queen to
reduce the number of conflicts. In the figures below, h represents the heuristic value (the number of
conflicting pairs of queens).
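A small sketch of how this h can be computed, assuming one queen per column with `state[i]` giving the row of the queen in column i (this encoding is an assumption of the illustration, not part of the notes):

```python
def conflicts(state):
    """h for n-queens: number of pairs of queens attacking each other.

    state[i] is the row of the queen in column i (one queen per column),
    so only row and diagonal conflicts need to be counted.
    """
    h = 0
    n = len(state)
    for i in range(n):
        for j in range(i + 1, n):
            same_row = state[i] == state[j]
            same_diag = abs(state[i] - state[j]) == j - i
            if same_row or same_diag:
                h += 1
    return h

# Example: a solved 4-queens board has h = 0.
print(conflicts((1, 3, 0, 2)))   # -> 0
```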

Example: 8 queens problem

Problems with hill-climbing:


Hill-climbing often gets stuck for the following reasons:
Local maxima: a local maximum is a peak that is higher than each of its neighboring states but lower
than the global maximum. Hill-climbing algorithms that reach the vicinity of a local maximum will be
drawn upwards towards the peak, but will then be stuck with nowhere else to go.
Ridges: a ridge is shown in the figure below. Ridges result in a sequence of local maxima that is very difficult
for greedy algorithms to navigate.
Plateaux: a plateau is an area of the state-space landscape where the evaluation function is flat; all the
neighboring states have the same value.
Figure: Ridges



Simulated Annealing

To avoid getting stuck in a local maximum, simulated annealing tries randomly (using a probability function)
to move to another state; if this new state is better it moves into it, otherwise it tries another move, and so on.
It terminates when it finds an acceptably good solution within a fixed amount of time, rather than the best
possible solution. The aim is to locate a good approximation to the global optimum of a given function in a
large search space. It is widely used in VLSI layout, airline scheduling, etc.
The problem with this approach is that the neighbors of a state are not guaranteed to contain any of the
existing better solutions, which means that failure to find a better solution among them does not guarantee
that no better solution exists. Simulated annealing will not get stuck at a local optimum: if it runs for an
infinite amount of time, the global optimum will be found.
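A hedged Python sketch of simulated annealing for a maximization problem; `neighbors`, `value`, and the cooling `schedule` are assumptions supplied by the caller, not part of the notes:

```python
import math
import random

def simulated_annealing(start, neighbors, value, schedule):
    """Simulated annealing sketch (maximizing `value`).

    neighbors(state) -> non-empty list of neighboring states
    schedule(t)      -> temperature at time step t (0 or less means stop)
    A worse neighbor is accepted with probability exp(delta / T).
    """
    current = start
    for t in range(1, 10**6):
        T = schedule(t)
        if T <= 0:
            return current
        candidate = random.choice(neighbors(current))
        delta = value(candidate) - value(current)
        if delta > 0 or random.random() < math.exp(delta / T):
            current = candidate
    return current

# Example schedule: geometric cooling that stops after 1000 steps.
schedule = lambda t: 1.0 * (0.99 ** t) if t < 1000 else 0
```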

GENETIC ALGORITHMS:
Genetic algorithms are inspired by evolutionary biology and natural selection, including inheritance, and
evolve toward better solutions. A successor state is generated by combining two parent states, rather than
by modifying a single state. The algorithm starts with k randomly generated states (the population); each
state is an individual and is represented as a string over a finite alphabet (often a string of 0s and 1s). An
evaluation function (the fitness function) assigns higher values to better states. The next generation of
states is produced by selection, crossover, and mutation. Commonly, the algorithm terminates when either
a maximum number of generations has been produced or a satisfactory fitness level has been reached for
the population.
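A simplified Python sketch of this loop (selection, crossover, mutation); `fitness`, `crossover`, and `mutate` are problem-specific assumptions of this illustration, and fitness values are assumed to be positive so they can serve directly as selection weights:

```python
import random

def genetic_algorithm(population, fitness, crossover, mutate,
                      generations=100, good_enough=None):
    """Genetic algorithm sketch over a population of individuals (e.g. bit strings).

    fitness(ind)      -> positive number; higher is better
    crossover(p1, p2) -> child individual
    mutate(ind)       -> individual with a small random change
    """
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]
        next_generation = []
        for _ in range(len(population)):
            # selection: parents chosen with probability proportional to fitness
            p1, p2 = random.choices(population, weights=weights, k=2)
            child = crossover(p1, p2)
            if random.random() < 0.1:            # small mutation probability
                child = mutate(child)
            next_generation.append(child)
        population = next_generation
        best = max(population, key=fitness)
        if good_enough is not None and fitness(best) >= good_enough:
            break
    return max(population, key=fitness)
```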



Local Search in Continuous Spaces:
The states are defined by n variables, i.e., an n-dimensional vector of variables x.
The gradient of the objective function is ∇f = (∂f/∂x1, ..., ∂f/∂xn); often only a local expression for the
gradient is available.
We can perform steepest-ascent hill climbing by updating the current state according to the formula
x ← x + α∇f(x), where α is the step size (a small constant).
If the objective function f is not available in a differentiable form, we use empirical gradient search.

Line search tries to overcome the dilemma of adjusting α by extending the current gradient direction,
repeatedly doubling α until f starts to decrease again. The point at which this occurs becomes the new
current state.
In cases where the objective function is not differentiable, we calculate an empirical gradient by evaluating
the response to small increments and decrements in each coordinate. Empirical gradient search is the
same as steepest-ascent hill climbing in a discretized version of the state space.
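A small Python sketch of empirical-gradient steepest ascent with a fixed step size α (no line search); the objective `f` over a list of coordinates is an assumption of the example:

```python
def empirical_gradient(f, x, eps=1e-4):
    """Approximate the gradient of f at x by central finite differences."""
    grad = []
    for i in range(len(x)):
        up = list(x); up[i] += eps
        down = list(x); down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def steepest_ascent(f, x, alpha=0.01, steps=1000):
    """Repeat x <- x + alpha * grad f(x) for a fixed number of steps."""
    for _ in range(steps):
        g = empirical_gradient(f, x)
        x = [xi + alpha * gi for xi, gi in zip(x, g)]
    return x

# Example: maximize f(x, y) = -(x - 1)^2 - (y + 2)^2, whose maximum is at (1, -2).
f = lambda v: -(v[0] - 1) ** 2 - (v[1] + 2) ** 2
print(steepest_ascent(f, [0.0, 0.0]))   # approaches [1.0, -2.0]
```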

Newton-Raphson method:
To find a maximum or minimum we can solve ∇f(x) = 0 using the Newton-Raphson update
x ← x − Hf(x)^-1 ∇f(x), where Hf(x) is the Hessian matrix of second derivatives, Hij = ∂²f/∂xi∂xj.
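A minimal sketch of one Newton-Raphson step, assuming the caller supplies the gradient and Hessian functions (NumPy is used only for the linear solve):

```python
import numpy as np

def newton_raphson_step(grad, hess, x):
    """One Newton-Raphson update x <- x - Hf(x)^-1 * grad f(x)."""
    return x - np.linalg.solve(hess(x), grad(x))

# Example: f(x, y) = (x - 1)^2 + (y + 2)^2; one step lands on the minimum (1, -2).
grad = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] + 2)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 2.0]])
print(newton_raphson_step(grad, hess, np.array([0.0, 0.0])))   # -> [ 1. -2.]
```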

Constrained optimization: An optimization problem is constrained if solutions must satisfy some hard
constraints on the values of the variables. (e.g. : linear programming problems)

Many local search methods apply also to problems in continuous spaces. Linear programming and convex
optimization problems obey certain restrictions on the shape of the state space and the nature of the
objective function, and admit polynomial-time algorithms that are often extremely efficient in practice.

Search with non-deterministic actions:


When the environment is either partially observable or nondeterministic (or both), the future percepts
cannot be determined in advance, and the agent's future actions will depend on those future percepts.

Nondeterministic problems:
The transition model is defined by a RESULTS function that returns a set of possible outcome states;
the solution is not a sequence but a contingency plan (a strategy),
e.g.
[Suck, if State = 5 then [Right, Suck] else []];

In nondeterministic environments, agents can apply AND-OR search to generate contingent plans that
reach the goal regardless of which outcomes occur during execution.
AND-OR search trees
OR nodes: in a deterministic environment, the only branching is introduced by the agent's own choices in
each state; we call these nodes OR nodes.
AND nodes: in a nondeterministic environment, branching is also introduced by the environment's choice
of outcome for each action; we call these nodes AND nodes.

AND-OR tree: OR nodes and AND nodes alternate. State nodes are OR nodes where some action must be
chosen; at the AND nodes (shown as circles), every outcome must be handled.
A solution (shown in bold lines) for an AND-OR search problem is a subtree that

1) has a goal node at every leaf;
2) specifies one action at each of its OR nodes; and
3) includes every outcome branch at each of its AND nodes.

Figure: A recursive, depth-first algorithm for AND-OR graph search.
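The recursive algorithm referred to above can be sketched in Python roughly as follows; `problem.results(s, a)` is assumed to return the set of possible outcome states, and the returned plan nests as [action, {outcome: subplan, ...}]. This is an illustrative sketch, not the notes' exact figure:

```python
def and_or_graph_search(problem):
    """Recursive, depth-first AND-OR graph search (sketch).

    problem is assumed to provide: initial, goal_test(s), actions(s),
    and results(s, a) -> set of possible outcome states.
    Returns a conditional plan, or None if no solution exists.
    """
    return or_search(problem.initial, problem, [])

def or_search(state, problem, path):
    if problem.goal_test(state):
        return []                              # empty plan: already at a goal
    if state in path:
        return None                            # loop on this path: fail this branch
    for action in problem.actions(state):      # OR node: try the agent's choices
        plan = and_search(problem.results(state, action), problem, [state] + path)
        if plan is not None:
            return [action, plan]              # commit to this action at the OR node
    return None

def and_search(states, problem, path):
    plans = {}
    for s in states:                           # AND node: every outcome must be handled
        plan = or_search(s, problem, path)
        if plan is None:
            return None
        plans[s] = plan
    return plans                               # read as: "if outcome is s, follow plans[s]"
```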

Searching with partially observable environments:


Belief state: The agent's current belief about the possible physical states it might be in, given the
sequence of actions and percepts up to that point.
Standard search algorithms can be applied directly to belief-state space to solve sensorless problems, and
belief-state AND-OR search can solve general partially observable problems. Incremental algorithms that

construct solutions state-by-state within a belief state are often more efficient.

1. Searching with no observation:


When the agent's percepts provide no information at all, we have a sensorless problem.
To solve sensorless problems, we search in the space of belief states rather than physical states. In belief-
state space, the problem is fully observable; the solution is always a sequence of actions.

The belief-state problem can be defined as follows (the underlying physical problem P is defined by
ACTIONSP, RESULTP, GOAL-TESTP and STEP-COSTP):
·Belief states: the belief-state space contains every possible set of physical states. If P has N states, the
sensorless problem has up to 2^N belief states (although many may be unreachable from the initial state).
·Initial state: typically the set of all states in P.
·Actions:
a. If illegal actions have no effect on the environment, take the union of all the actions legal in any of the
physical states in the current belief state b: ACTIONS(b) = ∪s∈b ACTIONSP(s).
b. If illegal actions are extremely dangerous, take the intersection.
·Transition model:
a. For deterministic actions, b' = RESULT(b, a) = {s' : s' = RESULTP(s, a) and s ∈ b} (b' is never larger than b).
b. For nondeterministic actions, b' = RESULT(b, a) = {s' : s' ∈ RESULTSP(s, a) and s ∈ b} = ∪s∈b RESULTSP(s, a)
(b' may be larger than b).
The process of generating the new belief state after the action is called the prediction step.
·Goal test: A belief state satisfies the goal only if all the physical states in it satisfy GOAL-TESTP.
·Path cost
If an action sequence is a solution for a belief state b, it is also a solution for any subset of b. Hence, we
can discard a path reaching the superset if the subset has already been generated. Conversely, if the
superset has already been generated and found to be solvable, then any subset is guaranteed to be
solvable.
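A small sketch of these belief-state operations using Python sets; `actions_p`, `results_p`, and `goal_test_p` stand for the underlying physical problem and are assumptions of this illustration:

```python
def belief_actions(belief, actions_p):
    """ACTIONS(b): union of the actions legal in any physical state of b
    (appropriate when illegal actions are harmless)."""
    legal = set()
    for s in belief:
        legal |= set(actions_p(s))
    return legal

def predict(belief, action, results_p):
    """b' = RESULT(b, a): union of the physical outcomes of the action over all states in b."""
    return frozenset(s2 for s in belief for s2 in results_p(s, action))

def belief_goal_test(belief, goal_test_p):
    """A belief state satisfies the goal only if every physical state in it does."""
    return bool(belief) and all(goal_test_p(s) for s in belief)
```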

Main difficulty: The size of each belief state.


Solution:
a. Represent the belief state by some more compact description.
b. Avoid the standard search algorithm and develop incremental belief-state search algorithms instead.
Incremental belief-state search finds one solution that works for all the states in the belief state, and is
typically able to detect failure quickly.
2. Searching with observations:
When observations are partial, the ACTIONS, STEP-COST, and GOAL-TEST are constructed from the
underlying physical problem just as for sensorless problems.
·Transition model: we can think of transitions from one belief state to the next for a particular action as
occurring in three stages.
The prediction stage is the same as for the sensorless problem: given the action a in belief state b, the
predicted belief state is b̂ = PREDICT(b, a).
The observation prediction stage determines the set of percepts o that could be observed in the predicted
belief state: POSSIBLE-PERCEPTS(b̂) = {o : o = PERCEPT(s) and s ∈ b̂}.
The update stage determines, for each possible percept, the belief state that would result from the percept.
The new belief state bo is the set of states in b̂ that could have produced the percept:
bo = UPDATE(b̂, o) = {s : o = PERCEPT(s) and s ∈ b̂}.

In conclusion:
RESULTS(b,a) = { bo : bo = UPDATE(PREDICT(b,a),o) and o∈POSSIBLE-PERCEPTS(PREDICT(b,a))}
The search algorithm returns a conditional plan that tests the belief state rather than the actual state.

An agent for partially observable environments is similar to the simple problem-solving agent (it formulates a
problem, calls a search algorithm, and executes the solution).
Main difference:
1) The solution will be a conditional plan rather than a sequence.
2) The agent will need to maintain its belief state as it performs actions and receives percepts.
Given an initial state b, an action a and a percept o, the new belief state is
b’ = UPDATE(PREDICT(b, a), o). //recursive state estimator
State estimation (a.k.a. monitoring or filtering) is a core function of intelligent systems in partially observable
environments: maintaining one's belief state.
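A minimal sketch of this recursive state estimator, assuming `results_p(s, a)` gives the possible physical outcomes of an action and `percept(s)` gives the percept received in state s (both are assumptions of the illustration):

```python
def predict(belief, action, results_p):
    """Prediction stage: all physical states reachable by the action from some state in b."""
    return frozenset(s2 for s in belief for s2 in results_p(s, action))

def update(predicted, observed, percept):
    """Update stage: keep only the states that could have produced the observed percept."""
    return frozenset(s for s in predicted if percept(s) == observed)

def new_belief(belief, action, observed, results_p, percept):
    """b' = UPDATE(PREDICT(b, a), o): the recursive state estimator."""
    return update(predict(belief, action, results_p), observed, percept)
```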

Online Search Agents and Unknown Environments:


An online search agent operates by interleaving computation and action: first it takes an action and then
it observes the environment and computes the next action. Online search is a good idea in dynamic or
semidynamic domains - domains where there is a penalty for sitting around and computing too long.
Online search is an even better idea for stochastic domains.
(The term "online" is commonly used in computer science to refer to algorithms that must process input
data as they are received, rather than waiting for the entire input data set to become available.)

In general, an offline search would have to come up with an exponentially large contingency plan that
considers all possible happenings, while an online search need only consider what actually does happen.
For example,
A chess playing agent is well-advised to make its first move long before it has figured out the complete
course of the game. Online search is a necessary idea for an exploration problem, where the states and
actions are unknown to the agent. An agent in this state of ignorance must use its actions as experiments
to determine what to do next, and hence must interleave computation and action.
The canonical example of online search is a robot that is placed in a new building and must explore it to
build a map that it can use for getting from A to B. Methods for escaping from labyrinths-required
knowledge for aspiring heroes of antiquity-are also examples of online search algorithms. Spatial
exploration is not the only form of exploration, however.

Consider a newborn baby: it has many possible actions, but knows the outcomes of none of them,
and it has experienced only a few of the possible states that it can reach. The baby’s gradual discovery
of how the world works is, in part, an online search process.

Online search problems:


An online search problem can be solved only by an agent executing actions, rather than by a purely
computational process. We will assume that the agent knows just the following:
ACTIONS(s), which returns a list of actions allowed in state s;
the step-cost function c(s, a, s') - note that this cannot be used until the agent knows that s' is the
outcome; and GOAL-TEST(s).

Note in particular that the agent cannot access the successors of a state except by actually trying all the
actions in that state. For example, in the maze problem shown in Figure, the agent does not know that
going Up from (1,1) leads to (1,2); nor, having done that, does it know that going Down will take it back to
(1,1). This degree of ignorance can be reduced in some applications - for example, a robot explorer might
know how its movement actions work and be ignorant only of the locations of obstacles.

We will assume that the agent can always recognize a state that it has visited before, and we will assume
that the actions are deterministic. Finally, the agent might have access to an admissible heuristic
function h(s) that estimates the distance from the current state to a goal state. For example, in Figure, the
agent might know the location of the goal and be able to use the Manhattan distance heuristic.



Typically, the agent's objective is to reach a goal state while minimizing cost. (Another possible objective is
simply to explore the entire environment.) The cost is the total path cost of the path that the agent actually
travels. It is common to compare this cost with the path cost of the path the agent would follow if it knew
the search space in advance-that is, the actual shortest path (or shortest complete exploration). In the
language of online algorithms this is called the competitive ratio; we would like it to be as small as
possible. Although this sounds like a reasonable request, it is easy to see that the best achievable
competitive ratio is infinite in some cases. For example, if some actions are irreversible, the online search
might accidentally reach a dead-end state from which no goal state is reachable.

Figure: A simple maze problem.

The agent starts at S and must reach G, but knows nothing of the environment.

Figure: (a) Two state spaces that might lead an online search agent into a dead end. Any given agent will fail in at
least one of these spaces. (b) A two-dimensional environment that can cause an online search agent to follow an
arbitrarily inefficient route to the goal. Whichever choice the agent makes, the adversary blocks that route with
another long, thin wall, so that the path followed is much longer than the best possible path.

Perhaps you find the term "accidentally" unconvincing-after all, there might be an algorithm that happens
not to take the dead-end path as it explores. Our claim, to be more precise, is that no algorithm can avoid
dead ends in all state spaces.

Consider the two dead-end state spaces in Figure (a). To an online search algorithm that has visited states
S and A, the two state spaces look identical, so it must make the same decision in both. Therefore, it will

fail in one of them. This is an example of an adversary argument-we can imagine an adversary that
constructs the state space while the agent explores it and can put the goals and dead ends wherever it
likes.

Dead ends are a real difficulty for robot exploration--staircases, ramps, cliffs, and all kinds of natural
terrain present opportunities for irreversible actions. To make progress, we will simply assume that the
state space is safely explorable-that is, some goal state is reachable from every reachable state. State
spaces with reversible actions, such as mazes and 8-puzzles, can be viewed as undirected graphs and are
clearly safely explorable.

Even in safely explorable environments, no bounded competitive ratio can be guaranteed if there are paths
of unbounded cost. This is easy to show in environments with irreversible actions, but in fact it remains
true for the reversible case as well, as Figure (b) shows. For this reason, it is common to describe the
performance of online search algorithms in terms of the size of the entire state space rather than just the
depth of the shallowest goal.

Online search agents:


After each action, an online agent receives a percept telling it what state it has reached; from this
information, it can augment its map of the environment. The current map is used to decide where to go
next. This interleaving of planning and action means that online search algorithms are quite different from
the offline search algorithms we have seen previously.

For example, offline algorithms such as A* have the ability to expand a node in one part of the space and
then immediately expand a node in another part of the space, because node expansion involves simulated
rather than real actions. An online algorithm, on the other hand, can expand only a node that it physically
occupies. To avoid traveling all the way across the tree to expand the next node, it seems better to expand
nodes in a local order.

Depth-first search has exactly this property, because (except when backtracking) the next node expanded
is a child of the previous node expanded.

An online depth-first search agent is shown in the figure. This agent stores its map in a table, RESULT[s, a],
that records the state resulting from executing action a in state s. Whenever an action from the current
state has not been explored, the agent tries that action.

The difficulty comes when the agent has tried all the actions in a state. In offline depth-first search, the
state is simply dropped from the queue; in an online search, the agent has to backtrack physically.

In depth-first search, this means going back to the state from which the agent entered the current state
most recently. That is achieved by keeping a table that lists, for each state, the predecessor states to
which the agent has not yet backtracked. If the agent has run out of states to which it can backtrack,
then its search is complete.

The progress of ONLINE-DFS-AGENT can be traced when applied to the maze given in Figure. It is fairly
easy to see that the agent will, in the worst case, end up traversing every link in the state space exactly
twice.

For exploration, this is optimal; for finding a goal, on the other hand, the agent's competitive ratio could be
arbitrarily bad if it goes off on a long excursion when there is a goal right next to the initial state. An
online variant of iterative deepening solves this problem; for an environment that is a uniform tree, the
competitive ratio of such an agent is a small constant.

Because of its method of backtracking, ONLINE-DFS-AGENT works only in state spaces where the actions
are reversible. There are slightly more complex algorithms that work in general state spaces, but no such

algorithm has a bounded competitive ratio.

Figure: An online search agent that uses depth-first exploration. The agent is applicable only in bidirected
search spaces.
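A Python sketch in the spirit of that agent follows; it is an illustration under the stated assumptions (deterministic, reversible actions, and a state that can be recognized when revisited), with `actions(s)` and `goal_test(s)` supplied by the problem:

```python
class OnlineDFSAgent:
    """Online depth-first exploration agent (sketch).

    Call agent(current_state) once per step; it returns the action to execute,
    or None when the goal is reached or exploration is complete.
    """
    def __init__(self, actions, goal_test):
        self.actions = actions            # actions(s) -> list of legal actions in s
        self.goal_test = goal_test
        self.result = {}                  # result[(s, a)] -> observed next state
        self.untried = {}                 # state -> actions not yet tried there
        self.unbacktracked = {}           # state -> predecessors not yet backtracked to
        self.s, self.a = None, None       # previous state and action

    def __call__(self, s1):
        if self.goal_test(s1):
            return None                   # stop: goal reached
        if s1 not in self.untried:
            self.untried[s1] = list(self.actions(s1))
        if self.s is not None and self.result.get((self.s, self.a)) != s1:
            self.result[(self.s, self.a)] = s1            # learn the map
            self.unbacktracked.setdefault(s1, []).insert(0, self.s)
        if not self.untried[s1]:
            if not self.unbacktracked.get(s1):
                return None               # nowhere left to backtrack: exploration done
            back = self.unbacktracked[s1].pop(0)
            # pick an action already known to lead from s1 back to the predecessor
            self.a = next(a for (s, a), s2 in self.result.items()
                          if s == s1 and s2 == back)
        else:
            self.a = self.untried[s1].pop()
        self.s = s1
        return self.a
```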

Online local search:

Like depth-first search, hill-climbing search has the property of locality in its node expansions. In fact,
because it keeps just one current state in memory, hill-climbing search is already an online search
algorithm! Unfortunately, it is not very useful in its simplest form because it leaves the agent sitting at
local maxima with nowhere to go. Moreover, random restarts cannot be used, because the agent cannot
transport itself to a new state.

Instead of random restarts, one might consider using a random walk to explore the environment. A
random walk simply selects at random one of the available actions from the current state; preference can
be given to actions that have not yet been tried. It is easy to prove that a random walk will eventually find
a goal or complete its exploration, provided that the space is finite. On the other hand, the process can
be very slow. Figure shows an environment in which a random walk will take exponentially many steps to
find the goal, because, at each step, backward progress is twice as likely as forward progress.

The example is contrived, of course, but there are many real-world state spaces whose topology causes
these kinds of "traps" for random walks.

Augmenting hill climbing with memory rather than randomness turns out to be a more effective approach.
The basic idea is to store a "current best estimate" H(s) of the cost to reach the goal from each state that
has been visited. H(s) starts out being just the heuristic estimate h(s) and is updated as the agent gains
experience in the state space.

Figure - An environment in which a random walk will take exponentially many steps to find the goal.

The figure shows a simple
example in a one-dimensional state space. In (a), the agent seems to be stuck in a flat local minimum at
the shaded state. Rather than staying where it is, the agent should follow what seems to be the best path
to the goal based on the current cost estimates for its neighbors. The estimated cost to reach the goal
through a neighbor s' is the cost to get to s' plus the estimated cost to get to a goal from there - that is,
c(s, a, s') + H(s').

In the example, there are two actions with estimated costs 1 + 9 and 1 + 2, so it seems best to move right.
Now, it is clear that the cost estimate of 2 for the shaded state was overly optimistic. Since the best move
cost 1 and led to a state that is at least 2 steps from a goal, the shaded state must be at least 3 steps from
a goal, so its H should be updated accordingly, as shown in Figure. Continuing this process, the agent will
move back and forth twice more, updating H each time and "flattening out" the local minimum until it
escapes to the right.

An agent implementing this scheme, which is called learning real-time A* (LRTA*), is shown in the figure.
Like ONLINE-DFS-AGENT, it builds a map of the environment using the RESULT table. It updates the cost
estimate for the state it has just left and then chooses the "apparently best" move according to its current
cost estimates. One important detail is that actions that have not yet been tried in a state s are always
assumed to lead immediately to the goal with the least possible cost, namely h(s). This optimism under
uncertainty encourages the agent to explore new, possibly promising paths.
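A Python sketch in the spirit of LRTA* follows; `actions`, `goal_test`, `h`, and the step-cost function are assumed to be supplied by the problem, and untried actions are treated optimistically as described above:

```python
class LRTAStarAgent:
    """Learning real-time A* (LRTA*) agent sketch.

    H[s] is the agent's current cost-to-goal estimate, initialized to h(s).
    Untried actions are optimistically assumed to lead straight to a goal
    with cost h(s), which encourages exploration.
    """
    def __init__(self, actions, goal_test, h, cost=lambda s, a, s2: 1):
        self.actions, self.goal_test, self.h, self.cost = actions, goal_test, h, cost
        self.result = {}                  # result[(s, a)] -> observed next state
        self.H = {}                       # learned cost-to-goal estimates
        self.s, self.a = None, None       # previous state and action

    def lrta_cost(self, s, a, s2):
        """Estimated cost to reach a goal by taking action a in state s."""
        if s2 is None:                    # untried action: optimism under uncertainty
            return self.h(s)
        return self.cost(s, a, s2) + self.H.get(s2, self.h(s2))

    def __call__(self, s1):
        if self.goal_test(s1):
            return None
        self.H.setdefault(s1, self.h(s1))
        if self.s is not None:
            self.result[(self.s, self.a)] = s1
            # update H for the state just left, using the best neighbor estimate
            self.H[self.s] = min(self.lrta_cost(self.s, a, self.result.get((self.s, a)))
                                 for a in self.actions(self.s))
        # choose the apparently best action from the current state
        self.a = min(self.actions(s1),
                     key=lambda a: self.lrta_cost(s1, a, self.result.get((s1, a))))
        self.s = s1
        return self.a
```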

Learning in online search:

The initial ignorance of online search agents provides several opportunities for learning. First, the agents
learn a "map" of the environment-more precisely, the outcome of each action in each state-simply by
recording each of their experiences. (Notice that the assumption of deterministic environments means that
one experience is enough for each action.) Second, the local search agents acquire more accurate
estimates of the value of each state by using local updating rules.

Figure: Five iterations of LRTA* on a one-dimensional state space. Each state is labeled with H(s), the current cost
estimate to reach a goal, and each arc is labeled with its step cost. The shaded state marks the location of
the agent, and the updated values at each iteration are circled.



LRTA*-AGENT selects an action according to the values of neighboring states, which are updated as the
agent moves about the state space.

These updates eventually converge to exact values for every state, provided that the agent explores the
state space in the right way. Once exact values are known, optimal decisions can be taken simply by
moving to the highest-valued successor-that is, pure hill climbing is then an optimal strategy.

If you followed our suggestion to trace the behavior of ONLINE-DFS-AGENT in the environment, you will
have noticed that the agent is not very bright. For example, after it has seen that the Up action goes from
(1,1) to (1,2), the agent still has no idea that the Down action goes back to (1,1), or that the Up action
also goes from (2,1) to (2,2), from (2,2) to (2,3), and so on. In general, we would like the agent to learn that
Up increases the y-coordinate unless there is a wall in the way, that Down reduces it, and so on. For
this to happen, we need two things. First, we need a formal and explicit representation for these kinds of
general rules; so far, we have hidden the information inside the black box called the successor function.

*******

