Beyond Classical Search: Hill-Climbers, Hidden Worlds & Agents That Learn on the Fly

5 minute read


If you’ve ever tried every twist of a Rubik’s cube,
searched for Wi-Fi by inching your laptop around,
or navigated a dark room with arms outstretched—
congrats! You already know local search, nondeterminism, and partial observability.

This gentle tour covers Chapter 4 of the classic AI texts—the cool stuff that comes after bread-and-butter A*:

  1. Local Search & Optimization
  2. Continuous Spaces
  3. Nondeterministic Actions
  4. Partial Observations
  5. Online Search in Unknown Worlds

No headache math—just tiny formulas, bite-size code, and cartoons-in-words.


1 Local Search Algorithms & Optimization (4 · 1)

Big idea: keep one state, tweak it, keep the better one—repeat.

| Algorithm | Core move | Gets stuck? | Escape trick |
|---|---|---|---|
| Hill Climbing | Best neighbour | ✔️ | None |
| Simulated Annealing | Random neighbour, sometimes accept worse | 🔥 chance lowers over time | Temperature |
| Genetic Algorithm | Recombine & mutate a population | Rare | Diversity |

1·1 One-Screen Demo – Hill-Climb Eight Queens

import random

def conflicts(board):
    """Count attacking queen pairs; board[c] is the row of the queen in column c."""
    n = len(board)
    return sum(
        board[i] == board[j] or               # same row
        abs(board[i] - board[j]) == j - i     # same diagonal
        for i in range(n) for j in range(i + 1, n)
    )

def hill_climb(n=8, max_steps=1000):
    board = [random.randrange(n) for _ in range(n)]
    for _ in range(max_steps):
        if conflicts(board) == 0:
            return board
        # pick the queen with the most clashes
        col = max(range(n), key=lambda c: sum(
            board[c] == board[j] or abs(board[c] - board[j]) == abs(j - c)
            for j in range(n) if j != c))
        # move it to the row that leaves the fewest conflicts overall
        best_row = min(range(n), key=lambda r: conflicts(board[:col] + [r] + board[col+1:]))
        board[col] = best_row
    return None          # stuck on a plateau or local optimum

print(hill_climb())

Run it a few times: most runs solve the puzzle in milliseconds, and an occasional None just means the climber got stuck on a local optimum or plateau and gave up.
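The table above says simulated annealing escapes via temperature; concretely, the only change to the loop is the acceptance test. A minimal sketch of that Metropolis-style rule (the function name and the cooling schedule in the comment are my own illustration, not from the chapter):

import math, random

def accept(delta, temperature):
    """Always take improvements; take worse moves with a probability that shrinks as we cool."""
    if delta <= 0:                # delta = new_conflicts - old_conflicts
        return True
    return random.random() < math.exp(-delta / temperature)

# a typical geometric cooling schedule: T(step) = 100 * 0.95 ** step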


2 Local Search in Continuous Spaces (4 · 2)

When the world isn’t a chessboard but a smooth landscape, we use calculus-flavoured tweaks.

2·1 Gradient Descent, Micro-Edition

\[\theta_{t+1} = \theta_t - \eta \nabla J(\theta_t)\]
  • \( \nabla J \) = slope of the cost hill
  • \( \eta \) = tiny step size (the learning rate)

Analogy: feel the slope under your boots and step downhill until level ground.
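Here is that update in runnable form on a toy one-dimensional cost \( J(\theta) = (\theta - 3)^2 \); the cost function, step size, and step count are made-up illustration values:

def gradient_descent(theta=0.0, eta=0.1, steps=50):
    for _ in range(steps):
        grad = 2 * (theta - 3)        # dJ/dθ for J(θ) = (θ - 3)²
        theta -= eta * grad           # step downhill
    return theta

print(gradient_descent())             # ≈ 3.0, the bottom of the bowl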


3 Searching with Nondeterministic Actions (4 · 3)

“Turn key → maybe car starts, maybe not.”

| Model | Plan shape | Goal test |
|---|---|---|
| AND-OR tree | OR branches where any one action will do, AND branches where every outcome must be handled | Find a subtree that guarantees success |
| Min-Max-Cost | Costs judged by the worst outcome | Choose the safest path |

Small trick: replace nodes with sets of possible states → plan for all outcomes.
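A sketch of how that plays out in code: an AND-OR search that returns a conditional plan only if it covers every outcome of every chosen action. The `problem` interface here (`is_goal`, `actions`, and `results` returning the set of possible successor states) is my own framing of the idea, not code from the chapter:

def and_or_search(state, problem, path=()):
    """Return a conditional plan [action, {outcome: subplan}] that works for every outcome, or None."""
    if problem.is_goal(state):
        return []                          # empty plan: already at the goal
    if state in path:
        return None                        # looping -> give up on this branch
    for action in problem.actions(state):  # OR: we get to pick the action...
        subplans = {}
        for outcome in problem.results(state, action):   # AND: ...but must cover every outcome
            plan = and_or_search(outcome, problem, path + (state,))
            if plan is None:
                break                      # one outcome has no plan -> try another action
            subplans[outcome] = plan
        else:
            return [action, subplans]
    return None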


4 Searching with Partial Observations (4 · 4)

You’re in a fog; every step reveals a few metres.

  • Keep a belief state \( B \) = a probability distribution over where you might be.
  • Actions map \( B \to B' \) via prediction; sensors shrink \( B' \) via update (Bayes).

4·1 Tiny Code – Updating a Belief Grid

import numpy as np

def sense_update(belief, sensor, world, p_correct=0.8):
    """Bayes update: boost cells whose true label matches the sensor reading."""
    hit = (world == sensor)                  # boolean mask of matching cells
    return normalise(belief * (hit * p_correct + (~hit) * (1 - p_correct)))

def normalise(m):
    s = m.sum()
    return m / s if s else m                 # guard against an all-zero belief

Belief stays a 2-D matrix; each move is a convolution, each sensor ping a pixel-wise multiply.
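The sensor half is coded above; the motion half can be a small convolution too. A sketch, assuming SciPy is available and that a move succeeds 90% of the time (both the kernel values and the function name are my assumptions, not the chapter's):

from scipy.signal import convolve2d

# kernel for "try to move one cell right": 90% shift right, 10% stay put
MOVE_RIGHT = np.array([[0.0, 0.1, 0.9]])

def move_update(belief, kernel=MOVE_RIGHT):
    """Predict step: spread the belief according to the noisy motion model."""
    return normalise(convolve2d(belief, kernel, mode='same'))

With the default zero-fill boundary, probability that would slide off the grid is simply dropped, and normalise rescales what remains.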


5 Online Search Agents & Unknown Environments (4 · 5)

Plan a bit, act a bit, learn a bit—like playing a new video game level.

| Algorithm | Key memory | Behaviour |
|---|---|---|
| LRTA* (Learning Real-Time A*) | Table of local heuristic corrections | Walks, updates cost estimates, never stops improving |
| Real-Time Dynamic A* | Maintains a frontier under a time cap | Pauses to think when safe; otherwise greedy |

5·1 Micro-Demo – LRTA* in a Maze

(Pseudo-code)

h[s] ← heuristic(s) on first visit
loop:
    if s is goal: break
    pick the neighbour s' ∈ N(s) minimising c(s, s') + h[s']
    h[s] ← c(s, s') + h[s']    # learn: s is at least that far from the goal
    s ← s'

The agent learns a better heuristic on the fly—next time it revisits, it’s smarter.
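If you want something runnable, here is a small Python version of the same loop; the grid setup, unit step costs, and Manhattan heuristic are illustration choices of mine, and this sketch skips the textbook's extra bookkeeping for untried actions:

def lrta_star(start, goal, neighbours, heuristic, max_steps=10_000):
    """Walk toward goal, repairing the heuristic table h as we go."""
    h, s, trajectory = {}, start, [start]
    for _ in range(max_steps):
        if s == goal:
            return trajectory
        h.setdefault(s, heuristic(s))
        # score each neighbour by step cost (1 here) plus its current estimate
        best = min(neighbours(s), key=lambda n: 1 + h.get(n, heuristic(n)))
        h[s] = 1 + h.get(best, heuristic(best))    # the learning step
        s = best
        trajectory.append(s)
    return None

# toy run: open 5×5 grid, Manhattan-distance heuristic to the far corner
def neighbours(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

path = lrta_star((0, 0), (4, 4), neighbours,
                 heuristic=lambda p: abs(p[0] - 4) + abs(p[1] - 4))
print(len(path) - 1, "moves")     # 8 on an empty grid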


6 Cheat-Sheet 🧾

Local search    = Keep one (or few) states and tweak
Hill climbing   = Greedy climb up/down the slope
Simulated anneal= Accept some bad moves early, cool later
Genetic alg.    = Evolve a population with crossover & mutation
Gradient descent= Local search in ℝ^n using derivatives

Nondeterminism  = Actions give branches → AND-OR plans
Partial info    = Track belief state (probabilities)
Online search   = Interleave planning & execution; learn h(n) on the run
LRTA*           = A* that repairs its own heuristic table

7 Try It Yourself 🧪

  1. Simulated-Anneal Eight Queens
    Replace the hill-climb loop with a temperature schedule. Plot the “energy” (conflict count) over time.
  2. Hidden Robot
    On a 10×10 grid, hide the agent’s coordinates, give it noisy distance sensors, and implement the belief update.
  3. Online Maze Race
    Compare A* (full map known) vs LRTA* (map unknown) on a 30×30 random maze. Count the moves each needs to reach the goal.

🚀 Final Words

When plain old A* hits a wall—too big, too uncertain, too slippery—Chapter 4 tools step in:

  • Local search when the map is huge.
  • Gradient tricks when the world is smooth.
  • AND-OR plans when fate rolls dice.
  • Belief states when your eyes are fogged.
  • Online learners when the map isn’t even drawn yet.

Master these and your agents won’t just find paths—they’ll invent them as they go.

Happy exploring, and may your local optimum always have an escape tunnel!