Recursive Backtracking

Summary

Many problems can be solved with trial and error. Computers are good at this drudgery, but they are limited by their own power. We will explore ways to use recursion to reduce large problems to many smaller ones.

The Mixtape

The cassette tape democratized music by letting anyone record/copy music with a small device. A popular use was to make a mixtape, copying songs from many sources. However, each tape had a limited capacity, so we want to have as much music as possible. Our goal is to choose a set that comes closest to filling a tape.

Our naïve approach might look as follows:

song = list(songs.keys())[0]  # pick an arbitrary song
duration = songs[song]
return {song} | mixtape(
    {k: v for k, v in songs.items() if k != song},
    target_duration - duration,
)

This is a recursive set of songs with a set for the subproblem of filling the remaining duration. To make the dict of songs remaining, we used a dictionary comprehension. Since target_duration is decreasing, our natural base case might be:

if target_duration == 0:
    return set()

However, this code fails, since songs is empty once we reach the base case. Hence, we add:

if not songs:
    return None

We now also need to change our recursive case, since we no longer act under the assumption that our recursive result is a set:

recursive_result = mixtape(
    {k: v for k, v in songs.items() if k != song},
    target_duration - duration,
)
if recursive_result:
    return {song} | recursive_result
else:
    return None

This code doesn’t explore all possibilities. Our recursive call always assumes the current song is part of the solution, so our only possible solutions include the first song. We need to backtrack, or consider both the case that the song is and isn’t part of the solution.

if target_duration == 0:
    return set()
 
if not songs:
    return None
 
song = list(songs.keys())[0]
songs_rest = {k: v for k, v in songs.items() if k != song}
duration = songs[song]
 
# if the first song is part of a solution, we can fill the remaining
# space on the tape with the other songs
recursive_result1 = mixtape(songs_rest, target_duration - duration)
if recursive_result1 is not None:
    return {song} | recursive_result1
 
# if the first song is not part of the solution, we should try to fill
# the whole duration with the other songs
recursive_result2 = mixtape(songs_rest, target_duration)
if recursive_result2 is not None:
    return recursive_result2
 
# if there is no solution with the first song, and no solution without
# it, then there must be no solution. it's only now that we can safely
# return None to indicate failure.
return None

An Alternative Approach

Notice we could also mix recursion and iteration. Rather than recursing on the “exclude song” case, we can just iterate over which song to include.

We could also create a flood-fill-esque approach where we accumulate in the set so_far as we go:

def mixtape(songs, target_duration, so_far=frozenset()):
    so_far_duration = sum(songs[s] for s in so_far)
    if so_far_duration == target_duration:
        return _____  # < -- what should go here?
 
    if so_far_duration > target_duration:
        return None
 
    for song, duration in songs.items():
        recursive_result = mixtape(songs, target_duration, so_far | {song})
        if recursive_result is not None:
            return recursive_result
    return None

Aside, Default Arguments

In this version, so_far is a default argument. That means if the caller only passes songs and target_duration, Python automatically uses the default value for so_far. Here, that means the original call starts with an empty frozenset(), while the recursive calls pass an updated value explicitly.

This is a useful pattern in recursion. It lets us keep helper state in the function signature without forcing the original caller to provide it. The caller just asks for a mixtape of a certain duration, and the recursive process manages so_far internally.

There is one Python-specific trap here: default argument values are created once, when the function is defined, not each time the function is called. If that default value is mutable and the function changes it, later calls can accidentally share state from earlier ones.

That is why this example uses frozenset() instead of set(). Even though this code is building new sets with so_far | {song} rather than mutating so_far directly, using an immutable default is still the safer habit.

Sudoku

https://py.mit.edu/_static/spring26/readings/backtracking/sudoku.svg

We will try to solve sudoku as follows:

Find a 0 in the board.
Try filling it with one of the digits 1 through 9.
Recursively try to solve that updated board. If it works, we are done.
If it does not work, try the next digit.
If none of the digits work, then this path fails, so we return None.

Here is code that follows that strategy:

def solve_sudoku(board):
    """
    Given a sudoku board (as a list-of-lists of numbers, where 0
    represents an empty square), return a solved version of the puzzle.
    """
    for row in range(9):
        for col in range(9):
            if board[row][col] != 0:
                continue
            # Found a 0; try filling it
            for trial in range(1, 10):
                new_board = [[trial if (r, c) == (row, col) else board[r][c]
                              for c in range(9)]
                             for r in range(9)]
                if valid_board(new_board):
                    result = solve_sudoku(new_board)
                    if result is not None:
                        return result
            return None
    return board

This works, but it is a bit wasteful. When we find an empty square, we try every digit and only later ask valid_board whether that choice was obviously impossible. A person solving sudoku usually does the opposite: look at the row, column, and subgrid first, rule out impossible digits, and only then try the remaining ones.

To do that, it helps to write a few utility functions:

values_in_row(board, r) should return the nonzero values already present in row r.
values_in_column(board, c) should return the nonzero values already present in column c.
values_in_subgrid(board, sr, sc) should return the nonzero values already present in the 3 times 3 subgrid at subgrid row sr and subgrid column sc.

One possible implementation is:

def values_in_row(board, r):
    return {value for value in board[r] if value != 0}
 
 
def values_in_column(board, c):
    return {board[r][c] for r in range(9) if board[r][c] != 0}
 
 
def values_in_subgrid(board, sr, sc):
    start_row = 3 * sr
    start_col = 3 * sc
    return {
        board[r][c]
        for r in range(start_row, start_row + 3)
        for c in range(start_col, start_col + 3)
        if board[r][c] != 0
    }

These helper functions let us narrow the search before we recurse. That is the same general idea as base cases in other recursive problems: use the structure of the problem to avoid unnecessary work as early as possible.

Improving Efficiency, Mutation

We can push this a bit further. So far, every recursive call has built a fresh board. That is safer, but it also means we keep copying nested lists. Another option is to reuse the same board and mutate it in place.

That looks like this:

def solve_sudoku(board):
    """
    Given a sudoku board (as a list-of-lists of numbers, where 0
    represents an empty square), return a solved version of the puzzle.
    """
    for row in range(9):
        for col in range(9):
            if board[row][col] != 0:
                continue
            # Found a 0; try filling it
            for trial in valid_moves(board, row, col):
                board[row][col] = trial
                result = solve_sudoku(board)
                if result is not None:
                    return result
            board[row][col] = 0
            return None
    return board

This is very similar to the version above, but there is one critical new line:

board[row][col] = 0

That line undoes the mutation if a recursive path fails. Without it, later recursive calls would keep seeing the failed trial value and would act as if that number had always belonged in that square. Once we start mutating shared state, we also need to explicitly un-mutate it when we backtrack.

This is faster, but it is also riskier. Reusing the same board avoids repeated copying, but mutation makes aliasing bugs much easier to introduce and much harder to debug. In many cases, the simpler immutable version is still the better tradeoff.

For these examples, though, mutating the board does give another noticeable speedup:

board	original code	only valid moves	mutate instead of copy
grid1	0.125	0.014	0.004
grid2	1.297	0.123	0.044
grid3	1.545	0.149	0.050

Alexander's Notes

Explorer

Recursive Backtracking

The Mixtape

An Alternative Approach

Aside, Default Arguments

Sudoku

Improving Efficiency, Mutation

Graph View

Table of Contents

Recent Notes

Variance & Covariance

Expected Value

Representation Learning

Saving, Capital Accumulation, and Output

The Facts of Growth

Technological Progress and Growth

Recursive Backtracking

Introduction to Open Economy