Summary

Formalizes the flood fill and pathfinding ideas from last week into graph search. Defines graphs, then presents a general agenda-based search algorithm. The key insight: pulling the newest path gives DFS, pulling the oldest gives BFS. BFS guarantees shortest paths but uses more memory; DFS finds a path with less memory but no optimality guarantee.


Last week, we discussed flood fill, where we recolored a region of similarly colored cells, and path finding, where we returned a path through a maze. We will be building on those ideas and formalizing them, particularly with graph search algorithms.

What is a graph?

A graph is a set of vertices for each object we represent, and edges that represent relationships between them.

Note

When we don’t have arrows, we treat the edges as bidirectional.

Some questions we might want to ask about our graph are: what are all the objects connected to a node? What objects can I reach starting from a node using exactly two connections? Arbitrary connections?

We can solve all of these with graph search approaches, but we will focus on pathfinding algorithms.

Example - The 15 Puzzle

The goal is to order the tiles. Consider making a graph where each vertex represents a state, and vertices are connected by edges representing individual moves we can take: If we draw the full graph, we would eventually reach our sorted state.

Unfortunately, there are 16! arrangements, so this search would take an extremely long time.

Pathfinding in the Abstract

In Flood Fill and Maze Path Finding, we maintained an “agenda” of pixels we eventually needed to color in, and repeatedly pulled a pixel off, colored it, and added neighboring pixels of the original color back to the agenda. We will do something similar here, but our agenda will consist of paths. Below is the rough pseudocode:

  • Initialize an agenda (list of paths to consider) with a single path containing just our starting state
  • Initialize a visited set with the starting state
  • Repeat:
    • Remove one path from the agenda
    • For each neighbor 𝑛 of 𝑠𝑁 (each 𝑛 connected to 𝑠𝑁, the last vertex in the path):
      • If 𝑛 visited, ignore
      • Else if 𝑛 satisfies the goal, return the path containing 𝑛
      • Else add 𝑛 to visited set and add the path containing 𝑛 to the agenda
  • until we either find a path or the agenda is empty, i.e. no path exists.

One thing to notice is that we arbitrarily remove the first path in our agenda. It turns out that the order of consideration has a big effect on the nature of the paths we return. We will use the following example:

Replace First Path on Agenda with Children

Here, we pull the first (left-most) path every time. We would get to 𝐼 via ABCFI. Even though 𝐼 comes after 𝐸, 𝐼 satisfies the condition so we stop looping. This is a depth-first search, as we go deeper into the graph before exploring all neighbors at a given distance from a node.

Replace Last Path on Agenda with Children

Here, we pull the last path each time. To get to 𝐼, we would take ADGHI. This is also depth-first search, though with a different strategy.

Remove First Path, Add Children to End

This is breadth-first search because we explore the breadth of the tree before going deep. Our path is now ABCFI.

In short, these slight variations give very different outcomes. The takeaway is that when we consider the newest path, we end up doing depth-first search, and if we consider the oldest path, we are doing breadth-first.

BFS vs. DFS

To implement BFS, we add/remove elements from opposite sides of the agenda. This is known as “first-in, first-out” (FIFO). This method will always return a shortest path if such a path exists. Since we consider all paths of length 𝑛 before considering paths of length 𝑛+1, we know that the path is optimal. This is a double-edged sword: if the graph is infinite with no solution, it can run forever.

To implement DFS, we add/remove elements from the same side of the agenda. This is known as “last-in, first-out” (LIFO). DFS is guaranteed to find a path (but not necessarily the optimal one), and may also run forever on an infinite graph, though this time even if a solution exists.

DFS might seem like the poorer choice in general, but it uses a lot less memory, so if we don’t care about finding the optimal path, it can be the better tool to wield.

Pathfinding in Python