Summary
Introduces amortized analysis through the union find data structure. Three methods — aggregate, accounting, and potential — provide tools for bounding average cost over a sequence of operations. Union by rank gives ; adding path compression gives amortized via the potential method.
Union Find
We want a data structure for maintaining disjoint sets:
where for all . Each has a representative . We support 3 operations:
Make-set(x)— add to , withFind-set(x)— returnUnion(x,y)— replace and with , set representative
Example
Make-set(1).Make-set(2).Make-set(3).Find-set(2).Union(1,3). Representative is1.Find-set(3)since1is the representative of .
Linked List Implementation
Represent each set as a doubly-linked list with the representative as the head.
Make-set(x)— create a new linked list with just ()Find-set(x)— followprevpointers to the head ( worst case)Union(x,y)— append one list to the other ()
Adding head/tail pointers reduces Find-set to , but Union still requires remapping all pointers in one list — still worst case. We need a better analysis; this is where amortized analysis comes in.
Weighted Union (Amortized )
Definition
Amortized analysis bounds the worst-case average cost over a sequence of operations.
A data structure has amortized cost if any sequence of operations starting from ‘s initial state has total cost at most .
In other words, for operations with costs , we bound .
The optimization: always merge the shorter list into the longer one, so we only remap pointers for the shorter list.
Proof: amortized
Each time an element’s pointer is remapped, it’s in the shorter list, so its set size at least doubles. An element can double at most times before reaching size . Therefore each element is remapped at most times, and the total cost of all remappings is .
Over a sequence of operations, total cost is , giving amortized cost .
Methods of Amortized Analysis
- Aggregate Method — compute total cost over a sequence, divide by . This is what we just did.
- Accounting Method — “save” credit during cheap operations to “spend” on expensive ones.
- Potential Method — define a potential function over states; cheap operations build potential, expensive ones drain it.
The Potential Method
Definition
The Potential Method uses a function mapping each state to a number. For states , define .
Note
, i.e. we cannot drop below our starting potential.
The amortized cost of operation is . Summing:
Since :
So the amortized costs upper-bound the actual costs.
Forest-of-Trees Implementation
Instead of linked lists, treat each set as a tree where the root is the representative.
Make-set(x)— create a single-node tree ()Find-set(x)— walk up to the root ( worst case, not necessarily balanced)Union(x,y)— point one root to the other ()
Example
Make-set(1).Make-set(2).Make-set(3).Union(1,3): point root of{3}to root of{1}. NowFind-set(3)follows parent pointer from3to1and returns1.
Union by Rank
With union by rank, always point the shorter tree’s root to the taller tree’s root. This guarantees max height , so Find-set and Union both take .
Proof: max rank is
The rank only increases when two trees of equal rank merge.
Claim: a tree of rank has at least nodes. By induction:
- Base: rank 0 has 1 node .
- Inductive step: rank increases to only when merging two rank- trees. By hypothesis each has nodes, so the result has nodes.
Since a tree has at most nodes: .
Path Compression
During Find-set(x), as we traverse up to the root, we redirect every node along the path to point directly to the root. Future queries on these nodes become .
Amortized Analysis with Path Compression
We claim the amortized cost of Find-set with path compression is using the potential method. Define:
where is the number of nodes in the subtree rooted at .
Make-set(x): actual cost . . Amortized cost .
Find-set(x): let be the nodes on the path. Actual cost . After compression, all become children of the root. Each for loses the subtree below it on the path, so its size decreases. The root’s size is unchanged. We get:
Since each and sizes are nested, a telescoping argument gives:
Union(x, y): actual cost (just redirect a root pointer). The root’s size increases, giving . Amortized cost .
Inverse Ackermann Bound
The analysis above is not tight. Combining union by rank with path compression, the amortized cost per operation is , where is the inverse Ackermann function (Tarjan, 1975). Since for any practically conceivable input, this is effectively per operation.
Intuitively: union by rank keeps trees shallow ( height), so Find-set paths are already short. Path compression then flattens those paths so that future queries skip straight to the root. The two optimizations reinforce each other. Rank limits how tall trees can grow, and compression aggressively shortcuts whatever height remains. After a few operations, almost every node points directly (or nearly directly) to the root, so subsequent operations cost almost nothing. The tiny residual cost that can’t be eliminated is captured by .
The Potential Method in Detail
The three methods for amortized analysis are:
- Aggregate — compute total cost over a sequence, divide by number of operations.
- Accounting — pay extra during cheap operations, save credit to spend on expensive ones.
- Potential — define a potential function over states; we will mostly focus on this method.
The potential method works by analogy to potential energy in physics. Assign a potential to each state of the data structure as it evolves: . The amortized cost of the -th operation is the actual cost plus the change in potential:
Summing over all operations, this telescopes: . If we choose so that for all , then , so the total amortized cost upper-bounds the total actual cost.
The intuition is that cheap operations increase the potential (storing “credit”), while expensive operations drain it (spending that credit) — similar to the accounting method. A well-chosen makes the change in potential offset the actual cost, keeping each small.
Note that this is not average-case analysis, as we make no assumptions about input distribution. We are bounding the worst-case total cost of any sequence of operations.
Example — Dynamic List
We have a list with operations
appendanddelete. When the list is full, we double the size and copy everything. When we reach capacity, we halve it. Both resizing operations cost .Let , where is the array size and is the number of filled slots.
- No resize: actual cost is , and , so .
- Double: actual cost is . Before the resize . After, . So .
- Halve: by a symmetric argument, as well.