Summary


The idea of randomized algorithms is to use random numbers to make decisions.

  • Monte Carlo algorithms always run within their time bound, but might return an incorrect answer.
  • Las Vegas algorithms might be slow, but are always correct.

Matrix Products

𝐴 and 𝐵 are 𝑛×𝑛 matrices. We want to compute their product. The naive algorithm computes dot products between rows of 𝐴 and columns of 𝐵, taking 𝑂(𝑛^3) time (𝑂(𝑛^2) entries, each requiring 𝑂(𝑛) work). Strassen found an 𝑂(𝑛^2.81) algorithm, with the best known bound today being 𝑂(𝑛^2.371339).
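As a concrete baseline, the naive algorithm can be sketched in Python (a minimal sketch; the function name is illustrative):

```python
def matmul(A, B):
    """Naive O(n^3) matrix product: one O(n) dot product per entry."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```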

Matrix Product Verification

Given matrices 𝐴, 𝐵, and a claimed product 𝐶, we want to verify whether 𝐴𝐵 = 𝐶. If 𝐴𝐵 = 𝐶, then for any vector 𝑣, 𝐴𝐵𝑣 = 𝐶𝑣. Checking this requires 3 matrix-vector products (𝐵𝑣, then 𝐴(𝐵𝑣), and 𝐶𝑣), with total runtime 𝑂(𝑛^2).

Algorithm (mod 2 arithmetic, entries in {0,1}):

  1. Pick a random binary vector 𝑣.
  2. Compute 𝑣̂ = 𝐴(𝐵𝑣) and 𝑣̃ = 𝐶𝑣.
  3. If 𝑣̂ = 𝑣̃, output YES. Else output NO.

Example

𝐴 = (0 1; 1 0), 𝐵 = (0 0; 1 1), 𝐶 = (1 0; 0 1), writing matrices row by row. Here 𝐴𝐵 = (1 1; 0 0) ≠ 𝐶 (mod 2). Pick 𝑣 = (1, 0): 𝐵𝑣 = (0, 1), 𝐴(𝐵𝑣) = (1, 0), 𝐶𝑣 = (1, 0). They match, so we incorrectly output YES. But pick 𝑣 = (0, 1): 𝐵𝑣 = (0, 1), 𝐴(𝐵𝑣) = (1, 0), 𝐶𝑣 = (0, 1). They differ, so we correctly output NO.
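The mod-2 verifier can be sketched as below (function names are illustrative; the `v` parameter is exposed only so a specific vector can be replayed — normally 𝑣 is always random):

```python
import random

def matvec_mod2(M, v):
    """Multiply matrix M by vector v, with all arithmetic mod 2."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) % 2
            for i in range(len(M))]

def freivalds_once(A, B, C, v=None):
    """One round of the check: YES (True) iff A(Bv) == Cv (mod 2)."""
    if v is None:
        v = [random.randint(0, 1) for _ in range(len(A))]
    return matvec_mod2(A, matvec_mod2(B, v)) == matvec_mod2(C, v)
```

Repeating the round 𝑘 times with independent random 𝑣 drives the false-YES probability below 2^(−𝑘).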

This is a Monte Carlo algorithm since a fixed 𝑣 can miss errors. Making 𝑣 random gives a probabilistic guarantee:

Theorem

If 𝐴𝐵 ≠ 𝐶, the algorithm outputs NO with probability ≥ 1/2.

Rank Finding

Given a set 𝑆 of 𝑛 distinct numbers, find the 𝑥 ∈ 𝑆 with rank(𝑥) = 𝑖. The deterministic algorithm works as follows:

  1. Pick a pivot 𝑥 ∈ 𝑆.
  2. Compute 𝐿 = {𝑦 ∈ 𝑆 | 𝑦 < 𝑥} and 𝐺 = {𝑦 ∈ 𝑆 | 𝑦 > 𝑥}. Then rank(𝑥) = |𝐿| + 1.
  3. If rank(𝑥) = 𝑖, return 𝑥. If rank(𝑥) > 𝑖, recurse on 𝐿. Else find rank 𝑖 − rank(𝑥) in 𝐺.

Quick Select

Paranoid quick select adjusts the above by picking a random pivot and re-picking if it’s bad (i.e., rank(𝑥) < 𝑛/4 or rank(𝑥) > 3𝑛/4). We expect 2 selections to get a good pivot, giving:

𝑇(𝑛) ≤ 𝑇(3𝑛/4) + 𝑂(𝑛) = 𝑂(𝑛)  (expected)
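Paranoid quick select can be sketched as follows (an illustrative Python sketch; the goodness test re-picks until neither side keeps more than 3𝑛/4 of the elements):

```python
import random

def quickselect(S, i):
    """Return the element of rank i (1-indexed) among the distinct numbers in S."""
    n = len(S)
    if n == 1:
        return S[0]
    while True:
        x = random.choice(S)                  # random pivot
        L = [y for y in S if y < x]
        G = [y for y in S if y > x]
        # Paranoid step: re-pick while the pivot is bad, i.e. either
        # side would keep more than 3n/4 of the elements.
        if len(L) <= 3 * n / 4 and len(G) <= 3 * n / 4:
            break
    rank = len(L) + 1
    if rank == i:
        return x
    if rank > i:
        return quickselect(L, i)
    return quickselect(G, i - rank)
```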

Quick Sort

Quick sort has less constant overhead and uses less memory than merge sort. The basic algorithm for distinct elements:

  1. Pick random pivot 𝑥 ∈ 𝐴.
  2. Compute 𝐿 = {𝑦 ∈ 𝐴 | 𝑦 < 𝑥} and 𝐺 = {𝑦 ∈ 𝐴 | 𝑦 > 𝑥}.
  3. Return QS(𝐿)+𝑥+QS(𝐺).
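The basic version above can be sketched as:

```python
import random

def quicksort(A):
    """Randomized quicksort on a list of distinct elements."""
    if len(A) <= 1:
        return A
    x = random.choice(A)              # random pivot
    L = [y for y in A if y < x]       # elements below the pivot
    G = [y for y in A if y > x]       # elements above the pivot
    return quicksort(L) + [x] + quicksort(G)
```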

We again analyze a paranoid version: re-pick the pivot if rank(𝑥) < 𝑛/4 or rank(𝑥) > 3𝑛/4. We again expect 2 selections until a good pivot. The recurrence is:

𝑇(𝑛) = max_{𝑛/4 ≤ 𝑖 ≤ 3𝑛/4} {𝑇(𝑖) + 𝑇(𝑛−𝑖)} + 𝑂(𝑛) ≤ 𝑇(𝑛/4) + 𝑇(3𝑛/4) + 𝑂(𝑛)

Each level of the recursion tree sums to 𝑂(𝑛) work. The shortest root-to-leaf path has log_4(𝑛) levels (always taking the 𝑛/4 branch) and the longest has log_{4/3}(𝑛) levels (always taking the 3𝑛/4 branch). Both are 𝑂(log 𝑛), so total work is 𝑂(𝑛 log 𝑛) in expectation.

High-Probability Bound

We can strengthen “in expectation” to “with high probability” using the union bound and Chernoff bound.

Chernoff bound: for a binomial random variable 𝑌 ∼ 𝐵(𝑛, 𝑝) and 0 < 𝛽 ≤ 1,

ℙ(𝑌 ≥ (1+𝛽)·𝔼[𝑌]) ≤ 𝑒^(−𝛽^2·𝔼[𝑌]/3)
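As a quick sanity check (a hypothetical helper, not part of the analysis), one can compare the bound against simulation:

```python
import math
import random

def chernoff_check(n, p, beta, trials=2000):
    """Empirically estimate P(Y >= (1+beta)*E[Y]) for Y ~ B(n, p) and
    return it alongside the Chernoff bound e^(-beta^2 * E[Y] / 3)."""
    mean = n * p
    threshold = (1 + beta) * mean
    hits = sum(
        sum(random.random() < p for _ in range(n)) >= threshold
        for _ in range(trials)
    )
    return hits / trials, math.exp(-beta**2 * mean / 3)
```

For example, for 𝑌 ∼ 𝐵(100, 0.5) and 𝛽 = 0.2 the empirical tail probability comes out well below the bound 𝑒^(−2/3).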

Let 𝑌_{𝑖,𝑘} be an indicator r.v. for picking a bad pivot at recursion depth 𝑘 for the subarray containing element 𝑎_𝑖. Since a pivot is bad with probability 1/2, we have 𝔼[𝑌_{𝑖,𝑘}] = 0.5.

At recursion depth 𝐿, the probability of exceeding the expected number of bad pivots by 20% is:

ℙ[Σ_{𝑘=1}^{𝐿} 𝑌_{𝑖,𝑘} ≥ (1+0.2)·0.5𝐿] ≤ 𝑒^(−(0.2)^2·0.5𝐿/3) = 𝑒^(−𝐿/150)

Set 𝐿 = 300 ln 𝑛. Then:

ℙ[Σ_{𝑘=1}^{𝐿} 𝑌_{𝑖,𝑘} ≥ 0.6𝐿] ≤ 𝑒^(−2 ln 𝑛) = 1/𝑛^2

If instead Σ_{𝑘=1}^{𝐿} 𝑌_{𝑖,𝑘} < 0.6𝐿, then we had ≥ 0.4𝐿 good pivots. Each good pivot shrinks the subarray by a factor of at least 3/4, so the subarray containing 𝑎_𝑖 at level 𝐿 has size ≤ 𝑛·(3/4)^{0.4𝐿} < 1. This means after 300 ln 𝑛 levels, the recursion has terminated.

By the union bound over all 𝑛 elements, the failure probability is ≤ 𝑛·(1/𝑛^2) = 1/𝑛. So with probability ≥ 1 − 1/𝑛, quicksort runs in 𝑂(𝑛 log 𝑛) time.

Problem

We have a 1×1 square containing a subset 𝐴 of unknown area. We support two operations:

  • Randomly sample a point uniformly in the square (𝑂(1)).
  • Check if a point is in 𝐴 (𝑂(1)).

a) Give a randomized algorithm to estimate the area of 𝐴.

b) How many trials 𝑛 do we need so that our estimate 𝑋 ∈ [(1−𝜀)𝐴, (1+𝜀)𝐴] with probability > 1 − 𝛿?
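For part (a), the natural estimator is to sample points uniformly and return the fraction that land in 𝐴. A sketch, assuming the membership oracle is exposed as a callable `in_A(x, y)` (an illustrative name):

```python
import random

def estimate_area(in_A, trials):
    """Estimate the area of A: the fraction of uniform samples landing
    in A is an unbiased estimator of A's area."""
    hits = sum(in_A(random.random(), random.random()) for _ in range(trials))
    return hits / trials
```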

Problem - Seating Students

𝑛 students arrive one by one to a room with 𝑚 seats. Each student picks a seat uniformly at random. If it’s taken, they try one more random seat. If that’s also taken, they need help. Let 𝑁 be the number of students that need help.

a) Find 𝔼[𝑁]. (Hint: Σ_{𝑖=0}^{𝑛−1} 𝑖^2 ≈ 𝑛^3/3)

b) Suppose 𝑚 = 2𝑛. Prove ℙ[𝑁 ≥ 𝑛/3] ≤ 𝑒^(−𝑛/12).
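A quick simulation can sanity-check part (a). This sketch assumes the second pick is uniform over all 𝑚 seats (possibly repeating the first), which is one reading of the problem statement:

```python
import random

def seat_students(n, m):
    """Simulate n students and m seats: each student picks a uniform seat,
    retries once (uniformly, possibly the same seat) if it is taken, and
    needs help if the second pick is also taken. Returns N."""
    taken = [False] * m
    need_help = 0
    for _ in range(n):
        s = random.randrange(m)
        if taken[s]:
            s = random.randrange(m)   # one retry
        if taken[s]:
            need_help += 1            # both picks occupied
        else:
            taken[s] = True
    return need_help
```

Averaging over many runs with 𝑚 = 2𝑛 should land near the value predicted by the hint.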