Summary

Described and analyzed a greedy solution to Interval Scheduling. Then, introduced a weighted variant which required a dynamic programming solution, and introduced online interval scheduling as an additional interesting problem.


Introduction

We will be covering design and analysis of algorithms. For design, we will look through techniques (greedy, divide-and-conquer, LP, etc.), paradigms (randomized, online, sublinear, sketching, sampling, etc.), and more. For analysis, we’ll look at amortized, competitive, approximation, and much more. Additionally, we will talk about designability of algorithms, namely intractability and reductions.

Interval Scheduling

The idea is that we have 𝑛 requests 𝑅={𝑟𝑖:𝑖=1,,𝑛}, each wanting a single resource for an interval 𝑟𝑖=(𝑎𝑖,𝑏𝑖) and 𝑎𝑖<𝑏𝑖 are real. Find a largest set of mutually compatible requests.

  • Two requests are mutually compatible if they don’t intersect; else, incompatible
  • A set of >2 requests is mutually compatible iff each pair is

Example

𝑟1=(1,2),𝑟2=(1,3),𝑟3=(1,4). Choosing any one is the optimal solution.

We can draw out these intervals on a number line, though it becomes harder to parse as the intervals become more numerous.

We will design a greedy meta-algorithm (main idea, but not complete) to solve this problem.

  1. Set 𝑆=
  2. Repeat until 𝑅=:
    1. Use the greedy rule to pick 𝑟𝑅. 𝑂(𝑛log(𝑛)) sort
    2. If 𝑟 compatible with 𝑆, 𝑆=𝑆{𝑟} 𝑂(𝑛)
    3. Set 𝑅=𝑅{𝑟}
  3. Output 𝑆 Our choice of the greedy rule determines the correctness of the algorithm. Choosing earliest start time is bad, as you can have a situation where the earliest interval takes up the whole interval, for instance.

Shortest interval first also is incorrect: you might have a short interval intersecting two otherwise compatible long intervals.

The correct greedy choice is earliest end time. We will use the exchange argument to prove the greedy choice:

The final runtime is 𝑂(𝑛log(𝑛)).

Weighted Interval Scheduling Problem

This is the same problem as before, but each request has a weight 𝑤𝑖. We want to find the heaviest set of mutually compatible requests.

Notice that our current greedy choice no longer works. The reason is left as an exercise. Instead, we need to use the optimal substructure property and build a dynamic program.

Let us restrict to finding the weight of the optimal solution. Given 𝑅, let 𝑊(𝑅) denote the maximum possible total weight for a set of mutually compatible requests in 𝑅. For any 𝑟𝑅 of weight 𝑤, let INC(𝑅) denote the subset of requests from 𝑅 that are incompatible.

𝑊(𝑅)=max{𝑊(𝑅{𝑟}),𝑤+𝑊(𝑅INC(𝑟))}

This expression essentially says that, in the first case, if the optimal solution doesn’t have 𝑟, we can exclude it from 𝑅. If it is a part of the optimal solution, we add its weight along with the remainder of 𝑅 excluding incompatible elements with 𝑟. For size 𝑛, we have:

𝑊(𝑅)=max{𝑊(𝑅{𝑟𝑛}),𝑤𝑛+𝑊(𝑅INC(𝑟𝑛))}

Claim

INC(𝑟𝑛)={𝑟𝑗:𝑗𝑡𝑛} for some index 𝑡𝑛.

Why? Since requests are sorted by end time, 𝑟𝑛 ends last. Any request overlapping 𝑟𝑛 must start before 𝑟𝑛 ends. These form a contiguous suffix {𝑟𝑡𝑛,,𝑟𝑛1}: all requests from index 𝑡𝑛 onward overlap with 𝑟𝑛.

𝑊(𝑅)=max{𝑊({𝑟1,,𝑟𝑛1}),𝑤𝑛+𝑊({𝑟𝑗:𝑗<𝑡𝑛})}

The left is with 𝑟𝑛 excluded, and the second with it included. 𝑡𝑛 is the first interval incompatible with 𝑟, so we keep the prefix when including 𝑤𝑛.

𝑊({𝑟1,,𝑟𝑛1})=max{𝑊({𝑟1,,𝑟𝑛2}),𝑤𝑛1+𝑊({𝑟𝑗:𝑗<𝑡𝑛1})}

We can continue unwinding. For the DP solution:

  1. Sort 𝑅 by end time. 𝑂(𝑛log(𝑛))
  2. Find 𝑡𝑖 for 𝑖=1,,𝑛 𝑂(𝑛log(𝑛)) through binary search
  3. For 𝑖=1,,𝑛 compute
𝑊({𝑟1,,𝑟𝑖})=max{𝑊({𝑟1,,𝑟𝑖1}),𝑤𝑖+𝑊({𝑟𝑗:𝑗<𝑡𝑛1})}

𝑂(𝑛) linear sweep 4) Output 𝑊({𝑟1,,𝑟𝑛})=𝑊(𝑅) So total runtime is 𝑂(𝑛log(𝑛)).

Online Interval Scheduling Problem

Here, we don’t have all the intervals upfront. They will come in one at a time. We can think of it as an adversary who wants to make the problem harder. We can either change the problem or setting: today, we’ll change the setting.

Say the set of intervals is fixed. Someone shuffles the intervals in a random way. With what chance are things going to go badly with some greedy choice? We need probabilistic methods to solve this. In this case, if we randomly get 𝑟2 first, it’s optimal.