9.3 Selection in worst-case linear time

9.3-1

Let's assume the input elements are divided into groups of k. Similar like the analysis in the book, at least half of the $\lceil \frac{n}{k} \rceil$ groups contribute at least $\lceil \frac{k}{2} \rceil$ elements that are greater than x, except for the one group that has fewer than k elements if k does not divide n exactly, and the one group containing x itself. Discounting these two groups, it follows that the number of elements greater than x is at least $\lceil \frac{k}{2} \rceil(\lceil \frac{1}{2} \lceil \frac{n}{k} \rceil \rceil - 2) \geq \frac{k}{2}(\frac{1}{2}\frac{n}{k} - 2) = \frac{n}{4} - k$.

Similarly, at least $\frac{n}{4} - k$ elements are less than x. Thus, in the worst case, step 5 calls SELECT recursively on at most $\frac{3n}{4} + k$ elements.

So when n is greater than some constant, we have $T(n) \leq T(\lceil \frac{n}{k} \rceil) + T(\frac{3n}{4} + k) + O(n)$. We assume T(n) runs in linear time, substituting it into the recurrence yields:

$$ \begin{eqnarray} T(n) &\leq& T(\lceil \frac{n}{k} \rceil) + T(\frac{3n}{4} + k) + O(n) \\ &\leq& T(\lceil \frac{n}{k} \rceil) + T(\frac{3n}{4} + k) + an \\ &\leq& c\frac{n}{k} + c(\frac{3n}{4} + k) + an \\ &=& cn + (\frac{c}{k}n + an - \frac{c}{4}n + ck) \\ &\leq& cn \end{eqnarray} $$

where the last step holds as long as $\frac{c}{k}n + an - \frac{c}{4}n + ck \leq 0$. So we need to find some k such that there exists constants c and a such that $\frac{c}{k}n + an - \frac{c}{4}n + ck \leq 0$.

We have $\frac{c}{k}n + an - \frac{c}{4}n + ck = c(\frac{n}{k} - \frac{n}{4} + k) + an \leq 0$. Because both c and a are positive, so it could only be $\frac{n}{k} - \frac{n}{4} + k \leq 0$. Let $f(k) = \frac{n}{k} - \frac{n}{4} + k$, so $f(4) = 4 > 0, f(5) = -\frac{n}{20} + 5 \leq 0 \text{ when } n \geq 100$. So we can always find a $n_0$ such that $f(k) \leq 0$ when $k \geq 5$.

Thus the algorithm work in linear time if the input elements are divided into groups of 7, but doesn't run in linear time if they are divided into groups of 3.

9.3-2

We already know that there are at least $\frac{3n}{10} - 6$ elements are less (greater) than x, because $\lceil \frac{n}{4} \rceil < \frac{n}{4} + 1$, let $\frac{3n}{10} - 6 \geq \frac{n}{4} + 1$, we get $n \geq 140$.

9.3-3

In the previous question, we already know that if $n \geq 140$, then at least $\lceil \frac{n}{4} \rceil$ elements are less (greater) than x. So the partition procedure splits the problem size to $\frac{n}{4}$ and $\frac{3n}{4}$ in quicksort. Thus, $T(n) = T(\frac{n}{4}) + T(\frac{3n}{4}) + \Theta(n)$. And the solution is $T(n) = \Theta(n\lg{n})$, which is solved in exercise 4.4-9.

9.3-4

9.3-5

The idea is similar.

SELECT(A, p, r, i)

if p == r
    return A[p]
median = BLACK-BOX-MEDIAN(A, p, r)
q = PARTITION(A, p, r, median)
k = q - p + 1
if i == k
    return A[q]
elseif i < k
    return SELECT(A, p, q - 1, i)
else
    return SELECT(A, q + 1, r, i - k)

First we find the median of the array, and use that median as pivot to partition the array. Then we recursively call the procedure to get the ith element. Now let's analysis the running time. The BLAK-BOX-MEDIAN takes O(n), and partition method also takes O(n), then it at least reduces the problem size to $\frac{n}{2}$, thus we have $T(n) = T(\frac{n}{2}) + O(n)$.

We guess T(n) = O(n), substituting it to the recurrence yielding:

$$ \begin{eqnarray} T(n) &=& T(\frac{n}{2}) + O(n) \\ &\leq& c\frac{n}{2} + dn \\ &=& (\frac{c}{2} + d)n \\ &=& O(n) \end{eqnarray} $$

So the algorithm solves the problem in linear time.

9.3-6

The k quantiles of an n-element set are $A[\lceil \frac{n}{k} \rceil], A[2\lceil \frac{n}{k} \rceil], \ldots, A[(k - 1)\lceil \frac{n}{k} \rceil]$. And the idea is simple, first we divide the elements into two parts, one part contains $\lceil \frac{k}{2} \rceil$ quantiles, and the other part contains $ k - \lceil \frac{k}{2} \rceil$ quantiles. Then we recursively call the procedure on the two parts. In order to divide the elements into two parts, we use the SELECT procedure to get the $\lceil \frac{k}{2} \rceil\lceil \frac{n}{k} \rceil$th smallest element, which is the last element of the $\lceil \frac{k}{2} \rceil$ quantiles.

KTH-QUANTILES(A, p, r, k)

if k == 1
    return
else:
    i = MATH.CEIL(k / 2) * SIZE-OF-ELEMENTS-IN-EACH-QUANTILE
    index, quantile = SELECT(A, p, r, i)
    OUTPUT quantile
    KTH-QUANTILES(A, p, index, MATH.CEIL(k / 2))
    KTH-QUANTILES(A, index + 1, r, k - MATH.CEIL(k / 2))

Now let's analysis the running time. We have $T(n, k) = 2T(\frac{n}{2}, \frac{k}{2}) + O(n)$. If we draw a recursion tree, we can see that each level of the tree costs O(n) and the depth of tree is $\lg{k}$, thus the running time is $O(n\lg{k})$.

9.3-7

First, we find the median of the set, it costs O(n), then we create another array that contains the absolute distance between the median and each element. Then we use the SELECT procedure to select the kth smallest element p in the new array, at last, we compare each element in S with median, if the distance between element and median is not greater than p, then the element one of the k closest elements. Each step costs O(n), so the algorithm runs in O(n).

K-CLOSEST(A, k)

i = MATH.CEIL(A.length / 2)
median = SELECT(A, i)
let B[1..n] be a new array
for j = 1 to n
    B[j] = MATH.ABS(A[j] - median)
kth = SELECT(B, k)
let C be a new array
for j = 1 to n
    if MATH.ABS(A[j] - median) < kth
        add A[j] to C
for j = 1 to n
    MATH.ABS(A[j] - median) == kth
        add A[j] to C
    if C.length >= k
        break
return C

Because the elements in A are all distinct, so each element in B have at most 1 duplicate. The reason to scan A twice to get C is that A is not sorted, if we combine the last two loops together, it is possible we cannot get the correct k closest elements to the median.

9.3-8

We can first get the medians of x and y, if the median of x is smaller than the median of y, then the median of all is in x's right part and y's left part, if the median of x is larger than the median of y, then the median of all is in x's left part and y's right part. So we can recursively solves the problem.

MEDIAN-OF-TWO-SORTED-ARRAYS(X, px, rx, Y, py, ry)

if rx - px == 1
    if X[px] <= Y[py]
        return MATH.MIN(X[rx], Y[py])
    else:
        return MATH.MIN(X[px], Y[ry])
median-index-of-x = MATH.FLOOR((px + rx) / 2)
median-index-of-y = MATH.FLOOR((py + ry) / 2)
if X[index] < Y[median-index-of-y]
    return MEDIAN-OF-TWO-SORTED-ARRAYS(X, index, rx, Y, py, median-index-of-y)
elif X[index] > Y[median-index-of-y]
    return MEDIAN-OF-TWO-SORTED-ARRAYS(X, px, index, Y, median-index-of-y, ry)
else:
    return X[index]

Each recursive procedure reduces the problem size to $\frac{n}{2}$, so we have $T(n) = T(\frac{n}{2}) + \Theta(1)$. Let's solve the recurrence by the master method. Here we have a = 1, b = 2, $f(n) = \Theta(1)$. so $n^{\log_b{a}} = 1$, thus $f(n) = \Theta(n^{\log_b{a}})$, so $T(n) = \Theta(n^{\log_b{a}}\lg{n}) = \Theta(\lg{n})$.

9.3-9

Suppose the n wells have y-coordinates $y_1, y_2, \ldots, y_n$, and we need to find $y_0$ such that the value $\sum_{i = 1}^{n}|y_i - y_0|$ is minimum. This is the 1-dimensional case of Geometric median, and the median minimize the sum. Explanation.