001/*
002 * Licensed to the Apache Software Foundation (ASF) under one or more
003 * contributor license agreements.  See the NOTICE file distributed with
004 * this work for additional information regarding copyright ownership.
005 * The ASF licenses this file to You under the Apache License, Version 2.0
006 * (the "License"); you may not use this file except in compliance with
007 * the License.  You may obtain a copy of the License at
008 *
009 *      http://www.apache.org/licenses/LICENSE-2.0
010 *
011 * Unless required by applicable law or agreed to in writing, software
012 * distributed under the License is distributed on an "AS IS" BASIS,
013 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
014 * See the License for the specific language governing permissions and
015 * limitations under the License.
016 */
017
018package org.apache.commons.numbers.arrays;
019
020/**
021 * Select indices in array data.
022 *
023 * <p>Arranges elements such that indices {@code k} correspond to their correctly
024 * sorted value in the equivalent fully sorted array. For all indices {@code k}
025 * and any index {@code i}:
026 *
027 * <pre>{@code
028 * data[i < k] <= data[k] <= data[k < i]
029 * }</pre>
030 *
031 * <p>Examples:
032 *
033 * <pre>
034 * data    [0, 1, 2, 1, 2, 5, 2, 3, 3, 6, 7, 7, 7, 7]
035 *
036 *
037 * k=4   : [0, 1, 2, 1], [2], [5, 2, 3, 3, 6, 7, 7, 7, 7]
038 * k=4,8 : [0, 1, 2, 1], [2], [3, 3, 2], [5], [6, 7, 7, 7, 7]
039 * </pre>
040 *
041 * <p>This implementation can select on multiple indices and will handle duplicate and
042 * unordered indices. The method detects ordered indices (with or without duplicates) and
043 * uses this during processing. Passing ordered indices is recommended if the order is already
044 * known; for example using uniform spacing through the array data, or to select the top and
045 * bottom {@code n} values from the data.
046 *
047 * <p>A quickselect adaptive method is used for single indices. This uses analysis of the
048 * partition sizes after each division to update the algorithm mode. If the partition
049 * containing the target does not sufficiently reduce in size then the algorithm is
050 * progressively changed to use partitions with guaranteed margins. This ensures a set fraction
051 * of data is eliminated each step and worse-case linear run time performance. This method can
052 * handle a range of indices {@code [ka, kb]} with a small separation by targeting the start of
053 * the range {@code ka} and then selecting the remaining elements {@code (ka, kb]} that are at
054 * the edge of the partition bounded by {@code ka}.
055 *
056 * <p>Multiple keys are partitioned collectively using an introsort method which only recurses
057 * into partitions containing indices. Excess recursion will trigger use of a heapselect
058 * on the remaining range of indices ensuring non-quadratic worse case performance. Any
059 * partition containing a single index, adjacent pair of indices, or range of indices with a
060 * small separation will use the quickselect adaptive method for single keys. Note that the
061 * maximum number of times that {@code n} indices can be split is {@code n - 1} before all
062 * indices are handled as singles.
063 *
064 * <p>Floating-point order
065 *
066 * <p>The {@code <} relation does not impose a total order on all floating-point values.
067 * This class respects the ordering imposed by {@link Double#compare(double, double)}.
068 * {@code -0.0} is treated as less than value {@code 0.0}; {@code Double.NaN} is
069 * considered greater than any other value; and all {@code Double.NaN} values are
070 * considered equal.
071 *
072 * <p>References
073 *
074 * <p>Quickselect is introduced in Hoare [1]. This selects an element {@code k} from {@code n}
075 * using repeat division of the data around a partition element, recursing into the
076 * partition that contains {@code k}.
077 *
078 * <p>Introsort/select is introduced in Musser [2]. This detects excess recursion in
079 * quicksort/select and reverts to a heapsort or linear select to achieve an improved worst
080 * case bound.
081 *
082 * <p>Use of sampling to identify a pivot that places {@code k} in the smaller partition is
083 * performed in the SELECT algorithm of Floyd and Rivest [3, 4].
084 *
085 * <p>A worst-case linear time algorithm PICK is described in Blum <i>et al</i> [5]. This uses
086 * the median of medians as a partition element for selection which ensures a minimum fraction of
087 * the elements are eliminated per iteration. This was extended to use an asymmetric pivot choice
088 * with efficient placement of the medians sample location in the QuickselectAdpative algorithm of
089 * Alexandrescu [6].
090 *
091 * <ol>
092 * <li>Hoare (1961)
093 * Algorithm 65: Find
094 * <a href="https://doi.org/10.1145%2F366622.366647">Comm. ACM. 4 (7): 321–322</a>
095 * <li>Musser (1999)
096 * Introspective Sorting and Selection Algorithms
097 * <a href="https://doi.org/10.1002/(SICI)1097-024X(199708)27:8%3C983::AID-SPE117%3E3.0.CO;2-%23">
098 * Software: Practice and Experience 27, 983-993.</a>
099 * <li>Floyd and Rivest (1975)
100 * Algorithm 489: The Algorithm SELECT—for Finding the ith Smallest of n elements.
101 * Comm. ACM. 18 (3): 173.
102 * <li>Kiwiel (2005)
103 * On Floyd and Rivest's SELECT algorithm.
104 * Theoretical Computer Science 347, 214-238.
105 * <li>Blum, Floyd, Pratt, Rivest, and Tarjan (1973)
106 * Time bounds for selection.
107 * <a href="https://doi.org/10.1016%2FS0022-0000%2873%2980033-9">
108 * Journal of Computer and System Sciences. 7 (4): 448–461</a>.
109 * <li>Alexandrescu (2016)
110 * Fast Deterministic Selection
111 * <a href="https://arxiv.org/abs/1606.00484">arXiv:1606.00484</a>.
112 * <li><a href="https://en.wikipedia.org/wiki/Quickselect">Quickselect (Wikipedia)</a>
113 * <li><a href="https://en.wikipedia.org/wiki/Introsort">Introsort (Wikipedia)</a>
114 * <li><a href="https://en.wikipedia.org/wiki/Introselect">Introselect (Wikipedia)</a>
115 * <li><a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm">Floyd-Rivest algorithm (Wikipedia)</a>
116 * <li><a href="https://en.wikipedia.org/wiki/Median_of_medians">Median of medians (Wikipedia)</a>
117 * </ol>
118 *
119 * @since 1.2
120 */
121public final class Selection {
122
123    /** No instances. */
124    private Selection() {}
125
126    /**
127     * Partition the array such that index {@code k} corresponds to its correctly
128     * sorted value in the equivalent fully sorted array.
129     *
130     * @param a Values.
131     * @param k Index.
132     * @throws IndexOutOfBoundsException if index {@code k} is not within the
133     * sub-range {@code [0, a.length)}
134     */
135    public static void select(double[] a, int k) {
136        IndexSupport.checkIndex(0, a.length, k);
137        doSelect(a, 0, a.length, k);
138    }
139
140    /**
141     * Partition the array such that indices {@code k} correspond to their correctly
142     * sorted value in the equivalent fully sorted array.
143     *
144     * @param a Values.
145     * @param k Indices (may be destructively modified).
146     * @throws IndexOutOfBoundsException if any index {@code k} is not within the
147     * sub-range {@code [0, a.length)}
148     */
149    public static void select(double[] a, int[] k) {
150        IndexSupport.checkIndices(0, a.length, k);
151        doSelect(a, 0, a.length, k);
152    }
153
154    /**
155     * Partition the array such that index {@code k} corresponds to its correctly
156     * sorted value in the equivalent fully sorted array.
157     *
158     * @param a Values.
159     * @param fromIndex Index of the first element (inclusive).
160     * @param toIndex Index of the last element (exclusive).
161     * @param k Index.
162     * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
163     * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
164     * sub-range {@code [fromIndex, toIndex)}
165     */
166    public static void select(double[] a, int fromIndex, int toIndex, int k) {
167        IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
168        IndexSupport.checkIndex(fromIndex, toIndex, k);
169        doSelect(a, fromIndex, toIndex, k);
170    }
171
172    /**
173     * Partition the array such that indices {@code k} correspond to their correctly
174     * sorted value in the equivalent fully sorted array.
175     *
176     * @param a Values.
177     * @param fromIndex Index of the first element (inclusive).
178     * @param toIndex Index of the last element (exclusive).
179     * @param k Indices (may be destructively modified).
180     * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
181     * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
182     * sub-range {@code [fromIndex, toIndex)}
183     */
184    public static void select(double[] a, int fromIndex, int toIndex, int[] k) {
185        IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
186        IndexSupport.checkIndices(fromIndex, toIndex, k);
187        doSelect(a, fromIndex, toIndex, k);
188    }
189
190    /**
191     * Partition the array such that index {@code k} corresponds to its correctly
192     * sorted value in the equivalent fully sorted array.
193     *
194     * <p>This method pre/post-processes the data and indices to respect the ordering
195     * imposed by {@link Double#compare(double, double)}.
196     *
197     * @param fromIndex Index of the first element (inclusive).
198     * @param toIndex Index of the last element (exclusive).
199     * @param a Values.
200     * @param k Index.
201     */
202    private static void doSelect(double[] a, int fromIndex, int toIndex, int k) {
203        if (toIndex - fromIndex <= 1) {
204            return;
205        }
206        // Sort NaN / count signed zeros.
207        // Caution: This loop contributes significantly to the runtime.
208        int cn = 0;
209        int end = toIndex;
210        for (int i = toIndex; --i >= fromIndex;) {
211            final double v = a[i];
212            // Count negative zeros using a sign bit check
213            if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) {
214                cn++;
215                // Change to positive zero.
216                // Data must be repaired after selection.
217                a[i] = 0.0;
218            } else if (v != v) {
219                // Move NaN to end
220                a[i] = a[--end];
221                a[end] = v;
222            }
223        }
224
225        // Partition
226        if (end - fromIndex > 1 && k < end) {
227            QuickSelect.select(a, fromIndex, end - 1, k);
228        }
229
230        // Restore signed zeros
231        if (cn != 0) {
232            // Use partition index below zero to fast-forward to zero as much as possible
233            for (int j = a[k] < 0 ? k : -1;;) {
234                if (a[++j] == 0) {
235                    a[j] = -0.0;
236                    if (--cn == 0) {
237                        break;
238                    }
239                }
240            }
241        }
242    }
243
244    /**
245     * Partition the array such that indices {@code k} correspond to their correctly
246     * sorted value in the equivalent fully sorted array.
247     *
248     * <p>This method pre/post-processes the data and indices to respect the ordering
249     * imposed by {@link Double#compare(double, double)}.
250     *
251     * @param fromIndex Index of the first element (inclusive).
252     * @param toIndex Index of the last element (exclusive).
253     * @param a Values.
254     * @param k Indices (may be destructively modified).
255     */
256    private static void doSelect(double[] a, int fromIndex, int toIndex, int[] k) {
257        if (k.length == 0 || toIndex - fromIndex <= 1) {
258            return;
259        }
260        // Sort NaN / count signed zeros.
261        // Caution: This loop contributes significantly to the runtime for single indices.
262        int cn = 0;
263        int end = toIndex;
264        for (int i = toIndex; --i >= fromIndex;) {
265            final double v = a[i];
266            // Count negative zeros using a sign bit check
267            if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) {
268                cn++;
269                // Change to positive zero.
270                // Data must be repaired after selection.
271                a[i] = 0.0;
272            } else if (v != v) {
273                // Move NaN to end
274                a[i] = a[--end];
275                a[end] = v;
276            }
277        }
278
279        // Partition
280        int n = 0;
281        if (end - fromIndex > 1) {
282            n = k.length;
283            // Filter indices invalidated by NaN check
284            if (end < toIndex) {
285                for (int i = n; --i >= 0;) {
286                    final int index = k[i];
287                    if (index >= end) {
288                        // Move to end
289                        k[i] = k[--n];
290                        k[n] = index;
291                    }
292                }
293            }
294            // Return n, the count of used indices in k.
295            // Use this to post-process zeros.
296            n = QuickSelect.select(a, fromIndex, end - 1, k, n);
297        }
298
299        // Restore signed zeros
300        if (cn != 0) {
301            // Use partition indices below zero to fast-forward to zero as much as possible
302            int j = -1;
303            if (n < 0) {
304                // Binary search on -n sorted indices: hi = (-n) - 1
305                int lo = 0;
306                int hi = ~n;
307                while (lo <= hi) {
308                    final int mid = (lo + hi) >>> 1;
309                    if (a[k[mid]] < 0) {
310                        j = mid;
311                        lo = mid + 1;
312                    } else {
313                        hi = mid - 1;
314                    }
315                }
316            } else {
317                // Unsorted, process all indices
318                for (int i = n; --i >= 0;) {
319                    if (a[k[i]] < 0) {
320                        j = k[i];
321                    }
322                }
323            }
324            for (;;) {
325                if (a[++j] == 0) {
326                    a[j] = -0.0;
327                    if (--cn == 0) {
328                        break;
329                    }
330                }
331            }
332        }
333    }
334
335    /**
336     * Partition the array such that index {@code k} corresponds to its correctly
337     * sorted value in the equivalent fully sorted array.
338     *
339     * @param a Values.
340     * @param k Index.
341     * @throws IndexOutOfBoundsException if index {@code k} is not within the
342     * sub-range {@code [0, a.length)}
343     */
344    public static void select(int[] a, int k) {
345        IndexSupport.checkIndex(0, a.length, k);
346        if (a.length <= 1) {
347            return;
348        }
349        QuickSelect.select(a, 0, a.length - 1, k);
350    }
351
352    /**
353     * Partition the array such that indices {@code k} correspond to their correctly
354     * sorted value in the equivalent fully sorted array.
355     *
356     * @param a Values.
357     * @param k Indices (may be destructively modified).
358     * @throws IndexOutOfBoundsException if any index {@code k} is not within the
359     * sub-range {@code [0, a.length)}
360     */
361    public static void select(int[] a, int[] k) {
362        IndexSupport.checkIndices(0, a.length, k);
363        if (k.length == 0 || a.length <= 1) {
364            return;
365        }
366        QuickSelect.select(a, 0, a.length - 1, k, k.length);
367    }
368
369    /**
370     * Partition the array such that index {@code k} corresponds to its correctly
371     * sorted value in the equivalent fully sorted array.
372     *
373     * @param a Values.
374     * @param fromIndex Index of the first element (inclusive).
375     * @param toIndex Index of the last element (exclusive).
376     * @param k Index.
377     * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
378     * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
379     * sub-range {@code [fromIndex, toIndex)}
380     */
381    public static void select(int[] a, int fromIndex, int toIndex, int k) {
382        IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
383        IndexSupport.checkIndex(fromIndex, toIndex, k);
384        if (toIndex - fromIndex <= 1) {
385            return;
386        }
387        QuickSelect.select(a, fromIndex, toIndex - 1, k);
388    }
389
390    /**
391     * Partition the array such that indices {@code k} correspond to their correctly
392     * sorted value in the equivalent fully sorted array.
393     *
394     * @param a Values.
395     * @param fromIndex Index of the first element (inclusive).
396     * @param toIndex Index of the last element (exclusive).
397     * @param k Indices (may be destructively modified).
398     * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
399     * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
400     * sub-range {@code [fromIndex, toIndex)}
401     */
402    public static void select(int[] a, int fromIndex, int toIndex, int[] k) {
403        IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
404        IndexSupport.checkIndices(fromIndex, toIndex, k);
405        if (k.length == 0 || toIndex - fromIndex <= 1) {
406            return;
407        }
408        QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length);
409    }
410}