View Javadoc
1   /*
2    * Licensed to the Apache Software Foundation (ASF) under one or more
3    * contributor license agreements.  See the NOTICE file distributed with
4    * this work for additional information regarding copyright ownership.
5    * The ASF licenses this file to You under the Apache License, Version 2.0
6    * (the "License"); you may not use this file except in compliance with
7    * the License.  You may obtain a copy of the License at
8    *
9    *      http://www.apache.org/licenses/LICENSE-2.0
10   *
11   * Unless required by applicable law or agreed to in writing, software
12   * distributed under the License is distributed on an "AS IS" BASIS,
13   * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14   * See the License for the specific language governing permissions and
15   * limitations under the License.
16   */
17  
18  package org.apache.commons.numbers.arrays;
19  
20  /**
21   * Select indices in array data.
22   *
23   * <p>Arranges elements such that indices {@code k} correspond to their correctly
24   * sorted value in the equivalent fully sorted array. For all indices {@code k}
25   * and any index {@code i}:
26   *
27   * <pre>{@code
28   * data[i < k] <= data[k] <= data[k < i]
29   * }</pre>
30   *
31   * <p>Examples:
32   *
33   * <pre>
34   * data    [0, 1, 2, 1, 2, 5, 2, 3, 3, 6, 7, 7, 7, 7]
35   *
36   *
37   * k=4   : [0, 1, 2, 1], [2], [5, 2, 3, 3, 6, 7, 7, 7, 7]
38   * k=4,8 : [0, 1, 2, 1], [2], [3, 3, 2], [5], [6, 7, 7, 7, 7]
39   * </pre>
40   *
41   * <p>This implementation can select on multiple indices and will handle duplicate and
42   * unordered indices. The method detects ordered indices (with or without duplicates) and
43   * uses this during processing. Passing ordered indices is recommended if the order is already
44   * known; for example using uniform spacing through the array data, or to select the top and
45   * bottom {@code n} values from the data.
46   *
47   * <p>A quickselect adaptive method is used for single indices. This uses analysis of the
48   * partition sizes after each division to update the algorithm mode. If the partition
49   * containing the target does not sufficiently reduce in size then the algorithm is
50   * progressively changed to use partitions with guaranteed margins. This ensures a set fraction
51   * of data is eliminated each step and worse-case linear run time performance. This method can
52   * handle a range of indices {@code [ka, kb]} with a small separation by targeting the start of
53   * the range {@code ka} and then selecting the remaining elements {@code (ka, kb]} that are at
54   * the edge of the partition bounded by {@code ka}.
55   *
56   * <p>Multiple keys are partitioned collectively using an introsort method which only recurses
57   * into partitions containing indices. Excess recursion will trigger use of a heapselect
58   * on the remaining range of indices ensuring non-quadratic worse case performance. Any
59   * partition containing a single index, adjacent pair of indices, or range of indices with a
60   * small separation will use the quickselect adaptive method for single keys. Note that the
61   * maximum number of times that {@code n} indices can be split is {@code n - 1} before all
62   * indices are handled as singles.
63   *
64   * <p>Floating-point order
65   *
66   * <p>The {@code <} relation does not impose a total order on all floating-point values.
67   * This class respects the ordering imposed by {@link Double#compare(double, double)}.
68   * {@code -0.0} is treated as less than value {@code 0.0}; {@code Double.NaN} is
69   * considered greater than any other value; and all {@code Double.NaN} values are
70   * considered equal.
71   *
72   * <p>References
73   *
74   * <p>Quickselect is introduced in Hoare [1]. This selects an element {@code k} from {@code n}
75   * using repeat division of the data around a partition element, recursing into the
76   * partition that contains {@code k}.
77   *
78   * <p>Introsort/select is introduced in Musser [2]. This detects excess recursion in
79   * quicksort/select and reverts to a heapsort or linear select to achieve an improved worst
80   * case bound.
81   *
82   * <p>Use of sampling to identify a pivot that places {@code k} in the smaller partition is
83   * performed in the SELECT algorithm of Floyd and Rivest [3, 4].
84   *
85   * <p>A worst-case linear time algorithm PICK is described in Blum <i>et al</i> [5]. This uses
86   * the median of medians as a partition element for selection which ensures a minimum fraction of
87   * the elements are eliminated per iteration. This was extended to use an asymmetric pivot choice
88   * with efficient placement of the medians sample location in the QuickselectAdpative algorithm of
89   * Alexandrescu [6].
90   *
91   * <ol>
92   * <li>Hoare (1961)
93   * Algorithm 65: Find
94   * <a href="https://doi.org/10.1145%2F366622.366647">Comm. ACM. 4 (7): 321–322</a>
95   * <li>Musser (1999)
96   * Introspective Sorting and Selection Algorithms
97   * <a href="https://doi.org/10.1002/(SICI)1097-024X(199708)27:8%3C983::AID-SPE117%3E3.0.CO;2-%23">
98   * Software: Practice and Experience 27, 983-993.</a>
99   * <li>Floyd and Rivest (1975)
100  * Algorithm 489: The Algorithm SELECT—for Finding the ith Smallest of n elements.
101  * Comm. ACM. 18 (3): 173.
102  * <li>Kiwiel (2005)
103  * On Floyd and Rivest's SELECT algorithm.
104  * Theoretical Computer Science 347, 214-238.
105  * <li>Blum, Floyd, Pratt, Rivest, and Tarjan (1973)
106  * Time bounds for selection.
107  * <a href="https://doi.org/10.1016%2FS0022-0000%2873%2980033-9">
108  * Journal of Computer and System Sciences. 7 (4): 448–461</a>.
109  * <li>Alexandrescu (2016)
110  * Fast Deterministic Selection
111  * <a href="https://arxiv.org/abs/1606.00484">arXiv:1606.00484</a>.
112  * <li><a href="https://en.wikipedia.org/wiki/Quickselect">Quickselect (Wikipedia)</a>
113  * <li><a href="https://en.wikipedia.org/wiki/Introsort">Introsort (Wikipedia)</a>
114  * <li><a href="https://en.wikipedia.org/wiki/Introselect">Introselect (Wikipedia)</a>
115  * <li><a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm">Floyd-Rivest algorithm (Wikipedia)</a>
116  * <li><a href="https://en.wikipedia.org/wiki/Median_of_medians">Median of medians (Wikipedia)</a>
117  * </ol>
118  *
119  * @since 1.2
120  */
121 public final class Selection {
122 
123     /** No instances. */
124     private Selection() {}
125 
126     /**
127      * Partition the array such that index {@code k} corresponds to its correctly
128      * sorted value in the equivalent fully sorted array.
129      *
130      * @param a Values.
131      * @param k Index.
132      * @throws IndexOutOfBoundsException if index {@code k} is not within the
133      * sub-range {@code [0, a.length)}
134      */
135     public static void select(double[] a, int k) {
136         IndexSupport.checkIndex(0, a.length, k);
137         doSelect(a, 0, a.length, k);
138     }
139 
140     /**
141      * Partition the array such that indices {@code k} correspond to their correctly
142      * sorted value in the equivalent fully sorted array.
143      *
144      * @param a Values.
145      * @param k Indices (may be destructively modified).
146      * @throws IndexOutOfBoundsException if any index {@code k} is not within the
147      * sub-range {@code [0, a.length)}
148      */
149     public static void select(double[] a, int[] k) {
150         IndexSupport.checkIndices(0, a.length, k);
151         doSelect(a, 0, a.length, k);
152     }
153 
154     /**
155      * Partition the array such that index {@code k} corresponds to its correctly
156      * sorted value in the equivalent fully sorted array.
157      *
158      * @param a Values.
159      * @param fromIndex Index of the first element (inclusive).
160      * @param toIndex Index of the last element (exclusive).
161      * @param k Index.
162      * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
163      * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
164      * sub-range {@code [fromIndex, toIndex)}
165      */
166     public static void select(double[] a, int fromIndex, int toIndex, int k) {
167         IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
168         IndexSupport.checkIndex(fromIndex, toIndex, k);
169         doSelect(a, fromIndex, toIndex, k);
170     }
171 
172     /**
173      * Partition the array such that indices {@code k} correspond to their correctly
174      * sorted value in the equivalent fully sorted array.
175      *
176      * @param a Values.
177      * @param fromIndex Index of the first element (inclusive).
178      * @param toIndex Index of the last element (exclusive).
179      * @param k Indices (may be destructively modified).
180      * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
181      * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
182      * sub-range {@code [fromIndex, toIndex)}
183      */
184     public static void select(double[] a, int fromIndex, int toIndex, int[] k) {
185         IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
186         IndexSupport.checkIndices(fromIndex, toIndex, k);
187         doSelect(a, fromIndex, toIndex, k);
188     }
189 
190     /**
191      * Partition the array such that index {@code k} corresponds to its correctly
192      * sorted value in the equivalent fully sorted array.
193      *
194      * <p>This method pre/post-processes the data and indices to respect the ordering
195      * imposed by {@link Double#compare(double, double)}.
196      *
197      * @param fromIndex Index of the first element (inclusive).
198      * @param toIndex Index of the last element (exclusive).
199      * @param a Values.
200      * @param k Index.
201      */
202     private static void doSelect(double[] a, int fromIndex, int toIndex, int k) {
203         if (toIndex - fromIndex <= 1) {
204             return;
205         }
206         // Sort NaN / count signed zeros.
207         // Caution: This loop contributes significantly to the runtime.
208         int cn = 0;
209         int end = toIndex;
210         for (int i = toIndex; --i >= fromIndex;) {
211             final double v = a[i];
212             // Count negative zeros using a sign bit check
213             if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) {
214                 cn++;
215                 // Change to positive zero.
216                 // Data must be repaired after selection.
217                 a[i] = 0.0;
218             } else if (v != v) {
219                 // Move NaN to end
220                 a[i] = a[--end];
221                 a[end] = v;
222             }
223         }
224 
225         // Partition
226         if (end - fromIndex > 1 && k < end) {
227             QuickSelect.select(a, fromIndex, end - 1, k);
228         }
229 
230         // Restore signed zeros
231         if (cn != 0) {
232             // Use partition index below zero to fast-forward to zero as much as possible
233             for (int j = a[k] < 0 ? k : -1;;) {
234                 if (a[++j] == 0) {
235                     a[j] = -0.0;
236                     if (--cn == 0) {
237                         break;
238                     }
239                 }
240             }
241         }
242     }
243 
244     /**
245      * Partition the array such that indices {@code k} correspond to their correctly
246      * sorted value in the equivalent fully sorted array.
247      *
248      * <p>This method pre/post-processes the data and indices to respect the ordering
249      * imposed by {@link Double#compare(double, double)}.
250      *
251      * @param fromIndex Index of the first element (inclusive).
252      * @param toIndex Index of the last element (exclusive).
253      * @param a Values.
254      * @param k Indices (may be destructively modified).
255      */
256     private static void doSelect(double[] a, int fromIndex, int toIndex, int[] k) {
257         if (k.length == 0 || toIndex - fromIndex <= 1) {
258             return;
259         }
260         // Sort NaN / count signed zeros.
261         // Caution: This loop contributes significantly to the runtime for single indices.
262         int cn = 0;
263         int end = toIndex;
264         for (int i = toIndex; --i >= fromIndex;) {
265             final double v = a[i];
266             // Count negative zeros using a sign bit check
267             if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) {
268                 cn++;
269                 // Change to positive zero.
270                 // Data must be repaired after selection.
271                 a[i] = 0.0;
272             } else if (v != v) {
273                 // Move NaN to end
274                 a[i] = a[--end];
275                 a[end] = v;
276             }
277         }
278 
279         // Partition
280         int n = 0;
281         if (end - fromIndex > 1) {
282             n = k.length;
283             // Filter indices invalidated by NaN check
284             if (end < toIndex) {
285                 for (int i = n; --i >= 0;) {
286                     final int index = k[i];
287                     if (index >= end) {
288                         // Move to end
289                         k[i] = k[--n];
290                         k[n] = index;
291                     }
292                 }
293             }
294             // Return n, the count of used indices in k.
295             // Use this to post-process zeros.
296             n = QuickSelect.select(a, fromIndex, end - 1, k, n);
297         }
298 
299         // Restore signed zeros
300         if (cn != 0) {
301             // Use partition indices below zero to fast-forward to zero as much as possible
302             int j = -1;
303             if (n < 0) {
304                 // Binary search on -n sorted indices: hi = (-n) - 1
305                 int lo = 0;
306                 int hi = ~n;
307                 while (lo <= hi) {
308                     final int mid = (lo + hi) >>> 1;
309                     if (a[k[mid]] < 0) {
310                         j = mid;
311                         lo = mid + 1;
312                     } else {
313                         hi = mid - 1;
314                     }
315                 }
316             } else {
317                 // Unsorted, process all indices
318                 for (int i = n; --i >= 0;) {
319                     if (a[k[i]] < 0) {
320                         j = k[i];
321                     }
322                 }
323             }
324             for (;;) {
325                 if (a[++j] == 0) {
326                     a[j] = -0.0;
327                     if (--cn == 0) {
328                         break;
329                     }
330                 }
331             }
332         }
333     }
334 
335     /**
336      * Partition the array such that index {@code k} corresponds to its correctly
337      * sorted value in the equivalent fully sorted array.
338      *
339      * @param a Values.
340      * @param k Index.
341      * @throws IndexOutOfBoundsException if index {@code k} is not within the
342      * sub-range {@code [0, a.length)}
343      */
344     public static void select(int[] a, int k) {
345         IndexSupport.checkIndex(0, a.length, k);
346         if (a.length <= 1) {
347             return;
348         }
349         QuickSelect.select(a, 0, a.length - 1, k);
350     }
351 
352     /**
353      * Partition the array such that indices {@code k} correspond to their correctly
354      * sorted value in the equivalent fully sorted array.
355      *
356      * @param a Values.
357      * @param k Indices (may be destructively modified).
358      * @throws IndexOutOfBoundsException if any index {@code k} is not within the
359      * sub-range {@code [0, a.length)}
360      */
361     public static void select(int[] a, int[] k) {
362         IndexSupport.checkIndices(0, a.length, k);
363         if (k.length == 0 || a.length <= 1) {
364             return;
365         }
366         QuickSelect.select(a, 0, a.length - 1, k, k.length);
367     }
368 
369     /**
370      * Partition the array such that index {@code k} corresponds to its correctly
371      * sorted value in the equivalent fully sorted array.
372      *
373      * @param a Values.
374      * @param fromIndex Index of the first element (inclusive).
375      * @param toIndex Index of the last element (exclusive).
376      * @param k Index.
377      * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
378      * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
379      * sub-range {@code [fromIndex, toIndex)}
380      */
381     public static void select(int[] a, int fromIndex, int toIndex, int k) {
382         IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
383         IndexSupport.checkIndex(fromIndex, toIndex, k);
384         if (toIndex - fromIndex <= 1) {
385             return;
386         }
387         QuickSelect.select(a, fromIndex, toIndex - 1, k);
388     }
389 
390     /**
391      * Partition the array such that indices {@code k} correspond to their correctly
392      * sorted value in the equivalent fully sorted array.
393      *
394      * @param a Values.
395      * @param fromIndex Index of the first element (inclusive).
396      * @param toIndex Index of the last element (exclusive).
397      * @param k Indices (may be destructively modified).
398      * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
399      * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
400      * sub-range {@code [fromIndex, toIndex)}
401      */
402     public static void select(int[] a, int fromIndex, int toIndex, int[] k) {
403         IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
404         IndexSupport.checkIndices(fromIndex, toIndex, k);
405         if (k.length == 0 || toIndex - fromIndex <= 1) {
406             return;
407         }
408         QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length);
409     }
410 }