【问题标题】：Fastest way to create new array with length N and fill it by repeating a given array创建长度为 N 的新数组并通过重复给定数组来填充它的最快方法
【发布时间】：2015-12-03 19:08:55
【问题描述】：

我想分配一个长度为 N 的新数组，并通过重复给定数组来填充它。界面如下所示：

<T> T[] repeat(T[] array, int n);

为了澄清我的意思，这里举个小例子：

String a = {"a", "b", "c"};
//     b = {"a", "b", "c", "a", "b", "c", "a", "b", "c", "a"}
String b = repeat(a, 10);

大多数程序员会提出以下解决方案（为了简化数组生成，选择了特定类型）：

public String[] repeat(String[] array, int n) {
  String[] repeated = new String[n];
  for (int i = 0; i < n; i++) {
    repeated[i] = array[i % array.length];
  }
  return repeated;
}

有更快的方法吗？

【问题讨论】：

标签： java arrays performance repeat

【解决方案1】：

我想出了这个通用的解决方案：

public static <T> T[] repeat(T[] arr, int newLength) {
  T[] dup = Arrays.copyOf(arr, newLength);
  for (long last = arr.length; last != 0 && last < newLength; last <<= 1) {
    System.arraycopy(dup, 0, dup, (int) last, (int) (Math.min(last << 1, newLength) - last));
  }
  return dup;
}

理论

System.arraycopy 是本机调用。因此它非常快，但并不意味着它是最快的方式。

所有其他解决方案都会逐个元素地复制数组。我的解决方案复制了更大的块。每次迭代都会复制数组中的现有元素，这意味着循环最多会运行 log2(n) 次。

分析报告

输入

TEST_ARRAY = { "a", "b", "c", "d", "e", "f" }
NEW_LENGTH = 10000

这是我重现结果的基准代码：

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@Measurement(iterations = 10, timeUnit = TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Threads(1)
@Warmup(iterations = 5, timeUnit = TimeUnit.NANOSECONDS)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark {
  
  private static final String[] TEST_ARRAY = { "a", "b", "c", "d", "e", "f" };
  private static final int NEW_LENGTH = 10_000;
  
  @Benchmark
  public String[] testNativeCall() {
    String[] dup = Arrays.copyOf(TEST_ARRAY, NEW_LENGTH);
    for (int last = TEST_ARRAY.length; last != 0 && last < NEW_LENGTH; last <<= 1) {
      System.arraycopy(dup, 0, dup, last, Math.min(last << 1, NEW_LENGTH) - last);
    }
    return dup;
  }
  
  @Benchmark
  public String[] testLoopModulo() {
    String[] arr = new String[NEW_LENGTH];
    for (int i = 0; i < NEW_LENGTH; i++) {
      arr[i] = arr[i % TEST_ARRAY.length];
    }
    return arr;
  }
  
  @Benchmark
  public String[] testArrayList() {
    List<String> initialLetters = Arrays.asList(TEST_ARRAY);
    List<String> results = new ArrayList<>();
    int indexOfLetterToAdd = 0;
    for (int i = 0; i < 10000; i++) {
      results.add(initialLetters.get(indexOfLetterToAdd++));
      if (indexOfLetterToAdd == initialLetters.size()) {
        indexOfLetterToAdd = 0;
      }
    }
    return results.toArray(new String[results.size()]);
  }
  
  @Benchmark
  public String[] testLoopReset() {
    String result[] = new String[NEW_LENGTH];
    for (int i = 0, j = 0; i < NEW_LENGTH && j < TEST_ARRAY.length; i++, j++) {
      result[i] = TEST_ARRAY[j];
      if (j == TEST_ARRAY.length - 1) {
        j = -1;
      }
    }
    return result;
  }
  
  @Benchmark
  public String[] testStream() {
    String[] result = Stream.iterate(TEST_ARRAY, x -> x).flatMap(x -> Stream.of(TEST_ARRAY)).limit(NEW_LENGTH)
        .toArray(String[]::new);
    return result;
  }
}

结果

Benchmark                    Mode  Cnt      Score      Error  Units
MyBenchmark.testNativeCall   avgt   30   4154,553 ±   11,242  ns/op
MyBenchmark.testLoopModulo   avgt   30  19273,717 ±  235,547  ns/op
MyBenchmark.testArrayList    avgt   30  71079,139 ± 2686,136  ns/op
MyBenchmark.testLoopReset    avgt   30  18307,368 ±  202,520  ns/op
MyBenchmark.testStream       avgt   30  68898,278 ± 2488,104  ns/op

如您所见，本机调用方法是重复数组的最快方式。

其他结果

我还被要求用各种输入对这些方法进行基准测试。

输入范围

// Array size not fixed anymore - filled with random elements
SIZE = { 100, 1000, 100000, 1000000 }
NEW_LENGTH = { 100, 1000, 100000, 1000000 }

这意味着有SIZE x NEW_LENGTH 测试，结果如下：

Benchmark                   (NEW_LENGTH)   (SIZE)  Mode  Cnt        Score        Error  Units
MyBenchmark.testArrayList            100      100  avgt   30      706,274 ±      6,787  ns/op
MyBenchmark.testArrayList            100     1000  avgt   30      692,586 ±     15,076  ns/op
MyBenchmark.testArrayList            100   100000  avgt   30      685,214 ±      6,747  ns/op
MyBenchmark.testArrayList            100  1000000  avgt   30      685,333 ±      5,493  ns/op
MyBenchmark.testArrayList           1000      100  avgt   30     7170,897 ±     63,221  ns/op
MyBenchmark.testArrayList           1000     1000  avgt   30     7180,612 ±     93,280  ns/op
MyBenchmark.testArrayList           1000   100000  avgt   30     6818,585 ±    197,859  ns/op
MyBenchmark.testArrayList           1000  1000000  avgt   30     6810,614 ±    139,456  ns/op
MyBenchmark.testArrayList         100000      100  avgt   30   597614,173 ±   6446,318  ns/op
MyBenchmark.testArrayList         100000     1000  avgt   30   580696,750 ±   5141,845  ns/op
MyBenchmark.testArrayList         100000   100000  avgt   30   598657,608 ±   5126,519  ns/op
MyBenchmark.testArrayList         100000  1000000  avgt   30   595529,027 ±   4981,095  ns/op
MyBenchmark.testArrayList        1000000      100  avgt   30  6836746,484 ±  38848,467  ns/op
MyBenchmark.testArrayList        1000000     1000  avgt   30  6745066,786 ±  57971,469  ns/op
MyBenchmark.testArrayList        1000000   100000  avgt   30  7130391,072 ±  50583,914  ns/op
MyBenchmark.testArrayList        1000000  1000000  avgt   30  8791342,042 ± 172323,938  ns/op
MyBenchmark.testLoopModulo           100      100  avgt   30      301,252 ±      1,195  ns/op
MyBenchmark.testLoopModulo           100     1000  avgt   30      301,988 ±      2,056  ns/op
MyBenchmark.testLoopModulo           100   100000  avgt   30      299,892 ±      1,776  ns/op
MyBenchmark.testLoopModulo           100  1000000  avgt   30      300,468 ±      2,569  ns/op
MyBenchmark.testLoopModulo          1000      100  avgt   30     3277,018 ±     14,880  ns/op
MyBenchmark.testLoopModulo          1000     1000  avgt   30     3275,648 ±     21,742  ns/op
MyBenchmark.testLoopModulo          1000   100000  avgt   30     3258,570 ±     27,360  ns/op
MyBenchmark.testLoopModulo          1000  1000000  avgt   30     3259,617 ±     28,747  ns/op
MyBenchmark.testLoopModulo        100000      100  avgt   30   321483,331 ±   4320,938  ns/op
MyBenchmark.testLoopModulo        100000     1000  avgt   30   326319,662 ±   2419,602  ns/op
MyBenchmark.testLoopModulo        100000   100000  avgt   30   327027,966 ±   3174,011  ns/op
MyBenchmark.testLoopModulo        100000  1000000  avgt   30   319201,057 ±   4472,220  ns/op
MyBenchmark.testLoopModulo       1000000      100  avgt   30  3053122,364 ±  31814,342  ns/op
MyBenchmark.testLoopModulo       1000000     1000  avgt   30  3134151,676 ± 108227,023  ns/op
MyBenchmark.testLoopModulo       1000000   100000  avgt   30  3220082,188 ±  43925,401  ns/op
MyBenchmark.testLoopModulo       1000000  1000000  avgt   30  3204777,236 ±  25365,542  ns/op
MyBenchmark.testLoopReset            100      100  avgt   30      159,828 ±      1,107  ns/op
MyBenchmark.testLoopReset            100     1000  avgt   30      125,461 ±      0,881  ns/op
MyBenchmark.testLoopReset            100   100000  avgt   30      129,912 ±      7,801  ns/op
MyBenchmark.testLoopReset            100  1000000  avgt   30      134,503 ±      7,602  ns/op
MyBenchmark.testLoopReset           1000      100  avgt   30     1809,207 ±     93,642  ns/op
MyBenchmark.testLoopReset           1000     1000  avgt   30     1728,705 ±     70,808  ns/op
MyBenchmark.testLoopReset           1000   100000  avgt   30     1354,887 ±      9,631  ns/op
MyBenchmark.testLoopReset           1000  1000000  avgt   30     1350,327 ±     15,886  ns/op
MyBenchmark.testLoopReset         100000      100  avgt   30   159680,209 ±   2477,183  ns/op
MyBenchmark.testLoopReset         100000     1000  avgt   30   162030,985 ±   1949,660  ns/op
MyBenchmark.testLoopReset         100000   100000  avgt   30   149299,890 ±   1516,486  ns/op
MyBenchmark.testLoopReset         100000  1000000  avgt   30   136059,242 ±   3090,410  ns/op
MyBenchmark.testLoopReset        1000000      100  avgt   30  1407369,992 ±  12979,717  ns/op
MyBenchmark.testLoopReset        1000000     1000  avgt   30  1447001,173 ±  14979,769  ns/op
MyBenchmark.testLoopReset        1000000   100000  avgt   30  1463913,706 ±  12564,617  ns/op
MyBenchmark.testLoopReset        1000000  1000000  avgt   30  1404701,860 ±  21587,436  ns/op
MyBenchmark.testNativeCall           100      100  avgt   30       58,306 ±      0,669  ns/op
MyBenchmark.testNativeCall           100     1000  avgt   30       57,441 ±      0,590  ns/op
MyBenchmark.testNativeCall           100   100000  avgt   30       57,595 ±      0,386  ns/op
MyBenchmark.testNativeCall           100  1000000  avgt   30       60,196 ±      1,995  ns/op
MyBenchmark.testNativeCall          1000      100  avgt   30      450,808 ±      8,259  ns/op
MyBenchmark.testNativeCall          1000     1000  avgt   30      558,079 ±      5,724  ns/op
MyBenchmark.testNativeCall          1000   100000  avgt   30      557,246 ±      4,873  ns/op
MyBenchmark.testNativeCall          1000  1000000  avgt   30      565,005 ±      9,696  ns/op
MyBenchmark.testNativeCall        100000      100  avgt   30    73074,811 ±   3332,432  ns/op
MyBenchmark.testNativeCall        100000     1000  avgt   30    70970,603 ±   2693,394  ns/op
MyBenchmark.testNativeCall        100000   100000  avgt   30    69907,864 ±   2945,072  ns/op
MyBenchmark.testNativeCall        100000  1000000  avgt   30    74041,205 ±   2599,841  ns/op
MyBenchmark.testNativeCall       1000000      100  avgt   30   790679,353 ±  15672,480  ns/op
MyBenchmark.testNativeCall       1000000     1000  avgt   30   812660,137 ±  25490,999  ns/op
MyBenchmark.testNativeCall       1000000   100000  avgt   30   838094,181 ±  12374,194  ns/op
MyBenchmark.testNativeCall       1000000  1000000  avgt   30   925567,535 ±  19091,943  ns/op
MyBenchmark.testStream               100      100  avgt   30      810,262 ±     54,519  ns/op
MyBenchmark.testStream               100     1000  avgt   30     1344,998 ±     14,792  ns/op
MyBenchmark.testStream               100   100000  avgt   30   159901,562 ±   3453,210  ns/op
MyBenchmark.testStream               100  1000000  avgt   30  1407506,571 ± 419985,287  ns/op
MyBenchmark.testStream              1000      100  avgt   30     6464,099 ±    169,665  ns/op
MyBenchmark.testStream              1000     1000  avgt   30     5869,457 ±    260,297  ns/op
MyBenchmark.testStream              1000   100000  avgt   30   165394,656 ±   4943,362  ns/op
MyBenchmark.testStream              1000  1000000  avgt   30  1352900,959 ± 412849,634  ns/op
MyBenchmark.testStream            100000      100  avgt   30   423531,274 ±   3944,801  ns/op
MyBenchmark.testStream            100000     1000  avgt   30   391727,181 ±   5341,826  ns/op
MyBenchmark.testStream            100000   100000  avgt   30   427462,700 ±   7517,953  ns/op
MyBenchmark.testStream            100000  1000000  avgt   30   981304,769 ±  10206,849  ns/op
MyBenchmark.testStream           1000000      100  avgt   30  4528465,859 ±  72959,405  ns/op
MyBenchmark.testStream           1000000     1000  avgt   30  4121720,516 ±  60283,781  ns/op
MyBenchmark.testStream           1000000   100000  avgt   30  5920334,609 ±  63051,631  ns/op
MyBenchmark.testStream           1000000  1000000  avgt   30  6227476,270 ±  84066,493  ns/op

正如预期的那样，本地调用总是在前面（比循环版本快大约 2 倍）。

【讨论】：

对于非常大的数组大小有一点整数溢出：尝试repeat(new Object[2], (1 << 30) + 1)（byte[] 数组需要更少的内存，但您的实现不支持原始数组）。解决方法是对last 使用long 类型（并在必要时转换为int）。这也适用于您的other answer。
@Roman 你说得有道理，但我认为使用非常大数组的人应该意识到这种溢出的可能性。
@Flown 我不明白，你为什么不想修复代码？
@Roman 您是否总是在 for 循环中使用 long，因为它可能会溢出？
@Flown 如果我注意到有溢出的可能性，那么我通常会采取措施，无论是使用更长的类型还是一些显式的处理。

【解决方案2】：

您不需要在每个循环上都 Min，您可以在之后提取它。您还可以在每个循环中将偏移量加倍。

但是，我的似乎并没有明显更快，有时更快，有时不是，但它是 Min 调用的替代方案。

Benchmark                          Mode  Samples     Score  Score error  Units
c.e.MyBenchmark.testNativeCall     avgt       30  4027.890       26.544  ns/op
c.e.MyBenchmark.westonsRepeat      avgt       30  4035.817       49.168  ns/op

代码：

@Benchmark
public String[] westonsRepeat() {
    String[] sourceArray = TEST_ARRAY;
    int newLength = NEW_LENGTH;

    int offset = sourceArray.length;
    String[] dup = Arrays.copyOf(sourceArray, newLength);
    if (offset == 0 || offset >= newLength) return dup;

    int next = offset << 1;
    while (next < newLength) {
        System.arraycopy(dup, 0, dup, offset, offset);
        offset = next;
        next <<= 1;
    }
    System.arraycopy(dup, 0, dup, offset, newLength - offset);
    return dup;
}

我尝试的另一件事是计算我可以安全翻倍的次数，虽然这是一些死胡同，但我认为展示出来很有趣：

int timesCanDouble = (int)(log2(newLength) - log2(offset));
=>
int timesCanDouble = (int)(log2(newLength/offset));
=>
int timesCanDouble = 31 - numberOfLeadingZeros(newLength/offset));

【讨论】：

感谢您的努力。我有同样的想法，但编译器足够聪明，可以内联最小计算代码。所以不管你怎么做。