Attempting to randomize a data frame twice and add both of those samples to a new data frame: answers

【Question title】: Attempting to randomize a data frame twice and add both of those samples to a new data frame
【Posted】: 2016-08-17 02:24:41
【Question description】:

So I have a data frame:

> MLSpredictions
        fit    se.fit residual.scale      upr      lwr
1  1.392213 0.1476321              1 1.681572 1.102854
2  1.448370 0.1709856              1 1.783501 1.113238
3  1.392213 0.1476321              1 1.681572 1.102854
4  1.448370 0.1709856              1 1.783501 1.113238
5  1.448370 0.1709856              1 1.783501 1.113238
6  1.448370 0.1709856              1 1.783501 1.113238
7  1.506792 0.1969097              1 1.892734 1.120849
8  1.506792 0.1969097              1 1.892734 1.120849
9  1.567570 0.2253572              1 2.009270 1.125870
10 1.567570 0.2253572              1 2.009270 1.125870
11 1.630800 0.2563338              1 2.133214 1.128386
12 1.448370 0.1709856              1 1.783501 1.113238
13 1.448370 0.1709856              1 1.783501 1.113238
14 1.448370 0.1709856              1 1.783501 1.113238
15 1.506792 0.1969097              1 1.892734 1.120849
16 1.567570 0.2253572              1 2.009270 1.125870
17 1.567570 0.2253572              1 2.009270 1.125870
18 1.567570 0.2253572              1 2.009270 1.125870
19 1.567570 0.2253572              1 2.009270 1.125870

I want to sample the whole data frame twice and add both samples to a new data frame, MLSSeason.

My attempt was:

MLSSeason[1:19] = sample(MLSpredictions)
MLSSeason[20:38] = sample(MLSpredictions)

But this does not give me the right result. Ideally, MLSSeason would have 38 rows, with each row of MLSpredictions sampled twice.

【Comments】:

MLSSeason = sample(MLSpredictions) works, but I run into trouble when I try to double the season in MLSSeason.
Use MLSpredictions[sample(nrow(MLSpredictions)),]
df[ceiling(sample(1:(nrow(df)*2))/2),] should work, where df is MLSpredictions. It gives you row names that indicate which rows are repeats; for example, row 4.1 is the repeated sample of row 4.
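@sayaa That works. Thanks a lot. Could you explain what that code is doing? I see that we sample from 1:38 and then divide by 2. I'm not sure how that gives us the solution.
We multiply by 2 to get 38 rows, and divide by 2 so that we index the correct rows in the original data frame.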

Tags: r dataframe sampling

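【Solution 1】:

You can't usefully pass a data frame to sample. It does not raise an error, but it does not shuffle the rows either: a data frame is treated as a list of columns, so only the column order is permuted and the rows come back in their original order. Instead, you should generate row indices.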

MLSSeason <- MLSpredictions[c(sample(nrow(MLSpredictions)), sample(nrow(MLSpredictions))), ]

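Note that this is not equivalent to:

MLSpredictions[sample(nrow(MLSpredictions)),]

A single permutation returns each row exactly once (19 rows), so it cannot contain repeated rows.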
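The stacking of two permutations can be checked on a small example. This is a minimal sketch on a toy data frame; the names df, idx, and season are hypothetical and the values are made up for illustration.

```r
# Toy stand-in for MLSpredictions (hypothetical data, for illustration only)
df <- data.frame(id = 1:5, fit = c(1.39, 1.45, 1.51, 1.57, 1.63))

set.seed(1)  # only to make the shuffle reproducible
n <- nrow(df)

# Two independent permutations of the row indices, stacked end to end
idx <- c(sample(n), sample(n))
season <- df[idx, ]

nrow(season)                      # 2 * n rows
table(season$id)                  # every id appears exactly twice
sort(season$id[1:n])              # the first block holds each row once
sort(season$id[(n + 1):(2 * n)])  # and so does the second block
```

Because each half is its own permutation, every block of n rows contains each original row exactly once.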

【Comments】:

【Solution 2】:

If you pass a data frame to sample, it samples the columns of the data frame, not the rows.
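A short sketch of this behaviour, using a toy data frame (df, shuffled_cols, and shuffled_rows are hypothetical names, not from the question):

```r
# Toy data frame (hypothetical, for illustration only)
df <- data.frame(a = 1:3, b = c(10, 20, 30), c = c("x", "y", "z"))

set.seed(42)  # only for reproducibility
shuffled_cols <- sample(df)

# Same columns, possibly in a different order; the rows are untouched
setequal(names(shuffled_cols), names(df))  # TRUE
identical(shuffled_cols$a, df$a)           # TRUE: row order unchanged

# To shuffle rows instead, permute the row indices
shuffled_rows <- df[sample(nrow(df)), ]
```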

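The following code samples every row twice and lets you tell which occurrence of a row is the first and which is the second.

MLSpredictions[ceiling(sample(1:(nrow(MLSpredictions)*2))/2),]

ceiling() is needed because R truncates fractional indices toward zero, so an index of 0.5 would otherwise become 0 and select nothing. The result has informative row names; for example, 11.1 is the second occurrence of row 11.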

          fit    se.fit residual.scale      upr      lwr
16   1.567570 0.2253572              1 2.009270 1.125870
5    1.448370 0.1709856              1 1.783501 1.113238
11   1.630800 0.2563338              1 2.133214 1.128386
15   1.506792 0.1969097              1 1.892734 1.120849
1    1.392213 0.1476321              1 1.681572 1.102854
12   1.448370 0.1709856              1 1.783501 1.113238
11.1 1.630800 0.2563338              1 2.133214 1.128386
7    1.506792 0.1969097              1 1.892734 1.120849
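The index arithmetic behind dividing a shuffled 1:(2*n) by 2 can be checked on its own. In this small sketch (n, k, df, and twice are illustrative names), note that R truncates fractional indices toward zero, so a bare k / 2 maps 1 to 0.5, which becomes index 0 and is silently dropped; rounding up with ceiling() yields each row index exactly twice.

```r
n <- 4
k <- 1:(2 * n)

as.integer(k / 2)   # 0 1 1 2 2 3 3 4 : the 0 selects nothing
ceiling(k / 2)      # 1 1 2 2 3 3 4 4 : each row index exactly twice

# Toy data frame (hypothetical, for illustration only)
df <- data.frame(id = 1:n)
set.seed(7)  # only for reproducibility
twice <- df[ceiling(sample(k) / 2), , drop = FALSE]

nrow(twice)         # 2 * n
table(twice$id)     # every id appears exactly twice
rownames(twice)     # repeats get a suffix, e.g. "3" and "3.1"
```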

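If you want the samples to come in blocks, e.g. to guarantee that every row is sampled exactly once within each block of 19 rows, then @ZheyuanLi has provided the ideal answer. If not, my answer may suit you best.

【Comments】: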