【Question title】: How to fill in data frame column values by matching another data frame?
【Posted】: 2021-04-07 20:15:44
【Question description】:

Suppose I have a data frame with x and y coordinates, as shown below:

          x        y
1  3.984804 4.470310
2  3.985005 4.470310
3  3.985071 4.470310
4  3.985262 4.469213
5  3.985262 4.469213
6  3.985262 4.469213
7  3.985001 4.471442
8  3.985001 4.471759
9  3.984981 4.472782
10 3.985001 4.478800

dput output:

structure(list(x = c(3.98480399, 3.98500453380952, 3.98507138190476,
3.98526204428571, 3.98526204428571, 3.98526204428571, 3.98500133714286,
3.98500133714286, 3.98498099190476, 3.98500133714286), y = c(4.47030988428572,
4.47030988428572, 4.47030988428572, 4.46921270476191, 4.46921270476191,
4.46921270476191, 4.47144165047619, 4.47175932380952, 4.47278151761905,
4.47880045571429)), numFrames = 68418L, fps = 50, units = "mm", timeUnits = "s", row.names = c(NA,
10L), class = c("Trajectory", "data.frame"))

I also have another data frame with the following coordinates:

          x1       y1
1  0.1466667 3.053333
2  0.1466667 3.446667
3  0.1466667 3.753333
4  0.1933333 4.053333
5  0.2800000 4.400000
6  0.4066667 4.653333
7  0.5400000 4.920000
8  0.7133333 5.193333
9  0.8400000 5.366667
10 8.2133333 5.233333
11 8.3733333 5.066667
12 8.5133333 4.853333
13 8.6866667 4.613333
14 8.7933333 4.440000
15 8.9066667 4.180000
16 9.0066667 3.526667
17 9.1200000 3.513333
18 9.1533333 3.046667
19 9.1400000 2.880000

dput output:

structure(list(x1 = c(0.146666666666667, 0.146666666666667, 0.146666666666667,
0.193333333333333, 0.28, 0.406666666666667, 0.54, 0.713333333333333,
0.84, 8.21333333333333, 8.37333333333333, 8.51333333333333, 8.68666666666667,
8.79333333333333, 8.90666666666667, 9.00666666666667, 9.12, 9.15333333333333,
9.14), y1 = c(3.05333333333333, 3.44666666666667, 3.75333333333333,
4.05333333333333, 4.4, 4.65333333333333, 4.92, 5.19333333333333,
5.36666666666667, 5.23333333333333, 5.06666666666667, 4.85333333333333,
4.61333333333333, 4.44, 4.18, 3.52666666666667, 3.51333333333333,
3.04666666666667, 2.88)), class = "data.frame", row.names = c(NA,
-19L))

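I want to add a column to the first data frame. For each row, the new column should hold the y1 value from the row of the second data frame whose x1 is closest to that row's x.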

For example, the first row would be:

          x        y        y1
1  3.984804 4.470310 4.4653333

Row 6 of x1 in the second data frame is the closest to x in the first data frame, so its y1 value is added.

【Comments on the question】:

I think 9 0.8400000 5.366667 in the second data set is the closest to 3.984804, not the 6 0.4066667 4.653333 you chose. Or should some of the x values in the second data set be 10 times higher?
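The comment's claim can be checked directly. A minimal sketch, using only the x1 and y1 values posted in the question; the names `x1`, `y1`, `target`, and `closest` are chosen here for illustration:

```r
# x1 and y1 values copied from the second data frame in the question
x1 <- c(0.146666666666667, 0.146666666666667, 0.146666666666667,
        0.193333333333333, 0.28, 0.406666666666667, 0.54, 0.713333333333333,
        0.84, 8.21333333333333, 8.37333333333333, 8.51333333333333,
        8.68666666666667, 8.79333333333333, 8.90666666666667,
        9.00666666666667, 9.12, 9.15333333333333, 9.14)
y1 <- c(3.05333333333333, 3.44666666666667, 3.75333333333333,
        4.05333333333333, 4.4, 4.65333333333333, 4.92, 5.19333333333333,
        5.36666666666667, 5.23333333333333, 5.06666666666667, 4.85333333333333,
        4.61333333333333, 4.44, 4.18, 3.52666666666667, 3.51333333333333,
        3.04666666666667, 2.88)

# x of the first row of the first data frame
target <- 3.98480399

# index of the x1 value with the smallest absolute distance to target
closest <- which.min(abs(target - x1))
closest      # 9
x1[closest]  # 0.84
y1[closest]  # 5.366667
```

With the data as posted, row 9 is the nearest match for the first row, which agrees with the comment rather than with the row 6 named in the question.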

Tags: r

【Solution 1】:

d1 <- structure(list(x = c(3.98480399, 3.98500453380952, 3.98507138190476,
3.98526204428571, 3.98526204428571, 3.98526204428571, 3.98500133714286,
3.98500133714286, 3.98498099190476, 3.98500133714286), y = c(4.47030988428572,
4.47030988428572, 4.47030988428572, 4.46921270476191, 4.46921270476191,
4.46921270476191, 4.47144165047619, 4.47175932380952, 4.47278151761905,
4.47880045571429)), numFrames = 68418L, fps = 50, units = "mm", timeUnits = "s", row.names = c(NA,
10L), class = c("Trajectory", "data.frame"))

d2 <- structure(list(x1 = c(0.146666666666667, 0.146666666666667, 0.146666666666667,
0.193333333333333, 0.28, 0.406666666666667, 0.54, 0.713333333333333,
0.84, 8.21333333333333, 8.37333333333333, 8.51333333333333, 8.68666666666667,
8.79333333333333, 8.90666666666667, 9.00666666666667, 9.12, 9.15333333333333,
9.14), y1 = c(3.05333333333333, 3.44666666666667, 3.75333333333333,
4.05333333333333, 4.4, 4.65333333333333, 4.92, 5.19333333333333,
5.36666666666667, 5.23333333333333, 5.06666666666667, 4.85333333333333,
4.61333333333333, 4.44, 4.18, 3.52666666666667, 3.51333333333333,
3.04666666666667, 2.88)), class = "data.frame", row.names = c(NA,
-19L))

## calculate diff from all of x to all of x1:
dm <- abs(outer( d1$x, d2$x1, FUN="-" ))

## find closest per row:
i.closest <- apply( dm, 1, which.min )

d1$y1 <- d2$y1[i.closest]
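To make the intermediate steps visible, here is a small sketch of the same `outer` + `which.min` approach. The data frames `a` and `b` are toy data invented for illustration and are not from the question:

```r
# Toy data, invented for illustration (not from the question)
a <- data.frame(x = c(9, 1, 6))
b <- data.frame(x1 = c(0, 5, 10), y1 = c(10, 20, 30))

# Absolute distance matrix: one row per a$x, one column per b$x1
dm <- abs(outer(a$x, b$x1, FUN = "-"))
dm
#      [,1] [,2] [,3]
# [1,]    9    4    1
# [2,]    1    4    9
# [3,]    6    1    4

# For each row, the column index of the smallest distance
idx <- apply(dm, 1, which.min)
idx  # 3 1 2

# Look up the matching y1 values
a$y1 <- b$y1[idx]
a$y1  # 30 10 20
```

Note that `outer` builds a full `length(x)` by `length(x1)` matrix, so memory use grows with the product of the two lengths. That is fine for data of the size shown in the question but may matter for very long vectors.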


【Solution 2】:

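You can try max.col + outer, as shown below. Negating the absolute differences turns the row-wise minimum distance into a row-wise maximum, which is what max.col looks for.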

d1$y1 <- d2$y1[max.col(-abs(outer(d1$x, d2$x1, "-")))]
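One caveat worth knowing: `max.col` breaks ties at random by default (`ties.method = "random"`), whereas `which.min` always returns the first minimum. If two x1 values can be equally close to an x, passing `ties.method = "first"` should make the result reproducible. A minimal sketch with toy data invented for illustration (not from the question):

```r
# Toy data, invented for illustration: x = 5 is equally far from x1 = 4 and x1 = 6
x  <- c(5, 0)
x1 <- c(4, 6, 1)
y1 <- c("a", "b", "c")

# Negated distances, so the closest x1 has the largest value in each row
neg_dist <- -abs(outer(x, x1, "-"))

# ties.method = "first" picks the first of several tied columns,
# which matches what which.min would return on the distances
idx <- max.col(neg_dist, ties.method = "first")
idx      # 1 3
y1[idx]  # "a" "c"
```

For the data in the question this does not change the outcome, since no x value there is equidistant from two different nearest x1 values.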
