【发布时间】:2015-07-17 16:35:37
【问题描述】:
假设你有类似的数据
fruits <- data.table(FruitID=c(1,2,3), Fruit=c("Apple", "Banana", "Strawberry"))
colors <- data.table(ColorID=c(1,2,3,4,5), FruitID=c(1,1,1,2,3), Color=c("Red","Yellow","Green","Yellow","Red"))
tastes <- data.table(TasteID=c(1,2,3), FruitID=c(1,1,3), Taste=c("Sweeet", "Sour", "Sweet"))
setkey(fruits, "FruitID")
setkey(colors, "ColorID")
setkey(tastes, "TasteID")
fruits
FruitID Fruit
1: 1 Apple
2: 2 Banana
3: 3 Strawberry
colors
ColorID FruitID Color
1: 1 1 Red
2: 2 1 Yellow
3: 3 1 Green
4: 4 2 Yellow
5: 5 3 Red
tastes
TasteID FruitID Taste
1: 1 1 Sweeet
2: 2 1 Sour
3: 3 3 Sweet
我通常需要对这样的数据执行左外连接。例如,“给我所有的水果和它们的颜色”需要我写(也许有更好的方法?)
setkey(colors, "FruitID")
result <- colors[fruits, allow.cartesian=TRUE]
setkey(colors, "ColorID")
这样一个简单而频繁的任务,三行代码显得多余,所以我写了一个方法myLeftJoin
myLeftJoin <- function(tbl1, tbl2){
# Performs a left join using the key in tbl1 (i.e. keeps all rows from tbl1 and only matching rows from tbl2)
oldkey <- key(tbl2)
setkeyv(tbl2, key(tbl1))
result <- tbl2[tbl1, allow.cartesian=TRUE]
setkeyv(tbl2, oldkey)
return(result)
}
我可以像这样使用
myLeftJoin(fruits, colors)
ColorID FruitID Color Fruit
1: 1 1 Red Apple
2: 2 1 Yellow Apple
3: 3 1 Green Apple
4: 4 2 Yellow Banana
5: 5 3 Red Strawberry
如何扩展此方法,以便可以将任意数量的表传递给它并获得所有表的链式左外连接?类似myLeftJoin(tbl1, ...)
例如,我希望myleftJoin(fruits, colors, tastes) 的结果等于
setkey(colors, "FruitID")
setkey(tastes, "FruitID")
result <- tastes[colors[fruits, allow.cartesian=TRUE], allow.cartesian=TRUE]
setkey(tastes, "TasteID")
setkey(colors, "ColorID")
result
TasteID FruitID Taste ColorID Color Fruit
1: 1 1 Sweeet 1 Red Apple
2: 2 1 Sour 1 Red Apple
3: 1 1 Sweeet 2 Yellow Apple
4: 2 1 Sour 2 Yellow Apple
5: 1 1 Sweeet 3 Green Apple
6: 2 1 Sour 3 Green Apple
7: NA 2 NA 4 Yellow Banana
8: 3 3 Sweet 5 Red Strawberry
也许我错过了使用 data.table 包中的方法的优雅解决方案?谢谢
(编辑:修正了我的数据中的一个错误)
【问题讨论】:
标签: r data.table