Merge two rows into one based on two identifiers (answers)

【Question title】: Merge two rows into one based on two identifiers
【Posted】: 2017-10-25 08:46:36
【Question description】:

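I have a dataset containing buy and sell price information for different products. However, instead of storing the bought price and the sold price on the same row, it stores them on two separate rows, identified by the Bought and Sold variables, as shown below.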

Product|Product Type|Price|Bought|Sold
---------------------------------------
Apples |   Green    |  2  |   0  |  1
---------------------------------------
Apples |   Green    |  1  |   1  |  0
---------------------------------------
Apples |   Red      |  3  |   0  |  1
---------------------------------------
Apples |   Red      |  4  |   1  |  0
---------------------------------------

I would like to merge the bought and sold prices into a single row, so that it looks something like this:

Product|Product Type|Bought Price|Sold Price
---------------------------------------------
Apples |   Green    |      1     |    2
---------------------------------------------
Apples |   Red      |      4     |    3

Here is the code that creates my sample dataset. Thanks in advance for your help.

Product <- c("Apples", "Apples", "Apples", "Apples", "Apples", "Apples",
             "Oranges", "Oranges", "Oranges", "Oranges", "Oranges", "Oranges",
             "Buscuits", "Buscuits", "Buscuits", "Buscuits", "Buscuits", "Buscuits")
ProductType <- c("Green", "Green", "Red", "Red", "Pink", "Pink",
                 "Big", "Big", "Medium", "Medium", "Small", "Small",
                 "Chocolate", "Chocolate", "Oat", "Oat", "Digestive", "Digestive")


Price <- c(2, 1, 3, 4, 1, 2,
           5, 3, 2, 1, 2, 3,
           6, 4, 1, 8, 6, 2)

Bought <- c(0, 1, 0, 1, 0, 1,
            0, 1, 0, 1, 0, 1,
            0, 1, 0, 1, 0, 1)

Sold <- c(1, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 0)

sales <- data.frame(Product, ProductType, Price, Bought, Sold)

【Comments】:

Is the 1 in Bought/Sold a boolean yes/no or a quantity indicator?
Try sales %>% group_by(Product, ProductType) %>% summarise(BoughtPrice = Price[Bought==1], SoldPrice = Price[Sold ==1])

Tags: r join merge row

【Solution 1】:

Using dplyr:

library(dplyr)

sales %>% 
  group_by(Product, ProductType) %>% 
  summarise(BoughtPrice = Price[ Bought == 1 ],
            SoldPrice = Price[ Sold == 1 ]) %>% 
  ungroup()
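
The subsetting above returns one value per group only if every (Product, ProductType) pair has exactly one bought row and one sold row. A minimal base R sketch of how that assumption could be checked first; the small `sales` data frame here is a stand-in for the question's data, and the names `counts` and `one_of_each` are made up for illustration:

```r
# Stand-in for the question's `sales` data frame (apples only).
sales <- data.frame(
  Product     = c("Apples", "Apples", "Apples", "Apples"),
  ProductType = c("Green", "Green", "Red", "Red"),
  Price       = c(2, 1, 3, 4),
  Bought      = c(0, 1, 0, 1),
  Sold        = c(1, 0, 1, 0)
)

# Count the bought rows and the sold rows in each (Product, ProductType) group.
counts <- aggregate(cbind(Bought, Sold) ~ Product + ProductType,
                    data = sales, FUN = sum)

# TRUE when every group has exactly one bought row and one sold row.
one_of_each <- all(counts$Bought == 1 & counts$Sold == 1)

print(counts)
print(one_of_each)
```

If `one_of_each` is FALSE, the groups with extra or missing rows probably need to be handled before summarising.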

【Discussion】:

【Solution 2】:

library(dplyr)
df <- data.frame(Product, ProductType, Price, Bought, Sold)
df %>% group_by(Product, ProductType) %>% 
  summarise(Bought_Price = sum(Price * Bought), 
            Sold_Price = sum(Sold * Price))

# A tibble: 9 x 4
# Groups:   Product [?]
#    Product ProductType Bought_Price Sold_Price
#     <fctr>      <fctr>        <dbl>      <dbl>
# 1   Apples       Green            1          2
# 2   Apples        Pink            2          1
# 3   Apples         Red            4          3
# 4 Buscuits   Chocolate            4          6
# 5 Buscuits   Digestive            2          6
# 6 Buscuits         Oat            8          1
# 7  Oranges         Big            3          5
# 8  Oranges      Medium            1          2
# 9  Oranges       Small            3          2
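
The sum works here because Bought and Sold are 0/1 flags: multiplying Price by the flag zeroes out the other row, so the group sum is just the flagged price. A small base R sketch of the same arithmetic, using only the apple rows from the question; the `group` vector and the result names are illustrative, not part of the original answer:

```r
# Apple rows from the question's data.
Price  <- c(2, 1, 3, 4)
Bought <- c(0, 1, 0, 1)
Sold   <- c(1, 0, 1, 0)
group  <- c("Green", "Green", "Red", "Red")

# Price * flag is 0 for the unflagged row, so the per-group sum
# equals the price of the single flagged row.
bought_price <- tapply(Price * Bought, group, sum)
sold_price   <- tapply(Price * Sold, group, sum)

print(bought_price)
print(sold_price)
```

Note that if a group ever had two bought rows, this would add their prices together, which may or may not be what is wanted.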

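【Discussion】:

Slightly different from the solutions the others proposed, which is nice :). Thank you very much for your reply.
When I posted my solution, I actually didn't realise there were two other solutions already. I guess they are almost the same :)

【Solution 3】:

With dplyr, we group by 'Product' and 'ProductType', then use summarise to create 'BoughtPrice' and 'SoldPrice' by subsetting 'Price' where 'Bought' or 'Sold' equals 1.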

library(dplyr)
sales %>% 
     group_by(Product, ProductType) %>% 
     summarise(BoughtPrice = Price[Bought==1], SoldPrice = Price[Sold ==1])

A similar approach with data.table is:

library(data.table)
setDT(sales)[, lapply(.SD, function(x) Price[x==1]),
                   .(Product, ProductType), .SDcols = Bought:Sold]
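
Since the question is tagged join/merge, the same result can probably also be reached in base R without extra packages: split the bought rows and the sold rows, then merge them on the two identifiers. This is a sketch, not one of the posted answers; the small `sales` data frame stands in for the question's data, and `bought`, `sold` and `wide` are illustrative names:

```r
# Stand-in for the question's `sales` data frame (apples only).
sales <- data.frame(
  Product     = c("Apples", "Apples", "Apples", "Apples"),
  ProductType = c("Green", "Green", "Red", "Red"),
  Price       = c(2, 1, 3, 4),
  Bought      = c(0, 1, 0, 1),
  Sold        = c(1, 0, 1, 0)
)

# Keep the identifiers and the price, separately for bought and sold rows.
bought <- sales[sales$Bought == 1, c("Product", "ProductType", "Price")]
sold   <- sales[sales$Sold == 1,   c("Product", "ProductType", "Price")]

# Rename the price column so the two sides stay distinct after the merge.
names(bought)[3] <- "BoughtPrice"
names(sold)[3]   <- "SoldPrice"

# Inner join on the two identifiers: one row per (Product, ProductType).
wide <- merge(bought, sold, by = c("Product", "ProductType"))
print(wide)
```

Unlike the subsetting inside summarise, a merge does not fail when a group has several bought or sold rows; it returns every combination, so duplicates would show up as extra rows.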

【Discussion】: