[Question title]: Merge two rows into one based on two identifiers
[Posted]: 2017-10-25 08:46:36
[Question description]:

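I have a dataset containing buy and sell price information for different products. However, instead of storing the bought price and the sold price in the same row, it stores them in two separate rows, which are identified by the Bought and Sold indicator variables, as shown below.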

Product|Product Type|Price|Bought|Sold
---------------------------------------
Apples |   Green    |  2  |   0  |  1
---------------------------------------
Apples |   Green    |  1  |   1  |  0
---------------------------------------
Apples |   Red      |  3  |   0  |  1
---------------------------------------
Apples |   Red      |  4  |   1  |  0
---------------------------------------

I would like to merge the bought and sold prices into a single row, so that it looks something like this:

Product|Product Type|Bought Price|Sold Price
---------------------------------------------
Apples |   Green    |      1     |    2
---------------------------------------------
Apples |   Red      |      4     |    3

Here is the code that creates my sample dataset. Thanks in advance for your help.

Product <- c("Apples", "Apples", "Apples", "Apples", "Apples", "Apples",
             "Oranges", "Oranges", "Oranges", "Oranges", "Oranges", "Oranges",
             "Buscuits", "Buscuits", "Buscuits", "Buscuits", "Buscuits", "Buscuits")
ProductType <- c("Green", "Green", "Red", "Red", "Pink", "Pink",
                 "Big", "Big", "Medium", "Medium", "Small", "Small",
                 "Chocolate", "Chocolate", "Oat", "Oat", "Digestive", "Digestive")


Price <- c(2, 1, 3, 4, 1, 2,
           5, 3, 2, 1, 2, 3,
           6, 4, 1, 8, 6, 2)

Bought <- c(0, 1, 0, 1, 0, 1,
            0, 1, 0, 1, 0, 1,
            0, 1, 0, 1, 0, 1)

Sold <- c(1, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 0,
          1, 0, 1, 0, 1, 0)

sales <- data.frame(Product, ProductType, Price, Bought, Sold)

[Comments]:

  • Is the 1 in Bought/Sold a boolean yes/no, or does it indicate a quantity?
  • Try sales %>% group_by(Product, ProductType) %>% summarise(BoughtPrice = Price[Bought == 1], SoldPrice = Price[Sold == 1])

Tags: r join merge row


[Solution 1]:

Using dplyr:

library(dplyr)

sales %>% 
  group_by(Product, ProductType) %>% 
  summarise(BoughtPrice = Price[ Bought == 1 ],
            SoldPrice = Price[ Sold == 1 ]) %>% 
  ungroup()
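
For comparison, the same reshaping can be sketched in base R without dplyr. This is only an illustration, not part of the original answer: it assumes each Product/ProductType pair has exactly one bought row and one sold row, uses only the Apples rows of the sample data, and the names `keys`, `bought`, `sold` and `wide` are made up for the example.

```r
# Apples rows of the question's sample data
sales <- data.frame(
  Product     = rep("Apples", 6),
  ProductType = c("Green", "Green", "Red", "Red", "Pink", "Pink"),
  Price       = c(2, 1, 3, 4, 1, 2),
  Bought      = c(0, 1, 0, 1, 0, 1),
  Sold        = c(1, 0, 1, 0, 1, 0)
)

keys <- c("Product", "ProductType")

# Split into the bought rows and the sold rows, keeping only keys and price
bought <- sales[sales$Bought == 1, c(keys, "Price")]
sold   <- sales[sales$Sold == 1, c(keys, "Price")]
names(bought)[3] <- "BoughtPrice"
names(sold)[3]   <- "SoldPrice"

# Join the two halves back together on the two identifiers
wide <- merge(bought, sold, by = keys)
print(wide)
# Green: BoughtPrice 1, SoldPrice 2; Red: BoughtPrice 4, SoldPrice 3
```

If a pair has more than one bought or sold row, `merge` returns every combination of them, so the one-row-per-pair assumption matters here as well.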

[Comments]:

    [Solution 2]:
    library(dplyr)
    df <- data.frame(Product, ProductType, Price, Bought, Sold)
    df %>% group_by(Product, ProductType) %>% 
      summarise(Bought_Price = sum(Price * Bought), 
                Sold_Price = sum(Sold * Price))
    
    # A tibble: 9 x 4
    # Groups:   Product [?]
    #    Product ProductType Bought_Price Sold_Price
    #     <fctr>      <fctr>        <dbl>      <dbl>
    # 1   Apples       Green            1          2
    # 2   Apples        Pink            2          1
    # 3   Apples         Red            4          3
    # 4 Buscuits   Chocolate            4          6
    # 5 Buscuits   Digestive            2          6
    # 6 Buscuits         Oat            8          1
    # 7  Oranges         Big            3          5
    # 8  Oranges      Medium            1          2
    # 9  Oranges       Small            3          2
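
The `sum(Price * Bought)` form works because the 0/1 indicator zeroes out every row that does not match. The sketch below, on a made-up group with two bought rows (not data from the question), shows where it differs from subsetting with `Price[Bought == 1]`: the sum silently adds the prices together, while subsetting keeps them as separate values.

```r
# Hypothetical group with two bought rows and one sold row
grp <- data.frame(
  Price  = c(2, 1, 5),
  Bought = c(0, 1, 1),
  Sold   = c(1, 0, 0)
)

# Indicator product: rows with Bought == 0 contribute 0 to the sum
bought_sum <- sum(grp$Price * grp$Bought)    # 1 + 5 = 6

# Subsetting: each matching price is kept separately
bought_sub <- grp$Price[grp$Bought == 1]     # 1 5

print(bought_sum)
print(bought_sub)
```

With exactly one bought and one sold row per group, as in the sample data, both forms give the same result.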
    

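    [Comments]:

    • Slightly different from the solutions others proposed, which is nice :). Thank you very much for your reply.
    • When I posted my solution, I did not actually realize there were already two other solutions. I guess they are almost the same :)
    [Solution 3]: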

    With dplyr, we group by 'Product' and 'ProductType', then use summarise to create 'BoughtPrice' and 'SoldPrice' by subsetting 'Price' where 'Bought' or 'Sold' is 1.

    library(dplyr)
    sales %>% 
         group_by(Product, ProductType) %>% 
         summarise(BoughtPrice = Price[Bought==1], SoldPrice = Price[Sold ==1])
    

    A similar approach with data.table is:

    library(data.table)
    setDT(sales)[, lapply(.SD, function(x) Price[x==1]),
                       .(Product, ProductType), .SDcols = Bought:Sold]
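
Both versions above rely on every Product/ProductType group having exactly one row with Bought == 1 and one with Sold == 1. As a rough base R sketch of how that assumption could be checked beforehand (the data below is made up so that one group violates it, and `counts` and `unbalanced` are illustrative names only):

```r
# Hypothetical data: the Red group has two bought rows
sales <- data.frame(
  Product     = c("Apples", "Apples", "Apples", "Apples", "Apples"),
  ProductType = c("Green", "Green", "Red", "Red", "Red"),
  Price       = c(2, 1, 3, 4, 5),
  Bought      = c(0, 1, 0, 1, 1),
  Sold        = c(1, 0, 1, 0, 0)
)

# Count bought and sold rows per Product/ProductType group
counts <- aggregate(cbind(Bought, Sold) ~ Product + ProductType,
                    data = sales, FUN = sum)

# Groups that do not have exactly one bought and one sold row
unbalanced <- counts[counts$Bought != 1 | counts$Sold != 1, ]
print(unbalanced)
```

An empty `unbalanced` data frame suggests the subsetting approach will return one row per group.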
    

    [Comments]:
