【Question title】: Add a new row on the basis of column values in R
【Posted】: 2021-08-26 08:05:16
【Question】:

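I'm trying to get my head around this simple preprocessing task in R. I want to turn the ideal-value columns into rows labelled "Ideal" in the Product ID column. I think the example below makes this clearer.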

> dput(df)

structure(list(Consumer = c(43L, 43L, 43L, 43L, 43L, 41L, 41L, 
41L, 41L, 41L), Product = c(106L, 992L, 366L, 257L, 548L, 106L, 
992L, 366L, 257L, 548L), Firm = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 
0L, 0L, 0L), Juicy = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L
), Sweet = c(0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L), Ideal_Firm = c(1L, 
1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L), Ideal_Juicy = c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Ideal_Sweet = c(1L, 1L, 1L, 
1L, 1L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-10L))

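【Comments】:

  • Please use dput() and post a sample of your data frame. It will make things easier.
  • Hi, thanks for pointing that out. I've updated the question with the dput() output.

Tags: r dataframe data-science preprocessor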


【Solution 1】:

Here is one solution:

# Fix the seed so the randomly generated example is reproducible
set.seed(1)
df <- data.frame(
  Consumer = c(rep(43, 5), rep(41, 5)),
  Product = rep(sample(100:900, size = 5, replace = FALSE), 2),
  Firm = sample(rep(0:1, 5), replace = TRUE),
  Juicy = sample(rep(0:1, 5), replace = TRUE),
  Sweet = sample(rep(0:1, 5), replace = TRUE),
  Ideal_Firm = 1,
  # Ideal values are constant within a consumer but differ between consumers
  Ideal_Juicy = c(rep(1, 5), rep(2, 5)),
  Ideal_Sweet = c(rep(1, 5), rep(0, 5))
)

library(dplyr)
library(tidyr)  # provides pivot_wider(), pivot_longer() and separate()
df <- merge(
  # Bind the observation...
  df %>% select(Consumer:Sweet) %>% 
    pivot_wider(id_cols = Consumer,names_from = Product,values_from = Firm:Sweet),
  # ... to the ideal
  df %>% group_by(Consumer) %>% 
    # Here I use mean(), but median(), min() or max() would work equally well,
    # since the ideal values are the same within each consumer.
    summarise(across(Ideal_Firm:Ideal_Sweet, ~mean(.x))) %>%
    # Rename Ideal_[characteristic] to [characteristic]_Ideal: strip the prefix
    # and add it back as a suffix, so "Ideal" lands in the Product position
    # when the names are split later. rename_with() replaces the deprecated
    # rename_at()/funs() combination.
    rename_with(~ paste0(sub("^Ideal_", "", .x), "_Ideal"), starts_with("Ideal_"))
)

# Then manipulate to get into long form again
df <- df %>% pivot_longer(cols = !Consumer) %>%
  separate(name, c("Characteristic", "Product")) %>%
  pivot_wider(id_cols = Consumer:Product, names_from = Characteristic, values_from = value)
df       
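
For comparison, the same shape can probably be reached more directly on the data from the question by building the "Ideal" rows separately and stacking them under the observed rows. This is only a minimal sketch, not the method used in the answer above. It assumes the ideal values are constant within each Consumer (as stated in the comments), and the names `observed`, `ideal` and `result` are illustrative.

```r
library(dplyr)

# Sample data, equivalent to the dput() output in the question
df <- data.frame(
  Consumer = rep(c(43L, 41L), each = 5),
  Product = rep(c(106L, 992L, 366L, 257L, 548L), 2),
  Firm = rep(c(1L, 0L), each = 5),
  Juicy = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L),
  Sweet = c(0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L),
  Ideal_Firm = rep(c(1L, 0L), each = 5),
  Ideal_Juicy = 1L,
  Ideal_Sweet = rep(c(1L, 0L), each = 5)
)

# Observed rows: Product becomes character so it can share a column with "Ideal"
observed <- df %>%
  select(Consumer, Product, Firm, Juicy, Sweet) %>%
  mutate(Product = as.character(Product))

# One "Ideal" row per consumer (assumes ideals are constant within a consumer),
# with Ideal_* renamed to the plain characteristic names
ideal <- df %>%
  distinct(Consumer, Ideal_Firm, Ideal_Juicy, Ideal_Sweet) %>%
  rename_with(~ sub("^Ideal_", "", .x), starts_with("Ideal_")) %>%
  mutate(Product = "Ideal")

# Stack the two pieces; the second sort key puts each consumer's "Ideal" row last
result <- bind_rows(observed, ideal) %>%
  arrange(desc(Consumer), Product == "Ideal")
result
```

Because `bind_rows()` matches columns by name, the `Ideal` rows line up with `Firm`, `Juicy` and `Sweet` without any pivoting. If the ideals were not constant within a consumer, `distinct()` would return more than one row per consumer, which is a useful warning sign.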

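【Comments】:

  • Hi Rosalie, excellent answer, just one thing is missing: the ideal values are the same within each consumer but do differ across consumers. I was wondering whether some condition could handle that.
  • Hi Inshal, I modified the input dataset to have different "ideals" and it still works! I use group_by(Consumer) when building the ideal dataset, so it gives a different mean() for each consumer. See the edited answer above. Did I misunderstand your question?
  • This looks good. I'll see whether it works on my main data frame. Thanks a lot, Rosalie!!
  • You're welcome! I added a few lines that change the prefix "Ideal_" to the suffix "_Ideal", so it matches your dataset's column names.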
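
The point made in the comments about group_by(Consumer) can be checked on a small example. The sketch below uses made-up toy values (not the asker's data) in which the ideals are constant within a consumer but differ between consumers. It first counts the distinct ideal values per consumer, then shows that the grouped mean() returns each consumer's own ideal.

```r
library(dplyr)

# Toy data (hypothetical values): ideals are constant within a consumer
# but differ between consumers
toy <- data.frame(
  Consumer = rep(c(43, 41), each = 3),
  Ideal_Firm = rep(c(1, 0), each = 3),
  Ideal_Sweet = rep(c(1, 0), each = 3)
)

# Number of distinct ideal values per consumer; every count should be 1
toy %>%
  group_by(Consumer) %>%
  summarise(across(starts_with("Ideal_"), n_distinct))

# When that holds, the grouped mean is simply each consumer's ideal value
ideal_by_consumer <- toy %>%
  group_by(Consumer) %>%
  summarise(across(starts_with("Ideal_"), mean))
ideal_by_consumer
```

If any count in the first summary is greater than 1, mean() would blend different ideal values for that consumer, and the choice of summary function would start to matter.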