Group by one column if it contains a string and get the maximum values of another column in R (answers)

【Question title】: Group by one column if it contains a string and get the maximum values of another column in R
【Posted】: 2021-10-05 16:40:29
【Question】:

Given the following data frame:

df <- structure(list(city = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 
1L), .Label = c("bj", "sh"), class = "factor"), type = structure(c(3L, 
1L, 3L, 1L, 4L, 2L, 4L, 2L), .Label = c("buy_area", "buy_price", 
"sale_area", "sale_price"), class = "factor"), value = c(1200L, 
800L, 1900L, 1500L, 15L, 10L, 17L, 9L)), class = "data.frame", row.names = c(NA, 
-8L))


How can I get the maximum of the value column for two groups of type: the types containing area and the types containing price?

The expected result is two values: 1900 for area and 17 for price.

To group by type and get the maximum value per group, we can use:

library(plyr)
ddply(df, .(type), summarise, max.value = max(value))
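For comparison, the same two numbers can be computed in base R without any grouping package. This is a minimal sketch, not taken from any of the answers below; it rebuilds df with data.frame() instead of dput() and assumes that "contains" means plain substring matching (fixed = TRUE).

```r
# Rebuild the question's data (same values and factor levels as the dput() above).
df <- data.frame(
  city  = c("bj", "bj", "sh", "sh", "bj", "bj", "bj", "bj"),
  type  = c("sale_area", "buy_area", "sale_area", "buy_area",
            "sale_price", "buy_price", "sale_price", "buy_price"),
  value = c(1200L, 800L, 1900L, 1500L, 15L, 10L, 17L, 9L),
  stringsAsFactors = TRUE
)

# For each substring, select the rows whose type contains it, then take the max.
patterns <- c("area", "price")
max_by_pattern <- sapply(patterns, function(p) {
  max(df$value[grepl(p, df$type, fixed = TRUE)])
})
max_by_pattern  # named vector: area = 1900, price = 17
```

A row whose type contained both words would count toward both groups here, which is usually what "contains" should mean.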

Update: output of @det's solution:

【Question discussion】:

Tags: r dplyr tidyverse plyr

【Solution 1】:

Try this:

library(dplyr); library(tidyr)
df %>% separate(type, c("type", "area")) %>% group_by(area) %>% filter(value == max(value, na.rm = TRUE))
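The names passed to separate() are only labels: with its default separator, the part before the underscore goes into the first name and the part after it into the second, so the column called area above actually holds both "area" and "price". A base R sketch of the same two steps follows, using the hypothetical names side and measure to avoid that confusion and assuming every type value contains exactly one underscore.

```r
# Rebuild the question's data.
df <- data.frame(
  city  = c("bj", "bj", "sh", "sh", "bj", "bj", "bj", "bj"),
  type  = c("sale_area", "buy_area", "sale_area", "buy_area",
            "sale_price", "buy_price", "sale_price", "buy_price"),
  value = c(1200L, 800L, 1900L, 1500L, 15L, 10L, 17L, 9L),
  stringsAsFactors = TRUE
)

# Split "sale_area" into "sale" and "area" at the underscore
# (assumes exactly one underscore per value).
parts <- do.call(rbind, strsplit(as.character(df$type), "_", fixed = TRUE))
df$side    <- parts[, 1]  # "sale" or "buy"
df$measure <- parts[, 2]  # "area" or "price"

# Counterpart of group_by(measure) + filter(value == max(value)):
# keep every row equal to its group's maximum (ties would keep several rows).
is_group_max <- df$value == ave(df$value, df$measure, FUN = max)
result <- df[is_group_max, c("city", "type", "measure", "value")]
result
```

Because filter() keeps whole rows, this approach also shows which city and which side produced each maximum, unlike the summarise() answers.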

【Discussion】:

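Should it be c("type","area") or c("price","area")?
The data frame I got has sale_area, buy_area, etc. Since you only need the maximum values, I think the type column is irrelevant, so I separated it into two parts.
While this code snippet may solve the problem, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
Duly noted, thanks @gerhard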

【Solution 2】:

Update: this one is more concise (a small modification of Ronak Shah's answer):

library(dplyr)
library(tidyr)
df %>%
    separate(type, c("sale_buy", "area_price")) %>%
    group_by(area_price) %>%
    summarise(max = max(value))

Output:

  area_price   max
  <chr>      <int>
1 area        1900
2 price         17
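In base R, the same separate-then-summarise idea can be sketched with sub() and aggregate(). This assumes the grouping word is everything after the first underscore; area_price is reused as the column name only to mirror the answer above.

```r
# Rebuild the question's data.
df <- data.frame(
  city  = c("bj", "bj", "sh", "sh", "bj", "bj", "bj", "bj"),
  type  = c("sale_area", "buy_area", "sale_area", "buy_area",
            "sale_price", "buy_price", "sale_price", "buy_price"),
  value = c(1200L, 800L, 1900L, 1500L, 15L, 10L, 17L, 9L),
  stringsAsFactors = TRUE
)

# Drop everything up to and including the first underscore: "sale_area" -> "area".
df$area_price <- sub("^[^_]*_", "", as.character(df$type))

# One row per group holding that group's maximum, like group_by() + summarise().
result <- aggregate(value ~ area_price, data = df, FUN = max)
names(result)[2] <- "max"
result
```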

First answer: one approach could be:

library(dplyr)
df %>% 
    group_by(type) %>% 
    summarise(max = max(value)) %>% 
    filter(grepl("sale", type))

Output:

  type         max
  <fct>      <int>
1 sale_area   1900
2 sale_price    17

【Discussion】:

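I think we should grepl for area and price rather than sale, shouldn't we?
With filter(grepl("sale", type)), we keep only the rows whose type contains the string "sale". I think this is the expected output. If you remove the filter line, you may see why. Anyway, you can adjust it as needed. Nice question, by the way (upvoted!)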
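The filter(grepl("sale", type)) version returns 1900 and 17 here only because the sale_* rows happen to hold the largest values, which is the concern raised in the first comment. The base R sketch below mirrors that pipeline on a copy of the data with one hypothetical change (the first buy_area value raised to 2000) and compares it with grouping by the substring.

```r
# Rebuild the question's data.
df <- data.frame(
  city  = c("bj", "bj", "sh", "sh", "bj", "bj", "bj", "bj"),
  type  = c("sale_area", "buy_area", "sale_area", "buy_area",
            "sale_price", "buy_price", "sale_price", "buy_price"),
  value = c(1200L, 800L, 1900L, 1500L, 15L, 10L, 17L, 9L),
  stringsAsFactors = TRUE
)

# Hypothetical change: make a buy_area row the largest area value.
df2 <- df
df2$value[2] <- 2000L  # row 2 is bj / buy_area

# Mirror of group_by(type) + summarise(max) + filter(grepl("sale", type)).
max_by_type <- tapply(df2$value, df2$type, max)
sale_only <- max_by_type[grepl("sale", names(max_by_type))]
sale_only     # sale_area = 1900, sale_price = 17: the 2000 is missed

# Grouping by the substring instead picks up the buy_area row.
by_substring <- sapply(c("area", "price"), function(p) {
  max(df2$value[grepl(p, df2$type, fixed = TRUE)])
})
by_substring  # area = 2000, price = 17
```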

【Solution 3】:

Split the type column into two columns, then take the maximum per group.

library(dplyr)
library(tidyr)

df %>%
  separate(type, c('type', 'col'), sep = '_') %>%
  group_by(col) %>%
  summarise(value = max(value, na.rm = TRUE))

#  col   value
#  <chr> <int>
#1 area   1900
#2 price    17

You can also extract 'area' or 'price' from type and use it as the grouping column.

df %>%
  group_by(type = stringr::str_extract(type, 'area|price')) %>%
  summarise(value = max(value, na.rm = TRUE))

【Discussion】:

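Thanks, but I want to use area and price in the code, because my real data has no _ separator. :(
In that case, you can extract the area or price value from the type column. See the updated answer.
Very concise and nice answer, thanks.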
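For data without the _ separator, base R's regexpr() and regmatches() can stand in for stringr::str_extract(). A minimal sketch, using hypothetical type values such as salearea and assuming each value contains area or price at most once:

```r
# Hypothetical data: same values, but no separator inside type.
df_nosep <- data.frame(
  type  = c("salearea", "buyarea", "salearea", "buyarea",
            "saleprice", "buyprice", "saleprice", "buyprice"),
  value = c(1200L, 800L, 1900L, 1500L, 15L, 10L, 17L, 9L),
  stringsAsFactors = FALSE
)

# regexpr() locates the first "area" or "price" in each string (-1 if none);
# regmatches() returns only the matched pieces, so fill the key by position
# and leave NA where nothing matched.
hit <- regexpr("area|price", df_nosep$type)
key <- rep(NA_character_, nrow(df_nosep))
key[hit != -1] <- regmatches(df_nosep$type, hit)

# tapply() drops NA keys, so the result has one entry per extracted word.
result <- tapply(df_nosep$value, key, max)
result  # area = 1900, price = 17
```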

【Solution 4】:

Create a column that classifies type as area or price, then group by that column:

library(dplyr)
library(stringr)

df %>%
  mutate(
    type2 = case_when(
      str_detect(type, "_area$") ~ "area",
      str_detect(type, "_price$") ~ "price",
      TRUE ~ NA_character_
    )
  ) %>%
  group_by(type2) %>%
  summarise(max_value = max(value))

Output:

  type2 max_value
  <chr>     <int>
1 area       1900
2 price        17
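The case_when() rules can also be written in base R with endsWith() and nested ifelse(). A minimal sketch assuming the same suffixes _area and _price; as with case_when(), unmatched types become NA, and the formula interface of aggregate() drops those rows by default.

```r
# Rebuild the question's data.
df <- data.frame(
  city  = c("bj", "bj", "sh", "sh", "bj", "bj", "bj", "bj"),
  type  = c("sale_area", "buy_area", "sale_area", "buy_area",
            "sale_price", "buy_price", "sale_price", "buy_price"),
  value = c(1200L, 800L, 1900L, 1500L, 15L, 10L, 17L, 9L),
  stringsAsFactors = TRUE
)

type_chr <- as.character(df$type)  # endsWith() needs character input, not a factor

# Same rules as case_when(): "_area" -> "area", "_price" -> "price", else NA.
df$type2 <- ifelse(endsWith(type_chr, "_area"), "area",
                   ifelse(endsWith(type_chr, "_price"), "price", NA_character_))

# aggregate()'s formula method uses na.action = na.omit, so NA rows are dropped.
result <- aggregate(value ~ type2, data = df, FUN = max)
names(result)[2] <- "max_value"
result
```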

【Discussion】:

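It seems to produce only one maximum value.
It returns a data.frame with the maximum value of each desired group in the max_value column.
Please check the update in the post: how can I get the maximum values for area and price separately?
Try dplyr::summarise instead of summarise.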
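A plausible cause of the single value, given that the question also uses plyr's ddply(), is that plyr was loaded after dplyr, so summarise() resolved to plyr::summarise(), which ignores group_by(). The sketch below calls both functions explicitly to show the difference; it assumes both the plyr and dplyr packages are installed.

```r
library(dplyr)  # plyr only needs to be installed; it is called via plyr::

# Rebuild the question's data.
df <- data.frame(
  city  = c("bj", "bj", "sh", "sh", "bj", "bj", "bj", "bj"),
  type  = c("sale_area", "buy_area", "sale_area", "buy_area",
            "sale_price", "buy_price", "sale_price", "buy_price"),
  value = c(1200L, 800L, 1900L, 1500L, 15L, 10L, 17L, 9L),
  stringsAsFactors = TRUE
)

grouped <- df %>%
  mutate(type2 = ifelse(grepl("area", type), "area", "price")) %>%
  group_by(type2)

# plyr's summarise() does not know about dplyr groups: it evaluates max(value)
# over the whole table and returns a single row (max_value = 1900).
one_row <- plyr::summarise(grouped, max_value = max(value))

# dplyr's summarise() respects group_by(): one row per group (1900 and 17).
per_group <- dplyr::summarise(grouped, max_value = max(value))
```

If this is the cause, loading plyr before dplyr, or calling dplyr::summarise() explicitly, avoids the masking.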