【Question title】: Column name with the min and max values in a dataset in R
【Posted】: 2021-08-02 03:47:51
【Question】:

I have this dataset:

   Year  January February March April   May  June  July August
   <chr>   <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
 1 2018       45       51    63    61    79    85    88     85
 2 2017       51       60    65    69    75    82    86     84
 3 2016       47       55    61    68    72    84    87     85
# ... with 20 more rows

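For each row, I want the months in which the minimum and maximum values occur, as well as the difference between the maximum and the minimum. Here is my code for the minimum and maximum: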

x <- colnames(data)[apply(data[,c(2:9)],1,which.max)]
y <- colnames(data)[apply(data[,c(2:9)],1,which.min)]
data$MaxMonth <- x
data$MinMonth <- y

However, for some rows it gives me Year as the output of which.min.

   Year    January February March April May  June  July  August   MaxMonth  MinMonth    Diff
   <chr>   <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>          
 1 2018     45        51    63    61    79    85    88     85      July      January    43
 2 2017     51        60    65    69    75    82    86     84      July      Year       35
 3 2016     47        55    61    68    72    84    87     85      July      Year       40
... with 20 more rows 

【Question comments】:

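  • You should also apply the [,c(2:9)] subset to the column names, i.e. colnames(data[,c(2:9)]), not just to the data passed to apply.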
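A minimal reproduction of what the comment describes, using a stand-in data frame built from only the three rows shown in the question (the other 20 rows are not available). The indices returned by which.min() and which.max() count from the first month column, while colnames(data) still starts with Year, so every looked-up name is shifted one column to the left.

```r
# Stand-in data: only the three rows shown in the question
data <- data.frame(
  Year     = c("2018", "2017", "2016"),
  January  = c(45, 51, 47), February = c(51, 60, 55),
  March    = c(63, 65, 61), April    = c(61, 69, 68),
  May      = c(79, 75, 72), June     = c(85, 82, 84),
  July     = c(88, 86, 87), August   = c(85, 84, 85)
)

# Positions are counted within columns 2:9 (January = 1, ..., August = 8)
min_idx <- apply(data[, 2:9], 1, which.min)
max_idx <- apply(data[, 2:9], 1, which.max)

# Indexing the full colnames() is off by one: 1 -> "Year", 7 -> "June"
wrong_min <- colnames(data)[min_idx]
wrong_max <- colnames(data)[max_idx]

# Subsetting the names exactly like the data keeps them aligned
right_min <- colnames(data[, 2:9])[min_idx]
right_max <- colnames(data[, 2:9])[max_idx]

print(data.frame(wrong_min, wrong_max, right_min, right_max))
```

On these rows every minimum falls in January, so the shifted lookup reports Year for all three, and the July maximum is silently reported as June.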

Tags: r dataframe max apply min


【Solution 1】:

There is no need to run apply three times. You can do this instead:

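m <- as.matrix(df[-1])                         # month columns only, as a numeric matrix
nms <- colnames(m)
n <- seq_len(nrow(m))
maxMonth <- max.col(m, ties.method = "first")  # column index of each row's maximum
minMonth <- max.col(-m, ties.method = "first") # the max of -m is the min of m
diff <- m[cbind(n, maxMonth)] - m[cbind(n, minMonth)]  # matrix indexing: one value per row
cbind(df, maxMonth = nms[maxMonth], minMonth = nms[minMonth], diff)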

  Year January February March April May June July August maxMonth minMonth diff
1 2018      45       51    63    61  79   85   88     85     July  January   43
2 2017      51       60    65    69  75   82   86     84     July  January   35
3 2016      47       55    61    68  72   84   87     85     July  January   40
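max.col() breaks ties at random by default (ties.method = "random"), so when a row contains repeated values it may return a different month from run to run, whereas which.max() and which.min() always take the first occurrence. A minimal sketch on a hypothetical matrix with deliberate ties (not the question's data) shows that ties.method = "first" makes the max.col() approach agree with them:

```r
# Hypothetical matrix with deliberate ties (not the question's data):
# row 1 has two maxima (Feb, Mar); row 2 has two minima (Feb, Apr)
m <- matrix(c(5, 9, 9, 1,
              4, 2, 7, 2),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("Jan", "Feb", "Mar", "Apr")))

# "first" picks the earliest tied column, like which.max()/which.min()
max_first <- max.col(m, ties.method = "first")
min_first <- max.col(-m, ties.method = "first")   # max of -m = min of m

n <- seq_len(nrow(m))
result <- data.frame(
  maxMonth = colnames(m)[max_first],
  minMonth = colnames(m)[min_first],
  diff     = m[cbind(n, max_first)] - m[cbind(n, min_first)]
)
print(result)
```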

【Comments】:

    【Solution 2】:

    I think the comment on your post points out the problem.

    You should write:

    x <- colnames(data)[2:9][apply(data[,c(2:9)],1,which.max)]
    y <- colnames(data)[2:9][apply(data[,c(2:9)],1,which.min)]
    data$MaxMonth <- x
    data$MinMonth <- y
    

    Does that work better?
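The code above fills in only the two month columns; the Diff column asked for in the question can be added with the same apply() pattern. A sketch using a stand-in data frame built from the three rows shown in the question. Here diff(range(r)) is simply max(r) - min(r) for one row, since range() returns c(min, max).

```r
# Stand-in data: only the three rows shown in the question
data <- data.frame(
  Year     = c("2018", "2017", "2016"),
  January  = c(45, 51, 47), February = c(51, 60, 55),
  March    = c(63, 65, 61), April    = c(61, 69, 68),
  May      = c(79, 75, 72), June     = c(85, 82, 84),
  July     = c(88, 86, 87), August   = c(85, 84, 85)
)

months <- data[, 2:9]   # month columns only; positions 1..8 map to these names

data$MaxMonth <- colnames(months)[apply(months, 1, which.max)]
data$MinMonth <- colnames(months)[apply(months, 1, which.min)]
# range() gives c(min, max) for a row, so diff(range(r)) is max(r) - min(r)
data$Diff <- apply(months, 1, function(r) diff(range(r)))

print(data[, c("Year", "MaxMonth", "MinMonth", "Diff")])
```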

    【Comments】:

    • OK, thanks. I see that this was my mistake.

    【Solution 3】:

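    We can reshape to long format with pivot_longer, group by 'Year', get the column names at the max/min of 'value' (with which.max/which.min), and then join the result back to the original data.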

    library(dplyr)
    library(tidyr)
    df %>% 
        pivot_longer(cols = -1) %>%
        group_by(Year) %>%
        summarise(maxMonth = name[which.max(value)],
               minMonth = name[which.min(value)]) %>%
        left_join(df, .)
     
    

    【Comments】:

    • Thanks. This is the clearest one when I have a large dataset.
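    • If I extend the summarise call above, I have: summarise(maxMonth = name[which.max(value)], minMonth = name[which.min(value)], maxValue = max(value), minValue = min(value), Diff = maxValue - minValue). That lists the differences under Diff. How can I extract the year with the largest difference between the maximum and minimum values, which in this case is 2018 with a value of 42?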
    • @JakeParker You can use %>% slice_max(n = 1, order_by = Diff) to return the whole row, or, if you only want the year, %>% summarise(Year = Year[which.max(Diff)]) %>% pull(Year)
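The follow-up in the comments can be sketched end to end as below. This assumes a stand-in df containing only the three rows shown in the question, so the numbers will not match the full 23-row dataset; on these three rows the pipeline returns 2018.

```r
library(dplyr)
library(tidyr)

# Stand-in data: only the three rows shown in the question
df <- tibble(
  Year     = c("2018", "2017", "2016"),
  January  = c(45, 51, 47), February = c(51, 60, 55),
  March    = c(63, 65, 61), April    = c(61, 69, 68),
  May      = c(79, 75, 72), June     = c(85, 82, 84),
  July     = c(88, 86, 87), August   = c(85, 84, 85)
)

# One row per Year: month names of the extremes plus the spread
per_year <- df %>%
  pivot_longer(cols = -Year) %>%          # columns "name" (month) and "value"
  group_by(Year) %>%
  summarise(
    maxMonth = name[which.max(value)],
    minMonth = name[which.min(value)],
    maxValue = max(value),
    minValue = min(value),
    Diff     = maxValue - minValue        # later arguments can use earlier ones
  )

# Whole row with the largest spread (slice_max keeps ties by default)
top_row <- per_year %>% slice_max(order_by = Diff, n = 1)

# Just the year (which.max returns only the first maximum)
top_year <- per_year %>% summarise(Year = Year[which.max(Diff)]) %>% pull(Year)

print(top_row)
print(top_year)
```

The two variants differ only on ties: slice_max() returns every row sharing the largest Diff (with_ties = TRUE by default), while which.max() picks the first.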
    【Solution 4】:

    library(tidyverse)

    df %>%
      mutate(
        # which.max()/which.min() return a named index; names() pulls out the month
        max_month = pmap_chr(across(January:August), ~ names(which.max(c(...)))),
        min_month = pmap_chr(across(January:August), ~ names(which.min(c(...)))),
        Diff      = pmap_dbl(across(January:August), ~ max(c(...)) - min(c(...)))
      )
    

    Output:

       Year January February March April   May  June  July August max_month min_month  Diff
      <dbl>   <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <chr>     <chr>     <dbl>
    1  2018      45       51    63    61    79    85    88     85 July      January      43
    2  2017      51       60    65    69    75    82    86     84 July      January      35
    3  2016      47       55    61    68    72    84    87     85 July      January      40
    

    【Comments】:
