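【Question title】: Add column index of min value for each row in new column
【Posted】: 2020-09-17 17:31:02
【Question】:

I have an existing df, and I want to add a new column holding, for each row, the column index of the minimum value within a specific column range, in this case [,28:30].

I have been able to add two columns with the mean and standard deviation of this range, but when I use which.min to fill the last column, it does not work as expected.

Here is what I have tried so far: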

df$m1 <- apply(df[,28:30], 1, mean)
df$sd <-apply(df[,28:30], 1, sd)
df$indxcol <- apply(df[,28:30],1, which.min)

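The code above runs, but it adds the column name along with the index value, whereas I only want the index as an integer.

I also tried this with mutate, but it does not add anything.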


df$m1 <- apply(df[,28:30], 1, mean)
df$sd <-apply(df[,28:30], 1, sd)
df%>%mutate(indxcol = apply(.[,28:30],1, which.min))


Here is a sample of my df

df <- structure(list(YY = c(2007, 2007, 2007, 2007, 2007, 2007), DD = c(4, 
4, 4, 4, 4, 4), MM = c("Aug", "Aug", "Aug", "Aug", "Aug", "Aug"
), Date = structure(c(13729, 13729, 13729, 13729, 13729, 13729
), class = "Date"), `ID (FIFA)` = c("FRA D1", "FRA D1", "FRA D1", 
"FRA D1", "FRA D1", "FRA D1"), Country = c("France", "France", 
"France", "France", "France", "France"), League = c("Ligue 1", 
"Ligue 1", "Ligue 1", "Ligue 1", "Ligue 1", "Ligue 1"), Season = c("2007/2008", 
"2007/2008", "2007/2008", "2007/2008", "2007/2008", "2007/2008"
), HOME = c("Bordeaux", "Caen", "Lille", "Monaco", "Paris SG", 
"Rennes"), AWAY = c("Lens", "Nice", "Lorient", "St Etienne", 
"Sochaux", "Nancy"), `Final Scores` = c(1, 1, 0, 1, 0, 0), ...12 = c(0, 
0, 0, 1, 0, 2), ...13 = c("H", "H", "D", "D", "D", "A"), ...14 = c("U", 
"U", "U", "U", "U", "U"), `ET/Pen/Awd` = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    `1st Half Scores` = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_), ...17 = c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), ...18 = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), ...19 = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2nd Half Scores` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ...21 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ...22 = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), ...23 = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), FTMoneyline...24 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ...25 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ...26 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `Payout, %...27` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), FTMoneyline...28 = c(2.2, 
    2.4, 1.72, 1.9, 1.72, 1.72), ...29 = c(2.75, 2.75, 3, 2.9, 
    3, 3.2), ...30 = c(3.4, 3, 5, 4, 5, 4.5), `Payout, %...31` = c(89.9038461538462, 
    89.795918367347, 89.7079276773296, 89.1946580331849, 89.7079276773296, 
    89.596295760382), `FT TG 2.5...32` = c(NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_), ...33 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `FT TG 2.5...34` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ...35 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), OUTCOME = c(1, 
    1, 2, 2, 2, 3), REFGAME = c("G423", "G424", "G425", "G426", 
    "G427", "G428"), m1 = c(2.78333333333333, 2.71666666666667, 
    3.24, 2.93333333333333, 3.24, 3.14)), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))

This is the df output I get with the first piece of code


df <-structure(list(YY = c(2007, 2007, 2007, 2007, 2007, 2007), DD = c(4, 
4, 4, 4, 4, 4), MM = c("Aug", "Aug", "Aug", "Aug", "Aug", "Aug"
), Date = structure(c(13729, 13729, 13729, 13729, 13729, 13729
), class = "Date"), `ID (FIFA)` = c("FRA D1", "FRA D1", "FRA D1", 
"FRA D1", "FRA D1", "FRA D1"), Country = c("France", "France", 
"France", "France", "France", "France"), League = c("Ligue 1", 
"Ligue 1", "Ligue 1", "Ligue 1", "Ligue 1", "Ligue 1"), Season = c("2007/2008", 
"2007/2008", "2007/2008", "2007/2008", "2007/2008", "2007/2008"
), HOME = c("Bordeaux", "Caen", "Lille", "Monaco", "Paris SG", 
"Rennes"), AWAY = c("Lens", "Nice", "Lorient", "St Etienne", 
"Sochaux", "Nancy"), `Final Scores` = c(1, 1, 0, 1, 0, 0), ...12 = c(0, 
0, 0, 1, 0, 2), ...13 = c("H", "H", "D", "D", "D", "A"), ...14 = c("U", 
"U", "U", "U", "U", "U"), `ET/Pen/Awd` = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    `1st Half Scores` = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_), ...17 = c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), ...18 = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), ...19 = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2nd Half Scores` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ...21 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ...22 = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), ...23 = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), FTMoneyline...24 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ...25 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ...26 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `Payout, %...27` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), FTMoneyline...28 = c(2.2, 
    2.4, 1.72, 1.9, 1.72, 1.72), ...29 = c(2.75, 2.75, 3, 2.9, 
    3, 3.2), ...30 = c(3.4, 3, 5, 4, 5, 4.5), `Payout, %...31` = c(89.9038461538462, 
    89.795918367347, 89.7079276773296, 89.1946580331849, 89.7079276773296, 
    89.596295760382), `FT TG 2.5...32` = c(NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_), ...33 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `FT TG 2.5...34` = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ...35 = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), OUTCOME = c(1, 
    1, 2, 2, 2, 3), REFGAME = c("G423", "G424", "G425", "G426", 
    "G427", "G428"), m1 = c(2.78333333333333, 2.71666666666667, 
    3.24, 2.93333333333333, 3.24, 3.14), sd = c(0.600694043031337, 
    0.301385688667085, 1.65311826558175, 1.05039675043925, 1.65311826558175, 
    1.39097088395121), indxcol = list(c(FTMoneyline...28 = 1L), 
        c(FTMoneyline...28 = 1L), c(FTMoneyline...28 = 1L), c(FTMoneyline...28 = 1L), 
        c(FTMoneyline...28 = 1L), c(FTMoneyline...28 = 1L))), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L))

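Thanks for your help

【Comments】:

  • You need to assign the result back to the original object: df <- df %>% mutate(indxcol = apply(.[,7:9], 1, which.min)). Columns 7 and 8 are of class character. You may need to convert them to numeric before doing this. str(df[7:9])
  • A faster option is max.col: df <- df %>% mutate(across(7:9, as.numeric)) %>% mutate(indxcol = max.col(-1 * .[7:9], 'first'))
  • Thanks a lot for your answer @akrun. It works well for this df, but I am actually trying this on another data frame, and the result also includes the column names, and I do not understand why. Here is a
  • Could you show the dput of a small example that illustrates the problem? Your first set of code seems to have a typo, apply(l[,7:9],1, unlist(which.min)), i.e. unlist is used
  • I get df %>% mutate(indxcol = max.col(-1 * .[28:30], 'first')) %>% .$indxcol # [1] 1 1 1 1 1 1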
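
The max.col suggestion from the comments can be sketched in base R as follows. This is a minimal illustration on a made-up data frame (the name `m` and its columns `a`, `b`, `c` are hypothetical, not the OP's data): negating the values turns the row-wise maximum into the row-wise minimum, and `ties.method = "first"` picks the leftmost column on ties. It assumes the selected columns are numeric and complete.

```r
# Hypothetical toy data; every column is numeric and has no NA
m <- data.frame(a = c(1, 5, 9),
                b = c(2, 3, 8),
                c = c(3, 4, 7))

# max.col() finds the position of the row maximum, so negate the values
# to get the position of the row minimum instead
idx <- max.col(-as.matrix(m), ties.method = "first")

# The result is a plain integer vector, so it can be stored as a column
m$indxcol <- idx
print(m)
```

Because the result is an unnamed integer vector with one entry per row, it never turns into a list column.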

Tags: r


【Solution 1】:

The unexpected output is probably caused by rows in which the column subset contains only NAs, for which which.min returns integer(0)

which.min(c(NA, NA, NA))
#integer(0)

As an example

df1 <- data.frame(col1 = c(1, 2, NA), col2 = c(2, 3, NA), col3 = c(3, 4, NA))

Now the output is a list

apply(df1, 1, which.min)
#[[1]]
#col1 
#   1 

#[[2]]
#col1 
#   1 

#[[3]]
#integer(0)
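
The list result comes from how apply simplifies its output: it only collapses the per-row results into a vector when they all have the same length. A small check along these lines, reusing the same example data, makes the mismatch visible (the name `res` is only for illustration):

```r
df1 <- data.frame(col1 = c(1, 2, NA),
                  col2 = c(2, 3, NA),
                  col3 = c(3, 4, NA))

res <- apply(df1, 1, which.min)

# The all-NA row yields a zero-length result, so the lengths differ
# and apply() has to fall back to returning a list
is.list(res)
lengths(res)
```

Assigning such a list to a data frame column is what produces a list column whose elements still carry the column names.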

Instead of calling which.min on its own, we can subset its result with index 1, which coerces integer(0) to NA

apply(df1, 1, function(x) which.min(x[!is.na(x)])[1])
#[1]  1  1 NA
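
One caveat worth checking before relying on the line above: which.min already discards missing values and reports the position within the vector it was given, so dropping the NAs first can shift the reported position when an NA sits to the left of the minimum. The sketch below illustrates this on made-up data; `min_col_index` and `df2` are hypothetical names, not part of the original answer.

```r
x <- c(a = NA, b = 3, c = 1)
which.min(x)             # c = 3: position within the original vector
which.min(x[!is.na(x)])  # c = 2: position within the filtered vector

# Hypothetical helper: keep the original positions, drop the name,
# and turn a zero-length result into NA via the [1] subscript
min_col_index <- function(x) unname(which.min(x))[1]

df2 <- data.frame(col1 = c(NA, 2, NA),
                  col2 = c(3, 1, NA),
                  col3 = c(1, 4, NA))

idx <- apply(df2, 1, min_col_index)
print(idx)
```

When no row mixes NAs with real values, as in the example data of this answer, both variants give the same result.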

Applied to the OP's code, it would be

df$indxcol <- apply(df[,28:30],1, function(x) which.min(x[!is.na(x)])[1])

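【Comments】:

  • That was indeed the problem, and your solution works well. Thanks for your help @akrun
【Solution 2】:

A solution using dplyr, with credit to @akrun for all the work in pinpointing the NA problem. It uses the OP's data, with one row of NA values added for testing, and a select statement to print only the columns of interest. We test the result of which.min and, if it is not of length == 1, return NA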

library(dplyr)

df <- add_row(df, .after = 6)

df %>%
   rowwise() %>%
   mutate(m1 = mean(c_across(28:30)),
          sd = sd(c_across(28:30)),
          idxcol = ifelse(length(which.min(c_across(28:30))) == 1, 
                          which.min(c_across(28:30)), 
                          NA)
          ) %>%
   ungroup() %>%
   select(28:30, 38:40)

#> # A tibble: 7 x 6
#>   FTMoneyline...28 ...29 ...30    m1     sd idxcol
#>              <dbl> <dbl> <dbl> <dbl>  <dbl>  <int>
#> 1             2.2   2.75   3.4  2.78  0.601      1
#> 2             2.4   2.75   3    2.72  0.301      1
#> 3             1.72  3      5    3.24  1.65       1
#> 4             1.9   2.9    4    2.93  1.05       1
#> 5             1.72  3      5    3.24  1.65       1
#> 6             1.72  3.2    4.5  3.14  1.39       1
#> 7            NA    NA     NA   NA    NA         NA
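
The same length check can also be written in base R. The following is only a sketch on a small made-up data frame (`d`, `cols` and the column names `x`, `y`, `z` are hypothetical); it uses vapply with an `integer(1)` template so that the result is guaranteed to be an integer vector rather than a list.

```r
# Hypothetical toy data with one all-NA row, mirroring the added test row
d <- data.frame(x = c(1.72, 3.0, NA),
                y = c(3.0, 2.4, NA),
                z = c(5.0, 4.5, NA))
cols <- c("x", "y", "z")

idx <- vapply(seq_len(nrow(d)), function(i) {
  pos <- which.min(unlist(d[i, cols]))
  # which.min() returns integer(0) when every value is NA
  if (length(pos) == 1L) unname(pos) else NA_integer_
}, integer(1))

d$idxcol <- idx
print(d)
```

If any row produced something other than a single integer, vapply would stop with an error instead of silently building a list column.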
