【Question title】: Return column name when value is above threshold
【Posted】: 2020-02-09 15:09:32
【Question】:

This is a follow-up to an earlier question: Return column name for max function

ethnic <- c("white", "black", "hispanic", "asian", "other")
ethnicity$ethnicity  <- ethnic[max.col(ethnicity[ethnic], 'first')]

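This code assigns each person the ethnicity category with the highest proportion, which works well.

However, I want to take it one step further. Instead of returning the category with the highest proportion, I want to return the category only if its proportion is at least 0.8. In other words, if the highest (max) proportion is below 0.8, return "No Match".

For example: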

John:
white  black  hispanic  asian  other
0.5    0.2    0.1       0.2    0.0

This should return No Match.

Jack:
white  black  hispanic  asian  other
0.8    0.1    0.0       0.1    0.0

This should return white.

Here is a reproducible example produced with dput():

ethnicity <- structure(list(year = c(2010L, 2013L, 2009L, 2014L, 2001L), property = c("6446 025", 
"6710 034", "0525 065", "0272 006", "1720 030"), address = c("1147 NAPLES ST", 
"73 MIZPAH ST", "43 ESTATES CT", "650 BUSH ST", ""), city = c("SAN FRANCISCO CA", 
"SAN FRANCISCO CA", "SAN RAFAEL CA", "SAN FRANCISCO CA", ""), 
    city_overflow = c("", "", "", "", ""), zip = c("94112", "94131", 
    "94901", "94108", ""), surname = c("DELEON", "HENDERSON", 
    "KOORHAN", "EXECUTIVE", "WONG"), name = c("ESTELA", "DANIEL", 
    "GLENN", "HOTEL", "CHUN"), middle = c(NA, "V ", "S", "VINTAGE COURT", 
    NA), Owner2 = c("GAMEZ JUAN", " HELENE E", NA, NA, NA), Owner3 = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), org = c(FALSE, FALSE, FALSE, FALSE, FALSE), surname.match = c("DELEON", 
    "HENDERSON", "", "", "WONG"), white = c(0.0664, 0.5963, 0.6665, 
    0.6665, 0.0348), black = c(0.0104, 0.3398, 0.0853, 0.0853, 
    0.0079), hispanic = c(0.8306, 0.0251, 0.1367, 0.1367, 0.04
    ), asian = c(0.0831, 0.0045, 0.0797, 0.0797, 0.8649), other = c(0.0095, 
    0.0342, 0.0318, 0.0318, 0.0524), ethnicity = c("hispanic", 
    "white", "white", "white", "asian")), row.names = c("1998", 
"3431", "6884", "39310", "9524"), class = "data.frame")

【Question comments】:

Tags: r


【Solution 1】:

Here is a base R answer.

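# `ethnic` is the vector of category column names defined in the question
ethnicity$ethnicity <- apply(ethnicity[ethnic], 1, function(x) {
  i <- x >= 0.8                          # which proportions reach the threshold
  # at most one proportion can reach 0.8 when a row sums to 1
  if (any(i)) ethnic[i] else "No Match"
})
ethnicity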
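
As a variation on the row-wise approach above, the same rule can be sketched without `apply()` by taking each row's maximum and comparing it to the cutoff. This is only an illustrative sketch: the `toy` data frame (mirroring the John and Jack rows from the question) and the names `threshold`, `top`, and `top_value` are assumptions introduced here, not part of the original answer.

```r
# Toy data mirroring the John and Jack examples from the question (hypothetical)
ethnic <- c("white", "black", "hispanic", "asian", "other")
toy <- data.frame(
  white    = c(0.5, 0.8),
  black    = c(0.2, 0.1),
  hispanic = c(0.1, 0.0),
  asian    = c(0.2, 0.1),
  other    = c(0.0, 0.0),
  row.names = c("John", "Jack")
)

threshold <- 0.8                               # assumed cutoff, inclusive as in the answer
m <- as.matrix(toy[ethnic])
top <- max.col(m, ties.method = "first")       # column index of each row's maximum
top_value <- m[cbind(seq_len(nrow(m)), top)]   # the maximum value in each row

# Keep the category name only where the maximum reaches the cutoff
toy$ethnicity <- ifelse(top_value >= threshold, ethnic[top], "No Match")
toy$ethnicity
# [1] "No Match" "white"
```

Selecting columns by name (`toy[ethnic]`) rather than by position keeps the code working if columns are later added or reordered.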

【Comments】:

  • Can I use this as-is in RStudio?
  • @user12801590 I believe so. If you try it, please report back.
  • It works great, thank you very much! I really appreciate it.
【Solution 2】:

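To build on the original solution, I append a new column whose value is a constant 0.8. max.col then finds the column index of each row's maximum. If no category exceeds 0.8, the maximum is the appended column, so the index points at "No_Match". With 'first', a category that is exactly 0.8 ties with the appended column and wins, so the threshold is inclusive.

Original solution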

ethnic[max.col(ethnicity[ethnic], 'first')]

# [1] "hispanic" "white"    "white"    "white"    "asian"   

Adjusted solution

c(ethnic, "No_Match")[max.col(cbind(ethnicity[ethnic], 0.8), 'first')]

# [1] "hispanic" "No_Match" "No_Match" "No_Match" "asian"
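
To illustrate how the appended 0.8 column interacts with tie-breaking, here is a small sketch using the John and Jack rows from the question. The `toy` data frame and the names `padded`, `labels`, `result_first`, and `result_last` are hypothetical and introduced only for this example. Jack's `white` value is exactly 0.8, so it ties with the appended column and `ties.method` decides the outcome.

```r
ethnic <- c("white", "black", "hispanic", "asian", "other")

# Hypothetical toy rows: John (max 0.5) and Jack (max exactly 0.8)
toy <- data.frame(
  white    = c(0.5, 0.8),
  black    = c(0.2, 0.1),
  hispanic = c(0.1, 0.0),
  asian    = c(0.2, 0.1),
  other    = c(0.0, 0.0)
)

padded <- cbind(toy[ethnic], cutoff = 0.8)  # append the constant threshold column
labels <- c(ethnic, "No_Match")             # label for the appended column comes last

# "first": on a tie, the category column (left of the cutoff column) wins
result_first <- labels[max.col(padded, ties.method = "first")]
result_first
# [1] "No_Match" "white"

# "last": on a tie, the cutoff column wins, so the threshold becomes exclusive
result_last <- labels[max.col(padded, ties.method = "last")]
result_last
# [1] "No_Match" "No_Match"
```

So 'first' matches the "0.8 should return white" requirement in the question, while "last" would only suit a strictly-greater-than rule.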

【Comments】:

  • Thank you so much! This is incredible.