【Posted】: 2020-02-09 15:09:32
【Question】:
This is a follow-up to a previous question: Return column name for max function
ethnic <- c("white", "black", "hispanic", "asian", "other")
ethnicity$ethnicity <- ethnic[max.col(ethnicity[ethnic], 'first')]
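This code returns each person's ethnicity based on the ethnic category with the highest proportion. Great.
However, I want to take it one step further. Instead of always returning the category with the highest proportion, I want it to return that category only when its proportion is 0.8 or higher. In other words, if the highest (max) ethnic category is below 0.8, it should return "No Match".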
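To make the behavior of max.col() concrete, here is a small sketch on a made-up two-row data frame. The object name `toy` and its values are illustrative assumptions, not part of the data in this question. For each row, max.col() returns the index of the column holding the maximum, and ties.method = "first" picks the leftmost column when there is a tie.

```r
ethnic <- c("white", "black", "hispanic", "asian", "other")

# Hypothetical toy data: row 1 has a clear maximum (black),
# row 2 has a tie between white and black
toy <- data.frame(
  white    = c(0.1, 0.4),
  black    = c(0.7, 0.4),
  hispanic = c(0.1, 0.1),
  asian    = c(0.1, 0.1),
  other    = c(0.0, 0.0)
)

# max.col() gives the column index of each row's maximum;
# ties.method = "first" resolves ties in favor of the leftmost column
idx <- max.col(toy[ethnic], ties.method = "first")

# Map the column indices back to the category names
toy$ethnicity <- ethnic[idx]
toy$ethnicity
```

The second row shows why the tie-breaking rule matters: without "first", the default "random" method could return either of the tied columns.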
For example:
John:
white black hispanic asian other
0.5   0.2   0.1      0.2   0.0
This should return No Match.
Jack:
white black hispanic asian other
0.8   0.1   0.0      0.1   0.0
This should return white.
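The desired behavior for these two rows could be sketched as follows. This is only one possible approach, not a confirmed solution: the data frame `people`, the variable `threshold`, and the helper names `probs`, `top_idx`, and `top_val` are all assumptions introduced for illustration, and the cutoff is treated as inclusive because Jack's value of exactly 0.8 is expected to match.

```r
ethnic <- c("white", "black", "hispanic", "asian", "other")

# The two example rows from the question (John and Jack)
people <- data.frame(
  name     = c("John", "Jack"),
  white    = c(0.5, 0.8),
  black    = c(0.2, 0.1),
  hispanic = c(0.1, 0.0),
  asian    = c(0.2, 0.1),
  other    = c(0.0, 0.0)
)

threshold <- 0.8  # assumed inclusive cutoff, since Jack's 0.8 should match

# Column index and value of the largest proportion in each row
probs   <- as.matrix(people[ethnic])
top_idx <- max.col(probs, ties.method = "first")
top_val <- probs[cbind(seq_len(nrow(probs)), top_idx)]

# Keep the top category only when it reaches the threshold
people$ethnicity <- ifelse(top_val >= threshold, ethnic[top_idx], "No Match")
people[c("name", "ethnicity")]
```

The same three steps (find the top column, look up its value, compare against the cutoff) should carry over to the full data frame by replacing `people` with `ethnicity`.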
Here is a reproducible example using dput():
ethnicity <- structure(list(year = c(2010L, 2013L, 2009L, 2014L, 2001L), property = c("6446 025",
"6710 034", "0525 065", "0272 006", "1720 030"), address = c("1147 NAPLES ST",
"73 MIZPAH ST", "43 ESTATES CT", "650 BUSH ST", ""), city = c("SAN FRANCISCO CA",
"SAN FRANCISCO CA", "SAN RAFAEL CA", "SAN FRANCISCO CA", ""),
city_overflow = c("", "", "", "", ""), zip = c("94112", "94131",
"94901", "94108", ""), surname = c("DELEON", "HENDERSON",
"KOORHAN", "EXECUTIVE", "WONG"), name = c("ESTELA", "DANIEL",
"GLENN", "HOTEL", "CHUN"), middle = c(NA, "V ", "S", "VINTAGE COURT",
NA), Owner2 = c("GAMEZ JUAN", " HELENE E", NA, NA, NA), Owner3 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), org = c(FALSE, FALSE, FALSE, FALSE, FALSE), surname.match = c("DELEON",
"HENDERSON", "", "", "WONG"), white = c(0.0664, 0.5963, 0.6665,
0.6665, 0.0348), black = c(0.0104, 0.3398, 0.0853, 0.0853,
0.0079), hispanic = c(0.8306, 0.0251, 0.1367, 0.1367, 0.04
), asian = c(0.0831, 0.0045, 0.0797, 0.0797, 0.8649), other = c(0.0095,
0.0342, 0.0318, 0.0318, 0.0524), ethnicity = c("hispanic",
"white", "white", "white", "asian")), row.names = c("1998",
"3431", "6884", "39310", "9524"), class = "data.frame")
【Comments】:
- Please add a minimal reproducible example. That way you can help others help you!
Tags: r