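[Posted on]: 2020-09-24 19:15:32
[Question]:
I have a large data frame with many variables, each with several options, and I want to count every option of every variable. The data frame below is an example.
I also have a second data frame with the same layout. Before merging the two, I want to check whether their column names match and, if they do not, get the names of the columns that differ.
The columns c(uniqueid, name) should be excluded. The goal is to use the counts to find misspelled words and words that contain accents.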
df11 <- data.frame(uniqueid=c(9143,2357,4339,8927,9149,4285,2683,8217,3702,7857,3255,4262,8501,7111,2681,6970),
name=c("xly,mnn","xab,Lan","mhy,mun","vgtu,mmc","ftu,sdh","kull,nnhu","hula,njam","mund,jiha","htfy,ntha","sghu,njui","sgyu,hytb","vdti,kula","mftyu,huta","mhuk,ghul","cday,bhsue","ajtu,nudj"),
city=c("A","B","C","C","D","F","S","C","E","S","A","B","W","S","C","A"),
age=c(22,45,67,34,43,22,34,43,34,52,37,44,41,40,39,30),
country=c("usa","USA","AUSTRALI","AUSTRALIA","uk","UK","SPAIN","SPAIN","CHINA","CHINA","BRAZIL","BRASIL","CHILE","USA","CANADA","UK"),
language=c("ENGLISH(US)","ENGLISH(US)","ENGLISH","ENGLISH","ENGLISH(UK)","ENGLISH(UK)","SPANISH","SPANISH","CHINESE","CHINESE","ENGLISH","ENGLISH","ENGLISH","ENGLISH","ENGLISH","ENGLISH(US)"),
gender=c("MALE","FEMALE","male","m","f","MALE","FEMALE","f","FEMALE","MALE","MALE","MALE","FEMALE","FEMALE","MALE","MALE"))
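The output should be a summary of counts per variable and option, a kind of pivot, e.g. for city. It should therefore take every available column in the data frame and give a count summary of all the options found in each column.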
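One way to get such a summary in base R is sketched below. It is only an illustration under stated assumptions: df_small is a cut-down stand-in for df11 (same column names, fewer rows) so that the block runs on its own, and count_options() is a hypothetical helper name, not something from the question.

```r
# Cut-down stand-in for df11 (assumption: same column names, fewer rows).
df_small <- data.frame(
  uniqueid = c(9143, 2357, 4339, 8927, 9149, 4285),
  name     = c("xly,mnn", "xab,Lan", "mhy,mun", "vgtu,mmc", "ftu,sdh", "kull,nnhu"),
  city     = c("A", "B", "C", "C", "D", "F"),
  country  = c("usa", "USA", "AUSTRALI", "AUSTRALIA", "uk", "UK"),
  gender   = c("MALE", "FEMALE", "male", "m", "f", "MALE"),
  stringsAsFactors = FALSE
)

# Columns to summarise: everything except the identifier columns.
cols <- setdiff(names(df_small), c("uniqueid", "name"))

# Hypothetical helper: one row per (variable, option) pair with its count.
count_options <- function(df, cols) {
  per_col <- lapply(cols, function(col) {
    tab <- table(df[[col]], useNA = "ifany")
    data.frame(
      variable = rep(col, length(tab)),
      option   = names(tab),
      n        = as.integer(tab),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, per_col)
}

summary_counts <- count_options(df_small, cols)
print(summary_counts)

# Same counts after normalising case and surrounding whitespace. Options that
# collapse into a single value here (e.g. "usa" and "USA") differ only by case.
df_norm <- df_small
df_norm[cols] <- lapply(df_norm[cols], function(x) toupper(trimws(as.character(x))))
norm_counts <- count_options(df_norm, cols)
print(norm_counts)
```

Comparing the raw and the normalised tables shows which options are case variants of each other. Genuine misspellings such as "AUSTRALI" still appear as separate low-count rows and have to be reviewed by eye.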
[Comments]:
-
Your input data is currently not reproducible. It is also unclear what your expected output is.
-
Have you tried unique(), length(), base::intersect(), attributes(), str(), names(), etc.?
-
I just updated the question. I'm fairly new to R.
-
You could look at the "tidyverse" options, or, if you just want a quick result, you could try something like apply(df11[, 2:ncol(df11)], 2, table)
-
For a nicer display:
apply(df11[, 2:ncol(df11)], 2, function(x) as.data.frame(table(x)))
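For the column-name check raised in the question, names(), intersect() and setdiff() from base R are probably enough. A minimal sketch, in which df_a, df_b and compare_names() are hypothetical names introduced only for illustration:

```r
# Two small hypothetical data frames; df_b has one misspelt column name.
df_a <- data.frame(uniqueid = 1:2, city = c("A", "B"), country = c("USA", "UK"))
df_b <- data.frame(uniqueid = 3:4, city = c("C", "D"), contry  = c("SPAIN", "CHILE"))

# Hypothetical helper: report whether the column names match and which differ.
compare_names <- function(x, y) {
  list(
    same      = identical(sort(names(x)), sort(names(y))),  # same set of names, order ignored
    common    = intersect(names(x), names(y)),              # names present in both
    only_in_x = setdiff(names(x), names(y)),                # names missing from y
    only_in_y = setdiff(names(y), names(x))                 # names missing from x
  )
}

cmp <- compare_names(df_a, df_b)
print(cmp)
```

If same is FALSE, only_in_x and only_in_y list the columns to rename or drop before merging.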
Tags: r