【Question title】: Frequencies data table multiple columns
【Posted】: 2016-11-13 01:27:21
【Question description】:

I have a data.table like this:

require(data.table)
dt <- data.table(a = c("a","a","b","b","b"), b = c("a","a","c","c","e"), c = c("d","d","b","b","b"))

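I want to count the frequencies of the values in every column. I know how to do it one column at a time, but I would like to do it in a single instruction, because my real data has many columns.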

The result should be the same as what these commands produce:

dt[,a1:=.N, by = c("a")]
dt[,a2:=.N, by = c("b")]
dt[,a3:=.N, by = c("c")]  
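If the goal is literally one instruction that adds `a1`, `a2`, `a3` to `dt`, one possible sketch (not from the thread) assigns all count columns at once with `:=` over `.SD`, using base R's `ave()` to give each row the size of its group. The helper names `cols` and `new_cols` are illustrative.

```r
library(data.table)

dt <- data.table(a = c("a","a","b","b","b"),
                 b = c("a","a","c","c","e"),
                 c = c("d","d","b","b","b"))

cols     <- names(dt)                     # columns whose values we count
new_cols <- paste0("a", seq_along(cols))  # a1, a2, a3 as in the question

# For each column x, ave() splits the row indices by the values of x and
# replaces every index with the length of its group, i.e. a per-row count
dt[, (new_cols) := lapply(.SD, function(x) ave(seq_along(x), x, FUN = length)),
   .SDcols = cols]
dt
```

This should give the same `a1`/`a2`/`a3` columns as the three separate `by` calls, without writing one line per column.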

【Comments on the question】:

  • Use a for() loop.
  • @RichScriven Could you give me an example?
  • Try: nm1 <- paste0("a", seq_along(dt)); for(j in seq_along(dt)){ dt[, nm1[j] := .N, by = c(names(dt)[j])] }
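The loop suggested in the last comment can be written out as a complete script. This is a minimal sketch; the only changes are capturing the original column names in `cols` before any count columns are added, and wrapping the target name in parentheses, which is the usual data.table idiom for taking a column name from a variable.

```r
library(data.table)

dt <- data.table(a = c("a","a","b","b","b"),
                 b = c("a","a","c","c","e"),
                 c = c("d","d","b","b","b"))

cols <- names(dt)                     # original columns, fixed before the loop
nm1  <- paste0("a", seq_along(cols))  # target names a1, a2, a3

for (k in seq_along(cols)) {
  # (nm1[k]) makes := use the value of nm1[k] as the new column's name;
  # .N is the number of rows in each group defined by column cols[k]
  dt[, (nm1[k]) := .N, by = c(cols[k])]
}
dt
```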

Tags: r data.table multiple-columns frequency


【Solution 1】:
require(data.table)
dt <- data.table(a= c("a","a","b","b","b"), 
                 b= c("a","a","c","c","e"),   
                 c=c("d","d","b","b","b"))
#dt
#   a b c
#1: a a d
#2: a a d
#3: b c b
#4: b c b
#5: b e b

# for each column: select it alone, then add x = size of its value's group
l <- lapply(names(dt),
            function(col) dt[, col, with = FALSE][, x := .N, by = col])
#l 
#[[1]]
#   a x
#1: a 2
#2: a 2
#3: b 3
#4: b 3
#5: b 3

#[[2]]
#   b x
#1: a 2
#2: a 2
#3: c 2
#4: c 2
#5: e 1

#[[3]]
#   c x
#1: d 2
#2: d 2
#3: b 3
#4: b 3
#5: b 3


df = as.data.frame(l)

# rename every second column (the counts) to "<preceding column name>_count"
colnames(df)[seq(2,length(colnames(df)),2)]=
 paste0(colnames(df)[seq(1,length(colnames(df)),2)],"_count")

#df
#  a a_count b b_count c c_count
#1 a       2 a       2 d       2
#2 a       2 a       2 d       2
#3 b       3 c       2 b       3
#4 b       3 c       2 b       3
#5 b       3 e       1 b       3
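A variant sketch (not the answerer's code) that produces the same `a`/`a_count` layout directly: name each count column when it is created, then bind the pieces side by side with `cbind`, so no renaming step is needed. The names `pieces` and `res` are illustrative.

```r
library(data.table)

dt <- data.table(a = c("a","a","b","b","b"),
                 b = c("a","a","c","c","e"),
                 c = c("d","d","b","b","b"))

pieces <- lapply(names(dt), function(col) {
  out <- dt[, col, with = FALSE]                   # new one-column data.table
  out[, (paste0(col, "_count")) := .N, by = col]   # add "<col>_count" in place
  out
})
res <- do.call(cbind, pieces)  # columns: a, a_count, b, b_count, c, c_count
res
```

The result stays a data.table rather than being converted to a data.frame.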

【Comments】:

  • The solution would be better if you removed x := .N and just did: l=lapply(seq_along(colnames(dt)), function(i) dt[,eval(colnames(dt)[i]),with=F][,.N,by=eval(colnames(dt)[i])])
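The commenter's version returns one summary row per distinct value instead of one count per original row. A simplified sketch of the same idea is below; it drops the column subset, on the assumption that grouping with `by` alone is enough because `.N` without `:=` already collapses each group to a single row.

```r
library(data.table)

dt <- data.table(a = c("a","a","b","b","b"),
                 b = c("a","a","c","c","e"),
                 c = c("d","d","b","b","b"))

# One frequency table per column: the distinct values plus their count N,
# listed in order of first appearance
freq <- lapply(names(dt), function(col) dt[, .N, by = col])
freq
```

Choose this form when you want a frequency table per column, and the `:=` form when the counts must stay aligned with the original rows.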