【Question title】: Frequency table from multiple columns and multiple rows in R
【Posted】: 2017-03-25 03:41:59
【Question】:

I am trying to get a frequency table from this data frame:

tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L),
                       a3 = c(0L, 1L, 0L), b1 = c(1L, 0L, 1L),
                       b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L)),
                       .Names = c("a1", "a2", "a3", "b1", "b2", "b3"),
                       class = "data.frame", row.names = c(NA, -3L))


# or read the same data from a semicolon-separated file:
# tmp2 <- read.csv("tmp2.csv", sep=";")
> tmp2
  a1 a2 a3 b1 b2 b3
1  1  1  0  1  1  0
2  0  0  1  0  0  1
3  0  1  0  1  0  1

I tried to get the frequency table like this:

table(tmp2[,1:3], tmp2[,4:6])

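But I got:

Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Expected output:

Note: the result does not need to be a square matrix; for example, I should be able to add b4 and b5 while keeping a1, a2, a3.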

【Comments】:

  • Why is a2 b1 equal to 2?
  • In tmp2, suppose 1 row = 1 client. So 2 clients have both a2 and b1.
  • crossprod is also useful here: crossprod(as.matrix(tmp2[1:3]), as.matrix(tmp2[4:6]))
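
The crossprod suggestion above can be sketched on a non-square case. This is only an illustration: the b4 column and the names tmp3, a_mat, b_mat and counts are made up here, and selecting columns by the prefixes "a" and "b" is an assumption about how the real columns are named.

```r
# Sample data from the question
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

# Hypothetical extra column, added only to get a non-square result
tmp3 <- cbind(tmp2, b4 = c(1L, 1L, 0L))

# Select the two groups by name prefix instead of by position (assumed naming)
a_mat <- as.matrix(tmp3[grep("^a", names(tmp3))])
b_mat <- as.matrix(tmp3[grep("^b", names(tmp3))])

# crossprod(x, y) computes t(x) %*% y, so entry [i, j] sums a_i * b_j over rows,
# i.e. it counts the rows in which both indicators are 1
counts <- crossprod(a_mat, b_mat)
counts
```

Because the inputs are 0/1 indicators, the product of two columns is 1 only when both are 1, which is why a plain matrix product yields the co-occurrence counts.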

Tags: r frequency


【Solution 1】:

One option:

matrix(colSums(tmp2[,rep(1:3,3)] & tmp2[,rep(4:6,each=3)]),
       ncol=3,nrow=3,
       dimnames=list(colnames(tmp2)[1:3],colnames(tmp2)[4:6]))
#   b1 b2 b3
#a1  1  1  0
#a2  2  1  1
#a3  0  0  1

If a and b have different numbers of columns, you can try:

acols<-1:3 #state the indices of the a columns
bcols<-4:6 #same for b; if you add a column this should be 4:7
matrix(colSums(tmp2[,rep(acols,length(bcols))] & tmp2[,rep(bcols,each=length(acols))]),
           ncol=length(bcols),nrow=length(acols),
           dimnames=list(colnames(tmp2)[acols],colnames(tmp2)[bcols]))

【Comments】:

  • Hi, thanks, this is interesting. I have a question: if I have, for example, a1 a2 a3 and b1 b2 b3 b4, would that still work (that is, after adding b4)?
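
As a rough check of the question in this comment, the colSums approach from the answer can be wrapped in a small helper and run on data with extra b columns. The helper name pair_counts and the b4 and b5 columns are hypothetical and exist only for this sketch. It converts the columns to logical matrices first, which is a small change from the answer's data-frame version.

```r
# Sample data from the question
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

# Hypothetical helper wrapping the colSums() idea from this answer
pair_counts <- function(df, acols, bcols) {
  a <- as.matrix(df[acols]) == 1
  b <- as.matrix(df[bcols]) == 1
  # Repeat columns so that every (a, b) pair lines up column by column
  both <- a[, rep(seq_along(acols), times = length(bcols)), drop = FALSE] &
          b[, rep(seq_along(bcols), each = length(acols)), drop = FALSE]
  matrix(colSums(both), nrow = length(acols), ncol = length(bcols),
         dimnames = list(colnames(a), colnames(b)))
}

# Hypothetical extra columns b4 and b5
tmp3 <- cbind(tmp2, b4 = c(1L, 1L, 0L), b5 = c(0L, 1L, 1L))
res <- pair_counts(tmp3, acols = 1:3, bcols = 4:8)
res
```

Only the index vectors change when columns are added; the result has one row per a column and one column per b column.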
【Solution 2】:

Here is a possible solution:

aIdxs <- 1:3
bIdxs <- 4:6  # tmp2 only has b1..b3; use 4:7 once a b4 column exists

# init matrix
m <- matrix(0,
            nrow = length(aIdxs), ncol=length(bIdxs),
            dimnames = list(colnames(tmp2)[aIdxs],colnames(tmp2)[bIdxs]))

# create all combinations of a's and b's column indexes
idxs <- expand.grid(aIdxs,bIdxs)

# for each line and for each combination we add 1
# to the matrix if both a and b column are 1 
for(r in 1:nrow(tmp2)){
  m <- m + matrix(apply(idxs,1,function(x){ all(tmp2[r,x]==1) }),
                  nrow=length(aIdxs), byrow=FALSE)
}
> m
   b1 b2 b3
a1  1  1  0
a2  2  1  1
a3  0  0  1
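
The same counts can also be sketched without the explicit row loop, by iterating over column pairs instead. This is an alternative formulation rather than the answer's own code; the name m2 is made up here, and the sketch assumes at least two a columns so that the inner sapply() returns a vector that simplifies into a matrix.

```r
# Sample data from the question
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

aIdxs <- 1:3
bIdxs <- 4:6

# For each b column (outer) and each a column (inner),
# count the rows where both indicators equal 1
m2 <- sapply(tmp2[bIdxs], function(b) {
  sapply(tmp2[aIdxs], function(a) sum(a == 1 & b == 1))
})
m2
```

The outer sapply() builds one column per b variable, and the names of the inner result become the row names.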

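【Comments】:

    【Solution 3】:

    Here is another possible solution. Your input is a bit tricky for `table`, because you inherently have two groups, the 'a's and the 'b's, and the binary indicators in each row only mark pairwise co-occurrences between an 'a' and a 'b', which you want to iterate over. Below is a generic (though perhaps not very elegant) function that works with different numbers of 'a's and 'b's: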

    tmp2 <- structure(list(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L),
                           a3 = c(0L, 1L, 0L), b1 = c(1L, 0L, 1L),
                           b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L)),
                      .Names = c("a1", "a2", "a3", "b1", "b2", "b3"),
                      class = "data.frame", row.names = c(NA, -3L))
    fun = function(x) t(do.call("cbind", lapply(x[,grep("a", colnames(x))], 
        function(p) rowSums(do.call("rbind", lapply(x[,grep("b", colnames(x))], 
        function(q) q*p ))))))
    fun(tmp2)
    #> fun(tmp2)
    #   b1 b2 b3
    #a1  1  1  0
    #a2  2  1  1
    #a3  0  0  1
    
    # let's do a bigger example
    set.seed(1)
    m = matrix(rbinom(size=1, n=50, prob=0.75), ncol=10, dimnames=list(paste("instance_", 1:5, sep=""), c(paste("a",1:4,sep=""), paste("b",1:6,sep=""))))
    
    # Note that the numbers of a and b columns are not equal
    #> m
    #           a1 a2 a3 a4 b1 b2 b3 b4 b5 b6
    #instance_1  1  0  1  1  0  1  1  1  0  0
    #instance_2  1  0  1  1  1  1  1  0  1  1
    #instance_3  1  1  1  0  1  1  1  1  0  1
    #instance_4  0  1  1  1  1  0  1  1  1  1
    #instance_5  1  1  0  0  1  1  0  1  1  1
    
    fun(as.data.frame(m))
    #> fun(as.data.frame(m))
    #   b1 b2 b3 b4 b5 b6
    #a1  3  4  3  3  2  3
    #a2  3  2  2  3  2  3
    #a3  3  3  4  3  2  3
    #a4  2  2  3  2  2  2
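
To illustrate why `table` is awkward for this input, here is a minimal sketch on a single pair of columns. As far as I understand it, `table` turns each argument into a factor, which works for an atomic vector such as one column but not for a multi-column data frame, and that matches the error in the question. The name tab is made up for this example.

```r
# Sample data from the question
tmp2 <- data.frame(a1 = c(1L, 0L, 0L), a2 = c(1L, 0L, 1L), a3 = c(0L, 1L, 0L),
                   b1 = c(1L, 0L, 1L), b2 = c(1L, 0L, 0L), b3 = c(0L, 1L, 1L))

# table() works on one pair of columns and gives a 2 x 2 contingency table
tab <- table(a2 = tmp2$a2, b1 = tmp2$b1)
tab

# The cell where both indicators are 1 is the count wanted for (a2, b1)
tab["1", "1"]

# A multi-column data frame is a list, not an atomic vector
is.atomic(tmp2[, 1:3])
```

Each cell of the desired output is therefore the "both are 1" cell of one pairwise table, which is what the functions in these answers compute for all pairs at once.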
    

    【Comments】:
