【Question title】: R: matrix of string of integers to array of integer counts
【Posted】: 2013-11-18 11:59:39
【Question】:

I have a character matrix whose cells contain comma-separated integers:

> mat<-matrix(c(NA,"1",NA,"2,1","3","1,3,3"),nrow=2)
> mat
     [,1] [,2]  [,3]   
[1,] NA   NA    "3"    
[2,] "1"  "2,1" "1,3,3"

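I would like the output to be a numeric array in which the z index corresponds to an integer value and each entry holds the count of that integer in the matching matrix cell: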

, , 1

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   1    1    1 

, , 2

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   1    NA

, , 3

     [,1] [,2] [,3]
[1,]   NA   NA   1
[2,]   NA   NA   2

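How can I do this?

To give a sense of the scale of the data: the final array will have dimensions of roughly 20,000 x 2,000 x 200, and the matrix supplies the first two dimensions (20,000 x 2,000).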
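
As a rough back-of-the-envelope check of that scale, the sketch below estimates the size of a dense integer array with those dimensions. It assumes 4 bytes per integer element (R's integer storage) and ignores any overhead; the variable names are illustrative only.

```r
# Approximate dimensions quoted in the question
dims <- c(20000, 2000, 200)

# Total number of elements in a dense array
n_cells <- prod(dims)                        # 8e9

# More elements than a 32-bit index can address, so a dense result
# would have to be a long vector (supported since R 3.0.0)
needs_long_vector <- n_cells > .Machine$integer.max

# 4 bytes per integer element, reported in decimal gigabytes
bytes_int <- n_cells * 4
gb_int <- bytes_int / 1e9                    # 32
```

Under these assumptions a dense integer result needs on the order of 32 GB before any intermediate copies, which is worth keeping in mind when judging the solutions.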

【Comments】:

    Tags: arrays r matrix


    【Solution 1】:

    This uses a loop (via sapply) and may not be the most efficient solution:

    mat<-matrix(c(NA,"1",NA,"2,1","3","1,3,3"),nrow=2)
    
    #split the strings
    temp <- strsplit(mat, ",", fixed=TRUE)
    
    #unique values
    levels <- na.omit(unique(do.call(c, temp)))
    
    #convert to factors and use table
    temp <- t(sapply(temp, function(x) table(factor(x, levels=levels))))
    
    #make it an array
    array(temp, c(nrow(mat), ncol(mat), length(levels)))
    # , , 1
    # 
    #      [,1] [,2] [,3]
    # [1,]    0    0    0
    # [2,]    1    1    1
    # 
    # , , 2
    # 
    #      [,1] [,2] [,3]
    # [1,]    0    0    0
    # [2,]    0    1    0
    # 
    # , , 3
    # 
    #      [,1] [,2] [,3]
    # [1,]    0    0    1
    # [2,]    0    0    2
    

    Edit:

    This avoids applying table and factor inside the loop and should be faster:

    temp <- strsplit(mat, ",", fixed=TRUE)
    
    id <- rep(seq_along(temp), sapply(temp, length))
    temp <- factor(do.call(c, temp))
    array(t(table(temp, id)), c(nrow(mat), ncol(mat), length(levels(temp))))
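
Both versions above return 0 where the question's desired output shows NA, and the factor levels are character strings, so "10" would sort before "2". The following is a minimal sketch of one way to address both points, assuming NA is really wanted in place of zero counts; the names `parts`, `vals`, `lev`, and `res` are introduced here for illustration and are not part of the original answer.

```r
mat <- matrix(c(NA, "1", NA, "2,1", "3", "1,3,3"), nrow = 2)

# Split every cell; an NA cell yields a single NA element
parts <- strsplit(mat, ",", fixed = TRUE)

# Cell index for every parsed value (lengths() needs R >= 3.2.0)
id <- rep(seq_along(parts), lengths(parts))
vals <- as.integer(unlist(parts))

# Integer levels in numeric order; sort() drops NA by default
lev <- sort(unique(vals))

# Cells x values contingency table; NA values are not counted,
# but every cell keeps its row because id is a factor with all levels
counts <- table(factor(id, levels = seq_along(parts)),
                factor(vals, levels = lev))

# Column-major fill puts each value's counts into its own z slice
res <- array(as.integer(counts), c(nrow(mat), ncol(mat), length(lev)))

# Assumption: zero counts should be reported as NA, as in the question
res[res == 0L] <- NA
```

Note that a cell containing an empty string would split to a zero-length vector; the explicit `levels = seq_along(parts)` keeps such a cell's row in the table so the array stays aligned.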
    

    【Comments】:

    • Is there a way to replace sapply with mclapply here?
    • Sure. Afterwards you just need do.call(rbind, temp).
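
The suggestion in the comments can be sketched as follows, applied to the first (sapply-based) version of the solution. This is only an illustration: the core count of 2 is an arbitrary assumption, and the names `lev`, `n_cores`, `counts`, and `res` are introduced here. `mclapply` relies on forking, which is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)

mat <- matrix(c(NA, "1", NA, "2,1", "3", "1,3,3"), nrow = 2)
temp <- strsplit(mat, ",", fixed = TRUE)

# Unique non-NA values, in order of first appearance
lev <- unique(unlist(temp))
lev <- lev[!is.na(lev)]

# Arbitrary core count for illustration; must be 1 on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mclapply returns a list with one count vector per cell, not a matrix
counts <- mclapply(temp,
                   function(x) table(factor(x, levels = lev)),
                   mc.cores = n_cores)

# Stack the per-cell vectors into a cells x levels matrix;
# this takes the place of t(sapply(...)) in the original code
counts <- do.call(rbind, counts)

res <- array(counts, c(nrow(mat), ncol(mat), length(lev)))
```

Whether this is actually faster depends on the data: each task is tiny, so the overhead of forking and collecting results can outweigh the gain unless the cells are processed in large chunks.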