【Question Title】: Editing Arules Data Frame in R
【Posted】: 2014-11-26 07:27:31
【Question】:

Hi, I have converted my rules to a data frame for further analysis, but the problem is that my data frame looks like this:

df <- data.frame(rules=c("{45107} => {62557}","{17759} => {60521 }",
"{53721} => {53720}","{63830} => {17753}","{45413} => {45412}",
"{3885,59800,17759} => {4749}","{17721,55906} => {9314}"))

    rules
{45107} => {62557}
{17759} => {60521 }
{53721} => {53720}
{63830} => {17753}
{45413} => {45412}
{3885,59800,17759} => {4749}
{17721,55906} => {9314}

Can you help me change my data frame into this format?

lhs1    lhs2    lhs3    rhs
45107                   62557
17759                   60521
53721                   53720
63830                   17753
45413                   45412
3885    59800   17759   4749
17721   55906           9314

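【Question Discussion】:

  • @syan dasgupta, I have already converted my arules to a data frame, so my question now is how to split my column into multiple columns in the data frame. This question actually has nothing to do with the rules themselves.
  • Oops, sorry, yes; got it.
  • Sorry, but I assume all the numbers after => should go into the rhs column, and all the numbers before it into the lhs columns?
  • @CathG I thought so too when I converted the arules to a data frame, but unfortunately everything got merged into one column called rules.

Tags: r arules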


【Solution 1】:

You could also do something like this, which should be efficient.

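library(splitstackshape)  ## for cSplit()
library(data.table)       ## for data.table(); attached explicitly in case splitstackshape does not attach it

# strip braces and spaces, then split each rule at "=>": V1 = lhs items, V2 = rhs item
dt <- data.table(
    do.call(rbind, strsplit(gsub("[{} ]", "", df$rules), "=>"))
)
# split the lhs column at commas into V1_1, V1_2, ... and append the rhs column
cbind(cSplit(dt[, .(V1)], "V1", ","), dt[, .(V2)])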

#     V1_1  V1_2  V1_3    V2
# 1: 45107    NA    NA 62557
# 2: 17759    NA    NA 60521
# 3: 53721    NA    NA 53720
# 4: 63830    NA    NA 17753
# 5: 45413    NA    NA 45412
# 6:  3885 59800 17759  4749
# 7: 17721 55906    NA  9314
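If you would rather not depend on splitstackshape, the same two-step idea (strip the braces, split at "=>", then split the lhs at commas) can be sketched in base R. This is a minimal sketch assuming, as in the sample data, that item labels are numeric and contain no braces, commas or "=>"; the variable names (`parts`, `lhs_mat`, `result`) are only illustrative. The columns are named lhs1, lhs2, ..., rhs as requested in the question.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps rules as character
df <- data.frame(rules = c("{45107} => {62557}", "{17759} => {60521 }",
                           "{53721} => {53720}", "{63830} => {17753}",
                           "{45413} => {45412}", "{3885,59800,17759} => {4749}",
                           "{17721,55906} => {9314}"),
                 stringsAsFactors = FALSE)

# Step 1: drop braces and spaces, then split every rule at "=>"
parts   <- strsplit(gsub("[{} ]", "", df$rules), "=>", fixed = TRUE)
lhs_str <- vapply(parts, `[`, character(1), 1)   # e.g. "3885,59800,17759"
rhs_str <- vapply(parts, `[`, character(1), 2)   # e.g. "4749"

# Step 2: split the lhs at commas and pad shorter rules with NA
lhs_items <- strsplit(lhs_str, ",", fixed = TRUE)
max_lhs   <- max(lengths(lhs_items))
lhs_mat   <- do.call(rbind, lapply(lhs_items, function(v) {
  c(v, rep(NA_character_, max_lhs - length(v)))
}))

# Step 3: numeric data.frame named lhs1, lhs2, ..., rhs (assumes numeric labels)
result <- as.data.frame(matrix(as.numeric(lhs_mat), nrow = nrow(lhs_mat),
                               dimnames = list(NULL, paste0("lhs", seq_len(max_lhs)))))
result$rhs <- as.numeric(rhs_str)
print(result)
```

Padding with `do.call(rbind, ...)` keeps one row per rule even when every lhs has a single item.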

【Discussion】:

    【Solution 2】:

    Using your data.frame df and putting all the numbers after => into rhs:

    # define the maximum number of "lhs" items; there are 2 options:
       # option 1, if there are few rules and the maximum number of "lhs" items is obvious:
    maxlhs<-3
       # option 2, if there are many rules and you don't want to count the "lhs" items by hand
       # (counts the comma-separated items before "=>" in each rule):
    maxlhs<-max(lengths(strsplit(sub(" *=>.*","",df$rules),",")))
    
    # create the new table by "reformatting" each rule
    # (t(apply(...)) returns a numeric matrix; wrap it in as.data.frame() if you need a data.frame)
    newdf<-t(apply(df,1,function(rule,maxlhs){
                    # strip spaces and braces, then split into lhs and rhs at "=>"
                    split1<-strsplit(gsub("[ }{]","",rule),"=>")[[1]]
                    # split the lhs into its items and pad with NA up to maxlhs
                    split2<-strsplit(split1[1],",")[[1]]
                    split2<-c(split2,rep(NA,maxlhs-length(split2)))
                    return(as.numeric(c(split2,split1[2])))
                        },maxlhs=maxlhs))
    # name the columns
    colnames(newdf)<-c(paste0("lhs",1:maxlhs),"rhs")
    
    > newdf
          lhs1  lhs2  lhs3   rhs
    [1,] 45107    NA    NA 62557
    [2,] 17759    NA    NA 60521
    [3,] 53721    NA    NA 53720
    [4,] 63830    NA    NA 17753
    [5,] 45413    NA    NA 45412
    [6,]  3885 59800 17759  4749
    [7,] 17721 55906    NA  9314
    

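    Is this OK, or do you want the new data.frame to look exactly as shown in your question?

    【Discussion】:

    • You need to determine the maximum possible number of lhs variables; here it is hard-coded as 3.
    • @sayandasgupta, yes, you're right, I'll make it a parameter, thanks!
    • Use this: maxlhs <- max(sapply(df$rules,FUN=function(x)length(gregexpr(',',x)[[1]]))) + 1
    • @sayandasgupta, although it may not be necessary when there are only a few rules, it is indeed very useful when there are many rules, so, again, you are right; I will add that line. Thanks again.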
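A note on the counting one-liner suggested above: gregexpr() returns -1 when there is no match, which is a vector of length 1, so length() is 1 both for an lhs with no comma and for one with a single comma. On this data the + 1 happens to give the right answer (3), but if every rule had a single-item lhs it would report 2, and any commas on the rhs would be counted too. A small base-R check, using a hypothetical `count_lhs()` helper that only looks at the text before "=>":

```r
# gregexpr() returns -1 (one element) when nothing matches, so length() is 1
# for both of these rules even though they contain 0 and 1 commas
length(gregexpr(",", "{45107} => {62557}")[[1]])       # 1
length(gregexpr(",", "{17721,55906} => {9314}")[[1]])  # 1

# hypothetical helper: count the comma-separated items before "=>"
count_lhs <- function(x) {
  lhs <- sub("\\s*=>.*$", "", x)              # keep only the lhs text
  lengths(strsplit(lhs, ",", fixed = TRUE))   # number of lhs items per rule
}

rules <- c("{45107} => {62557}", "{3885,59800,17759} => {4749}",
           "{17721,55906} => {9314}")
count_lhs(rules)                                # 1 3 2
max(count_lhs(c("{1} => {2}", "{3} => {4}")))   # 1 (the one-liner gives 2)
```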

    【Solution 3】:
    # your data (stringsAsFactors = FALSE keeps the rules as character strings)
    library(stringr)
    data <- data.frame(rules = c("{45107} => {62557}", "{17759} => {60521 }", "{53721} => {53720}",
                                 "{63830} => {17753}", "{45413} => {45412}",
                                 "{3885,59800,17759} => {4749}", "{17721,55906} => {9314}"),
                       stringsAsFactors = FALSE)
    
    # extract all numbers of each rule (the lhs items followed by the rhs item)
    items <- str_extract_all(data$rules, "\\d+")
    mx <- max(lengths(items))
    
    # keep the last number as rhs and pad the lhs with NA so every row has mx entries
    do.call("rbind", lapply(items, function(x){
      c(head(x, -1), rep(NA, mx - length(x)), tail(x, 1))
    }))
    
         [,1]    [,2]    [,3]    [,4]   
    [1,] "45107" NA      NA      "62557"
    [2,] "17759" NA      NA      "60521"
    [3,] "53721" NA      NA      "53720"
    [4,] "63830" NA      NA      "17753"
    [5,] "45413" NA      NA      "45412"
    [6,] "3885"  "59800" "17759" "4749" 
    [7,] "17721" "55906" NA      "9314" 
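Solution 3 uses stringr only for str_extract_all(); base R's regmatches() combined with gregexpr() extracts the same runs of digits. The sketch below also pads with empty strings instead of NA, so the printed table resembles the layout in the question. It assumes purely numeric item labels, as in the sample data (arules labels can be arbitrary text, in which case extracting digit runs would not work), and the resulting columns are character, not numeric; `wide` is an illustrative name.

```r
# Sample rules from the question
rules <- c("{45107} => {62557}", "{17759} => {60521 }", "{53721} => {53720}",
           "{63830} => {17753}", "{45413} => {45412}",
           "{3885,59800,17759} => {4749}", "{17721,55906} => {9314}")

# Base-R equivalent of stringr::str_extract_all(rules, "\\d+")
items <- regmatches(rules, gregexpr("[0-9]+", rules))
mx <- max(lengths(items))

# Keep the last number as rhs; pad the lhs with "" so empty cells print blank
rows <- lapply(items, function(x) c(head(x, -1), rep("", mx - length(x)), tail(x, 1)))
wide <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(wide) <- c(paste0("lhs", seq_len(mx - 1)), "rhs")
print(wide, row.names = FALSE)
```

Empty strings only suit display; keep NA (as in the solutions above) if you plan to compute on the columns.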
    

    【Discussion】:
