【Question title】:Why rbind throws a warning
【Posted】:2014-09-25 23:58:54
【Question】:

This is related to "Are there more elegant ways to transform ragged data into a tidy dataframe".

Why does the following code not work:

events = structure(list(date = structure(c(-714974, -714579, -717835), class = "Date"), 
    days = c(1, 6, 0.5), name = c("Intro to stats", "Stats Winter school", 
    "TidyR tools"), topics = c("probability|R", "R|regression|ggplot", 
    "tidyR|dplyr")), .Names = c("date", "days", "name", "topics"
), row.names = c(NA, -3L), class = "data.frame")

> newdf <- data.frame(topic=character(), days=character())
> for(i in 1:length(events$topics)){
+ xx = unlist(strsplit(events$topics[i],'\\|'))
+ for(j in 1:length(xx)){
+ yy = c(xx[j], events$days[i]/length(xx))
+ print(yy)
+ newdf=rbind(newdf, yy)
+ }
+ }
[1] "probability" "0.5"        
[1] "R"   "0.5"
[1] "R" "2"
[1] "regression" "2"         
[1] "ggplot" "2"     
[1] "tidyR" "0.25" 
[1] "dplyr" "0.25" 
There were 11 warnings (use warnings() to see them)
> newdf
  X.probability. X.0.5.
1    probability    0.5
2           <NA>    0.5
3           <NA>   <NA>
4           <NA>   <NA>
5           <NA>   <NA>
6           <NA>   <NA>
7           <NA>   <NA>
> 
> warnings()
Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA ... :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA,  ... :
  invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, 1L,  ... :
  invalid factor level, NAs generated
4: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA,  ... :
  invalid factor level, NAs generated
5: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, 1L,  ... :
  invalid factor level, NAs generated
6: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA,  ... :
  invalid factor level, NAs generated
7: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, 1L,  ... :
  invalid factor level, NAs generated
8: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA,  ... :
  invalid factor level, NAs generated
9: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, 1L,  ... :
  invalid factor level, NAs generated
10: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, NA,  ... :
  invalid factor level, NAs generated
11: In `[<-.factor`(`*tmp*`, ri, value = structure(c(1L, 1L,  ... :
  invalid factor level, NAs generated
> 

yy looks fine, but rbind does not work. Where is the error, and how can it be corrected? Thanks for your help.

【Question comments】:

    Tags: r


    【Solution 1】:

    You could try:

    newdf <- data.frame(topic=character(), daysPerTopic=character(), stringsAsFactors=FALSE)
    for(i in 1:length(events$topics)){
      # split "a|b|c" into its individual topics
      xx = unlist(strsplit(events$topics[i],'\\|'))
      for(j in 1:length(xx)){
        # one-row data frame with explicit column names, strings kept as character
        yy = data.frame(topic=xx[j], daysPerTopic=events$days[i]/length(xx), stringsAsFactors=FALSE)
        newdf <- rbind(newdf, yy)
      }
    }
    
     newdf
    #        topic daysPerTopic
    # 1 probability         0.50
    # 2           R         0.50
    # 3           R         2.00
    # 4  regression         2.00
    # 5      ggplot         2.00
    # 6       tidyR         0.25
    # 7       dplyr         0.25
    

    Or

    op <- options(stringsAsFactors=FALSE)  # set to FALSE, keeping the old value in op
    
    # Your code
    newdf <- data.frame(topic=character(), days=character())
    for(i in 1:length(events$topics)){
      xx = unlist(strsplit(events$topics[i],'\\|'))
      for(j in 1:length(xx)){
        yy = c(xx[j], events$days[i]/length(xx))
        print(yy)
        newdf=rbind(newdf, yy)
      }
    }
    
     newdf
    #  X.probability. X.0.5.
    # 1    probability    0.5
    # 2              R    0.5
    # 3              R      2
    # 4     regression      2
    # 5         ggplot      2
    # 6          tidyR   0.25
    # 7          dplyr   0.25
    
     options(op) # set back to the previous value
    

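    【Comments】:

    • I didn't realise that both arguments to rbind should be data frames.
    • @rnso, they don't have to be. Just be careful when dealing with factors, that's all.
    • OK. stringsAsFactors=F was the key issue. Thanks.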
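
To illustrate the point made in the comments, here is a minimal sketch showing that rbind accepts a plain character vector as a new row, provided the existing columns are character rather than factor. The names `df` and the sample values are only for illustration.

```r
# Start from a one-row data frame whose columns are explicitly character.
df <- data.frame(topic = "probability", days = "0.5",
                 stringsAsFactors = FALSE)

# Append a bare character vector (not a data frame) as a second row.
df <- rbind(df, c("R", "0.5"))

# Both columns are still character and no value was turned into NA.
str(df)
```

The same call on factor columns is where the "invalid factor level" warning comes from, because "R" is not among the existing levels.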
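    【Solution 2】:

    Have you tried debugging your for loop? For example, by adding print(class(yy)) and print(str(newdf)), you will see that after the first iteration both newdf columns have become factors.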

    # [1] "probability" "0.5"        
    # [1] "character"
    # 'data.frame':  0 obs. of  2 variables:
    #   $ topic: Factor w/ 0 levels: 
    #   $ days : Factor w/ 0 levels: 
    #   NULL
    # [1] "R"   "0.5"
    # [1] "character"
    # 'data.frame': 1 obs. of  2 variables:
    #   $ X.probability.: Factor w/ 1 level "probability": 1
    # $ X.0.5.        : Factor w/ 1 level "0.5": 1
    # NULL
    # [1] "R" "2"
    # [1] "character"
    # 'data.frame': 2 obs. of  2 variables:
    #   $ X.probability.: Factor w/ 1 level "probability": 1 NA
    # $ X.0.5.        : Factor w/ 1 level "0.5": 1 1
    
    ...
    

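    You will say "but I defined them as character". True, but if you read the rbind documentation you will see

    For cbind (rbind), vectors of zero length (including NULL) are ignored unless the result would have zero rows (columns), for S compatibility. (Zero-extent matrices do not occur in S3 and are not ignored in R.)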
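
The data frame method of rbind documents a similar rule: zero-row arguments are dropped before the rows are combined. The sketch below (variable names are illustrative) shows the consequence for the empty `newdf` in the question: its column names and types cannot carry over when a bare vector is appended, whereas a one-row data frame brings its own names along.

```r
# Zero-row data frame with the intended column names.
empty <- data.frame(topic = character(), days = character(),
                    stringsAsFactors = FALSE)

# Appending a bare character vector: the zero-row frame is dropped first,
# so the names "topic" and "days" are not available for the result.
from_vector <- rbind(empty, c("probability", "0.5"))
print(names(from_vector))

# Appending a one-row data frame with explicit names keeps those names.
row1 <- data.frame(topic = "probability", days = "0.5",
                   stringsAsFactors = FALSE)
from_frame <- rbind(empty, row1)
print(names(from_frame))
```

This is why building each row as a named data frame, as in Solution 1, gives a result with the expected column names.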

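    Another property of rbind is that it inherits its behaviour from data.frame, including stringsAsFactors == TRUE (the default in R versions before 4.0.0).

    What happens here can easily be illustrated with a dummy example; consider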

    temp <- data.frame(A = letters[1:3], stringsAsFactors = TRUE)  # explicit, since R >= 4.0.0 defaults to FALSE
    str(temp)
    ## 'data.frame':    3 obs. of  1 variable:
    ## $ A: Factor w/ 3 levels "a","b","c": 1 2 3
    
    temp$A[3] <- "d"
    ## Warning message:
    ## In `[<-.factor`(`*tmp*`, 3, value = c(1L, 2L, NA)) :
    ##   invalid factor level, NA generated
    
    temp$A
    ## [1] a    b    <NA>
    ## Levels: a b c
    

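    You can see two things here:

    • data.frame automatically converts character columns to factors
    • when you try to assign a value that is not an existing level to a factor vector, it is converted to NA and you get exactly the warning you received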
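
If a factor column really is wanted, a value outside the existing levels can still be assigned safely. The following is a minimal sketch of two common approaches, using made-up vectors `f` and `g` rather than the data from the question.

```r
# Option 1: extend the levels first, then assign the new value.
f <- factor(c("a", "b", "c"))
levels(f) <- c(levels(f), "d")
f[3] <- "d"
print(f)

# Option 2: work on a character copy, where any string is allowed.
g <- factor(c("a", "b", "c"))
g_chr <- as.character(g)
g_chr[3] <- "d"
print(g_chr)
```

Neither assignment produces NA, because the target either knows the level "d" or is not a factor at all.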

    As @akrun said, setting options(stringsAsFactors=F) will solve your problem.

    【Comments】:

    • Yep, yep, um, yep, yep, I agree. +1
    • I did try print(..) lines while debugging the code, but didn't write everything down here.
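    【Solution 3】:

    Set options(stringsAsFactors = FALSE) and your code should work as expected. The warnings and the NAs in the result are caused by the implicit conversion to factor and the resulting type mismatch between the columns of newdf and yy; see https://stackoverflow.com/a/1640729/1541036

    For a more concise way to get the same result, here is a group-by solution using data.table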

    library(data.table)
    events <- as.data.table(events)
    events2 <- events[, list(topic=unlist(strsplit(topics, '|', fixed=TRUE))), by=c("date", "days", "name")]
    events2[, probability := days / .N, by=name]
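
For readers without data.table, the same split-and-divide idea can be sketched in base R without a loop. This is only an illustrative alternative, not part of the answer above; the `events` object here is a reduced copy of the question's data (just `days` and `topics`), and `parts`, `n` and `newdf` are names chosen for the example.

```r
# Reduced copy of the question's data: only the columns this sketch needs.
events <- data.frame(
  days   = c(1, 6, 0.5),
  topics = c("probability|R", "R|regression|ggplot", "tidyR|dplyr"),
  stringsAsFactors = FALSE
)

# Split each topics string on the literal "|" character.
parts <- strsplit(events$topics, "|", fixed = TRUE)

# Number of topics per event.
n <- lengths(parts)

# One row per topic; each event's days are shared equally among its topics.
newdf <- data.frame(
  topic        = unlist(parts),
  daysPerTopic = rep(events$days / n, times = n),
  stringsAsFactors = FALSE
)
print(newdf)
```

Because the whole result is built in one data.frame() call, nothing is appended row by row and the factor-level problem never arises.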
    

    【Comments】:
