【Question Title】: dplyr Error: length(rows) == 1 is not TRUE in R
【Posted】: 2018-10-26 19:58:51
【Question】:

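For some background, the data I'm working with comes from a top-3 ranking of certain variables. I need to be able to count the 1s, 2s, 3s, and NAs (the number of people who did not put the item in their top 3).

I have my data frame LikelyRenew_ReasonB, which I build by filtering on a specific year and status with dplyr. That step works fine, with no errors.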

LikelyRenew_ReasonB <-    
  LikelyRenew_Reason %>%
      filter(year ==1, status ==2)

> LikelyRenew_ReasonB
  cost products commun reimburse policy discount status year
1   NA       NA     NA        NA     NA       NA      2    1
2   NA       NA      1         2     NA       NA      2    1
3    2       NA      3        NA      1       NA      2    1
4   NA       NA     NA         1     NA       NA      2    1
5   NA       NA      3         1      2       NA      2    1
6   NA       NA      2         1      3       NA      2    1
7   NA       NA      1        NA     NA       NA      2    1
8   NA        2      3         1     NA       NA      2    1
9    3       NA      1        NA      2       NA      2    1

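However, when I try to get the summary counts, it throws an error: Error: length(rows) == 1 is not TRUE. I don't know why this error occurs. If I change the filter to year == 3, status == 1, it works fine. Any ideas about what I'm missing here?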

    LikelyRenew_ReasonB  %>%
          summarize(
            costC = count(cost), 
            productsC = count(products),
            communC = count(commun),
            reimburseC = count(reimburse),
            policyC = count(policy),
            discountC = count(discount))

This is what LikelyRenew_ReasonB looks like (*note that this is the dput of head() when the filter is year == 3, status == 1):

> dput(head(LikelyRenew_ReasonB))
structure(list(costC = structure(list(x = c(1, 2, 3, NA), freq = c(10L, 
11L, 17L, 149L)), .Names = c("x", "freq"), row.names = c(NA, 
4L), class = "data.frame"), productsC = structure(list(x = c(1, 
2, 3, NA), freq = c(31L, 40L, 30L, 86L)), .Names = c("x", "freq"
), row.names = c(NA, 4L), class = "data.frame"), communC = structure(list(
x = c(1, 2, 3, NA), freq = c(51L, 50L, 34L, 52L)), .Names = c("x", 
"freq"), row.names = c(NA, 4L), class = "data.frame"), reimburseC = 
structure(list(
x = c(1, 2, 3, NA), freq = c(42L, 26L, 25L, 94L)), .Names = c("x", 
"freq"), row.names = c(NA, 4L), class = "data.frame"), policyC = 
structure(list(
x = c(1, 2, 3, NA), freq = c(31L, 25L, 28L, 103L)), .Names = c("x", 
"freq"), row.names = c(NA, 4L), class = "data.frame"), discountC = 
structure(list(
x = c(1, 2, 3, NA), freq = c(2L, 2L, 3L, 180L)), .Names = c("x", 
 "freq"), row.names = c(NA, 4L), class = "data.frame")), .Names = c("costC", 
"productsC", "communC", "reimburseC", "policyC", "discountC"), row.names = 
c(NA, 
 4L), class = "data.frame")

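Here is a "working" example. Again, the problem is that for some reason I get the error when I change status/year to a different segment of interest.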

> LikelyRenew_ReasonB <-    
+   LikelyRenew_Reason %>%
+   dplyr::filter(year ==3, status ==1) %>%
+   plyr::summarize(
+                 costC = count(cost), 
+                 productsC = count(products),
+                 communC = count(commun),
+                 reimburseC = count(reimburse),
+                 policyC = count(policy),
+                 discountC = count(discount))

This is an example of the correct output:

    > LikelyRenew_ReasonB
    costC.x costC.freq productsC.x productsC.freq
1       1         10           1             31
2       2         11           2             40
3       3         17           3             30
4      NA        149          NA             86
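One possible explanation for why the same code works for one filter but not the other, offered as an assumption rather than something confirmed in this thread: count() (plyr::count in the working example) returns one row per distinct value in a column. In the year == 3, status == 1 subset every column contains all four values (1, 2, 3, NA), so the six results each have four rows and line up. In the year == 1, status == 2 subset the columns hold different numbers of distinct values, so the per-column results cannot be combined into one rectangular summary. The sketch below only checks that premise in base R; `failing_subset` is a hypothetical name, and its values are retyped from the printout of that subset earlier in the question.

```r
# Rank columns retyped from the year == 1, status == 2 printout in the question.
failing_subset <- data.frame(
  cost      = c(NA, NA, 2, NA, NA, NA, NA, NA, 3),
  products  = c(NA, NA, NA, NA, NA, NA, NA, 2, NA),
  commun    = c(NA, 1, 3, NA, 3, 2, 1, 3, 1),
  reimburse = c(NA, 2, NA, 1, 1, 1, NA, 1, NA),
  policy    = c(NA, NA, 1, NA, 2, 3, NA, NA, 2),
  discount  = rep(NA_real_, 9)
)

# Number of distinct values (NA counts as one value) in each column.
# A per-column frequency table would have this many rows.
n_distinct_values <- sapply(failing_subset, function(x) length(unique(x)))
print(n_distinct_values)

# TRUE here means the per-column tables have different heights.
print(length(unique(n_distinct_values)) > 1)
```

For this subset the columns have 3, 2, 4, 3, 4, and 1 distinct values respectively, so six frequency tables of those heights would not fit side by side.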

【Comments】:

  • Sorry, I've been fiddling with it trying to figure it out
  • > dput(head(LikelyRenew_Reason)) structure(list(cost = c(3, NA, NA, NA, NA, 3), products = c(2, NA, NA, 3, 3, 2), commun = c(1, 1, 2, 1, 2, 1), reimburse = c(NA, 2, 1, NA, NA, NA), policy = c(NA, NA, 3, NA, 1, NA), discount = c(NA, 3, NA, 2, NA, NA), status = c(5, 5, 1, 5, 1, 1), year = c(3, 3, 3, 3, 3, 3)), codepage = 65001L, .Names = c("cost", "products", "commun", "reimburse", "policy", "discount", "status", "year"), row.names = c(NA, 6L), class = "data.frame")
  • The whole dput(head(LikelyRenew_Reason)) is about 1500 lines. I took out variable.labels = structure(c(...)), if that helps.
  • OK. Initially you didn't mention that you were using plyr::summarize rather than dplyr::summarize, so we assumed you were trying to use dplyr::summarize. Next time you ask a question, please always state which packages you are using and which function comes from which package. Otherwise it gets very confusing.
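As a small illustration of the point about package origins, base R can report which attached packages define a given function name. This is only a sketch: the output for `count` and `summarize` depends on which packages are attached in your session, so none is shown here.

```r
# find() comes from the utils package, which is attached by default.
# It returns the search-path entries that define an object of the given name.
print(find("filter"))     # includes "package:stats" in a default session
print(find("count"))      # depends on whether plyr and/or dplyr are attached
print(find("summarize"))  # likewise session-dependent

# conflicts() lists names defined in more than one attached package.
masked_names <- conflicts()
print(head(masked_names))
```

When a name shows up in more than one package, writing the namespace explicitly (for example `plyr::count(...)` or `dplyr::summarize(...)`, as in the working example in the question) removes the ambiguity.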

Tags: r filter dplyr plyr


【Solution 1】:

count() is a wrapper around summarise() (https://dplyr.tidyverse.org/reference/tally.html). Perhaps what you want is sum() rather than count()?

LikelyRenew_ReasonB %>%
    summarize(
        costC = sum(cost, na.rm = TRUE),
        productsC = sum(products, na.rm = TRUE),
        communC = sum(commun, na.rm = TRUE),
        reimburseC = sum(reimburse, na.rm = TRUE),
        policyC = sum(policy, na.rm = TRUE),
        discountC = sum(discount, na.rm = TRUE))

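【Comments】:

  • The data comes from a top-3 ranking question, so I want it to count the number of 1s, 2s, 3s, and NAs (people who did not include the item in their top 3)
  • @ElainSlaven Could you clarify what the expected output looks like?
  • I added it to the question, @SGY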
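The counting described in the comments above (how many 1s, 2s, 3s, and NAs each column holds) can be sketched in base R without count(). This is a minimal sketch under stated assumptions, not a fix confirmed in the thread: `count_ranks` is a hypothetical helper, and the data frame is retyped from the year == 1, status == 2 printout in the question. Because every column is counted against the same fixed set of ranks, each result has four entries even when a rank never occurs.

```r
# Rank columns retyped from the year == 1, status == 2 printout in the question.
LikelyRenew_ReasonB <- data.frame(
  cost      = c(NA, NA, 2, NA, NA, NA, NA, NA, 3),
  products  = c(NA, NA, NA, NA, NA, NA, NA, 2, NA),
  commun    = c(NA, 1, 3, NA, 3, 2, 1, 3, 1),
  reimburse = c(NA, 2, NA, 1, 1, 1, NA, 1, NA),
  policy    = c(NA, NA, 1, NA, 2, 3, NA, NA, 2),
  discount  = rep(NA_real_, 9)
)

# Hypothetical helper: count each rank in `ranks`, plus the missing values.
# Ranks that never occur are reported as 0 rather than dropped.
count_ranks <- function(x, ranks = 1:3) {
  per_rank <- vapply(ranks, function(r) sum(x == r, na.rm = TRUE), integer(1))
  counts <- c(per_rank, sum(is.na(x)))
  names(counts) <- c(ranks, "NA")
  counts
}

# One column of counts per variable; rows are ranks 1, 2, 3 and NA.
rank_counts <- sapply(LikelyRenew_ReasonB, count_ranks)
print(rank_counts)
```

With this data, commun has three 1s, one 2, three 3s, and two NAs, and every column's counts add up to the nine rows in the subset.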