R - creating a column of character matrices (answers)

[Question title]: R - creating a column of character matrices
[Posted]: 2018-11-10 23:05:01
[Question]:

Here is my reproducible data frame:

library(tidyverse)
df <- structure(list(PN = c("41681", "16588", "34881", 
"36917", "33116", "68447"), `2017-10` = c(0L, 
0L, 0L, 0L, 0L, 0L), `2017-11` = c(0L, 1L, 0L, 0L, 0L, 0L), `2017-12` = c(0L, 
0L, 0L, 0L, 1L, 0L), `2018-01` = c(0L, 0L, 1L, 1L, 0L, 0L), `2018-02` = c(1L, 
0L, 0L, 0L, 0L, 0L), `2018-03` = c(0L, 0L, 0L, 0L, 0L, 0L), `2018-04` = c(0L, 
0L, 0L, 0L, 0L, 1L), Status = c("OK", "NOK", "OK", "NOK", "OK", 
"OK")), .Names = c("PN", "2017-10", "2017-11", "2017-12", 
"2018-01", "2018-02", "2018-03", "2018-04", "Status"), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Long story short... the two steps that got me to the output above were:

1. Early in the analysis:

mutate(n = parse_integer(str_replace_na(n, replacement = 0)))

2. Later in the analysis:

mutate(
  Status = 
    ifelse(
      (apply(.[, 2:7], 1, sum) > 0) & 
        (.[, 8] > 0), 
      "NOK", 
      "OK"
      )
)

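Two Stack Overflow warriors, @joran and @akrun, told me that I had "created a column of character matrices", which is why I kept getting the error "Error in arrange_impl(.data, dots) : Argument 1 is of unsupported type matrix".

In plain English, what did I do? I'm the kind of person who doesn't yet understand the difference between atomic vectors and atomic particles. Can you answer in a simple, concise way?

Or you could tell me to read chapter XYZ of R for Data Science or something similar. I'd accept that too (probably in the comments).

[Comments on the question]: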

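This looks like an issue inside %>%. If you run it outside the pipe, you get a normal vector: apply(df[2:7], 1, sum) > 0 gives [1] TRUE TRUE TRUE TRUE TRUE FALSE. It could be some kind of bug.
I don't see any character matrix column.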
@Moody_Mudskipper You will if you run the mutate and then check str (I'm using R 3.4.4 and dplyr_0.7.5).
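It comes from .[, 8] > 0: because you have a tibble rather than a data.frame, .[, 8] is still a tibble. Use .[[8]] > 0 instead (test df[, 8] > 0 & apply(df[, 2:7], 1, sum) > 0, then df[[8]] > 0 & apply(df[, 2:7], 1, sum) > 0).
mutate(Status = as.vector(Status)) fixes the original problem. The trouble is that I don't know what the problem is. What/how/where/why/when did I create a "column of character matrices"? What is my atomic-floating-point-R-Inferno-error-C## here? Heeellllpppppp. Thank you.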

Tags: r dplyr lapply

[Solution 1]:

To behave the way you usually expect, ifelse needs a logical vector as its first argument.

What you fed it here is (replacing . with df):

(apply(df[, 2:7], 1, sum) > 0) & (df[, 8] > 0)
# which btw we can rewrite more clearly as:
# rowSums(df[2:7]) > 0 & df[,8] >0

#      2018-04
# [1,]   FALSE
# [2,]   FALSE
# [3,]   FALSE
# [4,]   FALSE
# [5,]   FALSE
# [6,]   FALSE
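
The rowSums rewrite suggested in the code comment above can be checked on a small stand-in. This is only a sketch: the data frame d and its columns a and b are made up for illustration and are not the asker's data.

```r
# Hypothetical toy data frame (not the asker's df)
d <- data.frame(a = c(0L, 1L, 0L), b = c(1L, 1L, 0L))

# Row totals computed two ways
via_apply   <- apply(d, 1, sum)  # loops over rows after converting d to a matrix
via_rowsums <- rowSums(d)        # the dedicated base R function

# as.numeric() strips names and unifies integer/double storage before comparing
identical(as.numeric(via_apply), as.numeric(via_rowsums))
```

Both give the same totals; rowSums states the intent more directly.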

This wouldn't happen with a regular data.frame, because df[, 8] would be converted to a vector.

Read ?Extract about the drop argument; tibbles behave somewhat like data.frames subset with drop = FALSE.

head(iris[,1])
# [1] 5.1 4.9 4.7 4.6 5.0 5.4

head(iris[,1,drop=FALSE])
#   Sepal.Length
# 1          5.1
# 2          4.9
# 3          4.7
# 4          4.6
# 5          5.0
# 6          5.4

head(as_tibble(iris)[,1])
# # A tibble: 6 x 1
#   Sepal.Length
# <dbl>
# 1          5.1
# 2          4.9
# 3          4.7
# 4          4.6
# 5          5.0
# 6          5.4

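We don't need to go deep into how that translates into your faulty result; we can just make sure the input is corrected.

To do that, use df[[8]] instead of df[, 8]; df[[8]] is always a vector.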
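
As a quick check of that claim, the sketch below compares [, j] and [[j]] on a tibble and on a plain data.frame. The objects tb and plain and their columns are hypothetical, and the example assumes the tibble package is installed.

```r
library(tibble)

# Hypothetical toy data (names are illustrative only)
tb    <- tibble(x = c(0L, 1L), y = c("a", "b"))
plain <- as.data.frame(tb)

is.data.frame(tb[, 1])     # TRUE: single-column [ on a tibble stays a tibble
is.data.frame(plain[, 1])  # FALSE: a data.frame drops to a vector by default

# [[ returns the bare column in both cases
identical(tb[[1]], plain[[1]])
is.null(dim(tb[[1]]))      # TRUE: no dim attribute, so not a matrix
```

Because [[ behaves the same for both classes, code written with it does not change meaning if the input is later converted between a tibble and a data.frame.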

df %>% mutate(
  Status = 
    ifelse(
      rowSums(.[, 2:7]) > 0 & .[[8]] > 0, 
      "NOK", 
      "OK"
    )
) %>% str

# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of  9 variables:
# $ PN     : chr  "41681" "16588" "34881" "36917" ...
# $ 2017-10: int  0 0 0 0 0 0
# $ 2017-11: int  0 1 0 0 0 0
# $ 2017-12: int  0 0 0 0 1 0
# $ 2018-01: int  0 0 1 1 0 0
# $ 2018-02: int  1 0 0 0 0 0
# $ 2018-03: int  0 0 0 0 0 0
# $ 2018-04: int  0 0 0 0 0 1
# $ Status : chr  "OK" "OK" "OK" "OK" ...

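Now the structure is no longer a problem.

Another way is to use if_else (from the dplyr package) instead of ifelse, which adds just one underscore character to your solution but wouldn't teach us much :). It does some magic conversion internally, much like what you did with as.vector in the comments.

Taking your original code and adding only the magic _: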

df %>% mutate(
  Status = 
    if_else(
      (apply(.[, 2:7], 1, sum) > 0) & 
        (.[, 8] > 0), 
      "NOK", 
      "OK"
    )
) %>% str
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of  9 variables:
# $ PN     : chr  "41681" "16588" "34881" "36917" ...
# $ 2017-10: int  0 0 0 0 0 0
# $ 2017-11: int  0 1 0 0 0 0
# $ 2017-12: int  0 0 0 0 1 0
# $ 2018-01: int  0 0 1 1 0 0
# $ 2018-02: int  1 0 0 0 0 0
# $ 2018-03: int  0 0 0 0 0 0
# $ 2018-04: int  0 0 0 0 0 1
# $ Status : chr  "OK" "OK" "OK" "OK" ...

Explanation of the error

df %>% mutate(
  Status = 
    ifelse(
      (apply(.[, 2:7], 1, sum) > 0) & 
        (.[, 8] > 0), 
      "NOK", 
      "OK"
    )
) %>% str

# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of  9 variables:
# $ PN     : chr  "41681" "16588" "34881" "36917" ...
# $ 2017-10: int  0 0 0 0 0 0
# $ 2017-11: int  0 1 0 0 0 0
# $ 2017-12: int  0 0 0 0 1 0
# $ 2018-01: int  0 0 1 1 0 0
# $ 2018-02: int  1 0 0 0 0 0
# $ 2018-03: int  0 0 0 0 0 0
# $ 2018-04: int  0 0 0 0 0 1
# $ Status : chr [1:6, 1] "OK" "OK" "OK" "OK" ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr "2018-04"

This shows that Status is a character matrix with 6 rows and 1 column. arrange doesn't like that.

Why do you get a character matrix?
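
A base R way to spot such a column without reading str output is sketched below. The data frame d and its columns are invented for illustration; the as.vector fix is the one the asker mentions in the comments.

```r
# Hypothetical data frame with a one-column character matrix stored as a column
d <- data.frame(id = 1:3)
d$status <- matrix(c("OK", "NOK", "OK"), ncol = 1)

# Flag every column that carries matrix dimensions
before <- vapply(d, is.matrix, logical(1))
before

# as.vector() drops the dim attribute and leaves a plain character vector
d$status <- as.vector(d$status)
after <- vapply(d, is.matrix, logical(1))
after
```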

df[, 8] is a tibble
so df[, 8] > 0 is a matrix
so (apply(.[, 2:7], 1, sum) > 0) & (.[, 8] > 0) is a matrix
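
The chain above can be followed step by step on a small stand-in. This is a sketch only: tb and its columns a and b are hypothetical, and it assumes the tibble package is installed.

```r
library(tibble)

# Hypothetical two-column tibble standing in for the asker's df
tb <- tibble(a = c(0L, 1L, 0L), b = c(0L, 0L, 1L))

step1 <- tb[, 2]                            # single-column [ keeps the tibble
step2 <- tb[, 2] > 0                        # comparing a data frame yields a logical matrix
step3 <- (apply(tb[, 1], 1, sum) > 0) & step2  # vector & matrix keeps the matrix dims

is.data.frame(step1)
is.matrix(step2)
dim(step3)
```

Every step after the first carries a dim attribute, and that is what ifelse later copies onto its result.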

?ifelse says this about the returned value:

A vector of the same length and attributes (including dimensions and "class") as test

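So Status will be a matrix, and everything finally makes sense ;).

For more information, see also ?dplyr::if_else.

[Comments]:

Wow. Exactly what I was looking for. If I could, I'd hit the up arrow a few more times.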
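
That documented behaviour can be reproduced in base R alone. In the sketch below, test_vec and test_mat are made-up inputs, not the asker's data.

```r
# The same logical values, once as a plain vector and once as a 3 x 1 matrix
test_vec <- c(TRUE, FALSE, TRUE)
test_mat <- matrix(test_vec, ncol = 1)

out_vec <- ifelse(test_vec, "NOK", "OK")
out_mat <- ifelse(test_mat, "NOK", "OK")

dim(out_vec)  # NULL: a plain character vector
dim(out_mat)  # 3 1: the result inherited the dimensions of test

# Dropping the dimensions recovers the plain vector
identical(as.vector(out_mat), out_vec)
```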