R: Split values from a column and keep the result as a factor, not a list

[Question title]: R Split values from a column and keep values class as factors not as list
[Posted]: 2018-09-07 07:45:16
[Question description]:

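I know similar questions have been asked many times, but I couldn't find one that addresses my problem.

Here is my problem. I have a data frame that looks like this: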

Sample        Condition
RN001         1_healthy
RN002         14_healthy
RN008         20_disease
RN009         21_disease
RN0010        10_healthy

I need to split the values in the Condition column to get this:

Sample        Condition
RN001         healthy
RN002         healthy
RN008         disease
RN009         disease
RN0010        healthy

I have already tried:

data$Condition <- lapply(strsplit(as.character(data$Condition), "_"), '[', 2)

But I get a list data structure like this:

[[1]]
[1] "healthy"

[[2]]
[1] "healthy"

[[3]]
[1] "disease"

[[4]]
[1] "disease"

What I need is a data structure of class factor, like this:

 [1] healthy healthy disease disease healthy ...
 2 Levels:  healthy disease

Thanks for your comments.

[Question discussion]:

[Solution 1]:

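We can use sub to remove the prefix: match one or more digits (\\d+) at the start of the string (^) followed by an underscore (_), and replace the match with an empty string ("").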

data$Condition <- sub("^\\d+_", "", data$Condition)
data$Condition
#[1] "healthy" "healthy" "disease" "disease" "healthy"
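Note that sub returns a character vector, while the question asks for a factor. A minimal sketch of the extra step, assuming the data frame can be rebuilt from the sample rows in the question and that the level order healthy, disease shown there is wanted (both are assumptions, not part of the original answer):

```r
# Sample data rebuilt from the question; stringsAsFactors = FALSE is an assumption
data <- data.frame(
  Sample = c("RN001", "RN002", "RN008", "RN009", "RN0010"),
  Condition = c("1_healthy", "14_healthy", "20_disease", "21_disease", "10_healthy"),
  stringsAsFactors = FALSE
)

# Strip the leading digits and the underscore; the result is a character vector
stripped <- sub("^\\d+_", "", data$Condition)

# Convert to a factor. The explicit level order is taken from the desired
# output in the question; without `levels`, factor() sorts them alphabetically.
data$Condition <- factor(stripped, levels = c("healthy", "disease"))

class(data$Condition)
levels(data$Condition)
```

If the level order does not matter, factor(stripped) alone is enough.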

The output of lapply is always a list. So, if we need a vector, use sapply instead:

data$Condition <- sapply(strsplit(as.character(data$Condition), "_"), '[', 2)

Or unlist the list output from lapply:

data$Condition <- unlist(lapply(strsplit(as.character(data$Condition), "_"), '[', 2))
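To see the difference between the three calls side by side, here is a small sketch on a stand-alone vector (the variable names are hypothetical). It also includes vapply, a stricter relative of sapply that is not used in the answer above and is shown only as an option:

```r
# A few of the values from the question, as a plain character vector
cond <- c("1_healthy", "14_healthy", "20_disease")
parts <- strsplit(cond, "_")

as_list <- lapply(parts, `[`, 2)   # always a list
as_vec  <- sapply(parts, `[`, 2)   # simplified to a character vector here
flat    <- unlist(as_list)         # list flattened to a character vector

# vapply declares the expected result type (one string per element) up front
strict <- vapply(parts, `[`, character(1), 2)

class(as_list)
class(as_vec)
identical(as_vec, flat)
```

Any of the three vector results can then be passed to factor() to get the class the question asks for.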

[Discussion]: