【Posted】:2017-07-31 08:13:19
【Question】:
I have the following data frame:
df <- structure(list(gene_id = c("RNA18S5", "RNA18S5", "RNA18S5", "RNA18S5",
"RNA18S5"), samplename = c("XX_135_S14.Adipose", "XX_133_S12.Adipose",
"XX_128_S7.Umbilical", "XX_117_S11.Liver", "XX_124_S3.Pulmonary"
), gene_expr = c(6533029L, 5494889L, 5491158L, 5232914L, 5151004L
)), .Names = c("gene_id", "samplename", "gene_expr"), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
df
#>   gene_id samplename          gene_expr
#> 1 RNA18S5 XX_135_S14.Adipose    6533029
#> 2 RNA18S5 XX_133_S12.Adipose    5494889
#> 3 RNA18S5 XX_128_S7.Umbilical   5491158
#> 4 RNA18S5 XX_117_S11.Liver      5232914
#> 5 RNA18S5 XX_124_S3.Pulmonary   5151004
What I want to do is split samplename and create a new column.
I tried:
library(tidyverse)
df <- df %>%
  mutate(subtype = stringr::str_split(samplename, "\\.")[[1]][2])
df
which gives this:
# A tibble: 5 x 4
  gene_id samplename          gene_expr subtype
  <chr>   <chr>                   <int> <chr>
1 RNA18S5 XX_135_S14.Adipose    6533029 Adipose
2 RNA18S5 XX_133_S12.Adipose    5494889 Adipose
3 RNA18S5 XX_128_S7.Umbilical   5491158 Adipose
4 RNA18S5 XX_117_S11.Liver      5232914 Adipose
5 RNA18S5 XX_124_S3.Pulmonary   5151004 Adipose
Note that the subtype column is wrong: every row shows Adipose. The output I want is:
  gene_id samplename          gene_expr subtype
1 RNA18S5 XX_135_S14.Adipose    6533029 Adipose
2 RNA18S5 XX_133_S12.Adipose    5494889 Adipose
3 RNA18S5 XX_128_S7.Umbilical   5491158 Umbilical
4 RNA18S5 XX_117_S11.Liver      5232914 Liver
5 RNA18S5 XX_124_S3.Pulmonary   5151004 Pulmonary
What is the correct way to do this?
【Comments】:
-
You don't really need to split. df %>% mutate(subtype = sub('.*\\.', '', samplename)) should be enough.
-
But if you really want to split, you are better off using str_split_fixed: stringr::str_split_fixed(df$samplename, "\\.", 2)[,2]
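-
To illustrate why the original attempt fills every row with Adipose, and how the two suggestions above compare, here is a minimal sketch. It assumes stringr is installed and works on a plain character vector rather than the full data frame. The variable names (parts, first_only, per_row, via_sub, via_fixed) are made up for this example and do not come from the question.

```r
library(stringr)

# The sample names from the question, as a plain character vector.
samplename <- c("XX_135_S14.Adipose", "XX_133_S12.Adipose",
                "XX_128_S7.Umbilical", "XX_117_S11.Liver",
                "XX_124_S3.Pulmonary")

# str_split() returns a list with one character vector per input string.
parts <- str_split(samplename, "\\.")

# [[1]] keeps only the pieces of the FIRST string, so [2] is a single value;
# inside mutate() that one value is recycled down the whole column.
first_only <- parts[[1]][2]

# Taking the second piece of every list element gives one value per row.
per_row <- vapply(parts, function(p) p[2], character(1))

# The two vectorised alternatives suggested in the comments.
via_sub   <- sub(".*\\.", "", samplename)
via_fixed <- str_split_fixed(samplename, "\\.", 2)[, 2]

print(first_only)
print(per_row)
print(identical(via_sub, via_fixed))
```

With names that contain a single dot, as here, via_sub and via_fixed agree. They would differ on names with several dots: sub('.*\\.', '', x) keeps only the text after the last dot, while str_split_fixed(x, "\\.", 2)[, 2] keeps everything after the first dot.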