【Question title】: R: How to add hashtags from a data frame into a new column
【Posted】: 2022-01-04 04:03:11
【Question description】:

How can I extract the hashtags from a data frame column into a new column?

Here is my data frame:

dataframe <- data.frame(a = c('A', 'B', 'C', 'D', 'E'),
                        b = c("hello friends! #goodday",
                              "the flood getting worse #peoplefirst #sos",
                              "i love adele new song, it is remarkable",
                              "john doe loves judo",
                              "the new variant of covid19 is worrying #staysafe"))

The final data frame should look like this:

a   b                                                 c
A   hello friends! #goodday                           #goodday
B   the flood getting worse #peoplefirst #sos         #peoplefirst #sos              
C   i love adele new song, it is remarkable           NA
D   john doe loves judo                               NA
E   the new variant of covid19 is worrying #staysafe  #staysafe

【Question discussion】:

  • In your second row, where does "#sos" come from?

Tags: r string


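【Solution 1】:

Using the stringr package:

library(stringr)

# Extract every "#word" in each row, then join multiple matches with a space.
# sapply() returns a plain character column instead of a list column.
dataframe$c <- sapply(str_extract_all(dataframe$b, "#\\w+"),
                      function(x) paste(x, collapse = " "))
dataframe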

  a                                                b                 c
1 A                          hello friends! #goodday          #goodday
2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
3 C          i love adele new song, it is remarkable                  
4 D                              john doe loves judo                  
5 E the new variant of covid19 is worrying #staysafe         #staysafe
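The output above leaves empty strings where the question's desired output shows NA, and it depends on stringr. A minimal base-R sketch that fills the new column with NA for rows without hashtags might look like this; `tags` and `collapse_tags` are illustrative names, not part of any answer, and stringsAsFactors = FALSE is added only so the sketch also behaves on R versions before 4.0.

```r
# Sample data from the question
dataframe <- data.frame(a = c('A', 'B', 'C', 'D', 'E'),
                        b = c("hello friends! #goodday",
                              "the flood getting worse #peoplefirst #sos",
                              "i love adele new song, it is remarkable",
                              "john doe loves judo",
                              "the new variant of covid19 is worrying #staysafe"),
                        stringsAsFactors = FALSE)

# gregexpr() locates every "#" followed by word characters in each string;
# regmatches() returns the matched substrings as a list (one element per row)
tags <- regmatches(dataframe$b, gregexpr("#\\w+", dataframe$b, perl = TRUE))

# Hypothetical helper: join a row's matches, or return NA when there are none
collapse_tags <- function(x) {
  if (length(x) == 0) NA_character_ else paste(x, collapse = " ")
}

dataframe$c <- vapply(tags, collapse_tags, character(1))
dataframe
```

vapply() is used instead of sapply() so the result is guaranteed to be a character vector of the same length as the input.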

【Discussion】:

  • Thanks @tim-biegeleisen for the suggestion!
【Solution 2】:

Here is a solution using mutate, map (from purrr), str_extract_all and na_if.

library(tidyverse)

dataframe |>
  # For every row, extract all the letters following a hashtag
  # and paste them into a single string (for multiple matches).
  # map_chr() returns a character column rather than a list column.
  mutate(c = map_chr(b,
                     function(x) paste(str_extract_all(x, "#[A-Za-z]+")[[1]],
                                       collapse = " ")),
         # Change empty strings (rows without hashtags) to NA
         c = na_if(c, ""))

#  a                                                b                 c
#1 A                          hello friends! #goodday          #goodday
#2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
#3 C          i love adele new song, it is remarkable                NA
#4 D                              john doe loves judo                NA
#5 E the new variant of covid19 is worrying #staysafe         #staysafe
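The pattern in this answer matches only letters after the #, whereas Solution 1's "#\\w+" also accepts digits and underscores. The question's data happens not to contain such hashtags, so the string below is a hypothetical example used only to show the difference, written in base R so it runs without the tidyverse.

```r
# Hypothetical text whose hashtags contain a digit and an underscore
x <- "cases rising again #covid19 #stay_safe"

# perl = TRUE gives code-point character ranges and supports \w
letters_only <- regmatches(x, gregexpr("#[A-Za-z]+", x, perl = TRUE))[[1]]
word_chars   <- regmatches(x, gregexpr("#\\w+", x, perl = TRUE))[[1]]

letters_only  # "#covid" "#stay"
word_chars    # "#covid19" "#stay_safe"
```

Which pattern is preferable depends on whether hashtags like #covid19 should be kept whole.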

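【Discussion】:

  • Thanks @jonathan-v-solórzano for the insight!
【Solution 3】:

Another approach is to use gsub to delete everything before the first # (this works here because the hashtags come at the end of each string):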

dataframe$c <- gsub("^[^#]*", "", dataframe$b)

#   a                                                b                 c
# 1 A                          hello friends! #goodday          #goodday
# 2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
# 3 C          i love adele new song, it is remarkable                  
# 4 D                              john doe loves judo                  
# 5 E the new variant of covid19 is worrying #staysafe         #staysafe
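Because "^[^#]*" only removes the text before the first #, any words after a hashtag stay in the new column, and rows without a hashtag become "" rather than NA. The strings below are hypothetical, chosen only to show this; the second step is a sketch contrasting it with extracting the hashtags themselves.

```r
# Hypothetical inputs: one hashtag followed by more text, and no hashtag at all
x <- c("#sos the flood is getting worse", "no hashtag here")

# Deleting the prefix keeps everything from the first "#" onwards
stripped <- gsub("^[^#]*", "", x)
stripped  # "#sos the flood is getting worse" ""

# Extracting the hashtags keeps only the "#word" tokens
tags <- vapply(regmatches(x, gregexpr("#\\w+", x, perl = TRUE)),
               paste, character(1), collapse = " ")
tags      # "#sos" ""
```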

【Discussion】:

  • Thanks alexb for the great approach!
  • @Haza No problem. Please upvote/accept answers according to the SO guidelines.