[Posted]: 2020-02-27 06:07:05
[Question]:
I have a data frame whose rows I need to split into sentences.
df <- data.frame(text=c("Lately, I haven't been able to view my Online Payment Card. It's prompting me to have to upgrade my account whereas before it didn't. I have used the Card at various online stores before and have successfully used it. But now it's starting to get very frustrating that I have to said \"upgrade\" my account. Do fix this... **I noticed some users have the same issue..","I've been using this app for almost 2 years without any problems. Until, their system just blocked my virtual paying card without any notice. So, I was forced to apply for an upgrade and it was rejected thrice, despite providing all of my available IDs. This app has been a big disappointment."), id=c(1,2), stringsAsFactors = FALSE)
I want to split the text column into sentences and end up with the following:
df <- data.frame (text = c("Lately, I haven't been able to view my Online Payment Card. It's prompting me to have to upgrade my account whereas before it didn't. I have used the Card at various online stores before and have successfully used it. But now it's starting to get very frustrating that I have to said \"upgrade\" my account. Do fix this... **I noticed some users have the same issue..",
"I've been using this app for almost 2 years without any problems. Until, their system just blocked my virtual paying card without any notice. So, I was forced to apply for an upgrade and it was rejected thrice, despite providing all of my available IDs. This app has been a big disappointment.",
"Lately, I haven't been able to view my Online Payment Card.",
"It's prompting me to have to upgrade my account whereas before it didn't.",
"I have used the Card at various online stores before and have successfully used it.",
"But now it's starting to get very frustrating that I have to said upgrade my account.",
"Do fix this|", "**I noticed some users have the same issue|",
"I've been using this app for almost 2 years without any problems.",
"Until, their system just blocked my virtual paying card without any notice.",
"So, I was forced to apply for an upgrade and it was rejected thrice, despite providing all of my available IDs.",
"This app has been a big disappointment."), id = c(1, 2, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2), tag = c("DONE", "DONE", NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), stringsAsFactors = FALSE)
I got this working with the code below, but I think the for loop is too slow. I need to do this for 73,000 rows, so I need a faster approach.
Attempt 1:
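library("qdap")
df$tag <- NA
for (review_num in 1:nrow(df)) {
  # split this review into sentences
  x = sent_detect(df$text[review_num])
  if (length(x) > 1) {
    for (sentence_num in 1:length(x)) {
      # append a copy of the review row, then overwrite its text with one sentence
      df <- rbind(df, df[review_num,])
      df$text[nrow(df)] <- x[sentence_num]
    }
    # mark the original row as processed
    df$tag[review_num] <- "DONE"
  }
}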
Attempt 2: 73,000 rows, time taken: 252 minutes (~4 hours)
reviews_df1 <- data.frame(id=character(0), text=character(0))
for (review_num in 1:nrow(df)) {
  preprocess_sent <- sent_detect(df$text[review_num])
  if (length(preprocess_sent) > 0) {
    x <- data.frame(id=df$id[review_num],
                    text=preprocess_sent)
    # append this review's sentences to the accumulator
    reviews_df1 <- rbind(reviews_df1, x)
  }
}
colnames(reviews_df1) <- c("id", "text")
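Attempts 1 and 2 both call rbind inside the loop, which copies the growing data frame on every iteration, so that part of the cost grows much faster than the row count. How much of the 4 hours this explains, as opposed to sent_detect itself, is not measured here. A minimal sketch of the usual alternative: collect the pieces in a pre-allocated list and bind once at the end. The helper split_sentences is a hypothetical base-R stand-in for qdap's sent_detect (a simple regex, not the same algorithm), used only so the example runs without extra packages, and the sample data is made up.

```r
# Hypothetical stand-in for qdap::sent_detect(): split after ., ! or ?
# when followed by whitespace. This is an assumption, not qdap's algorithm.
split_sentences <- function(x) {
  strsplit(x, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
}

# Small made-up sample in the same shape as the question's df
df <- data.frame(
  text = c("First sentence. Second one! Third?", "Only one sentence here."),
  id = c(1, 2),
  stringsAsFactors = FALSE
)

# Pre-allocate a list and fill it; no data frame is copied inside the loop
pieces <- vector("list", nrow(df))
for (i in seq_len(nrow(df))) {
  s <- split_sentences(df$text[i])
  pieces[[i]] <- data.frame(
    id = rep(df$id[i], length(s)),
    text = s,
    stringsAsFactors = FALSE
  )
}

# Bind everything with a single rbind call at the end
reviews_df <- do.call(rbind, pieces)
```

Swapping split_sentences for sent_detect should keep the same structure, assuming sent_detect returns a character vector of sentences for a single string, as it is used in the loops above.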
Attempt 3: 29,000 rows, time taken: 170 minutes (~2.8 hours)
library(qdap)
library(dplyr)
library(tidyr)
df <- data.frame(text=c("Lately, I haven't been able to view my Online Payment Card. It's prompting me to have to upgrade my account whereas before it didn't. I have used the Card at various online stores before and have successfully used it. But now it's starting to get very frustrating that I have to said \"upgrade\" my account. Do fix this... **I noticed some users have the same issue..","I've been using this app for almost 2 years without any problems. Until, their system just blocked my virtual paying card without any notice. So, I was forced to apply for an upgrade and it was rejected thrice, despite providing all of my available IDs. This app has been a big disappointment."), id=c(1,2), stringsAsFactors = FALSE)
df %>%
  group_by(text) %>%
  # use the grouped column rather than df$text, so each review gets only its own sentences
  mutate(sentences = list(sent_detect(text))) %>%
  unnest(cols=sentences) -> out.df
out.df
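[Comments]:
- Try data.table. It may be faster, but I don't know by how much.
- Have you tried "vectorizing" your loop with a user-defined function applied to the strings you want to split? See ?apply() or some blogs about this "family" of functions. This kind of construct tends to be faster than a for loop.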
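The second comment's suggestion could look roughly like the sketch below: apply a splitting function to every string with lapply, then rebuild the long data frame in one step by repeating each id once per sentence. As before, split_one is a hypothetical base-R stand-in for sent_detect and the sample data is made up; whether this is fast enough for 73,000 rows depends mostly on the cost of the splitter itself, which is not measured here.

```r
# Hypothetical stand-in for qdap::sent_detect() on a single string
# (simple regex split; an assumption, not qdap's algorithm).
split_one <- function(x) {
  strsplit(x, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
}

# Small made-up sample in the same shape as the question's df
df <- data.frame(
  text = c("First sentence. Second one! Third?", "Only one sentence here."),
  id = c(1, 2),
  stringsAsFactors = FALSE
)

# One list element per review, each holding that review's sentences
sentences <- lapply(df$text, split_one)

# lengths() gives the sentence count per review, so rep() repeats each id
# once per sentence; unlist() flattens the sentences in the same order
out <- data.frame(
  id = rep(df$id, lengths(sentences)),
  text = unlist(sentences),
  stringsAsFactors = FALSE
)
```

The result has one row per sentence with the originating id, which matches the sentence rows of the desired output in the question (without the tag column or the original unsplit rows).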
Tags: r performance for-loop