Find the position of a word and get the 3 words before and after it in R: answers

【Question title】: Find the position of a word and get 3 words before and after the word in R
【Posted】: 2025-12-22 03:35:12
【Question】:

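I am new to library(stringr). In my df I have a column named Sentences, where each row contains one sentence. I want to find the position of a word, together with the 3 words before it and the 3 words after it.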

For example:

string <- "We have a three step process to validate 
           claims data we use in our analysis."

If we search for the word validate, it should return 8 (the position) and the words 'step' 'process' 'to' 'claims' 'data' 'we'. I have tried str_match and str_extract.

【Comments】:

Tags: sql r

【Solution 1】:

Using strsplit and grep:

myString <- "We have a three step process to validate claims data we use in our analysis."

# Split the string into individual words
splitString <- strsplit(myString, " ")[[1]]

# Find the location of the word of interest
loc <- grep("validate", splitString)

# Subset as you normally would
splitString[(loc-3):(loc+3)]
# [1] "step"     "process"  "to"       "validate" "claims"   "data"     "we"
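
The question also asks for the position itself (8) and for the neighbouring words without the target word. A minimal sketch building on the split above, not part of the original answer: the names `before` and `after` are illustrative, the split uses `"\\s+"` so that line breaks or repeated spaces (as in the question's two-line string) do not produce empty tokens, and an exact `==` comparison is swapped in for `grep` so that partial matches such as "validated" are not picked up.

```r
myString <- "We have a three step process to validate claims data we use in our analysis."

# Split on runs of whitespace rather than a single space
splitString <- strsplit(myString, "\\s+")[[1]]

# Exact, case-sensitive match; which() gives the position of the word
loc <- which(splitString == "validate")[1]

# Up to 3 neighbours on each side, clamped to the bounds of the sentence
before <- splitString[seq(max(loc - 3, 1), length.out = min(3, loc - 1))]
after  <- splitString[seq(loc + 1, length.out = min(3, length(splitString) - loc))]

loc
# [1] 8
before
# [1] "step"    "process" "to"
after
# [1] "claims" "data"   "we"
```

This sketch assumes the word occurs at least once; if it does not, `loc` is `NA` and the subsetting lines will fail, so a real script would check `is.na(loc)` first.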

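Update

If the vector contains several strings, you can try the following. I modified it slightly to make it safer, so that it does not try to extract positions that do not exist.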

words <- c("How data is Validated?", 
           "We have a three step process to validate claims data we use in our analysis.",
           "Sample Validate: Since No One vendor can provide the total population of claims in a given geographic region")

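x <- strsplit(words, " ")
lapply(x, function(y) {
  len <- length(y)
  # Position of the first token containing "validate" (case-insensitive)
  loc <- grep("validate", y, ignore.case = TRUE)[1]
  # No match in this sentence: return an empty character vector
  if (is.na(loc)) return(character(0))
  # Clamp the window to the bounds of the sentence
  lo <- max(loc - 3, 1)
  hi <- min(loc + 3, len)
  y[lo:hi]
})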
# [[1]]
# [1] "How"        "data"       "is"         "Validated?"
# 
# [[2]]
# [1] "step"     "process"  "to"       "validate" "claims"   "data"     "we"      
# 
# [[3]]
# [1] "Sample"    "Validate:" "Since"     "No"        "One"

As you can see, the result is a list of vectors.
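
To return the position of the word in each sentence together with the words on either side, the same idea can be wrapped in a small function. This is only a sketch under stated assumptions: `word_context` and its arguments are hypothetical names, punctuation is stripped and case is ignored before comparing, and the comparison is exact, so "Validated?" in the first sentence is deliberately not treated as a match for "validate" (unlike the `grep` approach above).

```r
# Hypothetical helper: position of `target` plus up to `n` words on each side
word_context <- function(sentence, target, n = 3) {
  tokens <- strsplit(trimws(sentence), "\\s+")[[1]]
  # Compare on a lower-cased, punctuation-free copy so "Validate:" matches "validate"
  cleaned <- tolower(gsub("[[:punct:]]", "", tokens))
  loc <- which(cleaned == tolower(target))[1]
  # No exact match: report NA and empty neighbour vectors
  if (is.na(loc)) {
    return(list(position = NA_integer_, before = character(0), after = character(0)))
  }
  list(
    position = loc,
    before = tokens[seq(max(loc - n, 1), length.out = min(n, loc - 1))],
    after  = tokens[seq(loc + 1, length.out = min(n, length(tokens) - loc))]
  )
}

words <- c("How data is Validated?",
           "We have a three step process to validate claims data we use in our analysis.",
           "Sample Validate: Since No One vendor can provide the total population of claims in a given geographic region")

res <- lapply(words, word_context, target = "validate")

# Positions per sentence (NA where the word does not occur as a whole word)
sapply(res, `[[`, "position")
```

For the data frame in the question, the same call could be applied to the column, for example `lapply(as.character(df$Sentences), word_context, target = "validate")`.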

【Comments】:

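> words [1] "How data is Validated?" [2] "We have a three step process to validate claims data we use in our analysis." [3] "Sample Validate: Since No One vendor can provide the total population of claims in a given geographic region," > splitString loc loc [1] 2 It would be helpful if this returned the position of validate in each sentence along with the +-3 words. Thanks for your help.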