【Question Title】: Find the position of a word and get the 3 words before and after it in R
【Posted】: 2025-12-22 03:35:12
【Question】:

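I am new to library(stringr). In my df, I have a column named Sentences in which each row contains one sentence. I want to find the position of a given word, along with the 3 words before and the 3 words after it.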

For example:

string <- "We have a three step process to validate claims data we use in our analysis."

If we search for the word validate, it should return 8 (its position in the sentence) together with the words 'step' 'process' 'to' 'claims' 'data' 'we'. I tried str_match and str_extract.
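If a sentence can contain line breaks or runs of spaces, splitting on a single space leaves empty strings and newline fragments in the result, which shifts every position after them. A minimal base-R sketch, assuming words are separated only by whitespace and that an exact whole-word match is wanted, splits on the regular expression \\s+ and uses which() to get the position:

```r
string <- "We have a three step process to validate
           claims data we use in our analysis."

# Split on any run of whitespace (spaces, tabs, newlines)
words <- strsplit(string, "\\s+")[[1]]

# Exact comparison, so a token such as "validated" would not count as a match
pos <- which(words == "validate")
pos
# [1] 8
```

Punctuation stays attached to the tokens (the last one is "analysis."), so an exact match on a word that ends a sentence would need the punctuation stripped first.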

【Question Discussion】:

    Tags: sql r


    【Solution 1】:

    Use strsplit and grep:

    myString <- "We have a three step process to validate claims data we use in our analysis."
    
    # Split the string into individual words
    splitString <- strsplit(myString, " ")[[1]]
    
    # Find the location of the word of interest
    loc <- grep("validate", splitString)
    
    # Subset as you normally would
    splitString[(loc-3):(loc+3)]
    # [1] "step"     "process"  "to"       "validate" "claims"   "data"     "we"      
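The question also asks for the position (8) and for the neighbouring words without the search word itself. The subset above keeps the word in the middle, stops with a subscript error when the word is one of the first two words (negative and positive indices get mixed), and pads with NA near the end of a sentence. A hedged sketch with a hypothetical helper, word_context, assuming exact whole-word matching on whitespace-separated tokens and keeping only the first match:

```r
# Hypothetical helper (not from the answer): position of the first exact
# match of `word`, plus up to `n` words on each side, excluding the word itself
word_context <- function(sentence, word, n = 3) {
  words <- strsplit(sentence, "\\s+")[[1]]
  loc <- which(words == word)[1]    # NA when the word is absent
  if (is.na(loc)) return(NULL)
  list(
    position = loc,
    before   = tail(words[seq_len(loc - 1)], n),  # at most n words before
    after    = head(words[-seq_len(loc)], n)      # at most n words after
  )
}

res <- word_context(
  "We have a three step process to validate claims data we use in our analysis.",
  "validate"
)
res$position
# [1] 8
# res$before is c("step", "process", "to"); res$after is c("claims", "data", "we")
```

Using head() and tail() means the window is clamped at either end of the sentence automatically, so no index arithmetic can run out of range.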
    

    Update

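    If you have more than one string in a vector, you can try the following. I modified it slightly to make it safer, so that it does not try to extract positions that don't exist.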

    words <- c("How data is Validated?", 
               "We have a three step process to validate claims data we use in our analysis.",
               "Sample Validate: Since No One vendor can provide the total population of claims in a given geographic region")
    
    x <- strsplit(words, " ")
    lapply(x, function(y) {
      len <- length(y)
      locs <- grep("validate", y, ignore.case=TRUE)
      # Clamp the window to the sentence: no index below 1 or above len
      lo <- pmax(locs - 3, 1)
      hi <- pmin(locs + 3, len)
      y[lo:hi]
    })
    # [[1]]
    # [1] "How"        "data"       "is"         "Validated?"
    # 
    # [[2]]
    # [1] "step"     "process"  "to"       "validate" "claims"   "data"     "we"      
    # 
    # [[3]]
    # [1] "Sample"    "Validate:" "Since"     "No"        "One"      
    

    As you can see, the result is a list of vectors.
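The function above assumes every sentence contains the word exactly once. When a sentence has no match, locs is empty and the index range cannot be built, so R stops with an "argument of length 0" error; when a sentence has two matches, a single range can only describe one window. A hedged variant, using the hypothetical helper name match_windows, builds one window per match and returns an empty list for sentences without a match:

```r
words <- c("How data is Validated?",
           "We have a three step process to validate claims data we use in our analysis.",
           "First validate the input, then validate the output",
           "No match in this sentence")

# Hypothetical helper: one window of up to n words on each side per match,
# clamped to the sentence; returns an empty list when nothing matches
match_windows <- function(y, pattern, n = 3) {
  locs <- grep(pattern, y, ignore.case = TRUE)
  lapply(locs, function(loc) y[max(1, loc - n):min(length(y), loc + n)])
}

x <- strsplit(words, " ")
res <- lapply(x, match_windows, pattern = "validate")
lengths(res)   # number of windows found in each sentence
# [1] 1 1 2 0
```

The result is now a list of lists (one inner list per sentence). grep() still matches substrings, so "Validated?" counts as a hit, as in the original answer.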

    【Discussion】:

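    • > words [1] "How data is Validated?" [2] "We have a three step process to validate claims data we use in our analysis." [3] "Sample Validate: Since No One vendor can provide the total population of claims in a given geographic region," > splitString loc loc [1] 2. It would be helpful if it returned the position of validate in each sentence along with the ±3 words. Thanks for your help.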
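The comment asks for the position of validate in each sentence as well as the surrounding words. Since the question says the sentences live in a column called Sentences of a data frame df, one way to get both is a small per-sentence helper whose one-row results are stacked into columns next to the original data. This is a sketch under assumptions: the data frame below is a stand-in built from the three sentences in the answer, locate_word is a hypothetical name, only the first match per sentence is kept, the window is joined into a single string, and sentences without a match get NA.

```r
# Stand-in data frame in the shape described in the question (assumed)
df <- data.frame(
  Sentences = c(
    "How data is Validated?",
    "We have a three step process to validate claims data we use in our analysis.",
    "Sample Validate: Since No One vendor can provide the total population of claims in a given geographic region"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: position of the first match and the +/- n word window,
# returned as a one-row data frame (NA when the sentence has no match)
locate_word <- function(sentence, pattern, n = 3) {
  y <- strsplit(sentence, "\\s+")[[1]]
  loc <- grep(pattern, y, ignore.case = TRUE)[1]
  if (is.na(loc)) {
    return(data.frame(position = NA_integer_, context = NA_character_,
                      stringsAsFactors = FALSE))
  }
  win <- y[max(1, loc - n):min(length(y), loc + n)]
  data.frame(position = loc, context = paste(win, collapse = " "),
             stringsAsFactors = FALSE)
}

# One row per sentence, then bind the new columns next to the original data
out <- do.call(rbind, lapply(df$Sentences, locate_word, pattern = "validate"))
result <- cbind(df, out)
result$position
# [1] 4 8 2
```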