Pull Strings Before and After Key Words (answers)

【Question title】: Pull Strings Before and After Key Words
【Posted】: 2020-01-14 21:37:31
【Question】:

I'm not sure whether this is possible in SAS, although I'm slowly learning that almost anything is possible in SAS...

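I have a dataset of 600 patients that includes a comment variable. The comment variable holds a few sentences that each patient wrote about his or her care. For example, the dataset looks like this: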

 ID        Comment
 1         Today we have great service. everyone was really nice.
 2         The customer service team did not know what they were talking about and was rude.
 3         Everyone was very helpful 5 stars.
 4         Not very helpful at all.
 5         Staff was nice.
 6         All the people was really nice.

Suppose I have identified a few keywords I'm interested in, for example nice, rude, and helpful.

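Is there a way to extract the 2 strings (words) that come before each of these keywords and produce a frequency table like this?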

 WORD            Frequency 
 Was Really Nice         2
 And Was Rude            1
 Was Very Helpful        1
 Not very helpful        1

I have already written code that helps me identify the keywords; it creates a frequency count of every word in the comment variable.

 data PG_2 / view=PG_2;
 length word $20;
 set PG_1;
 do i = 1 by 1 until(missing(word));
 word = upcase(scan(COMMENT, i));
 if not missing(word) then output;
 end;
 keep word;
 run;

 proc freq data=PG_2 order=freq;
 table word / out=wordfreq(drop=percent);
 run;

【Comments on the question】:

Tags: sas

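【Solution 1】:

Have you looked at the Perl regular expression (PRX) functions in SAS? I think they could solve your problem.

You can use a regular expression capture group, together with prxparse and prxposn, to extract the two words immediately before a keyword. The code below should grab the two words that precede the word nice in the comment variable and store them in the firstTwoStrings variable. Note that the pattern is case-sensitive and expects single spaces between the words.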

data firstTwoStrings;
   length firstTwoStrings $200;
   retain re;
   if _N_ = 1 then
      re = prxparse('/(\w+ \w+) nice/'); /*change 'nice' to your desired keyword*/
   set comments;
   if prxmatch(re, COMMENT) then 
      do;
         firstTwoStrings = prxposn(re, 1, COMMENT);
      end;
run;
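
To get from the extracted phrases to the frequency table the question asks for, the output of the data step above can be passed to PROC FREQ in the same way the question's own code counts single words. This is a minimal sketch, assuming the data step above has already run; the output dataset name phrasefreq is made up for illustration.

```sas
/* Count each extracted two-word phrase.                        */
/* Rows where the keyword was not found have a missing value    */
/* in firstTwoStrings, so they are filtered out before counting. */
proc freq data=firstTwoStrings order=freq;
   where not missing(firstTwoStrings);
   table firstTwoStrings / out=phrasefreq(drop=percent); /* phrasefreq: hypothetical name */
run;
```

Because the pattern captures only the two preceding words, the counted values do not include the keyword itself. Moving the closing parenthesis so the group also covers the keyword would keep it in the phrase.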

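【Discussion】:

Alternatively, use the pattern '/(\w+ \w+ (nice|rude|helpful))/' to search for all the words of interest at once. Also consider using prxnext to find multiple matches within one string.
@Ben_Corcoran Thanks, this looks great, going to test it now! Sorry for not replying sooner - I came down with a stomach bug right after posting this question and haven't worked for the past 3 days.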
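
The suggestion in the first comment could be sketched roughly as follows. This is only an illustration under a few assumptions: the input dataset is called comments (as in the answer) and has the ID and COMMENT variables from the question, and the output names phrases and phrasefreq are made up. The /i flag and the trailing \b are additions to the commenter's pattern, so that capitalized words such as "Not" and "Staff" are treated consistently and a keyword is not matched inside a longer word.

```sas
data phrases;
   length phrase $200;
   retain re;
   if _N_ = 1 then
      /* Pattern from the comment, plus /i (ignore case) and \b (word boundary) */
      re = prxparse('/(\w+ \w+ (nice|rude|helpful))\b/i');
   set comments;                 /* assumed dataset name, taken from the answer */

   start = 1;
   stop  = length(COMMENT);

   /* CALL PRXNEXT returns the position and length of the next match */
   /* and advances start, so the loop visits every match in the row. */
   call prxnext(re, start, stop, COMMENT, pos, len);
   do while (pos > 0);
      phrase = upcase(substr(COMMENT, pos, len));  /* normalize case for counting */
      output;
      call prxnext(re, start, stop, COMMENT, pos, len);
   end;

   keep ID phrase;
run;

/* Frequency table of "two words + keyword" phrases */
proc freq data=phrases order=freq;
   table phrase / out=phrasefreq(drop=percent);
run;
```

Each match writes one row, so a comment that mentions several keywords contributes several phrases. Matches found this way do not overlap, and the pattern still expects exactly one space between words, so comments with punctuation or double spaces before a keyword would need a looser pattern.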