
【Question Title】: How to start searching a string from a question mark "?" in the string and search backward?
【Posted】: 2017-10-25 17:00:15
【Question】:

For example, I have this:

$string = 'PHP is a server side web programming language , Do you like PHP ?  , PHP is fantastic';

$array = array('html','css','javascript','ajax','html5','css3','jquery','PHP');

foreach($array as $ar){
   //Check if one of the $array values exists before the question mark '?' in the $string
}

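I want to search only before the question mark "?" in $string, so if the $array value "PHP" is not right before the question mark "?", nothing should happen, because it isn't there. "PHP" could be any other value in $array, so I don't know the length of the value to look for; the word can be repeated and can have different lengths.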

For example: $string = 'html .... , html is fantastic , Do you like html? , I love html'; now the word is longer, and it could be longer still.

How do I find only the exact "PHP" that comes right before the question mark and after "like" (['Do you like PHP ?']), whatever the length of the word?

【Discussion】:

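Not to be picky, but there's also a space before the question mark.
What happened to your other question?
Possible duplicate of How to search the end of a string for a text exists in an array?
@ishegg: Good catch, I hadn't noticed that.
@Joe, if this is solved, please accept the answer that helped you most.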

Tags: php arrays regex

【Solution 1】:

You can do what you want with a regular expression, but you'll have more flexibility if you tokenize the text:

<?php
$string = 'PHP is a server side web programming language , Do you like PHP?, Do you like Javascript ? What is Ajax?? Coding is fun.';
$find = ['html','css','javascript','ajax','html5','css3','jquery','php'];

// Convert to lowercase and add whitespace around punctuation.
// The hyphen must come last in the character class: written as '-_ it forms
// a range from ' to _ that also covers ? , and . so they would not be split off.
$tokenized_string = preg_replace("/([^a-zA-Z0-9'_ -])/", ' $1 ', strtolower($string));

// Condense multiple sequential spaces into a single space
$tokenized_string = preg_replace('/ {2,}/', ' ', $tokenized_string);

// Tokenize the text into words
$words = explode(' ', $tokenized_string);

// Find search terms directly preceding a question mark token.
// array_intersect() keeps the original keys, i.e. the token positions.
$question_words = array_filter(
    array_intersect($words, $find),
    function($k) use ($words) {
        // isset() avoids an undefined-index notice for the last token
        return isset($words[$k + 1]) && $words[$k + 1] === '?';
    },
    ARRAY_FILTER_USE_KEY
);

// Output our matches
var_dump($question_words);

This creates a normalized array of tokens, $words, like this:

array(30) {
  [0] =>
  string(3) "php"
  [1] =>
  string(2) "is"
  [2] =>
  string(1) "a"
  [3] =>
  string(6) "server"
  [4] =>
  string(4) "side"
  [5] =>
  string(3) "web"
  [6] =>
  string(11) "programming"
  [7] =>
  string(8) "language"
  [8] =>
  string(1) ","
  [9] =>
  string(2) "do"
  [10] =>
  string(3) "you"
  [11] =>
  string(4) "like"
  [12] =>
  string(3) "php"
  [13] =>
  string(1) "?"
  [14] =>
  string(1) ","
  [15] =>
  string(2) "do"
  [16] =>
  string(3) "you"
  [17] =>
  string(4) "like"
  [18] =>
  string(10) "javascript"
  [19] =>
  string(1) "?"
  [20] =>
  string(4) "what"
  [21] =>
  string(2) "is"
  [22] =>
  string(4) "ajax"
  [23] =>
  string(1) "?"
  [24] =>
  string(1) "?"
  [25] =>
  string(6) "coding"
  [26] =>
  string(2) "is"
  [27] =>
  string(3) "fun"
  [28] =>
  string(1) "."
  [29] =>
  string(0) ""
}

It returns the search terms found directly before a question mark, keyed by their position in the $words array:

array(3) {
  [12] =>
  string(3) "php"
  [18] =>
  string(10) "javascript"
  [22] =>
  string(4) "ajax"
}

This assumes you aren't using search terms that contain punctuation, like node.js, although this approach is fairly easy to adapt for them.
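If your search terms do contain punctuation, a single regular expression may be the simpler fit. The sketch below is a swapped-in alternative, not the tokenizer approach above: it escapes each term with preg_quote(), requires that the term is not the tail of a longer word, and allows only optional whitespace between the term and the "?". The function name find_terms_before_question() and the sample strings are hypothetical.

```php
<?php
// Hypothetical helper: returns the terms (in the text's own casing) that
// appear immediately before a "?", with optional whitespace in between.
function find_terms_before_question(string $text, array $terms): array
{
    if (!$terms) {
        return []; // an empty alternation would match before every "?"
    }

    // Escape each term so punctuation such as "." in "node.js" is literal.
    $escaped = array_map(function ($t) {
        return preg_quote($t, '/');
    }, $terms);

    // (?<![\w.]) : the term must not be the end of a longer word
    // \s*\?      : only optional whitespace may separate it from the "?"
    // i          : case-insensitive, so "php" also matches "PHP"
    $pattern = '/(?<![\w.])(' . implode('|', $escaped) . ')\s*\?/i';

    preg_match_all($pattern, $text, $matches);
    return $matches[1];
}

$string = 'PHP is a server side web programming language , Do you like PHP ?  , PHP is fantastic';
$array = ['html', 'css', 'javascript', 'ajax', 'html5', 'css3', 'jquery', 'PHP'];
print_r(find_terms_before_question($string, $array));
```

Because the \s*\? part has to match right after the term, "html" cannot match inside "html5?", so the order of the terms does not matter here.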

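It also assumes you don't have multi-word search terms like amazon s3. Instead of using array_intersect(), you could loop over the question-mark tokens with array_keys($words, '?') and, for each one, compare the tokens just before it against your search terms, taking each term's length in words into account.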
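A minimal sketch of that array_keys($words, '?') variation, assuming multi-word terms are written with single spaces (e.g. "amazon s3") and that text and terms are normalized the same way (lowercase, punctuation as separate tokens, hyphen placed last in the character class so it stays literal). The function names tokenize() and find_phrases_before_question() are hypothetical.

```php
<?php
// Hypothetical helper: lowercase the text and split punctuation into its own
// tokens. The hyphen is last in the character class so it is a literal "-".
function tokenize(string $text): array
{
    $text = preg_replace("/([^a-z0-9'_ -])/", ' $1 ', strtolower($text));
    // Collapse runs of whitespace and drop leading/trailing spaces,
    // so no empty tokens are produced.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    return $text === '' ? [] : explode(' ', $text);
}

// Hypothetical helper: for each "?" token, check whether the tokens just
// before it spell out one of the (possibly multi-word) search terms.
function find_phrases_before_question(string $text, array $terms): array
{
    $words = tokenize($text);
    $found = [];
    foreach (array_keys($words, '?', true) as $q) {
        foreach ($terms as $term) {
            $termWords = tokenize($term);
            $len = count($termWords);
            if ($len === 0 || $len > $q) {
                continue; // not enough tokens before this "?"
            }
            // Compare the $len tokens that end right before the "?".
            if (array_slice($words, $q - $len, $len) === $termWords) {
                $found[$q - $len] = $term; // keyed by the term's first token
            }
        }
    }
    return $found;
}

print_r(find_phrases_before_question(
    'Is it hosted on Amazon S3? Do you like PHP?, What is Ajax??',
    ['amazon s3', 'php', 'ajax', 'html']
));
```

As in the answer's output, results are keyed by token position. If one term is a suffix of another (say "s3" and "amazon s3"), both are reported for the same "?"; sort the terms longest-first and stop at the first hit if you only want one.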

【Discussion】: