【Question title】: Finding the common prefix of an array of strings
【Posted】: 2021-07-19 18:52:50
【Question】:

I have an array like this:

$sports = array(
'Softball - Counties',
'Softball - Eastern',
'Softball - North Harbour',
'Softball - South',
'Softball - Western'
);

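I would like to find the longest common prefix of the strings. In this case, it would be 'Softball - '.

I am thinking that I would follow this process: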

$i = 1;

// loop to the length of the first string
while ($i < strlen($sports[0])) {

  // grab the left most part up to i in length
  $match = substr($sports[0], 0, $i);

  // loop through all the values in array, and compare if they match
  foreach ($sports as $sport) {

     if ($match != substr($sport, 0, $i)) {
         // didn't match, return the part that did match
         return substr($sport, 0, $i-1);
     }

  } // foreach

   // increase string length
   $i++;
} // while

// if you got to here, then all of them must be identical

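Questions

  1. Is there a built-in function or a much simpler way of doing this?

  2. For my 5-line array this is probably fine, but if I were to process arrays of several thousand lines there would be a lot of overhead, so I would have to be smarter about the starting value of $i: for example, start with $i = half the string length, if that fails try $i/2 until it works, then increment $i by 1 until we succeed. That way we do the fewest comparisons to get a result.

Is there a formula/algorithm already out there for this kind of problem?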
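
The halving idea from question 2 amounts to a binary search over the prefix length. Below is a minimal sketch of that idea, not code from the original post; the function name commonPrefixBinarySearch and its variables are made up for illustration.

```php
<?php
// Hypothetical helper: binary search over the candidate prefix length.
function commonPrefixBinarySearch(array $strings): string
{
    if ($strings === []) {
        return '';
    }
    $strings = array_values($strings);           // normalise keys to 0..n-1
    $first = $strings[0];
    $lo = 0;                                     // a prefix of length $lo is known to be common
    $hi = min(array_map('strlen', $strings));    // no common prefix can be longer than this

    while ($lo < $hi) {
        $mid = intdiv($lo + $hi + 1, 2);         // round up so the loop always makes progress
        $candidate = substr($first, 0, $mid);
        $allMatch = true;
        foreach ($strings as $s) {
            if (strncmp($s, $candidate, $mid) !== 0) {
                $allMatch = false;
                break;
            }
        }
        if ($allMatch) {
            $lo = $mid;        // a prefix of length $mid is common: try longer
        } else {
            $hi = $mid - 1;    // too long: try shorter
        }
    }
    return substr($first, 0, $lo);
}
```

This cuts the number of passes over the array to roughly log2 of the shortest string length, but each pass compares up to $mid bytes per string, so whether it beats a plain character-by-character scan depends on the data.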

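【Comments】:

  • Are you looking for the longest common prefix or the longest common substring? E.g. if you have a_abli and a_cable, should the answer be "a_" or "abl"?
  • Prefix. I have amended the title to be more specific.
  • If you have an array like this, why not just explode each item and take the first element?
  • @silentghost, re: explode, this is user-editable data, so it could vary considerably from the example I have given.

Tags: php algorithm string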


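【Solution 1】:

If you can sort your array, then there is a simple and very fast solution.

Simply compare the first item to the last one.

If the strings are sorted, any prefix common to all strings will also be common to the first and last strings in sorted order. Conversely, every string that sorts between those two must start with whatever prefix they share, so the common prefix of the first and last strings is the common prefix of the whole array.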

sort($sport, SORT_STRING); // compare as strings, even if the values look numeric

$s1 = $sport[0];               // First string
$s2 = $sport[count($sport)-1]; // Last string
$len = min(strlen($s1), strlen($s2));

// While we still have string to compare,
// if the indexed character is the same in both strings,
// increment the index. 
for ($i=0; $i<$len && $s1[$i]==$s2[$i]; $i++); 

$prefix = substr($s1, 0, $i);
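
The same idea wrapped in a function, as a minimal sketch. The name commonPrefixBySorting is made up for illustration, and the empty-array handling is an assumption about what a caller would want.

```php
<?php
// Hypothetical wrapper around the sort-based idea.
function commonPrefixBySorting(array $strings): string
{
    if ($strings === []) {
        return '';
    }
    // SORT_STRING forces string comparison, so numeric-looking values
    // such as "10" and "9" are not compared as numbers.
    sort($strings, SORT_STRING);

    $first = $strings[0];
    $last  = $strings[count($strings) - 1];
    $len   = min(strlen($first), strlen($last));

    // Advance while both strings still agree.
    $i = 0;
    while ($i < $len && $first[$i] === $last[$i]) {
        $i++;
    }
    return substr($first, 0, $i);
}

$sports = [
    'Softball - Counties',
    'Softball - Eastern',
    'Softball - North Harbour',
    'Softball - South',
    'Softball - Western',
];
echo commonPrefixBySorting($sports), "\n"; // prints "Softball - "
```

Because the array is passed by value, the caller's array keeps its original order. Note that sorting itself performs on the order of N log N string comparisons, so the appeal here is simplicity rather than a guaranteed lower comparison count.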

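【Comments】:

  • Brilliant; this uses an already optimized sort algorithm, so no effort needs to be spent optimizing this particular code.
  • It looks like this only compares two strings, not the whole list of strings... am I missing something here?
  • @NathanJ.Brauer Sorting puts the strings in order, so any prefix common to all strings will also be common to the first and last strings. Therefore we only need to compare the first and last strings to determine the common prefix.
【Solution 2】:

I would use this: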

$prefix = array_shift($array);  // take the first item as initial prefix
$length = strlen($prefix);
// compare the current prefix with the prefix of the same length of the other items
foreach ($array as $item) {
    // check if there is a match; if not, decrease the prefix by one character at a time
    while ($length && substr($item, 0, $length) !== $prefix) {
        $length--;
        $prefix = substr($prefix, 0, -1);
    }
    if (!$length) {
        break;
    }
}

Update: Here is another solution, iteratively comparing each n-th character of the strings until a mismatch is found:

$pl = 0; // common prefix length
$n = count($array);
$l = strlen($array[0]);
while ($pl < $l) {
    $c = $array[0][$pl];
    for ($i=1; $i<$n; $i++) {
        if (!isset($array[$i][$pl]) || $array[$i][$pl] !== $c) break 2; // also stop at the end of a shorter string
    }
    $pl++;
}
$prefix = substr($array[0], 0, $pl);

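This is even more efficient, as there are only roughly numberOfStrings · commonPrefixLength atomic comparisons.

【Comments】:

  • I suspect the second solution is less efficient for the OP's test case, given many strings, because it builds the prefix one character at a time and tests all strings each time. So it always does numberOfStrings x commonPrefixLength comparisons. The first solution, on the other hand, quickly eliminates most of the prefix just by comparing the first two strings.
  • Better (for some data) would be two loops. The first loop starts at zero but checks only the first two strings, one character at a time. This gives an "upper bound" on the possible size. Then run the first solution on the remaining strings, shrinking the prefix. For the test data shown, the cost is N + L rather than N * L.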
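
The two-loop variant described in the last comment could look like the following. This is a sketch under my own naming (commonPrefixTwoPhase is hypothetical), not code posted by the commenter.

```php
<?php
// Illustrative sketch of the two-phase idea.
function commonPrefixTwoPhase(array $strings): string
{
    $strings = array_values($strings);
    $n = count($strings);
    if ($n === 0) {
        return '';
    }
    if ($n === 1) {
        return $strings[0];
    }

    // Phase 1: walk the first two strings character by character
    // to get an upper bound on the prefix length.
    $a = $strings[0];
    $b = $strings[1];
    $limit = min(strlen($a), strlen($b));
    $length = 0;
    while ($length < $limit && $a[$length] === $b[$length]) {
        $length++;
    }

    // Phase 2: shrink the candidate until every remaining string starts with it.
    for ($i = 2; $i < $n && $length > 0; $i++) {
        while ($length > 0 && strncmp($strings[$i], $a, $length) !== 0) {
            $length--;
        }
    }
    return substr($a, 0, $length);
}
```

When the first two strings already differ early, phase 2 only has to confirm a short candidate against each remaining string, which is where the N + L estimate comes from.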
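【Solution 3】:

I implemented @diogoriba's algorithm in code, with this result:

  • Finding the common prefix of the first two strings, then comparing it with every following string from the 3rd onwards and trimming the common string whenever no match is found, wins in situations where the strings have more in common in their prefixes than they differ.
  • But bumperbox's original algorithm (apart from the bug fixes) wins where the strings have less in common in their prefixes than they differ. Details are in the code comments!

Another idea I implemented:

First check for the shortest string in the array, and use it for the comparison rather than simply the first string. In the code, this is implemented with the custom-written function arrayStrLenMin().

  • Can bring down the number of iterations dramatically, but the function arrayStrLenMin() itself may cause (more or fewer) iterations.
  • Simply starting with the length of the first string in the array seems quite clumsy, but may turn out to be effective if arrayStrLenMin() needs many iterations.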
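
For arrays that are known to contain only strings, the shortest length can also be obtained from built-ins. The sketch below assumes exactly that (no type checks, unlike arrayStrLenMin()); shortestLength and commonPrefixBounded are illustrative names.

```php
<?php
// Assumption: every element is a string and the array is non-empty;
// unlike arrayStrLenMin(), this sketch does no type checking.
function shortestLength(array $strings): int
{
    return min(array_map('strlen', $strings));
}

function commonPrefixBounded(array $strings): string
{
    if ($strings === []) {
        return '';
    }
    $strings = array_values($strings);
    $end = shortestLength($strings);   // every index below $end exists in every string
    for ($i = 0; $i < $end; $i++) {
        $char = $strings[0][$i];
        foreach ($strings as $s) {
            if ($s[$i] !== $char) {
                return substr($strings[0], 0, $i);
            }
        }
    }
    // No mismatch within the shortest string: it is the common prefix.
    return substr($strings[0], 0, $end);
}
```

The array_map() call is itself one full pass over the array, which is the trade-off the two bullets above describe.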

Get the maximum common prefix of the strings in an array with as few iterations as possible (PHP)

Code + extensive testing + remarks:

function arrayStrLenMin ($arr, $strictMode = false, $forLoop = false) {
    $errArrZeroLength = -1; // Return value for error: Array is empty
    $errOtherType = -2;     // Return value for error: Found other type (than string in array)
    $errStrNone = -3;       // Return value for error: No strings found (in array)

    $arrLength = count($arr);
    if ($arrLength <= 0 ) { return $errArrZeroLength; }
    $cur = 0;

    foreach ($arr as $key => $val) {
        if (is_string($val)) {
            $min = strlen($val);
            $strFirstFound = $key;
            // echo("Key\tLength / Notification / Error\n");
            // echo("$key\tFound first string member at key with length: $min!\n");
            break;
        }
        else if ($strictMode) { return $errOtherType; } // At least 1 type other than string was found.
    }
    if (! isset($min)) { return $errStrNone; } // No string was found in array.

    // SpeedRatio of foreach/for is approximately 2/1 as discussed at:
    // http://juliusbeckmann.de/blog/php-foreach-vs-while-vs-for-the-loop-battle.html

    // If $strFirstFound is found within the first 1/SpeedRatio (=0.5) of the array, "foreach" is faster!

    if (! $forLoop) {
        foreach ($arr as $key => $val) {
            if (is_string($val)) {
                $cur = strlen($val);
                // echo("$key\t$cur\n");
                if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
                if ($cur < $min) { $min = $cur; }
            }
        // else { echo("$key\tNo string!\n"); }
        }
    }

    // If $strFirstFound is found after the first 1/SpeedRatio (=0.5) of the array, "for" is faster!

    else {
        for ($i = $strFirstFound + 1; $i < $arrLength; $i++) {
            if (is_string($arr[$i])) {
                $cur = strlen($arr[$i]);
                // echo("$i\t$cur\n");
                if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
                if ($cur < $min) { $min = $cur; }
            }
            // else { echo("$i\tNo string!\n"); }
        }
    }

    return $min;
}

function strCommonPrefixByStr($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 1; $i <= $end + 1; $i++) {
        // Grab the part from 0 up to $i
        $commonStrMax = substr($arr[0], 0, $i);
        echo("Match: $i\t$commonStrMax\n");
        // Loop through all the values in array, and compare if they match
        foreach ($arr as $key => $str) {
            echo("  Str: $key\t$str\n");
            // Didn't match, return the part that did match
            if ($commonStrMax != substr($str, 0, $i)) {
                    return substr($commonStrMax, 0, $i-1);
            }
        }
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return $commonStrMax; // Thus entire first common string is the common prefix!
}

function strCommonPrefixByChar($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 0 ; $i <= $end + 1; $i++) {
        // Grab char $i
        $char = substr($arr[0], $i, 1);
        echo("Match: $i\t"); echo(str_pad($char, $i+1, " ", STR_PAD_LEFT)); echo("\n");
        // Loop through all the values in array, and compare if they match
        foreach ($arr as $key => $str) {
            echo("  Str: $key\t$str\n");
            // Didn't match, return the part that did match
            if ($char != substr($str, $i, 1)) { // substr() avoids the "uninitialized string offset" warning that $str[$i] raises past the end of a shorter string
                    return substr($arr[0], 0, $i);
            }
        }
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return substr($arr[0], 0, $end); // Thus entire first common string is the common prefix!
}


function strCommonPrefixByNeighbour($arr) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    /// Get the common string prefix of the first 2 strings
    $strCommonMax = strCommonPrefixByChar(array($arr[0], $arr[1]));
    if ($strCommonMax === false) { return false; }
    if ($strCommonMax == "") { return ""; }
    $strCommonMaxLength = strlen($strCommonMax);

    /// Now start looping from the 3rd string
    echo("-----\n");
    for ($i = 2; ($i < $arrLength) && ($strCommonMaxLength >= 1); $i++ ) {
        echo("  STR: $i\t{$arr[$i]}\n");

        /// Compare the maximum common string with the next neighbour

        /*
        //// Compare by char: Method unsuitable!

        // Iterate from string end to string beginning
        for ($ii = $strCommonMaxLength - 1; $ii >= 0; $ii--) {
            echo("Match: $ii\t"); echo(str_pad($arr[$i][$ii], $ii+1, " ", STR_PAD_LEFT)); echo("\n");
            // If you find the first mismatch from the end, break.
            if ($arr[$i][$ii] != $strCommonMax[$ii]) {
                $strCommonMaxLength = $ii - 1; break;
                // BUT!!! We may falsely assume that the string from the first mismatch back to the beginning matches! This new neighbour string is completely "unexplored land"; there might be differing chars closer to the beginning. This method is not suitable. Better use string comparison than char comparison.
            }
        }
        */

        //// Compare by string

        for ($ii = $strCommonMaxLength; $ii > 0; $ii--) {
            echo("MATCH: $ii\t$strCommonMax\n");
            if (substr($arr[$i],0,$ii) == $strCommonMax) {
                break;
            }
            else {
                $strCommonMax = substr($strCommonMax,0,$ii - 1);
                $strCommonMaxLength--;
            }
        }
    }
    return substr($arr[0], 0, $strCommonMaxLength);
}





// Tests for finding the common prefix

/// Scenarios

$filesLeastInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesLessInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesMoreInCommon = array (
"/Voluuuuuuuuuuuuuumes/1/a/a/1",
"/Voluuuuuuuuuuuuuumes/1/a/a/2",
"/Voluuuuuuuuuuuuuumes/1/a/b/1",
"/Voluuuuuuuuuuuuuumes/1/a/b/2",
"/Voluuuuuuuuuuuuuumes/2/a/b/c/1",
"/Voluuuuuuuuuuuuuumes/2/a/a/1",
);

$sameDir = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
);

$sameFile = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/1",
);

$noCommonPrefix = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
"Net/1/a/a/aaaaa/2",
);

$longestLast = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/aaaaa/2",
);

$longestFirst = array (
"/Volumes/1/a/a/aaaaa/1",
"/Volumes/1/a/a/2",
);

$one = array ("/Volumes/1/a/a/aaaaa/1");

$empty = array ( );


// Test Results for finding  the common prefix

/*

I tested my functions in many possible scenarios.
The results, the common prefixes, were always correct in all scenarios!
Just try a function call with your individual array!

Considering iteration efficiency, I also performed tests:

I put echo functions into the functions where iterations occur, and measured the number of CLI line output via:
php <script with strCommonPrefixByStr or strCommonPrefixByChar> | egrep "^  Str:" | wc -l   GIVES TOTAL ITERATION SUM.
php <Script with strCommonPrefixByNeighbour> | egrep "^  Str:" | wc -l   PLUS   | egrep "^MATCH:" | wc -l   GIVES TOTAL ITERATION SUM.

My hypothesis was proven:
strCommonPrefixByChar wins in situations where the strings have less in common in their beginning (=prefix).
strCommonPrefixByNeighbour wins where there is more in common in the prefixes.

*/

// Test Results Table
// Used Functions | Iteration amount | Remarks

// $result = (strCommonPrefixByStr($filesLessInCommon)); // 35
// $result = (strCommonPrefixByChar($filesLessInCommon)); // 35 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLessInCommon)); // 88 + 42 = 130 // Loses in this category!

// $result = (strCommonPrefixByStr($filesMoreInCommon)); // 137
// $result = (strCommonPrefixByChar($filesMoreInCommon)); // 137 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLeastInCommon)); // 12 + 4 = 16 // Far the winner in this category!

echo("Common prefix of all members:\n");
var_dump($result);





// Tests for finding the shortest string in array

/// Arrays

// $empty = array ();
// $noStrings = array (0,1,2,3.0001,4,false,true,77);
// $stringsOnly = array ("one","two","three","four");
// $mixed = array (0,1,2,3.0001,"four",false,true,"seven", 8888);

/// Scenarios

// I list them from fewest to most iterations, which is not necessarily the same as from fastest to slowest!
// For speed consider the remarks in the code considering the Speed ratio of foreach/for!

//// Fewest iterations (immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// Fewer iterations (immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    0   3
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// More iterations (No immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Key Length / Notification / Error
    4   Found first string member at key with length: 4!
    5   No string!
    6   No string!
    7   5
    8   No string!
    Result: 4

*/


//// Most iterations (No immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    0   3
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Key Length / Notification / Error
    4   Found first string member at key with length: 4!
    0   No string!
    1   No string!
    2   No string!
    3   No string!
    4   4
    5   No string!
    6   No string!
    7   5
    8   No string!
    Result: 4

*/

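【Comments】:

  • Gumbo's answer is much simpler. Can you show that this answer is significantly faster?
  • See Gustav Bertram's more recent answer (currently far below), which avoids all of this by sorting the list of strings first. The advantage is that you can rely on an existing, optimized sort routine, so a simple solution is easy to implement without spending effort on optimization!
【Solution 4】:

There is probably some very well-regarded algorithm for this, but just off the top of my head: if you know your commonality is going to be on the left-hand side like in your example, you could do way better than your posted method by first finding the commonality of the first two strings, and then iterating down the rest of the list, trimming the common string as necessary to achieve commonality, or terminating with failure if you trim all the way to nothing.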

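【Comments】:

    【Solution 5】:

    I think you are on the right track. But instead of incrementing i when all the strings pass, you could do this:

    1) Compare the first 2 strings in the array and find out how many characters they have in common. Save the common characters in a separate string called maxCommon, for example.

    2) Compare the third string with maxCommon. If the number of common characters is smaller, trim maxCommon to the characters that match.

    3) Rinse and repeat for the rest of the array. At the end of the process, maxCommon will hold the string that is common to all the array elements.

    This adds some overhead, since you need to compare each string with maxCommon, but it drastically reduces the number of iterations needed to get the result.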
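
The three steps above can be sketched as follows. maxCommon follows the naming in the steps; the helper commonPrefixLength and the function commonPrefixPairwise are hypothetical names, and the XOR trick used for the pairwise comparison is my own choice rather than anything this answer prescribes.

```php
<?php
// Length of the common prefix of two strings. XOR-ing two PHP strings yields
// a string as long as the shorter operand, with "\0" bytes wherever the inputs
// agree; strspn() then counts the leading "\0" bytes.
function commonPrefixLength(string $a, string $b): int
{
    return strspn($a ^ $b, "\0");
}

function commonPrefixPairwise(array $strings): string
{
    if ($strings === []) {
        return '';
    }
    $strings = array_values($strings);
    $maxCommon = $strings[0];
    for ($i = 1, $n = count($strings); $i < $n; $i++) {
        // Trim maxCommon to what it shares with the next string.
        $maxCommon = substr($maxCommon, 0, commonPrefixLength($maxCommon, $strings[$i]));
        if ($maxCommon === '') {
            break;   // nothing in common: later strings cannot change the result
        }
    }
    return $maxCommon;
}
```

Since maxCommon can only shrink, each later comparison touches at most as many bytes as the current candidate, and the loop can stop as soon as the candidate is empty.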

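    【Comments】:

    • +1, this is how I would do it. Another advantage: as soon as you hit a string whose first character differs, you can stop going through the remaining strings, because there is no common prefix.
    【Solution 6】:

    I assume that by "common part" you mean "longest common prefix". That is much simpler to compute than an arbitrary common substring.

    This cannot be done without reading (n+1) * m characters in the worst case and n * m + 1 in the best case, where n is the length of the longest common prefix and m is the number of strings.

    Comparing one letter at a time achieves that efficiency (Big Theta(n * m)).

    Your proposed algorithm runs in Big Theta(n^2 * m), which is much slower for large inputs.

    The third proposed algorithm, finding the longest prefix of the first two strings and then comparing that with the third, fourth, etc., also has a running time in Big Theta(n * m), but with a higher constant factor. It will probably only be slightly slower in practice.

    Overall, I would advise just rolling your own function, since the first algorithm is too slow and the other two are about equally complicated to write anyway.

    See Wikipedia for a description of Big Theta notation.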
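
To make the n * m bound concrete, here is an instrumented letter-at-a-time sketch that also reports how many single characters it read. The function name and the way reads are counted (hitting the end of a string is not counted as a read) are my own assumptions.

```php
<?php
// Illustrative instrumentation: returns [prefix, number of character reads].
function commonPrefixWithReadCount(array $strings): array
{
    $strings = array_values($strings);
    if ($strings === []) {
        return ['', 0];
    }
    $reads = 0;
    $pos = 0;
    while (true) {
        $char = null;
        foreach ($strings as $s) {
            if ($pos >= strlen($s)) {
                // The end of a string ends the prefix (not counted as a read).
                return [substr($strings[0], 0, $pos), $reads];
            }
            $reads++;
            if ($char === null) {
                $char = $s[$pos];              // reference character for this column
            } elseif ($s[$pos] !== $char) {
                return [substr($strings[0], 0, $pos), $reads];
            }
        }
        $pos++;
    }
}

$sports = [
    'Softball - Counties',
    'Softball - Eastern',
    'Softball - North Harbour',
    'Softball - South',
    'Softball - Western',
];
[$prefix, $reads] = commonPrefixWithReadCount($sports);
echo "$prefix ($reads reads)\n"; // prints "Softball -  (57 reads)"
```

With n = 11 and m = 5 the count of 57 falls between n * m = 55 and (n+1) * m = 60, in line with the bounds stated above.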

    【Comments】:

      【Solution 7】:

      Here is an elegant recursive implementation in JavaScript:

      function prefix(strings) {
          switch (strings.length) {
      
            case 0:
              return "";
      
            case 1:
              return strings[0];
      
            case 2:
              // compute the prefix between the two strings
              var a = strings[0],
                  b = strings[1],
                  n = Math.min(a.length, b.length),
                  i = 0;
              while (i < n && a.charAt(i) === b.charAt(i))
                  ++i;
              return a.substring(0, i);
      
            default:
              // return the common prefix of the first string,
              // and the common prefix of the rest of the strings
              return prefix([ strings[0], prefix(strings.slice(1)) ]);
          }
      }
      

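      【Comments】:

        【Solution 8】:
        1. Not that I know of.

        2. Yes: instead of comparing the substrings from 0 to length i, you can simply check the i-th character (you already know that characters 0 to i-1 match).

        【Comments】:

          【Solution 9】:

          Short and sweet version, perhaps not the most efficient: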

          /// Return length of longest common prefix in an array of strings.
          function _commonPrefix($array) {
              if(count($array) < 2) {
                  if(count($array) == 0)
                      return false; // empty array: undefined prefix
                  else
                      return strlen($array[0]); // 1 element: trivial case
              }
              $len = min(array_map('strlen',$array)); // initial upper limit: length of the shortest string (a longer prefix is impossible)
              $prevval = reset($array);
              while(($newval = next($array)) !== FALSE) {
                  for($j = 0 ; $j < $len ; $j += 1)
                      if($newval[$j] != $prevval[$j])
                          $len = $j;
                  $prevval = $newval;
              }
              return $len;
          }
          
          // TEST CASE:
          $arr = array('/var/yam/yamyam/','/var/yam/bloorg','/var/yar/sdoo');
          print_r($arr);
          $plen = _commonprefix($arr);
          $pstr = substr($arr[0],0,$plen);
          echo "Res: $plen\n";
          echo "==> ".$pstr."\n";
          echo "dir: ".dirname($pstr.'aaaa')."\n";
          

          Output of the test case:

          Array
          (
              [0] => /var/yam/yamyam/
              [1] => /var/yam/bloorg
              [2] => /var/yar/sdoo
          )
          Res: 7
          ==> /var/ya
          dir: /var
          

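          【Comments】:

            【Solution 10】:

            @bumperbox

            1. Your basic code needed some correction in order to work in ALL scenarios!

              • Your loop only compares up to one character before the last character!
              • The mismatch can occur one loop cycle after the latest common character.
              • Hence you have to check at least one character past the last character of your first string.
              • Hence your comparison operator must be "<=" with an upper bound of strlen($sports[0]) + 1, as in shortestCorrect() below.
            2. Currently, your algorithm fails

              • if the first string is completely included in all other strings,
              • or completely included in all other strings except for the last character.

            In my next answer/post, I will attach iteration-optimized code!

            Original Bumperbox code PLUS correction (PHP):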

            function shortest($sports) {
             $i = 1;
            
             // loop to the length of the first string
             while ($i < strlen($sports[0])) {
            
              // grab the left most part up to i in length
              // REMARK: Culturally biased towards LTR writing systems. Better say: Grab from the beginning...
              $match = substr($sports[0], 0, $i);
            
              // loop through all the values in array, and compare if they match
              foreach ($sports as $sport) {
               if ($match != substr($sport, 0, $i)) {
                // didn't match, return the part that did match
                return substr($sport, 0, $i-1);
               }
              }
             $i++; // increase string length
             }
            }
            
            function shortestCorrect($sports) {
             $i = 1;
             while ($i <= strlen($sports[0]) + 1) {
              // Grab the string from its beginning with length $i
              $match = substr($sports[0], 0, $i);
              foreach ($sports as $sport) {
               if ($match != substr($sport, 0, $i)) {
                return substr($sport, 0, $i-1);
               }
              }
              $i++;
             }
             // Special case: No mismatch happened until loop end! Thus entire str1 is common prefix!
             return $sports[0];
            }
            
            $sports1 = array(
            'Softball',
            'Softball - Eastern',
            'Softball - North Harbour');
            
            $sports2 = array(
            'Softball - Wester',
            'Softball - Western',
            );
            
            $sports3 = array(
            'Softball - Western',
            'Softball - Western',
            );
            
            $sports4 = array(
            'Softball - Westerner',
            'Softball - Western',
            );
            
            echo("Output of the original function:\n"); // Failure scenarios
            
            var_dump(shortest($sports1)); // NULL rather than the correct 'Softball'
            var_dump(shortest($sports2)); // NULL rather than the correct 'Softball - Wester'
            var_dump(shortest($sports3)); // NULL rather than the correct 'Softball - Western'
            var_dump(shortest($sports4)); // Only works if the second string is at least one character longer!
            
            echo("\nOutput of the corrected function:\n"); // All scenarios work
            var_dump(shortestCorrect($sports1));
            var_dump(shortestCorrect($sports2));
            var_dump(shortestCorrect($sports3));
            var_dump(shortestCorrect($sports4));
            

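            【Comments】:

              【Solution 11】:

              How about something like this? It could be optimized further by not having to check the lengths of the strings if we could rely on a null terminator character (but I am assuming Python strings have their length cached somewhere?).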

              def find_common_prefix_len(strings):
                  """
                  Given a list of strings, finds the length of the common prefix in all of them.
                  So
                  apple
                  applet
                  application
                  would return 4
                  """
                  prefix          = 0
                  curr_index      = -1
                  num_strings     = len(strings)
                  string_lengths  = [len(s) for s in strings]
                  while True:
                      curr_index  += 1
                      ch_in_si    = None
                      for si in range(num_strings):
                          if curr_index >= string_lengths[si]:
                              return prefix
                          else:
                              if si == 0:
                                  ch_in_si = strings[0][curr_index]
                              elif strings[si][curr_index] != ch_in_si:
                                  return prefix
                      prefix += 1
              

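              【Comments】:

                【Solution 12】:

                I would use a recursive algorithm like this:

                1 - Take the first string in the array
                2 - Call the recursive prefix method with the first string as the parameter
                3 - If the prefix is empty, return no prefix
                4 - Loop through all the strings in the array
                4.1 - If any string does not start with the prefix
                4.1.1 - Call the recursive prefix method with prefix - 1 as the parameter
                4.2 - Return the prefix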
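
A minimal PHP sketch of these steps, under the assumption that "prefix - 1" means the prefix with its last character removed. The function name commonPrefixRecursive and its optional second parameter are made up for illustration.

```php
<?php
// Illustrative recursive sketch; the step numbers refer to the list above.
function commonPrefixRecursive(array $strings, ?string $prefix = null): string
{
    if ($strings === []) {
        return '';
    }
    if ($prefix === null) {
        $prefix = reset($strings);        // steps 1-2: start with the first string
    }
    if ($prefix === '') {
        return '';                        // step 3: nothing left to try
    }
    $length = strlen($prefix);
    foreach ($strings as $s) {            // step 4
        if (strncmp($s, $prefix, $length) !== 0) {
            // step 4.1.1: retry with the last character removed
            return commonPrefixRecursive($strings, substr($prefix, 0, -1));
        }
    }
    return $prefix;                       // step 4.2
}
```

The recursion depth is bounded by the length of the first string, and every level rescans the array from the start, so this is easy to read but does more work than trimming against one string at a time.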

                【Comments】:

                  【Solution 13】:
                  
                      // Common prefix
                      $common = '';
                  
                      $sports = array(
                      'Softball T - Counties',
                      'Softball T - Eastern',
                      'Softball T - North Harbour',
                      'Softball T - South',
                      'Softball T - Western'
                      );
                  
                      // find mini string
                      $minLen = strlen($sports[0]);
                      foreach ($sports as $s){
                          if($minLen > strlen($s))
                              $minLen = strlen($s);
                      }
                  
                  
                      // flag to break out of inner loop
                      $flag = false;
                  
                      // The possible common string length does not exceed the minimum string length.
                      // The following solution is O(n^2); this can be improved.
                      for ($i = 0 ; $i < $minLen; $i++){
                          $tmp = $sports[0][$i];
                  
                          foreach ($sports as $s){
                              if($s[$i] != $tmp)
                                  $flag = true;
                          }
                          if($flag)
                              break;
                          else
                              $common .= $sports[0][$i];
                      }
                  
                      print $common;
                  

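                  【Comments】:

                    【Solution 14】:

                    The solutions here only work for finding commonalities at the beginning of the strings. Here is a function that looks for the longest common substring anywhere in an array of strings.

                    http://www.christopherbloom.com/2011/02/24/find-the-longest-common-substring-using-php/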
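
For readers who only want the gist, here is a brute-force sketch of the "anywhere" variant. It is NOT the implementation from the linked article; the function name and approach are my own illustration, and it is only practical for short strings.

```php
<?php
// Brute-force sketch (not the linked article's code): try every substring
// of the shortest string, longest first, and return the first one that
// occurs in all strings.
function longestCommonSubstring(array $strings): string
{
    if ($strings === []) {
        return '';
    }
    $strings = array_values($strings);
    // Put the shortest string first; it is the source of all candidates.
    usort($strings, function ($a, $b) {
        return strlen($a) <=> strlen($b);
    });
    $shortest = $strings[0];
    $max = strlen($shortest);

    for ($len = $max; $len > 0; $len--) {
        for ($start = 0; $start + $len <= $max; $start++) {
            $candidate = substr($shortest, $start, $len);
            $inAll = true;
            foreach ($strings as $s) {
                if (strpos($s, $candidate) === false) {
                    $inAll = false;
                    break;
                }
            }
            if ($inAll) {
                return $candidate;
            }
        }
    }
    return '';
}

echo longestCommonSubstring(['a_abli', 'a_cable']), "\n"; // prints "abl"
```

The example reuses the pair from the question's comments: the longest common prefix of a_abli and a_cable is "a_", while the longest common substring is "abl".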

                    【Comments】:

                      【Solution 15】:

                      The top answer seemed a bit long, so here is a concise solution with a runtime of O(n^2).

                      function findLongestPrefix($arr) {
                        return array_reduce($arr, function($prefix, $item) {
                          $length = min(strlen($prefix), strlen($item));
                          while (substr($prefix, 0, $length) !== substr($item, 0, $length)) {
                            $length--;
                          }
                          return substr($prefix, 0, $length);
                        }, $arr[0]);
                      }
                      
                      print findLongestPrefix($sports); // Softball -
                      

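                      【Comments】:

                        【Solution 16】:

                        For what it's worth, here is another alternative I came up with.

                        I used this to find the common prefix for a list of product codes (i.e. where multiple product SKUs share a common series of characters at the start):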

                        /**
                         * Try to find a common prefix for a list of strings
                         * 
                         * @param array $strings
                         * @return string
                         */
                        function findCommonPrefix(array $strings)
                        {
                            $prefix = '';
                            $chars = array_map("str_split", $strings);
                            $matches = call_user_func_array("array_intersect_assoc", $chars);
                            if ($matches) {
                                $i = 0;
                                foreach ($matches as $key => $value) {
                                    if ($key != $i) {
                                        unset($matches[$key]);
                                    }
                                    $i++;
                                }
                                $prefix = join('', $matches);
                            }
                        
                            return $prefix;
                        }
                        

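                        【Comments】:

                          【Solution 17】:

                          This is an addition to @Gumbo's answer. Use it if you want to make sure the chosen common prefix does not break words. I just have it look for a space at the end of the chosen string. If one exists, we know there was more to all of the phrases, so we truncate it.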

                          function product_name_intersection($array){
                          
                              $pl = 0; // common prefix length
                              $n = count($array);
                              $l = strlen($array[0]);
                              $first = current($array);
                          
                              while ($pl < $l) {
                                  $c = $array[0][$pl];
                                  for ($i=1; $i<$n; $i++) {
                                      if (!isset($array[$i][$pl]) || $array[$i][$pl] !== $c) break 2;
                                  }
                                  $pl++;
                              }
                              $prefix = substr($array[0], 0, $pl);
                          
                              if ($pl < strlen($first) && substr($prefix, -1, 1) != ' ') {
                          
                                  $prefix = preg_replace('/\W\w+\s*(\W*)$/', '$1', $prefix);
                              }
                          
                              $prefix =  preg_replace('/^\W*(.+?)\W*$/', '$1', $prefix);
                          
                              return $prefix;
                          }
                          

                          【Comments】:

                            【Solution 18】:

                            Sharing a TypeScript solution for this question. I split it into two methods, just to keep it clean to use.

                            function longestCommonPrefix(strs: string[]): string {
                                let output = '';
                                if(strs.length > 0) {
                                    output = strs[0];
                                    if(strs.length > 1) {
                                        for(let i=1; i <strs.length; i++) {
                                            output = checkCommonPrefix(output, strs[i]);
                                        }
                                    }
                                }  
                                return output;
                            };
                                
                            function checkCommonPrefix(str1: string, str2: string): string {
                                let output = '';
                                let len = Math.min(str1.length, str2.length);
                                let i = 0;
                                while(i < len) {
                                    if(str1[i] === str2[i]) {
                                        output += str1[i];
                                    } else {
                                        i = len;
                                    }
                                    i++;
                                }
                                return output;
                            }
                            

                            【Comments】:
