Answers to: Finding if a string is a substring of another in SML without library functions

【Question title】: Finding if a string is a substring of another in SML without library functions
【Posted】: 2019-08-02 11:46:34
【Question】:

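I am trying to write a function subString : string * string -> int that checks, case-sensitively, whether the first string is a substring of the second.

If the first string is a substring, I want to return its 0-based index; if it is not, I want to return -1. If it occurs more than once, the function should return the index of the first occurrence.

For example: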

subString("bc","abcabc") ===>1
subString("aaa","aaaa") ===>0
subString("bc","ABC") ===>-1

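I am having a lot of trouble thinking this through, because I am not very familiar with SML or with handling strings in SML, and I am not supposed to use any built-in functions such as String.sub.

I am allowed to use helper functions, though.

All I can think of is to use explode in a helper function, check the resulting lists somehow, and then implode them again, but how would I get the index position?

All I have is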

fun subString(s1,s2) =
     if null s2 then ~1
     else if s1 = s2 then 0
     else 1+subString(s1, tl s2);

I am considering a helper function that explodes the strings and then compares the two, but I do not know how to make that work.

Tags: string recursion substring sml ml

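【Answer 1】:

This is already a good start, but there are a few small problems:

In your recursive case you add 1 to the recursive result even when the recursive call did not find the substring and returned -1. You should check whether the result is -1 before adding 1.

In the second branch you check whether the two strings are equal (s1 = s2). If you do that, you will only find the substring when the string ends with it. What you really want to test there is whether s2 starts with s1. I suggest writing a helper function that performs that test. For this helper you can indeed use explode and then recursively check whether the leading characters of the two lists are the same. Once you have the helper, use it in place of the equality test.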
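Putting both fixes together might look roughly like the sketch below. It is only one possible reading of the advice above: the helper names startsWith and search are made up for illustration, and it assumes that explode is permitted, as the question suggests.

```sml
(* startsWith (xs, ys): true when the list xs is a prefix of the list ys.
 * Hypothetical helper name, not part of the original answer. *)
fun startsWith ([], _) = true
  | startsWith (_, []) = false
  | startsWith (x :: xs, y :: ys) = (x = y) andalso startsWith (xs, ys)

(* search (needle, hay): 0-based index of the first match, or ~1 if none.
 * The recursive result is checked for ~1 before 1 is added to it. *)
fun search (needle, hay) =
    if startsWith (needle, hay) then 0
    else case hay of
             [] => ~1
           | _ :: rest =>
             let val r = search (needle, rest)
             in if r = ~1 then ~1 else 1 + r end

(* The string-level entry point only converts both strings to char lists. *)
fun subString (s1, s2) = search (explode s1, explode s2)
```

With these definitions the three examples from the question evaluate to 1, 0 and ~1 respectively. An empty first string is treated as matching at index 0, which is a choice of this sketch rather than something the question specifies.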

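【Answer 2】:

"I am not supposed to use any built-in functions such as String.sub"

What a pity! Strings have an abstract interface, whereas lists give you direct access to their primary constructors, [] and ::, so you have to use library functions to get anywhere with strings. explode is a library function too. But fine, if your constraint is that you must convert your strings to lists to solve this exercise, so be it.

Given your current code,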

fun subString(s1,s2) =
     if null s2 then ~1
     else if s1 = s2 then 0
     else 1+subString(s1, tl s2);

I sense one problem here:

   subString ([#"b",#"c"], [#"a",#"b",#"c",#"d"])
~> if null ([#"a",#"b",#"c",#"d"]) then ... else
   if [#"b",#"c"] = [#"a",#"b",#"c",#"d"] then ... else
   1 + subString([#"b",#"c"], [#"b",#"c",#"d"])

~> 1 + subString([#"b",#"c"], [#"b",#"c",#"d"])
~> 1 + if null ([#"b",#"c",#"d"]) then ... else
       if [#"b",#"c"] = [#"b",#"c",#"d"] then ... else
       1 + subString([#"b",#"c"], [#"c",#"d"])

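Checking s1 = s2 does not seem to be enough: we would like to say that [#"b",#"c"] is a substring of [#"b",#"c",#"d"] because it is a prefix of it, not because it is equal to it. With s1 = s2 you end up checking whether something is a valid suffix, not a valid substring. So you need to change s1 = s2 into something smarter.

Perhaps you could build a helper function that determines whether one list is a prefix of another and use it here?

As for solving this exercise by exploding your strings into lists: this is so inefficient that Standard ML's sister language OCaml had explode removed from its library entirely:

"The functions explode and implode were in older versions of Caml, but we omitted them from OCaml because they encourage inefficient code. It is generally a bad idea to treat a string as a list of characters; seeing it as an array of characters is a much better fit to the actual implementation."

First of all, String.isSubstring already exists, so this is a solved problem. But if it did not, and you wanted to write this compositionally, and String.sub is not cheating (it accesses a character in a string, comparable to pattern matching on the head and tail of a list with x::xs), then let me encourage you to write efficient, composable and functional code: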

(* Check that a predicate holds for all (c, i) of s, where
 * s is a string, c is every character in that string, and
 * i is the position of c in s. *)
fun alli s p =
    let val stop = String.size s
        fun go i = i = stop orelse p (String.sub (s, i), i) andalso go (i + 1)
    in go 0 end

(* needle is a prefix of haystack from the start'th index *)
fun isPrefixFrom (needle, haystack, start) =
    String.size needle + start <= String.size haystack andalso
    alli needle (fn (c, i) => String.sub (haystack, i + start) = c)

(* needle is a prefix of haystack if it is from the 0th index *)
fun isPrefix (needle, haystack) =
    isPrefixFrom (needle, haystack, 0)

(* needle is a substring of haystack if it is a prefix from any index *)
fun isSubstring (needle, haystack) =
    let fun go i =
            String.size needle + i <= String.size haystack andalso
            (isPrefixFrom (needle, haystack, i) orelse go (i + 1))
    in go 0 end

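The general idea here, which you can reuse when building an isSubstring that recurses over lists rather than over string indices, is to build the algorithm abstractly: needle being a substring of haystack can be defined in simpler terms, namely that needle is a prefix of haystack counted from some valid position in haystack (not past the end of haystack, of course). Determining whether something is a prefix is much easier, and easier still with list recursion!

This suggestion leaves you with a template,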

fun isPrefix ([], _) = ...
  | isPrefix (_, []) = ...
  | isPrefix (x::xs, y::ys) = ...

fun isSubstring ([], _) = ...
  | isSubstring (xs, ys) = ... isPrefix ... orelse ...
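For readers who want to check their own attempt, here is one possible way the template might be filled in. It is a sketch only, not necessarily what the answer's author had in mind: it works on lists of any equality type, returns a bool rather than an index, and treats the empty list as a sublist of everything.

```sml
(* isPrefix (xs, ys): xs is a prefix of ys. *)
fun isPrefix ([], _) = true
  | isPrefix (_, []) = false
  | isPrefix (x :: xs, y :: ys) = (x = y) andalso isPrefix (xs, ys)

(* isSubstring (xs, ys): xs is a prefix of ys, or of some tail of ys. *)
fun isSubstring ([], _) = true
  | isSubstring (xs, ys) =
    isPrefix (xs, ys) orelse
    (not (null ys) andalso isSubstring (xs, tl ys))
```

To apply it to strings, both arguments can be converted first, for example isSubstring (explode "bc", explode "abcd").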

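As for optimizing the string-index-recursive solution, you can avoid the double bounds check in isPrefixFrom and isSubstring by making isPrefixFrom a local function that only isPrefix and isSubstring can access; exposing it without the check would be unsafe.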
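A rough sketch of that idea, using SML's local ... in ... end form, could look like this. The exact split of responsibilities is an assumption made for illustration: the hidden isPrefixFrom no longer checks bounds itself, and each public function performs the check once before calling it.

```sml
local
    (* Unsafe on its own: assumes start + size needle <= size haystack.
     * Hidden by `local`, so only the two functions below can call it. *)
    fun isPrefixFrom (needle, haystack, start) =
        let val n = String.size needle
            fun go i =
                i = n orelse
                (String.sub (needle, i) = String.sub (haystack, i + start)
                 andalso go (i + 1))
        in go 0 end
in
    (* The bounds check happens once, here. *)
    fun isPrefix (needle, haystack) =
        String.size needle <= String.size haystack andalso
        isPrefixFrom (needle, haystack, 0)

    (* `last` is the largest start index at which needle still fits. *)
    fun isSubstring (needle, haystack) =
        let val last = String.size haystack - String.size needle
            fun go i =
                i <= last andalso
                (isPrefixFrom (needle, haystack, i) orelse go (i + 1))
        in go 0 end
end
```

Because isPrefixFrom is not visible outside the local block, no caller can reach it with an out-of-range start index, which is what makes dropping its own check acceptable.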

Testing this,

- isSubstring ("bc", "bc");
> val it = true : bool
- isSubstring ("bc", "bcd");
> val it = true : bool
- isSubstring ("bc", "abc");
> val it = true : bool
- isSubstring ("bc", "abcd");
> val it = true : bool
- isSubstring ("bc", "");
> val it = false : bool
