How to split a string in Haskell? Answers

【Question title】: How to split a string in Haskell?
【Posted】: 2011-06-26 02:25:56
【Question description】:

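Is there a standard way to split a string in Haskell?

lines and words work great for splitting on a space or newline, but surely there is a standard way to split on a comma?

I couldn't find it on Hoogle.

To be specific, I'm looking for something where split "," "my,comma,separated,list" returns ["my","comma","separated","list"].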

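【Comments】:

I would really like to see such a function in a future release of Data.List or even Prelude. It is such a common need, and it is painful in code golf when it isn't available.

Tags: string haskell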

【Solution 1】:

Remember that you can look up the definition of Prelude functions!

http://www.haskell.org/onlinereport/standard-prelude.html

Looking there, the definition of words is:

words   :: String -> [String]
words s =  case dropWhile Char.isSpace s of
                      "" -> []
                      s' -> w : words s''
                            where (w, s'') = break Char.isSpace s'

So, change it into a function that takes a predicate:

wordsWhen     :: (Char -> Bool) -> String -> [String]
wordsWhen p s =  case dropWhile p s of
                      "" -> []
                      s' -> w : wordsWhen p s''
                            where (w, s'') = break p s'

Then call it with whatever predicate you want!

main = print $ wordsWhen (==',') "break,this,string,at,commas"
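One thing worth checking before relying on this: because wordsWhen is modelled on words, it drops empty fields. Runs of the delimiter collapse, and leading or trailing delimiters produce no empty strings. A minimal sketch of that behaviour follows; the definition is repeated so the block stands alone, and demoWordsWhen and its sample inputs are made up for illustration.

```haskell
-- wordsWhen, as defined above, repeated so this block compiles on its own.
wordsWhen :: (Char -> Bool) -> String -> [String]
wordsWhen p s = case dropWhile p s of
  "" -> []
  s' -> w : wordsWhen p s''
    where (w, s'') = break p s'

-- Hypothetical sample inputs, chosen to show that empty fields are dropped.
demoWordsWhen :: [[String]]
demoWordsWhen =
  [ wordsWhen (== ',') "a,b,c"  -- ["a","b","c"]
  , wordsWhen (== ',') "a,,b"   -- ["a","b"]
  , wordsWhen (== ',') ",a,b,"  -- ["a","b"]
  ]
```

If you need the empty fields (for CSV-like data, say), one of the break/span-based answers further down is probably a better fit.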

【Comments】:

【Solution 2】:

There is a package for this called split.

cabal install split

Use it like this:

ghci> import Data.List.Split
ghci> splitOn "," "my,comma,separated,list"
["my","comma","separated","list"]

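It comes with many other functions for splitting on matching delimiters or on several delimiters at once.

【Comments】:

Cool. I wasn't aware of this package. This is the ultimate split package, as it gives a lot of control over the operation (trimming space in the results, keeping separators in the result, removing consecutive separators, etc...). There are so many ways of splitting lists that no single split function could answer every need; you really need that kind of package.
Otherwise, if external packages are acceptable, MissingH also provides a split function: hackage.haskell.org/packages/archive/MissingH/1.2.0.0/doc/html/… That package also provides plenty of other "nice-to-have" functions, and I find that quite a few packages depend on it.
The split package is now part of the Haskell Platform as of the most recent release.
import Data.List.Split (splitOn) and go to town. splitOn :: Eq a => [a] -> [a] -> [[a]]
@RussAbbott The split package is included in the Haskell Platform when you download it (haskell.org/platform/contents.html), but it is not automatically loaded when building your project. Add split to the build-depends list in your cabal file. For example, if your project is called hello, then in the hello.cabal file, below the executable hello line, put a line like `build-depends: base, split` (note the two-space indent). Then build with the cabal build command. Cf. haskell.org/cabal/users-guide/…

【Solution 3】:

If you use Data.Text, there is splitOn: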

http://hackage.haskell.org/packages/archive/text/0.11.2.0/doc/html/Data-Text.html#v:splitOn

It is included in the Haskell Platform.

For example:

import qualified Data.Text as T
main = print $ T.splitOn (T.pack " ") (T.pack "this is a test")

Or:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
main = print $ T.splitOn " " "this is a test"
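Note that T.splitOn returns a list of Text, not a list of String, so passing the result to code that expects [String] gives a type error. If you do need String on both ends, a small wrapper like the following should do it. This is only a sketch: splitOnString is a made-up name, and it assumes a non-empty delimiter.

```haskell
import qualified Data.Text as T

-- Hypothetical wrapper: split a String on a String delimiter via Data.Text.
-- T.splitOn raises an error on an empty delimiter, so `delim` must be non-empty.
splitOnString :: String -> String -> [String]
splitOnString delim = map T.unpack . T.splitOn (T.pack delim) . T.pack
```

For instance, splitOnString " " "this is a test" should give ["this","is","a","test"]. Converting back and forth costs something, so if the rest of the program can work with Text directly, it is probably better to stay in Text.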

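【Comments】:

@RussAbbott You probably need to depend on the text package or install it. That belongs in another question, though.
Couldn't match type ‘T.Text’ with ‘Char’ Expected type: [Char] Actual type: [T.Text]

【Solution 4】:

In the module Text.Regex (part of the Haskell Platform), there is a function: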

splitRegex :: Regex -> String -> [String]

that splits a string based on a regular expression. The API can be found on Hackage.

【Comments】:

Could not find module ‘Text.Regex’ Perhaps you meant Text.Read (from base-4.10.1.0)

【Solution 5】:

Use Data.List.Split, which is provided by the split package:

[me@localhost]$ ghci
Prelude> import Data.List.Split
Prelude Data.List.Split> let l = splitOn "," "1,2,3,4"
Prelude Data.List.Split> :t l
l :: [[Char]]
Prelude Data.List.Split> l
["1","2","3","4"]
Prelude Data.List.Split> let { convert :: [String] -> [Integer]; convert = map read }
Prelude Data.List.Split> let l2 = convert l
Prelude Data.List.Split> :t l2
l2 :: [Integer]
Prelude Data.List.Split> l2
[1,2,3,4]

【Comments】:

【Solution 6】:

Try this one:

import Data.List (unfoldr)

separateBy :: Eq a => a -> [a] -> [[a]]
separateBy chr = unfoldr sep where
  sep [] = Nothing
  sep l  = Just . fmap (drop 1) . break (== chr) $ l

It only works for a single-element delimiter, but should be easy to extend.
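One way the extension to a multi-element delimiter might look, as a rough sketch rather than anything definitive: keep the unfoldr skeleton and replace break with a helper that looks for the whole delimiter using isPrefixOf. The name separateByList is made up here, and the code assumes the delimiter is non-empty (an empty one would never make progress).

```haskell
import Data.List (isPrefixOf, unfoldr)

-- Hypothetical extension of separateBy: the delimiter is a whole list.
-- Assumes a non-empty delimiter.
separateByList :: Eq a => [a] -> [a] -> [[a]]
separateByList delim = unfoldr sep
  where
    sep [] = Nothing
    sep l  = Just (go l)

    -- Collect elements up to the next delimiter; return the rest after it.
    go [] = ([], [])
    go l@(x:xs)
      | delim `isPrefixOf` l = ([], drop (length delim) l)
      | otherwise            = let (w, rest) = go xs in (x : w, rest)
```

Like separateBy, this keeps empty fields between adjacent delimiters but does not emit a final empty field after a trailing delimiter.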

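【Comments】:

【Solution 7】:

Without importing anything, you can simply substitute a space for the delimiter character, since words splits on whitespace. Something like: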

words [if c == ',' then ' ' else c|c <- "my,comma,separated,list"]

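or

words $ let f ',' = ' '; f c = c in map f "my,comma,separated,list"

You can make this into a function with parameters. You can also drop the single character-to-match parameter by matching against many characters, as in: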

 [if elem c ";,.:-+@!$#?" then ' ' else c|c <-"my,comma;separated!list"]
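Wrapped up as a function with parameters, that could look like the sketch below. The name splitOnAny is made up for illustration. Because it goes through words, any whitespace already present in the input also acts as a separator, and empty fields disappear.

```haskell
-- Hypothetical helper: treat every character in `seps` as a separator by
-- mapping it to a space and letting `words` do the splitting.
splitOnAny :: String -> String -> [String]
splitOnAny seps str = words [if c `elem` seps then ' ' else c | c <- str]
```

For example, splitOnAny ";,!" "my,comma;separated!list" should give ["my","comma","separated","list"].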

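【Comments】:

This does not distinguish between the newly added spaces and the original ones, so for "my,comma separated,list" it will see 4 parts instead of the intended 3.
@Yuri Kovalenko words does; try words [if c == ',' then ' ' else c|c <- "my, comma, separated, list "]

【Solution 8】: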

split :: Eq a => a -> [a] -> [[a]]
split d [] = []
split d s = x : split d (drop 1 y) where (x,y) = span (/= d) s

E.g.

split ';' "a;bb;ccc;;d"
> ["a","bb","ccc","","d"]

A single trailing delimiter will be dropped:

split ';' "a;bb;ccc;;d;"
> ["a","bb","ccc","","d"]

【Comments】:

【Solution 9】:

I started learning Haskell yesterday, so correct me if I'm wrong, but:

split :: Eq a => a -> [a] -> [[a]]
split x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if y==x then 
            func x ys ([]:(z:zs)) 
        else 
            func x ys ((y:z):zs)

gives:

*Main> split ' ' "this is a test"
["this","is","a","test"]

or maybe you wanted

*Main> splitWithStr  " and " "this and is and a and test"
["this","is","a","test"]

which would be:

splitWithStr :: Eq a => [a] -> [a] -> [[a]]
splitWithStr x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if (take (length x) (y:ys)) == x then
            func x (drop (length x) (y:ys)) ([]:(z:zs))
        else
            func x ys ((y:z):zs)

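【Comments】:

I was looking for a built-in split, having been spoiled by languages with well-developed libraries. But thanks anyway.
You wrote this in June, so I assume you've moved on in your journey :) As an exercise, try rewriting this function without reverse or length, since using those functions incurs an algorithmic complexity penalty and also prevents application to an infinite list. Have fun!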
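One possible take on that exercise, offered as a sketch and not as the only answer: build the result front to back and use stripPrefix to test for the delimiter, so neither reverse nor length is needed. The name splitLazy is made up here, and a non-empty delimiter is assumed.

```haskell
import Data.List (stripPrefix)

-- Hypothetical lazy variant of splitWithStr: no reverse, no length.
-- Assumes a non-empty delimiter.
splitLazy :: Eq a => [a] -> [a] -> [[a]]
splitLazy delim = go
  where
    go s = case stripPrefix delim s of
      Just rest -> [] : go rest           -- delimiter found: close the current chunk
      Nothing   -> case s of
        []     -> [[]]                    -- end of input: close the last chunk
        (c:cs) -> let (chunk : chunks) = go cs  -- lazy pattern: evaluated on demand
                  in (c : chunk) : chunks
```

Because the let pattern is lazy, chunks are produced incrementally, so something like take 3 (splitLazy "," (cycle "ab,")) terminates even though the input is infinite.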

【Solution 10】:

I find this simpler to understand:

split :: Char -> String -> [String]
split c xs = case break (==c) xs of 
  (ls, "") -> [ls]
  (ls, x:rs) -> ls : split c rs

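【Comments】:

【Solution 11】:

I don't know how to add a comment to Steve's answer, but I would like to recommend the
GHC libraries documentation,
and in there specifically the
Sublist functions in Data.List

This is a much better reference than just reading the plain Haskell report.

Generally, a fold with a rule for when to start a new sublist should solve it too.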
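To make the fold idea concrete, here is a minimal sketch using foldr, where the rule is "start a new sublist whenever the current element equals the delimiter". The name splitFold is made up for illustration.

```haskell
-- Hypothetical name; a right fold that starts a new sublist at each delimiter.
splitFold :: Eq a => a -> [a] -> [[a]]
splitFold d = foldr step [[]]
  where
    step x acc@(chunk : chunks)
      | x == d    = [] : acc             -- delimiter: open a fresh, empty chunk
      | otherwise = (x : chunk) : chunks -- ordinary element: extend the current chunk
    step _ []     = [[]]                 -- unreachable: the accumulator is never empty
```

This version keeps empty fields, so splitFold ',' "a,,b" should give ["a","","b"], and an empty input gives [""] and not [].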

【Comments】:

【Solution 12】:

Example in ghci:

>  import qualified Text.Regex as R
>  R.splitRegex (R.mkRegex "x") "2x3x777"
>  ["2","3","777"]

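【Comments】:

Please, do not use regular expressions to split strings. Thank you.
@kirelagin, why this comment? I'm learning Haskell, and I'd like to know the rationale behind your comment.
@Andrey, is there a reason why I can't even run the first line in my ghci?
@EnricoMariaDeAngelis Regular expressions are a powerful tool for string matching. It makes sense to use them when you are matching something non-trivial. If you just want to split a string on something as trivial as another fixed string, there is absolutely no need for regular expressions: they will only make the code more complex and, likely, slower.
"Please, do not use regular expressions to split strings." WTF, why not??? Splitting a string with a regular expression is a perfectly reasonable thing to do. There are lots of trivial cases where a string needs to be split but the delimiter isn't always exactly the same.

【Solution 13】:

In addition to the efficient, pre-built functions given in the other answers, I'll add my own, which are simply part of a repertoire of Haskell functions I wrote while learning the language in my own time: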

-- Correct but inefficient implementation
wordsBy :: String -> Char -> [String]
wordsBy s c = reverse (go s []) where
    go s' ws = case (dropWhile (\c' -> c' == c) s') of
        "" -> ws
        rem -> go ((dropWhile (\c' -> c' /= c) rem)) ((takeWhile (\c' -> c' /= c) rem) : ws)

-- Breaks up by predicate function to allow for more complex conditions (\c -> c == ',' || c == ';')
wordsByF :: String -> (Char -> Bool) -> [String]
wordsByF s f = reverse (go s []) where
    go s' ws = case ((dropWhile (\c' -> f c')) s') of
        "" -> ws
        rem -> go ((dropWhile (\c' -> (f c') == False)) rem) (((takeWhile (\c' -> (f c') == False)) rem) : ws)

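These solutions are at least tail-recursive, so they won't cause a stack overflow.

【Comments】:

【Solution 14】:

I'm late, but I'd like to add this here for anyone interested, in case you're looking for a simple solution that doesn't rely on any bloated packages: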

split :: String -> String -> [String]
split _ "" = []
split delim str =
  split' "" str []
  where
    dl = length delim

    split' :: String -> String -> [String] -> [String]
    split' h t f
      | dl > length t = f ++ [h ++ t]
      | delim == take dl t = split' "" (drop dl t) (f ++ [h])
      | otherwise = split' (h ++ take 1 t) (drop 1 t) f

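【Comments】:

Oh, come on... In the end, what matters is not that thousands of people like something. I'm not forcing you to use it. It's only for those who are interested. Sounds like you're not one of them.
You say "like" - I say "battle-tested". If you like to share it, that's fine. My question was about the standard way, and that has been answered.
Haskell has no out-of-the-box split function. Remember that you asked for a function that splits a string by a string (String -> String -> [String]), not by a character (Char -> String -> [String]). You would have to install the split package, which is not the standard way either. Installing the split package also brings in a bunch of redundant functions. You asked for just a split function, and I gave you that and nothing more.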