Haskell - Having trouble understanding a small bit of code

【Question title】: Haskell - Having trouble understanding a small bit of code
【Posted】: 2026-01-25 16:15:02
【Question】:

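I'm working on a school assignment and was given some sample code that I can use later on. I understand about 90% of it, but there is one small line/function whose purpose I can't figure out for the life of me (by the way, I'm very new to Haskell).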

Sample code:

data Profile = Profile {matrix::[[(Char,Int)]], moleType::SeqType, nrOfSeqs::Int, nm::String} deriving (Show)

nucleotides = "ACGT"
aminoacids = sort "ARNDCEQGHILKMFPSTWYVX"

makeProfileMatrix :: [MolSeq] -> [[(Char, Int)]]
makeProfileMatrix [] = error "Empty sequence list"
makeProfileMatrix sl = res
  where 
    t = seqType (head sl)
    defaults = 
      if (t == DNA) then
        zip nucleotides (replicate (length nucleotides) 0) -- Row 1
      else 
        zip aminoacids (replicate (length aminoacids) 0)   -- Row 2
    strs = map seqSequence sl                              -- Row 3
    tmp1 = map (map (\x -> ((head x), (length x))) . group . sort)
               (transpose strs)                            -- Row 4
    equalFst a b = (fst a) == (fst b)
    res = map sort (map (\l -> unionBy equalFst l defaults) tmp1)

{-Row 1: 'replicate' creates a list of zeros that is equal to the length of the 'nucleotides' string. 
This list is then 'zipped' (combines each element in each list into pairs/tuples) with the nucleotides-}

{-Row 2: 'replicate' creates a list of zeros that is equal to the length of the 'aminoacids' string.
This list is then 'zipped' (combines each element in each list into pairs/tuples) with the aminoacids-}

{-Row 3: The function 'seqSequence' is applied to each element in the 'sl' list and returns a new list of the results.
In other words 'strs' becomes a list that contains all the sequences in 'sl' (sl contains MolSeq objects, not strings)-}

{-Row 4: (transpose strs) creates a list that has each 'column' of the sequences as an element (the first element is made up of the first element of each sequence, etc.).
--}

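I've written an explanation for each marked row in the code (which I think is correct so far), but I got stuck when trying to work out what Row 4 does. I understand the "transpose" bit, but I can't figure out at all what the inner map function does. As far as I know, "map" needs a list as its second argument in order to work, yet the inner map only has an anonymous function and no list to operate on. To be perfectly clear, I don't understand what the whole inner line map (\x -> ((head x), (length x))) . group . sort does. Please help!

Bonus!:

Here is another piece of sample code that I can't figure out (I've never used classes in Haskell):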

class Evol object where
 name :: object -> String
 distance :: object -> object -> Double
 distanceMatrix :: [object] -> [(String, String, Double)]
 addRow :: [object] -> Int -> [(String, String, Double)]
 distanceMatrix [] = []
 distanceMatrix object =
  addRow object 0 ++ distanceMatrix (tail object)
 addRow object num  -- Adds row to distance matrix
  | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
  | otherwise = [] 
  where  
        a = head object
        b = object !! num


 -- Determines the name and distance of an instance of "Evol" if the instance is a "MolSeq".
instance Evol MolSeq where
 name = seqName
 distance = seqDistance

 -- Determines the name and distance of an instance of "Evol" if the instance is a "Profile".
instance Evol Profile where
 name = profileName
 distance = profileDistance

Especially this part:

addRow object num  -- Adds row to distance matrix
  | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
  | otherwise = [] 
  where  
        a = head object
        b = object !! num

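If you don't feel like explaining this one, you don't have to; I'm just a bit confused about what "addRow" is actually trying to do (in detail).

Thanks!

【Comments】:

It's cool that you're interested in Haskell, but it would be better to move the "bonus" part into a separate question, since the two questions are unrelated.

Tags: list class dictionary haskell anonymous-function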

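【Solution 1】:

map (\x -> (head x, length x)) . group . sort is an idiomatic way of generating a histogram. When you see something like this that you don't understand, try breaking it down into smaller pieces and testing each piece on sample input: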

(\x -> (head x, length x)) "AAAA"
-- ('A', 4)

(group . sort) "CABABA"
-- ["AAA", "BB", "C"]

(map (\x -> (head x, length x)) . group . sort) "CABABA"
map (\x -> (head x, length x)) (group (sort "CABABA"))
-- [('A', 3), ('B', 2), ('C', 1)]

It is written in point-free style as a composition of three functions, map (…), group, and sort, but it could also be written as a lambda:

\row -> map (…) (group (sort row))

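For each row of the transposed matrix, it produces a histogram of the data in that row. You can get a more visual representation by formatting the histogram and printing it out: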

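import Data.List (group, sort)

main :: IO ()
main = do
  let
    -- Render one group of equal values as "value:<TAB>###".
    showHistogramRow row = concat
      [ show $ head row
      , ":\t"
      , replicate (length row) '#'
      ]
    input = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

  putStr
    $ unlines
    $ map showHistogramRow
    $ group
    $ sort input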

-- 1:   ##
-- 2:   #
-- 3:   ##
-- 4:   #
-- 5:   ###
-- 6:   #
-- 9:   #
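
To tie this back to Row 4 of the question, here is a minimal sketch that applies the same histogram function to every column of a few made-up sequences. The names histogram and columnCounts and the sample strings are assumptions for illustration, not part of the assignment code.

```haskell
import Data.List (group, sort, transpose)

-- Count how often each element occurs; the result is ordered by element.
-- `group` never yields empty lists, so the (x : _) pattern always matches.
histogram :: Ord a => [a] -> [(a, Int)]
histogram xs = [ (x, length g) | g@(x : _) <- group (sort xs) ]

-- Hypothetical helper: one histogram per column of equal-length strings,
-- which is what tmp1 computes in the question.
columnCounts :: [String] -> [[(Char, Int)]]
columnCounts = map histogram . transpose
```

For example, columnCounts ["ACGT", "AGGT", "TCGA"] evaluates to [[('A',2),('T',1)],[('C',2),('G',1)],[('G',3)],[('A',1),('T',2)]]: the first inner list describes the first letter of every sequence, the second describes the second letter, and so on.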

As for this:

addRow object num  -- Adds row to distance matrix
  | num < length object = (name a, name b, distance a b) : addRow object (num + 1)
  | otherwise = [] 
  where  
        a = head object
        b = object !! num

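addRow lists the distances from the first element of object to each element of the list (when num starts at 0, that includes the first element itself). It does so by indexing into the list in a non-obvious way, when a simpler and more idiomatic map would suffice: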

addRow object = map (\ b -> (name a, name b, distance a b)) object
  where a = head object

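It is generally best to avoid partial functions such as head, because they can throw an exception on some inputs (e.g. head []). It is fine here, though: if the input list is empty, a is never used, so head is never called.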
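
One way to make that safety explicit is to pattern match on the list instead of calling head. The following is only a sketch under stated assumptions: addRowWith is a hypothetical name, and it takes the naming and distance functions as ordinary arguments so that it compiles without the Evol class.

```haskell
-- Hypothetical variant of addRow that uses no partial functions.
-- The naming and distance functions are passed in explicitly only so
-- that this sketch compiles on its own, without the Evol class.
addRowWith
  :: (a -> String)        -- how to name an element
  -> (a -> a -> Double)   -- how to measure the distance between two elements
  -> [a]
  -> [(String, String, Double)]
addRowWith _ _ [] = []
addRowWith nameOf dist objs@(a : _) =
  [ (nameOf a, nameOf b, dist a b) | b <- objs ]
```

The empty case is now handled by its own equation, so the compiler can check that every input is covered rather than relying on laziness.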

distanceMatrix can also be expressed with map, since it just calls a function (addRow) on all the tails of the list and concatenates the results with ++:

distanceMatrix object = concatMap addRow (tails object)

This can also be written in point-free style. \x -> f (g x) can be written as f . g; here, f is concatMap addRow and g is tails:

distanceMatrix = concatMap addRow . tails

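Evol simply describes the set of types for which a distanceMatrix can be generated, including MolSeq and Profile. Note that addRow and distanceMatrix do not need to be members of this class, because they are implemented entirely in terms of name and distance, so you could move them to the top level: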

distanceMatrix :: (Evol object) => [object] -> [(String, String, Double)]
distanceMatrix = concatMap addRow . tails

addRow :: (Evol object) => [object] -> [(String, String, Double)]
addRow object = map (\ b -> (name a, name b, distance a b)) object
  where a = head object
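
Putting the pieces together, the sketch below should compile as a standalone module. The Point type and its instance are invented here purely for illustration; the real program would use the MolSeq and Profile instances from the question.

```haskell
import Data.List (tails)

class Evol object where
  name     :: object -> String
  distance :: object -> object -> Double

-- Toy type, made up for this sketch: a labelled point on a line.
data Point = Point String Double

instance Evol Point where
  name (Point n _) = n
  distance (Point _ x) (Point _ y) = abs (x - y)

-- Distances from the first element to every element of the list.
addRow :: Evol object => [object] -> [(String, String, Double)]
addRow object = map (\b -> (name a, name b, distance a b)) object
  where a = head object  -- never evaluated when object is empty

-- The upper triangle of the distance matrix, diagonal included.
distanceMatrix :: Evol object => [object] -> [(String, String, Double)]
distanceMatrix = concatMap addRow . tails
```

With three points p, q and r at positions 0, 3 and 5, distanceMatrix yields the pairs (p,p), (p,q), (p,r), (q,q), (q,r) and (r,r), in that order, because tails produces the whole list, then the list without p, then the list without p and q, and finally the empty list.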

【Comments】:

Thanks for such a detailed answer! Now I get it! :)

【Solution 2】:

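"the inner map only has an anonymous function and no list to operate on"

Suppose there is a function f of type a -> b -> c, which takes two arguments and returns a value of type c. If you call f with only one argument, it returns another function of type b -> c, which takes the remaining argument and returns a value. This works because functions in Haskell are curried; supplying only some of the arguments is known as partial application.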
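
As a small sketch of this idea (scale, double and doubleAll are hypothetical names chosen for the example):

```haskell
-- A two-argument function; its type really reads Int -> (Int -> Int).
scale :: Int -> Int -> Int
scale factor x = factor * x

-- Supplying only the first argument yields a new function Int -> Int.
double :: Int -> Int
double = scale 2

-- The same applies to map: given only the function, it returns a
-- function that is still waiting for the list.
doubleAll :: [Int] -> [Int]
doubleAll = map double
```

Here doubleAll is defined without mentioning any list, just like the inner map in the question.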

This line:

map (map (\x -> ((head x), (length x))) . group . sort) (transpose strs)

can be transformed into:

map (\str -> (map (\x -> ((head x), (length x))) . group . sort) str) (transpose strs)

In this form it may be clearer that there actually is a list to operate on.

This function

(map (\x -> ((head x), (length x))) . group . sort)

is just a composition of sort, group, and map (\x -> ((head x), (length x))).

Let's see how it works on [2,1,1,1,4]:

sort [2, 1, 1, 1, 4] => [1, 1, 1, 2, 4]

group [1, 1, 1, 2, 4] => [[1,1,1],[2],[4]]

map (\x -> ((head x), (length x))) [[1,1,1],[2],[4]] => [(1,3),(2,1),(4,1)]

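It simply returns a list of tuples. Each tuple holds an element as its first component and the number of times that element occurs as its second.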
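
The remaining line of makeProfileMatrix, the one defining res, builds on this result. As a rough sketch for a single column: unionBy keeps every counted pair and appends a zero-count default only for letters that were not seen, and sort then restores alphabetical order. columnProfile is a hypothetical name introduced for this illustration, and alphabet stands in for nucleotides or aminoacids.

```haskell
import Data.List (group, sort, unionBy)

-- Counts for one column, padded with zero entries for absent letters.
-- `alphabet` stands in for nucleotides or aminoacids from the question.
columnProfile :: String -> String -> [(Char, Int)]
columnProfile alphabet column = sort (unionBy sameLetter counts defaults)
  where
    -- Histogram of the letters that actually occur in the column.
    counts   = [ (c, length g) | g@(c : _) <- group (sort column) ]
    -- One (letter, 0) pair for every letter of the alphabet.
    defaults = zip alphabet (repeat 0)
    -- Two pairs are "equal" when they describe the same letter.
    sameLetter a b = fst a == fst b
```

For example, columnProfile "ACGT" "AAT" evaluates to [('A',2),('C',0),('G',0),('T',1)], so every column ends up with one entry per letter of the alphabet.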

【Comments】: