Set difference for strings and arrays: answers

【Question title】: Set-Difference for strings and arrays
【Posted】: 2014-08-11 00:05:48
【Question】:

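set-difference works as a filter function, but only for lists. What about arrays and strings? Are there analogous functions for these types of data? If there are no such functions, what is the proper way to implement them?

For now I use this macro to process any sequence as a list (sometimes it's useful):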

(defmacro treat-as-lists (vars &body body)
  (let ((type (gensym)))
    `(let ((,type (etypecase ,(car vars)
                    (string 'string)
                    (vector 'vector)
                    (list 'list)))
           ,@(mapcar (lambda (x) `(,x (coerce ,x 'list)))
                 vars))
       (coerce (progn ,@body) ,type))))

My filter:

(defun filter (what where &key key (test #'eql))
  (treat-as-lists (what where)
    (set-difference where what :key key :test test)))

Examples:

CL-USER> (filter "cat" "can you take this cat away?")
"n you ke his  wy?"
CL-USER> (filter #(0 1) #(1 5 0 1 9 8 3 0))
#(5 9 8 3)

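【Comments】:

There is no analogous function in the standard for arrays and strings, but as you have found, you can define your own. Note that these coercions largely defeat the purpose of using arrays and strings.
I don't like the coercion to lists, because it involves a lot of consing just to be able to express the idea of set difference.

Tags: arrays string lisp filtering common-lisp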

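【Solution 1】:

Since writing functions that work for all sequence types often means writing separate versions for lists and vectors, it is worthwhile to use the standard functions that operate on sequences wherever you can. In this case, we can use position and remove-if. I have reversed the order of your arguments, so that this sequence difference behaves more like set-difference, where the second argument is subtracted from the first.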

(defun sequence-difference (seq1 seq2 &key (start1 0) end1 (start2 0) end2
                                           key (key1 key) (key2 key)
                                           test test-not)
  "Returns a new sequence of the same type as seq1 that contains the
elements of the subsequence of seq1 designated by start1 and end1, in
the same order, except for those that appear in the subsequence of
seq2 designated by start2 and end2. Test and test-not are applied in
the usual way to elements produced by applying key1 (which defaults to
key) to elements from seq1 and by applying key2 (which defaults to
key) to elements from seq2."
  (flet ((in-seq2 (x)
           (not (null (position x seq2
                                :start start2 :end end2
                                :key key2
                                :test test :test-not test-not)))))
    (remove-if #'in-seq2 
               (subseq seq1 start1 end1)
               :key key1)))

(sequence-difference "can you take this cat away?" #(#\c #\a #\t))
;=> "n you ke his  wy?"

(sequence-difference "can you take this cat away?" #(#\c #\a #\t) :start1 3 :start2 1)
;=> " you ke his c wy?"

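Note that the standard also includes find, which works on arbitrary sequences, but find returns "an element of the sequence, or nil." That is ambiguous when nil is itself a member of the sequence. position, on the other hand, returns either an index (which is a number, and thus not nil) or nil, so we can reliably determine whether an element is in the sequence.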
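The ambiguity is easy to see at the REPL. The following is only a small illustration; the helper name in-sequence-p is hypothetical and is not part of the answer's code.

```lisp
;; FIND cannot distinguish "found NIL" from "found nothing".
(find nil '(1 nil 2))       ; => NIL, although NIL is an element
(find nil '(1 2 3))         ; => NIL, because nothing matched

;; POSITION returns an index on success, and an index is never NIL.
(position nil '(1 nil 2))   ; => 1
(position nil '(1 2 3))     ; => NIL

;; Hypothetical helper: a membership test that works for any sequence.
(defun in-sequence-p (item sequence)
  (not (null (position item sequence))))

(in-sequence-p nil #(1 nil 2))   ; => T
(in-sequence-p #\z "cat")        ; => NIL
```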

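There is one important difference here: you always get a copy back. The reason is a subjective design choice. Since sequence functions often take start and end index arguments, it is nice to include that functionality here. But if we ask for (sequence-difference "foobar" "boa" :start1 2), then we want to remove the characters b, o, and a from the subsequence "obar" of "foobar". What should we return, "for" or "r"? That is, do we include the portion of seq1 that lies outside the indices? In this solution I decided not to, so I am doing (remove-if … (subseq seq1 …) …), and subseq always makes a copy. set-difference, on the other hand, may return its list-1 or list-2 argument if appropriate. This implementation generally will not return seq1 or seq2, except in some pathological cases (e.g., the empty list).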
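For comparison, here is a minimal sketch of the other choice, in which the part of seq1 outside the indices is kept. It relies on the :start and :end arguments that remove-if itself accepts. The name sequence-difference-keeping-rest is hypothetical, and the sketch leaves out the key, test, and seq2 bounds arguments for brevity.

```lisp
;; Hypothetical variant: only elements inside [start1, end1) are candidates
;; for removal; everything outside that range is carried over unchanged.
(defun sequence-difference-keeping-rest (seq1 seq2 &key (start1 0) end1)
  (remove-if (lambda (x) (position x seq2))
             seq1
             :start start1 :end end1))

(sequence-difference-keeping-rest "foobar" "boa" :start1 2)
;=> "for"

;; The subseq-based approach drops the prefix instead:
(remove-if (lambda (x) (position x "boa")) (subseq "foobar" 2))
;=> "r"
```

Without the subseq call there is no guaranteed copy: the standard allows remove-if to return a result that shares structure with its argument, or the argument itself when nothing needs to be removed.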

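【Discussion】:

If seq2 gets large enough (more than a few dozen elements), it may be worth preprocessing it and using an internal function that scales better. That optimization should be chosen based on the lengths of the arguments (but don't walk too far down a list just to measure it...).
This is good. I think some implementations of functions like remove-duplicates and set-difference actually build a hash table of the elements in the subtrahend. That only works when test is one of the four standard hash table tests, but it would cover the default case. Removing duplicates from seq2 could at least cut down the linear search time.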
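A rough sketch of the hashing idea from these comments, under the assumption that test is one of eq, eql, equal, or equalp (the only tests make-hash-table is required to accept). The name hashed-sequence-difference is hypothetical, and the key and bounds arguments are left out for brevity.

```lisp
;; Hypothetical sketch: index seq2 in a hash table once, then filter seq1
;; with one hash lookup per element instead of a linear POSITION search.
(defun hashed-sequence-difference (seq1 seq2 &key (test #'eql))
  (let ((seen (make-hash-table :test test)))
    ;; MAP with a NIL result type walks any sequence for side effects only.
    (map nil (lambda (x) (setf (gethash x seen) t)) seq2)
    (remove-if (lambda (x) (gethash x seen)) seq1)))

(hashed-sequence-difference #(1 5 0 1 9 8 3 0) #(0 1))
;=> #(5 9 8 3)

(hashed-sequence-difference "can you take this cat away?" "cat")
;=> "n you ke his  wy?"
```

Whether this beats the position-based version depends on the sizes involved; for a short seq2 the cost of building the table may outweigh the faster lookups.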