Reading from file line by line (low-memory) in Common Lisp: answers

【Question title】: Reading from file line by line (low-memory) in Common Lisp
【Posted】: 2019-05-01 20:01:18
【Question】:

I am looking for a way to read one s-expression (a data list) at a time from several files.

The problem is that the files are huge: hundreds of megabytes, or even gigabytes. And I need the RAM for the calculations.

For the output files,

(defun add-to-file (process-result file-path)
  (with-open-file (os file-path :direction :output
                                :if-exists :append
                                :if-does-not-exist :create)
    (print process-result os)))

works fine for appending result strings or s-expressions line by line. (I don't know, maybe this is not the most efficient way?)

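Some time ago I asked for a macro that opens any number of files using with-open-file, so that from the body I can access all the files through stream variables that I create and supply. However, since the number of open input and output files is variable, maybe the design is easier with a caller for each file that opens it, goes to the right position, writes or reads, and then closes it again.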

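For the output, the given function does the job. For the input, however, I would like a function that reads the next Lisp expression (s-expression) every time I call it, and that keeps a kind of memory of where it last read in the file. On each call it would reopen the file, know where to continue reading, and return the value; the next call would read and return the next value, and so on. This is similar to a Python generator over an iterator, which yields the next value in the sequence.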

I want to process (read in) the files expression by expression, to reduce memory usage.

How would you attack such a task? Or do you have a good strategy in mind?

【Comments on the question】:

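Opening (and closing) a file is a relatively expensive operation. Just open the files you need, process them, and close them when you are done.
Thank you @sds - you answered the 'with-open-files' macro question. Yes, maybe there is no other way than to open all files with this macro and then go on... I was just curious whether there are other good solutions.
A stream already stores the necessary state: (defvar *stream* (open #P"/tmp/input")), and any time you need new data, (read *stream*). If the file is updated, the new content can be read.
@coredump Thank you! Very true. And it will be more efficient.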
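(defun read-or-nil (stream)
  "Returns the next form read from STREAM. If the second value is NIL, end of file was reached."
  ;; The standard END-OF-FILE condition is used here instead of the
  ;; implementation-specific system::simple-end-of-file, for portability.
  (handler-case (values (read stream) t)
    (end-of-file () (values nil nil))))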
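
The stream-based suggestion in the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name map-forms is hypothetical. It relies on the standard eof-error-p and eof-value arguments of read, so the file is opened once and only one form is held in memory at a time.

```lisp
;; Hypothetical helper (not from the thread): call FN on each form stored
;; in the file at PATH, keeping the stream open for one sequential pass.
(defun map-forms (fn path)
  (with-open-file (in path :direction :input)
    ;; A fresh uninterned symbol can never be EQ to anything read from the file.
    (let ((eof (gensym "EOF")))
      (loop for form = (read in nil eof) ; NIL: return EOF instead of signaling
            until (eq form eof)
            do (funcall fn form)))))
```

With the path from the comment above, (map-forms #'print #P"/tmp/input") would print every form in that file, one after another.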

Tags: file-io closures common-lisp generator state

【Solution 1】:

Sketch:

Create a structure or class that stores the last read position.

(defstruct myfile
  path
  (last-position 0))

(defmethod next-expression ((mf myfile))
  (with-open-file (s (myfile-path mf) :direction :input)
    (file-position s (myfile-last-position mf))
    (prog1
        (read s)
      (setf (myfile-last-position mf) (file-position s)))))

Usage example:

(defparameter *mf1* (make-myfile :path (pathname "/foo/bar.sexp")))

(print (next-expression *mf1*)) ;; get first s-expr from file
;; do sth else
(myfile-last-position *mf1*)  ;; check current position
;; do sth else
(print (next-expression *mf1*)) ;; gives next s-expr from file

Then write a method that checks whether a new s-expression is available. And so on.
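
One possible way to fold that availability check into the reader itself is sketched below. It is an assumption layered on the answer's myfile structure, not part of the original answer, and the function name next-expression-or-nil is hypothetical. Instead of signaling an error at end of file, it returns a second value that tells the caller whether a form was actually read.

```lisp
;; Same structure as in the answer, repeated so the sketch is self-contained.
(defstruct myfile
  path
  (last-position 0))

;; Hypothetical variant of NEXT-EXPRESSION (the name is an assumption).
;; Returns two values: the form read (or NIL), and T if a form was read.
(defun next-expression-or-nil (mf)
  (with-open-file (s (myfile-path mf) :direction :input)
    (file-position s (myfile-last-position mf))
    ;; The stream object itself serves as a unique end-of-file marker.
    (let ((form (read s nil s)))
      (cond ((eq form s)
             (values nil nil))          ; nothing new; position is unchanged
            (t
             (setf (myfile-last-position mf) (file-position s))
             (values form t))))))
```

Because the stored position is left unchanged at end of file, calling the function again after more data has been appended picks up the new forms.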

【Comments】:

Thank you @Rainer Joswig! I will try this!
Thanks! It works very well! A very elegant solution!