Add a mongo request into a file and archive this file

【Question title】: Add a mongo request into a file and archive this file
【Posted】: 2015-10-30 23:50:22
【Question description】:

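I am having some trouble using streams with a MongoDB request. I want to:

Get the results from a collection
Put these results into a file
Put this file into a CSV file

I use the archiver package for file compression. The file contains CSV-formatted values, so I have to convert each row to CSV format.

My function takes a res (output) parameter, which means I can send the result directly to the client. For now, I can put the result into a file without streams. I expect to run into memory problems with large amounts of data, which is why I want to use streams.

Here is my code (without streams)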

function getCSV(res,query) {

    <dbRequest>.toArray(function(err,docs){
        var csv = '';
        if(docs !== null){
            for(var i = 0; i < docs.length; i++){
                var line = '';

                for(var index in docs[i]){
                    if(docs[i].hasOwnProperty(index) && (index !== '_id')){

                        if(line !== '') line += ',';

                        line += docs[i][index];
                    }
                }

                console.log("line",line);

                csv += line += '\r\n';
            }
        }
    }.bind(this));

    fileManager.addToFile(csv);

    archiver.initialize();
    archiver.addToArchive(fileManager.getName());
    fileManager.deleteFile();

    archiver.sendToClient(res);
};

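Once the CSV is complete, I save it to a file with a Filemanager object, which handles file creation and manipulation. The addToArchive method adds the file to the current archive, and the sendToClient method sends the archive through the output (the function's res parameter).

I am using Express.js, so I call this method from a server request.

Sometimes the file contains data and sometimes it is empty. Can you explain why? I would like to understand how streams work and how to implement them in my code.

Regards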

【Question comments】:

Tags: node.js mongodb express

【Solution 1】:

I'm not too sure why the data only shows up some of the time, but here is a way to send it through a stream. A few points before the code:

.stream({transform: someFunction})

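takes a stream of documents from the database and runs whatever data manipulation you want on each document as it passes through the stream. I put this function in a closure so that it is easier to keep track of the column headers and so that you can pick which keys of the document to use as columns. That lets you reuse it on different collections.

Here is the function that runs on each document as it passes through: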

// this is a closure containing knowledge of the keys you want to use,
// as well as whether or not to add the headers before the current line
function createTransformFunction(keys) {

    var hasHeaders = false;

    // this is the function that is run on each document 
    // as it passes through the stream
    return function(document) {

        var values = [];
        var line;

        keys.forEach(function(key) {

            // compare explicitly against undefined.
            // a falsy check such as !document[key] would also replace the number 0
            if (document[key] !== undefined) {
                values.push(document[key]);
            }
            else {
                values.push("");
            }
        });

        // add the column headers only on the first document
        if (!hasHeaders) {
            line = keys.join(",") + "\r\n";
            line += values.join(",");
            hasHeaders = true;
        }
        else {
            // add the line breaks at the beginning of each line 
            // to avoid having an extra line at the end
            line = "\r\n";
            line += values.join(",");
        }
        // return the document to the stream and move on to the next one
        return line;
    }
}
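
To make the closure's behavior concrete, here is a small sketch that runs it over a few made-up documents without touching a database. The sample documents and the `visits` key are assumptions for illustration only, and the closure is restated compactly so the snippet runs on its own.

```javascript
// Compact restatement of the closure above, so this snippet is self-contained.
function createTransformFunction(keys) {
    var hasHeaders = false;
    return function (document) {
        // missing keys become empty cells; 0 and other falsy values are kept
        var values = keys.map(function (key) {
            return document[key] !== undefined ? document[key] : "";
        });
        var line = values.join(",");
        if (!hasHeaders) {
            // first document: emit the header row before the data row
            hasHeaders = true;
            return keys.join(",") + "\r\n" + line;
        }
        // later documents: the line break goes in front of the row
        return "\r\n" + line;
    };
}

// Hypothetical sample documents, for illustration only.
var sampleDocs = [
    { _id: 1, firstName: "Ada", lastName: "Lovelace" },
    { _id: 2, firstName: "Alan" },
    { _id: 3, firstName: "Grace", lastName: "Hopper", visits: 0 }
];

var toCsvLine = createTransformFunction(["firstName", "lastName", "visits"]);
var csvText = sampleDocs.map(function (doc) {
    return toCsvLine(doc);
}).join("");

// JSON.stringify makes the \r\n separators visible in the console
console.log(JSON.stringify(csvText));
```

The result is one header row followed by three data rows with no trailing line break. Missing fields come out as empty cells and the number 0 is preserved. Note that values are not quoted, so a value containing a comma would break the column layout.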

You pass that function to the transform option of the database stream. Now suppose you have a collection of people with the keys _id, firstName, lastName:

var fs = require("fs");

// exportPeople is a hypothetical name for the Express route handler
function exportPeople(req, res) {

    // create a transform function with the keys you want to keep
    var transformPerson = createTransformFunction(["firstName", "lastName"]);

    // Create the mongo read stream that uses your transform function
    var readStream = personCollection.find({}).stream({
        transform: transformPerson
    });

    // write stream to file
    var localWriteStream = fs.createWriteStream("./localFile.csv");
    readStream.pipe(localWriteStream);

    // write stream to download
    res.setHeader("content-type", "text/csv");
    res.setHeader("content-disposition", "attachment; filename=downloadFile.csv");
    readStream.pipe(res);
}

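If you hit this endpoint, you will trigger a download in the browser and write a local file. I did not use archiver because I think it would add a level of complexity and distract from the concept of what is actually happening. The streams are all there; you would only need to fiddle with it a bit to work archiver in.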
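
As one possible way to compress on the fly without first writing the CSV to disk, the sketch below swaps archiver for Node's built-in zlib gzip stream, so it produces a gzipped CSV rather than a zip archive. The helper names `toCsvRow`, `createCsvStream` and `sendGzippedCsv` are hypothetical, and the code assumes a Node version that provides `stream.pipeline`.

```javascript
var stream = require("stream");
var zlib = require("zlib");

// Hypothetical helper: turns one document into one CSV row.
// Values are not quoted or escaped in this sketch.
function toCsvRow(doc, keys) {
    return keys.map(function (key) {
        return doc[key] !== undefined ? doc[key] : "";
    }).join(",") + "\r\n";
}

// Builds a Transform that accepts documents (object mode on the writable
// side) and emits CSV text: a header row first, then one row per document.
function createCsvStream(keys) {
    var headerSent = false;
    return new stream.Transform({
        writableObjectMode: true,
        transform: function (doc, encoding, callback) {
            if (!headerSent) {
                headerSent = true;
                this.push(keys.join(",") + "\r\n");
            }
            callback(null, toCsvRow(doc, keys));
        }
    });
}

// Pipes documents -> CSV -> gzip -> destination. Each stage only holds a
// small buffer, and pipeline() calls done once everything has finished
// or as soon as any stage fails.
function sendGzippedCsv(docStream, keys, destination, done) {
    stream.pipeline(
        docStream,
        createCsvStream(keys),
        zlib.createGzip(),
        destination,
        done
    );
}
```

In the handler above, the rough equivalent would be to set a content-type of application/gzip and call `sendGzippedCsv(personCollection.find({}).stream(), ["firstName", "lastName"], res, callback)`. Here the CSV conversion happens in the Transform, so the cursor stream needs no transform option.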

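【Comments】:

Hi, thanks for your solution, but the problem with the file is that I am limited on memory. The query can produce up to 2 million rows, and the file is too large to store on disk. That is why I want to use archiver to compress it. I did set up a stream with mongodb, but the stream never ends. See line 5 of jsfiddle.net/w273a1zn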