Exporting MongoDB Documents to CSV in C#: Answer

【Question Title】: Exporting MongoDB Documents to CSV in C#
【Posted】: 2018-07-31 02:45:52
【Question】:

I want to use C# to export a CSV table from the items of an IMongoCollection (MongoDB.Driver).

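How can I do this efficiently? I was thinking of retrieving the documents from the collection and then either converting them to a JSON-like format or building the CSV file with a StringBuilder, using a PropertyInfo array to access the fields of the retrieved objects.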

Could someone give an example of how to do this?

【Question Comments】:

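IMongoCollection really should have an export method...
What exactly are the elements in the collection? It should be a simple matter of getting the column names and using them as the CSV header, then writing each element as an additional CSV row whose values line up with their header columns.
@Nyerguds The elements are objects of classes with properties of various simple types. The goal is to let the user choose which fields to include in the CSV.
Well, if you can somehow get the column names, I'd say do what you suggested yourself: use PropertyInfo to read each property's value, then build an array of those values for each object. You end up with a simple List<String[]> (the column names followed by the value rows) that is easy to convert to CSV. But please try it yourself first, then post some code.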
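The PropertyInfo approach from the comments can be sketched in a few lines. This sketch assumes the documents have already been loaded as typed C# objects (for example via GetCollection<T>()) and that the user's chosen fields arrive as property names. The helper names ToRows, EscapeCsv and ToCsv are hypothetical, and the quoting follows the common RFC 4180 convention:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Reflection;

// Builds a header row followed by one string[] per object, reading only the
// properties named in selectedFields. Unknown property names give an empty cell.
static List<string[]> ToRows(IEnumerable<object> items, IReadOnlyList<string> selectedFields)
{
    var rows = new List<string[]> { selectedFields.ToArray() };
    foreach (object item in items)
    {
        Type type = item.GetType();
        var row = new string[selectedFields.Count];
        for (int i = 0; i < selectedFields.Count; i++)
        {
            PropertyInfo prop = type.GetProperty(selectedFields[i]);
            // InvariantCulture keeps numbers and dates independent of the machine's locale;
            // Convert.ToString returns "" for a null value.
            row[i] = prop == null ? "" : Convert.ToString(prop.GetValue(item), CultureInfo.InvariantCulture);
        }
        rows.Add(row);
    }
    return rows;
}

// Quotes a cell if it contains a comma, double quote or line break,
// doubling any embedded quotes (RFC 4180 style).
static string EscapeCsv(string s) =>
    s.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0
        ? "\"" + s.Replace("\"", "\"\"") + "\""
        : s;

// Joins the rows into CSV text with CRLF line endings.
static string ToCsv(List<string[]> rows) =>
    string.Concat(rows.Select(r => string.Join(",", r.Select(EscapeCsv)) + "\r\n"));
```

GetProperty works on any public property, so you can try the helper with anonymous objects, for example `ToRows(new object[] { new { Name = "Ann", Age = 30 } }, new[] { "Name", "Age" })`.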

Tags: c# mongodb csv export-to-csv

【Solution 1】:

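The obvious approach is to first collect all the header data somehow (see below). Then iterate over the collection and, if you are writing the CSV by hand (which is generally discouraged), build each line as a string and write the file in batches if your collection is large.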

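using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using MongoDB.Bson;
using MongoDB.Driver;

// 'database', 'collection' (the collection name) and 'filter' (a JSON query string) are assumed to exist.
HashSet<string> fields = new HashSet<string>();
BsonDocument query = BsonDocument.Parse(filter);
var result = database.GetCollection<BsonDocument>(collection).Find(query);

// Populate fields with all unique field names; see below for examples of how.

var csv = new StringBuilder();
csv.AppendLine(string.Join(",", fields.Select(EscapeCsv)));

// ToEnumerable() iterates the cursor instead of blocking on ToListAsync().Result.
foreach (var element in result.ToEnumerable())
{
    var values = new List<string>();
    foreach (var field in fields)
    {
        // GetNestedField (a helper not shown here) must resolve dotted paths such as "address.street".
        // GetValue with a default avoids an exception when a document lacks the field.
        BsonValue value = field.Contains(".")
            ? GetNestedField(element, field)
            : element.GetValue(field, BsonNull.Value);

        // Example conversion to string; extend with the types you need.
        string text;
        switch (value.BsonType)
        {
            case BsonType.Null:
                text = "";
                break;
            case BsonType.Int32:
                text = value.AsInt32.ToString();
                break;
            default: // ObjectId, String and any other type
                text = value.ToString();
                break;
        }
        values.Add(EscapeCsv(text));
    }
    // string.Join avoids the trailing comma the original concatenation produced.
    csv.AppendLine(string.Join(",", values));
}
File.WriteAllText("D:\\temp.csv", csv.ToString());

// Quotes a value containing a comma, double quote or line break (RFC 4180 style).
static string EscapeCsv(string s) =>
    s.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0
        ? "\"" + s.Replace("\"", "\"\"") + "\""
        : s;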
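The example above builds the whole file in a StringBuilder. To write in batches instead, as suggested for large collections, you can stream each row to a TextWriter as soon as it is produced. This is a minimal sketch. WriteCsv and EscapeCsv are hypothetical helper names, and the rows are assumed to be already converted to strings:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Writes the header, then each row as soon as it is enumerated, so the whole
// CSV never has to be held in memory. Flushing every batchSize rows pushes
// completed batches to the underlying stream; StreamWriter also buffers on its own.
static int WriteCsv(TextWriter writer, IReadOnlyList<string> header,
                    IEnumerable<IReadOnlyList<string>> rows, int batchSize = 1000)
{
    writer.WriteLine(string.Join(",", header.Select(EscapeCsv)));
    int count = 0;
    foreach (IReadOnlyList<string> row in rows)
    {
        writer.WriteLine(string.Join(",", row.Select(EscapeCsv)));
        count++;
        if (count % batchSize == 0)
        {
            writer.Flush();
        }
    }
    writer.Flush();
    return count;
}

// Quotes a cell containing a comma, double quote or line break (RFC 4180 style).
static string EscapeCsv(string s) =>
    s.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0
        ? "\"" + s.Replace("\"", "\"\"") + "\""
        : s;
```

With the driver, the rows could plausibly come from `result.ToEnumerable().Select(doc => ...)` and the writer from `new StreamWriter(path)`. That wiring is an assumption and is not part of the sketch.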

For your own object types, you will need your own deserialization logic instead of the BsonValue switch above.
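With typed objects, you can resolve dotted field paths such as address.street through reflection instead of BsonValue. This is a minimal sketch: GetPathValue is a hypothetical helper. The case-insensitive lookup is an assumption, made so that camelCase MongoDB field names match PascalCase C# properties:

```csharp
using System;
using System.Globalization;
using System.Reflection;

// Resolves a dotted path such as "address.street" against a plain C# object
// graph, reading one public instance property per segment. Lookup ignores case
// so camelCase field names match PascalCase properties. Returns "" if a segment
// is missing or an intermediate value is null.
static string GetPathValue(object obj, string path)
{
    const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase;
    object current = obj;
    foreach (string segment in path.Split('.'))
    {
        if (current == null)
        {
            return "";
        }
        PropertyInfo prop = current.GetType().GetProperty(segment, flags);
        if (prop == null)
        {
            return "";
        }
        current = prop.GetValue(current);
    }
    // Convert.ToString returns "" for null and formats numbers culture-independently here.
    return Convert.ToString(current, CultureInfo.InvariantCulture);
}
```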

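If you can, though, I'd recommend using the mongoexport tool. You can run the executable from your application and pass in whatever arguments you need. Keep in mind that CSV export requires the fields to be listed explicitly (with --fields or --fieldFile).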

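using System;
using System.Diagnostics;

ProcessStartInfo startInfo = new ProcessStartInfo();
// Verbatim strings (@"...") keep backslashes literal. "C:\mongodb\bin\..." would not compile,
// because \m is not a valid escape sequence (and \b would become a backspace).
startInfo.FileName = @"C:\mongodb\bin\mongoexport.exe";
startInfo.Arguments = @"-d testDB -c testCollection --type csv --fields name,address.street,address.zipCode --out .\output.csv";
startInfo.UseShellExecute = false;

using (Process exportProcess = new Process())
{
    exportProcess.StartInfo = startInfo;
    exportProcess.Start();
    exportProcess.WaitForExit();

    // A non-zero exit code indicates that mongoexport failed.
    if (exportProcess.ExitCode != 0)
    {
        throw new InvalidOperationException("mongoexport failed with exit code " + exportProcess.ExitCode);
    }
}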
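The question's goal is to let the user pick the fields, so you can build the --fields value from that selection. This sketch uses ProcessStartInfo.ArgumentList, which has been available since .NET Core 2.1. It passes each argument separately, so paths containing spaces need no manual quoting. BuildExportStartInfo is a hypothetical helper name:

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Builds a ProcessStartInfo for mongoexport from a user-selected field list.
// Each option and value is added as its own argument.
static ProcessStartInfo BuildExportStartInfo(string exePath, string db, string collection,
                                             IEnumerable<string> fields, string outFile)
{
    var startInfo = new ProcessStartInfo(exePath) { UseShellExecute = false };
    startInfo.ArgumentList.Add("--db");
    startInfo.ArgumentList.Add(db);
    startInfo.ArgumentList.Add("--collection");
    startInfo.ArgumentList.Add(collection);
    startInfo.ArgumentList.Add("--type=csv");
    startInfo.ArgumentList.Add("--fields");
    startInfo.ArgumentList.Add(string.Join(",", fields));
    startInfo.ArgumentList.Add("--out");
    startInfo.ArgumentList.Add(outFile);
    return startInfo;
}
```

You would then start it the same way as above, for example `using var process = Process.Start(startInfo); process.WaitForExit();`.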

For more on mongoexport, such as paging (--skip, --limit), additional queries (--query) and field files (--fieldFile), see: https://docs.mongodb.com/manual/reference/program/mongoexport/

Getting the unique field names

There are several ways to find all the field names. The examples below use BsonDocument as a generic data type.

Recursively traverse the documents returned from your IMongoCollection. This has to scan the entire collection, so performance may not be great.

Example:

HashSet<string> fields = new HashSet<string>();
var result = database.GetCollection<BsonDocument>(collection).Find(new BsonDocument());
foreach (var element in result.ToEnumerable())
{
    ProcessTree(fields, element, "");
}

// Adds every leaf field of 'tree' to 'fields', using dotted names for nested
// documents (e.g. "address.street"). Arrays are treated as leaf values.
private void ProcessTree(HashSet<string> fields, BsonDocument tree, string parentField)
{
    foreach (var field in tree)
    {
        string fieldName = field.Name;
        if (parentField != "")
        {
            fieldName = parentField + "." + fieldName;
        }

        if (field.Value.IsBsonDocument)
        {
            ProcessTree(fields, field.Value.AsBsonDocument, fieldName);
        }
        else
        {
            fields.Add(fieldName);
        }
    }
}
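The same recursion works on any nested key/value structure. As an illustration that runs without a MongoDB server, here is a sketch that uses System.Text.Json's JsonElement in place of BsonDocument. It assumes the documents are available as standard JSON text, and CollectFields is a hypothetical name. Like ProcessTree, it treats arrays as leaf values:

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Adds every leaf field name of a JSON object to 'fields', joining nested
// object keys with dots ("address.street"). Arrays are treated as leaf values.
static void CollectFields(HashSet<string> fields, JsonElement obj, string parentField)
{
    foreach (JsonProperty property in obj.EnumerateObject())
    {
        string fieldName = parentField == "" ? property.Name : parentField + "." + property.Name;
        if (property.Value.ValueKind == JsonValueKind.Object)
        {
            CollectFields(fields, property.Value, fieldName);
        }
        else
        {
            fields.Add(fieldName);
        }
    }
}
```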

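Run a MapReduce operation that returns all the field names. With this approach, however, scanning nested fields becomes more complicated. See this. Note that map-reduce is deprecated as of MongoDB 5.0, so on newer servers the aggregation approach is preferable.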

Example:

// Emit every top-level key of each document; the key becomes the output _id.
string map = @"function() {
    for (var key in this) { emit(key, null); }
}";
string reduce = @"function(key, stuff) { return null; }";
string finalize = @"function(key, value) {
    return key;
}";
MapReduceOptions<BsonDocument, BsonValue> options = new MapReduceOptions<BsonDocument, BsonValue>();
options.Finalize = new BsonJavaScript(finalize);

// Synchronous call instead of blocking on MapReduceAsync(...).Result.
// With the default inline output, each result document looks like { _id: <key>, value: <key> }.
var results = database.GetCollection<BsonDocument>(collection).MapReduce(
    new BsonJavaScript(map),
    new BsonJavaScript(reduce),
    options).ToList();
foreach (BsonValue result in results.Select(item => item["_id"]))
{
    Debug.WriteLine(result.AsString);
}

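Run an aggregation. $objectToArray only lists a document's top-level keys, so you need to unwind as many times as necessary to reach all the nested fields.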

Example:

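// Stage 1: turn each document into an array of { k, v } pairs (top-level keys only).
// Stage 2: produce one document per key/value pair.
// Stage 3: collect the distinct key names into a single document.
string[] pipeline = new string[3];
pipeline[0] = "{ '$project':{ 'arrayofkeyvalue':{ '$objectToArray':'$$ROOT'}}}";
pipeline[1] = "{ '$unwind':'$arrayofkeyvalue'}";
pipeline[2] = "{ '$group':{'_id':null,'fieldKeys':{'$addToSet':'$arrayofkeyvalue.k'}}}";
var stages = pipeline.Select(s => BsonDocument.Parse(s)).ToList();
var result = await database.GetCollection<BsonDocument>(collection).AggregateAsync<BsonDocument>(stages);
// For a non-empty collection, the $group stage yields exactly one document.
foreach (BsonValue fieldName in result.Single().GetElement("fieldKeys").Value.AsBsonArray)
{
    Debug.WriteLine(fieldName.AsString);
}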

None of these is perfect, and I can't tell you which one is the most efficient, but I hope this helps.

【Comments】: