【Question title】: How to specify leading field names with mongoimport?
【Posted】: 2025-12-01 14:35:01
【Question】:

I am importing a very large CSV file into MongoDB. It has the following format:

"zzzàms@hotmail.com","12071988"
"zzzг ms@hotmail.com","12071988"
"zzпїѕпїѕmmbbii2@bk.ru","MA15042002"
"zzпїѕпїѕmmbbii2@list.ru","MA15042002"
"zzпїѕпїѕmmbbii2@rambler.ru","MA15042002"
"zzпїѕпїѕmmbbii2@yandex.ru","MA15042002"

However, I don't know how many fields/columns will follow the email field.

I imported it with this command:

mongoimport -d emails -c second --file all.csv --type csv --fields email, number

However, any fields/columns after the number field get default names such as "field2", "field3" and so on:

{ "_id" : ObjectId("5a5cd95e598f1e910d353e3b"), "email" : "00-amber-00@embarqmail.com", " number" : "number1", "field2" : "number2" }

How can I make everything from the number field onward go into that same field, so that it is all classified as "number"?

Some entries have as many as 40 columns.

I would rather not modify the CSV file unless it is really necessary.

Sorry, English is not my first language. Thanks.

【Discussion】:

    Tags: database mongodb csv import


    【Answer 1】:

    You can use a Unix tool such as awk to convert each line into JSON with your own logic, and then pipe the result into mongoimport through stdin.

    Sample file:

    saravana@ubuntu:~$ cat sample-doc.txt 
    "zzzàms@hotmail.com","12071988"
    "zzzг ms@hotmail.com","12071988"
    "zzпїѕпїѕmmbbii2@bk.ru","MA15042002"
    "zzпїѕпїѕmmbbii2@list.ru","MA15042002"
    "zzпїѕпїѕmmbbii2@rambler.ru","MA15042002","34534"
    "zzпїѕпїѕmmbbii2@yandex.ru","MA15042002","1232434","3435435","53534"
    

    Converting each line to JSON with awk: the email, followed by an array of all remaining values as the numbers:

    saravana@ubuntu:~$ cat sample-doc.txt | awk 'BEGIN{FS=","}{print "{ email :" $1 ", numbers : [ " substr($0,length($1)+2) " ] } " }'
    { email :"zzzàms@hotmail.com", numbers : [ "12071988" ] } 
    { email :"zzzг ms@hotmail.com", numbers : [ "12071988" ] } 
    { email :"zzпїѕпїѕmmbbii2@bk.ru", numbers : [ "MA15042002" ] } 
    { email :"zzпїѕпїѕmmbbii2@list.ru", numbers : [ "MA15042002" ] } 
    { email :"zzпїѕпїѕmmbbii2@rambler.ru", numbers : [ "MA15042002","34534" ] } 
    { email :"zzпїѕпїѕmmbbii2@yandex.ru", numbers : [ "MA15042002","1232434","3435435","53534" ] } 
    saravana@ubuntu:~$ 
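    The awk output above uses unquoted keys (`email :`). mongoimport accepted that in the run shown, but it is not strict JSON, so other JSON tools may reject it. A minimal sketch of the same idea that quotes the keys, assuming the sample's layout (every value double-quoted, no commas, quotes or backslashes inside a value); `csv_to_json` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: turn lines like "email","n1","n2",... into one JSON
# document per line. Assumes every value is double-quoted and contains no
# comma, quote or backslash. Reads the given files, or stdin if none.
csv_to_json() {
  awk 'BEGIN { FS = "," }
  {
    # $1 is the quoted email; everything after the first comma is the
    # comma-separated list of quoted numbers, copied through unchanged.
    rest = substr($0, length($1) + 2)
    printf "{ \"email\": %s, \"numbers\": [ %s ] }\n", $1, rest
  }' "$@"
}
```

    For example, `csv_to_json all.csv | mongoimport --type json --db emails --collection second` would import the file from the question without modifying it.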
    

    Importing with mongoimport from stdin:

    saravana@ubuntu:~$ cat sample-doc.txt | awk 'BEGIN{FS=","}{print "{ email :" $1 ", numbers : [ " substr($0,length($1)+2) " ] } " }' | mongoimport --type json --db test --collection emailnos -v
    2018-01-17T09:58:11.559+0530    reading from stdin
    2018-01-17T09:58:11.559+0530    using fields: 
    2018-01-17T09:58:11.561+0530    connected to: localhost
    2018-01-17T09:58:11.561+0530    ns: test.emailnos
    2018-01-17T09:58:11.561+0530    connected to node type: standalone
    2018-01-17T09:58:11.561+0530    using write concern: w='1', j=false, fsync=false, wtimeout=0
    2018-01-17T09:58:11.561+0530    using write concern: w='1', j=false, fsync=false, wtimeout=0
    2018-01-17T09:58:11.726+0530    imported 6 documents
    

    The resulting collection:

    > db.emailnos.find()
    { "_id" : ObjectId("5a5ed0dbead4f5f7ae68da90"), "email" : "zzzàms@hotmail.com", "numbers" : [ "12071988" ] }
    { "_id" : ObjectId("5a5ed0dbead4f5f7ae68da91"), "email" : "zzпїѕпїѕmmbbii2@list.ru", "numbers" : [ "MA15042002" ] }
    { "_id" : ObjectId("5a5ed0dbead4f5f7ae68da92"), "email" : "zzпїѕпїѕmmbbii2@rambler.ru", "numbers" : [ "MA15042002", "34534" ] }
    { "_id" : ObjectId("5a5ed0dbead4f5f7ae68da93"), "email" : "zzпїѕпїѕmmbbii2@yandex.ru", "numbers" : [ "MA15042002", "1232434", "3435435", "53534" ] }
    { "_id" : ObjectId("5a5ed0dbead4f5f7ae68da94"), "email" : "zzzг ms@hotmail.com", "numbers" : [ "12071988" ] }
    { "_id" : ObjectId("5a5ed0dbead4f5f7ae68da95"), "email" : "zzпїѕпїѕmmbbii2@bk.ru", "numbers" : [ "MA15042002" ] }
    > 
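    One limitation of the `substr` one-liner: a row that ends with a trailing comma (for example `"x@y.ru","MA1",`) would become `[ "MA1", ]`, which is not valid JSON. A minimal sketch, under the same assumption that every value is double-quoted and contains no comma, builds the array field by field and skips empty fields. `csv_to_json_loop` is a hypothetical name:

```shell
# Hypothetical helper: emit one JSON document per CSV line, building the
# "numbers" array from fields 2..NF and skipping empty fields.
# Assumes every value is double-quoted and contains no comma.
csv_to_json_loop() {
  awk 'BEGIN { FS = "," }
  {
    sub(/\r$/, "")              # drop a Windows CR; awk re-splits the fields
    nums = ""
    for (i = 2; i <= NF; i++) {
      if ($i == "") continue    # skip empty fields, e.g. after a trailing comma
      nums = nums (nums == "" ? "" : ", ") $i
    }
    printf "{ \"email\": %s, \"numbers\": [ %s ] }\n", $1, nums
  }' "$@"
}
```

    Its output can be piped into `mongoimport --type json` in the same way as the one-liner.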
    

    【Discussion】: