【Question title】: OrientDB - CSV Import - Performance CSV Import Edge
【Posted】: 2016-07-19 23:47:22
【Question】:

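I want to import two CSV files into an OrientDB database. The first contains the vertices, with 1 million records. The second contains the edges, with 59 million records.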

I have two JSON configuration files for the import:

Vertices

{
  "source": { "file": { "path": "../csvs/metodo01/pesquisador.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "vertex": { "class": "Pesquisador" } }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/dbCemMilM01", 
       "dbType": "graph",
       "batchCommit": 1000,
       "classes": [
         {"name": "Pesquisador", "extends": "V"}
       ], "indexes": [
         {"class":"Pesquisador", "fields":["psq_id:integer"], "type":"UNIQUE" }
       ]
    }
  }
}

Edges

{
    "config": {
        "log": "info",
        "parallel": false
    },
    "source": {
        "file": {
            "path": "../csvs/metodo01/a10.csv"
        }
    },
    "extractor": {
        "row": {
        }
    },
    "transformers": [{
        "csv": {
            "separator": ",",
            "columnsOnFirstLine": true,
            "columns": ["psq_id_from:integer",
            "pub_id_to:integer",
            "ordem:integer"]
        }
    },
    {
        "command": {
            "command": "create edge PUBLICOU from (select from Pesquisador where psq_id = ${input.psq_id_from}) to   (select from Publicacao  where pub_id = ${input.pub_id_to}) set  ordem = ${input.ordem} ",
            "output": "edge"
        }
    }],
    "loader": {
        "orientdb": {
            "dbURL": "remote:localhost/dbUmMilhaoM01", 
            "dbType": "graph",
            "standardElementConstraints": false,
            "batchCommit": 1000,
            "classes": [{
                "name": "PUBLICOU",
                "extends": "E"
            }]
        }
    }
}

During the import, OrientDB suggests using an index to speed up the process.

How do I do that?

The command is simply: create edge PUBLICOU from (select from Pesquisador where psq_id = ${input.psq_id_from}) to (select from Publicacao where pub_id = ${input.pub_id_to}) set ordem = ${input.ordem}

【Comments】:

Tags: csv orientdb


【Answer 1】:

To speed up edge creation, you will probably need indexes on both properties: Pesquisador.psq_id (which you already have) and Publicacao.pub_id.
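
For reference, a minimal sketch of what this could look like on the Publicacao side, mirroring the Pesquisador configuration from the question. The CSV path `../csvs/metodo01/publicacao.csv` is hypothetical (the question does not show the Publicacao import), the database name is taken from the edge configuration, and the UNIQUE index type assumes pub_id values are never repeated; NOTUNIQUE would be the fallback otherwise.

```json
{
  "source": { "file": { "path": "../csvs/metodo01/publicacao.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "vertex": { "class": "Publicacao" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/dbUmMilhaoM01",
      "dbType": "graph",
      "batchCommit": 1000,
      "classes": [
        { "name": "Publicacao", "extends": "V" }
      ],
      "indexes": [
        { "class": "Publicacao", "fields": ["pub_id:integer"], "type": "UNIQUE" }
      ]
    }
  }
}
```

With both indexes in place, the two subqueries in the create edge command should be able to resolve psq_id and pub_id through index lookups rather than scanning each class.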

Ivan

【Comments】:

【Answer 2】:

You can declare the indexes directly in the ETL configuration. An example from the DBPedia importer:

    "orientdb": {
      "dbURL": "plocal:/temp/databases/dbpedia",
      "dbUser": "importer",
      "dbPassword": "IMP",
      "dbAutoCreate": true,
      "tx": false,
      "batchCommit": 1000,
      "wal" : false,
      "dbType": "graph",
      "classes": [
        {"name":"Person", "extends": "V" },
        {"name":"Customer", "extends": "Person", "clusters":8 }
      ],
      "indexes": [
        {"class":"V", "fields":["URI:string"], "type":"UNIQUE" },
        {"class":"Person", "fields":["town:string"], "type":"NOTUNIQUE" ,
            "metadata": { "ignoreNullValues": false }
        }
      ]
    }
    

For more information, see: http://orientdb.com/docs/2.2/Loader.html

【Comments】:

【Answer 3】:

To speed up the loading process, my suggestion is to work in plocal mode and then move the resulting database to a standalone OrientDB server.
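
As a rough illustration of this suggestion, the loader section of the edge configuration could point at a local database directory instead of a remote server. The directory `/path/to/orientdb/databases/dbUmMilhaoM01` is a placeholder, and the `tx` and `wal` settings are borrowed from the DBPedia example in the previous answer rather than from this one, so treat them as optional assumptions.

```json
{
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/path/to/orientdb/databases/dbUmMilhaoM01",
      "dbType": "graph",
      "tx": false,
      "wal": false,
      "standardElementConstraints": false,
      "batchCommit": 1000,
      "classes": [
        { "name": "PUBLICOU", "extends": "E" }
      ]
    }
  }
}
```

Because plocal opens the database files directly, the server should not have the same database open while the import runs. Once loading finishes, the database directory can be placed under the server's databases folder.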

【Comments】:
