【Question title】: How to properly use context variables in OrientDB ETL configuration file?
【Posted】: 2023-08-20 03:04:01
【Question description】:

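Summary

I am trying to understand the OrientDB ETL configuration JSON file.

Suppose there is a CSV file where:

  • each row is a vertex
  • a "class" column gives the intended class of the vertex
  • the vertices belong to several classes (Foo, Bar, Baz)

How do I set the class of each vertex to the value of its "class" column?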


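Troubleshooting efforts

I spent a lot of time in the OrientDB ETL documentation trying to solve this. I tried many different combinations of the let, block, and code components. I tried variable names such as className, $className, and ${classname}.

Current results

  • The code component prints the value of `className` correctly, so I know the variable is being set.
  • The vertex component does not resolve the variable, so it sets the class of every vertex to null.

Context

I have a newly created database (PLOCAL GRAPH) on localhost named "deleteme".

I have a vertex CSV file (nodes.csv) that looks like this: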

id,name,class
1,Jack,Foo
2,Jill,Bar
3,Gephri,Baz

And an ETL configuration file (test.json) that looks like this:

{
  "config": {
    "log": "DEBUG"
  },
  "source": {"file": {"path": "nodes.csv"}},
  "extractor": {"csv": {}},
  "transformers": [
    {"block": {"let": {"name": "$className",
                       "value": "$input.class"}}},
    {"code": {"language": "Javascript",
              "code": "print(className + '\\n'); input;"}},
    {"vertex": {"class": "$className"}}
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost:2424/deleteme",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbType": "graph",
      "tx": false,
      "wal": false,
      "batchCommit": 1000,
      "classes": [
        {"name": "Foo", "extends": "V"},
        {"name": "Bar", "extends": "V"},
        {"name": "Baz", "extends": "V"}
      ]
    }
  }
}

When I run the ETL job, my output looks like this:

aj@host:~/bin/orientdb-community-2.1.13/bin$ ./oetl.sh test.json
OrientDB etl v.2.1.13 (build 2.1.x@r9bc1a54a4a62c4de555fc5360357f446f8d2bc84; 2016-03-14 17:00:05+0000) www.orientdb.com
BEGIN ETL PROCESSOR
[file] INFO Reading from file nodes.csv with encoding UTF-8
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Foo' extends 'V'
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
+ extracted 0 rows (0 rows/sec) - 0 rows -> loaded 0 vertices (0 vertices/sec) Total time: 1001ms [0 warnings, 0 errors]
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Bar' extends 'V'
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
[orientdb] DEBUG - OrientDBLoader: created vertex class 'Baz' extends 'V'
[orientdb] DEBUG orientdb: found 0 vertices in class 'null'
[csv] DEBUG document={id:1,class:Foo,name:Jack}
[1:block] DEBUG Transformer input: {id:1,class:Foo,name:Jack}
[1:block] DEBUG Transformer output: {id:1,class:Foo,name:Jack}
[1:code] DEBUG Transformer input: {id:1,class:Foo,name:Jack}
Foo
[1:code] DEBUG executed code=OCommandExecutorScript [text=print(className); input;], result={id:1,class:Foo,name:Jack}
[1:code] DEBUG Transformer output: {id:1,class:Foo,name:Jack}
[1:vertex] DEBUG Transformer input: {id:1,class:Foo,name:Jack}
[1:vertex] DEBUG Transformer output: v(null)[#3:0]
[csv] DEBUG document={id:2,class:Bar,name:Jill}
[2:block] DEBUG Transformer input: {id:2,class:Bar,name:Jill}
[2:block] DEBUG Transformer output: {id:2,class:Bar,name:Jill}
[2:code] DEBUG Transformer input: {id:2,class:Bar,name:Jill}
Bar
[2:code] DEBUG executed code=OCommandExecutorScript [text=print(className); input;], result={id:2,class:Bar,name:Jill}
[2:code] DEBUG Transformer output: {id:2,class:Bar,name:Jill}
[2:vertex] DEBUG Transformer input: {id:2,class:Bar,name:Jill}
[2:vertex] DEBUG Transformer output: v(null)[#3:1]
[csv] DEBUG document={id:3,class:Baz,name:Gephri}
[3:block] DEBUG Transformer input: {id:3,class:Baz,name:Gephri}
[3:block] DEBUG Transformer output: {id:3,class:Baz,name:Gephri}
[3:code] DEBUG Transformer input: {id:3,class:Baz,name:Gephri}
Baz
[3:code] DEBUG executed code=OCommandExecutorScript [text=print(className); input;], result={id:3,class:Baz,name:Gephri}
[3:code] DEBUG Transformer output: {id:3,class:Baz,name:Gephri}
[3:vertex] DEBUG Transformer input: {id:3,class:Baz,name:Gephri}
[3:vertex] DEBUG Transformer output: v(null)[#3:2]
END ETL PROCESSOR
+ extracted 3 rows (4 rows/sec) - 3 rows -> loaded 3 vertices (4 vertices/sec) Total time: 1684ms [0 warnings, 0 errors]

Oh, and what does "DEBUG orientdb: found 0 vertices in class 'null'" mean?

【Comments】:

    Tags: orientdb


    【Solution 1】:

    Try this. I struggled with this for a while too, but the setup below worked for me.

    Note that setting @class before the vertex transformer runs makes the vertex transformer initialize the vertex with the proper class.

    "transformers": [
        {"block": {"let": {"name": "$className",
                           "value": "$input.class"}}},
        {"code": {"language": "Javascript",
                  "code": "print(className + '\\n'); input;"}},
        {"field": {"fieldName": "@class",
                   "expression": "$className"}},
        {"vertex": {}}
    ]
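
For reference, the transformer list above slots into the question's test.json in place of the original one. The sketch below uses Node.js (an assumption, chosen only as a convenient way to produce the file contents) to assemble the full configuration from the question's source, extractor, and loader sections plus this answer's transformers. The debugging code transformer is left out for brevity, which is an editorial simplification rather than part of the answer. Treat it as a sketch, not a verified setup.

```javascript
// Assemble the question's test.json with Solution 1's transformers swapped in.
// Every value below is copied from the question or from this answer.
const config = {
  config: { log: "DEBUG" },
  source: { file: { path: "nodes.csv" } },
  extractor: { csv: {} },
  transformers: [
    // 1. Copy the row's "class" column into the context variable $className.
    { block: { let: { name: "$className", value: "$input.class" } } },
    // 2. Write that value into the record's @class field.
    { field: { fieldName: "@class", expression: "$className" } },
    // 3. Create the vertex without naming a class in the transformer itself.
    { vertex: {} }
  ],
  loader: {
    orientdb: {
      dbURL: "remote:localhost:2424/deleteme",
      dbUser: "admin",
      dbPassword: "admin",
      dbType: "graph",
      tx: false,
      wal: false,
      batchCommit: 1000,
      classes: [
        { name: "Foo", extends: "V" },
        { name: "Bar", extends: "V" },
        { name: "Baz", extends: "V" }
      ]
    }
  }
};

// Serialize with two-space indentation; save the output as test.json.
const json = JSON.stringify(config, null, 2);
console.log(json);
```

The printed text can be saved as test.json and passed to oetl.sh exactly as in the question.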
    

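    【Comments】:

      【Solution 2】:

      To get this result, you can use ETL to import the data from the CSV into a class named "Generic". Then a JS function, "separateClass()", creates new classes named after the values of the 'class' property imported from the CSV and puts the vertices of the Generic class into those new classes.

      JSON file: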

      {
          "source": {"file": {"path": "data.csv"}},
          "extractor": {"row": {}},
          "begin": [
              {"let": {"name": "$className", "value": "Generic"}}
          ],
          "transformers": [
              {"csv": {
                  "separator": ",",
                  "nullValue": "NULL",
                  "columnsOnFirstLine": true,
                  "columns": [
                      "id:Integer",
                      "name:String",
                      "class:String"
                  ]
              }},
              {"vertex": {"class": "$className", "skipDuplicates": true}}
          ],
          "loader": {
              "orientdb": {
                  "dbURL": "remote:localhost/test",
                  "dbType": "graph"
              }
          }
      }
      

      After importing the data with ETL, create the function in JavaScript:

      var g = orient.getGraphNoTx();
      var queryResult = g.command("sql", "SELECT FROM Generic");

      // Example vertex fields: id, name, class
      if (!queryResult.length) {
        print("Empty");
      } else {
        // For each imported vertex, create its class if needed, then insert into it
        for (var i = 0; i < queryResult.length; i++) {
          var className = queryResult[i].getProperty("class").toString();

          // Check whether className already has vertices
          // (used here as a proxy for "the class was already created")
          var countClass = g.command("sql", "select from V where @class = '" + className + "'");

          if (!countClass.length) {
            g.command("sql", "CREATE CLASS " + className + " extends V");
            g.command("sql", "CREATE PROPERTY " + className + ".id INTEGER");
            g.command("sql", "CREATE PROPERTY " + className + ".name STRING");
            g.commit();
          }

          var id = queryResult[i].getProperty("id").toString();
          var name = queryResult[i].getProperty("name").toString();
          g.command("sql", "INSERT INTO " + className + " (id, name) VALUES (" + id + ",'" + name + "')");
          g.commit();
        }

        // Empty the Generic class once its vertices have been copied
        g.command("sql", "truncate class Generic unsafe");
      }
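
The core of the function above is: create each class the first time it is needed, then insert the row into it. A minimal sketch of that logic as a plain JavaScript function, with no OrientDB calls, might look like the following. `buildStatements` and the sample rows are hypothetical names introduced for illustration; the function only returns the SQL strings that would be passed to g.command. It tracks already-seen classes in a Set instead of querying V, which is a substituted technique and not what the answer does.

```javascript
// Hypothetical helper: plan the SQL statements without touching a database.
function buildStatements(rows) {
  const seen = new Set(); // classes whose CREATE statements were already planned
  const statements = [];

  for (const row of rows) {
    const className = row["class"];

    // Plan class creation only the first time a class name appears.
    if (!seen.has(className)) {
      seen.add(className);
      statements.push("CREATE CLASS " + className + " EXTENDS V");
      statements.push("CREATE PROPERTY " + className + ".id INTEGER");
      statements.push("CREATE PROPERTY " + className + ".name STRING");
    }

    // Plan one insert per row, mirroring the INSERT in the function above.
    statements.push(
      "INSERT INTO " + className + " (id, name) VALUES (" +
      Number(row.id) + ", '" + row.name + "')"
    );
  }
  return statements;
}

// Sample rows taken from nodes.csv in the question.
const sampleRows = [
  { id: 1, name: "Jack", "class": "Foo" },
  { id: 2, name: "Jill", "class": "Bar" },
  { id: 3, name: "Gephri", "class": "Baz" }
];

console.log(buildStatements(sampleRows).join("\n"));
```

Like the original function, this concatenates values straight into the SQL text, so it assumes names that contain no single quotes.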
      

      The result should look like the picture.

      【Comments】:
