【Question title】:Collecting data from JSON array into ClickHouse table
【Posted】:2026-02-15 07:40:01
【Question description】:

I have some raw JSON data in a ClickHouse table (actually NetFlow v9 from a netflow collector). It looks like this:

{"AgentID":"10.1.8.1",
       "Header":{"Version":9,"Count":2},
       "DataSets":[
            [{"I":2,"V":"231"},{"I":3,"V":"151"},{"I":8,"V":"109.195.122.130"}],
            [{"I":2,"V":"341"},{"I":3,"V":"221"},{"I":8,"V":"109.195.122.233"}]
       
       ]}

My task is to convert the DataSets array into another ClickHouse table in the following form:

I2     I3    I8
-----------------------------
231    151   109.195.122.130
341    221   109.195.122.233
...
 

【Question discussion】:

    Tags: json clickhouse netflow


    【Solution 1】:

    To parse JSON, consider using the specialized JSON functions:

    SELECT
        toInt32(column_values[1]) AS I2,
        toInt32(column_values[2]) AS I3,
        column_values[3] AS I8
    FROM 
    (
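        SELECT
            -- One output row per inner array of DataSets; each {"I":..,"V":..} object becomes a tuple (I, V).
            arrayJoin(JSONExtract(json, 'DataSets', 'Array(Array(Tuple(Int32, String)))')) AS row,
            -- Sort by field ID (I) so positions are stable regardless of the order in the JSON.
            arraySort(x -> (x.1), row) AS row_with_sorted_columns,
            -- Keep only the values (V), now ordered by field ID.
            arrayMap(x -> (x.2), row_with_sorted_columns) AS column_values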
        FROM 
        (
            SELECT '{"AgentID":"10.1.8.1", "Header":{"Version":9,"Count":2}, "DataSets":[\n          [{"I":3,"V":"151"},{"I":8,"V":"109.195.122.130"},{"I":2,"V":"231"}],\n          [{"I":2,"V":"341"},{"I":3,"V":"221"},{"I":8,"V":"109.195.122.233"}]]}' AS json
        )
    )
    
    
    /*
    ┌─I2──┬─I3──┬─I8──────────────┐
    │ 231 │ 151 │ 109.195.122.130 │
    │ 341 │ 221 │ 109.195.122.233 │
    └─────┴─────┴─────────────────┘
    */
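
The same sort-then-project logic can also be applied on the client before inserting. The following is a minimal Python sketch, assuming the raw message is available as a string; the function name `datasets_to_rows` is hypothetical, and, like the query, it assumes every record carries exactly the same set of field IDs.

```python
import json


def datasets_to_rows(raw_json):
    """Turn each DataSets record into a list of values ordered by field ID."""
    message = json.loads(raw_json)
    rows = []
    for record in message["DataSets"]:
        # Sort by "I" so column positions do not depend on the order in the JSON.
        ordered = sorted(record, key=lambda field: field["I"])
        rows.append([field["V"] for field in ordered])
    return rows


# Same sample as in the query: the first record lists its fields out of order.
raw = '''{"AgentID":"10.1.8.1", "Header":{"Version":9,"Count":2}, "DataSets":[
  [{"I":3,"V":"151"},{"I":8,"V":"109.195.122.130"},{"I":2,"V":"231"}],
  [{"I":2,"V":"341"},{"I":3,"V":"221"},{"I":8,"V":"109.195.122.233"}]]}'''

for i2, i3, i8 in datasets_to_rows(raw):
    print(int(i2), int(i3), i8)
```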
    

    (To learn more about JSON parsing, see How to extract json from json in clickhouse?)


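    The implementation above relies on a fixed structure of the DataSets array. As I understand it, in the real world this structure has an arbitrary schema (https://www.iana.org/assignments/ipfix/ipfix.xhtml), for example: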

    {
       "AgentID":"192.168.21.15",
       "Header":{},
       "DataSets":[
          [
             {"I":8, "V":"192.16.28.217"},
             {"I":12, "V":"180.10.210.240"},
             {"I":5, "V":2},
             {"I":4, "V":6},
             {"I":7, "V":443},
             {"I":6, "V":"0x10"}
          ]
       ]
    }
    

    This raises the problem of a table with an arbitrary number of columns. ClickHouse does not support this; see how tables are pivoted in such cases: https://*.com/search?q=%5Bclickhouse%5D+pivot
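
One common way around a fixed column set is a narrow (long) layout: one row per field rather than one column per field ID. The following is a minimal Python sketch of that reshaping, assuming it is done on the client before the insert. The function name `datasets_to_long_rows` and the `(agent_id, record_index, field_id, value)` layout are illustrative assumptions, not something the answer prescribes.

```python
import json


def datasets_to_long_rows(raw_json):
    """Flatten DataSets into (agent_id, record_index, field_id, value) tuples."""
    message = json.loads(raw_json)
    agent_id = message["AgentID"]
    rows = []
    for record_index, record in enumerate(message["DataSets"]):
        for field in record:
            # Values may be strings or numbers, so store them uniformly as strings.
            rows.append((agent_id, record_index, field["I"], str(field["V"])))
    return rows


# Sample with an arbitrary set of field IDs and mixed value types.
raw = '''{"AgentID":"192.168.21.15", "Header":{}, "DataSets":[[
  {"I":8,"V":"192.16.28.217"}, {"I":12,"V":"180.10.210.240"},
  {"I":5,"V":2}, {"I":4,"V":6}, {"I":7,"V":443}, {"I":6,"V":"0x10"}]]}'''

for row in datasets_to_long_rows(raw):
    print(row)
```

Rows in this shape fit a table with a fixed set of columns no matter which field IDs a record contains; individual fields can then be selected by field ID at query time.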

    【Discussion】: