【Question title】: ClickHouse SQL: Reshape data from long format to wide format
【Posted】: 2020-01-26 19:16:58
【Question description】:

I am using the ClickHouse SQL dialect. After exploding the arrays with ARRAY JOIN, my data has the following format.

|------|---------------------|----------------|------------------|
|  id  |      timestamp      |  property_key  |  property_value  |
|------|---------------------|----------------|------------------|
|  01  | 2019-09-25 16:24:38 |     query      |     Palmera      |
|------|---------------------|----------------|------------------|
|  01  | 2019-09-25 16:24:38 |   found_items  |       10         |
|------|---------------------|----------------|------------------|
|  02  | 2019-09-25 13:11:09 |     query      |     pigeo        |
|------|---------------------|----------------|------------------|
|  02  | 2019-09-25 13:11:09 |   found_items  |        0         |
|------|---------------------|----------------|------------------|
|  03  | 2019-09-25 16:08:13 |     query      |     harmon       |
|------|---------------------|----------------|------------------|
|  03  | 2019-09-25 16:08:13 |   found_items  |       17         |
|------|---------------------|----------------|------------------|

I got this result with the following query:

SELECT id, timestamp, 
properties.key AS property_key, 
properties.value as property_value
FROM (
SELECT 
  rowNumberInAllBlocks() as id,
  timestamp,
  properties.key,
  properties.value
FROM database.table
WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') 
AND toDateTime('2019-09-26 11:26:56')
ORDER BY timestamp)
ARRAY JOIN properties
WHERE
properties.key IN ('query', 'found_items')

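I need to extract the queries for which found_items equals 0. I don't know how to reshape the data from long format to wide format. So, the expected result is as follows.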

|------|---------------------|-----------------|---------------|
|  id  |      timestamp      |     query       |  found_items  |
|------|---------------------|-----------------|---------------|
|  02  | 2019-09-25 13:11:09 |     pigeo       |       0       |
|------|---------------------|-----------------|---------------|
|  15  | 2019-09-25 16:08:13 |     coche       |       0       |
|------|---------------------|-----------------|---------------|
|  27  | 2019-09-16 13:19:46 | panitos pampers |       0       |
|------|---------------------|-----------------|---------------|

|------|---------------------|----------------|------------------|
|  id  |      timestamp      |  property_key  |  property_value  |
|------|---------------------|----------------|------------------|
|  02  | 2019-09-25 13:11:09 |     query      |     pigeo        |
|------|---------------------|----------------|------------------|
|  15  | 2019-09-25 16:08:13 |     query      |     coche        |
|------|---------------------|----------------|------------------|
|  27  | 2019-09-16 13:19:46 |     query      |  panitos pampers |
|------|---------------------|----------------|------------------|

【Question comments】:

    Tags: clickhouse


    【Solution 1】:

    Try this query:

    SELECT 
      id, 
      groupArray(timestamp)[1] timestamp,
      groupArray(properties.key)[1] property_key,
      groupArray(properties.value) property_value  
    FROM (
      SELECT 
        rowNumberInAllBlocks() as id,
        timestamp,
        properties.key,
        properties.value
      FROM test.test_011
      WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') AND toDateTime('2019-09-26 11:26:56') 
        AND properties.value[indexOf(properties.key, 'found_items')] = '0'
      ORDER BY timestamp)
    ARRAY JOIN properties
    WHERE properties.key IN ('query' /*, ..*/)
    GROUP BY id, properties.key
    ORDER BY id
    
    /* Result
    ┌─id─┬───────────timestamp─┬─property_key─┬─property_value────────┐
    │  0 │ 2019-09-25 13:11:09 │ query        │ ['pigeo']             │
    │  1 │ 2019-09-16 13:19:46 │ query        │ ['panitos','pampers'] │
    └────┴─────────────────────┴──────────────┴───────────────────────┘
    */
    
    /* prepare test data */
    
    CREATE TABLE test.test_011 (
      timestamp DateTime,
      properties Nested(key String, value String)
    ) ENGINE = Memory;
    
    INSERT INTO test.test_011
    VALUES 
      (toDateTime('2019-09-25 16:24:38'),  ['query', 'found_items'], ['Palmera', '10']),
      (toDateTime('2019-09-25 13:11:09'),  ['query', 'found_items'], ['pigeo', '0']),
      (toDateTime('2019-09-25 16:08:13'),  ['query', 'found_items'], ['harmon', '17']),
      (toDateTime('2019-09-16 13:19:46'), ['found_items', 'query', 'query'], ['0', 'panitos', 'pampers']),
      (toDateTime('2019-09-25 16:22:38'),  ['query', 'query'], ['test', 'test']);
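
The query above still returns key/value rows. To get the wide layout from the question (one row per event, with `query` and `found_items` as columns), the pairs can be pivoted either directly on the Nested arrays or with conditional aggregation after ARRAY JOIN. The following is a minimal sketch rather than a tested solution. It assumes the `test.test_011` table defined above, and joining repeated `query` values with a space is an assumption based on the `panitos pampers` row in the expected result.

```sql
-- Variant 1: pivot directly on the Nested arrays, no ARRAY JOIN needed.
SELECT
  rowNumberInAllBlocks() AS id,
  timestamp,
  -- keep every value whose key is 'query' and join them with a space
  arrayStringConcat(
    arrayFilter((v, k) -> k = 'query', properties.value, properties.key),
    ' ') AS query,
  -- indexOf returns 0 when the key is missing, so the lookup yields ''
  properties.value[indexOf(properties.key, 'found_items')] AS found_items
FROM test.test_011
WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') AND toDateTime('2019-09-26 11:26:56')
  AND properties.value[indexOf(properties.key, 'found_items')] = '0'
ORDER BY timestamp;

-- Variant 2: keep the ARRAY JOIN and pivot with -If aggregate combinators.
SELECT
  id,
  timestamp,
  arrayStringConcat(
    groupArrayIf(properties.value, properties.key = 'query'), ' ') AS query,
  anyIf(properties.value, properties.key = 'found_items') AS found_items
FROM (
  SELECT
    rowNumberInAllBlocks() AS id,
    timestamp,
    properties.key,
    properties.value
  FROM test.test_011
  WHERE timestamp BETWEEN toDateTime('2019-09-16 11:26:56') AND toDateTime('2019-09-26 11:26:56')
  ORDER BY timestamp)
ARRAY JOIN properties
WHERE properties.key IN ('query', 'found_items')
GROUP BY id, timestamp
-- rows without a found_items key get '' from anyIf and are dropped here
HAVING found_items = '0'
ORDER BY id;
```

Variant 1 avoids the explode-and-regroup round trip and is probably the simpler choice when the source table already stores the properties as Nested arrays. Variant 2 fits better when the long-format rows are the starting point. Note that `groupArrayIf` does not formally guarantee the order of the collected values, and that `id` comes from `rowNumberInAllBlocks()`, so it depends on where the filter is applied.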
    

    【Comments】:
