【Question Title】: How to get data with JSON format in Clickhouse-driver
【Posted】: 2020-11-24 14:01:24
【Question】:

I'm trying to fetch ClickHouse data in my Django project. I'm using clickhouse_driver and:

client.execute('SELECT * FROM myTable LIMIT 5 FORMAT JSON')

When I run SELECT * FROM myTable LIMIT 5 FORMAT JSON on the ClickHouse server, the output is in JSON format. But in Python, when I try it with clickhouse_driver, it returns only the field values, like this:

[('2020213','qwerty','asdfg'),('2030103','qweasd','asdxv')]

But I want key-value JSON format, like:

{"logdate":"2020213","host":"qwerty","cef":"asdfg"}

Any suggestions for solving this? Or maybe I have to look for an alternative to clickhouse_driver..

Thanks.

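【Comments】:

  • Have you tried Pandas read_sql followed by .to_json?
  • Actually, it isn't related to the type. It's related to the clickhouse_driver library. When we fetch data, the lib doesn't provide the keys, only the values.. If I could get the keys, I would use it :) thx

Tags: python clickhouse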


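【Solution 1】:

clickhouse-driver ignores the FORMAT clause (see Selecting data in the clickhouse-driver documentation).

It can be done manually by combining the column names with the corresponding values: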

from clickhouse_driver import Client
from json import dumps

client = Client(host='localhost')

# With with_column_types=True, the first item yielded is the list of (name, type) pairs
data = client.execute_iter('SELECT * FROM system.functions LIMIT 5', with_column_types=True)
columns = [column[0] for column in next(data)]

for row in data:
    # Pair each column name with its value, then serialize the row
    print(dumps(dict(zip(columns, row))))

# Result:
# {"name": "fromUnixTimestamp64Nano", "is_aggregate": 0, "case_insensitive": 0, "alias_to": ""}
# {"name": "toUnixTimestamp64Nano", "is_aggregate": 0, "case_insensitive": 0, "alias_to": ""}
# {"name": "toUnixTimestamp64Micro", "is_aggregate": 0, "case_insensitive": 0, "alias_to": ""}
# {"name": "sumburConsistentHash", "is_aggregate": 0, "case_insensitive": 0, "alias_to": ""}
# {"name": "yandexConsistentHash", "is_aggregate": 0, "case_insensitive": 0, "alias_to": ""}
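One caveat with the manual approach: json.dumps raises TypeError for values it cannot encode natively, such as the datetime objects clickhouse-driver returns for DateTime columns. Below is a minimal sketch, assuming that falling back to str() is acceptable for such values. The helper name iter_json_rows and the sample data are hypothetical; the sample only mimics the shape that execute_iter yields with with_column_types=True.

```python
from datetime import datetime
from json import dumps


def iter_json_rows(data):
    """Yield one JSON object string per row.

    `data` is assumed to behave like the iterator returned by
    execute_iter(..., with_column_types=True): the first item is a list of
    (column_name, column_type) pairs, the remaining items are row tuples.
    """
    data = iter(data)
    columns = [name for name, _type in next(data)]
    for row in data:
        # default=str is only called for values json cannot encode natively
        yield dumps(dict(zip(columns, row)), default=str)


# Hypothetical sample standing in for a real query result
sample = [
    [("logdate", "DateTime"), ("host", "String")],
    (datetime(2020, 2, 13, 0, 0, 0), "qwerty"),
]
for line in iter_json_rows(sample):
    print(line)
```

Because the helper is a generator, rows are serialized one at a time, which keeps the streaming behaviour of execute_iter.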

Or using pandas:

from clickhouse_driver import Client
import pandas as pd

client = Client(host='localhost')

data = client.execute_iter('SELECT * FROM system.functions LIMIT 5', with_column_types=True)
columns = [column[0] for column in next(data)]

df = pd.DataFrame.from_records(data, columns=columns)
print(df.to_json(orient='records'))
# Result:
# [{"name":"fromUnixTimestamp64Nano","is_aggregate":0,"case_insensitive":0,"alias_to":""},{"name":"toUnixTimestamp64Nano","is_aggregate":0,"case_insensitive":0,"alias_to":""},{"name":"toUnixTimestamp64Micro","is_aggregate":0,"case_insensitive":0,"alias_to":""},{"name":"sumburConsistentHash","is_aggregate":0,"case_insensitive":0,"alias_to":""},{"name":"yandexConsistentHash","is_aggregate":0,"case_insensitive":0,"alias_to":""}]

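【Comments】:

  • Yes, meanwhile I found a solution :)) I've posted my solution below -->
【Solution 2】:

I haven't tried Vladimir's solution, but here is mine:

The client.execute command offers a with_column_types=True parameter. With it, execute returns the column metadata (name and type pairs) along with the rows. After that: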

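import pandas

# execute returns (rows, [(column_name, column_type), ...]) when with_column_types=True
result, columns = client.execute('SELECT * FROM myTbl LIMIT 5', with_column_types=True)
df = pandas.DataFrame(result, columns=[name for name, _ in columns])
dfJson = df.to_json(orient='records')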

This gives us what we want.
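If pandas is not otherwise needed in the project, the same key-value result can be built with the standard library alone. Below is a minimal sketch, assuming the (rows, columns) pair has the shape returned by client.execute(..., with_column_types=True). The function name to_records is hypothetical, and the sample values are taken from the question, with column types assumed to be String.

```python
import json


def to_records(rows, columns):
    """Turn row tuples plus (name, type) column metadata into a list of dicts."""
    names = [name for name, _type in columns]
    return [dict(zip(names, row)) for row in rows]


# Hypothetical sample standing in for:
#   rows, columns = client.execute('SELECT ...', with_column_types=True)
rows = [("2020213", "qwerty", "asdfg"), ("2030103", "qweasd", "asdxv")]
columns = [("logdate", "String"), ("host", "String"), ("cef", "String")]

records = to_records(rows, columns)
print(json.dumps(records))
```

In a Django view, a list like records could then be returned with JsonResponse(records, safe=False), since JsonResponse only accepts non-dict objects when safe is False.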

Thanks for the suggestions :)
