【Posted】: 2020-10-27 20:15:18
【Question】:
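I have a response from an API containing a dataset from a heating system. It is structured as a dictionary of nested dictionaries, some of which contain lists of dictionaries.
For example: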
sample = {
    "zoneType": "HEATING",
    "interval": {"from": "2020-10-23T22:45:00.000Z", "to": "2020-10-24T23:15:00.000Z"},
    "hoursInDay": 24,
    "measuredData": {
        "measuringDeviceConnected": {
            "timeSeriesType": "dataIntervals",
            "valueType": "boolean",
            "dataIntervals": [
                {"from": "2020-10-23T22:45:00.000Z", "to": "2020-10-24T23:15:00.000Z", "value": True}
            ]
        },
        "insideTemperature": {
            "timeSeriesType": "dataPoints",
            "valueType": "temperature",
            "min": {"celsius": 19.34, "fahrenheit": 66.81},
            "max": {"celsius": 20.6, "fahrenheit": 69.08},
            "dataPoints": [
                {"timestamp": "2020-10-23T22:45:00.000Z", "value": {"celsius": 20.6, "fahrenheit": 69.08}},
                {"timestamp": "2020-10-23T23:00:00.000Z", "value": {"celsius": 20.55, "fahrenheit": 68.99}},
                {"timestamp": "2020-10-23T23:15:00.000Z", "value": {"celsius": 20.53, "fahrenheit": 68.95}},
                {"timestamp": "2020-10-23T23:30:00.000Z", "value": {"celsius": 20.51, "fahrenheit": 68.92}},
                {"timestamp": "2020-10-23T23:45:00.000Z", "value": {"celsius": 20.48, "fahrenheit": 68.86}},
                {"timestamp": "2020-10-24T00:00:00.000Z", "value": {"celsius": 20.48, "fahrenheit": 68.86}},
                {"timestamp": "2020-10-24T00:15:00.000Z", "value": {"celsius": 20.44, "fahrenheit": 68.79}}
            ]
        },
        "humidity": {
            "timeSeriesType": "dataPoints",
            "valueType": "percentage",
            "percentageUnit": "UNIT_INTERVAL",
            "min": 0.615,
            "max": 0.627,
            "dataPoints": [
                {"timestamp": "2020-10-23T22:45:00.000Z", "value": 0.615},
                {"timestamp": "2020-10-23T23:00:00.000Z", "value": 0.615},
                {"timestamp": "2020-10-23T23:15:00.000Z", "value": 0.619},
                {"timestamp": "2020-10-23T23:30:00.000Z", "value": 0.620},
                {"timestamp": "2020-10-23T23:45:00.000Z", "value": 0.621},
                {"timestamp": "2020-10-24T00:00:00.000Z", "value": 0.623},
                {"timestamp": "2020-10-24T00:15:00.000Z", "value": 0.627}
            ]
        }
    }
}
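From the above I would like to extract the timestamps and Celsius values under ['insideTemperature']['dataPoints'] (the real data spans many more time periods) and put them as columns in a new pd.DataFrame, together with the values under the 'humidity' key. In due course I would like to merge this with data from a separate API call that has a similar structure, although the timestamp values may not be consistent between the two.
Several of the top-level dictionaries contain summary data (such as min and max) and can therefore be ignored. Likewise, converting from Celsius to Fahrenheit is easy to do if needed, so I don't want to extract that data.
What is the best way to cleanly create a DataFrame listing the timestamp, temperature in Celsius, and humidity from this query, which I can then join with the output of other queries?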
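For the later join with a second query whose timestamps may not line up, here is a minimal sketch of two possible approaches. It is not part of the original attempt: the `inside` and `outside` frames, the `outside_temp_c` column, all values, and the 7-minute tolerance are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for two query results; every value here is invented.
inside = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-10-23T22:45:00Z", "2020-10-23T23:00:00Z", "2020-10-23T23:15:00Z"]
    ),
    "inside_temp_c": [20.6, 20.55, 20.53],
})
outside = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-23T22:50:00Z", "2020-10-23T23:18:00Z"]),
    "outside_temp_c": [9.1, 8.7],
})

# Option 1: an outer join keeps every timestamp from both frames.
# Rows with no partner in the other frame are filled with NaN.
combined = inside.merge(outside, on="timestamp", how="outer").sort_values("timestamp")

# Option 2: merge_asof pairs each left row with the nearest right timestamp,
# but only if it lies within the tolerance. Both frames must be sorted by the key.
nearest = pd.merge_asof(
    inside.sort_values("timestamp"),
    outside.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("7min"),
)

print(combined)
print(nearest)
```

The outer join loses nothing but leaves gaps, while `merge_asof` gives one row per reading in the left frame at the cost of choosing a tolerance. Which one fits depends on how the combined data will be used.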
So far, I have been using the following:
import pandas as pd
df = pd.DataFrame(sample['measuredData']['insideTemperature']['dataPoints'])
## remove column that contains dictionary data, leaving time data
df.drop(labels='value', axis=1, inplace=True)
## get temp data into new column
input_data_point = sample['measuredData']['insideTemperature']['dataPoints']
temps = []
for i in input_data_point:
    temps.append(i['value']['celsius'])
df['inside_temp_c'] = pd.DataFrame(temps)
## repeat for humidity
input_data_point = sample['measuredData']['humidity']['dataPoints']
temps = []
for i in input_data_point:
    temps.append(i['value'])
df['humidity_pct'] = pd.DataFrame(temps)
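As a newcomer to Python, I wonder whether there is a quicker way of extracting the data from the raw download directly into a clean pandas DataFrame. Thanks for any suggestions.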
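One possible shortcut, offered as a sketch rather than a definitive answer, is to let `pd.json_normalize` (available in pandas 1.0 and later) flatten the nested `value` dictionaries and then merge the two series on `timestamp` instead of relying on row position. The `sample` below is an abbreviated copy of the one in the question so that the snippet runs on its own. The column names `inside_temp_c` and `humidity_pct` are taken from the attempt above.

```python
import pandas as pd

# Abbreviated copy of the sample from the question (two points per series).
sample = {
    "measuredData": {
        "insideTemperature": {
            "dataPoints": [
                {"timestamp": "2020-10-23T22:45:00.000Z", "value": {"celsius": 20.6, "fahrenheit": 69.08}},
                {"timestamp": "2020-10-23T23:00:00.000Z", "value": {"celsius": 20.55, "fahrenheit": 68.99}},
            ]
        },
        "humidity": {
            "dataPoints": [
                {"timestamp": "2020-10-23T22:45:00.000Z", "value": 0.615},
                {"timestamp": "2020-10-23T23:00:00.000Z", "value": 0.615},
            ]
        },
    }
}

measured = sample["measuredData"]

# json_normalize turns the nested "value" dict into "value.celsius" and
# "value.fahrenheit" columns; only the Celsius column is kept.
temp = pd.json_normalize(measured["insideTemperature"]["dataPoints"])
temp = temp[["timestamp", "value.celsius"]].rename(columns={"value.celsius": "inside_temp_c"})

# Humidity points are already flat, so the plain constructor is enough.
humidity = pd.DataFrame(measured["humidity"]["dataPoints"]).rename(
    columns={"value": "humidity_pct"}
)

# Join on the timestamp, then parse it into a timezone-aware datetime column.
df = temp.merge(humidity, on="timestamp", how="outer")
df["timestamp"] = pd.to_datetime(df["timestamp"])

print(df)
```

Merging on `timestamp` means a missing reading in one series shows up as NaN instead of silently shifting the other column out of alignment.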
【Comments】:
Tags: python pandas dataframe dictionary