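【Posted】: 2014-01-08 03:03:32
【Question】:
After a few weeks of refinement, and thanks to the great folks on SO, I have the following code, which produces the dataframes I need. What I can't work out is how to combine the dataframes from the program into a single final dataframe variable. When I simply assign the concat statement to a variable, I end up with only the last dataframe.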
{
"zipcode":"08989",
"current":{"canwc":null,"cig":4900,"class":"observation","clds":"OVC","day_ind":"D","dewpt":19,"expireTimeGMT":1385486700,"feels_like":34,"gust":null,"hi":37,"humidex":null,"icon_code":26,"icon_extd":2600,"max_temp":37,"wxMan":"wx1111"},
"triggers":[53,31,9,21,48,7,40,178,55,179,176,26,103,175,33,51,20,57,112,30,50,113]
}
{
"zipcode":"08990",
"current":{"canwc":null,"cig":4900,"class":"observation","clds":"OVC","day_ind":"D","dewpt":19,"expireTimeGMT":1385486700,"feels_like":34,"gust":null,"hi":37,"humidex":null,"icon_code":26,"icon_extd":2600,"max_temp":37, "wxMan":"wx1111"},
"triggers":[53,31,9,21,48,7,40,178,55,179,176,26,103,175,33,51,20,57,112,30,50,113]
}
import glob
import itertools
import json

import pandas as pd


def lines_per_n(f, n):
    # Yield the file's contents in chunks of n lines, each joined into one string.
    for line in f:
        yield ''.join(itertools.chain([line], itertools.islice(f, n - 1)))


def series_chunk(chunk):
    # Parse one JSON record and keep only the fields of interest.
    try:
        jfile = json.loads(chunk)
    except ValueError:
        # Malformed chunk: return None, which pd.concat silently drops.
        return None
    return pd.Series([jfile['zipcode'], jfile['current']['proc_time'],
                      jfile['triggers']])


for fin in glob.glob('*.txt'):
    with open(fin) as f:
        # One frame is built and printed per file; nothing is kept across files.
        print(pd.concat([series_chunk(chunk) for chunk in lines_per_n(f, 5)], axis=1).T)
Output of the program above, which I need to concatenate into a single dataframe:
0 1 2
0 08988 20131126102946 []
1 08989 20131126102946 [53, 31, 9, 21, 48, 7, 40, 178, 55, 179, 176, ...
0 1 2
0 08988 20131126102946 []
1 08989 20131126102946 [53, 31, 9, 21, 48, 7, 40, 178, 55, 179, 176, ...
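To make the chunking step concrete, here is a minimal sketch of how `lines_per_n` groups a file into one string per record. The in-memory `sample` file and its trimmed fields are assumptions made for illustration; only the function itself comes from the question.

```python
import io
import itertools
import json


def lines_per_n(f, n):
    """Yield successive n-line chunks of file object f as single strings."""
    for line in f:
        yield ''.join(itertools.chain([line], itertools.islice(f, n - 1)))


# Hypothetical input: two records of 5 lines each, shaped like the sample above.
sample = io.StringIO(
    '{\n"zipcode":"08989",\n"current":{"dewpt":19},\n"triggers":[53,31]\n}\n'
    '{\n"zipcode":"08990",\n"current":{"dewpt":19},\n"triggers":[]\n}\n'
)

# Each chunk holds exactly one record, so it parses as a standalone JSON object.
chunks = list(lines_per_n(sample, 5))
records = [json.loads(chunk) for chunk in chunks]
print([record["zipcode"] for record in records])
```

The chunk size has to match the number of lines per record in the file. If it does not, `json.loads` raises a `ValueError` on the misaligned chunks.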
Finally got this figured out. Here is the final code that does what I need:
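dfs = []
for fin in glob.glob('*.txt'):
    with open(fin) as f:
        # Build one frame per file and collect it in the list.
        df = pd.concat([series_chunk(chunk)
                        for chunk in lines_per_n(f, 7)], axis=1).T
        dfs.append(df)
# Concatenate once, after the loop, and renumber the rows.
df = pd.concat(dfs, ignore_index=True)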
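The difference between the two approaches can be shown without any files. This is only a sketch: the `per_file` frames are hypothetical stand-ins for the per-file results, not data from the question.

```python
import pandas as pd

# Hypothetical per-file frames standing in for the result of each *.txt file.
per_file = [
    pd.DataFrame([["08988", "20131126102946", []],
                  ["08989", "20131126102946", [53, 31]]]),
    pd.DataFrame([["08990", "20131126102946", [9, 21]]]),
]

# Assigning inside the loop overwrites the variable, so only the last frame survives.
for frame in per_file:
    last_only = pd.concat([frame])

# Collecting the frames in a list and concatenating once keeps every row.
frames = []
for frame in per_file:
    frames.append(frame)
combined = pd.concat(frames, ignore_index=True)

print(len(last_only), len(combined))
```

Passing `ignore_index=True` renumbers the rows from 0, which avoids the repeated 0, 1 labels seen in the per-file output.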
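【Comments】:
-
See here: pandas.pydata.org/pandas-docs/dev/…; just append each df to a list, then concatenate at the end, e.g.
result = pd.concat([list_of_frames])
-
You can do some of this directly via pandas.pydata.org/pandas-docs/dev/io.html#json (there is also a normalization section in 0.13 for nested JSON)
-
@Jeff I tried that and got
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering. :S
-
@AndyHayden I've never used the normalization myself...
-
@Jeff I hadn't seen that it was implemented! I think I have some code it could make less messy. The syntax looks magical.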
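For reference, the JSON normalization mentioned in the comments might look roughly like this. This sketch assumes a recent pandas (1.0 or later), where the helper is exposed as `pd.json_normalize`; older releases had it under `pandas.io.json`. The `records` list is hypothetical and trimmed from the sample input.

```python
import pandas as pd

# Hypothetical records shaped like the sample input, with most fields trimmed.
records = [
    {"zipcode": "08989", "current": {"dewpt": 19, "feels_like": 34}, "triggers": [53, 31, 9]},
    {"zipcode": "08990", "current": {"dewpt": 19, "feels_like": 34}, "triggers": []},
]

# Nested dicts become dotted column names such as "current.dewpt";
# list values such as "triggers" stay as-is in a single column.
flat = pd.json_normalize(records)
print(flat[["zipcode", "current.dewpt", "triggers"]])
```

This replaces the hand-built `pd.Series` per record, at the cost of needing the records already parsed into dicts.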