【Question title】: How to parse a JSON column embedded in a pandas DataFrame in Python
【Posted】: 2015-03-12 14:17:16
【Question】:

I have a pandas DataFrame (raw csv file here) in which several columns are stored as JSON (d1 & d2). How can I parse these columns to get the following desired output:

2015-02-12,user1,05:15 | 20,16:30 | 20.0,22:00 | 10.0

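I realize I will have to transpose the output once it is parsed, but I am having trouble reading the JSON data contained in the DataFrame columns. Any help is appreciated! Thanks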

>>> test = pd.read_csv('schedsample.csv',sep=',', header=0)
>>> test.head()
         date username                                                 d1  \
0  2015-02-12    user1  {"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30"...   
1  2015-02-12    user1  {"d2":[{"tm":"06:15","t":"20.0"},{"tm":"08:00"...   
2  2015-02-12    user1  {"d3":[{"tm":"07:15","t":"20.0"},{"tm":"09:00"...   
3  2015-02-12    user1  {"d4":[{"tm":"08:15","t":"20.0"},{"tm":"07:00"...   

                                                  d2  
0  {"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30"...  
1  {"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30"...  
2  {"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30"...  
3  {"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30"...  
>>> import json as js
>>> js.loads(test['d1'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/khurampervez/anaconda/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Users/khurampervez/anaconda/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
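The traceback arises because `json.loads` accepts a single `str` (or bytes), while `test['d1']` is an entire Series. Below is a minimal sketch, assuming a small hypothetical two-row frame whose JSON strings are copied from the `to_dict()` output quoted in the comments, that reproduces the error and then decodes each cell separately with `Series.apply`. Python 3 words the error message differently from the Python 2.7 one above.

```python
import json

import pandas as pd

# Hypothetical two-row frame; the JSON strings are copied from the
# to_dict() output quoted in the comments (d2 column omitted).
test = pd.DataFrame({
    "date": ["2015-02-12", "2015-02-12"],
    "username": ["user1", "user1"],
    "d1": [
        '{"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30","t":"20.0"},{"tm":"22:00","t":"10.0"}]}',
        '{"d2":[{"tm":"06:15","t":"20.0"},{"tm":"08:00","t":"10.0"},{"tm":"22:00","t":"10.0"}]}',
    ],
})

# Passing the whole column fails: json.loads wants str/bytes, not a Series.
try:
    json.loads(test["d1"])
except TypeError as exc:
    print("TypeError:", exc)

# Decoding cell by cell works; the result is a Series of Python dicts.
parsed = test["d1"].apply(json.loads)
print(parsed[0]["d1"][0])
```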

【Question comments】:

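  • Because it's a string representation, I can't easily copy your data to experiment with. Try test.to_dict() and see whether it reproduces the whole JSON entry.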
  • @cphlewis I tried that, and it does reproduce the whole JSON entry, see below: >>> d1=sched['d1'].to_dict() >>> d1 {0: '{"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30","t":"20.0"},{"tm":"22:00","t":"10.0"}]}', 1: '{"d2":[{"tm":"06:15","t":"20.0"},{"tm":"08:00","t":"10.0"},{"tm":"22:00","t":"10.0"}]}', 2: '{"d3":[{"tm":"07:15","t":"20.0"},{"tm":"09:00","t":"10.0"},{"tm":"22:00","t":"10.0"}]}', 3: '{"d4":[{"tm":"08:15","t":"20.0"},{"tm":"07:00","t":"10.0"},{"tm":"22:00","t":"10.0"}]}'}
  • @cphlewis But running the json.loads command on it gives the following: >>> json.loads(sched['d1'].to_dict()) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/khurampervez/anaconda/lib/python2.7/json/__init__.py", line 338, in loads return _default_decoder.decode(s) File "/Users/khurampervez/anaconda/lib/python2.7/json/decoder.py", line 366, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) TypeError: expected string or buffer
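`to_dict()` returns a dict mapping each row index to its JSON *string*, so `json.loads` still receives the wrong type. A minimal stdlib-only sketch, using the first two entries of the dict quoted above, that decodes each value individually and unpacks the single key ("d1", "d2", ...) each object carries:

```python
import json

# First two entries of the dict printed by sched['d1'].to_dict() above:
# row index -> JSON string.
d1 = {
    0: '{"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30","t":"20.0"},{"tm":"22:00","t":"10.0"}]}',
    1: '{"d2":[{"tm":"06:15","t":"20.0"},{"tm":"08:00","t":"10.0"},{"tm":"22:00","t":"10.0"}]}',
}

# json.loads(d1) raises TypeError because d1 is a dict; decode each value.
decoded = {row: json.loads(text) for row, text in d1.items()}

for row, obj in decoded.items():
    # Each object has exactly one key holding a list of
    # {"tm": time, "t": value} entries.
    key, entries = next(iter(obj.items()))
    print(row, key, [(e["tm"], e["t"]) for e in entries])
```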

Tags: python json pandas dataframe


【Solution 1】:

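Your test.d1 column holds one JSON string per row, covering all of the objects d1 through d4. json.loads expects a single string, so json.loads(test['d1']) on the whole column raises an error; json_normalize(json.loads(test['d1'][0])['d1']), however, gives you the d1 DataFrame you want. So I think you need to read in not only the d1 and d2 columns but also d3 and d4, which will leave some cells empty.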
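The answer's one-liner handles a single cell. Below is a sketch that first reproduces it and then extends it to every row; the loop, the `day` column name and the `"tm | t"` formatting are illustrative assumptions, not part of the answer. It assumes pandas 1.0+, where the function is `pd.json_normalize` (older versions imported `json_normalize` from `pandas.io.json`), and a hypothetical two-row frame built from the strings quoted in the comments.

```python
import json

import pandas as pd

# Hypothetical two-row frame built from the values quoted in the comments.
test = pd.DataFrame({
    "date": ["2015-02-12", "2015-02-12"],
    "username": ["user1", "user1"],
    "d1": [
        '{"d1":[{"tm":"05:15","t":"20.0"},{"tm":"16:30","t":"20.0"},{"tm":"22:00","t":"10.0"}]}',
        '{"d2":[{"tm":"06:15","t":"20.0"},{"tm":"08:00","t":"10.0"},{"tm":"22:00","t":"10.0"}]}',
    ],
})

# As in the answer: decode one cell, take the list under its key, normalize.
first = pd.json_normalize(json.loads(test["d1"][0])["d1"])
print(first)  # columns tm and t, one row per entry

# Extension to all rows: tag each entry with its row's date, username and
# day key (hypothetical column name "day"), then stack into one long frame.
frames = []
for _, row in test.iterrows():
    key, entries = next(iter(json.loads(row["d1"]).items()))
    frames.append(
        pd.json_normalize(entries).assign(
            date=row["date"], username=row["username"], day=key
        )
    )
long_df = pd.concat(frames, ignore_index=True)

# Join each row/day's entries into the question's "tm | t" layout.
lines = (
    long_df.assign(pair=long_df["tm"] + " | " + long_df["t"])
    .groupby(["date", "username", "day"], sort=False)["pair"]
    .agg(",".join)
)
out = [f"{date},{user},{pairs}" for (date, user, _day), pairs in lines.items()]
print("\n".join(out))
```

Keeping a long frame before joining also makes the transposition mentioned in the question straightforward, since `long_df` can be pivoted on `day` instead of joined into strings.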

【Discussion】:
