Correct format for a JSON file... then into a DataFrame

【Question title】: Correct format for a JSON file... then into a DataFrame
【Posted】: 2020-11-14 19:36:07
【Question description】:

I have a Notepad file that I saved as a JSON file, and I am trying to read it into a pandas DataFrame.

My JSON file looks like this:

{
  "date" : "2000-01-01",
  "i" : "1387",
  "xxx" : "aaaa",
}, 
{
  "fecha" : "2000-01-02",
  "indicativo" : "1387",
  "xxx" : "aaaa",
}, 
{
  "data" : "2000-01-03",
  "indicativo" : "1387",
}, 
{
  "date" : "2000-01-04",
  "i" : "1387",
  "xxx" : "aaaa",
}, 
{
  "fecha" : "2000-01-05",
  "indicativo" : "1387",
  "xxx" : "aaaa",
}

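How can I change it to the correct JSON format using code? (Keep in mind that I have only posted a few lines; the actual JSON file has hundreds of lines, so doing it by hand is impractical.)

Then, once I have that file, the code would be: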

import pandas as pd
from pandas.io.json import json_normalize
name = pd.read_json(r"file.json", lines=True, orient='records')

I tried running the code above on the JSON file, but I keep getting:

ValueError: Expected object or value.

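After a lot of trial and error, I believe this happens because the file is not in the correct JSON format, so I would appreciate any help with at least the first part.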

【Question discussion】:

Tags: json pandas dataframe jupyter-notebook notepad

【Solution 1】:

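This addresses the part of the question that asks how to change the file to the correct JSON format using code.
The content shown in the file is a series of dictionaries separated by a comma and \n.
Read the file and fix it by adding [ to the beginning and ] to the end.
- Once the file has been fixed, it does not need to be fixed again.
Read the file back in with pandas.read_json.
- A list of dictionaries can be loaded into pandas, but the keys differ from one dict to the next, so some additional cleaning may be needed.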

import json
import pandas as pd
from pathlib import Path

# path to file
p = Path('e:/PythonProjects/stack_overflow/test.json')

# read and fix the file (the brackets are only added if they are not there yet)
with p.open('r+') as f:
    text = f.read()  # reads the file in as one long string
    if not text.lstrip().startswith('['):  # skip files that were already fixed
        text = '[' + text + ']'  # add characters to beginning and end of string
        f.seek(0)  # go back to the beginning of the file
        f.write(text)  # write the new data back to the file
        f.truncate()  # remove any leftover old data

# after fixing the file with code 
df = pd.read_json(p)

# display(df)
         date     i   xxx       fecha indicativo        data
0  2000-01-01  1387  aaaa         NaN        NaN         NaN
1         NaN   NaN  aaaa  2000-01-02       1387         NaN
2         NaN   NaN   NaN         NaN       1387  2000-01-03
3  2000-01-04  1387  aaaa         NaN        NaN         NaN
4         NaN   NaN  aaaa  2000-01-05       1387         NaN
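
One caveat about the bracket fix: the sample in the question has a comma after the last pair in every object (for example after "xxx" : "aaaa"), and strict JSON does not allow that, so adding [ and ] alone may still leave the file unparseable. The following is a minimal sketch, not the answer author's method, that removes such commas with a regular expression before wrapping the text. The helper name repair_json_text is hypothetical, and the regex assumes that no string value in the file contains a comma followed by a closing brace or bracket.

```python
import json
import re


def repair_json_text(text: str) -> str:
    """Hypothetical helper: turn comma-separated objects into one JSON array."""
    # Remove a comma that is followed only by whitespace and a closing } or ].
    # Assumption: no string value in the data contains such a sequence.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    # Drop surrounding whitespace and any comma left after the last object.
    text = text.strip().rstrip(",").rstrip()
    # Wrap in brackets only if that has not been done already.
    if not text.startswith("["):
        text = "[" + text + "]"
    return text


# Two objects in the same shape as the question's sample.
raw = '''{
  "date" : "2000-01-01",
  "i" : "1387",
  "xxx" : "aaaa",
}, 
{
  "data" : "2000-01-03",
  "indicativo" : "1387",
}'''

records = json.loads(repair_json_text(raw))
print(records)
```

The resulting list can be passed to pd.DataFrame(records), or the repaired text can be written back to the file and then read with pd.read_json as in the answer above.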

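【Discussion】:

【Solution 2】:

I think your JSON file should have [] at the beginning and end.

【Discussion】:

Thanks, I added them, but I got this error: JSONDecodeError: Expecting property name enclosed in double quotes: line 5 column 1 (char 62)
@mariancatholic I copied the same JSON, put [] around it in my file abcd.json, and then read the file with name = pd.read_json("abcd.json"). It worked, and the name object holds the DataFrame. Putting the JSON directly into the code, as in pd.DataFrame(json.loads('[{ "date" : "2000-01-01", "i" : "1387", "xxx" : "aaaa"},{ "date" : "20200-01-01", "i" : "1387", "xxx" : "aa2aa"}]')), also worked for me.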
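
The JSONDecodeError quoted in the first comment is consistent with a trailing comma inside an object: with [ placed directly in front of the first {, the } on line 5 of the question's sample sits at character 62, right after the comma that follows "aaaa". That would also explain why the copy tested in the reply, which has no such commas, loads fine. A small sketch using only the standard json module to locate this kind of error; the helper name try_parse is hypothetical, and the exact message wording and reported position may vary between Python versions.

```python
import json

# First object of the question's sample, wrapped in brackets as the comment describes.
with_trailing_comma = '''[{
  "date" : "2000-01-01",
  "i" : "1387",
  "xxx" : "aaaa",
}]'''

# The same object without the comma after the last pair.
without_trailing_comma = '''[{
  "date" : "2000-01-01",
  "i" : "1387",
  "xxx" : "aaaa"
}]'''


def try_parse(text):
    """Return (parsed, None) on success or (None, error description) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # lineno, colno and pos report where the parser gave up.
        return None, f"{err.msg} (line {err.lineno}, column {err.colno}, char {err.pos})"


bad_result, bad_error = try_parse(with_trailing_comma)
good_result, good_error = try_parse(without_trailing_comma)
print(bad_error)
print(good_result)
```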

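【Solution 3】:

I think your data file is a list of dictionaries that is missing its opening and closing square brackets. (The file is not JSON as it stands, because it has dictionaries (values) but no keys.)

The answer above shows how to add the "[" and "]".

Once that is done, you can call the DataFrame constructor directly: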

data = [
    {
      "date" : "2000-01-01",
      "i" : "1387",
      "xxx" : "aaaa",
    }, 
    {
      "fecha" : "2000-01-02",
      "indicativo" : "1387",
      "xxx" : "aaaa",
    }, 
    # remaining dictionaries, omitted, to save space
]

pd.DataFrame(data)
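
Because the dictionaries use different key names, a frame built this way has one sparse column per spelling. If date, fecha and data all denote the same date field and i and indicativo the same identifier (this is only an assumption about the data, not something the question confirms), the keys can be unified before the frame is built. A minimal sketch; the ALIASES mapping and the helper name unify_keys are hypothetical.

```python
# Assumed aliases, not confirmed by the question: variant key -> common key.
ALIASES = {"fecha": "date", "data": "date", "indicativo": "i"}


def unify_keys(record):
    """Return a copy of record with known variant keys renamed."""
    return {ALIASES.get(key, key): value for key, value in record.items()}


data = [
    {"date": "2000-01-01", "i": "1387", "xxx": "aaaa"},
    {"fecha": "2000-01-02", "indicativo": "1387", "xxx": "aaaa"},
    {"data": "2000-01-03", "indicativo": "1387"},
]

unified = [unify_keys(record) for record in data]
print(unified)
```

Passing unified to pd.DataFrame would then give three columns (date, i, xxx), with NaN only where a record has no xxx entry.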

【Discussion】: