【Question title】: How to fix "TypeError: cannot serialize '_io.BufferedReader' object" when writing a pandas DataFrame with to_pickle?
【Posted】: 2019-12-08 23:56:56
【Question】:

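I have a simple piece of code that reads an Excel .xlsx file into a DataFrame and then writes it back out as a pickle file with to_pickle. I have been using the same code to read and write new Excel files for several months. This time, however, it fails with TypeError: cannot serialize '_io.BufferedReader' object. Here is the code: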

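import pandas as pd

# Folder containing the monthly files (inferred from the printed path in the output below)
MonthlyFolder = "../Data/2019Nov/"

# Path to .xlsx
MasterItem = MonthlyFolder + "MasterItem__Nov2019.xlsx"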

# Function to read the excel file
def ReadExcel(filename, sheetname=None, header=0):
    from openpyxl import load_workbook

    wb = load_workbook(filename, read_only=True)

    if sheetname is None:  # If sheetname is not provided then grab the first sheet
        print("\t Reading " + wb.sheetnames[0])
        ws = wb[wb.sheetnames[0]]
    else:
        print("\t Reading " + sheetname)
        ws = wb[sheetname]

    data = ws.values

    if header is None:
        columns = None
    elif header > 0:
        # Skip non header rows
        for i in range(0, header):
            next(data)
        # Save header row
        columns = next(data)[0:]
    else:
        columns = next(data)[0:]

    # Create a DataFrame based on the subsequent lines of data
    df_Out = pd.DataFrame(data, columns=columns)

    return df_Out

# Reading .xlsx and writing as pickle
RawMasterItem = ReadExcel(MasterItem)
pd.to_pickle(RawMasterItem, MonthlyFolder+"RawMasterItem.pkl") # This fails to run

Here is the output and the error I get:

    ../Data/2019Nov/MasterItem__Nov2019.xlsx
         Reading Sheet1
    Traceback (most recent call last):
      File "C:\Users\Eulhaq\AppData\Local\conda\conda\envs\DataScience\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-10-07041bb51f98>", line 3, in <module>
        pd.to_pickle(RawMasterItem, MonthlyFolder+"RawMasterItem.pkl")
      File "C:\Users\Eulhaq\AppData\Local\conda\conda\envs\DataScience\lib\site-packages\pandas\io\pickle.py", line 76, in to_pickle
        f.write(pickle.dumps(obj, protocol=protocol))
    TypeError: cannot serialize '_io.BufferedReader' object
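The error itself does not depend on Excel: pickle refuses to serialize an open binary file object, so any DataFrame cell holding an object that references one will make to_pickle fail (the traceback shows pandas calling pickle.dumps on the whole DataFrame). A minimal reproduction sketch follows; HoldsFile is a hypothetical stand-in class, not an openpyxl type.

```python
import os
import pickle
import tempfile

import pandas as pd


class HoldsFile:
    """Hypothetical stand-in for a cell object that keeps a reference to an
    open file handle. Not an openpyxl class."""

    def __init__(self, fh):
        self.fh = fh


def pickle_error_for_df_with_file_handle():
    """Put a HoldsFile instance in an object column and return the TypeError
    message that pickle raises, or None if pickling unexpectedly succeeds."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "rb") as fh:  # binary read mode gives an _io.BufferedReader
            df = pd.DataFrame({"Column": ["a", HoldsFile(fh)]})
            try:
                pickle.dumps(df)  # pd.to_pickle calls pickle.dumps like this
            except TypeError as exc:
                return str(exc)
            return None
    finally:
        os.remove(path)


print(pickle_error_for_df_with_file_handle())
```

The exact wording depends on the Python version: newer interpreters may say "cannot pickle" instead of "cannot serialize", but the message still names '_io.BufferedReader'.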

【Discussion】:

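  • I know this error from multiprocessing. You need to find and remove the buffer (emptying it may be enough), but I can't tell you where it is in your case. It must be something left over from reading the Excel file.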
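Following the suggestion to locate the leftover buffer, a brute-force sketch is to try pickling each cell individually and report the ones that fail. find_unpicklable_cells is a hypothetical helper, and in the usage line a threading.Lock merely stands in for any object pickle refuses to serialize.

```python
import pickle
import threading

import pandas as pd


def find_unpicklable_cells(df):
    """Return (row label, column label, type name) for every cell in an
    object-dtype column that pickle cannot serialize."""
    bad = []
    # Arbitrary Python objects (such as cell wrappers) end up in object-dtype
    # columns, so only those columns are scanned.
    for col in df.select_dtypes(include="object").columns:
        for idx, value in df[col].items():
            try:
                pickle.dumps(value)
            except Exception:  # TypeError, PicklingError, etc.
                bad.append((idx, col, type(value).__name__))
    return bad


# Usage: the lock in row 1 of "Column" is the only unpicklable cell.
df = pd.DataFrame({"id": [1, 2], "Column": ["x", threading.Lock()]})
print(find_unpicklable_cells(df))
```

Pickling cell by cell is slow on large frames, but it pinpoints exactly which rows and columns carry the offending objects.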

Tags: python python-3.x pandas pickle


【Answer 1】:

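So, after a whole day of debugging, it turned out that for some blank cells in my Excel file, openpyxl was returning <ReadOnlyCell 'Sheet1'.D2> objects instead of values. These objects then caused the problem when I tried to write the DataFrame as a pickle. Even though the column's data type was already 'str', explicitly converting it to 'str' again solved the problem: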

RawMasterItem['Column'] = RawMasterItem['Column'].astype('str')

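Apparently openpyxl was not reading and returning empty/blank values correctly; instead it returned these odd cell objects, which could not be serialized. (Presumably each ReadOnlyCell keeps a reference to its worksheet and, through it, to the workbook's open file, which would explain why the error names '_io.BufferedReader'.)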
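The astype('str') fix works because it replaces every cell object with a plain string, but it may also turn blank cells into text such as 'None' or the cell's repr (for example "<ReadOnlyCell 'Sheet1'.D2>"). A gentler alternative, offered here as an assumption rather than the answer's method, is to replace any cell-like object with its .value. The names cell_to_value, unwrap_cells and FakeCell are hypothetical; the code only duck-types on a .value attribute and does not import openpyxl.

```python
import datetime
import pickle
import threading

import pandas as pd

# Plain data left untouched. bool is covered by int; datetime.datetime and
# pd.Timestamp are subclasses of datetime.date. Checking these first matters
# because pd.Timestamp has its own unrelated .value attribute.
PLAIN_TYPES = (str, bytes, int, float,
               datetime.date, datetime.time, datetime.timedelta)


def cell_to_value(v):
    """Return v unchanged if it is plain data; otherwise return its .value
    attribute if it has one (as spreadsheet cell objects do)."""
    if v is None or isinstance(v, PLAIN_TYPES):
        return v
    return getattr(v, "value", v)


def unwrap_cells(df):
    """Return a copy of df with cell-like objects in object columns replaced
    by their values."""
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].map(cell_to_value)
    return out


class FakeCell:
    """Hypothetical stand-in for a read-only cell: it has a .value and holds
    an unpicklable reference (a lock here)."""

    def __init__(self, value):
        self.value = value
        self.parent = threading.Lock()


raw = pd.DataFrame({"Column": ["a", FakeCell(None), FakeCell("b")]})
clean = unwrap_cells(raw)
restored = pickle.loads(pickle.dumps(clean))  # succeeds once cells are unwrapped
```

With this approach blank cells come back as missing values rather than as strings, so later checks such as isna() keep working.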

【讨论】:
