【Question title】: Can't unpickle class that inherits from pandas DataFrame
【Posted】: 2019-07-28 03:14:37
【Question】:

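I'm trying to pickle an object that inherits from pandas.DataFrame. The attribute I add to the DataFrame disappears during pickling/unpickling. There are some obvious workarounds, but... am I doing something wrong, or is this a bug?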

import pandas as pd
import pickle

class Foo(pd.DataFrame):
    def __init__(self,tag,df):
        super().__init__(df)
        self._tag = tag

foo = Foo('mytag', pd.DataFrame({'a':[1,2,3],'b':[4,5,6]}))
print(foo)
print(foo._tag)

print("-------------------------------------")

with open("foo.pkl", "wb") as pkl:
    pickle.dump(foo, pkl)

with open("foo.pkl", "rb") as pkl:
    foo1 = pickle.load(pkl)

print(type(foo1))
print(foo1)
print(foo1._tag)

Here is my output:

   a  b
0  1  4
1  2  5
2  3  6
mytag
-------------------------------------
<class '__main__.Foo'>
   a  b
0  1  4
1  2  5
2  3  6
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-1e7e89e199c8> in <module>
     21 print(type(foo1))
     22 print(foo1)
---> 23 print(foo1._tag)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068 
   5069     def __setattr__(self, name, value):

AttributeError: 'Foo' object has no attribute '_tag'

(Python 3.7, pandas 0.24.2, pickle.format_version 4.0)

【Comments】:

Tags: python pandas pickle


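【Solution 1】:

Michael's answer matches what I found when looking at their code. DataFrame inherits from NDFrame, which also overrides __setattr__, so that may be contributing to the problem as well.

The most straightforward solution here is to create a class that holds the DataFrame as an attribute, so that your own attributes can be set normally.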

class Foo:
    def __init__(self, tag, df):
        self.df = df
        self._tag = tag

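*Also: if the native pickle can't handle a complex object like this, I would consider trying dill. After $ pip install dill, all you need to do is import dill as pickle, since it uses the same method names as pickle.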
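
To check that the composition approach actually survives a round trip, here is a minimal sketch using the same Foo class as above with the standard pickle module. It serializes to an in-memory bytes object rather than a file, which is just a convenience for the example.

```python
import pickle

import pandas as pd


class Foo:
    """Holds a DataFrame as an attribute instead of inheriting from it."""

    def __init__(self, tag, df):
        self.df = df
        self._tag = tag


foo = Foo("mytag", pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}))

# Round trip through pickle; Foo is a plain Python object, so both
# attributes are stored in its __dict__ and restored as-is.
foo1 = pickle.loads(pickle.dumps(foo))

print(foo1._tag)
print(foo1.df.equals(foo.df))
```

Because Foo no longer goes through the DataFrame attribute machinery, nothing special is needed for _tag to be restored.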

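【Comments】:

  • The only problem with using composition instead of inheritance here is that pandas DataFrames have a huge number of methods you might want to wrap. If you have to write my_object.df[slice] instead of my_object[slice] all the time, it's easy to forget, and it makes the code messier.
  • Yes, wrapping a large number of methods is a problem. dill unfortunately didn't work. I'll just pickle tuples and recreate the Foos when I unpickle. Life is harsh.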
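
The tuple workaround mentioned in the last comment could look roughly like the sketch below. The helper names dumps_foo and loads_foo are hypothetical, chosen for illustration; Foo is the subclass from the question. Only a string and a plain DataFrame are ever pickled, so the subclass's extra attribute never has to pass through pandas' own pickling code.

```python
import pickle

import pandas as pd


class Foo(pd.DataFrame):
    def __init__(self, tag, df):
        super().__init__(df)
        self._tag = tag


def dumps_foo(foo):
    # Hypothetical helper: store the tag next to a plain DataFrame copy of the data.
    return pickle.dumps((foo._tag, pd.DataFrame(foo)))


def loads_foo(payload):
    # Hypothetical helper: rebuild the Foo from the stored (tag, DataFrame) tuple.
    tag, df = pickle.loads(payload)
    return Foo(tag, df)


foo = Foo("mytag", pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}))
foo1 = loads_foo(dumps_foo(foo))

print(type(foo1).__name__)
print(foo1._tag)
```

The cost is that every call site has to use the helpers instead of pickle.dump and pickle.load directly.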
【Solution 2】:

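How odd, I posted a similar question at almost the same time. In a follow-up comment, I discovered something more fundamental: the metadata you define yourself on a DataFrame subclass doesn't even survive a SLICING operation.

After creating the foo instance and printing it and foo._tag, try this: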

bar = foo[1:]
print(bar)
print(bar._tag)

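This also raises an AttributeError, just like your pickle/unpickle operation.

There may be good reasons to change or even drop metadata when slicing. But you may well want to keep it. I don't know whether there's a single point in the pandas code that affects both slicing and pickling, but I suspect there is.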
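
For what it's worth, pandas' documentation on subclassing describes two hooks that seem to be that common point: the _metadata class attribute, listing custom attribute names to carry along, and the _constructor property, telling pandas which class to build for results. The sketch below assumes a reasonably recent pandas and uses a hypothetical class name, TaggedFrame. It keeps DataFrame's own constructor signature, on the assumption that pandas calls _constructor internally with its own arguments, so the tag is set after construction instead of being passed to __init__.

```python
import pickle

import pandas as pd


class TaggedFrame(pd.DataFrame):
    # Hypothetical class name. Names listed in _metadata are the custom
    # attributes pandas is asked to propagate to results and to pickle.
    _metadata = ["_tag"]
    _tag = None  # default for frames that were never tagged

    @property
    def _constructor(self):
        # Make operations such as slicing return TaggedFrame, not DataFrame.
        return TaggedFrame


foo = TaggedFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
foo._tag = "mytag"

bar = foo[1:]                            # slicing
foo1 = pickle.loads(pickle.dumps(foo))   # pickle round trip

print(type(bar).__name__, bar._tag)
print(type(foo1).__name__, foo1._tag)
```

Whether every pandas operation propagates the attribute is a separate question; this only exercises the two cases discussed here.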

【Comments】:

    【Solution 3】:

    I think this is an issue with how pandas handles attributes. Even a simplified attempt at inheritance doesn't work:

    class Foo(pd.DataFrame):
        def __init__(self, tag, df):
            self._tag = tag
    
    Traceback (most recent call last):
      File "c:\Users\Michael\.vscode\extensions\ms-python.python-2019.6.24221\pythonFiles\ptvsd_launcher.py", line 43, in <module>
        main(ptvsdArgs)
      File "c:\Users\Michael\.vscode\extensions\ms-python.python-2019.6.24221\pythonFiles\lib\python\ptvsd\__main__.py", line 434, in main
        run()
      File "c:\Users\Michael\.vscode\extensions\ms-python.python-2019.6.24221\pythonFiles\lib\python\ptvsd\__main__.py", line 312, in run_file
        runpy.run_path(target, run_name='__main__')
      File "C:\Users\Michael\AppData\Local\Programs\Python\Python37-32\lib\runpy.py", line 263, in run_path
        pkg_name=pkg_name, script_name=fname)
      File "C:\Users\Michael\AppData\Local\Programs\Python\Python37-32\lib\runpy.py", line 96, in _run_module_code
        mod_name, mod_spec, pkg_name, script_name)
      File "C:\Users\Michael\AppData\Local\Programs\Python\Python37-32\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "c:\Users\Michael\Desktop\sandbox\sandbox.py", line 8, in <module>
        foo = Foo('mytag', pd.DataFrame({'a':[1,2,3],'b':[4,5,6]}))
      File "c:\Users\Michael\Desktop\sandbox\sandbox.py", line 6, in __init__
        self._tag = tag
      File "c:\Users\Michael\Desktop\sandbox\venv\lib\site-packages\pandas\core\generic.py", line 5205, in __setattr__
        existing = getattr(self, name)
      File "c:\Users\Michael\Desktop\sandbox\venv\lib\site-packages\pandas\core\generic.py", line 5178, in __getattr__
        if self._info_axis._can_hold_identifiers_and_holds_name(name):
      File "c:\Users\Michael\Desktop\sandbox\venv\lib\site-packages\pandas\core\generic.py", line 5178, in __getattr__
        if self._info_axis._can_hold_identifiers_and_holds_name(name):
      File "c:\Users\Michael\Desktop\sandbox\venv\lib\site-packages\pandas\core\generic.py", line 5178, in __getattr__
        if self._info_axis._can_hold_identifiers_and_holds_name(name):
      [Previous line repeated 487 more times]
      File "c:\Users\Michael\Desktop\sandbox\venv\lib\site-packages\pandas\core\generic.py", line 489, in _info_axis
        return getattr(self, self._info_axis_name)
      File "c:\Users\Michael\Desktop\sandbox\venv\lib\site-packages\pandas\core\generic.py", line 5163, in __getattr__
        def __getattr__(self, name):
      File "c:\Users\Michael\.vscode\extensions\ms-python.python-2019.6.24221\pythonFiles\lib\python\ptvsd\_vendored\pydevd\_pydevd_bundle\pydevd_trace_dispatch_regular.py", line 362, in __call__
        is_stepping = pydev_step_cmd != -1
    RecursionError: maximum recursion depth exceeded in comparison
    

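    I think it comes down to their use of __getattribute__(), which raises an error when it encounters an unknown attribute. They are overriding the default __getattr__() behavior, and I'm guessing that interferes with inheritance.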
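
The traceback above looks consistent with a more specific mechanism: self._tag = tag runs before the parent __init__ has installed any internal state, so __setattr__ probes with getattr, __getattr__ reaches for internal state that does not exist yet, and that lookup lands in __getattr__ again. The following is a toy pure-Python analogue of that pattern, not pandas' actual code; Base, Broken, Fixed and _columns are made-up names.

```python
class Base:
    """Toy stand-in for a class whose __getattr__ relies on state set in __init__."""

    def __init__(self, columns):
        # Install internal state directly, bypassing our own __setattr__.
        object.__setattr__(self, "_columns", list(columns))

    def __getattr__(self, name):
        # Only called when normal lookup fails. If _columns itself is missing,
        # the next line calls __getattr__("_columns") again, without end.
        if name in self._columns:
            return f"column {name}"
        raise AttributeError(name)

    def __setattr__(self, name, value):
        try:
            getattr(self, name)  # probe for an existing attribute first
        except AttributeError:
            pass
        object.__setattr__(self, name, value)


class Broken(Base):
    def __init__(self, tag, columns):
        self._tag = tag  # Base.__init__ has not run, so _columns is missing


class Fixed(Base):
    def __init__(self, tag, columns):
        super().__init__(columns)  # install internal state first
        self._tag = tag


try:
    Broken("mytag", ["a", "b"])
    outcome = "no error"
except RecursionError:
    outcome = "RecursionError"

fixed = Fixed("mytag", ["a", "b"])
print(outcome, fixed._tag)
```

In the toy version, calling the parent __init__ before setting custom attributes avoids the recursion, which matches the ordering used in the original question's Foo.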

    【Comments】:
