【发布时间】:2025-12-14 23:55:01
【问题描述】:
我有一个大型 pandas DataFrame,其中包含复杂的 numpy 数组的单个元素。请参阅下面的最小代码示例来重现该场景:
d = {f'x{i}': [] for i in range(4)}
df = pd.DataFrame(data=d).astype(object)
for K in range(4):
for i in range(4):
df.loc[f'{K}', f'x{i}'] = np.random.random(size=(2,2)) + np.random.random(size=(2,2)) * 1j
df
保存这些并再次加载以供以后使用的最佳方法是什么?
我遇到的问题是,当我增加存储的矩阵的大小和元素的数量时,当我尝试将其保存为.h5 文件时,我得到一个OverflowError,如下所示:
import pandas as pd
size = (300,300)
xs = 1500
d = {f'x{i}': [] for i in range(xs)}
df = pd.DataFrame(data=d).astype(object)
for K in range(10):
for i in range(xs):
df.loc[f'{K}', f'x{i}'] = np.random.random(size=size) + np.random.random(size=size) * 1j
df.to_hdf('test.h5', key="df", mode="w")
load_test = pd.read_hdf("test.h5", "df")
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-124-8cb8df1a0653> in <module>
12 df.loc[f'{K}', f'x{i}'] = np.random.random(size=size) + np.random.random(size=size) * 1j
13
---> 14 df.to_hdf('test.h5', key="df", mode="w")
15
16
~/PQKs/pqks/lib/python3.6/site-packages/pandas/core/generic.py in to_hdf(self, path_or_buf, key, mode, complevel, complib, append, format, index, min_itemsize, nan_rep, dropna, data_columns, errors, encoding)
2447 data_columns=data_columns,
2448 errors=errors,
-> 2449 encoding=encoding,
2450 )
2451
~/PQKs/pqks/lib/python3.6/site-packages/pandas/io/pytables.py in to_hdf(path_or_buf, key, value, mode, complevel, complib, append, format, index, min_itemsize, nan_rep, dropna, data_columns, errors, encoding)
268 path_or_buf, mode=mode, complevel=complevel, complib=complib
269 ) as store:
--> 270 f(store)
271 else:
272 f(path_or_buf)
~/PQKs/pqks/lib/python3.6/site-packages/pandas/io/pytables.py in <lambda>(store)
260 data_columns=data_columns,
261 errors=errors,
--> 262 encoding=encoding,
263 )
264
~/PQKs/pqks/lib/python3.6/site-packages/pandas/io/pytables.py in put(self, key, value, format, index, append, complib, complevel, min_itemsize, nan_rep, data_columns, encoding, errors, track_times)
1127 encoding=encoding,
1128 errors=errors,
-> 1129 track_times=track_times,
1130 )
1131
~/PQKs/pqks/lib/python3.6/site-packages/pandas/io/pytables.py in _write_to_group(self, key, value, format, axes, index, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, nan_rep, data_columns, encoding, errors, track_times)
1799 nan_rep=nan_rep,
1800 data_columns=data_columns,
-> 1801 track_times=track_times,
1802 )
1803
~/PQKs/pqks/lib/python3.6/site-packages/pandas/io/pytables.py in write(self, obj, **kwargs)
3189 # I have no idea why, but writing values before items fixed #2299
3190 blk_items = data.items.take(blk.mgr_locs)
-> 3191 self.write_array(f"block{i}_values", blk.values, items=blk_items)
3192 self.write_index(f"block{i}_items", blk_items)
3193
~/PQKs/pqks/lib/python3.6/site-packages/pandas/io/pytables.py in write_array(self, key, value, items)
3047
3048 vlarr = self._handle.create_vlarray(self.group, key, _tables().ObjectAtom())
-> 3049 vlarr.append(value)
3050
3051 elif empty_array:
~/PQKs/pqks/lib/python3.6/site-packages/tables/vlarray.py in append(self, sequence)
526 nparr = None
527
--> 528 self._append(nparr, nobjects)
529 self.nrows += 1
530
~/PQKs/pqks/lib/python3.6/site-packages/tables/hdf5extension.pyx in tables.hdf5extension.VLArray._append()
OverflowError: value too large to convert to int
【问题讨论】:
-
可能重复/在此回答:*.com/a/57133759/8896855
-
你可以用pickle把它保存为二进制文件。示例:docs.python.org/3/library/pickle.html#examples
标签: python pandas dataframe save numpy-ndarray