Assigning a new row to a large pandas DataFrame causes OOM

【Question title】: assign new row to large pandas DataFrame causes OOM
【Posted】: 2021-07-12 18:51:02
【Question description】:

I have a DataFrame df like this:

      0 1 2 3 4 5 ... 1154161
1     a b c d e f ... A
2     g h i j k l ... B
3     m n o p q r ... C
...
86405 Q V W X Y Z ... ZY

This is a DataFrame with 86405 rows × 1154161 columns. Note that the index starts at 1. I am trying to assign a row with index=0:

df.loc[0] = 0

But I get this error:

MemoryError: Unable to allocate 372. GiB for an array with shape (99725281205,) and data type float32

I want it to look like this:

      0 1 2 3 4 5 ... 1154161
0     0 0 0 0 0 0 ... 0       <--- add this row
1     a b c d e f ... A
2     g h i j k l ... B
3     m n o p q r ... C
...
86405 Q V W X Y Z ... ZY

Is there another way to do the assignment without running out of memory? Perhaps in chunks (preferably not)?

Edit: added the DataFrame info at @hpaulj's request.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1154161 entries, 0 to 1154160
Columns: 86405 entries, 1 to 86405
dtypes: float32(86405)
memory usage: 371.5 GB

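EDIT2: Note that the letters in the example DataFrame are actually numbers (float32).

【Question comments】:

Take a look at stackoverflow.com/questions/57507832/…
@AnuragDabas Regarding the link, is there a way to do that temporarily? (I am using Linux)
This is a huge DataFrame; ideally, this many columns is discouraged in any data model. Still, you could try arr = np.vstack((np.zeros(df.shape[1]),df.to_numpy())) and then pd.DataFrame(arr,columns=df.columns)
Any attempt to enlarge the frame requires making an entirely new frame. It looks like the request is for the data portion. Is (99725281205,) the product of the new dimensions?
For further discussion, please show df.info and the full error traceback.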
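
On the comment asking where (99725281205,) comes from, the numbers in the post can be checked with quick arithmetic. The sketch below only multiplies the two dimensions reported in the question and in the df.info() output. The variable names are made up for illustration. The product matches the element count of the existing frame, which suggests (as an inference from the numbers, not from a traceback) that the failed allocation was a full copy of the current data.

```python
# Dimensions taken from the question and its df.info() output.
n_a, n_b = 86405, 1154161
itemsize = 4  # bytes per float32 value

n_values = n_a * n_b
gib = n_values * itemsize / 2**30

print(n_values)       # 99725281205, the shape in the MemoryError
print(round(gib, 1))  # 371.5, which NumPy reports as "372. GiB"
```

This also matches the "memory usage: 371.5 GB" line in the df.info() output, so the frame alone already occupies about as much memory as the failed request asked for.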

Tags: python pandas dataframe numpy

【Solution 1】:

1.https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#setting-with-enlargement

df.loc[len(df)] = 0
print (df)

2.https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

df = df.append(pd.Series(0, index=df.columns), ignore_index=True)

Source: Append an empty row in dataframe using pandas

【Discussion】:

Your suggestions run into the same problem. Both need to create a new, larger DataFrame (and a NumPy array to hold that data).
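
Since enlarging an existing frame means allocating a new one, one possible way around the copy is to reserve the extra row before the data is loaded. The following is a minimal sketch under that assumption: it uses toy dimensions, and the line that fills the buffer stands in for a hypothetical loader that would write the real data (for example chunk by chunk from disk). It does not help with a frame that is already in memory and cannot be reloaded.

```python
import numpy as np
import pandas as pd

# Toy dimensions standing in for the real frame (assumption for illustration).
n_rows, n_cols = 4, 3

# Reserve the extra row up front; np.zeros leaves row 0 filled with zeros.
buf = np.zeros((n_rows + 1, n_cols), dtype=np.float32)

# Hypothetical loader: write the real data directly into rows 1..n_rows
# instead of building a separate frame first.
buf[1:] = np.arange(n_rows * n_cols, dtype=np.float32).reshape(n_rows, n_cols)

# copy=False asks pandas to wrap the buffer rather than duplicate it.
df = pd.DataFrame(buf, index=range(n_rows + 1), copy=False)
print(df)
```

The buffer is created as float32 from the start so that no upcast to float64 doubles the footprint along the way.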