【Question title】: Appending row to Pandas DataFrame adds 0 column
【Posted】: 2014-05-19 23:20:59
【Question description】:

I am creating a Pandas DataFrame to store data. Unfortunately, I cannot know in advance how many rows of data I will have, so my approach is as follows.

First, I declare an empty DataFrame.

from pandas import DataFrame
df = DataFrame(columns=['col1', 'col2'])

Then, I append a row of missing values.

df = df.append([None] * 2, ignore_index=True)

Finally, I can insert values into this DataFrame one cell at a time. (Why I have to do it one cell at a time is a long story.)

df.loc[0, 'col1'] = 3.28

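This approach works fine, except that the append statement inserts an extra column into my DataFrame. At the end of the process, when I type df, the output I see looks like this (with 100 rows of data):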

<class 'pandas.core.frame.DataFrame'>
Data columns (total 2 columns):
0            0  non-null values
col1         100  non-null values
col2         100  non-null values

df.head() looks like this:

      0   col1   col2
0  None   3.28      1
1  None      1      0
2  None      1      0
3  None      1      0
4  None      1      1

Any ideas what is causing this 0 column to appear in my DataFrame?

【Question discussion】:

Tags: python pandas append dataframe

【Solution 1】:

You can use a Series for the row insertion:

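import pandas as pd
df = pd.DataFrame(columns=['col1', 'col2'])
# Label the Series with df's columns so it aligns instead of adding new ones (pandas < 2.0)
df = df.append(pd.Series([None]*2, index=df.columns), ignore_index=True)
df.loc[0, 'col1'] = 3.28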

The resulting df looks like:

   col1 col2
0  3.28  NaN

【Comments】:

【Solution 2】:

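append converts whatever you pass it into a DataFrame before appending. A plain list such as [None] * 2 becomes a DataFrame with a single unnamed column holding the two None/NaN elements (as two rows), and pandas labels an unnamed column 0 by default. Since 0 matches none of your existing column names, it is added as a new column.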
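The effect can be reproduced directly. This sketch uses pd.concat, which is how frames are combined now that DataFrame.append is gone (assumption: pandas 1.x or 2.x); it shows that a bare list turns into a one-column frame labelled 0, and that concatenating it keeps that label as an extra column:

```python
import pandas as pd

# A plain list passed to DataFrame() becomes ONE column of rows,
# and an unnamed column gets the default label 0.
incoming = pd.DataFrame([None] * 2)
print(incoming.shape)          # (2, 1)
print(list(incoming.columns))  # [0]

df = pd.DataFrame(columns=["col1", "col2"])

# concat aligns on column labels: 0 matches neither "col1" nor "col2",
# so the result gains a third column labelled 0 plus two new rows.
result = pd.concat([df, incoming], ignore_index=True)
print(list(result.columns))
```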

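For this to work, the column names of what you append must match the DataFrame's current column names; otherwise new columns are created (by default).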
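Applied to the question, a minimal sketch of the fix on current pandas builds the blank row as a one-row DataFrame that reuses df.columns, so the labels line up (blank_row is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame(columns=["col1", "col2"])

# Give the incoming row exactly the same column labels as df, so concat
# aligns on them instead of creating a new column.
blank_row = pd.DataFrame([[None, None]], columns=df.columns)
df = pd.concat([df, blank_row], ignore_index=True)

print(list(df.columns))  # ['col1', 'col2']
print(len(df))           # 1
```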

# Requires pandas < 2.0: DataFrame.append was deprecated in 1.4 and removed in 2.0
import numpy as np
from pandas import DataFrame, Series

# You need to explicitly name the columns of the incoming row in the append call
df = DataFrame(columns=['col1', 'col2'])
print(df.append(Series([None]*2, index=['col1', 'col2']), ignore_index=True))


# As an aside

df = DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
dfRowImproper = [1, 2, 3, 4]
# dfRowProper = DataFrame(np.arange(4) + 1, columns=['A', 'B', 'C', 'D'])  # will NOT work: a 1-D array becomes a single column of 4 rows, which conflicts with 4 column names
dfRowProper = DataFrame([np.arange(4) + 1], columns=['A', 'B', 'C', 'D'])  # will work: one row of 4 values


print(df.append(dfRowImproper))  # adds a column named 0, with 4 additional rows holding the values in that column

print(df.append(dfRowProper))  # works as you would like, because the column names are consistent

print(df.append(DataFrame(np.random.randn(1, 4))))  # adds four new columns (0 to 3) and 1 additional row


print(df.append(Series(dfRowImproper, index=['A', 'B', 'C', 'D']), ignore_index=True))  # works as you want
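For readers on pandas 2.x, where append no longer exists, the first three cases of the aside can be sketched with pd.concat; the shapes printed at the end show which inputs add rows only and which also add columns (improper, proper and unlabeled are illustrative names):

```python
import numpy as np
import pandas as pd

cols = ["A", "B", "C", "D"]
df = pd.DataFrame(np.random.randn(8, 4), columns=cols)

# 1) A bare list becomes a single column labelled 0 with 4 rows.
improper = pd.DataFrame([1, 2, 3, 4])
r1 = pd.concat([df, improper], ignore_index=True)

# 2) A one-row frame with matching labels appends exactly one row.
proper = pd.DataFrame([np.arange(4) + 1], columns=cols)
r2 = pd.concat([df, proper], ignore_index=True)

# 3) A 1x4 frame with default labels 0..3 adds four new columns and one row.
unlabeled = pd.DataFrame(np.random.randn(1, 4))
r3 = pd.concat([df, unlabeled], ignore_index=True)

for name, r in [("improper", r1), ("proper", r2), ("unlabeled", r3)]:
    print(name, r.shape)
```

The expected shapes are (12, 5), (9, 4) and (9, 8): only the frame whose labels match A, B, C, D appends a row without widening the result.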

【Comments】: