Posted: 2018-02-22 23:36:44
Question:
for i in [train1,test1]:
    df_dummies = pd.get_dummies(i['Name'], prefix='Name',dummy_na=False)
    #print(df_dummies.head())
    #i.drop('Name',1,inplace=True)
    i = pd.concat([i,df_dummies],axis=1)
    print(i.head())
Output:
   PassengerId  Pclass  Name  Sex   Age  SibSp  Parch   Ticket     Fare  \
0          892       3   Mr.    1  34.5      0      0   330911   7.8292
1          893       3  Mrs.    0  47.0      1      0   363272   7.0000
2          894       2   Mr.    1  62.0      0      0   240276   9.6875
3          895       3   Mr.    1  27.0      0      0   315154   8.6625
4          896       3  Mrs.    0  22.0      1      1  3101298  12.2875

   Embarked  Name_Dr.  Name_Master.  Name_Miss.  Name_Mr.  Name_Mrs.  \
0         2         0             0           0         1          0
1         0         0             0           0         0          1
2         2         0             0           0         1          0
3         0         0             0           0         1          0
4         0         0             0           0         0          1

   Name_Rev.  Name_other
0          0           0
1          0           0
2          0           0
3          0           0
4          0           0
But when I check again outside the for loop, the dummy variables are not there:
print(test1.head())
Output:
   PassengerId  Pclass  Name  Sex   Age  SibSp  Parch   Ticket     Fare  \
0          892       3   Mr.    1  34.5      0      0   330911   7.8292
1          893       3  Mrs.    0  47.0      1      0   363272   7.0000
2          894       2   Mr.    1  62.0      0      0   240276   9.6875
3          895       3   Mr.    1  27.0      0      0   315154   8.6625
4          896       3  Mrs.    0  22.0      1      1  3101298  12.2875

   Embarked
0         2
1         0
2         2
3         0
4         0
Clearly I'm missing something here. Please help me find the mistake; I think it has to do with copies/references of the DataFrame.
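A minimal sketch of why the dummies disappear, using small hypothetical frames in place of train1 and test1 (not the question's real data): the assignment i = pd.concat(...) rebinds the loop variable to the new DataFrame that pd.concat returns, so train1 and test1 (and the list elements) still refer to the original, unmodified objects. It is a name-binding issue rather than an accidental copy.

```python
import pandas as pd

# Hypothetical stand-ins for train1/test1 (not the question's real data)
train1 = pd.DataFrame({"Name": ["Mr.", "Mrs.", "Miss."]})
test1 = pd.DataFrame({"Name": ["Mr.", "Mrs."]})

for i in [train1, test1]:
    before = id(i)  # identity of the object the list element refers to
    df_dummies = pd.get_dummies(i["Name"], prefix="Name")
    i = pd.concat([i, df_dummies], axis=1)  # rebinds i to a NEW DataFrame
    print(id(i) == before)  # False: i no longer points at train1/test1

# The original objects were never modified, so no dummy columns here
print(list(test1.columns))  # ['Name']
```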
Comments:
- You are assigning the output to i, not to test1. Maybe you need test1 = pd.concat([i,df_dummies],axis=1)
- OK, i is test1 in the second iteration.
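Building on the comments, two ways to keep the dummy columns are sketched below, again on hypothetical data; add_name_dummies, train2 and test2 are illustrative names, not part of pandas or the question. Option (a) reassigns the names train1 and test1 to the frames that pd.concat returns; option (b) adds the columns to each existing frame in place, so the change is visible through the original names.

```python
import pandas as pd


def add_name_dummies(df):
    """Hypothetical helper: return a NEW frame with one-hot 'Name_*' columns appended."""
    dummies = pd.get_dummies(df["Name"], prefix="Name", dummy_na=False)
    return pd.concat([df, dummies], axis=1)


train1 = pd.DataFrame({"Name": ["Mr.", "Mrs.", "Miss."]})
test1 = pd.DataFrame({"Name": ["Mr.", "Mrs."]})

# Option (a): rebind the original names to the new frames
train1, test1 = [add_name_dummies(df) for df in (train1, test1)]
print(list(test1.columns))  # ['Name', 'Name_Mr.', 'Name_Mrs.']

# Option (b): mutate each frame in place; column assignment changes the
# object itself, so the new columns are visible through train2/test2
train2 = pd.DataFrame({"Name": ["Mr.", "Mrs.", "Miss."]})
test2 = pd.DataFrame({"Name": ["Mr.", "Mrs."]})
for df in [train2, test2]:
    dummies = pd.get_dummies(df["Name"], prefix="Name")
    for col in dummies.columns:
        df[col] = dummies[col]
print(list(test2.columns))  # ['Name', 'Name_Mr.', 'Name_Mrs.']
```

Option (a) mirrors the commenter's suggestion; option (b) works because df[col] = ... modifies the object the loop variable points to instead of rebinding the variable.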
Tags: python list pandas loops dataframe