Pandas dataframe.append giving Error: Plan shapes are not aligned

【Question title】: Pandas dataframe.append giving Error: Plan shapes are not aligned
【Posted】: 2018-03-03 14:51:18
【Question】:

I have two DataFrames with the columns listed below. When I try to append the second one to the first, I get ValueError: Plan shapes are not aligned.

df1 columns:

Index([                    u'asin',        u'view_publish_data',
                u'data_viewer',      u'relationship_viewer',
             u'parent_task_id',            u'submission_id',
                     u'source',            u'creation_date',
                 u'created_by',              u'vendor_code',
                       u'week',                u'processor',
                 u'brand_name',           u'brand_name_new',
               u'bullet_point',               u'cost_price',
          u'country_of_origin',                 u'cpu_type',
               u'cpu_type_new',                u'item_name',
          u'item_type_keyword',               u'list_price',
     u'minimum_order_quantity',                    u'model',
           u'product_category', u'product_site_launch_date',
        u'product_subcategory',          u'product_tier_id',
     u'replenishment_category',      u'product_description',
                 u'style_name',                       u'vc',
                u'vendor_code',     u'warranty_description'],
  dtype='object')

df2 columns:

Index([                         u'asin',             u'view_publish_data',
                     u'data_viewer',           u'relationship_viewer',
                  u'parent_task_id',                 u'submission_id',
                          u'source',                 u'creation_date',
                      u'created_by',                   u'vendor_code',
                            u'week',                    u'brand_name',
                 u'bullet_features',                    u'color_name',
                             u'itk',                     u'item_name',
                      u'list_price',                     u'new_brand',
                u'product_catagory',          u'product_sub_catagory',
                 u'product_tier_id',        u'replenishment_category',
                       u'size_name',                    u'cost_price',
               u'item_type_keyword',                     u'our_price',
          u'is_shipped_from_vendor',      u'manufacturer_vendor_code',
             u'product_description',                  u'vendor_code'],
  dtype='object')

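【Comments】:

How are you appending them?
These are two different DataFrames with different columns; how would you want to append them?
I want to append by column name. If a column name exists in both DataFrames, the final DataFrame should have a single column holding the values from both. If a column is missing from one of them, that column in the final DataFrame should hold NaN for the rows that came from the DataFrame without it.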

Tags: python python-2.7 pandas sklearn-pandas

【Solution 1】:

You can use concat together with align, which returns a tuple of aligned DataFrames:

import pandas as pd

cols1 = pd.Index([ u'asin', u'view_publish_data',
                u'data_viewer',      u'relationship_viewer',
             u'parent_task_id',            u'submission_id',
                     u'source',            u'creation_date',
                 u'created_by',              u'vendor_code',
                       u'week',                u'processor',
                 u'brand_name',           u'brand_name_new',
               u'bullet_point',               u'cost_price',
          u'country_of_origin',                 u'cpu_type',
               u'cpu_type_new',                u'item_name',
          u'item_type_keyword',               u'list_price',
     u'minimum_order_quantity',                    u'model',
           u'product_category', u'product_site_launch_date',
        u'product_subcategory',          u'product_tier_id',
     u'replenishment_category',      u'product_description',
                 u'style_name',                       u'vc',
                u'vendor_code',     u'warranty_description'])

cols2 = pd.Index([ u'asin', u'view_publish_data',
                     u'data_viewer',           u'relationship_viewer',
                  u'parent_task_id',                 u'submission_id',
                          u'source',                 u'creation_date',
                      u'created_by',                   u'vendor_code',
                            u'week',                    u'brand_name',
                 u'bullet_features',                    u'color_name',
                             u'itk',                     u'item_name',
                      u'list_price',                     u'new_brand',
                u'product_catagory',          u'product_sub_catagory',
                 u'product_tier_id',        u'replenishment_category',
                       u'size_name',                    u'cost_price',
               u'item_type_keyword',                     u'our_price',
          u'is_shipped_from_vendor',      u'manufacturer_vendor_code',
             u'product_description',                  u'vendor_code'])

df1 = pd.DataFrame([range(len(cols1))], columns=cols1)
df2 = pd.DataFrame([range(len(cols2))], columns=cols2)

df = pd.concat(list(df1.align(df2)), ignore_index=True)
print (df)

   asin  brand_name  brand_name_new  bullet_features  bullet_point  \
0     0          12            13.0              NaN          14.0   
1     0          11             NaN             12.0           NaN   

   color_name  cost_price  country_of_origin  cpu_type  cpu_type_new  ...   \
0         NaN          15               16.0      17.0          18.0  ...    
1        13.0          23                NaN       NaN           NaN  ...    

   style_name  submission_id    vc  vendor_code  vendor_code  vendor_code  \
0        30.0              5  31.0            9            9           32   
1         NaN              5   NaN            9           29            9   

   vendor_code  view_publish_data  warranty_description  week  
0           32                  1                  33.0    10  
1           29                  1                   NaN    10  

[2 rows x 46 columns]
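
Note that vendor_code appears twice in both column lists in the question, and four times in the output above. Duplicate column labels are a likely trigger for the shape error, although the question does not confirm this. The following is a minimal sketch, assuming it is acceptable to keep only the first occurrence of each duplicated label; the small frames and the helper drop_duplicate_columns are hypothetical stand-ins, not the asker's data:

```python
import pandas as pd

# Hypothetical frames; 'vendor_code' is duplicated on purpose to mimic
# the column lists in the question.
df1 = pd.DataFrame([[1, 2, 3, 4]],
                   columns=["asin", "vendor_code", "processor", "vendor_code"])
df2 = pd.DataFrame([[5, 6, 7, 8]],
                   columns=["asin", "vendor_code", "color_name", "vendor_code"])

# List the labels that occur more than once.
duplicated_labels = df1.columns[df1.columns.duplicated()].tolist()
print(duplicated_labels)


def drop_duplicate_columns(frame):
    """Keep only the first occurrence of each column label (hypothetical helper)."""
    return frame.loc[:, ~frame.columns.duplicated()]


clean1 = drop_duplicate_columns(df1)
clean2 = drop_duplicate_columns(df2)

# With unique labels, concat takes the union of the columns and fills
# the cells a frame does not have with NaN.
result = pd.concat([clean1, clean2], ignore_index=True, sort=False)
print(result)
```

If the duplicated columns carry different data, renaming them before concatenating would be the safer choice, since dropping discards the later copies.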

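【Comments】:

If I have a list of DataFrames, how can I use align across all of them? I'm having trouble figuring it out.
I posted a question about it; please take a look when you have time.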
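
On the list question above: align works on two objects at a time, so for a list one option is to skip it and pass the whole list to concat, which aligns on column labels by itself. This is a minimal sketch, assuming every frame has unique column labels; the frames below are hypothetical:

```python
import pandas as pd

# Hypothetical list of frames with partly overlapping, unique column labels.
frames = [
    pd.DataFrame({"asin": [1], "processor": [10]}),
    pd.DataFrame({"asin": [2], "color_name": [20]}),
    pd.DataFrame({"asin": [3], "size_name": [30]}),
]

# concat performs an outer join on the column labels by default, so
# columns missing from a frame are filled with NaN for that frame's rows.
combined = pd.concat(frames, ignore_index=True, sort=False)
print(combined)
```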