【Question title】: PANDAS - converting a column with lists as values to dummy variables
【Posted】: 2019-12-07 12:52:42
【Question】:

I'm working with an Airbnb listings dataset. One of the columns is called amenities and contains all the amenities the listing offers. A few examples:

[Internet, Wifi, Paid parking off premises]

[Internet, Wifi, Kitchen]

[Wifi, Smoking allowed, Heating]

I want to replace this column with several binary columns, one per amenity. For example, one of them would be:

wifi --> 0,0,0,1,1,0,1,1,0,1,0,1 

I found a way to do this with for loops:

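amenities = df['amenities']  # assumed: the list-valued column described above

# Collect every distinct amenity across all listings
all_amenities = []
for row in amenities:
    all_amenities += row

all_amenities = set(all_amenities)
# Start every amenity column at 0
for col in all_amenities:
    df[col] = 0

# Set the flag cell by cell (slow; assumes the default 0..n-1 index)
for i, amenities_of_listing in enumerate(amenities):
    for amenity in amenities_of_listing:
        df.loc[i, amenity] = 1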

But this takes a long time to run. Can anyone here think of a more idiomatic pandas way to do it?

【Comments】:

    Tags: python pandas data-processing


    【Solution 1】:

    I believe you need MultiLabelBinarizer, which should also work well on a large DataFrame:

    print (df)
                                       amenisities
    0  [Internet, Wifi, Paid parking off premises]
    1                    [Internet, Wifi, Kitchen]
    2             [Wifi, Smoking allowed, Heating]
    
    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer
    
    mlb = MultiLabelBinarizer()
    df1 = pd.DataFrame(mlb.fit_transform(df['amenisities']),columns=mlb.classes_)
    print (df1)
       Heating  Internet  Kitchen  Paid parking off premises  Smoking allowed  \
    0        0         1        0                          1                0   
    1        0         1        1                          0                0   
    2        1         0        0                          0                1   
    
       Wifi  
    0     1  
    1     1  
    2     1 
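
The snippet above builds the dummy columns as a separate frame. A minimal sketch of how they might be attached back to the original data follows; the `price` column, the custom index, and the column name `amenities` are hypothetical and only there to make the example self-contained. Passing `index=df.index` matters whenever the frame does not use the default 0..n-1 index, because otherwise the join would misalign rows.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample data; "price" and the index labels are invented for illustration.
df = pd.DataFrame(
    {
        "price": [80, 120, 95],
        "amenities": [
            ["Internet", "Wifi", "Paid parking off premises"],
            ["Internet", "Wifi", "Kitchen"],
            ["Wifi", "Smoking allowed", "Heating"],
        ],
    },
    index=[10, 11, 12],
)

mlb = MultiLabelBinarizer()
dummies = pd.DataFrame(
    mlb.fit_transform(df["amenities"]),
    columns=mlb.classes_,
    index=df.index,  # reuse the original row labels so the join aligns
)

# Replace the list column with one 0/1 column per amenity.
result = df.drop(columns="amenities").join(dummies)
print(result)
```

For very wide amenity vocabularies, `MultiLabelBinarizer(sparse_output=True)` returns a SciPy sparse matrix instead of a dense array, which may be worth considering if memory becomes a concern.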
    

    【Comments】:

      【Solution 2】:

      IIUC, you can also try pd.get_dummies() or series.str.get_dummies():

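      # .max(level=0) was removed in pandas 2.0; groupby(level=0).max() is the equivalent
      pd.get_dummies(s.explode(), dtype=int).groupby(level=0).max()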
      

      Or:

      s.str.join('|').str.get_dummies()
      

      Replace s with df['column_name'].


         Heating  Internet  Kitchen  Paid parking off premises  Smoking allowed  \
      0        0         1        0                          1                0   
      1        0         1        1                          0                0   
      2        1         0        0                          0                1   
      
         Wifi  
      0     1  
      1     1  
      2     1  
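
To see the two pandas-only variants side by side, here is a small self-contained sketch on the three example rows from the question. The Series `s` is built by hand here as a stand-in for `df['column_name']`; the variable names `a` and `b` are just for illustration.

```python
import pandas as pd

# Stand-in for df['column_name']: a Series whose values are lists of amenities.
s = pd.Series(
    [
        ["Internet", "Wifi", "Paid parking off premises"],
        ["Internet", "Wifi", "Kitchen"],
        ["Wifi", "Smoking allowed", "Heating"],
    ]
)

# Variant A: one row per (listing, amenity), one-hot encode, then collapse
# back to one row per listing by taking the max within each original index label.
a = pd.get_dummies(s.explode(), dtype=int).groupby(level=0).max()

# Variant B: join each list into a "|"-delimited string and let
# Series.str.get_dummies split on "|" (its default separator).
b = s.str.join("|").str.get_dummies()

print(a)
print(b)
```

Variant B assumes that no amenity name contains the `|` character; if one does, it would be split into two columns, so variant A is the safer choice in that case.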
      

      【Comments】:
