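【Question title】: Convert pandas dataframe to dictionary with multiple keys
【Posted】: 2019-02-11 00:21:51
【Question】:

I am trying to convert a dataframe into a dictionary keyed by four values, all taken from columns. I also have several more columns whose values I want to look up using the key built from those four columns. My loop-based approach works, but it eventually raises a memory error. Is there a more efficient way to do this?

The dataframe looks like this: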

    Service  Bill Weight  Zone  Resi  UPS    FedEx  USPS  DHL
    1DEA     1            2     N     33.02  9999   9999  9999
    1DEA     2            2     N     33.02  9999   9999  9999
    1DEA     3            2     N     33.02  9999   9999  9999

I want one key like this for each carrier:

    price[('1DEA', '1', '2', 'N', 'UPS')]=33.02
    price[('1DEA', '1', '2', 'N', 'FedEx')]=9999

I tried this:

    price = {}
    carriers = ['UPS', 'FedEx', 'USPS','DHL'] 
    for carrier in carriers:
        for row in rate_keys.to_dict('records'):
              key = (row['Service'], row['Bill Weight'], row['Zone'], 
              row['Resi'], carrier)
              rate_keys[key] = row[carrier]

【Question comments】:

    Tags: pandas loops dictionary key jupyter


    【Solution 1】:

    IIUC, with a dict comprehension like this:

    carriers = ['UPS', 'FedEx', 'USPS','DHL']
    price = {(row['Service'], row['Bill Weight'], row['Zone'], row['Resi'], c):row[c]
         for c in carriers for _, row in df.iterrows()}
    

    [out]

    {('1DEA', 1, 2, 'N', 'UPS'): 33.02,
     ('1DEA', 2, 2, 'N', 'UPS'): 33.02,
     ('1DEA', 3, 2, 'N', 'UPS'): 33.02,
     ('1DEA', 1, 2, 'N', 'FedEx'): 9999,
     ('1DEA', 2, 2, 'N', 'FedEx'): 9999,
     ('1DEA', 3, 2, 'N', 'FedEx'): 9999,
     ('1DEA', 1, 2, 'N', 'USPS'): 9999,
     ('1DEA', 2, 2, 'N', 'USPS'): 9999,
     ('1DEA', 3, 2, 'N', 'USPS'): 9999,
     ('1DEA', 1, 2, 'N', 'DHL'): 9999,
     ('1DEA', 2, 2, 'N', 'DHL'): 9999,
     ('1DEA', 3, 2, 'N', 'DHL'): 9999}
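
Since the question mentions running out of memory on a large frame, here is a minimal sketch of the same idea without `iterrows()`, which builds a new Series for every row. It assumes `df` is rebuilt from the three sample rows in the question; `key_cols` is an illustrative name, not something from the original post.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (assumption: these dtypes)
df = pd.DataFrame({
    "Service": ["1DEA", "1DEA", "1DEA"],
    "Bill Weight": [1, 2, 3],
    "Zone": [2, 2, 2],
    "Resi": ["N", "N", "N"],
    "UPS": [33.02, 33.02, 33.02],
    "FedEx": [9999, 9999, 9999],
    "USPS": [9999, 9999, 9999],
    "DHL": [9999, 9999, 9999],
})

key_cols = ["Service", "Bill Weight", "Zone", "Resi"]  # illustrative name
carriers = ["UPS", "FedEx", "USPS", "DHL"]

price = {}
for carrier in carriers:
    # Zip the key columns as plain Python lists: one 4-tuple per row
    keys = zip(*(df[col].tolist() for col in key_cols))
    for key, value in zip(keys, df[carrier].tolist()):
        price[key + (carrier,)] = value

print(price[("1DEA", 1, 2, "N", "UPS")])
```

The resulting dictionary has the same keys as the comprehension above; only the way the rows are walked differs.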
    

    【Comments】:

      【Solution 2】:

      If you do

      df = df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi'])
      

      you have basically the same thing.

      Output

      print(df.loc[('1DEA', 1, 2, 'N')]['UPS'])
      
      33.02
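
A minimal sketch of this lookup style, assuming `df` is rebuilt from the question's sample rows. The name `indexed` and the `sort_index()` call are additions for illustration; the row key and the column label are passed to `.loc` together, so no dictionary is built at all.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (assumption: these dtypes)
df = pd.DataFrame({
    "Service": ["1DEA", "1DEA", "1DEA"],
    "Bill Weight": [1, 2, 3],
    "Zone": [2, 2, 2],
    "Resi": ["N", "N", "N"],
    "UPS": [33.02, 33.02, 33.02],
    "FedEx": [9999, 9999, 9999],
    "USPS": [9999, 9999, 9999],
    "DHL": [9999, 9999, 9999],
})

# Move the four key columns into a MultiIndex; sorting keeps lookups efficient
indexed = df.set_index(["Service", "Bill Weight", "Zone", "Resi"]).sort_index()

# Row key as a tuple, carrier as the column label
ups_rate = indexed.loc[("1DEA", 1, 2, "N"), "UPS"]
fedex_rate = indexed.loc[("1DEA", 3, 2, "N"), "FedEx"]
print(ups_rate, fedex_rate)
```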
      

      【Comments】:

        【Solution 3】:

        You probably shouldn't update rate_keys while looping over it. I'm guessing the last line of your example script should be

        price[key] = row[carrier]
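
For reference, a sketch of the question's loop with that one-line change applied, assuming `rate_keys` holds the three sample rows from the question. Writing to `rate_keys[key]` most likely adds a new column to the DataFrame for every key, which may be what exhausted memory; writing to `price` fills the dictionary instead. Building the records list once, outside the carrier loop, is an extra tweak that is not in the original post.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (assumption: these dtypes)
rate_keys = pd.DataFrame({
    "Service": ["1DEA", "1DEA", "1DEA"],
    "Bill Weight": [1, 2, 3],
    "Zone": [2, 2, 2],
    "Resi": ["N", "N", "N"],
    "UPS": [33.02, 33.02, 33.02],
    "FedEx": [9999, 9999, 9999],
    "USPS": [9999, 9999, 9999],
    "DHL": [9999, 9999, 9999],
})

price = {}
carriers = ["UPS", "FedEx", "USPS", "DHL"]

# Convert the rows to dicts once instead of once per carrier
records = rate_keys.to_dict("records")

for carrier in carriers:
    for row in records:
        key = (row["Service"], row["Bill Weight"], row["Zone"],
               row["Resi"], carrier)
        price[key] = row[carrier]  # write to the dict, not to rate_keys

print(len(price))
```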
        

        【Comments】:

          【Solution 4】:

          First,

          temp = df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi']).to_dict()
          

          Then build the desired dictionary from a generator expression:

          dict((k + (i,), temp[i][k]) for i in temp for k in temp[i])
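
The two steps above can be sketched end to end as follows, assuming `df` is rebuilt from the question's sample rows. With the default orientation, `to_dict()` returns one inner dictionary per column, keyed by the index tuples, so the carrier name only needs to be appended to each tuple. The loop variable names are illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (assumption: these dtypes)
df = pd.DataFrame({
    "Service": ["1DEA", "1DEA", "1DEA"],
    "Bill Weight": [1, 2, 3],
    "Zone": [2, 2, 2],
    "Resi": ["N", "N", "N"],
    "UPS": [33.02, 33.02, 33.02],
    "FedEx": [9999, 9999, 9999],
    "USPS": [9999, 9999, 9999],
    "DHL": [9999, 9999, 9999],
})

# temp maps carrier -> {(Service, Bill Weight, Zone, Resi): rate}
temp = df.set_index(["Service", "Bill Weight", "Zone", "Resi"]).to_dict()

# Flatten the nesting by appending the carrier to each index tuple
price = {
    key + (carrier,): rate
    for carrier, rates in temp.items()
    for key, rate in rates.items()
}

print(price[("1DEA", 1, 2, "N", "UPS")])
```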
          

          【Comments】:

            【Solution 5】:

            Set the index to every column except the carrier columns, then stack.

            df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi']).stack().to_dict()
            
            {('1DEA', 1, 2, 'N', 'DHL'): 9999.0,
             ('1DEA', 1, 2, 'N', 'FedEx'): 9999.0,
             ('1DEA', 1, 2, 'N', 'UPS'): 33.02,
             ('1DEA', 1, 2, 'N', 'USPS'): 9999.0,
             ('1DEA', 2, 2, 'N', 'DHL'): 9999.0,
             ('1DEA', 2, 2, 'N', 'FedEx'): 9999.0,
             ('1DEA', 2, 2, 'N', 'UPS'): 33.02,
             ('1DEA', 2, 2, 'N', 'USPS'): 9999.0,
             ('1DEA', 3, 2, 'N', 'DHL'): 9999.0,
             ('1DEA', 3, 2, 'N', 'FedEx'): 9999.0,
             ('1DEA', 3, 2, 'N', 'UPS'): 33.02,
             ('1DEA', 3, 2, 'N', 'USPS'): 9999.0}
            

            Comprehension

            {(*r[:4], c): v for r in df.values for c, v in zip(df.columns[4:], r[4:])}
            
            {('1DEA', 1, 2, 'N', 'DHL'): 9999,
             ('1DEA', 1, 2, 'N', 'FedEx'): 9999,
             ('1DEA', 1, 2, 'N', 'UPS'): 33.02,
             ('1DEA', 1, 2, 'N', 'USPS'): 9999,
             ('1DEA', 2, 2, 'N', 'DHL'): 9999,
             ('1DEA', 2, 2, 'N', 'FedEx'): 9999,
             ('1DEA', 2, 2, 'N', 'UPS'): 33.02,
             ('1DEA', 2, 2, 'N', 'USPS'): 9999,
             ('1DEA', 3, 2, 'N', 'DHL'): 9999,
             ('1DEA', 3, 2, 'N', 'FedEx'): 9999,
             ('1DEA', 3, 2, 'N', 'UPS'): 33.02,
             ('1DEA', 3, 2, 'N', 'USPS'): 9999}
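
A small sketch comparing the two variants, assuming `df` is rebuilt from the question's sample rows and that the four key columns come first. As the outputs above suggest, `stack()` puts all carrier values into a single float column, so 9999 comes back as 9999.0, while the comprehension keeps each original value. The dictionaries still compare equal because 9999 == 9999.0 in Python.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question (assumption: these dtypes)
df = pd.DataFrame({
    "Service": ["1DEA", "1DEA", "1DEA"],
    "Bill Weight": [1, 2, 3],
    "Zone": [2, 2, 2],
    "Resi": ["N", "N", "N"],
    "UPS": [33.02, 33.02, 33.02],
    "FedEx": [9999, 9999, 9999],
    "USPS": [9999, 9999, 9999],
    "DHL": [9999, 9999, 9999],
})

key_cols = ["Service", "Bill Weight", "Zone", "Resi"]  # illustrative name

# Variant 1: carrier columns become the innermost index level
stacked = df.set_index(key_cols).stack().to_dict()

# Variant 2: walk the raw rows; the first four entries form the key
comprehension = {
    (*r[:4], c): v
    for r in df.to_numpy()
    for c, v in zip(df.columns[4:], r[4:])
}

print(stacked == comprehension)
```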
            

            【Comments】:
