【Question title】: Creating a Python dictionary from two pandas DataFrames
【Posted】: 2021-11-11 03:16:05
【Question description】:

I am trying to create a dictionary from two pandas DataFrames. Below is a snapshot of the DataFrame that is supposed to hold the keys:

C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000007.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000012.jpg

The following DataFrame snapshot holds the dictionary values:

324,339,263,211,9
253,372,165,264,9
67,374,5,244,9
295,299,241,194,9

So I want to add each pair of corresponding rows to a dictionary as a key and a value. This is what I tried:

import pandas as pd
import numpy as np
image_files=pd.read_csv('image_files.csv')
file = pd.read_csv('Training_dataset.csv')

image_anno_dict={}

for image_file, row in zip(image_files,file.iterrows()):
    image_anno_dict[image_file]=np.array(row)

My expected output:

{'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [324,339,263,211,9]
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [253,372,165,264,9]
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [67,374,5,244,9]
.
.
.
}

But the code only works for the first row. Any suggestions for a solution?

print(image_files.head(5)):

C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
0  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
1  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
2  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
3  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
4  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...

print(file.head(5)):

     0    1    2    3  4
0  324  339  263  211  9
1  253  372  165  264  9
2   67  374    5  244  9
3  295  299  241  194  9
4  312  220  277  186  9

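【Comments】:

  • Please check now, thanks.
  • The expected output is not possible, because a dictionary always has unique keys.
  • Oh, I see. Is there any way to work around this?
  • You can use a list of tuples instead of a dict.
  • It's because I have multiple objects in one image, so I have to repeat the same image several times.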
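
The point raised in the comments can be checked with plain Python. The sketch below uses shortened, hypothetical file names and three of the sample rows (the names `paths`, `boxes`, `as_dict`, and `as_pairs` are assumptions made for illustration); it shows a dict collapsing repeated keys while a list of tuples keeps every row.

```python
# Hypothetical sample data: the same image path repeats once per annotated object.
paths = ["000005.jpg", "000005.jpg", "000007.jpg"]
boxes = [[324, 339, 263, 211, 9], [253, 372, 165, 264, 9], [67, 374, 5, 244, 9]]

# A dict keeps one value per key, so the later row for "000005.jpg" replaces the earlier one.
as_dict = dict(zip(paths, boxes))
print(len(as_dict))            # 2 entries, not 3
print(as_dict["000005.jpg"])   # [253, 372, 165, 264, 9]

# A list of (path, box) tuples keeps every row, including repeated paths.
as_pairs = list(zip(paths, boxes))
print(len(as_pairs))           # 3
```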

Tags: python-3.x pandas dataframe loops dictionary


【Solution 1】:

You can combine the two DataFrames with a pandas Series and then convert the result by calling the to_dict method. Here is working sample code:

import pandas as pd

df1 = pd.DataFrame({'df1Keys': ['ab', 'bc', 'c', 'df', 'efg']})
df2 = pd.DataFrame({'df2Vlues': [1, 25, 3, 84, 545]})

# Method 1: build a Series indexed by the keys, then convert it to a dict
print(pd.Series(df2.df2Vlues.values, index=df1.df1Keys).to_dict())

# Method 2: zip the key column and the value column directly
print(dict(zip(df1.df1Keys, df2.df2Vlues)))
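
The sample above pairs each key with a single value, while the question has five value columns per row. One possible extension of the zip approach is sketched here, assuming unique keys; the frames `keys_df` and `values_df` and their column names are hypothetical. With repeated keys, later rows would still replace earlier ones.

```python
import pandas as pd

# Hypothetical frames: unique keys in one, several value columns in the other.
keys_df = pd.DataFrame({"key": ["a", "b", "c"]})
values_df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# to_numpy() yields one row per record; tolist() turns each row into a plain list.
rows = values_df.to_numpy().tolist()
mapping = dict(zip(keys_df["key"], rows))
print(mapping)
```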


    【Solution 2】:
    import pandas as pd
    import numpy as np
    
    image_files = pd.read_csv('image_files.csv', header=None)
    file = pd.read_csv('Training_dataset.csv')
    
    image_anno_list = list(zip(image_files[0], file.apply(np.array, axis=1)))
    

    Output:

    >>> image_anno_list
    
    [('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
      array([324, 339, 263, 211,   9])),
     ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
      array([253, 372, 165, 264,   9])),
     ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
      array([ 67, 374,   5, 244,   9])),
     ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
      array([295, 299, 241, 194,   9])),
     ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
      array([312, 220, 277, 186,   9]))]
    

    If you use a dictionary instead, repeated keys overwrite each other and only the last row survives:

    image_anno_dict = dict(zip(image_files[0], file.apply(np.array, axis=1)))
    
    >>> image_anno_dict
    
    {'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg':
     array([312, 220, 277, 186,   9])}
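
Solution 2 differs from the question's code in two ways: it passes header=None, and it iterates over the column image_files[0] rather than over the DataFrame. The sketch below illustrates why both matter, using a small in-memory CSV in place of image_files.csv (the shortened paths and the names `csv_text`, `with_header`, and `no_header` are assumptions).

```python
import io
import pandas as pd

# Hypothetical one-column CSV without a header row, standing in for image_files.csv.
csv_text = "img/000005.jpg\nimg/000005.jpg\nimg/000007.jpg\n"

# Default read: the first data line is consumed as the column name.
with_header = pd.read_csv(io.StringIO(csv_text))
print(list(with_header))        # iterating a DataFrame yields its column labels
print(len(with_header))         # only 2 data rows remain

# header=None keeps every line as data and names the column 0.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(no_header[0]))       # iterating the column yields all three paths
```

Because iterating a DataFrame yields only its column labels, zipping it with the rows of another frame stops after one pair when there is a single column, which matches the "only works for the first row" symptom in the question.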
    


      【Solution 3】:

      You can use collections.defaultdict with list as the default factory, so that each key maps to a list of rows, as shown below:

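      from collections import defaultdict
      import pandas as pd

      # header=None keeps the first path as data instead of using it as the column name
      image_files = pd.read_csv('image_files.csv', header=None)
      file = pd.read_csv('Training_dataset.csv')

      image_anno_dict = defaultdict(list)

      # Iterate over the path column (iterating the DataFrame itself yields column labels)
      # and over the annotation rows as NumPy arrays
      for image_file, row in zip(image_files[0], file.to_numpy()):
          image_anno_dict[image_file].append(row)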
      

      Output:

      {'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg' :
       [
          [324,339,263,211,9], [253,372,165,264,9] , [67,374,5,244,9], ...
       ]
       ,
       ...
       , 
       'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg' : 
       [
           [253,372,165,264,9] , [67,374,5,244,9], ...
       ], 
       ...
      }
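
The grouping idea can be tried without the CSV files. This is a minimal sketch that assumes small in-memory frames with shortened, hypothetical file names in place of image_files.csv and Training_dataset.csv; the final stacking step with np.stack is an addition for convenience, not part of the answer above.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Hypothetical in-memory stand-ins for image_files.csv and Training_dataset.csv.
image_files = pd.DataFrame({0: ["000005.jpg", "000005.jpg", "000007.jpg"]})
file = pd.DataFrame([[324, 339, 263, 211, 9],
                     [253, 372, 165, 264, 9],
                     [67, 374, 5, 244, 9]])

# Collect every annotation row under its image path.
grouped = defaultdict(list)
for image_file, row in zip(image_files[0], file.to_numpy()):
    grouped[image_file].append(row)

# Stack each list into one 2-D array of shape (objects, 5).
image_anno_dict = {path: np.stack(rows) for path, rows in grouped.items()}
print(image_anno_dict["000005.jpg"].shape)   # (2, 5)
```

Each image path then appears once as a key, and all of its objects are kept as rows of a single array.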
      
