【Question title】: Find occurrences of list in a list of lists in a dataframe column
【Posted】: 2019-10-31 06:50:52
【Question description】:

I have a dataframe df with a single column.

data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
                    ['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}
df = pd.DataFrame(data,columns= ['details'])
df

I want to split that column into separate columns and end up with a dataframe that looks like this:

data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
                    ['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']],
        'brand': ['honda', 'toyota', 'honda', 'toyota'],
        'car': ['city','innova','','corolla'],
        'colour': ['black','','red','white'],
        'type': ['','','','sedan']
        }
df2 = pd.DataFrame(data,columns= ['details', 'brand', 'car', 'colour', 'type'])
df2

I tried the following, but it did not work:

a2 = []
b2 = []
c2 = []
d2 = []
for i in df['details']:
    for j in range(len(i)):
        if 'brand :' in i[j]:
            print 'lalala'
            a1 = i[j]
            a2.append(a1)
        else:
            a1 = ''
            a2.append(a1)
        if 'car :' in i[j]:
            print 'lalala'
            b1 = i[j]
            b2.append(b1)
        else:
            b1 = ''
            b2.append(b1)
        if 'colour :' in i[j]:
            c1 = i[j]
            c2.append(c1)
        else:
            c1 = ''
            c2.append(c1)
        if 'type :' in i[j]:
            d1 = i[j]
            d2.append(d1)
        else:
            d1 = ''
            d2.append(d1)
df['brand'] = a2
df['car'] = b2
df['colour'] = c2
df['type'] = d2

Please help, as I have hit a major roadblock.

【Question discussion】:

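  • It would be much easier to first convert each details list into a dictionary keyed by field name, and then pass those dictionaries to the dataframe.
  • It would help to know details such as the language and library versions used in the question.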
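
The first comment's suggestion could be sketched roughly as follows. This is only an illustration under the assumption that every entry has the form `'key : value'`; the names `to_record`, `records` and `parsed` are made up for this example and do not come from the thread.

```python
import pandas as pd

details = [
    ['brand : honda', 'car : city', 'colour : black'],
    ['brand : toyota', 'car : innova'],
    ['brand : honda', 'colour : red'],
    ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan'],
]

def to_record(items):
    """Turn ['key : value', ...] into {'key': 'value', ...} (hypothetical helper)."""
    record = {}
    for item in items:
        key, _, value = item.partition(':')  # split on the first ':' only
        record[key.strip()] = value.strip()
    return record

# One dict per row; keys missing from a row become NaN, then empty strings
records = [to_record(items) for items in details]
parsed = pd.DataFrame(records).fillna('')

# Put the parsed columns next to the original 'details' column
df = pd.DataFrame({'details': details}).join(parsed)
print(df)
```

Building the dictionaries first means the dataframe constructor takes care of aligning the keys, so no per-key lists have to be kept in sync by hand.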

Tags: python string list dataframe


【Solution 1】:

Assuming the detail types are known in advance, you can try the following:

details_types = ['brand', 'car', 'colour', 'type']

# Create one empty column per known detail type
for x in details_types:
    df[x] = None

for idx, row in df.iterrows():
    for col_details in row['details']:
        # Split on the first ':' only and trim the surrounding spaces,
        # so values that contain spaces are kept intact
        feature, value = (part.strip() for part in col_details.split(':', 1))
        df.at[idx, feature] = value

Output

|   |                      details                      | brand  |   car   | colour | type  |
|---|---------------------------------------------------|--------|---------|--------|-------|
| 0 | [brand : honda, car : city, colour : black]       | honda  | city    | black  | None  |
| 1 | [brand : toyota, car : innova]                    | toyota | innova  | None   | None  |
| 2 | [brand : honda, colour : red]                     | honda  | None    | red    | None  |
| 3 | [brand : toyota, car : corolla, colour : white... | toyota | corolla | white  | sedan |
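
If the detail types are not known in advance, they could be collected from the data first. The following is a minimal sketch under that assumption; `details` stands in for the contents of `df['details']`, and the loop is not part of the original answer.

```python
details = [
    ['brand : honda', 'car : city', 'colour : black'],
    ['brand : toyota', 'car : innova'],
    ['brand : honda', 'colour : red'],
    ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan'],
]

# Collect every key once, in order of first appearance
details_types = []
for items in details:
    for item in items:
        key = item.split(':', 1)[0].strip()
        if key not in details_types:
            details_types.append(key)

print(details_types)  # ['brand', 'car', 'colour', 'type']
```

The resulting list can then be used in place of the hard-coded `details_types`.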

【Discussion】:

    【Solution 2】:

    A slightly simpler approach might be the following:

    import pandas as pd

    data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
                        ['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}
    
    # Takes a list of 'key : value' strings and returns a dict,
    # splitting on the first ':' and trimming the surrounding spaces
    def fix(l):
        return dict([p.strip() for p in s.split(':', 1)] for s in l)
    
    # Flatten the list of lists and fix each inner list to get a list of dicts
    dicts = [fix(i) for sublist in data.values() for i in sublist]
    
    # Build a single dataframe from the dicts (optionally add the 'details' column)
    df = pd.DataFrame(dicts)
    df['details'] = data['details']  # adding the 'details' column
    print(df)
    
        brand       car  colour    type   \
    0    honda      city   black     NaN   
    1   toyota    innova     NaN     NaN   
    2    honda       NaN     red     NaN   
    3   toyota   corolla   white   sedan   
    
                                                 details  
    0        [brand : honda, car : city, colour : black]  
    1                     [brand : toyota, car : innova]  
    2                      [brand : honda, colour : red]  
    3  [brand : toyota, car : corolla, colour : white...  
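
When only the dataframe is at hand rather than the original `data` dict, the same idea can be applied to the column directly. The sketch below assumes a `fix` helper that trims whitespace, and it uses made-up index labels (10 to 13) purely to show that the row labels are preserved.

```python
import pandas as pd

df = pd.DataFrame(
    {'details': [['brand : honda', 'car : city', 'colour : black'],
                 ['brand : toyota', 'car : innova'],
                 ['brand : honda', 'colour : red'],
                 ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]},
    index=[10, 11, 12, 13],  # hypothetical non-default index
)

def fix(items):
    # 'key : value' strings -> {'key': 'value'}, splitting on the first ':'
    return dict(tuple(part.strip() for part in s.split(':', 1)) for s in items)

# Map every list to a dict, then build the new columns on the same index
parsed = pd.DataFrame(df['details'].map(fix).tolist(), index=df.index)
result = pd.concat([df, parsed], axis=1)
print(result)
```

Passing `index=df.index` keeps the new columns aligned with the original rows even when the index is not 0, 1, 2, and so on.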
    

    【Discussion】:

      【Solution 3】:
      import pandas as pd
      from collections import ChainMap
      data = {'details': [['brand : honda', 'car : city', 'colour : black'],['brand : toyota', 'car : innova'],
                      ['brand : honda', 'colour : red'], ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan']]}
      #STEP_1 (split on the first ':' only and strip the surrounding spaces)
      lists=[[{y.split(':', 1)[0].strip():y.split(':', 1)[1].strip()} for y in x] for x in data['details']]
      #STEP_2
      data_df = [dict(ChainMap(*x)) for x in lists]
      #STEP_3
      data_df=pd.DataFrame(data_df)
      #STEP_4
      data_df['details']=data['details']
      print(data_df)
      '''Explanation:
      STEP_1: It creates a list of lists with single-entry dictionaries as elements
      
      [[{'brand': 'honda'}, {'car': 'city'}, {'colour': 'black'}],
      [{'brand': 'toyota'}, {'car': 'innova'}],
      [{'brand': 'honda'}, {'colour': 'red'}],
      [{'brand': 'toyota'},
      {'car': 'corolla'},
      {'colour': 'white'},
      {'type': 'sedan'}]]
      
      STEP_2: It merges each inner list into one dictionary, turning the
      list of lists into a list of dictionaries
      
      [{'colour': 'black', 'car': 'city', 'brand': 'honda'},
      {'car': 'innova', 'brand': 'toyota'},
      {'colour': 'red', 'brand': 'honda'},
      {'type': 'sedan',
      'colour': 'white',
      'car': 'corolla',
      'brand': 'toyota'}]
      
      STEP_3: As a dataframe can be created directly from a list of 
      dictionaries, it creates a dataframe with 4 columns: brand, 
      car, colour & type
      
      STEP_4: Add the column 'details' using the 'data' variable'''
      

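      【Discussion】:

      • While this code may solve the problem, including an explanation of how and why it solves the problem would really help to improve the quality of your post, and would probably result in more upvotes. Remember that you are answering the question for future readers, not just the person asking now. Please edit your answer to add an explanation and to state which limitations and assumptions apply.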
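
One limitation worth stating for any parsing based on a plain `split(':')` is that a value containing a colon gets cut short. The snippet below illustrates this with a made-up entry (`'time : 10:30'` does not appear in the question) and shows that limiting the split to the first colon avoids the problem.

```python
item = 'time : 10:30'  # hypothetical entry, not from the question

# A plain split produces three parts, so taking part [1] loses ':30'
naive = item.split(':')
print(naive)             # ['time ', ' 10', '30']
print(naive[1].strip())  # 10

# Splitting on the first ':' only keeps the whole value
key, value = (part.strip() for part in item.split(':', 1))
print(key, value)        # time 10:30
```
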
      【Solution 4】:

      Use:

      Code

      # extract the patterns
      pattern = r"(?:brand : (?P<brand>\w+))|(?:car : (?P<car>\w+))|(?:colour : (?P<colour>\w+))|(?:type : (?P<type>\w+))"
      expanded = df.explode("details")["details"].str.extract(pattern)
      
      # convert to expected format after extracting the patterns
      new = expanded.groupby(level=0).first().fillna("")
      print(new)
      

      Output

          brand      car colour   type
      0   honda     city  black       
      1  toyota   innova              
      2   honda             red       
      3  toyota  corolla  white  sedan
      

      Afterwards you can concatenate everything together with:

      result = pd.concat([df, new], axis=1)
      print(result)
      

      Output (full)

                                                   details   brand  ... colour   type
      0        [brand : honda, car : city, colour : black]   honda  ...  black       
      1                     [brand : toyota, car : innova]  toyota  ...              
      2                      [brand : honda, colour : red]   honda  ...    red       
      3  [brand : toyota, car : corolla, colour : white...  toyota  ...  white  sedan
      
      [4 rows x 5 columns]
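
The pattern above names every key explicitly. A possible variation, shown here only as a sketch, extracts a generic key and value and pivots them into columns, so new keys need no change to the regex. It assumes pandas 0.25 or later (for `explode`) and that each key occurs at most once per row, since `pivot` raises an error on duplicates; the group names `key` and `value` are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'details': [
    ['brand : honda', 'car : city', 'colour : black'],
    ['brand : toyota', 'car : innova'],
    ['brand : honda', 'colour : red'],
    ['brand : toyota', 'car : corolla', 'colour : white', 'type : sedan'],
]})

# One row per 'key : value' string; the original row label stays in the index
exploded = df['details'].explode()

# Generic pattern: everything before the first ':' is the key, the rest is the value
pairs = exploded.str.extract(r'^\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.*?)\s*$')

# Turn the keys into columns, one row per original row label
new = (
    pairs.reset_index()
         .pivot(index='index', columns='key', values='value')
         .fillna('')
)

result = df.join(new)
print(result)
```
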
      

      【Discussion】:
