【Question Title】: Converting list-like column values into multiple rows using a Pandas DataFrame
【Posted】: 2018-09-18 12:38:36
【Question Description】:

CSV file (sample1.csv):

Location_City, Location_State, Name, hobbies
Los Angeles,   CA,             John, "['Music', 'Running']"
Texas,         TX,             Jack, "['Swimming', 'Trekking']"

I want to convert the hobbies column of the CSV into the following output:

Location_City, Location_State, Name, hobbies
Los Angeles,   CA,             John, Music
Los Angeles,   CA,             John, Running
Texas,         TX,             Jack, Swimming
Texas,         TX,             Jack, Trekking

I have read the CSV into a DataFrame, but I don't know how to do the conversion.

 import pandas as pd

 df = pd.read_csv("sample1.csv")  # read_csv already returns a DataFrame
 df

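【Question Discussion】:

  • Can you clarify whether the values in the hobbies column are lists or strings?
  • When it is loaded into the DataFrame, it shows dtype: object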
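
The dtype alone does not settle the question raised above, because an object column can hold either strings or real lists. A minimal sketch of how one might check, using two hypothetical Series built inline (the names `as_text` and `as_list` are not from the thread):

```python
import pandas as pd

# Hypothetical stand-ins for the two possibilities raised in the comments
as_text = pd.Series(["['Music', 'Running']"])   # a string that merely looks like a list
as_list = pd.Series([['Music', 'Running']])     # an actual Python list per row

# Inspect the Python type of each element instead of relying on the dtype
print(as_text.map(type).unique())
print(as_list.map(type).unique())
```

A column read from a CSV file will normally be in the first form, which is why both solutions below parse the text before splitting it into rows.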

Tags: python pandas dataframe


【Solution 1】:

You can use findall or extractall to get the lists out of the hobbies column, then flatten them with chain.from_iterable and repeat the other columns:

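from itertools import chain

# Collect every single-quoted item; each row becomes a Python list
a = df['hobbies'].str.findall("'(.*?)'")
# Number of hobbies per row, used to repeat the other columns
lens = a.str.len()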

df1 = pd.DataFrame({
    'Location_City' : df['Location_City'].values.repeat(lens),
    'Location_State' : df['Location_State'].values.repeat(lens),
    'Name' : df['Name'].values.repeat(lens),
    'hobbies' : list(chain.from_iterable(a.tolist())), 
})
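
The repetition step can also be written without naming every column by hand. The following is a minimal sketch rather than part of the original answer; it assumes the sample data from the question rebuilt inline and a unique row index, and the variable names `lists` and `out` are illustrative:

```python
from itertools import chain

import pandas as pd

# Sample data from the question, with hobbies stored as text
df = pd.DataFrame({
    'Location_City': ['Los Angeles', 'Texas'],
    'Location_State': ['CA', 'TX'],
    'Name': ['John', 'Jack'],
    'hobbies': ["['Music', 'Running']", "['Swimming', 'Trekking']"],
})

lists = df['hobbies'].str.findall("'(.*?)'")   # one list of items per row
lens = lists.str.len()                          # number of items per row

# Repeat each row label once per item, then select those rows in one step
out = (df.drop(columns='hobbies')
         .loc[df.index.repeat(lens.to_numpy())]
         .reset_index(drop=True))
out['hobbies'] = list(chain.from_iterable(lists))
print(out)
```

This keeps working if more columns are added to the file, since only the hobbies column is named explicitly.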

Alternatively, create a Series with extractall, drop its inner (match) index level, and join it back to the original DataFrame:

df1 = (df.join(df.pop('hobbies').str.extractall("'(.*?)'")[0]
               .reset_index(level=1, drop=True)
               .rename('hobbies'))
         .reset_index(drop=True))

print (df1)

  Location_City Location_State  Name   hobbies
0   Los Angeles             CA  John     Music
1   Los Angeles             CA  John   Running
2         Texas             TX  Jack  Swimming
3         Texas             TX  Jack  Trekking
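
To see why the extractall variant resets index level 1, it can help to look at what extractall returns on its own. A small sketch, assuming only the two hobbies strings from the question (the names `s`, `matches` and `flat` are illustrative):

```python
import pandas as pd

s = pd.Series(["['Music', 'Running']", "['Swimming', 'Trekking']"], name='hobbies')

# extractall returns one row per regex match, indexed by
# (original row label, match number within that row)
matches = s.str.extractall("'(.*?)'")
print(matches)

# Dropping the match level leaves only the original row labels,
# which is what lets join() line the values up with the other columns
flat = matches[0].reset_index(level=1, drop=True).rename('hobbies')
print(flat)
```

Because the remaining index repeats the original row label once per match, joining on it duplicates the other columns the right number of times.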


    【Solution 2】:

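    This can be solved with the pandas.DataFrame.explode function, which was introduced in version 0.25.0. If you have that version or later, you can use the code below.
    Reference for explode: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html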

    import pandas as pd
    import ast
    
    data = {
        'Location_City': ['Los Angeles','Texas'],
        'Location_State': ['CA','TX'],
        'Name': ['John','Jack'],
        'hobbies': ["['Music', 'Running']", "['Swimming', 'Trekking']"]
    }
    df = pd.DataFrame(data)
    
    # Convert the string representation of each list into an actual list object
    df['hobbies'] = df['hobbies'].apply(ast.literal_eval)
    
    # Exploding the list
    df = df.explode('hobbies')
    
    print(df)
    
      Location_City Location_State  Name   hobbies
    0   Los Angeles             CA  John     Music
    0   Los Angeles             CA  John   Running
    1         Texas             TX  Jack  Swimming
    1         Texas             TX  Jack  Trekking
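
The printed result keeps the original row labels (0, 0, 1, 1). As a hedged sketch of how the same idea might be applied when reading from a file, the example below parses the column while loading and renumbers the rows afterwards. It assumes a CSV without the padding spaces shown in the question, supplied here through io.StringIO in place of sample1.csv:

```python
import ast
import io

import pandas as pd

# Assumed file contents: the question's data without alignment padding
csv_text = '''Location_City,Location_State,Name,hobbies
Los Angeles,CA,John,"['Music', 'Running']"
Texas,TX,Jack,"['Swimming', 'Trekking']"
'''

# converters applies ast.literal_eval to each hobbies cell while parsing,
# so the column already holds real lists when the DataFrame is created
df = pd.read_csv(io.StringIO(csv_text), converters={'hobbies': ast.literal_eval})

# explode gives one row per list element; reset_index renumbers the rows
out = df.explode('hobbies').reset_index(drop=True)
print(out)
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for text that comes from a file.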
    
