【Question title】: Multi Looping and Multi Splitting of Pandas DataFrame
【Posted】: 2021-01-25 04:21:33
【Question description】:

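I have a CSV file containing 22,000 rows of author names.

  1. Each row holds multiple author names, separated by ";".
  2. Each author name within a row is written in "last name, first name" order.

I want to split them and append the results as new columns, as shown below.

Preview of the original dataset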

+------------------------------------+
|           author_full_name         |
+------------------------------------+
| Kahana, M J; Adler, M              |
|Gautam, H; Potdar, G G; Vidya, T N C|
+------------------------------------+

Expected output

+------------------------------------+--------------------+-----------------------+
|           author_full_name         | author_first_names | author_last_names     |
+------------------------------------+--------------------+-----------------------+
| Kahana, M J; Adler, M              | M J; M             | Kahana; Adler         |
|Gautam, H; Potdar, G G; Vidya, T N C| H; G G; T N C      | Gautam; Potdar; Vidya |
+------------------------------------+--------------------+-----------------------+

How can I do this with pandas?

【Question discussion】:

    Tags: python pandas csv data-science data-cleaning


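    【Solution 1】:

    The logic is to split each string on ";" first, then split each resulting value on ",", taking the first piece as the last name and the second piece as the first name. Each piece keeps the whitespace around it, so strip it if you need clean values.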

    >>> [x.split(",")[0] for x in "Gautam, H; Potdar, G G; Vidya, T N C".split(";")]
    ['Gautam', ' Potdar', ' Vidya']
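
The same splitting logic can be sketched as a small plain-Python helper that also strips the stray whitespace. The function name `split_author_names` is hypothetical (it is not part of the original answer), and the sketch assumes every author is written as "Last, First"; an author without a comma simply gets an empty first name.

```python
from typing import Tuple


def split_author_names(full_names: str) -> Tuple[str, str]:
    """Turn 'Last, First; Last, First' into ('First; First', 'Last; Last')."""
    first_names, last_names = [], []
    for author in full_names.split(";"):
        author = author.strip()
        if not author:
            continue  # skip empty pieces, e.g. produced by a trailing ';'
        # partition() splits on the first comma only and never raises
        # when the comma is missing (first is then an empty string).
        last, _, first = author.partition(",")
        last_names.append(last.strip())
        first_names.append(first.strip())
    return "; ".join(first_names), "; ".join(last_names)


print(split_author_names("Gautam, H; Potdar, G G; Vidya, T N C"))
# ('H; G G; T N C', 'Gautam; Potdar; Vidya')
```

Using `partition` instead of indexing into `split(",")` is a design choice for messy data: a malformed entry yields an empty first name rather than an `IndexError`.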
    

    Using apply in pandas:

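    import pandas as pd

    df = pd.DataFrame({"Name": ["Gautam, H; Potdar, G G; Vidya, T N C", "Kahana, M J; Adler, M "]})
    # Each author is "Last, First": after splitting on ",", index 0 is the last name and index 1 the first name.
    # strip() removes the spaces that the splits leave around each piece.
    df['author_first_names'] = df['Name'].apply(lambda x: "; ".join([ele.split(",")[1].strip() for ele in x.split(";")]))
    df['author_last_names'] = df['Name'].apply(lambda x: "; ".join([ele.split(",")[0].strip() for ele in x.split(";")]))

    df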
    

    Output:

    ------------------------------------|-----------------|------------------------
    Gautam, H; Potdar, G G; Vidya, T N C  H; G G; T N C      Gautam; Potdar; Vidya
    Kahana, M J; Adler, M                 M J; M             Kahana; Adler
    ------------------------------------|-----------------|------------------------
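
As an alternative to `apply` with a lambda, the same result can probably be reached with pandas string methods alone. The following is a minimal sketch, not the answer author's method: it uses the column name `author_full_name` from the question, assumes every author entry contains a comma, and the intermediate names `authors` and `parts` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "author_full_name": [
        "Kahana, M J; Adler, M",
        "Gautam, H; Potdar, G G; Vidya, T N C",
    ]
})

# One row per author; explode() repeats the original row index,
# which is what lets us regroup the pieces afterwards.
authors = df["author_full_name"].str.split(";").explode().str.strip()

# Split each "Last, First" on the first comma only, into two columns.
# Assumption: every entry has a comma, otherwise a piece would be missing.
parts = authors.str.split(",", n=1, expand=True)
parts.columns = ["last", "first"]
parts = parts.apply(lambda col: col.str.strip())

# Re-join the pieces per original row; assignment aligns on the index.
df["author_first_names"] = parts.groupby(level=0)["first"].agg("; ".join)
df["author_last_names"] = parts.groupby(level=0)["last"].agg("; ".join)

print(df)
```

The intermediate `parts` frame holds one row per author, which can be handy if later steps need per-author processing rather than joined strings.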
    
