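【Question title】: Regular Expression search/replace on columns with python pandas
【Posted】: 2021-06-09 15:38:20
【Question description】:

Below is a small sample of a .csv file I am trying to do some data manipulation on. Each "comment" has its own column, and the fields within a comment are separated by semicolons ("date;user;comment"). My goal is to prefix the user with "gp-".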

Original:

issue_key,summary,comment,comment,comment,comment,resolution
ABC-1234,summary1,"03/11/2021 12:18;user1;a text comment","03/10/2021 11:18;user2,a text comment",,,Unresolved
ABC-4321,summary2,"03/08/2021 12:10;user7;a text comment","03/10/2021 11:18;user5,a text comment",,,Unresolved
ABC-2214,summary3,"03/09/2021 12:20;user9;a text comment",,"03/10/2021 11:18;user3,a text comment",,Unresolved

What I want it to become:

issue_key,summary,comment,comment,comment,comment,resolution
ABC-1234,summary1,"03/11/2021 12:18;gp-user1;a text comment","03/10/2021 11:18;gp-user2,a text comment",,,Unresolved
ABC-4321,summary2,"03/08/2021 12:10;gp-user7;a text comment","03/10/2021 11:18;gp-user5,a text comment",,,Unresolved
ABC-2214,summary3,"03/09/2021 12:20;gp-user9;a text comment",,"03/10/2021 11:18;gp-user3,a text comment",,Unresolved

My code so far. I think I'm close:

with open(destination_filename) as f:
    orig_header = f.readline()
orig_header = orig_header.split(",")
orig_header[-1] = orig_header[-1].strip()
csv_data = pd.read_csv(destination_filename)
cols = csv_data.columns[csv_data.columns.str[:7]=='Comment']
csv_data[cols] = csv_data[cols].apply(lambda x: re.sub(r'(\d+\/\d+\/\d\d\d\d \d+:\d+);(\S+);(.*)', r'\1;gp-\2;\3', str(x)))
csv_data.to_csv(f"{destination_filename}", index = False, header=orig_header)

【Question discussion】:

    Tags: python regex pandas csv


    【Solution 1】:

    One approach is to use the built-in csv library. It can also be used to parse each comment field as a ;-delimited csv row.

    For example:

    import csv
    import io
    
    def replace_user(entry):
        """Prefix the user (second ;-separated field) of a comment with 'gp-'."""
        if entry:  # empty comment cells are left untouched
            # Parse the comment itself as a one-row, ;-delimited csv
            values = next(csv.reader(io.StringIO(entry, newline=''), delimiter=';'))
            values[1] = f'gp-{values[1]}'
            entry = ';'.join(values)
        return entry
    
    
    with open('input.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
        csv_input = csv.reader(f_input)
        csv_output = csv.writer(f_output)
        csv_output.writerow(next(csv_input)) # copy the header
        
        for row in csv_input:
            # columns 2 to 5 are the four comment columns
            row[2:6] = [replace_user(v) for v in row[2:6]]
            csv_output.writerow(row)
    

    This gives you an output.csv containing:

    issue_key,summary,comment,comment,comment,comment,resolution
    ABC-1234,summary1,03/11/2021 12:18;gp-user1;a text comment,"03/10/2021 11:18;gp-user2,a text comment",,,Unresolved
    ABC-4321,summary2,03/08/2021 12:10;gp-user7;a text comment,"03/10/2021 11:18;gp-user5,a text comment",,,Unresolved
    ABC-2214,summary3,03/09/2021 12:20;gp-user9;a text comment,,"03/10/2021 11:18;gp-user3,a text comment",,Unresolved
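
If the regular expression from the question is preferred over splitting on `;`, it could be applied to one cell at a time. The sketch below is only an illustration: the names `USER_PATTERN` and `add_prefix` are hypothetical, and the pattern assumes the user name ends at the first `;` or `,`, which is what the sample rows suggest.

```python
import re

# Date and time, then ';', then the user name (everything up to the next ';' or ',').
# Assumption: user names never contain ';' or ','.
USER_PATTERN = re.compile(r'^(\d+/\d+/\d{4} \d+:\d+;)([^;,]+)')


def add_prefix(cell, prefix='gp-'):
    """Insert prefix before the user name; return anything else unchanged."""
    if not isinstance(cell, str):
        return cell  # e.g. a missing value rather than text
    # A function is used as the replacement so the prefix is inserted literally
    return USER_PATTERN.sub(lambda m: m.group(1) + prefix + m.group(2), cell, count=1)


print(add_prefix('03/11/2021 12:18;user1;a text comment'))
print(add_prefix('03/10/2021 11:18;user2,a text comment'))
```

In the question's snippet, `DataFrame.apply` passes each whole column (a Series) to the lambda, so `str(x)` is the printed form of the column rather than a single cell; that is a likely reason it does not produce the desired result. With pandas, a pattern like this could instead be applied column by column using `Series.str.replace(..., regex=True)`.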
    

    If the comments can also contain quotes or newlines, an additional csv.writer() can be used instead of join().
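
A minimal sketch of that variant, assuming the comment fields follow normal csv quoting rules with `;` as the delimiter. The `prefix` parameter and the `len(values) > 1` guard are additions for illustration and are not part of the code above.

```python
import csv
import io


def replace_user(entry, prefix='gp-'):
    """Prefix the second ;-separated field, re-quoting fields where needed."""
    if not entry:
        return entry  # empty comment cells are left untouched
    values = next(csv.reader(io.StringIO(entry, newline=''), delimiter=';'))
    if len(values) > 1:
        values[1] = f'{prefix}{values[1]}'
    # Write the fields back through csv.writer so that quotes, semicolons
    # and newlines inside a field are quoted again
    buffer = io.StringIO(newline='')
    csv.writer(buffer, delimiter=';').writerow(values)
    # Drop the row terminator the writer appends
    return buffer.getvalue().rstrip('\r\n')


print(replace_user('03/11/2021 12:18;user1;"a ""quoted""; comment"'))
```

The function is meant as a drop-in replacement for `replace_user()` in the loop above; the outer `csv.writer` still takes care of quoting the whole comment cell.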

    【Discussion】:
