【Question Title】: How to filter rows between two dates from CSV file using python and redirect to another file?
【Posted】: 2019-02-04 09:26:43
【Question】:

I am new to Python. I have a CSV file containing the sample data below. I want to skip the rows whose date falls within a specific range (2018-08-01 to 2018-08-28) and redirect the output to a separate CSV file. Note that the header "LAST USE" contains a space.

NUMBER,MAIL,COMMENT,COUNT,LAST USE,PERCENTAGE,TEXTN
343,user1@example.com,"My comment","21577",2018-08-06,80.436%,
222,user2@example.com,"My comment","31181",2018-07-20,11.858%,
103,user3@example.com,"My comment",540,2018-06-14,2.013%,
341,user4@example.com,"My comment",0,N/A,0.000%,

Any ideas would be greatly appreciated.

【Comments】:

    Tags: python python-3.x pandas csv datetime


    【Solution 1】:

    With Pandas, this is straightforward:

    import pandas as pd
    
    # read file
    df = pd.read_csv('file.csv')
    
    # convert to datetime
    df['LAST USE'] = pd.to_datetime(df['LAST USE'])
    
    # calculate mask
    mask = df['LAST USE'].between('2018-08-01', '2018-08-28')
    
    # output masked dataframes
    df[~mask].to_csv('out1.csv', index=False)
    df[mask].to_csv('out2.csv', index=False)
    

    You can also combine Boolean arrays to construct the mask. For example:

    m1 = df['LAST USE'] >= (pd.to_datetime('now') - pd.DateOffset(days=30))
    m2 = df['LAST USE'] <= pd.to_datetime('now')
    mask = m1 & m2
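
To make the behaviour of the mask concrete, here is a minimal sketch that runs the same steps on the sample rows from the question. It reads from an in-memory string instead of `file.csv`, and the names `SAMPLE`, `in_range` and `out_of_range` are made up for the illustration. `errors="coerce"` is added as a precaution so that any unparseable date becomes `NaT` rather than raising.

```python
import io

import pandas as pd

# Sample rows copied from the question.
SAMPLE = '''NUMBER,MAIL,COMMENT,COUNT,LAST USE,PERCENTAGE,TEXTN
343,user1@example.com,"My comment","21577",2018-08-06,80.436%,
222,user2@example.com,"My comment","31181",2018-07-20,11.858%,
103,user3@example.com,"My comment",540,2018-06-14,2.013%,
341,user4@example.com,"My comment",0,N/A,0.000%,
'''

df = pd.read_csv(io.StringIO(SAMPLE))

# Unparseable values become NaT instead of raising an error.
df["LAST USE"] = pd.to_datetime(df["LAST USE"], errors="coerce")

# between() includes both endpoints by default; NaT rows evaluate to False.
mask = df["LAST USE"].between("2018-08-01", "2018-08-28")

in_range = df[mask]        # rows dated inside the range
out_of_range = df[~mask]   # everything else, including the N/A row

print(in_range["NUMBER"].tolist())      # [343]
print(out_of_range["NUMBER"].tolist())  # [222, 103, 341]
```

Because a missing date never satisfies the comparison, the `N/A` row ends up in the `~mask` output, which is the file the question asks for.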
    

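    【Discussion】:

    • Absolutely perfect, sir. Just one query: how can I make the dates dynamic? I tried mask = df['LAST USE'].between(date.today() - timedelta(30), date.today()) but it does not seem to work.
    • @SamironMallick, there are many ways to solve that problem; have a look at other answers. If you are still stuck, please ask a new question showing exactly where you are stuck.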
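
One possible approach to the dynamic range asked about above, offered as a sketch rather than the commenter's intended answer: build the bounds as `pd.Timestamp` values instead of `datetime.date` objects, since comparing a datetime column against plain `date` objects may not behave as expected, depending on the pandas version. The helper name `last_n_days_mask` and its `today` parameter are hypothetical; `today` exists only so the example gives a repeatable result.

```python
import pandas as pd


def last_n_days_mask(dates: pd.Series, days: int = 30, today=None) -> pd.Series:
    """Boolean mask for dates within the last `days` days, both ends included."""
    # Normalise to midnight so the bounds are whole calendar days.
    end = (pd.Timestamp.today() if today is None else pd.Timestamp(today)).normalize()
    start = end - pd.Timedelta(days=days)
    return dates.between(start, end)


# Small made-up column; None stands in for the N/A row.
dates = pd.to_datetime(pd.Series(["2018-08-06", "2018-07-20", "2018-06-14", None]))

# Pin "today" to 2018-08-28, so the window is 2018-07-29 .. 2018-08-28.
mask = last_n_days_mask(dates, days=30, today="2018-08-28")
print(mask.tolist())  # [True, False, False, False]
```

Calling `last_n_days_mask(df['LAST USE'])` without `today` uses the current date, and the resulting mask can be passed to `df[~mask]` and `df[mask]` exactly as in the answer.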
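    【Solution 2】:

    DictReader documentation: https://docs.python.org/3/library/csv.html#csv.DictReader

    strptime documentation: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

    Basically, we first open the CSV file as a sequence of Python dictionaries, one per row, and then iterate over all rows in the CSV.

    For each row, we convert the date string into an actual datetime object, which Python can then compare against your date range. If the value is within the range, we write the whole row to a separate CSV file.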

    import csv
    import datetime
    
    # bounds of the requested date range (both ends included)
    start = datetime.datetime(2018, 8, 1)
    end = datetime.datetime(2018, 8, 28)
    
    # open the input read-only and the output once, instead of reopening it for every row;
    # newline="" is the mode the csv module documentation recommends
    with open("input.csv", "r", newline="") as fin, open("output.csv", "w", newline="") as fout:
        # create a CSV dictionary reader object
        csv_dreader = csv.DictReader(fin)
        # reuse the input header for the output file and write it once
        csv_writer = csv.DictWriter(fout, fieldnames=csv_dreader.fieldnames)
        csv_writer.writeheader()
        # iterate over all rows in the CSV dict reader
        for row in csv_dreader:
            # skip invalid date values
            if 'N/A' in row['LAST USE']:
                continue
            # convert the date string to a datetime object
            datetime_val = datetime.datetime.strptime(row['LAST USE'], '%Y-%m-%d')
            # write the row if the date falls within the requested range
            if start <= datetime_val <= end:
                csv_writer.writerow(row)
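
The code in this answer writes the rows that fall inside the range, whereas the question asks to skip them. A minimal sketch of the inverse, using only the standard library, could look like the following. The function name `copy_rows_outside_range` is made up for the illustration, and keeping rows with unparseable dates such as `N/A` is an assumption about what the asker wants.

```python
import csv
import datetime
import io


def copy_rows_outside_range(fin, fout, start, end, date_field="LAST USE"):
    """Copy rows whose date is NOT within [start, end]; return the number written."""
    reader = csv.DictReader(fin)
    writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
    writer.writeheader()
    written = 0
    for row in reader:
        try:
            value = datetime.datetime.strptime(row[date_field], "%Y-%m-%d").date()
        except (TypeError, ValueError):
            # Missing or unparseable dates such as "N/A" cannot be in the range.
            value = None
        if value is not None and start <= value <= end:
            continue  # inside the range: skip this row
        writer.writerow(row)
        written += 1
    return written


# Sample rows from the question, held in memory instead of a file.
sample = io.StringIO(
    "NUMBER,MAIL,COMMENT,COUNT,LAST USE,PERCENTAGE,TEXTN\n"
    '343,user1@example.com,"My comment","21577",2018-08-06,80.436%,\n'
    '222,user2@example.com,"My comment","31181",2018-07-20,11.858%,\n'
    '103,user3@example.com,"My comment",540,2018-06-14,2.013%,\n'
    '341,user4@example.com,"My comment",0,N/A,0.000%,\n'
)
out = io.StringIO()
count = copy_rows_outside_range(
    sample, out, datetime.date(2018, 8, 1), datetime.date(2018, 8, 28)
)
print(count)  # 3
print(out.getvalue())
```

With real files, pass handles opened with `newline=""`, for example `open("input.csv", newline="")` and `open("output.csv", "w", newline="")`.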
    

    【Discussion】:
