【Question title】: Convert for loop which calls different functions to a generator
【Posted】: 2018-12-06 15:44:50
【Question】:

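I am reading a list of CSV files, performing calculations on them, and writing the output to disk. The dataset is large (2 GB of CSV on a machine with 16 GB of RAM), the calculations are expensive, and the output is also large. I would therefore like to use a generator so that I can write one output file at a time. The functions I use are large, so they are not pasted here in full. I think a generator with a final for loop could do the job.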

import os
import pandas as pd

# read csv with indexing, usecols, dtypes ~ 1 DF is approx. 50 MB in memory
def reader_1sec_csv(filepath,.....):
    return pd.read_csv(filepath,........)

# processing df
def booleans(dataframe):
    dataframe = dataframe[....]
    ....
    ....
    return dataframe

# processing, row-by-row operation
def activity(row):
    val = "Unknown"
    ...
    ...
    return val

# apply the functions to a single filepath and write the result to outdir
# (named csv_to_outcsv to match the calls below)
def csv_to_outcsv(filepath, outdir=".."):
    dataframe = booleans(reader_1sec_csv(filepath))
    dataframe["Activity"] = dataframe.apply(activity, axis=1)
    out = dataframe[["Activity"]]
    out.to_csv(os.path.join(outdir, os.path.splitext(os.path.basename(filepath))[0]+'_A.csv'))

# Can I make this into a generator while keeping the exceptions list?
exceptions = []
for i in filepaths:
    try:
        csv_to_outcsv(i)
    except Exception:
        # remember which file failed and move on to the next one
        exceptions.append(i)
        continue

【Comments】:

    Tags: python-3.x pandas generator


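    【Solution 1】:

    I am appending to an exceptions list, and I used the time module to make the program run at a fixed interval (every 5 minutes), because it is not easy to define all the exceptions inside the generator.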

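    import time

    exceptions_list = []

    def gen_out(paths):
        # process one file per step; record the paths that fail
        for i in paths:
            try:
                yield csv_to_outcsv(i)
            except Exception:
                exceptions_list.append(i)
                continue

    def periodic_work(interval):
        # create the generator once so that each step advances to the next file
        # (calling next(gen_out(filepaths)) in a loop would restart at the first file)
        for _ in gen_out(filepaths):
            time.sleep(interval)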
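
As a variation on the idea above, the generator can report failures itself rather than append to a module-level list. The following is only a minimal sketch under stated assumptions: `process_lazily`, `run`, and `fake_worker` are hypothetical names that do not appear in the question, and `fake_worker` merely stands in for `csv_to_outcsv` so that the example runs without any CSV files.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def process_lazily(
    paths: Iterable[str],
    worker: Callable[[str], object],
) -> Iterator[Tuple[str, Optional[Exception]]]:
    """Yield (path, error) per path; error is None when the worker succeeded."""
    for path in paths:
        try:
            worker(path)
        except Exception as exc:
            # Report the failure to the caller instead of swallowing it.
            yield path, exc
        else:
            yield path, None


def run(
    paths: Iterable[str],
    worker: Callable[[str], object],
    interval: float = 0.0,
) -> List[Tuple[str, Exception]]:
    """Drive the generator once and collect the (path, error) failures."""
    failures = []
    for path, error in process_lazily(paths, worker):
        if error is not None:
            failures.append((path, error))
        time.sleep(interval)  # optional pause between files
    return failures


# Hypothetical stand-in for csv_to_outcsv, used only to make the sketch runnable.
def fake_worker(path: str) -> None:
    if path.endswith("bad.csv"):
        raise ValueError("cannot parse " + path)


failures = run(["a.csv", "bad.csv", "c.csv"], fake_worker)
failed_paths = [path for path, _ in failures]
```

Because each file is processed only when the loop asks for the next item, one file is handled at a time, and the caller keeps both the failing path and the exception that caused the failure.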
    

    【Discussion】:
