Python: delete line from CSV based on date (answers)

【Question title】: Python delete line from CSV based on date
【Posted】: 2018-12-14 17:38:15
【Question】:

I'm collecting temperature data with Python, but I only want to store the last 24 hours of data.

I'm currently generating my .csv file with this:

while True:
    tempC = mcp.temperature
    tempF = tempC * 9 / 5 + 32
    timestamp = datetime.datetime.now().strftime("%y-%m-%d %H:%M   ")

    f = open("24hr.csv", "a")
    f.write(timestamp)
    f.write(',{}'.format(tempF))
    f.write("\n")
    f.close()

The resulting .csv looks like this:

18-12-13 10:58   ,44.7125
18-12-13 11:03   ,44.6
18-12-13 11:08   ,44.6
18-12-13 11:13   ,44.4875
18-12-13 11:18   ,44.6
18-12-13 11:23   ,44.4875
18-12-13 11:28   ,44.7125

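I don't want to roll the file over; I just want to keep the last 24 hours of data. Since I sample every 5 minutes, I should end up with 144 lines in my CSV after 24 hours. If I use readlines() I can tell how many lines I have, but how do I get rid of any lines that are older than 24 hours? This is what I came up with, which obviously doesn't work. Any suggestions?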

f = open("24hr.csv","r")
lines = f.readlines()
f.close()

if lines => 144:
   f = open("24hr.csv","w")
   for line in lines:
       if line <= "timestamp"+","+"tempF"+\n":
           f.write(line)
           f.close()

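【Question comments】:

Please elaborate on "obviously doesn't work": how does it differ from what you want?
Can you show us a sample of the file?
I think your math is off: 24 hours at one sample every 5 minutes is 288 lines. Writing up a solution for you, hang on.
Yes, you're right, it should be 288 and not 144.

Tags: python timestamp readlines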

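【Answer 1】:

You've already done most of the work. I have a few suggestions.

Use with. That way the file is closed properly even if your program hits an error partway through and raises an exception.
Parse the timestamp from each line of the file and compare it with the current time.
Use len to check the length of the list.

Here is the modified program: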

import datetime

with open("24hr.csv", "r") as f:
    lines = f.readlines()  # read out the contents of the file

# 24 hours at one sample every 5 minutes is 288 lines
if len(lines) >= 288:
    yesterday = datetime.datetime.now() - datetime.timedelta(days=1)
    with open("24hr.csv", "w") as f:
        for line in lines:
            line_time_string = line.split(",")[0]
            line_time = datetime.datetime.strptime(line_time_string, "%y-%m-%d %H:%M   ")

            if line_time > yesterday:  # if the line is less than 24 hours old
                f.write(line)  # write it back into the file

This code isn't very clean (it isn't fully PEP 8 compliant), but you can see the general flow.
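The timestamp comparison at the heart of that program can be checked in isolation. What follows is only a minimal sketch: the helper name is_recent, the TIME_FORMAT constant, and the now parameter are illustrative assumptions, not part of the answer. The timestamp's trailing padding is stripped before parsing.

```python
import datetime

# Same layout the question's writer uses, minus the trailing padding,
# which is stripped from the field before parsing.
TIME_FORMAT = "%y-%m-%d %H:%M"


def is_recent(line, now, max_age=datetime.timedelta(days=1)):
    """Return True if the CSV line's timestamp is newer than now - max_age.

    `is_recent` is a hypothetical helper; `now` is passed in so the
    check can be exercised with a fixed clock.
    """
    stamp = line.split(",")[0].strip()
    line_time = datetime.datetime.strptime(stamp, TIME_FORMAT)
    return line_time > now - max_age


# Fixed "current time" for the demonstration (an assumption, not real data).
now = datetime.datetime(2018, 12, 14, 11, 0)
print(is_recent("18-12-13 10:58   ,44.7125\n", now))  # just over 24 h old
print(is_recent("18-12-13 11:03   ,44.6\n", now))     # just under 24 h old
```

Passing the clock in as an argument keeps the filter independent of when it runs; in the logging loop one would pass datetime.datetime.now().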

【Comments】:

【Answer 2】:

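Are you on Linux? If you only need the last 288 lines (24 hours at one sample every 5 minutes), you could try

tail -n 288 file.csv

You can also find tail for Windows; I got one with CMDer. If you have to use Python and the file is small enough to fit in RAM, load it into a list with readlines(), slice it (lst = lst[-288:]) and write it back. If you don't know how many lines you have, parse the file with https://docs.python.org/3.7/library/csv.html, convert each time to a Python datetime (the reverse of how you originally formatted it) and write back only the lines that meet the condition.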
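The csv-module route mentioned above could look roughly like this. It is a sketch under assumptions: the function name prune_csv and the TIME_FORMAT constant are hypothetical, the first column is assumed to hold the timestamp in the question's layout, and the whole file is assumed to fit in memory.

```python
import csv
import datetime

# Timestamp layout from the question; trailing padding is stripped
# from the field before parsing.
TIME_FORMAT = "%y-%m-%d %H:%M"


def prune_csv(path, now, max_age=datetime.timedelta(hours=24)):
    """Rewrite the CSV at `path`, keeping rows newer than now - max_age.

    `prune_csv` is a hypothetical helper name. Returns the number of
    rows kept.
    """
    cutoff = now - max_age
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines

    kept = [
        row for row in rows
        if datetime.datetime.strptime(row[0].strip(), TIME_FORMAT) > cutoff
    ]

    with open(path, "w", newline="") as f:
        # Keep plain "\n" line endings, matching how the file was written.
        csv.writer(f, lineterminator="\n").writerows(kept)
    return len(kept)


# Example call (left commented out so the sketch touches no files):
# prune_csv("24hr.csv", datetime.datetime.now())
```

Filtering by timestamp rather than by line count means the result stays correct even if a sample is missed or the sampling interval changes.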

【Comments】:

【Answer 3】:

If you're on Linux or something similar, the right way to do this is to set up log rotation.

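【Comments】:

No, I specifically don't want to rotate logs, unless there's a usage I'm not familiar with. I want the last 24 hours, not 24 hours + n since the last rotation.
@SR. It's still the right way. You may need those logs later.

【Answer 4】: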

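Given that 288 lines won't take up much memory, I think it's perfectly fine to read the lines, truncate the file, and put back the lines you want: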

import datetime

# Unless you are working in a system with limited memory
# reading 288 lines isn't much
def remove_old_entries(file_):
    file_.seek(0)  # Just in case go to start
    lines = file_.readlines()[-288:]  # Read everything, keep the last 288 lines
    file_.seek(0)  # Rewind so the truncate and the writes start at offset 0
    file_.truncate()  # Empty the file
    file_.writelines(lines)  # Put back just the desired lines

    return file_

while True:
    tempC = mcp.temperature
    tempF = tempC * 9 / 5 + 32
    timestamp = datetime.datetime.now().strftime("%y-%m-%d %H:%M   ")

    with open("24hr.csv", "r+") as file_:
        file_ = remove_old_entries(file_)  # Consider that the function will return the file at the end
        file_.write('{},{}\n'.format(timestamp, tempF))

    # I hope mcp.temperature is blocking or you are sleeping out the 5min
    # else this file reading in an infinite loop will get out of hand
    # time.sleep(300)  # Call me maybe

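【Comments】:

This actually reads all the lines and then discards all but the last 288. It holds them all in memory at once.
Given the nature of the program, no more than 1 line should ever be discarded. I was going to use [1:], but that wouldn't work as expected on a new file with fewer than 288 lines; a negative slice simply takes all of them when there are fewer than 288.
I thought of a more performant approach: open the file in binary mode and seek from the end. But that relies on fixed-length lines (formatting the temperature with {:.04f}), and I'm not sure whether the OP can change that format.
You could still do that by counting \n characters (allowing for the first character read from the end to be a newline or not, in case the file is malformed) until you have 288 lines, to minimize how much of the file needs to be in memory.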
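The newline-counting idea from the last comment can be sketched as follows. This is one possible reading of it, not the commenter's code: tail_lines and block_size are hypothetical names, and the file is read backwards in fixed-size blocks until enough newlines have been seen, so lines do not need a fixed length.

```python
import os


def tail_lines(path, n, block_size=4096):
    """Return the last `n` lines of `path` as bytes, reading from the end.

    `tail_lines` and `block_size` are hypothetical names. Only the
    trailing blocks needed to find `n` lines are read into memory.
    """
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Seeing n + 1 newlines guarantees the last n lines are complete,
        # whether or not the file ends with a newline. Stop earlier only
        # when the start of the file is reached.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    return data.splitlines(keepends=True)[-n:]


# Example call (left commented out so the sketch touches no files):
# last_day = tail_lines("24hr.csv", 288)
```

For a file of a few hundred short lines the saving over readlines() is negligible; the approach only starts to matter if the file is allowed to grow large before it is trimmed.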