Answers: Find the first line containing a time stamp after a specific line in a file

【Question title】: Find the first line containing a time stamp after a specific line in a file
【Posted】: 2015-07-16 08:35:14
【Question】:

I am trying to add the time stamp from the file to my search results.

My code is:

def findIcommingStats():
    #read the result file
    replication_file = open("result.log", "r")

    #create a new temp file for all the prints we will find
    tempFile = open("incomingTemp.txt", "w")

    #loop over the file and move all relevant lines to another temp file
    for line in replication_file:
            if ((line.find('STATISTICS') >= 0) & ( line.find('DeltaMarkerIncomingData') > 0 ) & ( line.find('Counter') == -1  ) &
                     ( line.find('0.00e+00') == -1 ) & ( line.find('0.00') == -1 ) & ( line.find('description') == -1 ) ):
                            tempFile.write(line)
    #cleanup
    replication_file.close()
    tempFile.close()

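This gives me the strings I am searching for in the file, such as: "STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 8582 sec window: Rate: 3.53e-06 MB/sec"

The time stamp is about 20-30 lines before this. How can I get it printed on the same line, in front of the string?

The time stamp looks like "2015/07/08 10:08:00.079".

The file looks like: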

2015/07/08 10:14:46.971 - #2 - 4080/4064 - AccumulatorManager: ProcessID= RAW STATS:

<statistics>

STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 924 sec window: Rate: 0.00e+00 MB/sec
STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 8582 sec window: Rate: 3.53e-06 MB/sec
STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 63612 sec window: Rate: 4.23e-06 MB/sec

<more statistics>

I want to take the time stamp from the RAW STATS line, so that the output looks like:

2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 924 sec window: Rate: 0.00e+00 MB/sec

2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 8582 sec window: Rate: 3.53e-06 MB/sec

【Comments】:

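You are using the bitwise AND operator (&). That is not what you want; use and instead. See e.g. bool(1 and 2) and bool(1 & 2).
I changed '&' to 'and'. It gave the same result. What does it change? What is the difference between bool(1 and 2) and bool(1 & 2)?
Of course it gives the same result. It happens to work for your example, but I gave two examples where the results differ. You are programming (bad) C here, not Python. In C the && operator exists for the same reason, and Python's equivalent is and. But this is only a remark on your actual code, not an attempt to solve your problem. That will have to wait for another answer/comment.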
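To make the difference raised in these comments concrete, here is a small sketch (the values are chosen only for illustration). `and` is the logical operator and short-circuits, while `&` works on the bits of its operands and binds more tightly than comparisons, which is why the condition in the question only works with every comparison wrapped in parentheses.

```python
# Logical `and` returns the last operand evaluated; both operands are truthy here.
logical = bool(1 and 2)

# Bitwise `&` combines the bits: 0b01 & 0b10 == 0.
bitwise = bool(1 & 2)

# Without parentheses, `&` is applied before `==`,
# so this is the chained comparison 1 == (1 & 2) == 2.
unparenthesized = (1 == 1 & 2 == 2)

print(logical, bitwise, unparenthesized)  # True False False
```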
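If you are looking for time stamps, you need a criterion for finding them. Can you provide more data? From what you have written, I don't think we can use those time stamps...
@adrianus I think something like: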

Tags: python string search

【Solution 1】:

This should basically do the job:

def stat_entry(line):
    return line.startswith('STATISTICS')

def date_entry(line):
    return line.startswith('20')

def findIcommingStats():
    date = ''
    with open("result.log", "r") as replication_file:
        with open("incomingTemp.txt", "w") as tempFile:
            for line in replication_file:
                if date_entry(line):
                    date = ' '.join(line.split(' ')[:2]) # set new date
                elif stat_entry(line):
                    tempFile.write(date  + ' ' + line) # write to tempfile

findIcommingStats()

Output:

2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData...
2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData...
2015/07/08 10:14:46.971 STATISTICS: name=gridDeltaMarkerIncomingData...

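As you can see, I factored out the stat_entry and date_entry functions; you will probably want to change them and add better criteria for deciding whether a given line is a date or a statistics entry.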
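One possible way to tighten those criteria is sketched below. It assumes the time stamp always sits at the start of a line in the format shown in the question, and it reuses the substring filters from the question's own code. The names TIMESTAMP_RE, EXCLUDED and stamp_stats are illustrative and not part of the original answer; here date_entry returns the time stamp itself rather than a boolean.

```python
import re

# Assumed format, taken from the sample in the question: 2015/07/08 10:14:46.971
TIMESTAMP_RE = re.compile(r'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}')

# Substrings that the question's filter rejects.
EXCLUDED = ('Counter', '0.00', 'description')


def date_entry(line):
    """Return the leading time stamp, or None if the line does not start with one."""
    match = TIMESTAMP_RE.match(line)
    return match.group() if match else None


def stat_entry(line):
    """True for the DeltaMarkerIncomingData statistics lines the question keeps."""
    return (line.startswith('STATISTICS')
            and 'DeltaMarkerIncomingData' in line
            and not any(text in line for text in EXCLUDED))


def stamp_stats(lines):
    """Yield each wanted statistics line prefixed with the latest time stamp seen."""
    date = ''
    for line in lines:
        found = date_entry(line)
        if found:
            date = found
        elif stat_entry(line):
            yield date + ' ' + line
```

With the files from the question this could be used as tempFile.writelines(stamp_stats(replication_file)). Like the question's code, it drops lines containing 0.00; remove that entry from EXCLUDED to keep the zero-rate lines as well.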

【Discussion】:

【Solution 2】:

You can use regular expressions to solve this.

First you need to find the time stamp:

import re
regexTimeStamp = re.compile(r'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}')

Then you can use

match = regexTimeStamp.match(Str)

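Here Str is one line of the file. If the line starts with a time stamp, match is not None and TimeStamp = match.group() gives you the time stamp.

Now use a regular expression in the same way to find the statistics entry: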

regexStat = re.compile('STATISTICS:')

match1 = regexStat.search(Str)
match1.start()

This gives you the start index of STATISTICS: in the line; you can insert your time stamp in front of it.
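Put together, the steps above might look like the following sketch. The function name add_timestamps and the sample list are illustrative assumptions; the sketch also assumes that each time stamp starts its line and that a statistics line should only be emitted once a time stamp has been seen.

```python
import re

regexTimeStamp = re.compile(r'\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}')
regexStat = re.compile(r'STATISTICS:')


def add_timestamps(lines):
    """Return the STATISTICS lines, each prefixed with the most recent time stamp."""
    result = []
    TimeStamp = None
    for Str in lines:
        match = regexTimeStamp.match(Str)   # anchored at the start of the line
        if match:
            TimeStamp = match.group()
            continue
        match1 = regexStat.search(Str)      # may occur anywhere in the line
        if match1 and TimeStamp is not None:
            index = match1.start()
            result.append(Str[:index] + TimeStamp + ' ' + Str[index:])
    return result


sample = [
    '2015/07/08 10:14:46.971 - #2 - 4080/4064 - AccumulatorManager: ProcessID= RAW STATS:',
    '',
    'STATISTICS: name=gridDeltaMarkerIncomingData kVolSlot=0 GroupCopy(26764 SiteUID(0x3d1d0445) 0) 0 8582 sec window: Rate: 3.53e-06 MB/sec',
]
for stamped in add_timestamps(sample):
    print(stamped)
```

Lines that match neither pattern are skipped, so the 20-30 lines between the RAW STATS header and the statistics do not affect the result.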

here is a guide on regex

and here is for hit and try

【Discussion】: