【Posted】: 2017-09-12 18:01:39
【Question】:
This is my data format:
[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
This is the code I am using to try to count the lines per date:
# datecount.py
import sys, collections
# sys.argv is the list of command-line arguments
# sys.argv[0] is the name of the program itself
# sys.argv[1] is optional and will be the file name
# set input based on number of arguments
if len(sys.argv) == 1:
    f = sys.stdin
elif len(sys.argv) == 2:
    try:
        f = open(sys.argv[1])
    except IOError:
        print "Cannot open", sys.argv[1]
        sys.exit()
else:
    print "USAGE: python datecount [FILE]"
    sys.exit()
dateCounts = collections.Counter()
# for every line passed into the script
for line in f:
    # find indices of date section
    start = line.find("[")
    if start >= 0:
        end = line.find("]", start)
        # grab just the date
        date = line[start+21: end]  # by YEAR
        dateCounts[date] = dateCounts[date] + 1
# print top dates
for date in dateCounts.most_common():
    sys.stdout.write(str(date) + "\n")
The current output is:
('2017', 738057)
('2016', 446204)
('2015', 9995)
('2014', 706)
But I only want to count by day, for example:
('May 02 2016', 128)
('May 03 2016', 105)
('May 04 2016', 99)
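I am considering using a regular expression, but I don't know how to implement it.
How can I strip the timestamp out of the middle of the date?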
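For reference, here is a minimal sketch of the regular-expression idea. It assumes every line starts with a bracketed field shaped like the sample above (weekday, month, day, HH:MM:SS, year). The names `DATE_RE` and `count_by_day` are hypothetical, and the sample lines other than the first are made up for illustration. The pattern captures only the month, day and year, so the time is simply never included in the key.

```python
import collections
import re

# Assumed layout of the first field: "[Mon May 02 15:38:50 2016]".
# The weekday and the HH:MM:SS time are matched but not captured.
DATE_RE = re.compile(
    r"^\[\w{3}\s+(\w{3})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+(\d{4})\]"
)


def count_by_day(lines):
    """Return a Counter keyed by 'Mon DD YYYY' strings (hypothetical helper)."""
    counts = collections.Counter()
    for line in lines:
        match = DATE_RE.match(line)
        if match:
            # Join the three captured groups: month, day, year.
            counts[" ".join(match.groups())] += 1
    return counts


# Illustrative input: the first line is the sample from the question,
# the other two are invented variations of it.
sample = [
    "[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml",
    "[Mon May 02 16:01:12 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml",
    "[Tue May 03 09:15:00 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml",
]

for day, n in count_by_day(sample).most_common():
    print((day, n))
```

In the original script the same function could be fed the open file object `f` instead of `sample`. If the field width is truly fixed, a regex may not be needed at all: joining two slices, `line[start+5:start+11] + line[start+20:end]`, should produce the same `'May 02 2016'` key for the sample line.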
【Discussion】:
Tags: python python-2.7 collections sys