【Title】: Python: How to count by date without timestamp
【Posted】: 2017-09-12 18:01:39
【Question】:

This is the format of my data:

    [Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml

This is the code I wrote to try to show the number of lines per date:

    # datecount.py
    import sys, collections

    # sys.argv is the list of command-line arguments
    # sys.argv[0] is the name of the program itself
    # sys.argv[1] is optional and will be the file name

    # set input based on number of arguments
    if len(sys.argv) == 1:
        f = sys.stdin
    elif len(sys.argv) == 2:
        try:
            f = open(sys.argv[1])
        except IOError:
            print "Cannot open", sys.argv[1]
            sys.exit()
    else:
        print "USAGE: python datecount [FILE]"
        sys.exit()

    dateCounts = collections.Counter()
    # for every line passed into the script
    for line in f:
        # find indices of date section
        start = line.find("[")
        if start >= 0 :
            end = line.find("]", start)
            # grab just the date
            date = line[start+21: end] #by YEAR
            dateCounts[date]=dateCounts[date]+1

    #print top dates
    for date in dateCounts.most_common():
        sys.stdout.write(str(date) + "\n")

The current output is:

    ('2017', 738057)
    ('2016', 446204)
    ('2015', 9995)
    ('2014', 706)

But I want the counts per date only, for example:

    ('May 02 2016', 128)
    ('May 03 2016', 105)
    ('May 04 2016', 99)

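I was thinking of using a regular expression, but I don't know how to implement it.

How do I remove the timestamp from the middle of the date?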

【Comments】:

    Tags: python python-2.7 collections sys


    【Solution 1】:

    The code below gives the expected result. I hope this helps.

    # datecount.py
    import sys, collections
    
    # sys.argv is the list of command-line arguments
    # sys.argv[0] is the name of the program itself
    # sys.argv[1] is optional and will be the file name
    
    # set input based on number of arguments
    if len(sys.argv) == 1:
        f = sys.stdin
    elif len(sys.argv) == 2:
        try:
            f = open(sys.argv[1])
        except IOError:
            print "Cannot open", sys.argv[1]
            sys.exit()
    else:
        print "USAGE: python datecount [FILE]"
        sys.exit()
    
    dateCounts = collections.Counter()
    # for every line passed into the script
    for line in f:
        # find indices of date section
        start = line.find("[")
        if start >= 0 :
            end = line.find("]", start)
            # grab just the date: month and day, then the year (the time in between is skipped)
            date = line[start+5:start+11] + ' ' + line[start+21:end]  # by date and year
            dateCounts[date]=dateCounts[date]+1
    
    #print top dates
    for date in dateCounts.most_common():
        sys.stdout.write(str(date) + "\n")
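
The fixed character offsets above depend on every field having exactly the same width. A minimal sketch of an alternative, assuming each line contains a bracketed timestamp of the form `[Mon May 02 15:38:50 2016]`, is to split the bracketed text on whitespace and keep only the month, day, and year. The helper names `date_key` and `count_by_date` are hypothetical and not part of the original answer.

```python
import collections


def date_key(line):
    """Return e.g. 'May 02 2016' from '[Mon May 02 15:38:50 2016] ...', or None."""
    start = line.find("[")
    if start < 0:
        return None
    end = line.find("]", start)
    if end < 0:
        return None
    # Assumed field order: weekday, month, day, time, year
    parts = line[start + 1:end].split()
    if len(parts) != 5:
        return None
    weekday, month, day, clock, year = parts
    return "%s %s %s" % (month, day, year)


def count_by_date(lines):
    """Count lines per date, ignoring lines without a usable timestamp."""
    counts = collections.Counter()
    for line in lines:
        key = date_key(line)
        if key is not None:
            counts[key] += 1
    return counts
```

In the script above, `count_by_date(f)` could stand in for the manual loop, since a file object iterates over its lines.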
    

    【Comments】:

      【Solution 2】:

      Implemented with a regular expression:

      import sys        
      import collections
      import re
      
      dateCounts = collections.Counter()             
      input_str = """
      [Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
      [Mon May 03 15:38:50 2017] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
      [Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist: /home/XXX/XXXX/XXX/XXX/XXX.shtml
      """
      
      found = re.findall(r"\[(.*)\].*\[.*\].*\[.*\].*", input_str, re.MULTILINE)
      
      for date in found:
          dateCounts[date] = dateCounts[date] + 1
      
      for date in dateCounts.most_common(): 
          sys.stdout.write(str(date) + "\n")
      

      Output:

      ('Mon May 02 15:38:50 2016', 2)
      ('Mon May 03 15:38:50 2017', 1)
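
The pattern above captures the whole bracketed timestamp, so the time of day is still part of each key. A sketch of a pattern that matches the time but does not capture it, assuming the same line layout as the sample data, could look like the following. The names `DATE_RE`, `MONTHS`, `count_dates`, and `format_counts` are hypothetical. Keys are stored as `(year, month, day)` tuples so the result can be listed in calendar order rather than by count.

```python
import collections
import re

# Month, day and year are captured; the weekday and HH:MM:SS are matched but dropped.
DATE_RE = re.compile(
    r"^\[\w{3} (\w{3}) +(\d{1,2}) \d{2}:\d{2}:\d{2} (\d{4})\]", re.MULTILINE
)
MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()


def count_dates(text):
    """Count log lines per calendar day, keyed as (year, month_number, day)."""
    counts = collections.Counter()
    for month, day, year in DATE_RE.findall(text):
        if month in MONTHS:  # skip anything that is not an English month abbreviation
            counts[(int(year), MONTHS.index(month) + 1, int(day))] += 1
    return counts


def format_counts(counts):
    """Return rows such as "('May 02 2016', 2)" in chronological order."""
    rows = []
    for (year, month, day), n in sorted(counts.items()):
        label = "%s %02d %d" % (MONTHS[month - 1], day, year)
        rows.append(str((label, n)))
    return rows


sample = """[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist
[Mon May 03 15:38:50 2017] [error] [client XX.XX.XX.XX] File does not exist
[Mon May 02 15:38:50 2016] [error] [client XX.XX.XX.XX] File does not exist
"""

for row in format_counts(count_dates(sample)):
    print(row)
# ('May 02 2016', 2)
# ('May 03 2017', 1)
```

For a real log file, the text could be read with `open(path).read()`, or the pattern applied line by line with `DATE_RE.match(line)` to avoid loading the whole file into memory.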
      

      【Comments】:
