【Question title】: Parsing Apache logs using PostgreSQL
【Posted】: 2010-10-17 11:33:53
【Question】:

This O'Reilly article gives an example PostgreSQL statement for parsing Apache log lines:

 INSERT INTO http_log(log_date,ip_addr,record)
     SELECT CAST(substr(record,strpos(record,'[')+1,20) AS date),
            CAST(substr(record,0,strpos(record,' ')) AS cidr),
            record
 FROM tmp_apache;

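Obviously this only extracts the IP address and timestamp fields. Is there a canonical statement that extracts all fields from a typical combined log format record? If not, I will write one, and I promise to post the result here!

【Comments】:

    Tags: apache parsing postgresql logging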


    【Solution 1】:

    OK, here is my solution:

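    -- Split each raw line into nine fields with a single regular expression.
    insert into accesslog
    select m[1], m[2], m[3],
        -- Parse the date and time, then re-attach the UTC offset before casting.
        (to_char(to_timestamp(m[4], 'DD/Mon/YYYY:HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS ')
            || split_part(m[4], ' ', 2))::timestamp with time zone,
        m[5], m[6]::smallint,
        -- Apache logs '-' when no bytes were sent; store that as 0.
        (case m[7] when '-' then '0' else m[7] end)::integer,
        m[8], m[9]
    from (
        select regexp_matches(record,
            E'(.*) (.*) (.*) \\[(.*)\\] "(.*)" (\\d+) (.*) "(.*)" "(.*)"') as m
        from tmp_apache) s;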
    

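    It takes the raw log lines from the tmp_apache table and uses a regular expression to extract the fields into an array, whose elements are then cast to the target column types.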
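
The insert above presumes that an accesslog table already exists. The following is a minimal sketch of what that table might look like, plus a quick check of the regular expression against a single literal line. The column names and types are assumptions chosen to line up with the nine selected values, and the sample log line is made up for illustration.

```sql
-- Hypothetical target table: names and types are assumptions, picked to
-- match the nine values produced by the insert statement above.
create table if not exists accesslog (
    clientip  text,
    ident     text,
    username  text,
    "time"    timestamp with time zone,
    request   text,
    status    smallint,
    bytes     integer,
    referer   text,
    useragent text
);

-- Run the same regular expression on one made-up combined-format line
-- to see which text lands in which array element.
select m[1] as clientip,
       m[4] as raw_time,
       m[5] as request,
       m[6] as status,
       m[7] as bytes,
       m[8] as referer,
       m[9] as useragent
from (
    select regexp_matches(
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0"',
        E'(.*) (.*) (.*) \\[(.*)\\] "(.*)" (\\d+) (.*) "(.*)" "(.*)"') as m
) s;
```

Because every group is a greedy `(.*)`, lines whose user agent or referer contains extra double quotes or square brackets may be split differently, so it is worth spot-checking a few real lines this way before a bulk load.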

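    【Comments】:

      【Solution 2】:

      Here is my more complete solution.

      The Apache log file must not contain invalid characters or backslashes, because COPY in its default text format treats a backslash as an escape character. If necessary, you can strip such lines from the log file with the following command: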

      strings logfile | grep -v '\\' > cleanedlogfile
      

      Then copy the log file into Postgres and parse it (m[1] through m[7] correspond to the capture groups of the regular expression passed to regexp_matches):

      -- sql for postgres:
      drop table if exists rawlog;
      create table rawlog (record varchar);
      -- import data from log file
      copy rawlog from '/path/to/your/apache/cleaned/log/file';
      -- parse the rawlog into table accesslog
      drop table if exists accesslog;
      create table accesslog as
      (select m[1] as clientip,
        (to_char(to_timestamp(m[4], 'DD/Mon/YYYY:HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS ')
              || split_part(m[4], ' ',2))::timestamp with time zone as "time",
        split_part(m[5], ' ', 1) as method,
        split_part(split_part(m[5], ' ', 2), '?', 1) as uri,
        split_part(split_part(m[5], ' ', 2), '?', 2) as query,
        m[6]::smallint as status,
        m[7]::bigint as bytes
          from
      (select 
        regexp_matches(record, E'(.*) (.*) (.*) \\[(.*)\\] "(.*)" (\\d+) (\\d+)') as m 
         from rawlog) s);
      -- optionally create indexes
      create index accesslogclientipidx on accesslog(clientip);
      create index accesslogtimeidx on accesslog(time);
      create index accessloguriidx on accesslog(uri);
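
To see what the split_part calls and the timestamp conversion in the statement above do, they can be tried on literal values without loading any log file. This is only a sketch: the request string and timestamp are made-up samples, and `v`, `req` and `ts` are hypothetical names.

```sql
-- Apply the same expressions as in the create table statement to one
-- made-up request string and one made-up Apache timestamp.
select
    split_part(req, ' ', 1)                      as method,
    split_part(split_part(req, ' ', 2), '?', 1)  as uri,
    split_part(split_part(req, ' ', 2), '?', 2)  as query,
    -- Parse the date and time, then re-attach the UTC offset and cast.
    (to_char(to_timestamp(ts, 'DD/Mon/YYYY:HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS ')
        || split_part(ts, ' ', 2))::timestamp with time zone as "time"
from (values ('GET /index.html?page=2 HTTP/1.1',
              '10/Oct/2000:13:55:36 -0700')) as v(req, ts);
```

For this sample, method is GET, uri is /index.html and query is page=2; how the time column is displayed depends on the session's TimeZone setting. Note also that the regular expression in this solution requires digits for the byte count, so log lines that record '-' there produce no match and are left out of accesslog.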
      

      【Comments】:
