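【Posted】: 2020-12-17 08:46:39
【Question】:
I am trying to convert a text log into a CSV file. The log file format is shown below. Each entry that begins with the prefix ("t=%m p=%p h=%h db=%d u=%u x=%x") should be treated as one row. An entry may contain \n and \r escape sequences.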
t=2020-08-25 15:00:00.000 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0 LOG: duration: 0.011 ms execute S_40: SELECT ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID FROM DB_LOG WHERE (ID = $1)
t=2020-08-25 15:00:00.000 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0 DETAIL: parameters: $1 = '9187372'
t=2020-08-25 15:00:00.001 +03 p=36001 h=127.0.0.1 db=test u=test_app x=0 LOG: duration: 0.005 ms bind S_1: COMMIT
t=2020-08-25 15:00:00.001 +03 p=36001 h=127.0.0.1 db=test u=test_app x=0 LOG: duration: 0.004 ms execute S_1: COMMIT
t=2020-08-25 15:00:00.001 +03 p=16205 h=127.0.0.1 db=test u=test_app x=0 LOG: duration: 0.018 ms bind S_41: INSERT INTO DB_LOG (ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)
t=2019-12-19 17:00:00.102 +03 p=58042 h= db= u= x=0 LOG: automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0
pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983
buffer usage: 90 hits, 0 misses, 0 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
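The grouping rule described above (a line that starts with the prefix opens a new entry, and any other line continues the previous one) could be sketched in Python roughly as follows. This is only a minimal sketch: the `PREFIX` pattern and the `parse_entries` helper are hypothetical names, the timestamp is assumed to always carry three fractional digits and a time zone token, and continuation lines are assumed to be joined with a single space.

```python
import re

# Hypothetical pattern for the prefix "t=%m p=%p h=%h db=%d u=%u x=%x",
# followed by a severity tag such as "LOG:" and the rest of the line.
PREFIX = re.compile(
    r"^t=(?P<t>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \S+) "
    r"p=(?P<p>\d+) h=(?P<h>\S*) db=(?P<db>\S*) u=(?P<u>\S*) x=(?P<x>\d+) "
    r"(?P<level>[A-Z]+:)(?P<msg>.*)$"
)


def parse_entries(lines):
    """Group raw log lines into entries of 8 fields each."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        m = PREFIX.match(line)
        if m:
            # A prefixed line starts a new entry: t, p, h, db, u, x, level, msg.
            entries.append(list(m.groups()))
        elif entries and line.strip():
            # Assumption: a non-prefixed line continues the previous message
            # and is joined to it with a single space.
            entries[-1][-1] += " " + line.strip()
    return entries


if __name__ == "__main__":
    sample = [
        't=2019-12-19 17:00:00.102 +03 p=58042 h= db= u= x=0 LOG: automatic '
        'vacuum of table "postgres.pgagent.pga_job": index scans: 0\n',
        "\tpages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen\n",
    ]
    for entry in parse_entries(sample):
        print(entry)
```

Empty fields such as `h=` are kept as empty strings, so every entry has the same eight columns.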
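The text that follows the prefix, usually an SQL statement, has no fixed format.
If possible, it would be ideal to drop the prefix labels. Each row should be formatted as follows: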
"2020-08-25 15:00:00.000 +03","16205","127.0.0.1","test","test_app","0","LOG:"," duration: 0.011 ms execute S_40: SELECT ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID FROM DB_LOG WHERE (ID = $1)"
"2020-08-25 15:00:00.000 +03","16205","127.0.0.1","test","test_app","0","DETAIL:"," parameters: $1 = '9187372'"
"2020-08-25 15:00:00.001 +03","36001","127.0.0.1","test","test_app","0","LOG:"," duration: 0.005 ms bind S_1: COMMIT"
"2020-08-25 15:00:00.001 +03","36001","127.0.0.1","test","test_app","0","LOG:"," duration: 0.004 ms execute S_1: COMMIT"
"2020-08-25 15:00:00.001 +03","16205","127.0.0.1","test","test_app","0","LOG:"," duration: 0.018 ms bind S_41: INSERT INTO DB_LOG (ID, EKLEME_ZAMANI, EKLEYEN_KULLANICI_ID, GORULME_DURUMU, GUNCELLEME_ZAMANI, GUNCELLEYEN_KULLANICI_ID, IP_ADRESI, ISLEM_ZAMANI, ISLEMI_YAPAN_KULLANICI_ID, METOD, PARAMETRE_JSON, UYGULAMA_ID, VERSIYON, DURUM_ID) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)"
"2019-12-19 17:00:00.102 +03","58042","","","","0","LOG:"," automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983 buffer usage: 90 hits, 0 misses, 0 dirtied avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s"
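regex101: https://regex101.com/r/R3vADD/4
However, I am not sure whether the last part of the expected row will cause problems when the CSV file is copied into the database, because the table name is wrapped in double quotes: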
" automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0pages: 0 removed, 9 remain, 0 skipped due to pins, 0 skipped frozen tuples: 0 removed, 493 remain, 472 are dead but not yet removable, oldest xmin: 20569983 buffer usage: 90 hits, 0 misses, 0 dirtied avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s"
Thanks, everyone.
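For reference, the double-quote concern can be illustrated with Python's standard `csv` module, which doubles any quote character that appears inside a quoted field. PostgreSQL's `COPY ... CSV` uses `"` as the default quote character and, by default, the same character as the escape, so a doubled quote is read back as a single one. This is a hedged sketch rather than a complete converter: `to_csv` is a hypothetical helper name and the row is taken from the sample log.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows with every field quoted; embedded quotes are doubled."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


row = [
    "2019-12-19 17:00:00.102 +03", "58042", "", "", "", "0", "LOG:",
    ' automatic vacuum of table "postgres.pgagent.pga_job": index scans: 0',
]
# The table name is written as ""postgres.pgagent.pga_job"" inside the field.
print(to_csv([row]), end="")
```

Reading the output back with `csv.reader` returns the original field with single double quotes, which is a quick way to check the file before loading it.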
【Discussion】: