【Posted】: 2018-12-09 23:39:21
【Question】:
I want to extract a JSON object / dictionary from log text.
Sample log text:
2018-06-21 19:42:58 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'locations', 'CLOSESPIDER_TIMEOUT': '14400', 'FEED_FORMAT': 'geojson', 'LOG_FILE': '/geojson_dumps/21_Jun_2018_07_42_54/logs/coastalfarm.log', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'locations.spiders', 'SPIDER_MODULES': ['locations.spiders'], 'TELNETCONSOLE_ENABLED': '0', 'USER_AGENT': 'Mozilla/5.0'}
2018-06-21 19:43:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 369,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 1718,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 6, 21, 14, 13, 0, 841666),
'item_scraped_count': 4,
'log_count/INFO': 8,
'memusage/max': 56856576,
'memusage/startup': 56856576,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2018, 6, 21, 14, 12, 58, 499385)}
2018-06-21 19:43:00 [scrapy.core.engine] INFO: Spider closed (finished)
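I tried (\{.+$\}) as the regular expression, but it gave me the single-line dictionary {'BOT_NAME': 'locations', ..., 'USER_AGENT': 'Mozilla/5.0'}, which is not what I want.
The JSON/dictionary I want to extract (note: the dictionary will not always have the same keys; they may vary):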
{'downloader/request_bytes': 369,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 1718,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 6, 21, 14, 13, 0, 841666),
'item_scraped_count': 4,
'log_count/INFO': 8,
'memusage/max': 56856576,
'memusage/startup': 56856576,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2018, 6, 21, 14, 12, 58, 499385)}
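A minimal sketch, assuming the stats block always follows the `Dumping Scrapy stats:` marker and is a flat Python dict repr whose only non-literal values are `datetime.datetime(...)` calls. Because the target spans several lines, the regex needs `re.DOTALL` so that `.` also matches newlines; anchoring on the marker keeps the one-line `Overridden settings` dict out of the match, and the key set does not matter. The matched text is then parsed with the `ast` module rather than `eval`. `STATS_RE`, `extract_stats` and `SAMPLE_LOG` are hypothetical names, and the sample is a trimmed copy of the log above.

```python
import ast
import datetime
import re

# Assumption: the stats dict starts right after "Dumping Scrapy stats:" and ends at
# the first "}" that closes a line. re.DOTALL lets ".*?" cross newlines;
# re.MULTILINE makes "$" match at the end of every line.
STATS_RE = re.compile(r"Dumping Scrapy stats:\s*(\{.*?\})\s*$", re.DOTALL | re.MULTILINE)


def _is_datetime_call(node):
    """True for an AST node of the form datetime.datetime(...)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "datetime"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "datetime"
    )


def _to_python(node):
    """Like ast.literal_eval, but also accepts datetime.datetime(<literals>) values."""
    if _is_datetime_call(node):
        args = [ast.literal_eval(a) for a in node.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        return datetime.datetime(*args, **kwargs)
    if isinstance(node, ast.Dict):
        return {_to_python(k): _to_python(v) for k, v in zip(node.keys, node.values)}
    # Anything else must be a plain literal; other expressions raise ValueError.
    return ast.literal_eval(node)


def extract_stats(log_text):
    """Return the Scrapy stats dict found in log_text, or None if there is none."""
    match = STATS_RE.search(log_text)
    if match is None:
        return None
    tree = ast.parse(match.group(1), mode="eval")
    return _to_python(tree.body)


# Trimmed copy of the sample log from the question.
SAMPLE_LOG = """\
2018-06-21 19:42:58 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'locations', 'USER_AGENT': 'Mozilla/5.0'}
2018-06-21 19:43:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 369,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 6, 21, 14, 13, 0, 841666),
 'item_scraped_count': 4,
 'start_time': datetime.datetime(2018, 6, 21, 14, 12, 58, 499385)}
2018-06-21 19:43:00 [scrapy.core.engine] INFO: Spider closed (finished)
"""

if __name__ == "__main__":
    print(extract_stats(SAMPLE_LOG))
```

Walking the AST instead of calling `eval` means only literals and `datetime.datetime(...)` calls are accepted; any other expression in the log raises `ValueError` rather than being executed. Nested containers other than dicts are handed to `ast.literal_eval` as-is, so a `datetime` inside a list would not be converted.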
【Discussion】:
-
If you can extract the correct string from the log and then parse it with the json module (see stackoverflow.com/questions/4917006/…), you will get a dict object.
Tags: python regex python-3.x
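One caveat on the json suggestion: the stats block is a Python dict repr, not JSON (it uses single-quoted strings and `datetime.datetime(...)` calls), so `json.loads` rejects it. A minimal sketch, assuming a fragment that contains only literal values (the datetime entries are left out because `ast.literal_eval` does not evaluate calls): parse it with `ast.literal_eval`, then re-serialize with `json.dumps` if JSON text is what you need. `STATS_TEXT` and `python_repr_to_json` are hypothetical names.

```python
import ast
import json

# Fragment of the stats block as logged: Python repr syntax, not valid JSON.
STATS_TEXT = """{'downloader/request_bytes': 369,
 'finish_reason': 'finished',
 'item_scraped_count': 4}"""


def python_repr_to_json(text):
    """Parse a Python dict literal (single quotes allowed) and re-serialize it as JSON.

    default=str converts values json cannot encode (e.g. a set or bytes) via str().
    """
    return json.dumps(ast.literal_eval(text), default=str)


if __name__ == "__main__":
    print(python_repr_to_json(STATS_TEXT))
```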