【Posted】:2021-04-22 13:21:51
【Question】:
I have data of this kind in S3:
{"version":"0","id":"c1d9e9a4-25a2-a0d8-2fa4-b062efec98c4","detail-type":"OneTypeee","source":"OneSource","account":"123456789","time":"2021-01-17T12:35:17Z","region":"eu-central-1","resources":[],"detail":{"Key1":"Value1"}}
{"version":"0","id":"c13879a4-2h32-a0d8-9m33-b03jsh3cxxj4","detail-type":"OtherType","source":"SomeMagicSource","account":"123456789","time":"2021-01-17T12:36:17Z","region":"eu-central-1","resources":[],"detail":{"Key2":"Value2", "Key22":"Value22"}}
{"version":"0","id":"gi442233-3y44a0d8-9m33-937rjd74jdddj","detail-type":"MoreTypes","source":"SomeMagicSource2","account":"123456789","time":"2021-01-17T12:45:17Z","region":"eu-central-1","resources":[],"detail":{"MagicKey":"MagicValue", "Foo":"Bar"}}
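Note that I added the line breaks above only for readability. In reality, Kinesis Firehose writes these batches without any newline characters, so the JSON objects sit back to back in a single line.
When I run an AWS Glue crawler on this data, it only picks up the first JSON object and nothing else. I know this because every Athena SQL query I run returns exactly one result (the first record).
How can I get the Glue crawler to crawl this data correctly and create the right schema, so that I can query all of the records?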
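To make the data shape concrete, here is a minimal sketch of how the back-to-back objects could be split apart and rewritten one per line (newline-delimited JSON), the layout in which each record occupies its own row. It is an illustration under assumptions, not a Glue or Firehose setting: the function names `split_concatenated_json` and `to_json_lines` are hypothetical, and the sample string is a shortened stand-in for the records above. It relies only on the standard library's `json.JSONDecoder.raw_decode`, which parses one value and reports where it ended.

```python
import json
from typing import Any, List


def split_concatenated_json(blob: str) -> List[Any]:
    """Parse a string of JSON values written back to back (hypothetical helper)."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(blob):
        # raw_decode does not accept leading whitespace, so skip it manually.
        while pos < len(blob) and blob[pos].isspace():
            pos += 1
        if pos >= len(blob):
            break
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(blob, pos)
        records.append(obj)
    return records


def to_json_lines(blob: str) -> str:
    """Re-serialize the records one per line (hypothetical helper)."""
    return "".join(
        json.dumps(record, separators=(",", ":")) + "\n"
        for record in split_concatenated_json(blob)
    )


if __name__ == "__main__":
    # Shortened stand-in for a Firehose batch with no newlines between objects.
    sample = '{"id":"1","detail":{"Key1":"Value1"}}{"id":"2","detail":{"Foo":"Bar"}}'
    print(to_json_lines(sample), end="")
```

Where such a step would run (for example, a transformation before the objects reach S3, or a one-off reprocessing job over existing files) is left open here and depends on the pipeline.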
【Comments】:
Tags: amazon-web-services amazon-s3 aws-glue amazon-athena amazon-kinesis-firehose