【Question title】: Loading JSON from S3 to Redshift
【Posted】: 2020-05-18 19:15:35
【Question】:

I have the following JSON data in an S3 bucket:

{
"campaigns": [
{"campaign_reach": 123456, 
"campaign_spend": 123456.0, 
"campaign_goal": 12345678, 
"id": "cda05a432b3b44c18c009a4a961f644a", 
"campaign_name": "Campaign1", 
"publisher_name": "PublisherA", 
"campaign_impressions": 123456}], 
"line_items": [], 
"podcasts": [
{"podcast_name": "PodcastA", "id": "86edbca2dc644ba8960c8f4bd55bdc19"}, 
{"podcast_name": "PodcastB", "id": "fc3f2dc4c20949edaaf2186613ec7e47"}]
}

I am using COPY to load the "campaigns" section into a table in Redshift.

I tried loading it with a jsonpaths file:

query_copy = """copy myschema.campaigns
from 's3://mybucket/mapping.json'
credentials 'aws_access_key_id=""" + acc + """;aws_secret_access_key=""" + sh + """'
json 's3://mybucket/campaign_jsonpaths.json'
;"""

My jsonpaths file, "campaign_jsonpaths.json":

{
    "jsonpaths": [
        "$['id']",
        "$['campaign_name']",
        "$['campaign_reach'][0]",
        "$['campaign_spend']",
        "$['campaign_goal']",
        "$['campaign_impressions']",
        "$['publisher_name']",
    ]
}

I also tried json 'auto':

query_copy = """copy myschema.campaigns
from 's3://mybucket/mapping.json'
credentials 'aws_access_key_id=""" + acc + """;aws_secret_access_key=""" + sh + """'
json 'auto'
;"""

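Both commands run successfully, but the table in Redshift is empty. There are no errors in stl_load_errors.

I found a similar post here, but it has no answers: Redshift: copy command Json data from s3

Any help would be greatly appreciated.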

【Comments】:

    Tags: json amazon-s3 amazon-redshift


    【Solution 1】:

    I was able to load the table successfully by doing the following:

    1. Created the campaigns table based on your JSON data:

      create table campaigns (
          id varchar(100),
          campaign_name varchar(100),
          campaign_reach int,
          campaign_spend float,
          campaign_goal int,
          campaign_impressions int,
          publisher_name varchar(100)
      );

    2. Created a mapping.json file containing your JSON data.

    3. Created a campaigns_jsonpaths.json file as follows:

      {
          "jsonpaths": [
              "$['campaigns'][0]['id']",
              "$['campaigns'][0]['campaign_name']",
              "$['campaigns'][0]['campaign_reach']",
              "$['campaigns'][0]['campaign_spend']",
              "$['campaigns'][0]['campaign_goal']",
              "$['campaigns'][0]['campaign_impressions']",
              "$['campaigns'][0]['publisher_name']"
          ]
      }

    4. Ran the COPY:

      copy campaigns
      from 's3://<bucket>/mapping.json'
      iam_role 'arn:aws:iam::1234567890:role/Redshift-Role'
      json 's3://<bucket>/campaigns_jsonpaths.json';

    The record was loaded into the campaigns table successfully.
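
The difference between the two jsonpaths files is where each path starts: `$` refers to the root of the JSON object, and in your file the campaign fields sit under `campaigns[0]`, not at the root. The sketch below is a simplified Python model of that lookup, not Redshift's actual implementation; `resolve` is a hypothetical helper written only for this illustration, and it treats an unmatched path as `None`. It can be used to check a jsonpaths file against a sample document before running COPY.

```python
import json
import re

# Sample document from the question (same keys and values).
DOC = json.loads("""
{
  "campaigns": [
    {"campaign_reach": 123456,
     "campaign_spend": 123456.0,
     "campaign_goal": 12345678,
     "id": "cda05a432b3b44c18c009a4a961f644a",
     "campaign_name": "Campaign1",
     "publisher_name": "PublisherA",
     "campaign_impressions": 123456}],
  "line_items": [],
  "podcasts": [
    {"podcast_name": "PodcastA", "id": "86edbca2dc644ba8960c8f4bd55bdc19"},
    {"podcast_name": "PodcastB", "id": "fc3f2dc4c20949edaaf2186613ec7e47"}]
}
""")

# Matches one bracket step: ['key'] or [index].
_STEP = re.compile(r"\['([^']*)'\]|\[(\d+)\]")


def resolve(path, doc):
    """Hypothetical helper: walk a bracket-notation path from the root.

    Returns None as soon as a step does not match the document.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node = doc
    for key, index in _STEP.findall(path[1:]):
        if index:
            # Array step: the current node must be a list that is long enough.
            if not isinstance(node, list) or int(index) >= len(node):
                return None
            node = node[int(index)]
        else:
            # Key step: the current node must be an object containing the key.
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
    return node


# Paths in the style of the question: they start at the root, where the
# only keys are "campaigns", "line_items" and "podcasts".
original = ["$['id']", "$['campaign_name']", "$['campaign_spend']"]

# Paths in the style of step 3: they descend into campaigns[0] first.
fixed = [
    "$['campaigns'][0]['id']",
    "$['campaigns'][0]['campaign_name']",
    "$['campaigns'][0]['campaign_spend']",
]

print([resolve(p, DOC) for p in original])  # [None, None, None]
print([resolve(p, DOC) for p in fixed])
```

Note that the paths in step 3 address only the first element of the array, `campaigns[0]`.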

    【Comments】:
