[Title]: Parse a json file to get the right columns to insert into BigQuery
[Posted]: 2019-08-27 09:43:22
[Question]:

I'm relatively new to Python, and I'm trying to fetch some exchange-rate data from the ECB's free API:

GET https://api.exchangeratesapi.io/latest?base=GBP

I want this data to end up in a BigQuery table. Loading the data into BQ is fine, but transforming it into the right column/row format before sending it to BQ is the problem.

I'd like to end up with a table like this:

Currency    Rate      Date
CAD         1.629..   2019-08-27
HKD         9.593..   2019-08-27
ISK         152.6..   2019-08-27
...         ...       ...

I've tried a few things but haven't quite got there yet:

import requests
import json

# api-endpoint
URL = "https://api.exchangeratesapi.io/latest?base=GBP"

# sending get request and saving the response as response object
r = requests.get(url=URL)

# extracting data in json format
data = r.json()

with open('data.json', 'w') as outfile:
    json.dump(data['rates'], outfile)

a_dict = {'date': '2019-08-26'}

with open('data.json') as f:
    data = json.load(f)

data.update(a_dict)

with open('data.json', 'w') as f:
    json.dump(data, f)

print(data)
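For reference, the rates dict can also be reshaped into (Currency, Rate, Date) rows with plain Python before any BQ load — a minimal sketch using the standard csv module, with a small sample dict standing in for the parsed API response:

```python
import csv

# sample shaped like the API response (assumed structure)
data = {
    "rates": {"CAD": 1.6296861353, "HKD": 9.593490542},
    "base": "GBP",
    "date": "2019-08-23",
}

# one row per currency: (Currency, Rate, Date)
rows = [(ccy, rate, data["date"]) for ccy, rate in sorted(data["rates"].items())]

with open("rates.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Currency", "Rate", "Date"])
    writer.writerows(rows)

print(rows[0])  # ('CAD', 1.6296861353, '2019-08-23')
```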

Here is the original json file:

{  
   "rates":{  
      "CAD":1.6296861353,
      "HKD":9.593490542,
      "ISK":152.6759753684,
      "PHP":64.1305429339,
      "DKK":8.2428443501,
      "HUF":363.2604778172,
      "CZK":28.4888284523,
      "GBP":1.0,
      "RON":5.2195062629,
      "SEK":11.8475893558,
      "IDR":17385.9684034803,
      "INR":87.6742617713,
      "BRL":4.9997236134,
      "RUB":80.646191945,
      "HRK":8.1744110201,
      "JPY":130.2223254066,
      "THB":37.5852652759,
      "CHF":1.2042718318,
      "EUR":1.1055465269,
      "MYR":5.1255348081,
      "BGN":2.1622278974,
      "TRY":7.0550451616,
      "CNY":8.6717964026,
      "NOK":11.0104695256,
      "NZD":1.9192287707,
      "ZAR":18.6217151449,
      "USD":1.223287232,
      "MXN":24.3265563331,
      "SGD":1.6981194654,
      "AUD":1.8126540855,
      "ILS":4.3032293014,
      "KRW":1482.7479464473,
      "PLN":4.8146551248
   },
   "base":"GBP",
   "date":"2019-08-23"
}
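That payload parses straight into a nested dict, so the rates and the date are reachable by key — a minimal sketch with the standard json module, using an abbreviated copy of the payload above:

```python
import json

# abbreviated copy of the API payload above
raw = '{"rates": {"CAD": 1.6296861353, "GBP": 1.0}, "base": "GBP", "date": "2019-08-23"}'
data = json.loads(raw)

# the date applies to every rate in the payload
print(data["date"])          # 2019-08-23
print(data["rates"]["CAD"])  # 1.6296861353
```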

[Comments]:

    Tags: python json google-bigquery


    [Solution 1]:

    Welcome! How about this as one way to solve the problem:

    # import the pandas library so we can use its from_dict function:
    import pandas as pd
    
    # subset the json to a dict of exchange rates and country codes:
    d = data['rates']
    
    # create a dataframe from this data, using pandas from_dict function:
    df = pd.DataFrame.from_dict(d,orient='index')
    
    # add a column for date (this value is taken from the json data):
    df['date'] = data['date']
    
    # name our columns, to keep things clean
    df.columns = ['rate','date']
    

    This gives you:

        rate    date
    CAD 1.629686    2019-08-23
    HKD 9.593491    2019-08-23
    ISK 152.675975  2019-08-23
    PHP 64.130543   2019-08-23
    ...      
    

    In this case the currency is the index of the dataframe; if you'd like it as its own column, just add: df['currency'] = df.index
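    Alternatively, pandas' reset_index can promote the index to a regular column — a short sketch rebuilding the same dataframe from a two-currency sample:

```python
import pandas as pd

# sample subset of the rates dict
d = {"CAD": 1.6296861353, "HKD": 9.593490542}

df = pd.DataFrame.from_dict(d, orient="index")
df["date"] = "2019-08-23"
df.columns = ["rate", "date"]

# promote the currency index to its own column
df = df.reset_index().rename(columns={"index": "currency"})
print(df.columns.tolist())  # ['currency', 'rate', 'date']
```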

    You can then write this dataframe to a .csv file, or write it to BigQuery.

    For that, I'd suggest looking at the BigQuery client library; it can be a little hard to follow at first, so you may also want to look at pandas.DataFrame.to_gbq, which is easier but less robust (see this link for more detail on the client library vs the pandas function).

    [Comments]:

    • Thanks for your help, this worked great! Pandas got me the format I wanted nicely. I ended up writing the dataframe to csv and loading it into the BQ table without pandas. I'll post my final script in a comment below for anyone interested. And thanks for the welcome!
    [Solution 2]:

    Thanks to Ben P for the help.

    Here is my script, for anyone interested. It uses an internal library my team uses for BQ loads, but the rest is pandas and requests:

    from aa.py.gcp import GCPAuth, GCPBigQueryClient
    from aa.py.log import StandardLogger
    import requests, os, pandas as pd
    
    # Connect to BigQuery
    logger = StandardLogger('test').logger
    auth = GCPAuth(logger=logger)
    credentials_path = 'XXX'
    credentials = auth.get_credentials(credentials_path)
    gcp_bigquery = GCPBigQueryClient(logger=logger)
    gcp_bigquery.connect(credentials)
    
    # api-endpoint
    URL = "https://api.exchangeratesapi.io/latest?base=GBP"
    
    # sending get request and saving the response as response object
    r = requests.get(url=URL)
    
    # extracting data in json format
    data = r.json()
    
    # extract rates object from json
    d = data['rates']
    
    # split currency and rate for dataframe
    df = pd.DataFrame.from_dict(d,orient='index')
    
    # add date element to dataframe
    df['date'] = data['date']
    
    #column names
    df.columns = ['rate', 'date']
    
    # print dataframe
    print(df)
    
    # write dataframe to csv
    df.to_csv('data.csv', sep='\t', encoding='utf-8')
    
    #########################################
    # write csv to BQ table
    file_path = os.getcwd()
    file_name = 'data.csv'
    dataset_id = 'Testing'
    table_id = 'Exchange_Rates'
    
    response = gcp_bigquery.load_file_into_table(file_path, file_name, dataset_id, table_id, source_format='CSV', field_delimiter="\t", create_disposition='CREATE_NEVER', write_disposition='WRITE_TRUNCATE',skip_leading_rows=1)
    

    [Comments]:
