【Title】: Python GAE - How to export data from a backup to Big Query programmatically?
【Posted】: 2016-05-18 19:34:35
【Question】:

I've been googling for quite a while, but I can't find a way to export my backups (sitting in a Cloud Storage bucket) to Big Query without doing it manually...

Is there a way to do this?

Thanks a lot!

【Discussion】:

    Tags: python google-app-engine google-bigquery


    【Solution 1】:

    You should be able to do this through the python-bigquery API.

    First, you need to connect to the BigQuery service. This is the code I use to do that:

    import httplib2
    from apiclient.discovery import build
    from oauth2client.client import SignedJwtAssertionCredentials

    class BigqueryAdapter(object):
        def __init__(self, **kwargs):
            self._project_id = kwargs['project_id']
            self._key_filename = kwargs['key_filename']
            self._account_email = kwargs['account_email']
            self._dataset_id = kwargs['dataset_id']
            self.connector = None
            self.start_connection()
    
        def start_connection(self):
            # Read the service account's private key and build an
            # authorized BigQuery service object.
            with open(self._key_filename) as key_file:
                key = key_file.read()
            credentials = SignedJwtAssertionCredentials(self._account_email,
                                                        key,
                                                        ('https://www.googleapis' +
                                                         '.com/auth/bigquery'))
            authorization = credentials.authorize(httplib2.Http())
            self.connector = build('bigquery', 'v2', http=authorization)
    

    After that, you can use self.connector to run jobs (in this answer you will find some examples).

    To load a backup from Google Cloud Storage, you have to define the configuration like this:

    body = {
        "configuration": {
            "load": {
                "sourceFormat": "DATASTORE_BACKUP",  # or "CSV", "NEWLINE_DELIMITED_JSON", "AVRO"
                "fieldDelimiter": ",",  # only used when sourceFormat is "CSV"
                "destinationTable": {
                    "projectId": "your_project_id",
                    "datasetId": "your_dataset_id",
                    "tableId": "your_table_to_save_the_data",
                },
                "writeDisposition": "WRITE_TRUNCATE",  # or "WRITE_APPEND"
                "sourceUris": [
                    # The path to your backup in Google Cloud Storage, e.g.
                    # "gs://bucket_name/filename*". Note that you can use the '*' wildcard.
                ],
                "schema": {  # [Optional] The schema for the destination table. It can be
                             # omitted if the table already exists, or if you're loading
                             # data from a Google Cloud Datastore backup.
                    "fields": [  # Describes the fields in a table.
                        {
                            "fields": [  # [Optional] The nested schema fields if type is RECORD.
                                # Objects with schema name: TableFieldSchema
                            ],
                            "type": "A String",  # [Required] STRING, BYTES, INTEGER, FLOAT, BOOLEAN, TIMESTAMP or RECORD (RECORD means the field contains a nested schema).
                            "description": "A String",  # [Optional] Maximum length is 16K characters.
                            "name": "A String",  # [Required] Letters (a-z, A-Z), numbers (0-9) or underscores (_) only; must start with a letter or underscore; maximum length is 128 characters.
                            "mode": "A String",  # [Optional] NULLABLE, REQUIRED or REPEATED. Defaults to NULLABLE.
                        },
                    ],
                },
            },
        },
    }
    
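    For a Datastore backup specifically, most of the optional fields above can be dropped, since BigQuery reads the schema from the backup files themselves. Here's a minimal sketch of a helper that builds just the required body — the function name `make_backup_load_body` and its defaults are my own illustration, not part of the BigQuery API:

```python
def make_backup_load_body(project_id, dataset_id, table_id, source_uris,
                          write_disposition='WRITE_TRUNCATE'):
    """Build the request body for loading a Datastore backup into BigQuery.

    A DATASTORE_BACKUP load needs no schema or fieldDelimiter: BigQuery
    reads the schema from the backup files in Cloud Storage.
    """
    return {
        'configuration': {
            'load': {
                'sourceFormat': 'DATASTORE_BACKUP',
                'sourceUris': source_uris,
                'writeDisposition': write_disposition,
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': table_id,
                },
            }
        }
    }
```
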

    Then run:

    self.connector.jobs().insert(body=body).execute()
    
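    Note that `jobs().insert()` only starts the load; the job itself runs asynchronously. Below is a minimal polling sketch — the `wait_for_job` helper is my own, assuming a `connector` built as in the class above:

```python
import time

def wait_for_job(connector, project_id, job_id, poll_interval=5):
    """Poll a BigQuery job until its state is DONE.

    Raises RuntimeError if the finished job reports an error.
    """
    while True:
        job = connector.jobs().get(projectId=project_id,
                                   jobId=job_id).execute()
        if job['status']['state'] == 'DONE':
            if 'errorResult' in job['status']:
                raise RuntimeError(job['status']['errorResult'])
            return job
        time.sleep(poll_interval)
```

    The job id to poll for comes back from the insert call as `response['jobReference']['jobId']`.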

    Hope this is what you were looking for. Let us know if you run into any problems.

    【Discussion】:
