【Posted】: 2016-05-18 19:34:35
【Question】:
I have been googling for a long time, but I haven't found a way to export my backups (stored inside a bucket) to BigQuery without doing it manually...
Is this possible?
Many thanks!
【Discussion】:
Tags: python google-app-engine google-bigquery
You should be able to do this through the python-bigquery API.
First, you need to connect to the BigQuery service. This is the code I use to do so:
import httplib2
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials


class BigqueryAdapter(object):
    def __init__(self, **kwargs):
        self._project_id = kwargs['project_id']
        self._key_filename = kwargs['key_filename']
        self._account_email = kwargs['account_email']
        self._dataset_id = kwargs['dataset_id']
        self.connector = None
        self.start_connection()

    def start_connection(self):
        # Read the service account's private key and build an authorized
        # BigQuery v2 client from it.
        with open(self._key_filename) as key_file:
            key = key_file.read()
        credentials = SignedJwtAssertionCredentials(
            self._account_email,
            key,
            'https://www.googleapis.com/auth/bigquery')
        authorization = credentials.authorize(httplib2.Http())
        self.connector = build('bigquery', 'v2', http=authorization)
After that, you can use self.connector to run jobs (in this answer you will find some examples).
To load your backup from Google Cloud Storage, you have to define a configuration like this:
body = {
    "configuration": {
        "load": {
            "sourceFormat": "",  # Either "CSV", "DATASTORE_BACKUP", "NEWLINE_DELIMITED_JSON" or "AVRO".
            "fieldDelimiter": ",",  # (if it's comma separated)
            "destinationTable": {
                "projectId": "",  # your_project_id
                "datasetId": "",  # your_dataset_id
                "tableId": "",    # your_table_to_save_the_data
            },
            "writeDisposition": "",  # "WRITE_TRUNCATE" or "WRITE_APPEND"
            "sourceUris": [
                # The path to your backup in Google Cloud Storage. It could be
                # something like "gs://bucket_name/filename*". Notice you can
                # use the "*" wildcard.
            ],
            "schema": {  # [Optional] The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.
                "fields": [  # Describes the fields in a table.
                    {
                        "name": "A String",  # [Required] The field name. The name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore. The maximum length is 128 characters.
                        "type": "A String",  # [Required] The field data type. Possible values include STRING, BYTES, INTEGER, FLOAT, BOOLEAN, TIMESTAMP or RECORD (where RECORD indicates that the field contains a nested schema).
                        "mode": "A String",  # [Optional] The field mode. Possible values include NULLABLE, REQUIRED and REPEATED. The default value is NULLABLE.
                        "description": "A String",  # [Optional] The field description. The maximum length is 16K characters.
                        "fields": [  # [Optional] Describes the nested schema fields if the type property is set to RECORD.
                            # Object with schema name: TableFieldSchema
                        ],
                    },
                ],
            },
        },
    },
}
Then run:
self.connector.jobs().insert(projectId=self._project_id, body=body).execute()
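To tie the pieces together, here is a minimal sketch assuming a Datastore backup (so no fieldDelimiter or schema is needed). The helper names make_backup_load_body and wait_for_job are my own, not part of the API; the insert response carries the job id under job['jobReference']['jobId'], which you can poll with jobs().get():

```python
import time


def make_backup_load_body(project_id, dataset_id, table_id, source_uri):
    """Assemble a load-job body for a Datastore backup file in GCS."""
    return {
        "configuration": {
            "load": {
                # DATASTORE_BACKUP files carry their own schema, so neither
                # "fieldDelimiter" nor "schema" is required here.
                "sourceFormat": "DATASTORE_BACKUP",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "writeDisposition": "WRITE_TRUNCATE",
                "sourceUris": [source_uri],
            }
        }
    }


def wait_for_job(connector, project_id, job_id, poll_seconds=5):
    """Poll jobs().get() until the job reaches the DONE state.

    Returns True on success, False if the finished job reports an error.
    """
    while True:
        job = connector.jobs().get(projectId=project_id,
                                   jobId=job_id).execute()
        if job['status']['state'] == 'DONE':
            # "errorResult" is only present when the job failed.
            return 'errorResult' not in job['status']
        time.sleep(poll_seconds)
```

With the adapter above you would do something like: insert the job, read the id from the response, then wait_for_job(adapter.connector, project_id, job_id) before querying the new table.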
Hope this is what you were looking for. Let us know if you run into any problems.
【Discussion】: