【Question Title】: Load a JSON file from the BigQuery command line
【Posted】: 2012-09-19 08:00:21
【Question】:

Is it possible to load data from a JSON file (not just CSV) using the BigQuery command-line tool? I can load a simple JSON file through the GUI, but the command line assumes CSV, and I haven't found any documentation on how to specify JSON.

Here is the simple JSON file I'm using:

{"col":"value"}

with the schema: col:string
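For context, bq's JSON loader expects newline-delimited JSON: one complete object per line, with no enclosing array. A minimal sketch of what the data and schema files might look like (the file names data.json and schema.json are just examples):

```shell
# Newline-delimited JSON: one self-contained object per line.
printf '%s\n' '{"col":"value"}' '{"col":"another value"}' > data.json

# A JSON schema file equivalent to the text schema "col:string".
cat > schema.json <<'EOF'
[
  {"name": "col", "type": "STRING"}
]
EOF

# Show what was written.
cat data.json
```

Each line must parse as a standalone JSON object; a pretty-printed multi-line object will be rejected.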

【Comments】:

  • Additional support for JSON was announced today. googledevelopers.blogspot.com/2012/10/…
  • Great! However, the command-line API doesn't seem to have been updated since 9/11, and the current version, 2.0.9, tells me it doesn't know about a source_format flag.
  • Can we only load one partition at a time? Do we need a wrapper script to load everything?
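A wrapper along the lines the last comment suggests can be a simple shell loop. A dry-run sketch (the echo prints each command instead of running it, since executing bq needs configured credentials; all file and table names here are hypothetical):

```shell
# Dry-run wrapper: print one bq load command per partition file.
# Remove "echo" to actually execute the loads.
for f in part1.json part2.json; do
  echo bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable "$f" ./schema.json
done
```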

Tags: json google-bigquery


【Solution 1】:

As of version 2.0.12, bq allows uploading newline-delimited JSON files. Here is a sample command that does the job:

bq load --source_format NEWLINE_DELIMITED_JSON datasetName.tableName data.json schema.json

As mentioned, "bq help load" will give you all the details.
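The schema argument can also be given inline as name:type pairs instead of a schema file. A dry-run sketch with the same placeholder names as above (the echo only prints the command, since actually running it requires a configured bq and credentials):

```shell
# Same load as above, but with an inline text schema instead of schema.json.
# "echo" makes this a dry run; drop it once bq is set up.
echo bq load --source_format NEWLINE_DELIMITED_JSON datasetName.tableName data.json col:string
```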

【Comments】:

【Solution 2】:

1) Yes, you can.

2) The documentation is here. Go to step 3, "Upload the table", in the docs.

3) You have to use the --source_format flag to tell bq that you are uploading a JSON file rather than a CSV.

4) The full command structure is:

bq load [--source_format=NEWLINE_DELIMITED_JSON] [--project_id=your_project_id] destination_data_set.destination_table data_source_uri table_schema

For example:

bq load --project_id=my_project_bq dataset_name.bq_table_name gs://bucket_name/json_file_name.json path_to_schema_in_your_machine

5) You can find the other bq load variants via:

bq help load

【Comments】:

【Solution 3】:

It does not support loading data in JSON format. Here is the documentation for the load command from the latest bq version, 2.0.9 (bq help load):

      USAGE: bq [--global_flags] <command> [--command_flags] [args]
      
      
      load     Perform a load operation of source into destination_table.
      
           Usage:
           load <destination_table> <source> [<schema>]
      
           The <destination_table> is the fully-qualified table name of table to create, or append to if the table already exists.
      
           The <source> argument can be a path to a single local file, or a comma-separated list of URIs.
      
           The <schema> argument should be either the name of a JSON file or a text schema. This schema should be omitted if the table already has one.
      
           In the case that the schema is provided in text form, it should be a comma-separated list of entries of the form name[:type], where type will default
           to string if not specified.
      
           In the case that <schema> is a filename, it should contain a single array object, each entry of which should be an object with properties 'name',
           'type', and (optionally) 'mode'. See the online documentation for more detail:
           https://code.google.com/apis/bigquery/docs/uploading.html#createtable
      
           Note: the case of a single-entry schema with no type specified is
           ambiguous; one can use name:string to force interpretation as a
           text schema.
      
           Examples:
           bq load ds.new_tbl ./info.csv ./info_schema.json
           bq load ds.new_tbl gs://mybucket/info.csv ./info_schema.json
           bq load ds.small gs://mybucket/small.csv name:integer,value:string
           bq load ds.small gs://mybucket/small.csv field1,field2,field3
      
           Arguments:
           destination_table: Destination table name.
           source: Name of local file to import, or a comma-separated list of
           URI paths to data to import.
           schema: Either a text schema or JSON file, as above.
      
           Flags for load:
      
      /usr/local/bin/bq:
        --[no]allow_quoted_newlines: Whether to allow quoted newlines in CSV import data.
        -E,--encoding: <UTF-8|ISO-8859-1>: The character encoding used by the input file. Options include:
          ISO-8859-1 (also known as Latin-1)
          UTF-8
        -F,--field_delimiter: The character that indicates the boundary between columns in the input file. "\t" and "tab" are accepted names for tab.
        --max_bad_records: Maximum number of bad records allowed before the entire job fails.
          (default: '0')
          (an integer)
        --[no]replace: If true erase existing contents before loading new data.
          (default: 'false')
        --schema: Either a filename or a comma-separated list of fields in the form name[:type].
        --skip_leading_rows: The number of rows at the beginning of the source file to skip.
          (an integer)
      
      gflags:
        --flagfile: Insert flag definitions from the given file into the command line.
          (default: '')
        --undefok: comma-separated list of flag names that it is okay to specify on the command line even if the program does not define a flag with that name.
          IMPORTANT: flags in this list that have arguments MUST use the --flag=value format.
          (default: '')
      

【Comments】:

• Thanks for the reply. Do you know why the GUI supports it? Does the GUI convert the JSON to CSV? That seems like a big performance hit if it has to convert a 100MB file (the UI file-size limit).
• My guess is that the GUI does the conversion itself. Isn't the file-size limit 4GB rather than 100MB?
• Hmm, I could have sworn I saw a 100MB file limit in the GUI, but now I can't find it.