WRITE_TRUNCATE in BigQuery: Answers

【Question title】: WRITE_TRUNCATE in BigQuery
【Posted】: 2020-06-04 18:15:22
【Question description】:

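I am trying to truncate a table in BigQuery using WRITE_TRUNCATE, but the table is not truncated. The write behaves like WRITE_APPEND instead: the new data is appended and the existing rows remain.

Can anyone help me resolve this?

My code: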

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def run():
        # Leaving the with-block runs the pipeline and waits for it to finish.
        with beam.Pipeline(options=PipelineOptions()) as p:
            (p
             | "Read BQ" >> beam.io.Read(beam.io.BigQuerySource(
                 query='select empid from `PROJECT_ID.data_set.emp_details`',
                 use_standard_sql=True))
             | "process" >> beam.Map(lambda ele: {'EMPID': ele['EMPID']})
             | "Write" >> beam.io.WriteToBigQuery(
                 'PROJECT_ID:data_set.emp_out',
                 schema='EMPID:STRING',
                 write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
                 create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))

    if __name__ == "__main__":
        run()

【Question comments】:

Tags: google-bigquery google-cloud-dataflow apache-beam

【Solution 1】:

WRITE_TRUNCATE is not supported in streaming pipelines, as stated in the documentation here.

For streaming pipelines WriteTruncate can not be used.

You can convert the pipeline to batch and use the WRITE_TRUNCATE option. To make the write a batch load, set the method parameter to FILE_LOADS. By default this parameter is set to STREAMING_INSERTS.

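【Comments】:

Thanks for the reply. My source is BigQuery, which as far as I know is a batch load. I am not using any kind of streaming here. Please correct me if I am wrong.
Now the table is truncated, but no new rows are added. The table is empty. The code used is: beam.io.WriteToBigQuery(table_details,schema='EMPID:STRING,ENAME: STRING ', write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE, create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED)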