【Question Title】:Load data from MySQL to BigQuery using Dataflow
【Posted】:2022-12-02 08:20:38
【Question】:

I want to load data from MySQL to BigQuery using Cloud Dataflow. Can anyone share an article or work experience about loading data from MySQL to BigQuery using Cloud Dataflow with Python?

Thank you

【Comments】:

  • Do you have many transformations to apply, or do you only want to copy the data?
  • I only want to copy data from MySQL to BigQuery.

Tags: python mysql google-bigquery etl google-cloud-dataflow


【Solution 1】:

You can use apache_beam.io.jdbc to read from your MySQL database, and the BigQuery I/O to write on BigQuery.

Beam knowledge is expected, so I recommend looking at the Apache Beam Programming Guide first.

If you are looking for something pre-built, there is the Google-provided JDBC to BigQuery Dataflow template, which is open source (here), but it is written in Java.

【Discussion】:

    【Solution 2】:

    If you only want to copy data from MySQL to BigQuery, you can first export your MySQL data to Cloud Storage, then load that file into a BigQuery table.

    I think there is no need to use Dataflow in this case, because you don't have complex transformations or business logic. It's just a copy.

    Export the MySQL data to Cloud Storage with a SQL query and the gcloud CLI. The quote, escape and delimiter flags take hex codes of ASCII characters: "22" is a double quote, "5C" a backslash, "2C" a comma and "0A" a newline:

    gcloud sql export csv INSTANCE_NAME gs://BUCKET_NAME/FILE_NAME \
    --database=DATABASE_NAME \
    --offload \
    --query=SELECT_QUERY \
    --quote="22" \
    --escape="5C" \
    --fields-terminated-by="2C" \
    --lines-terminated-by="0A"
    

    Load the CSV file into a BigQuery table with the bq CLI:

    bq load \
      --source_format=CSV \
      mydataset.mytable \
      gs://mybucket/mydata.csv \
      ./myschema.json
    

    ./myschema.json is the BigQuery table schema.

    【Discussion】:
