【Title】: provisioning bigquery datasets using terraform
【Posted】: 2021-05-16 05:13:10
【Question】:

I'm new to GCP and Terraform. I'm developing Terraform scripts to provision about 50 BigQuery datasets, each with at least 10 tables. The tables do not all have the same schema.

I have written scripts to create the datasets and tables, but I'm stuck on adding schemas to the tables and need help. I'm using Terraform variables to build the scripts.

Here is my code. I need to integrate the logic that creates the schemas for the tables.

var.tf

variable "test_bq_dataset" {
  type = list(object({
    id       = string
    location = string
  }))
}

variable "test_bq_table" {
  type = list(object({
    dataset_id = string
    table_id   = string
  }))
}

terraform.tfvars

test_bq_dataset = [{
  id       = "ds1"
  location = "US"
  },
  {
    id       = "ds2"
    location = "US"
  }
]

test_bq_table = [{
  dataset_id = "ds1"
  table_id   = "table1"
  },
  {
    dataset_id = "ds2"
    table_id   = "table2"
  },
  {
    dataset_id = "ds1"
    table_id   = "table3"
  }
]

main.tf

resource "google_bigquery_dataset" "dataset" {
  count      = length(var.test_bq_dataset)
  dataset_id = var.test_bq_dataset[count.index]["id"]
  location   = var.test_bq_dataset[count.index]["location"]
  labels = {
    "environment" = "development"
  }
}


resource "google_bigquery_table" "table" {
  count = length(var.test_bq_table)
  dataset_id = var.test_bq_table[count.index]["dataset_id"]
  table_id   = var.test_bq_table[count.index]["table_id"]
  labels = {
    "environment" = "development"
  }
  depends_on = [
    google_bigquery_dataset.dataset,
  ]
}

I have tried every way I could find to create schemas for the tables in the datasets, but nothing worked.

【Comments】:

    Tags: google-cloud-platform google-bigquery terraform terraform-provider-gcp


    【Solution 1】:

    Presumably all your tables are supposed to have the same schema...

    I would try it this way: in the

    resource "google_bigquery_table" "table"

    block, right after the labels, you could add, for example:

    schema = file("${path.root}/subdirectories-path/table_schema.json")

    where

    • ${path.root} - the directory of your root Terraform module (where your Terraform files live)
    • subdirectories-path - zero or more subdirectories
    • table_schema.json - the JSON file containing the schema
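    Applied to the question's configuration, a minimal sketch (assuming the schema file sits in a "bq-schema" subdirectory next to the root module; that directory name is illustrative, not from the answer):

    ```hcl
    # Sketch: every table gets the same schema, read from one JSON file.
    resource "google_bigquery_table" "table" {
      count      = length(var.test_bq_table)
      dataset_id = var.test_bq_table[count.index]["dataset_id"]
      table_id   = var.test_bq_table[count.index]["table_id"]

      # file() reads the schema JSON relative to the root module directory
      schema = file("${path.root}/bq-schema/table_schema.json")

      labels = {
        "environment" = "development"
      }
      depends_on = [
        google_bigquery_dataset.dataset,
      ]
    }
    ```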

    ==> Update 14 Feb 2021

    As requested, an example where the tables have different schemas... with minimal changes to the original question.

    variables.tf

    variable "project_id" {
      description = "The target project"
      type        = string
      default     = "ishim-sample"
    }
    
    variable "region" {
      description = "The region where resources are created => europe-west2"
      type        = string
      default     = "europe-west2"
    }
    
    variable "zone" {
      description = "The zone in the europe-west region for resources"
      type        = string
      default     = "europe-west2-b"
    }
    
    # ===========================
    variable "test_bq_dataset" {
      type = list(object({
        id       = string
        location = string
      }))
    }
    
    variable "test_bq_table" {
      type = list(object({
        dataset_id = string
        table_id   = string
        schema_id  = string
      }))
    }
    

    terraform.tfvars

    test_bq_dataset = [
      {
        id       = "ds1"
        location = "EU"
      },
      {
        id       = "ds2"
        location = "EU"
      }
    ]
    
    test_bq_table = [
      {
        dataset_id = "ds1"
        table_id   = "table1"
        schema_id  = "table-schema-01.json"
      },
      {
        dataset_id = "ds2"
        table_id   = "table2"
        schema_id  = "table-schema-02.json"
      },
      {
        dataset_id = "ds1"
        table_id   = "table3"
        schema_id  = "table-schema-03.json"
      },
      {
        dataset_id = "ds2"
        table_id   = "table4"
        schema_id  = "table-schema-04.json"
      }
    ]
    

    An example JSON schema file - table-schema-01.json

    [
      {
        "name": "table_column_01",
        "mode": "REQUIRED",
        "type": "STRING",
        "description": ""
      },
      {
        "name": "_gcs_file_path",
        "mode": "REQUIRED",
        "type": "STRING",
        "description": "The GCS path to the file for loading."
      },
      {
        "name": "_src_file_ts",
        "mode": "REQUIRED",
        "type": "TIMESTAMP",
        "description": "The source file modification timestamp."
      },
      {
        "name": "_src_file_name",
        "mode": "REQUIRED",
        "type": "STRING",
        "description": "The file name of the source file."
      },
      {
        "name": "_firestore_doc_id",
        "mode": "REQUIRED",
        "type": "STRING",
        "description": "The hash code (based on the file name and its content, so each file has a unique hash) used as a Firestore document id."
      },
      {
        "name": "_ingested_ts",
        "mode": "REQUIRED",
        "type": "TIMESTAMP",
        "description": "The timestamp when this record was processed during ingestion into the BigQuery table."
      }
    ]
    

    main.tf

    provider "google" {
      project = var.project_id
      region  = var.region
      zone    = var.zone
    }
    
    resource "google_bigquery_dataset" "test_dataset_set" {
      project    = var.project_id
      count      = length(var.test_bq_dataset)
      dataset_id = var.test_bq_dataset[count.index]["id"]
      location   = var.test_bq_dataset[count.index]["location"]
    
      labels = {
        "environment" = "development"
      }
    }
    
    resource "google_bigquery_table" "test_table_set" {
      project    = var.project_id
      count      = length(var.test_bq_table)
      dataset_id = var.test_bq_table[count.index]["dataset_id"]
      table_id   = var.test_bq_table[count.index]["table_id"]
      schema     = file("${path.root}/bq-schema/${var.test_bq_table[count.index]["schema_id"]}")
    
      labels = {
        "environment" = "development"
      }
      depends_on = [
        google_bigquery_dataset.test_dataset_set,
      ]
    }
    

    Project directory structure - screenshot

    Keep the subdirectory name "bq-schema" in mind, since it is used in the "schema" attribute of the "google_bigquery_table" resource in the "main.tf" file.

    BigQuery console - screenshot

    The result of the "terraform apply" command.

    【Comments】:

    • I should have added this to the question: the tables do not have the same schema.
    • You could probably define a Terraform variable - a map "table name => schema file name", or a list of schema file names - so that the same count loop picks the correct file instead of the constant "table_schema.json".
    • Could you please share an example using a map?
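    A sketch of the map approach suggested above (the variable name and keys are illustrative, not from the answer): keying a map on "dataset.table" lets for_each pick the right schema file for each table:

    ```hcl
    # Hypothetical map "dataset.table => schema file name"
    variable "table_schemas" {
      type = map(string)
      default = {
        "ds1.table1" = "table-schema-01.json"
        "ds2.table2" = "table-schema-02.json"
        "ds1.table3" = "table-schema-03.json"
      }
    }

    resource "google_bigquery_table" "table" {
      for_each   = var.table_schemas
      dataset_id = split(".", each.key)[0]
      table_id   = split(".", each.key)[1]

      # each.value is the schema file name looked up from the map
      schema = file("${path.root}/bq-schema/${each.value}")
    }
    ```

    Unlike count, for_each addresses each resource by its map key, so adding or removing a table does not shift the indices of the others.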
    【Solution 2】:

    Terraform includes an optional schema argument, which expects a JSON string.

    The documentation linked above includes an example:

    resource "google_bigquery_table" "default" {
      dataset_id = google_bigquery_dataset.default.dataset_id
      table_id   = "bar"
    
      time_partitioning {
        type = "DAY"
      }
    
      labels = {
        env = "default"
      }
    
      schema = <<EOF
    [
      {
        "name": "permalink",
        "type": "STRING",
        "mode": "NULLABLE",
        "description": "The Permalink"
      },
      {
        "name": "state",
        "type": "STRING",
        "mode": "NULLABLE",
        "description": "State where the head office is located"
      }
    ]
    EOF
    
    }
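    As an alternative to the heredoc, Terraform's built-in jsonencode() function can produce the same JSON string from native HCL values (a sketch equivalent to the example above, not from the original answer):

    ```hcl
    resource "google_bigquery_table" "default" {
      dataset_id = google_bigquery_dataset.default.dataset_id
      table_id   = "bar"

      # jsonencode() turns this HCL list of objects into the JSON
      # string that the schema argument expects
      schema = jsonencode([
        {
          name        = "permalink"
          type        = "STRING"
          mode        = "NULLABLE"
          description = "The Permalink"
        },
        {
          name        = "state"
          type        = "STRING"
          mode        = "NULLABLE"
          description = "State where the head office is located"
        }
      ])
    }
    ```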
    

    【Comments】:

    • I have 50 BQ datasets with 10 tables in each DS, and I don't want to hard-code values. I'm looking for a way to create the schemas using variables (the same way I have been creating the tables and DSs).
    • I see! I wasn't aware the schemas differ. That definitely needs another approach. I believe @al-dann's answer offers a better way.