In theory, resources from the GCP provider can be referenced in the K8S (or any other) provider the same way you'd reference resources or data sources within the context of a single provider:
provider "google" {
region = "us-west1"
}
data "google_compute_zones" "available" {}
resource "google_container_cluster" "primary" {
name = "the-only-marcellus-wallace"
zone = "${data.google_compute_zones.available.names[0]}"
initial_node_count = 3
additional_zones = [
"${data.google_compute_zones.available.names[1]}"
]
master_auth {
username = "mr.yoda"
password = "adoy.rm"
}
node_config {
oauth_scopes = [
"https://www.googleapis.com/auth/compute",
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring"
]
}
}
provider "kubernetes" {
host = "https://${google_container_cluster.primary.endpoint}"
username = "${google_container_cluster.primary.master_auth.0.username}"
password = "${google_container_cluster.primary.master_auth.0.password}"
client_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.client_certificate)}"
client_key = "${base64decode(google_container_cluster.primary.master_auth.0.client_key)}"
cluster_ca_certificate = "${base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)}"
}
resource "kubernetes_namespace" "n" {
metadata {
name = "blablah"
}
}
In practice, however, it may not work as expected, because a known core bug breaks cross-provider dependencies; see https://github.com/hashicorp/terraform/issues/12393 and https://github.com/hashicorp/terraform/issues/4149 respectively.
The alternative solutions are:
- use a 2-staged apply and target the GKE cluster first, then anything else that depends on it, i.e. terraform apply -target=google_container_cluster.primary and then terraform apply (see the sketch right after this list)
- separate out the GKE cluster configuration from the K8S configuration, give them completely isolated workflows and connect them via remote state.
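A minimal sketch of the 2-staged apply, assuming the configuration from the example above (the resource address is the one defined there):

# 1st stage: create/update only the cluster and anything it depends on
terraform apply -target=google_container_cluster.primary
# 2nd stage: create/update everything else, incl. the K8S resources
terraform apply

The second (remote state) approach then looks like this: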
/terraform-gke/main.tf
terraform {
  backend "gcs" {
    bucket = "tf-state-prod"
    prefix = "terraform/state"
  }
}

provider "google" {
  region = "us-west1"
}

data "google_compute_zones" "available" {}

resource "google_container_cluster" "primary" {
  name               = "the-only-marcellus-wallace"
  zone               = "${data.google_compute_zones.available.names[0]}"
  initial_node_count = 3

  additional_zones = [
    "${data.google_compute_zones.available.names[1]}",
  ]

  master_auth {
    username = "mr.yoda"
    password = "adoy.rm"
  }

  node_config {
    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}

output "gke_host" {
  value = "https://${google_container_cluster.primary.endpoint}"
}

output "gke_username" {
  value = "${google_container_cluster.primary.master_auth.0.username}"
}

output "gke_password" {
  value = "${google_container_cluster.primary.master_auth.0.password}"
}

output "gke_client_certificate" {
  value = "${base64decode(google_container_cluster.primary.master_auth.0.client_certificate)}"
}

output "gke_client_key" {
  value = "${base64decode(google_container_cluster.primary.master_auth.0.client_key)}"
}

output "gke_cluster_ca_certificate" {
  value = "${base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)}"
}
Here we're exposing all the necessary configuration via outputs and using the backend to store the state, along with those outputs, in a remote location, GCS in this case. This allows us to reference them in the configuration below.
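Assuming the /terraform-gke layout from above, a quick way to verify that the outputs actually made it into the remote state is terraform output (just a sketch; run it from the cluster workspace):

cd terraform-gke
terraform init             # configures the "gcs" backend declared above
terraform output gke_host  # should print https://<cluster-endpoint>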
/terraform-k8s/main.tf
data "terraform_remote_state" "foo" {
backend = "gcs"
config {
bucket = "tf-state-prod"
prefix = "terraform/state"
}
}
provider "kubernetes" {
host = "https://${data.terraform_remote_state.foo.gke_host}"
username = "${data.terraform_remote_state.foo.gke_username}"
password = "${data.terraform_remote_state.foo.gke_password}"
client_certificate = "${base64decode(data.terraform_remote_state.foo.gke_client_certificate)}"
client_key = "${base64decode(data.terraform_remote_state.foo.gke_client_key)}"
cluster_ca_certificate = "${base64decode(data.terraform_remote_state.foo.gke_cluster_ca_certificate)}"
}
resource "kubernetes_namespace" "n" {
metadata {
name = "blablah"
}
}
What may not be entirely obvious here is that the cluster has to be created/updated before creating/updating any K8S resources (if such an update relies on updates of the cluster).
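In other words the two workflows always run in order; a sketch with the directory layout above:

# 1st: create/update the cluster and publish its outputs to the remote state
cd terraform-gke && terraform apply
# 2nd: consume those outputs and create/update the K8S resources
cd ../terraform-k8s && terraform init && terraform apply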
Taking the second approach is generally advisable (even when/if the bug weren't a factor and cross-provider references worked), because it reduces the blast radius and defines much clearer responsibilities. It is (IMO) common for such deployments to have one person/team responsible for managing the cluster and a different person/team managing the K8S resources.
There may certainly be overlaps though - e.g. ops wanting to deploy logging & monitoring infrastructure on top of a fresh GKE cluster - so cross-provider dependencies aim to satisfy such use cases. For that reason I'd recommend subscribing to the GH issues mentioned above.