Managing user clusters using Terraform

This page is for platform administrators.

This page describes how to manage user clusters using Terraform. This is a preview feature.

Prerequisites

You must have an account with access to the admin cluster and the permissions needed to manage user clusters. This account needs a subset of the platform administrator's access plus "list" access on "customresourcedefinitions" in the "apiextensions.k8s.io" API group. You use this account to configure the Terraform Kubernetes provider so that it can manage the resources related to the user cluster. The account must have the following permissions:

  • Full access on namespaces for the user clusters in the admin cluster, for example, cluster-user-cluster2
  • Full access on "clusters" and "nodepools" in the "baremetal.cluster.gke.io" API group
  • Full access on "bootstrapservicebindings", "configmanagementfeaturespecs", "configmanagementbindings", "servicemeshfeaturespecs", and "servicemeshbindings" in the "managementcenter.anthos.cloud.google.com" API group
  • "list" access on "customresourcedefinitions" in the "apiextensions.k8s.io" API group
  • If you need to install Anthos Config Management (ACM) and Anthos Service Mesh (ASM), you also need "get" access on the secrets in the user cluster namespaces in the admin cluster, so that you can retrieve the credentials for connecting to the user cluster.
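
One way to confirm the permissions above before running Terraform is to query the admin cluster with kubectl auth can-i. The following is a minimal sketch, not part of the downloaded scripts. It assumes kubectl is configured with this account's admin cluster credentials; the function name check_user_cluster_access is hypothetical, and the namespace follows the cluster-user-cluster2 example. It skips the managementcenter.anthos.cloud.google.com resources, which can be checked in the same way.

```shell
# Hypothetical helper: prints each missing permission and returns non-zero
# if any check fails. Assumes kubectl points at the admin cluster.
check_user_cluster_access() {
  ns="$1"      # user cluster namespace, for example cluster-user-cluster2
  missing=0

  # Cluster-scoped check: "list" on CRDs in apiextensions.k8s.io.
  if ! kubectl auth can-i list customresourcedefinitions.apiextensions.k8s.io >/dev/null 2>&1; then
    echo "missing: list customresourcedefinitions.apiextensions.k8s.io"
    missing=1
  fi

  # Namespaced checks: full access ('*' verb) on clusters and nodepools.
  for resource in clusters.baremetal.cluster.gke.io nodepools.baremetal.cluster.gke.io; do
    if ! kubectl auth can-i '*' "$resource" --namespace "$ns" >/dev/null 2>&1; then
      echo "missing: full access on $resource in $ns"
      missing=1
    fi
  done

  # Only needed for ACM and ASM: read the secrets in the namespace.
  if ! kubectl auth can-i get secrets --namespace "$ns" >/dev/null 2>&1; then
    echo "missing: get secrets in $ns"
    missing=1
  fi

  return "$missing"
}
```

Calling check_user_cluster_access cluster-user-cluster2 prints nothing and returns 0 when every check passes.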

Download Terraform scripts

Download the Terraform scripts tarball from the Support tab of the Anthos Management Center Console, then extract the scripts from the file.

tar xvf terraform-scripts.tar.gz

Initialize Terraform and set up authentication

There are many ways to set up the Terraform Kubernetes provider to authenticate with the admin cluster API. Choose the method documented in Provider Setup that best suits your use case. If you don't have sufficient permissions in the admin cluster, contact your Infrastructure Operator.

cd cluster
terraform init

Replace the # TODO authentication placeholder with one of the Provider Setup authentication methods.
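
As one example of a Provider Setup method, the Kubernetes provider can read the kubeconfig location from the KUBE_CONFIG_PATH environment variable, as an alternative to setting config_path in the provider block. The following is a minimal sketch. It assumes that the admin cluster kubeconfig is stored at the hypothetical path shown and that the provider block in the downloaded scripts does not set conflicting credentials.

```shell
# Assumption: hypothetical location of the admin cluster kubeconfig.
# Export it in the shell that later runs terraform plan, apply, or destroy.
export KUBE_CONFIG_PATH="$HOME/.kube/admin-cluster-kubeconfig"
```

This keeps credentials out of the Terraform files, but the variable must be set again in every new shell session.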

Create user cluster

  • Specify all variables in cluster/terraform.tfvars. Make sure to set login_user_name if the login user isn't 'root'. Example:
admin_cluster_endpoint = "https://10.200.0.100:443"
cluster_name           = "user-cluster2"
control_plane_nodes = [
  { "address" = "10.200.0.26" },
  { "address" = "10.200.0.21" },
  { "address" = "10.200.0.22" },
]
control_plane_vip = "10.200.0.110"
nodepool_nodes = [
  { "address" = "10.200.0.23" },
  { "address" = "10.200.0.25" },
  { "address" = "10.200.0.24" },
]
address_pools = [
  {
    "addresses" = ["10.200.0.111-10.200.0.119"]
    "name"      = "pool1"
  },
]
bootstrap_service        = ["test-bootstrapservice", "test-bootstrapservice-2"]
anthos_baremetal_version = "1.8.0"
  • Apply the changes. After the command completes successfully, the user cluster is created.
terraform apply
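
To review the changes before they are made, you can save a plan and then apply exactly that plan. The following is a minimal sketch using the standard terraform plan -out and terraform apply commands. The function name apply_user_cluster and the default plan file name tfplan are hypothetical.

```shell
# Hypothetical wrapper: save a plan, then apply exactly that saved plan.
# Run it from the cluster directory after terraform init.
apply_user_cluster() {
  plan_file="${1:-tfplan}"
  # Terraform loads terraform.tfvars from the current directory automatically.
  terraform plan -out="$plan_file" || return 1
  # Applying a saved plan file does not ask for interactive approval.
  terraform apply "$plan_file"
}
```

In practice, you would run the two commands separately and inspect the plan output in between; the wrapper only shows the order of the steps.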

Destroy a user cluster

terraform destroy

A wait time controls the destruction of resources. You can adjust it by updating the wait_duration variable. The wait time works around missing features in the Kubernetes provider (see issue 1357 under Known issues).

If the creation of a user cluster is stuck in a pending state, do not use terraform destroy. Instead, clean up the installation manually:

  • Stop the terraform command.
  • Delete the user cluster from the Management Center Console.
  • Remove the terraform.tfstate file.
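
The last cleanup step can be sketched as follows. This variant moves the local state file aside instead of deleting it, so that it remains available for inspection. That is a swapped-in, more cautious choice, not what the steps above prescribe. The function name reset_local_state and the backup file suffix are hypothetical.

```shell
# Hypothetical helper: move the local state file aside instead of deleting it.
# Run it in the cluster Terraform project directory after stopping terraform
# and deleting the user cluster from the console.
reset_local_state() {
  state_file="${1:-terraform.tfstate}"
  if [ -f "$state_file" ]; then
    # A timestamp suffix avoids overwriting an earlier backup.
    mv "$state_file" "$state_file.removed.$(date +%Y%m%d%H%M%S)"
    echo "moved $state_file aside"
  else
    echo "no $state_file found"
  fi
}
```

Once the renamed file is no longer needed, delete it to complete the cleanup.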

Enable ACM and ASM

  • Initialize Terraform for the features Terraform project.
cd features
terraform init
  • Set up authentication to the admin cluster.
  • Place the private key for the Git repository in /etc/ssh-key/key.
  • Set activate_acm and activate_asm to true and define all related variables. Example:
cluster_name                 = "user-cluster2"
admin_cluster_endpoint       = "https://10.200.0.100:443"
user_cluster_ca_certificate  = ""
activate_acm                 = true
version_acm                  = "1.7.1"
enable_acm_policy_controller = true
acm_git_repo = {
  git_repo_url    = "git@github.com:example/example.git"
  git_repo_branch = "main"
  git_policy_dir  = "."
  git_secret_type = "ssh"
}
activate_asm = true
version_asm  = "1.9.6-asm.1"
  • Apply the changes.
terraform apply
  • (Optional) Adjust the wait time for ASM by updating the wait_duration variable.

Disable ACM and ASM

Before deleting ACM, we recommend deleting all resources synced by ACM in the cluster.

Remove the git-creds secret and the config-management-system namespace from the Terraform state, then run terraform destroy. After ConfigManagementBinding and ConfigManagementFeatureSpec are deleted, the ACM operator deletes the resources inside the user cluster. ACM isn't shown as uninstalled in the Management Center immediately; it can take 5 to 10 minutes for the operator to delete everything.

terraform state rm kubernetes_namespace.config-management-system
terraform state rm kubernetes_secret.git-creds
terraform destroy
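
Before removing entries, it can help to confirm that the addresses exist in the state, because terraform state rm fails for an address that isn't tracked. The following is a minimal sketch using the standard terraform state list and terraform state rm commands. The function name remove_from_state_if_present is hypothetical.

```shell
# Hypothetical helper: remove an address from the state only if it is tracked.
# Run it in the features Terraform project directory.
remove_from_state_if_present() {
  address="$1"
  # grep -Fx matches the whole line literally, so dots are not wildcards.
  if terraform state list | grep -Fxq "$address"; then
    terraform state rm "$address"
  else
    echo "not in state: $address"
  fi
}
```

For example, call remove_from_state_if_present kubernetes_namespace.config-management-system and remove_from_state_if_present kubernetes_secret.git-creds, then run terraform destroy.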

Known issues

  • Updating a cluster is currently not possible. When you launch an update, the provider tries to delete the finalizers on existing resources because of issue 1378.
  • The destroy command on the user cluster and on ASM relies on a wait time because of issue 1357. You might need to adjust the wait time depending on the size of your cluster.
  • Terraform reports the cluster as ready before Anthos Management Center does. A fix is blocked pending provider support for data sources for kubernetes_manifest.
  • Deleting ACM by using terraform destroy requires removing resources from tfstate first (see "Disable ACM and ASM"). ACM isn't shown as uninstalled in the Management Center immediately; it can take 5 to 10 minutes for the operator to delete everything.
  • The terraform destroy command deletes the user cluster namespace. Platform administrators don't have rights to delete namespaces in the admin cluster, so the command might fail.

Limitations

  • The current code lets you create only a single user cluster.
  • The current code doesn't include the configuration of OIDC. Configuring OIDC requires patch capabilities for the kubernetes_manifest resource in the Kubernetes provider.