Diagnosing cluster issues

This page is for infrastructure operators.

This page describes how to use the actl command-line tool in Anthos private mode to diagnose problems with clusters. The diagnose command generates an archive file that contains a collection of logs capturing the state of the cluster.

Introduction

You can capture the state of your clusters with the actl diagnose command. The diagnostic information can help you discover issues and debug your deployments more effectively. The command captures all relevant cluster and node configuration files for the scope that you define, and then packages the information into a single tar archive. You choose the diagnostic scope with the command's flags.

actl diagnose

Use the actl diagnose command to troubleshoot issues with clusters. This command compresses a cluster's status, configurations, and logs into a tar file. The default configuration of the command captures the following information about your cluster:

  • Kubernetes version
  • Status of Kubernetes resources in the kube-system namespace and in the namespaces of the Anthos private mode (APM) controllers: cluster, machine, nodes, Services, Endpoints, ConfigMaps, ReplicaSets, CronJobs, Pods, and the owners of those Pods, including Deployments, DaemonSets, and StatefulSets.
  • Status of the user control plane if the target cluster is a user cluster (the user cluster's control plane runs in the admin cluster).
  • Details about each node configuration including IP addresses, iptables rules, mount points, file system, network connections, and running processes.
  • Container logs from the admin cluster's control-plane node, when the Kubernetes API server is not available.
  • Information in the Istio system, including Pods, Services, Deployments, Endpoints, Secrets, ConfigMaps, current and previous logs from all Istio components and sidecars, and all Istio configuration artifacts.
  • Information in Config Sync, including configurations in config-management-system related namespaces.
  • Logs from the actl diagnose command.

Create a snapshot from a scenario

The actl diagnose command supports six scenarios. To specify a scenario, use the --scenario flag to collect snapshots for any of the following configurations:

  • all: (default) Includes every predefined scenario: auth, config-management, kubernetes, management-center, observability, and service-mesh.
  • auth
  • config-management
  • kubernetes
  • management-center
  • observability
  • service-mesh

You can use each of the six scenarios with the admin cluster. To create a snapshot of the admin cluster using the all scenario:

actl diagnose \
    --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster=CLUSTER_NAME \
    --scenario=all

Replace the following:

  • ADMIN_CLUSTER_KUBECONFIG: the admin cluster's kubeconfig file.
  • (Optional) CLUSTER_NAME: the name of the admin cluster. If you do not know the cluster name, omit this flag and you are prompted to select a cluster to snapshot.
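If one large archive is unwieldy, you could take a separate snapshot for each scenario. The following is a minimal sketch, assuming actl is on your PATH. The ACTL variable and the snapshot_each_scenario function are hypothetical names introduced for this example; only the flags and scenario names come from this page.

```shell
#!/usr/bin/env bash
# Sketch: run one snapshot per scenario instead of a single "all" snapshot.
# ACTL is a hypothetical override: set ACTL=echo to print the commands
# instead of running them.
ACTL="${ACTL:-actl}"

# The six scenario names listed on this page.
SCENARIOS="auth config-management kubernetes management-center observability service-mesh"

# Usage: snapshot_each_scenario ADMIN_CLUSTER_KUBECONFIG CLUSTER_NAME
snapshot_each_scenario() {
  local kubeconfig="$1" cluster="$2" scenario
  for scenario in $SCENARIOS; do
    "$ACTL" diagnose \
      --kubeconfig="$kubeconfig" \
      --cluster="$cluster" \
      --scenario="$scenario" || return 1   # stop at the first failure
  done
}
```

Running the function with ACTL=echo first lets you check the six generated commands before any snapshot is taken.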

The output includes a list of files and the name of a tar file:

Using ["all"] snapshot configuration...
? Choose a cluster (namespace/name) cluster-admin/admin
Taking snapshots in 10 thread(s)...
  kubectlCommands/anthos-management-center-operator/kubectl_get_updateitems
  kubectlCommands/kubectl_cluster-info
  kubectlCommands/kubectl_version
  kubectlCommands/anthos-management-center/kubectl_logs_git-server-0_--container_git-server_--since_24h0m0s
  kubectlCommands/anthos-management-center/kubectl_get_deployments
  ...
  nodes/10.200.0.5/files/lib/systemd/system/docker.service
  nodes/10.200.0.4/files/lib/systemd/system/docker.service
  ...
  istioCommands/istioctl_bug-report

Snapshot succeeded.
Snapshots saved in "[TAR_FILE_PATH]/[TAR_FILE_NAME].tar.gz".
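In a script, you might want the archive path from the final line of that output. The following sketch assumes that the line has exactly the format shown above and that actl writes it to standard output; neither assumption is confirmed by this page. The function name snapshot_path_from_output is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read diagnose output on stdin and print the archive path taken
# from the 'Snapshots saved in "..."' line. Prints nothing if the line is
# missing, for example when the snapshot failed.
snapshot_path_from_output() {
  sed -n 's/^Snapshots saved in "\(.*\)"\.$/\1/p'
}

# Example usage (not run here; assumes the line goes to standard output):
#   archive="$(actl diagnose \
#       --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
#       --cluster=CLUSTER_NAME | snapshot_path_from_output)"
```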

Limit a snapshot to a time period

You can use the --log-since flag to limit log collection to a recent time period. For example, you might collect the logs from the last two days or the last three hours. By default, diagnose collects logs from the last 24 hours. This flag applies only to logs collected through kubectl logs commands.

To limit the time period for log collection:

actl diagnose \
    --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster=CLUSTER_NAME \
    --scenario=all \
    --log-since=DURATION

Replace DURATION with a time value such as 2d or 3h. The default duration is 24h.

Perform a dry run for a snapshot

You can use the --dry-run flag to show the actions to be taken and the snapshot configuration.

To perform a dry run on your admin cluster:

actl diagnose \
    --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster=CLUSTER_NAME \
    --dry-run
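One way to use the dry run is as a confirmation step before the real snapshot. The following is a rough sketch of that workflow, not a feature of actl: the ACTL variable and the preview_then_snapshot function are hypothetical names, and the sketch assumes actl is on your PATH.

```shell
#!/usr/bin/env bash
# Sketch: show the dry run, then take the snapshot only after confirmation.
# ACTL is a hypothetical override (for example ACTL=echo to print commands).
ACTL="${ACTL:-actl}"

# Usage: preview_then_snapshot ADMIN_CLUSTER_KUBECONFIG CLUSTER_NAME
preview_then_snapshot() {
  local kubeconfig="$1" cluster="$2" answer
  "$ACTL" diagnose \
    --kubeconfig="$kubeconfig" \
    --cluster="$cluster" \
    --dry-run || return 1

  # Prompt on stderr so that stdout carries only command output.
  printf 'Proceed with the snapshot? [y/N] ' >&2
  read -r answer
  case "$answer" in
    y|Y) "$ACTL" diagnose --kubeconfig="$kubeconfig" --cluster="$cluster" ;;
    *)   echo "Snapshot skipped." ;;
  esac
}
```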

Customize a snapshot configuration

If the six scenarios don't meet your needs, you can create a customized snapshot. You can either generate a configuration and copy and paste it into a new configuration file, or you can create a configuration file from scratch. Then you can create the snapshot from the custom configuration file.

Option 1: Generate a snapshot configuration

You can generate a snapshot configuration for a given scenario by passing in the --scenario and --dry-run flags. For example, to see the snapshot configuration for the all (default) scenario of a cluster, enter the following command:

actl diagnose \
    --scenario=all \
    --dry-run

Here's an example of the output:

ExcludeWords:
- certificateAuthorityData
- password
IstioBugReport:
  Enabled: true
KubectlCommands:
- Commands:
  - kubectl get deployments
  - kubectl get deployments -o yaml
  - kubectl get pods
  - kubectl get pods -o yaml
  - kubectl get secret
  - kubectl logs
  - kubectl get gateways -o yaml
  Namespaces:
  - istio-system
NodeCommands:
- Commands:
  - uptime
  - df --all --inodes
  - ip addr
  - iptables-save --counters
  - mount
  - ip route list table all
  - top -bn1
  - docker info
  - docker ps -a
  - ps -edF
  - ps -eo pid,tid,ppid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm,args,cgroup
  - conntrack --count
  Nodes: []
NodeFiles:
- Files:
  - /proc/sys/fs/file-nr
  - /proc/sys/net/netfilter/nf_conntrack_max
  - /lib/systemd/system/kubelet.service
  - /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
  - /lib/systemd/system/docker.service
  - /etc/docker/daemon.json
  Nodes: []
NomosBugReport:
  Enabled: true
NumOfParallelThreads: 10

You can copy the output of this command into a new configuration file, for example myconfig.yaml. You can edit the following values:

  • ExcludeWords: list of words to exclude from the snapshot (case insensitive). Lines containing these words are removed from snapshot results. "password" is always excluded, whether or not you specify it.
  • IstioBugReport: a flag to enable the istioctl bug-report snapshot.
  • KubectlCommands: list of kubectl commands to run. The commands run against the corresponding namespaces. For kubectl logs commands, all Pods and containers in the corresponding namespaces are added automatically. Regular expressions are supported for specifying namespaces. If you do not specify a namespace, the default namespace is assumed.
  • NodeCommands: list of commands to run on the corresponding nodes. The results are saved. When nodes are not specified, all nodes in the target cluster are considered.
  • NodeFiles: list of files to collect from the corresponding nodes. The files are saved. When nodes are not specified, all nodes in the target cluster are considered.
  • NomosBugReport: a flag to enable the nomos bugreport snapshot.
  • NumOfParallelThreads: number of parallel threads used to take snapshots.
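As an illustration of editing these values, the following sketch writes a trimmed configuration from a shell script. The function name write_snapshot_config, the extra excluded word, the thread count, and the namespace pattern istio-.* are assumptions made for the example. This page says that regular expressions are supported for namespaces but does not describe the syntax, so verify the pattern with a dry run before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: write an edited snapshot configuration to the given file.
# Usage: write_snapshot_config myconfig.yaml
write_snapshot_config() {
  # The quoted 'EOF' delimiter stops the shell from expanding anything
  # inside the document.
  cat > "$1" <<'EOF'
ExcludeWords:
- certificateAuthorityData
- password
- token
KubectlCommands:
- Commands:
  - kubectl get pods -o yaml
  - kubectl logs
  Namespaces:
  - kube-system
  - istio-.*
NodeCommands:
- Commands:
  - uptime
  - df --all --inodes
  Nodes: []
NumOfParallelThreads: 5
EOF
}
```

Leaving Nodes empty keeps the documented default of running the node commands on all nodes in the target cluster.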

Option 2: Manually define a custom snapshot configuration file

Create a YAML file with the snapshot parameters that you want, for example:

ExcludeWords:
- certificateAuthorityData
- password
NumOfParallelThreads: 10
KubectlCommands:
- Commands:
  - kubectl get deployments
  - kubectl get deployments -o yaml
  - kubectl get pods
  - kubectl get pods -o yaml
  - kubectl get secret
  - kubectl logs
  - kubectl get gateways -o yaml
  Namespaces:
  - istio-system

Create a snapshot using the custom snapshot configuration

Pass in your custom snapshot configuration file by using the --snapshot-config flag:

actl diagnose \
    --kubeconfig=ADMIN_CLUSTER_KUBECONFIG \
    --cluster=CLUSTER_NAME \
    --snapshot-config=SNAPSHOT_CONFIG_FILE

Replace SNAPSHOT_CONFIG_FILE with the name of your custom snapshot configuration file, for example myconfig.yaml.

Review the snapshot contents

To review the contents of the tar file, extract it with the following command:

tar -zxf TAR_FILE_PATH/TAR_FILE_NAME.tar.gz --directory EXTRACTION_DIRECTORY_NAME

Review the contents of the tar file before you share it with support.
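To speed up that review, you could extract the archive and list the files that mention sensitive-looking words. The following is a minimal sketch that uses only tar and grep. The function name review_snapshot and the word list are illustrative assumptions; a clean result does not guarantee that the archive is safe to share.

```shell
#!/usr/bin/env bash
# Sketch: extract a snapshot archive and list files that contain
# sensitive-looking words. Adjust the pattern to your environment.
# Usage: review_snapshot TAR_FILE EXTRACTION_DIRECTORY_NAME
review_snapshot() {
  local archive="$1" dest="$2"
  mkdir -p "$dest" || return 1
  tar -zxf "$archive" --directory "$dest" || return 1
  # -r: recurse, -i: ignore case, -l: print file names only, -E: extended regex
  grep -rilE 'token|secret|private key' "$dest" || echo "No matches found."
}
```

Open each listed file and check the matching lines manually before you decide whether to share the archive.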