Introduction to Sonobuoy, a Kubernetes Cluster Conformance Testing Tool

Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of plugins (including Kubernetes conformance tests) in an accessible and non-destructive manner. It is a customizable, extendable, and cluster-agnostic way to generate clear, informative reports about your cluster.

When Sonobuoy runs, it creates a sonobuoy/sonobuoy pod in the cluster to collect test information and, driven by each test plugin's logic, continuously launches test pods. Once all test pods have finished, it generates a test report.
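
While a run is in progress, you can watch the test pods directly. A minimal sketch, assuming Sonobuoy's default sonobuoy namespace and a kubeconfig pointing at the cluster under test:

# Watch the pods sonobuoy creates in its namespace
kubectl get pods -n sonobuoy --watch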

Installing sonobuoy

  1. Download the sonobuoy binary whose version matches your k8s cluster; for example, to test a 1.19 k8s cluster, download a 0.19 sonobuoy release.

Sonobuoy supports 3 Kubernetes minor versions: the current release and 2 minor versions before. Sonobuoy is currently versioned to track the Kubernetes minor version to clarify the support matrix. For example, Sonobuoy v0.14.x would support Kubernetes 1.14.x, 1.13.x, and 1.12.x.

  2. Run tar -xvf <RELEASE_TARBALL_NAME>.tar.gz to extract the binary into your working directory.
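
For example, downloading and unpacking a Linux amd64 release might look like the following (the version and asset name below are assumptions; pick the release matching your cluster from the sonobuoy releases page):

# Download and unpack a sonobuoy release (version is an example)
curl -LO https://github.com/vmware-tanzu/sonobuoy/releases/download/v0.19.0/sonobuoy_0.19.0_linux_amd64.tar.gz
tar -xvf sonobuoy_0.19.0_linux_amd64.tar.gz
./sonobuoy version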

Pre-loading test images (skip this if your nodes can pull images directly)

If the cluster under test can get past the firewall and pull images from the upstream registries directly, you can skip this step. Two ways to pre-load the images are described below.

Pre-loading images manually

  1. Use the sonobuoy command to get the full list of images used by the tests:
export TEST_KUBECONFIG=~/.kube/config
./sonobuoy --kubeconfig=${TEST_KUBECONFIG} images
  2. Pre-load the images onto every k8s node of the cluster under test (using the conformance image as an example):
# Run these commands on a node with unrestricted internet access;
# replace {{ k8s-node-ip }} with the IP of each k8s node in the cluster under test
docker pull k8s.gcr.io/conformance:v1.19.3
docker save k8s.gcr.io/conformance:v1.19.3 -o ./conformance_v1.19.3.tar
scp ./conformance_v1.19.3.tar root@{{ k8s-node-ip }}:/tmp/

# SSH to each k8s node and run the following command to load the image.
# ctr requires a namespace on k8s clusters (with docker, use docker load instead)
ctr -n k8s.io image import /tmp/conformance_v1.19.3.tar
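
To confirm an image actually landed on a node, list it through containerd (assuming containerd is the runtime, as the ctr commands above imply):

# The imported image should show up in containerd's k8s.io namespace
ctr -n k8s.io images ls | grep conformance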

Pre-loading images with a script

  1. Run the following script on a node with unrestricted internet access.
vi downloadSonobuoyImages.sh
chmod +x downloadSonobuoyImages.sh

# /Users/cbs/test is the directory containing the sonobuoy binary (the binary is /Users/cbs/test/sonobuoy)
# /Users/cbs/.kube/config is the kubeconfig path of the cluster under test
# /tmp/sonobuoy is the output directory for the image tarball; mkdir it beforehand
./downloadSonobuoyImages.sh /Users/cbs/test /Users/cbs/.kube/config /tmp/sonobuoy

cd /tmp/sonobuoy
scp sonobuoyimages.tar.gz root@{{ k8s-node-ip }}:/tmp/

A reference downloadSonobuoyImages.sh looks like this:

#!/bin/bash
set -o nounset

# This script downloads every image sonobuoy needs and packs them into
# sonobuoyimages.tar.gz in the output directory.

# Directory containing the sonobuoy binary; e.g. if the binary is
# /Users/cbs/test/sonobuoy, pass /Users/cbs/test.
sonobuoy_path=$1
# Kubeconfig path of the cluster under test, used to fetch the full image
# list for that cluster version.
kubeconfig_path=$2
# Output directory for the image files; mkdir it beforehand.
output_path=$3

export PATH=$PATH:$sonobuoy_path

image_count=1
for image in $(sonobuoy --kubeconfig=$kubeconfig_path images); do
    # docker save writes a plain tar; the .tar.gz suffix is kept only so the
    # load script's glob can find the files
    docker pull ${image}
    docker save ${image} -o ${output_path}/${image_count}.tar.gz
    docker rmi ${image}
    image_count=$((image_count + 1))
done

cd ${output_path}
tar -zcvf sonobuoyimages.tar.gz *.tar.gz
  2. SSH to every k8s node and run the following commands to load the full image set.
vi loadSonobuoyImages.sh
chmod +x loadSonobuoyImages.sh
# /tmp/sonobuoyimages.tar.gz is the path to the sonobuoy image bundle
# /tmp/sonobuoy is a scratch directory for the extracted image tars; mkdir it beforehand
./loadSonobuoyImages.sh /tmp/sonobuoyimages.tar.gz /tmp/sonobuoy

A reference loadSonobuoyImages.sh looks like this:

#!/bin/bash
# Load the bundled sonobuoy images onto this node via ctr (or docker).

# Path to the sonobuoy image bundle.
sonobuoytar_path=$1
# Directory to extract the image tars into; mkdir it beforehand and make
# sure it contains no other tar.gz files.
output_path=$2

tar -zxvf $sonobuoytar_path -C ${output_path}/
cd ${output_path}
for image in *.tar.gz; do
    ctr -n k8s.io image import $image
done

# ctr import records the images under their fully qualified docker.io/ names;
# if pods reference the short names, re-tag them, e.g.:
# ctr -n k8s.io i tag docker.io/sonobuoy/sonobuoy:v0.19.0 sonobuoy/sonobuoy:v0.19.0
# ctr -n k8s.io i tag docker.io/sonobuoy/systemd-logs:v0.3 sonobuoy/systemd-logs:v0.3

Adjusting the cluster's image pull configuration

Log in to every master node, edit the kube-apiserver startup command to remove AlwaysPullImages from the enabled admission plugins, and wait for the apiserver to restart.

vi /etc/kubernetes/manifests/kube-apiserver.yaml

spec:
  containers:
  - command:
    - kube-apiserver

    ...

    # delete AlwaysPullImages from this flag:
    - --enable-admission-plugins=NodeRestriction,DenyEscalatingExec,AlwaysPullImages,EventRateLimit,PodSecurityPolicy

    ...
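
Because the manifest lives under /etc/kubernetes/manifests, the kubelet recreates the kube-apiserver static pod automatically after the file is saved. A quick check that it came back up:

# Wait until the kube-apiserver static pod is Running again
kubectl --kubeconfig=${TEST_KUBECONFIG} -n kube-system get pods | grep kube-apiserver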

Running sonobuoy

Wait for the tests to complete; this takes a while, and the default timeout is 3h.

export TEST_KUBECONFIG=~/.kube/config
./sonobuoy --kubeconfig=${TEST_KUBECONFIG} --image-pull-policy=IfNotPresent run --wait

# While the run is in progress, you can check its status from another terminal:

./sonobuoy --kubeconfig=${TEST_KUBECONFIG} status
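
Before committing to the full run, a quick smoke test can validate the setup. A sketch using sonobuoy's quick mode, which runs a single e2e test; check ./sonobuoy run --help to confirm the modes your version supports:

# Run a single e2e test as a sanity check
./sonobuoy --kubeconfig=${TEST_KUBECONFIG} --image-pull-policy=IfNotPresent run --mode quick --wait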

Viewing test results

results=$(./sonobuoy --kubeconfig=${TEST_KUBECONFIG} retrieve)
./sonobuoy results $results

--- output ---
Plugin: e2e
Status: failed
Total: 5233
Passed: 283
Failed: 20
Skipped: 4930

Failed tests:

[sig-network] DNS should provide /etc/hosts entries for the cluster [LinuxOnly] [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should not be able to mutate or prevent deletion of webhook configuration objects [Conformance]
[sig-network] Services should have session affinity timeout work for service with type clusterIP [LinuxOnly] [Conformance]
[sig-network] Proxy version v1 should proxy through a service and a pod [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] patching/updating a mutating webhook should work [Conformance]
[sig-network] Networking Granular Checks: Pods should function for intra-pod communication: http [NodeConformance] [Conformance]
[sig-cli] Kubectl client Update Demo should create and stop a replication controller [Conformance]
[sig-api-machinery] CustomResourceConversionWebhook [Privileged:ClusterAdmin] should be able to convert a non homogeneous list of CRs [Conformance]
[sig-network] Networking Granular Checks: Pods should function for node-pod communication: http [LinuxOnly] [NodeConformance] [Conformance]
[sig-network] Services should have session affinity timeout work for NodePort service [LinuxOnly] [Conformance]
[k8s.io] KubeletManagedEtcHosts should test kubelet managed /etc/hosts file [LinuxOnly] [NodeConformance] [Conformance]
[sig-cli] Kubectl client Update Demo should scale a replication controller [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should honor timeout [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should mutate custom resource with pruning [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should be able to deny attaching pod [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should mutate custom resource with different stored version [Conformance]
[sig-api-machinery] Aggregator Should be able to support the 1.17 Sample API Server using the current Aggregator [Conformance]
[sig-api-machinery] CustomResourceConversionWebhook [Privileged:ClusterAdmin] should be able to convert from CR v1 to CR v2 [Conformance]
[sig-network] Networking Granular Checks: Pods should function for node-pod communication: udp [LinuxOnly] [NodeConformance] [Conformance]
[sig-api-machinery] AdmissionWebhook [Privileged:ClusterAdmin] should unconditionally reject operations on fail closed webhook [Conformance]

Plugin: systemd-logs
Status: passed
Total: 5
Passed: 5
Failed: 0
Skipped: 0

PS: If sonobuoy retrieve fails with "error retrieving results: error: tmp/sonobuoy no such file or directory", the test results were lost for some reason; re-run the tests to generate new results and inspect those.

PS: If the run times out, pending certificate signing requests can be the cause; try approving them with kubectl --kubeconfig=${TEST_KUBECONFIG} get csr | grep Pending | awk '{print $1}' | xargs -I {} kubectl --kubeconfig=${TEST_KUBECONFIG} certificate approve {}

Debugging failed cases

Viewing detailed logs with sonobuoy results

Reference commands for working with results; jq must be installed first (e.g. brew install jq):

./sonobuoy results $results --mode=detailed | jq 'select(.status=="failed")'
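
To get just the names of the failed tests, the same filter can be extended (assuming each detailed result line carries name and status fields, as in the versions I have seen):

# Print only the names of failed tests
./sonobuoy results $results --mode=detailed | jq -r 'select(.status=="failed") | .name'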

Or dump the detailed log to a file and search it directly:

./sonobuoy results $results --mode=detailed > /tmp/sonobuoy.log
cat /tmp/sonobuoy.log | grep "\"status\":\"failed\""

Inspecting the generated logs directly

  1. You can also extract the result archive and inspect it directly. Running ./sonobuoy --kubeconfig=${TEST_KUBECONFIG} retrieve generates a {{ date }}_sonobuoy_{{ UUID }}.tar.gz file in the current directory (echo $results prints the file name).
  2. After extracting it, the /podlogs/<namespace>/<podname>/logs/<plugin-name>.txt logs contain the details of each failed case; see the sonobuoy docs for the full directory layout.
  3. If the logs do not make the error clear, you will need to read the corresponding test case code to find the root cause; see sonobuoy/issues/1151 for an example of that process. A sketch for re-running a single case follows this list.
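
Once a suspect case is identified, re-running just that test is much faster than the full suite. A sketch using the e2e focus flag (the test name below is an example from the failures above; the flag value is a regular expression):

# Re-run only one test by focusing the e2e plugin on its name
./sonobuoy --kubeconfig=${TEST_KUBECONFIG} run \
  --e2e-focus "DNS should provide /etc/hosts entries for the cluster" \
  --wait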

Cleaning up sonobuoy

This step deletes the k8s resources created during the sonobuoy test run. If test cases failed, leftover plugin pods may remain and must be deleted manually.

./sonobuoy --kubeconfig=${TEST_KUBECONFIG} delete --wait
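
After delete completes, it is worth double-checking for leftovers from failed cases. A sketch, assuming sonobuoy labels its resources with component=sonobuoy (true for the versions I have used):

# Look for anything the run left behind
kubectl --kubeconfig=${TEST_KUBECONFIG} get ns | grep sonobuoy
kubectl --kubeconfig=${TEST_KUBECONFIG} get pods --all-namespaces -l component=sonobuoy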