Exploring etcd snapshots with koff
We have looked at etcd performance and how critical it is to cluster health, but what if we want to see what’s in etcd?
If you have existing etcd backups / snapshots or want to take one now, there is a lot you can do and look at!
If you deleted something important and want to get it back, you can pull it from a backup and recreate it in your cluster.
If your etcd database is huge, you can parse through a fresh snapshot and see where the issues are without touching production.
Accessing and parsing your etcd data isn’t a daily occurrence, but when the time comes, it’s incredibly useful.
In this module we will use the koff tool to showcase the get and inspect commands on both standard Kubernetes object types and traditionally stored etcd keys.
koff use command
- To get started you want to verify that koff is installed in your path and working.
- Once you have confirmed this, you want to call the koff use command to tell koff the etcd.db to utilize for review.
- Then run koff get -h to view the available flags.
cd ~/Module10/
koff version
koff version: v1.0.1
hash: 01cddf8
https://github.com/gmeghnag/koff
koff use module10-etcd_snapshot_2024.db
koff get -h
Usage:
koff get [flags]
Flags:
-A, --all-namespaces Show resources across all namespaces.
-h, --help help for get
-n, --namespace string Namespace.
--no-headers Hide headers.
-o, --output string Output format. One of: json|yaml|wide
-K, --show-kind Show kind.
--show-managed-fields Show managedFields when output is one of: json, yaml.
-N, --show-namespace Show namespace.
Viewing resources in etcd
The koff get command lets you get any resource inside the etcd database, just as if you were using oc against a live cluster.
View all of the cluster namespaces with koff get namespaces:
koff get namespaces
NAME STATUS AGE
namespace/aap Active 1y
namespace/aap25 Active 102d
namespace/agilleyaap Active 329d
namespace/anztam Active 136d
namespace/camel-test Active 1y
namespace/cert-manager Active 135d
namespace/cert-manager-operator Active 135d
namespace/cnpg-system Active 1y
Get all of the pods recorded in a namespace with koff get pods -n <namespace>:
koff get pods -n aap
NAME READY STATUS RESTARTS AGE
ansible-lightspeed-operator-controller-manager-bccffff68-wqj5h 2/2 Running 91 102d
automation-controller-operator-controller-manager-85b5464dldtnw 2/2 Running 89 102d
automation-hub-operator-controller-manager-847659f4bf-ll4jm 2/2 Running 97 102d
eda-server-operator-controller-manager-6df5c675c8-dxqcv 2/2 Running 56 75d
example-pah-api-6dd5bb466-g5kv5 0/1 Pending 0 83d
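When a namespace holds many pods, it can help to filter the listing down to the ones that were not healthy when the snapshot was taken. The sketch below is one possible way to do that, assuming the column order shown above (NAME, READY, STATUS, RESTARTS, AGE) and using the --no-headers flag from the koff get help output. The function name not_running is made up for this example and is not part of koff.

```shell
# not_running (hypothetical helper): keep only the rows whose STATUS column
# (the 3rd whitespace-separated field) is something other than "Running".
# It expects header-less pod rows on stdin.
not_running() {
  awk '$3 != "Running"'
}

# Usage, once a snapshot has been selected with `koff use`:
#   koff get pods -n aap --no-headers | not_running
```

Note that this also prints pods in states such as Completed, which are not necessarily a problem.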
Continue to explore the etcd database with some additional koff get commands, just like you would use oc get against a running cluster.
koff get service -A
koff get endpoints -A
koff get configmaps -A
koff get daemonset -A
koff get deployment -A
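For a quick overview of how many objects of each kind the snapshot holds, the commands above can be wrapped in a small loop. This is only a sketch: count_resources is a hypothetical helper name, and it assumes that koff get <kind> -A --no-headers prints exactly one line per object.

```shell
# count_resources KIND... (hypothetical helper): print "<kind> <count>" for each kind.
# Assumes `koff use <snapshot>` has already been run and that the header-less
# listing contains one line per object.
count_resources() {
  local kind count
  for kind in "$@"; do
    # tr removes the padding that some wc implementations add around the number.
    count=$(koff get "$kind" -A --no-headers | wc -l | tr -d ' ')
    printf '%s %s\n' "$kind" "$count"
  done
}

# Usage:
#   count_resources service endpoints configmaps daemonset deployment
```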
Inspecting the etcd database
The koff etcd inspect command lets you inspect the etcd database in its true key/value pair structure.
View all of the objects inside the etcd database:
koff etcd inspect module10-etcd_snapshot_2024.db
/kubernetes.io/apiregistration.k8s.io/apiservices/v1.apiextensions.k8s.io
/kubernetes.io/prioritylevelconfigurations/system
/kubernetes.io/apiregistration.k8s.io/apiservices/v1.apps
/kubernetes.io/apiregistration.k8s.io/apiservices/v1.
/kubernetes.io/apiregistration.k8s.io/apiservices/v1.authentication.k8s.io
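On a large database the full key listing is too long to read by eye. One rough way to see where the keys are concentrated is to group them by the first path segment after /kubernetes.io/, which is a resource type or an API group. The helper below is a sketch with a made-up name (count_keys_by_segment), and it assumes the listing prints one key per line as shown above. It counts keys, not bytes, so it only hints at where the space is going.

```shell
# count_keys_by_segment (hypothetical helper): read etcd keys on stdin and
# print "<count> <segment>" for the first path segment after /kubernetes.io/,
# largest count first.
count_keys_by_segment() {
  # Splitting on "/" makes $1 empty, $2 "kubernetes.io" and $3 the segment we want.
  awk -F/ 'NF >= 3 && $3 != "" { count[$3]++ }
           END { for (seg in count) print count[seg], seg }' | sort -rn
}

# Usage:
#   koff etcd inspect module10-etcd_snapshot_2024.db | count_keys_by_segment | head
```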
Inspect a specific resource, in this case a node (stored in etcd under the legacy minions key), to see its full configuration:
koff etcd inspect module10-etcd_snapshot_2024.db /kubernetes.io/minions/master-1.ocp413shared.tamlab.brq2.redhat.com
{
"apiVersion": "v1",
"kind": "Node",
"metadata": {
"annotations": {
"machineconfiguration.openshift.io/controlPlaneTopology": "HighlyAvailable",
"machineconfiguration.openshift.io/currentConfig": "rendered-master-151f24e2087c2b5521a7b9af2c4d5b16",
"machineconfiguration.openshift.io/desiredConfig": "rendered-master-151f24e2087c2b5521a7b9af2c4d5b16",
"machineconfiguration.openshift.io/desiredDrain": "uncordon-rendered-master-151f24e2087c2b5521a7b9af2c4d5b16",
"machineconfiguration.openshift.io/lastAppliedDrain": "uncordon-rendered-master-151f24e2087c2b5521a7b9af2c4d5b16",
"machineconfiguration.openshift.io/lastSyncedControllerConfigResourceVersion": "635663717",
"machineconfiguration.openshift.io/reason": "",
"machineconfiguration.openshift.io/state": "Done",
"nfd.node.kubernetes.io/master.version": "undefined",
"volumes.kubernetes.io/controller-managed-attach-detach": "true"
},
"creationTimestamp": "2023-08-18T21:08:47Z",
"labels": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"kubernetes.io/arch": "amd64",
"kubernetes.io/hostname": "master-1.ocp413shared.tamlab.brq2.redhat.com",
"kubernetes.io/os": "linux",
"node-role.kubernetes.io/control-plane": "",
"node-role.kubernetes.io/master": "",
"node.openshift.io/os_id": "rhcos"
},
Maybe you deleted an important ConfigMap or Secret and need to get it back from a backup:
koff etcd inspect module10-etcd_snapshot_2024.db /kubernetes.io/configmaps/firewall-rule/httpd-ex-1-sys-config
"apiVersion": "v1",
"kind": "ConfigMap",
"metadata": {
"creationTimestamp": "2024-05-02T04:11:08Z",
"name": "httpd-ex-1-sys-config",
"namespace": "firewall-rule",
koff etcd inspect module10-etcd_snapshot_2024.db /kubernetes.io/secrets/quay/tamlab-quay-config-secret-98gh285gcd
"apiVersion": "v1",
"data": {
"config.yaml": ""
},
"kind": "Secret",
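To actually recreate an object from the snapshot, the inspected output usually needs a little cleanup first, because the API server rejects a create request that carries a resourceVersion. The sketch below is one way to approach it, not a prescribed procedure. It assumes jq is installed, that koff etcd inspect prints the object as a single JSON document, and that you are logged in to the target cluster with oc. The name strip_server_fields is invented for this example.

```shell
# strip_server_fields (hypothetical helper): read one object as JSON on stdin
# and drop metadata that the API server normally assigns, so the object can be
# created again.
strip_server_fields() {
  jq 'del(.metadata.resourceVersion,
          .metadata.uid,
          .metadata.creationTimestamp,
          .metadata.managedFields)'
}

# Usage (review the file before creating anything in the cluster):
#   koff etcd inspect module10-etcd_snapshot_2024.db \
#     /kubernetes.io/configmaps/firewall-rule/httpd-ex-1-sys-config \
#     | strip_server_fields > httpd-ex-1-sys-config.json
#   oc create -f httpd-ex-1-sys-config.json
```

Keeping the cleanup in a filter rather than editing the file by hand makes it easy to reuse for any other key you pull out of the snapshot.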