Troubleshoot the Data Plane deployment

Troubleshooting data plane deployment:

Follow below steps to track the progress with the data plane deployment and troubleshoot it.

  1. Check the status of the deployment, it is in progress.

    oc get openstackdataplanedeployments.dataplane.openstack.org
    oc get openstackdataplanenodesets.dataplane.openstack.org
    Sample output
    NAME                  NODESETS                  STATUS   MESSAGE
    openstack-edpm-ipam   ["openstack-edpm-ipam"]   False    Deployment in progress
    
     . . .
    
    NAME                  STATUS   MESSAGE
    openstack-edpm-ipam   False    Deployment in progress
  2. List running jobs

    oc get jobs
    Sample output
    NAME                                 COMPLETIONS   DURATION   AGE
    bootstrap-openstack-edpm-ipam        1/1           103s       2m27s
    cinder-db-purge-28585441             1/1           34s        11h
    >> download-cache-openstack-edpm-ipam   0/1           43s        43s <<
    glance-purge-cron-28585441           1/1           17s        11h
    keystone-cron-28585981               1/1           25s        155m
    keystone-cron-28586041               1/1           27s        95m
    keystone-cron-28586101               1/1           44s        35m

    Make note of the job in progress (not completed) in the above output.

  3. Check the logs of running job to track it’s progress and investigate any failure.

    Example:
    oc logs jobs/download-cache-openstack-edpm-ipam -f

Troubleshooting networking on compute node:

Netwroking setup of the data plane deployment could be error prone while you are tweaking your YAML files. You may loose the connectivity of the compute node due to any misconfiguration in the files. You may need to tweak the data plane network configuration files for appropriate networking setup (os-net-config) on the compute node.

If you are encountering errors related to any network related job, follow these steps:

  1. Access the compute node and review /etc/os-net-confg/config.yaml.

  2. Make necessary tweaks and run os-net-config command manually.

  3. Repeate above step until you get the networking setup as desired.

  4. You may loose the network access to the compute node due to application of misconfigured os-net-config settings.

  5. You may access the node over the console if it is not reachable on the network.

  6. Reset the network settings of the compute node with below steps:

    1. Log in to the compute node as the root user via console.

    2. Delete OVS bridges:

      ovs-vsctl del-br br-osp
      ovs-vsctl del-br br-ex
    3. Delete current network configuration files:

      rm -f /etc/sysconfig/network-scripts/ifcfg-*
    4. Reboot the compute node to restore network connectivity.

  7. Once a working configuration is achieved, update related changes in osp-ng-dataplane-node-set-deploy.yaml and osp-ng-dataplane-netconfig.yaml as per requirement.

  8. Apply the settings using oc apply command.

Re-run Data Plane Deployment:

During troubleshooting, if needed, delete the existing data plane deployment and re-run it:

Deleting the data plane node set and deployment will clean up all the related jobs and associated pods and you may start from scratch.

  1. Delete existing deployment:

    oc delete -f osp-ng-dataplane-deployment.yaml
    oc delete -f osp-ng-dataplane-node-set-deploy.yaml
  2. You may need to restore the compute node depending on where the deployment is failing.

    Run below command as root user from hyperviosr:

    virsh snapshot-revert --domain osp-compute0 preprovisioned
    Sample output
    [root@hypervisor ~]# virsh snapshot-revert --domain osp-compute0 preprovisioned
  3. Wait for a few minutes and then make sure you are able to connect to the compute node from the hypervisor

    ssh root@172.22.0.100
    Sample output
    [root@hypervisor ~]# ssh root@172.22.0.100
    Activate the web console with: systemctl enable --now cockpit.socket
    
    Register this system with Red Hat Insights: insights-client --register
    Create an account or view all your systems at https://red.ht/insights-dashboard
    Last login: Mon May 27 22:49:50 2024 from 172.22.0.254
    [root@edpm-compute-0 ~]#
  4. Go back to the bastion VM as root user and re-run the deployment:

    oc apply -f osp-ng-dataplane-node-set-deploy.yaml
    oc apply -f osp-ng-dataplane-deployment.yaml