Guided solution (page 1)

  1. Log in to your lab environment on the ROLE platform.

  2. Break the environment if you have not done it and then step through the fix.

    As the student user, run the lab start script on the workstation VM to reproduce the issue.

    cd /home/student/osp_training/scenarios_repo/
    ./lab start bfx021
    Sample output
    [student@workstation ~]$ lab start bfx021
    Running start action against scenario bfx021
    Run the following command:
    ssh -i /home/student/osp_training/.scenariobfx021/scenario-bfx021-key.pem cirros@192.168.51.98
    The IP address in the displayed output may differ in your case.
  3. Run the ssh command from the previous lab command output and notice the Connection timed out error. Also run the ping command against instance IP.

    ssh -i /home/student/osp_training/.scenariobfx019/scenario-bfx019-key.pem cirros@instance.ip.in.your.lab
    ping instance.ip.in.your.lab -c 1
    In the above commands replace the string instance.ip.in.your.lab with the actual IP address displayed in the output of the lab start script.
    Sample output
    [student@workstation ~]$ ssh -i /home/student/osp_training/.scenariobfx021/scenario-bfx021-key.pem cirros@192.168.51.98
    Connection closed by 192.168.51.98 port 22
    
    [student@workstation ~]$ ping 192.168.51.98 -c 1
    PING 192.168.51.98 (192.168.51.98) 56(84) bytes of data.
    64 bytes from 192.168.51.98: icmp_seq=1 ttl=62 time=0.949 ms
    
    --- 192.168.51.98 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.949/0.949/0.949/0.000 ms
    [student@workstation ~]$

    You observe that you can ping the instance but not able to ssh to it.

  4. To further diagnose, check the connectivity to port 22 (SSH) to the instance.

  5. Use either the nc (netcat) or telnet command to verify the connectivity.

    nc 192.168.51.98 22
    telnet 192.168.51.98 22

    The port 22 is indeed open on the instance, and basic network connectivity is established.

    • The problem might be related to authentication issues rather than network connectivity.

    • The SSH key being used for authentication might be problematic.

    • Inspect and ensure that the SSH key setup is correct.

  6. Run the SSH command in verbose mode (use the -vvv flag) to gain more detailed insights into the authentication process.

    ssh -i /home/student/osp_training/.scenariobfx019/scenario-bfx019-key.pem cirros@instance.ip.in.your.lab -vvv

    replace the string instance.ip.in.your.lab with the actual IP address displayed in the output of the lab start script.

    • Observe that authentication was attempted but ultimately failed.

    • This indicates a specific authentication issue.

    • Access the RHOSO environment to determine the specific compute node where the RHOSP instance is currently hosted.

  7. Use the openstack server list --long command to list instances and identify the compute node hosting your instance.

    oc exec -n openstack openstackclient -- openstack server list --long
    Sample output
    [student@workstation ~]$ oc exec -n openstack openstackclient -- openstack server list --long
    +--------------------------------------+--------------------+--------+------------+-------------+--------------------------------------------------------+---------------------+--------------------------------------+----------+-------------------+---------------------------+------------+-------------+
    | ID                                   | Name               | Status | Task State | Power State | Networks                                               | Image Name          | Image ID                             | Flavor   | Availability Zone | Host                      | Properties | Host Status |
    +--------------------------------------+--------------------+--------+------------+-------------+--------------------------------------------------------+---------------------+--------------------------------------+----------+-------------------+---------------------------+------------+-------------+
    | 859334f3-966b-4a4f-9cef-b923b5b168dc | scenario-bfx021-vm | ACTIVE | None       | Running     | scenario-bfx021-network=192.168.140.180, 192.168.51.98 | cirros-0.5.2-x86_64 | efe0f0f6-7cd4-4ff4-878e-8ffbccd738d7 | m1.small | nova              | compute01.srv.example.com |            | UP          |
    +--------------------------------------+--------------------+--------+------------+-------------+--------------------------------------------------------+---------------------+--------------------------------------+----------+-------------------+---------------------------+------------+-------------+
    [student@workstation ~]$

    Per above output, the instance is running on compute01.srv.example.com, this might vary in your case.

  8. Log in to the compute node where the instance is running. You might use SSH to connect to the node.

    ssh root@compute02.srv.example.com
    Sample output
    [student@workstation ~]$ ssh root@compute02.srv.example.com
    Activate the web console with: systemctl enable --now cockpit.socket
    
    Register this system with Red Hat Insights: insights-client --register
    Create an account or view all your systems at https://red.ht/insights-dashboard
    Last login: Mon Mar 31 09:31:34 2025 from 192.168.51.254
    
    [root@compute02 ~]#
  9. Investigate the metadata service on the compute node by verifying if the necessary metadata resources are correctly provisioned or not.

    podman ps -a | grep ovn_metadata_agent
    ip net
    podman ps -a | grep haproxy
    Sample output
    [root@compute01 ~]# podman ps -a | grep ovn_metadata_agent
    393aca2672c7  registry.redhat.io/rhoso/openstack-neutron-metadata-agent-ovn-rhel9@sha256:b01c35ee7c7bfb03c91981cbed675628e2a145cb9b0fb123370d4679907736f4  kolla_start  7 weeks ago  Up 8 days               ovn_metadata_agent
    [root@compute01 ~]#
    [root@compute01 ~]# ip net
    [root@compute01 ~]#
    [root@compute01 ~]# podman ps -a | grep haproxy
    [root@compute01 ~]#

    No metadata namespace listed here and haproxy container is not running. Hence metadata service seems to be not working on this node.

  10. Exit from compute to get back to the workstation vm.

  11. Now list the ports for the network associated with the instance.

    oc exec -n openstack openstackclient -- openstack port list --server scenario-bfx021-vm
[student@workstation ~]$ oc exec -n openstack openstackclient -- openstack port list --server scenario-bfx021-vm
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                             | Status |
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+
| e123fcf9-74bc-416e-a627-b11b2b630c62 |      | fa:16:3e:e6:a9:e3 | ip_address='192.168.140.180', subnet_id='89db0de7-6d5a-498a-948d-2e1ab26185f6' | ACTIVE |
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+
  1. Look for the port ID in the compute node’s ovn-mtadata-agent.log file.

  2. Login again to the associated compute node and execute below command.

    grep port_id /var/log/containers/neutron/ovn-metadata-agent.log

    Replace the port_id string with appropriate port id that was captured in the earlier command.

    Sample output
    [root@compute01 ~]# grep e123fcf9-74bc-416e-a627-b11b2b630c62 /var/log/containers/neutron/ovn-metadata-agent.log*
    /var/log/containers/neutron/ovn-metadata-agent.log.1:2025-06-25 06:10:05.342 2261 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingUpdatedEvent(events=('update',), table='Port_Binding', conditions=None, old_conditions=None), priority=20 to row=Port_Binding(mac=['fa:16:3e:e6:a9:e3 192.168.140.180'], port_security=['fa:16:3e:e6:a9:e3 192.168.140.180'], type=, nat_addresses=[], virtual_parent=[], up=[False], options={'requested-chassis': 'compute01.srv.example.com'}, parent_port=[], requested_additional_chassis=[], ha_chassis_group=[], external_ids={'neutron:cidrs': '192.168.140.180/24', 'neutron:device_id': '859334f3-966b-4a4f-9cef-b923b5b168dc', 'neutron:device_owner': 'compute:nova', 'neutron:mtu': '', 'neutron:network_name': 'neutron-d806b6c9-ee50-4cdf-9a79-2f58f3f6b8ce', 'neutron:port_capabilities': '', 'neutron:port_name': '', 'neutron:project_id': 'd388b58059514443a8dced8c2ed691f6', 'neutron:revision_number': '2', 'neutron:security_group_ids': '9a6f20f9-de2e-4d40-824d-4856be72d0ed', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, additional_chassis=[], tag=[], additional_encap=[], encap=[], mirror_rules=[], datapath=2b4395ab-7242-44ed-9d5d-7888266e2847, chassis=[<ovs.db.idl.Row object at 0x7fab2cb9dfa0>], tunnel_key=3, gateway_chassis=[], requested_chassis=[<ovs.db.idl.Row object at 0x7fab2cb9dfa0>], logical_port=e123fcf9-74bc-416e-a627-b11b2b630c62) old=Port_Binding(chassis=[]) matches /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43
    /var/log/containers/neutron/ovn-metadata-agent.log.1:2025-06-25 06:10:05.346 2261 INFO neutron.agent.ovn.metadata.agent [-] Port e123fcf9-74bc-416e-a627-b11b2b630c62 in datapath d806b6c9-ee50-4cdf-9a79-2f58f3f6b8ce bound to our chassis
    [root@compute01 ~]#
  3. Open the log file to see the detailed message.

  4. Exit from the compute node to get back to the workstation vm.