OpenShift 4.x Routers…

Kamlesh Prajapati
8 min read · Dec 10, 2022

Before we actually start talking about the router, let's understand how pods and services run on the cluster.

When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated their own IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to outside clients. The Ingress Operator implements the IngressController API and that is the component responsible for enabling external access to OpenShift cluster services.

Now let's talk about the points below, which will help build a better understanding of the router, its configuration, and traffic flow, along with troubleshooting tips.

  1. Ingress Operator.
  2. HAProxy-based Ingress Controller.
  3. A few Ingress Controller configuration parameters.
  4. How traffic is processed by an IngressController?
  5. Ingress Controller sharding.
  6. Logging with IngressController.
  7. Troubleshooting.

Let's discuss the points mentioned above one by one for a detailed understanding.

Ingress Operator:

The Ingress Operator is an OpenShift Container Platform component which enables external access to cluster services and applications by configuring Ingress Controllers.
The operator makes this possible by deploying and managing one or more HAProxy-based Ingress Controllers, which route traffic as specified by an OpenShift ‘Route’.
The installation program generates an asset with an Ingress resource and stores it in the cluster-ingress-02-config.yml file in the manifests/ directory.
This Ingress resource defines the cluster-wide configuration for Ingress. This Ingress configuration is used as follows:
1. The Ingress Operator uses the domain from the cluster Ingress configuration as the domain for the default IngressController.

2. This domain is also used when generating a default host for a Route resource that does not specify an explicit hostname.

<Installation_dir>/manifests/cluster-ingress-02-config.yml
apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  domain: apps.example.com
  appsDomain: <test.example.com>

You can also check the configuration details by running the command below:
$oc get ingresses.config -o yaml

HAProxy-based Ingress Controller:
Every new OpenShift installation gets an HAProxy-based IngressController named ‘default’ within the ‘openshift-ingress-operator’ namespace, which can be customized, replaced, or supplemented with additional Ingress Controllers.
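
A quick way to see this on a running cluster is to list the IngressController objects and the router pods they manage:

$oc get ingresscontroller -n openshift-ingress-operator
$oc get pods -n openshift-ingress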

A few Ingress Controller configuration parameters:
replicas: As the name suggests, it sets the number of HAProxy router pod replicas within the ‘openshift-ingress’ namespace.
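
For example, a minimal sketch (the replica count 3 is only illustrative) to scale the default router:

$oc patch ingresscontroller/default -n openshift-ingress-operator --type=merge -p '{"spec":{"replicas": 3}}'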

endpointPublishingStrategy: This field is used to publish the Ingress Controller endpoints to other networks, enable load balancer integrations, etc. If not set, the default value is based on infrastructure.config.openshift.io/cluster .status.platform: LoadBalancerService on AWS, Azure, and GCP; NodePortService on bare metal; HostNetwork otherwise.
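
To check which strategy is in effect on a running cluster (a small sketch, assuming the ‘default’ IngressController):

$oc get ingresscontroller default -n openshift-ingress-operator -o jsonpath='{.status.endpointPublishingStrategy.type}'
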
domain: It is a DNS name serviced by the Ingress Controller and is used to configure multiple features. For the LoadBalancerService endpointPublishingStrategy, domain is used to configure DNS records. When using a generated default certificate, the certificate is valid for domain and its subdomains. The value is published to individual Route statuses so that users know where to target external DNS records. The domain value must be unique among all Ingress Controllers and cannot be updated.
namespaceSelector: It is used to filter the set of namespaces serviced by the ingress controller. Only useful when router sharding is implemented.

routeSelector: It is used to filter the set of individual Routes serviced by the Ingress Controller. Only useful when router sharding is implemented.

defaultCertificate: The defaultCertificate value is a reference to a secret that contains the default certificate that is served by the ingress controller. When Routes do not specify their own certificate, defaultCertificate is used.
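
A hedged sketch of the usual workflow (the secret name ‘custom-default-cert’ and the file names are placeholders): create a TLS secret in the ‘openshift-ingress’ namespace, then point the IngressController at it.

$oc create secret tls custom-default-cert --cert=tls.crt --key=tls.key -n openshift-ingress
$oc patch ingresscontroller/default -n openshift-ingress-operator --type=merge -p '{"spec":{"defaultCertificate":{"name":"custom-default-cert"}}}'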

logging: By default, access logging is not enabled, so only the usual router pod logs get generated. To generate HAProxy access logs for the traffic passing through the router, one needs to enable this field.

httpHeaders: It defines the policy for HTTP headers; you specify when and how the Ingress Controller sets the Forwarded, X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Port, X-Forwarded-Proto, and X-Forwarded-Proto-Version HTTP headers.
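
For example, a minimal sketch that makes the Ingress Controller append its forwarding headers to whatever the client already sent (the policy value shown is one of the documented options):

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  httpHeaders:
    forwardedHeaderPolicy: Append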

How traffic is processed by an IngressController:

At a high level, an external client first resolves the route hostname (which falls under the Ingress Controller's domain, for example *.apps.example.com) to the external load balancer or directly to a node where a router pod is running. The HAProxy process inside the router pod matches the requested hostname against the routes it knows about and forwards the traffic straight to one of the endpoint pods backing the route's service.
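
To see what a given router will forward to, it helps to compare the route, the service it points at, and the service endpoints (the names here are placeholders):

$oc get route <route-name> -n <project> -o wide
$oc get endpoints <service-name> -n <project>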

Ingress Controller sharding:

Ingress Controller sharding is useful when balancing incoming traffic load among a set of Ingress Controllers and when isolating traffic to a specific Ingress Controller. For example, company A goes to one Ingress Controller and company B to another.

Sharding helps us in:

  1. Balance Ingress Controllers, or routers, with several routes to speed up responses to changes.
  2. Allocate certain routes to have different reliability guarantees than other routes.
  3. Allow certain Ingress Controllers to have different policies defined.
  4. Allow only specific routes to use additional features.
  5. Expose different routes on different addresses so that internal and external users can see different routes.

Ingress Controller can use either route labels or namespace labels as a sharding method.
Using route labels means that the Ingress Controller serves routes in any namespace that are selected by the route selector.

Using namespace labels means that the Ingress Controller serves any route in any namespace that is selected by the namespace selector.

Example:

IngressController Sharding — route labels:
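
A minimal sketch of an IngressController that only serves routes carrying a matching label (the shard name, domain, and label key/value below are placeholders):

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: sharded
  namespace: openshift-ingress-operator
spec:
  domain: <sharded.apps.example.com>
  routeSelector:
    matchLabels:
      type: sharded

Routes meant for this shard are then labelled accordingly, e.g. $oc label route <route-name> type=sharded -n <project>, and should use a hostname under the shard's domain.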

IngressController Sharding — namespace labels:
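
A similar sketch using a namespace selector, so that every route in a matching namespace is served by this shard (again, the name, domain, and label are placeholders):

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: sharded
  namespace: openshift-ingress-operator
spec:
  domain: <sharded.apps.example.com>
  namespaceSelector:
    matchLabels:
      type: sharded

The namespaces are labelled correspondingly, e.g. $oc label namespace <project> type=sharded.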

Logging with IngressController:

We can configure the Ingress Controller to enable access logs. The logs can be gathered in a sidecar container within the router pod, OR, if it is a very high traffic cluster, to avoid exceeding the capacity of the logging stack or to integrate with some logging infrastructure outside of the OpenShift cluster, we can also forward the logs to a custom syslog server.

Configure logging to a sidecar container.
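
A minimal sketch, patching the ‘default’ IngressController so that access logs go to a ‘logs’ sidecar container inside each router pod:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  logging:
    access:
      destination:
        type: Container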

If the default log format is not suitable for your requirements, you can define your own using the httpLogFormat field, as shown below.
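
For example, a sketch (the syslog address, port, and format string below are only placeholders) that forwards access logs to an external syslog server with a custom HAProxy log format:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  logging:
    access:
      destination:
        type: Syslog
        syslog:
          address: 1.2.3.4
          port: 10514
      httpLogFormat: '%ci:%cp [%t] %ft %b/%s %B %bq %HM %HU %HV'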

Collecting logs:

Access logs after configuring:
$oc -n openshift-ingress logs <router-default pod-name> -c logs > file.txt
(If logging is configured to a sidecar container)

Run the above command for all the router-default pods to capture logs from all of them.

Ingress controller operator logs:
$oc logs -n openshift-ingress-operator deployments/ingress-operator -c ingress-operator > file.txt

Troubleshooting:

Expected prior knowledge for basic network troubleshooting:

Most of the time it is important to have a base knowledge of traditional networking to troubleshoot OpenShift networking issues.

One should be familiar with the following:

  1. IP Subnetting
  2. ARP
  3. ICMP
  4. Linux routing
  5. DNS communication
  6. TCP
  7. UDP
  8. iptables
  9. OCP basic packet flow, or the hops a packet takes while travelling.

Before starting the troubleshooting, it is important to get a little background on the issue, as below:

  • What is the starting point of the communication?
  • What is the intended destination of the communication?
  • How are the connections originated? Via some application or manually?
  • Is the destination an IP or a URL? If an IP, is it an SVC/pod IP or some external system IP? If a URL, is it getting resolved successfully, and which source/node/system does the resolved IP belong to?
  • Considering the various hops the packet takes during travel, it is important to know up to what point the packets have reached before getting dropped.
  • With what error message are the connections failing?

Curl to an application ‘route’ from outside the cluster is not working!

Let's say the route is “apache.apps.example.com”; we can narrow down the issue as below:

  1. Does the DNS resolution of “apache.apps.example.com” happen? Try doing ‘nslookup’/’dig’ on the URL in question (see the example commands after point 2).
  2. Say the DNS works; next, which system does the IP that the name resolves to belong to? Is it the LB or some other device/cluster node?
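
For example (the hostname follows the route above; the output will vary per environment):

$dig +short apache.apps.example.com
$nslookup apache.apps.example.com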

To narrow down whether it is an outside-cluster environment issue or an internal OpenShift application issue:

3. Does the curl to the SVC of the same project/application work if done from one of the cluster nodes?

4. Does curl work if done directly to the endpoint application pod from one of the cluster nodes? (Example commands follow the next paragraph.)

If the failure at point 4 occurs, then obviously the route from outside and point 3 will also fail. Now, if the curl fails, how about a ping to the pod's IP? (Via curl we are connecting to the application running within the pod, and via ping we are just reaching the pod IP.) If the ping works and curl fails, most probably it is an issue with the application running within the pod. To check whether the application port is in the listening state or not, do $telnet <pod-ip> <port>. If telnet fails, it means the port is NOT in a listening state, most probably because the application is not running. (If this occurs, mostly the readiness probes should also fail.)
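
A small sketch of these checks from one of the cluster nodes (the project, service, pod IP, and port are placeholders):

$oc get svc,endpoints -n <project>
$curl -v http://<svc-cluster-ip>:<port>/
$curl -v http://<pod-ip>:<port>/
$ping <pod-ip>
$telnet <pod-ip> <port>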

If 4 (curl from the same and different nodes) works and curl to the ‘route’ from outside fails, check the status of the Ingress Controller pods. Check the HAProxy config file and verify that it has the configuration correctly corresponding to the ‘route’ in question.
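
For example, assuming the route from above, the relevant backend can be checked directly inside a router pod (the config path is where OpenShift 4.x router pods normally keep it):

$oc -n openshift-ingress exec <router-default pod-name> -- grep -A5 'apache.apps.example.com' /var/lib/haproxy/conf/haproxy.config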

To narrow down whether it is an ingresscontroller pod issue or an outside LB issue, one can try the below curl command from one of the cluster nodes.

$curl -kvv --resolve <route-url>:<443-or-80>:<node-ip or ingresscontroller-pod-ip> https://<route-url>/<path-if-any>

The above command bypasses the external LB: we are making the route URL resolve to either the node IP on which the ingresscontroller pod is running (if the endpoint publishing strategy is HostNetwork) or the ingresscontroller pod IP.
If this works but curl from outside the cluster is failing, it is most probably an LB issue.
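
A concrete sketch using the earlier example route (the IP 10.0.0.15 is purely hypothetical):

$curl -kvv --resolve apache.apps.example.com:443:10.0.0.15 https://apache.apps.example.com/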

Finally, to check up to what point the packets from outside are reaching, whether they are getting forwarded correctly to the correct node, and whether they are reaching the endpoint, we will need tcpdump captures at multiple hops.

Capturing tcpdump from multiple interfaces of a node at the same time:
$for iface in ens1 tun0 vxlan_sys_4789; do tcpdump -nn -i $iface -s0 -w /path/$HOSTNAME-$iface.pcap & done

Happy learning…

Reference links:

https://docs.openshift.com/container-platform/4.7/networking/ingress-operator.html
