OpenShift 4.x Upgrade….

Kamlesh Prajapati
Oct 22, 2021

This blog aims to help cluster administrators plan upgrades across their OpenShift fleet and shares best practices for controlling OpenShift’s automated operations.

OpenShift 4.x upgrades are designed around the following goals:

  1. No downtime for applications during an upgrade. Applications should be using all Kubernetes best practices for maintaining high availability.
  2. Ability to always roll forward if any bugs are encountered. Kubernetes API migrations are not always reversible and vary from component to component. OpenShift components are forward and backward compatible within a range of versions.
  3. Pause on any blocking errors without impacting the cluster, if possible. All components monitor their health, and most errors do not impact functionality. This allows an admin to do remediation or reconfigure into an acceptable configuration.
  4. Fully manage everything, from the OS up to the cluster control plane and cluster add-ons.
  5. All installations behave the same for all Day 2 operations. Clusters that are correctly user-provisioned or installer-provisioned can be upgraded in the exact same manner.

What to Expect When Upgrading OpenShift 4.x

When you combine the goal of no downtime for your apps with all of the automation baked into the Operators, the upgrade experience boils down to a single button. Cluster admins choose when to upgrade and, if multiple versions are available, which one to upgrade to.

Fig: 01

Upgrading an OpenShift cluster via the Console

The cluster understands the best version for you to upgrade to and presents that in the Console. You can also drive this via API across multiple clusters or integrate it into automation tools you already use. Red Hat understands that there is an element of trust that must be earned over time. Overall, we have seen a very high participation rate, and customers are successful upgrading with ease and stability.

Earlier we covered how the Operators that manage monitoring, logging, registry, and others each work on a desired state loop, which allows the cluster admin to configure and manage these features. Upgrading the entire cluster itself also works on a desired state loop. When you change the desired version in the Console, you are just manipulating a single field on a Kubernetes object. It looks like this:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.1     True        False         2d21h   Cluster version is 4.6.1
$ oc get clusterversion version -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
spec:
  channel: fast-4.6
  clusterID: abc-123-abc-123
  desiredUpdate:
    force: false
    image: quay.io/openshift-release-dev/ocp-release@sha256:d78292...c2b9
    version: 4.6.1
  upstream: https://api.openshift.com/api/upgrades_info/v1/graph
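The same field can be driven from the command line instead of the Console. Below is a minimal sketch, assuming a logged-in oc session with cluster-admin rights. The function names are hypothetical wrappers of my own, and 4.6.4 is only a placeholder target version:

```shell
# Hypothetical helper names; the oc subcommands they wrap are standard.

# Show the current version, channel, and the updates the cluster is offered.
show_available_updates() {
  oc adm upgrade
}

# Request an upgrade to one of the offered versions.
# This sets spec.desiredUpdate on the ClusterVersion object shown above.
# $1: target version, e.g. 4.6.4 (placeholder)
request_upgrade() {
  oc adm upgrade --to="$1"
}

# Example (requires a live cluster):
#   show_available_updates
#   request_upgrade 4.6.4
```

Wrapping the calls in functions keeps the snippet safe to source in a script. When scripting across several clusters, you would typically run the same functions once per kubeconfig context.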

UPGRADING THE CONTROL PLANE

While that is easy by design, I do not want to minimize what is happening under the hood. First, OpenShift’s Cluster Version Operator (CVO) is protecting you by only offering to upgrade between versions of OpenShift that are validated and known to be high quality at the time of upgrade. When you choose to start the upgrade, new versions of all of the cluster Operators are downloaded, and their signatures checked.

Second, CVO orchestrates desired state changes to each of the Operators in a specific order, with constant health checking along the way. The control plane and etcd are upgraded first, then the OS and config of the Nodes that run the control plane, and finally, the rest of the cluster Operators. Upgrading a 3 Node Control Plane usually takes about an hour, during which containers are downloaded, components are reconfigured, and Nodes are rebooted. You can watch all of this happen live in the Console (administrator view). Upgrade times for the Control Plane are directly proportional to the number of Nodes.

During this time, you should still have availability of the Kubernetes API, the etcd database, and cluster ingress and routing. All of these components are highly available. Once the Control Plane is upgraded, you will see the process be marked as complete.

During the upgrade you might notice a few Operators briefly reporting a Degraded status. Do not worry about this: it is usually transient, and in most cases they recover on their own.
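If you prefer the command line to the Console, the upgrade can also be followed there. This is a small sketch assuming a logged-in oc session; the function names are hypothetical:

```shell
# Hypothetical helper names; the oc subcommands they wrap are standard.

# Overall upgrade status: the PROGRESSING column and STATUS message
# change as the CVO works through the Operators.
show_upgrade_status() {
  oc get clusterversion
}

# Per-Operator view: each row reports VERSION, AVAILABLE, PROGRESSING,
# and DEGRADED, so mixed versions are visible while the upgrade runs.
show_operator_status() {
  oc get clusteroperators
}

# Example (requires a live cluster):
#   show_upgrade_status
#   show_operator_status
```

Re-running show_operator_status every few minutes is usually enough to see Operators move to the new version one by one.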

UPGRADING WORKER NODES ACROSS NODE POOLS

Worker Nodes are upgraded after the Control Plane has finished upgrading. They do not block the cluster’s upgrade process, because Nodes may come and go as autoscaling takes place, and factors outside of the cluster’s control can slow down or even block the rollout to your worker nodes.

This is a good thing, of course: PodDisruptionBudgets, affinity and anti-affinity rules, resource limits, Readiness and Liveness probes, and other Kubernetes best practices keep your applications highly available and resilient to the upgrade process.
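As one illustration of these practices, a PodDisruptionBudget can be created straight from the CLI. This is a sketch, not a recommendation for specific values: the function name, the budget name my-app-pdb, and the label app=my-app are all hypothetical, and the command acts on whichever project is currently selected:

```shell
# Hypothetical helper; "oc create poddisruptionbudget" is the standard subcommand.
# $1: name of the budget
# $2: label selector matching the application's Pods
# $3: minimum number of matching Pods that must stay available during a drain
create_pdb() {
  oc create poddisruptionbudget "$1" --selector="$2" --min-available="$3"
}

# Example (requires a live cluster and an app labelled app=my-app):
#   create_pdb my-app-pdb app=my-app 1
```

Note that a budget only helps when the application runs more replicas than the minimum. With a single replica and a minimum of 1, a Node drain cannot evict the Pod and the Node upgrade will wait.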

Fig: 02

During the upgrade, cluster Operators upgrade in order and will appear as mixed versions.

Each of your configured Node Pools has a maximum number of machines that are allowed to be unavailable at the same time. The Machine Configuration Operator (MCO) will use this value to upgrade and reboot all of the Worker Nodes as quickly as possible, using signals from the workloads. During this time, Pods will be scheduled onto other Nodes as needed. Node Pools are helpful to configure certain hardware (for example, GPUs) correctly, but also to slow down (or speed up) upgrades for certain classes of your applications.
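In the API, Node Pools are MachineConfigPool objects and the limit is the maxUnavailable field in their spec, which defaults to 1. Here is a minimal sketch for inspecting and changing it, assuming a logged-in oc session; the function names are hypothetical, and this version only handles integer values:

```shell
# Hypothetical helper names; the oc subcommands they wrap are standard.

# List the pools with their updated / updating / degraded state and machine counts.
list_node_pools() {
  oc get machineconfigpool
}

# Allow up to N machines in a pool to be unavailable at the same time.
# $1: pool name, e.g. worker
# $2: integer number of machines
set_max_unavailable() {
  oc patch machineconfigpool "$1" --type merge \
    -p "{\"spec\":{\"maxUnavailable\":$2}}"
}

# Example (requires a live cluster):
#   list_node_pools
#   set_max_unavailable worker 2
```

Raising the value speeds up the rollout at the cost of less spare capacity while Nodes reboot, so it is worth checking that the remaining Nodes can absorb the displaced Pods.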

Connected Clusters Get Smarter Over Time

Connected OpenShift clusters get smarter over time, for several reasons:

Upgrade more smoothly: Connected clusters can fetch the latest upgrade graph to get them on the best path at time of update. From time to time, specific upgrade paths are blocked while investigation takes place or bugs are fixed, and connected clusters are routed around these issues.

Fresh catalog of content: Operators, Helm charts and other certified content are regularly refreshed with new bug fixes and security patches. Connected clusters can consume this as soon as it is released.

Fig: 03

Connected clusters are given the latest set of happy paths from Red Hat
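The upgrade graph that connected clusters consult can also be fetched by hand from the upstream URL that appears in the ClusterVersion object. The sketch below assumes curl is installed and the endpoint is reachable from your machine; the function name is hypothetical, and stable-4.6 is just an example channel:

```shell
# Upstream update service URL, as seen in spec.upstream of the ClusterVersion object.
GRAPH_URL="https://api.openshift.com/api/upgrades_info/v1/graph"

# Hypothetical helper: download the raw update graph for one channel as JSON.
# $1: channel name, e.g. stable-4.6
fetch_update_graph() {
  curl -sS -H "Accept: application/json" "${GRAPH_URL}?channel=$1"
}

# Example (requires network access):
#   fetch_update_graph stable-4.6 > graph.json
```

The response is a JSON document with a nodes array (one entry per release) and an edges array (pairs of indexes into nodes describing the allowed upgrades), which is what the cluster evaluates when it offers you a target version.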

OpenShift 4.x Provides More Control Through Channels

When you install or upgrade to a y-stream of OpenShift, you have the choice of “channels” to attach that cluster to. There are three channels that you can choose from: candidate, fast, and stable.

SWITCH CHANNELS TO UPGRADE TO A NEW VERSION

The trigger to switch to a new y-stream of OpenShift is to switch your channel. Later versions of your current y-stream will automatically enable these new channels in your cluster.

Naturally, you have a choice to stay on the same channel, or you can switch to a different one when you upgrade. Once you make your choice, the cluster will check to see if there is an upgrade path from your current version to one on the new channel, and the upgrade button will appear.
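Switching the channel is again a change to a single field, spec.channel, on the ClusterVersion object. This is a minimal sketch assuming a logged-in oc session with cluster-admin rights; the function names are hypothetical and stable-4.7 is a placeholder channel:

```shell
# Hypothetical helper names; the oc subcommands they wrap are standard.

# Print the channel the cluster is currently attached to.
show_channel() {
  oc get clusterversion version -o jsonpath='{.spec.channel}'
}

# Attach the cluster to a different channel.
# $1: channel name, e.g. stable-4.7 (placeholder)
set_channel() {
  oc patch clusterversion version --type merge \
    -p "{\"spec\":{\"channel\":\"$1\"}}"
}

# Example (requires a live cluster):
#   show_channel
#   set_channel stable-4.7
```

After the change, running oc adm upgrade shows whether an upgrade path from your current version exists on the new channel.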

Fig: 04

Upgrading by switching your channel in the OpenShift Console.

Disconnected Clusters Have a Similar Experience But Require More Curation by Administrators

Finally, we get to the topic of disconnected clusters. This guide covered all of the parts of the connected cluster experience in order to contrast it with the disconnected/restricted network cluster.

OpenShift’s Operator model, with its desired state loop automation, means that once all of the containers and metadata are loaded into the container registry behind your firewall, you will get the same failure recovery and upgrade experience as a connected cluster.

Because all of the containers need to be moved behind your firewall, a cluster admin is taking on the responsibility of deciding what versions to upgrade from and to. This process consists of parsing the current update graph at the time you start mirroring, and understanding which outstanding bugs may impact you and which will not. If you have lengthy multiday approvals, container scanning, and other security controls, a better upgrade path may have become available by the time you are ready to upgrade.
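The mirroring step itself is typically done with oc adm release mirror. What follows is a sketch under stated assumptions: the function name is hypothetical, and the registry host, repository, release tag, and pull secret path in the example are placeholders you would replace with your own:

```shell
# Hypothetical helper; "oc adm release mirror" is the standard subcommand.
# $1: release tag to mirror, e.g. 4.6.4-x86_64 (placeholder)
# $2: local registry host and port
# $3: repository inside the local registry
# $4: path to a pull secret valid for both the source and the local registry
mirror_release() {
  oc adm release mirror -a "$4" \
    --from="quay.io/openshift-release-dev/ocp-release:$1" \
    --to="$2/$3" \
    --to-release-image="$2/$3:$1"
}

# Example (requires network access to both registries):
#   mirror_release 4.6.4-x86_64 registry.example.com:5000 ocp4/openshift4 pull-secret.json
```

Which release tag to pass in is exactly the curation decision described above: it should come from your own reading of the update graph at the time you start mirroring.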

Fig: 05

Comparing the disconnected upgrade process to a connected cluster.

I hope this helps those who want to understand the OpenShift upgrade flow.

Happy learning….