Rollback

This guide describes the procedures to revert upgrades of TDP Kubernetes components, both via Helm and via ArgoCD. Rollback allows quickly restoring a previous version in case of failures or unexpected behavior after an upgrade.

Attention

Before performing any rollback, back up persistent data. Depending on the component, a rollback may cause data loss if schema migrations are incompatible with previous versions.

Data Backup before Rollback

Before starting the rollback, protect persistent data by creating PVC snapshots.

Identify Component PVCs

Terminal input
kubectl get pvc -n <namespace> -l app.kubernetes.io/instance=<release>

Create Volume Snapshots

Terminal input
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: <release>-pre-rollback-snapshot
  namespace: <namespace>
spec:
  volumeSnapshotClassName: <snapshot-class>
  source:
    persistentVolumeClaimName: <pvc-name>
EOF

Verify Snapshot Status

Terminal input
kubectl get volumesnapshot -n <namespace>

Ensure the snapshot has readyToUse: true status before proceeding.
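A snapshot can take a while to become usable, so the check above may need repeating. The following is a minimal sketch of a polling helper, not part of TDP: the function name `wait_for_snapshot` and its default of 30 attempts with a 10-second delay are assumptions for illustration.

```shell
# Poll a VolumeSnapshot until .status.readyToUse is "true" or the attempts run out.
# Usage: wait_for_snapshot <snapshot-name> <namespace> [attempts] [delay-seconds]
# (hypothetical helper; the defaults below are illustrative assumptions)
wait_for_snapshot() {
  name=$1
  namespace=$2
  attempts=${3:-30}
  delay=${4:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # jsonpath prints an empty string while the status field is not yet set
    ready=$(kubectl get volumesnapshot "$name" -n "$namespace" \
      -o jsonpath='{.status.readyToUse}' 2>/dev/null)
    if [ "$ready" = "true" ]; then
      echo "snapshot $name is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "snapshot $name not ready after $attempts checks" >&2
  return 1
}
```

Calling `wait_for_snapshot` with the snapshot name and namespace before the rollback command stops the procedure early if the snapshot never becomes ready. Recent kubectl releases may also support `kubectl wait --for=jsonpath=...` for the same purpose.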

Rollback via Helm

Check Revision History

Use the helm history command to view all revisions of a release:

Terminal input
helm history <release> -n <namespace>

The output will display a table with columns: REVISION, UPDATED, STATUS, CHART, APP VERSION, and DESCRIPTION. Identify the revision you want to revert to.

Example output:

REVISION    UPDATED                STATUS        CHART              APP VERSION    DESCRIPTION
1           2025-01-15 10:30:00    superseded    tdp-kafka-1.0.0    3.5.1          Install complete
2           2025-02-10 14:00:00    superseded    tdp-kafka-1.0.0    3.5.1          Upgrade complete
3           2025-03-01 09:15:00    deployed      tdp-kafka-2.0.0    3.6.0          Upgrade complete
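When the history is long, picking the target revision by eye is error-prone. As a rough sketch, the filter below reads the table on standard input and prints the most recent revision that used a given chart version. The name `latest_revision_for_chart` is hypothetical, and the filter assumes the chart appears as a single whitespace-delimited token, as in the example output.

```shell
# Read `helm history` table output on stdin and print the highest revision
# number whose row contains the given chart (name-version) as an exact field.
# Usage: helm history <release> -n <namespace> | latest_revision_for_chart <chart>
# (hypothetical helper; assumes the table layout shown in the example output)
latest_revision_for_chart() {
  awk -v chart="$1" '
    NR > 1 {                                  # skip the header row
      for (i = 1; i <= NF; i++)
        if ($i == chart) rev = $1             # later rows overwrite earlier ones
    }
    END { if (rev != "") print rev; else exit 1 }
  '
}
```

With the example history, asking for `tdp-kafka-1.0.0` prints `2`, the last revision deployed before the upgrade to `tdp-kafka-2.0.0`.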

Rollback to a Specific Revision

To revert to a specific revision:

Terminal input
helm rollback <release> <revision> -n <namespace>

For example, to revert tdp-kafka to revision 2:

Terminal input
helm rollback tdp-kafka 2 -n <namespace>

Rollback to the Previous Revision

To revert to the immediately previous revision (without specifying the number):

Terminal input
helm rollback <release> -n <namespace>

Rollback with Pod Recreation

In some cases, it may be necessary to force recreation of all pods during rollback:

Terminal input
helm rollback <release> <revision> -n <namespace> --recreate-pods

Verify Rollback Result

After rollback, confirm the release was reverted successfully:

Terminal input
helm status <release> -n <namespace>
helm history <release> -n <namespace>

Helm records a rollback as a new revision rather than deleting the later ones. The last entry in the history should therefore show STATUS: deployed with the description Rollback to <revision>.
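In a script, the same check can be automated. This sketch assumes the newest revision is the last row of the `helm history` output and that its description starts with "Rollback to". The function name `rollback_confirmed` is hypothetical.

```shell
# Succeed only when the newest `helm history` row is a deployed rollback entry.
# Usage: rollback_confirmed <release> <namespace>
# (hypothetical helper; relies on the newest revision being printed last)
rollback_confirmed() {
  helm history "$1" -n "$2" | tail -n 1 | grep -q 'deployed.*Rollback to'
}
```

The function returns a non-zero status when the last revision is anything other than a deployed rollback, so it can guard later steps, for example `rollback_confirmed tdp-kafka my-namespace || exit 1` (the namespace here is a placeholder).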

Rollback via ArgoCD

Rollback via ArgoCD consists of reverting the Git repository state to the previous version of the manifests.

Revert the Manifest in Git

The recommended approach is to revert the commit that introduced the upgrade:

Terminal input
git log --oneline -10  # Identify the upgrade commit
git revert <commit-hash>
git push origin main

ArgoCD will detect the change in the repository and synchronize the cluster automatically (if automatic synchronization is enabled).

Force Synchronization after Reversion

If automatic synchronization is disabled, force synchronization manually:

Terminal input
argocd app sync tdp-kafka

Rollback via ArgoCD CLI

ArgoCD can also roll an application back to a previously deployed revision from its history. When no history ID is given, it rolls back to the previous one:

Terminal input
argocd app rollback tdp-kafka

Note

ArgoCD CLI rollback only reverts the cluster state; the Git repository is not changed. ArgoCD rejects a rollback while automated sync is enabled, and re-enabling automated sync afterwards re-applies the undesired version that is still in Git. Always revert in Git as well to ensure consistency.
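For applications with automated sync, a CLI rollback is usually combined with switching the sync policy to manual first. The sketch below assumes that setup and an already authenticated `argocd` CLI. The function name `argocd_rollback_to` is hypothetical, and the history ID comes from `argocd app history <app>`.

```shell
# Roll an ArgoCD application back to a given history ID.
# Automated sync is switched off first (assumption: the app uses it), so the
# controller does not re-apply the revision that is still in Git.
# Usage: argocd_rollback_to <app> <history-id>
argocd_rollback_to() {
  app=$1
  history_id=$2
  argocd app set "$app" --sync-policy none || return 1
  argocd app rollback "$app" "$history_id"
}
```

After the Git revert has been pushed, automated sync can be restored with `argocd app set <app> --sync-policy automated`.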

Verify State after Rollback

Terminal input
argocd app get tdp-kafka

The status should indicate Synced and Healthy after rollback completion.
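For scripted checks, both conditions can be tested in one place. This sketch assumes the plain-text output of `argocd app get` contains lines starting with "Sync Status:" and "Health Status:". The function name `argocd_app_ok` is hypothetical.

```shell
# Succeed when `argocd app get` reports the app as both Synced and Healthy.
# Assumes "Sync Status:" and "Health Status:" lines in the plain-text output.
# Usage: argocd_app_ok <app>
argocd_app_ok() {
  status=$(argocd app get "$1") || return 1
  printf '%s\n' "$status" | grep -Eq '^Sync Status:[[:space:]]+Synced' &&
    printf '%s\n' "$status" | grep -Eq '^Health Status:[[:space:]]+Healthy'
}
```

To block until the application settles instead of checking once, `argocd app wait tdp-kafka --sync --health` is an alternative worth considering.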

Common Rollback Scenarios

Scenario 1: Pod Initialization Failure

Symptom: pods remain in CrashLoopBackOff or Error state after upgrade.

Diagnosis:

Terminal input
kubectl describe pod -n <namespace> -l app.kubernetes.io/instance=<release>
kubectl logs -n <namespace> -l app.kubernetes.io/instance=<release> --previous

Solution: roll back to the previous version:

Terminal input
helm rollback <release> -n <namespace>

Scenario 2: Database Schema Incompatibility

Symptom: schema migration errors in component logs.

Diagnosis:

Terminal input
kubectl logs -n <namespace> -l app.kubernetes.io/instance=<release> | grep -i "migration\|schema\|database"

Solution:

  1. Restore the database backup (PVC snapshot)

  2. Perform release rollback:

    Terminal input
    helm rollback <release> -n <namespace>

  3. Verify the component initializes correctly with the restored database
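Step 1 typically means provisioning a new PVC from the pre-rollback VolumeSnapshot. The following sketch prints such a manifest so it can be reviewed before it is applied. The function name `render_restore_pvc`, the `ReadWriteOnce` access mode, and all argument values are assumptions for illustration. How the component is pointed at the restored claim depends on its chart and is not covered here.

```shell
# Print a PVC manifest that provisions a new volume from a VolumeSnapshot.
# Usage: render_restore_pvc <new-pvc-name> <namespace> <snapshot-name> <storage-class> <size>
# (hypothetical helper; ReadWriteOnce is an assumed access mode)
render_restore_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
  namespace: $2
spec:
  storageClassName: $4
  dataSource:
    name: $3
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $5
EOF
}
```

Piping the output to `kubectl apply -f -` creates the claim. The requested size should be at least as large as the volume the snapshot was taken from.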

Scenario 3: Inter-Component Connectivity Issues

Symptom: components cannot communicate after partial upgrade.

Diagnosis:

Terminal input
kubectl get endpoints -n <namespace>
kubectl get svc -n <namespace>

Solution: verify all dependent components were upgraded to compatible versions. If necessary, roll back all affected components.
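A Service with no ready pods behind it shows `<none>` in the ENDPOINTS column, which is a quick indicator of which component is unreachable. As a small sketch (the function name `services_without_endpoints` is hypothetical), the filter below lists those Services from the table output:

```shell
# Read `kubectl get endpoints` table output on stdin and print the names of
# Services whose ENDPOINTS column is "<none>" (no ready pod behind them).
# Usage: kubectl get endpoints -n <namespace> | services_without_endpoints
services_without_endpoints() {
  awk 'NR > 1 && $2 == "<none>" { print $1 }'
}
```

Each name printed is a candidate for closer inspection with `kubectl describe svc` and the pod logs of the matching component.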

Scenario 4: Performance Degradation

Symptom: elevated latency or excessive resource consumption after upgrade.

Diagnosis:

Terminal input
kubectl top pods -n <namespace> -l app.kubernetes.io/instance=<release>
kubectl describe pod -n <namespace> -l app.kubernetes.io/instance=<release>

Solution: check configuration differences between versions and adjust resources in values.yaml. If the problem persists, perform rollback.
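One way to check configuration differences is to compare the user-supplied values Helm stored for two revisions. This is a minimal sketch using `helm get values --revision`. The function name `diff_revision_values` is hypothetical, and it compares only overridden values, not chart defaults.

```shell
# Show how user-supplied values differ between two revisions of a release.
# Exit status follows diff: 0 when identical, 1 when the values differ.
# Usage: diff_revision_values <release> <namespace> <old-revision> <new-revision>
diff_revision_values() {
  tmpdir=$(mktemp -d) || return 1
  status=0
  {
    helm get values "$1" -n "$2" --revision "$3" > "$tmpdir/old.yaml" &&
      helm get values "$1" -n "$2" --revision "$4" > "$tmpdir/new.yaml" &&
      diff -u "$tmpdir/old.yaml" "$tmpdir/new.yaml"
  } || status=$?
  rm -rf "$tmpdir"
  return "$status"
}
```

Adding `--all` to both `helm get values` calls would include computed chart defaults, which helps when the degradation comes from a changed default rather than an override.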

Partial Rollback (Individual Component)

In scenarios where only one component has issues after a general upgrade, it is possible to roll back only that component.

Identify the Problematic Component

Terminal input
kubectl get pods -n <namespace> --field-selector=status.phase!=Running
helm list -n <namespace> --failed
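A pod stuck in CrashLoopBackOff usually still reports the Running phase, so the phase-based field selector can miss it. As a complementary sketch, the filter below flags pods whose READY column is incomplete. The function name `pods_not_ready` is hypothetical, and it expects the default column layout of `kubectl get pods --no-headers`.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name and
# status of pods that are not fully ready: READY is "x/y" with x != y,
# ignoring pods that have already Completed.
# Usage: kubectl get pods -n <namespace> --no-headers | pods_not_ready
pods_not_ready() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Completed" && ready[1] != ready[2]) print $1, $3
  }'
}
```

The release that owns a flagged pod can then be read from its `app.kubernetes.io/instance` label.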

Individual Rollback via Helm

Terminal input
helm rollback <problematic-release> -n <namespace>

Verify Compatibility

After partial rollback, verify the reverted component is compatible with the other updated versions:

Terminal input
kubectl logs -n <namespace> -l app.kubernetes.io/instance=<problematic-release> --tail=50
kubectl get pods -n <namespace>

Attention

Partial rollback may cause incompatibilities between components if there are version dependencies. Consult the compatibility matrix before proceeding.

Rollback Best Practices

  1. Always back up persistent data before starting rollback
  2. Document the cause of the rollback for future reference
  3. Test in a staging environment before upgrading production, which reduces the need for rollbacks
  4. Monitor logs after rollback to confirm stability
  5. Keep Git consistent with the cluster state: if using ArgoCD, always revert in Git
  6. Do not skip revisions: when several rollbacks are needed, revert one revision at a time