Rollback Procedures
Overview
When a deployment goes wrong, you need to quickly restore service. This guide covers rollback strategies for the DineTogether infrastructure.
Quick Rollback Commands
Immediate Rollback (Last Known Good)
# Rollback to previous version
kubectl rollout undo deployment/myapp -n test-staging
# Check rollback status
kubectl rollout status deployment/myapp -n test-staging
Rollback to Specific Version
# View rollout history
kubectl rollout history deployment/myapp -n test-staging
# Rollback to specific revision
kubectl rollout undo deployment/myapp --to-revision=5 -n test-staging
Finding the Right Version
Check Deployment History
# List all revisions
kubectl rollout history deployment/myapp -n test-staging
# View specific revision details
kubectl rollout history deployment/myapp -n test-staging --revision=5
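The revision to roll back to can also be read from the history output by a script instead of by eye. The following is a minimal sketch, assuming the usual two-column `REVISION  CHANGE-CAUSE` layout; `previous_revision` is a hypothetical helper name, not a kubectl feature.

```shell
# previous_revision: read `kubectl rollout history` output on stdin and print
# the second-highest revision number (the one before the current rollout).
# Hypothetical helper; assumes data rows start with a numeric revision.
# Exits non-zero when fewer than two revisions are listed.
previous_revision() {
  awk '$1 ~ /^[0-9]+$/ { print $1 }' \
    | sort -n \
    | awk '{ prev = last; last = $1; n++ } END { if (n < 2) exit 1; print prev }'
}
```

One possible use is `kubectl rollout history deployment/myapp -n test-staging | previous_revision`, with the result passed to `--to-revision`. Note that after an undo, Kubernetes renumbers the restored revision as the newest one, so revision numbers should be re-read before each rollback.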
Find Working Image Tags
# List recent images from GitHub
gh api /orgs/dine-together/packages/container/myapp/versions \
--jq '.[0:10] | .[] | {tag: .metadata.container.tags[], created: .created_at}'
# Check what's currently running
kubectl get deployment myapp -n test-staging -o jsonpath='{.spec.template.spec.containers[0].image}'
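When comparing the running image against the registry listing, it can help to isolate just the tag. A small sketch, assuming image references of the form used in this guide (`registry/org/name:tag`); `image_tag` is a hypothetical helper.

```shell
# image_tag: print the tag portion of an image reference such as
# ghcr.io/dine-together/myapp:abc123def.
# Hypothetical helper; returns non-zero if the reference carries no tag.
image_tag() {
  local ref last
  ref="${1%%@*}"      # drop any @sha256:... digest suffix
  last="${ref##*/}"   # last path component, e.g. myapp:abc123def
  case "$last" in
    *:*) printf '%s\n' "${last##*:}" ;;
    *)   return 1 ;;
  esac
}
```

For example, the output of the `kubectl get deployment ... -o jsonpath=...` command above could be passed as the argument to `image_tag`.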
Rollback Strategies
Strategy 1: Kubernetes Native Rollback
Best for: Recent deployments, configuration changes
# Rollback deployment
kubectl rollout undo deployment/myapp -n test-staging
# Verify pods are updating
kubectl get pods -n test-staging -w
Strategy 2: Manual Image Update
Best for: Specific version needed, cross-environment rollback
# Update to specific image
kubectl set image deployment/myapp \
myapp=ghcr.io/dine-together/myapp:abc123def \
-n test-staging
# Only needed if the tag was reused: set image already triggers a rollout
# when the image reference changes
kubectl rollout restart deployment/myapp -n test-staging
Strategy 3: Git Revert
Best for: Complex changes, maintaining history
# Revert the problematic commit
git revert HEAD
git push origin main
# This triggers new deployment with reverted code
Strategy 4: Emergency Replace
Best for: Corrupted deployment, complete failure
# Delete current deployment (service is down until the apply below completes)
kubectl delete deployment myapp -n test-staging
# Apply known good configuration
kubectl apply -f backup/myapp-deployment.yaml -n test-staging
Pre-Rollback Checklist
- Identify the Issue
- Capture Current State
- Notify Team
  - Alert about rollback
  - Document issue
  - Create incident ticket
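The "Capture Current State" step could be scripted along these lines. This is a sketch under assumptions: `capture_state` is a hypothetical helper, the output directory is chosen by the caller, and pods are assumed to carry an `app=<deployment>` label as in the verification commands in this guide.

```shell
# capture_state: save deployment state to a directory before rolling back.
# Usage: capture_state DEPLOYMENT NAMESPACE OUTPUT_DIR
# Hypothetical helper; assumes pods are labelled app=<deployment name>.
capture_state() {
  local deploy="$1" ns="$2" dir="$3"
  mkdir -p "$dir" || return 1
  # Full manifest, usable later as a reference for what was running
  kubectl get deployment "$deploy" -n "$ns" -o yaml > "$dir/deployment.yaml"
  # Revision list, to see which revision the rollback will land on
  kubectl rollout history "deployment/$deploy" -n "$ns" > "$dir/history.txt"
  # Pod placement and restart counts
  kubectl get pods -n "$ns" -l "app=$deploy" -o wide > "$dir/pods.txt"
  # Recent namespace events, oldest first
  kubectl get events -n "$ns" --sort-by=.lastTimestamp > "$dir/events.txt"
}
```

A call such as `capture_state myapp test-staging "incident-$(date +%Y%m%d-%H%M%S)"` keeps each snapshot in its own timestamped directory.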
Rollback Scenarios
Scenario 1: Application Won't Start
Symptoms: CrashLoopBackOff, restart loops
# Quick rollback
kubectl rollout undo deployment/myapp -n test-staging
# If that doesn't work, try previous image
kubectl set image deployment/myapp \
myapp=ghcr.io/dine-together/myapp:previous-sha \
-n test-staging
Scenario 2: Bad Configuration
Symptoms: Wrong environment variables, missing secrets
# Rollback deployment (restores the pod template only; ConfigMap and Secret objects are not reverted)
kubectl rollout undo deployment/myapp -n test-staging
# Or update specific config
kubectl set env deployment/myapp \
API_URL=https://api.test.dinetogether.co.uk \
-n test-staging
Scenario 3: Performance Issues
Symptoms: High CPU/memory, slow responses
# Rollback first
kubectl rollout undo deployment/myapp -n test-staging
# Then investigate
kubectl top pods -n test-staging
kubectl describe pod <pod-name> -n test-staging
Scenario 4: Breaking Changes
Symptoms: API incompatibility, frontend/backend mismatch
# Rollback both services
kubectl rollout undo deployment/frontend -n test-staging
kubectl rollout undo deployment/backend -n test-staging
# Ensure compatible versions
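When several services have to move back together, the two commands above can be wrapped so that each rollback finishes before the next one starts. A minimal sketch; `rollback_all` is a hypothetical helper, the 120-second timeout is an arbitrary assumption, and the order of services is left to the caller.

```shell
# rollback_all: undo the latest rollout of each named deployment in a
# namespace, waiting for each to finish before starting the next.
# Usage: rollback_all NAMESPACE DEPLOYMENT...
# Hypothetical helper; stops at the first failure.
rollback_all() {
  local ns="$1" deploy
  shift
  for deploy in "$@"; do
    kubectl rollout undo "deployment/$deploy" -n "$ns" || return 1
    kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=120s || return 1
  done
}
```

For example, `rollback_all test-staging frontend backend`. Afterwards, comparing the image of each deployment (using the `jsonpath` query from "Find Working Image Tags") is one way to confirm the two versions are meant to work together.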
Verification Steps
1. Check Rollback Progress
# Watch rollout status
kubectl rollout status deployment/myapp -n test-staging -w
# Monitor pods
kubectl get pods -n test-staging -w -l app=myapp
2. Verify Application Health
# Check endpoints
kubectl get endpoints myapp -n test-staging
# Test internally (run in the same namespace so the service name resolves)
kubectl run test --rm -it --restart=Never --image=busybox -n test-staging -- wget -O- http://myapp
# Check external access
curl https://myapp.test.dinetogether.co.uk/health
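A single `curl` can hit the endpoint before the rolled-back pods are ready, so a short retry loop may give a more reliable signal. The following is a sketch; `wait_healthy` is a hypothetical helper, and the default attempt count and delay are assumptions to adjust.

```shell
# wait_healthy: poll a URL until it returns a successful HTTP status.
# Usage: wait_healthy URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper; defaults (10 attempts, 5 s apart) are assumptions.
wait_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}" i=1
  while :; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    [ "$i" -ge "$attempts" ] && break
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still unhealthy after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_healthy https://myapp.test.dinetogether.co.uk/health 12 5` retries for roughly a minute before giving up.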
3. Monitor Logs
# Stream logs
kubectl logs -f deployment/myapp -n test-staging
# Check for errors
kubectl logs deployment/myapp -n test-staging | grep -i error
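To compare error volume before and after the rollback, a count is often easier to read than the raw matches. A small sketch; `count_errors` is a hypothetical helper, and matching on the word "error" is the same heuristic as the `grep` above.

```shell
# count_errors: count log lines on stdin that contain "error" (any case).
# Hypothetical helper; always exits 0 so it is safe under `set -e`.
count_errors() {
  # grep -c prints 0 but exits 1 when nothing matches, hence the || true
  grep -ci 'error' || true
}
```

For example, `kubectl logs deployment/myapp -n test-staging --since=10m | count_errors`. Keep in mind that `kubectl logs deployment/...` reads from a single pod of the deployment; selecting by label (`-l app=myapp`) covers all of them.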
Preventing Future Issues
1. Backup Configurations
# Before deployment
kubectl get deployment myapp -n test-staging -o yaml > backups/myapp-$(date +%Y%m%d).yaml
2. Test in Staging First
# Use separate environments
namespaces:
  - test-staging # Test here first
  - test-production # Then deploy here
3. Gradual Rollouts
# In docker-compose.yml
deploy:
  replicas: 3
  update_config:
    parallelism: 1 # Update one at a time
    delay: 10s # Wait between updates
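The fragment above uses Docker Compose `deploy.update_config` syntax. For the kubectl-managed Deployments used elsewhere in this guide, the comparable setting is `spec.strategy.rollingUpdate`. A sketch of applying it from the command line; `rolling_update_patch` is a hypothetical helper, and the values 1 and 0 are illustrative assumptions.

```shell
# rolling_update_patch: print a JSON patch body that sets a Deployment's
# rolling-update limits.
# Usage: rolling_update_patch MAX_SURGE MAX_UNAVAILABLE   (integers)
# Hypothetical helper; it only builds the patch, it does not apply it.
rolling_update_patch() {
  printf '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":%d,"maxUnavailable":%d}}}}\n' "$1" "$2"
}
```

Applied as `kubectl patch deployment myapp -n test-staging -p "$(rolling_update_patch 1 0)"`, this replaces pods one at a time without dropping below the desired replica count. The nearest counterpart to `delay` is probably `spec.minReadySeconds`, though the semantics are not identical.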
4. Health Checks
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
Emergency Procedures
Complete Service Failure
- Switch to maintenance mode
- Restore from backup
- Verify restoration
Database Corruption
- Stop application
- Restore database
- Restart application
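The stop, restore, restart sequence can be sketched with the scale commands from the reference section. Everything here is an assumption for illustration: `with_app_stopped` is a hypothetical helper, the restore command is supplied by the caller because this guide does not specify one, and the sketch does not wait for old pods to finish terminating.

```shell
# with_app_stopped: scale a deployment to zero, run a restore command, then
# scale back up and wait for the rollout.
# Usage: with_app_stopped DEPLOYMENT NAMESPACE REPLICAS COMMAND [ARGS...]
# Hypothetical helper; leaves the app stopped if the restore command fails.
with_app_stopped() {
  local deploy="$1" ns="$2" replicas="$3" status=0
  shift 3
  kubectl scale "deployment/$deploy" --replicas=0 -n "$ns" || return 1
  # Run the caller-supplied restore command
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "restore failed (exit $status); leaving $deploy scaled to zero" >&2
    return "$status"
  fi
  kubectl scale "deployment/$deploy" --replicas="$replicas" -n "$ns" || return 1
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=120s
}
```

A call might look like `with_app_stopped myapp test-staging 3 ./restore-db.sh`, where `./restore-db.sh` stands in for whatever restore procedure the team actually uses.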
Post-Rollback Actions
- Document the Incident
  - What went wrong
  - Rollback steps taken
  - Time to recovery
  - Root cause
- Update Monitoring
  - Add alerts for the issue
  - Update health checks
  - Review dashboards
- Fix Forward
  - Create fix branch
  - Test thoroughly
  - Deploy with confidence
- Update Runbooks
  - Document new procedures
  - Update emergency contacts
  - Review rollback process
Rollback Commands Reference
# Basic rollback
kubectl rollout undo deployment/myapp -n test-staging
# Specific revision
kubectl rollout undo deployment/myapp --to-revision=5 -n test-staging
# Update image
kubectl set image deployment/myapp myapp=ghcr.io/dine-together/myapp:tag -n test-staging
# Scale to zero (stop)
kubectl scale deployment/myapp --replicas=0 -n test-staging
# Scale back up
kubectl scale deployment/myapp --replicas=3 -n test-staging
# Delete and recreate
kubectl delete deployment myapp -n test-staging
kubectl apply -f myapp-deployment.yaml -n test-staging
# Emergency restart
kubectl rollout restart deployment/myapp -n test-staging
Next Steps
- Review Common Issues
- Set up Monitoring
- Create Backup Strategy
- Update Emergency Contacts