Common Issues and Solutions

This page covers the most common issues you might encounter and how to fix them.

Deployment Issues

🔴 Build Failed in GitHub Actions

Symptoms:
- Red X on your commit
- Build job failed in the Actions tab

Common Causes & Solutions:

1. Dockerfile Syntax Error

# Show the logs of the failed steps (find the run ID with `gh run list`)
gh run view <run-id> --log-failed | grep -i -A5 -B5 "error"

Fix: Review and test your Dockerfile locally:

docker build -t test .

2. Missing Files in Build Context

Error: COPY failed: file not found

Fix: Ensure every file the Dockerfile copies exists in the build context and isn't excluded by .dockerignore:

# Make sure these files exist
COPY package.json ./
COPY . .

3. Out of Disk Space

Error: no space left on device

Fix: Use multi-stage builds to reduce image size:

FROM node:20-alpine AS builder
# Build stage
FROM node:20-alpine AS runner
# Only copy necessary files
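
For the missing-files case above, a small shell check can catch absent COPY sources before you push. This is a minimal sketch rather than part of the platform: the function name `missing_copy_sources` is hypothetical, and it deliberately ignores .dockerignore, wildcards, JSON-array COPY syntax, and `COPY --from` lines.

```shell
# Hypothetical helper: report COPY sources missing from the build context.
# Usage: missing_copy_sources <dockerfile> [context-dir]
missing_copy_sources() {
  dockerfile="$1"
  context="${2:-.}"
  grep -E '^[[:space:]]*COPY[[:space:]]' "$dockerfile" | while read -r _ args; do
    case "$args" in
      *--from=*) continue ;;  # copied from another build stage, not the context
    esac
    set -f                    # keep wildcards literal while splitting words
    set -- $args
    set +f
    # Every word except the last (the destination) is a source path.
    while [ "$#" -gt 1 ]; do
      case "$1" in
        --*) ;;               # skip flags such as --chown=...
        *) [ -e "$context/$1" ] || echo "missing: $1" ;;
      esac
      shift
    done
  done
}
```

Run from the repository root, `missing_copy_sources Dockerfile .` prints one line per missing source and nothing when every source is present.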

🔴 Deploy Workflow Not Triggering

Symptoms:
- Build succeeds but no deployment
- No auto-deploy workflow run

Check:

# Check if deploy job ran
gh run view <run-id> --job=<job-id>

Common Causes:

Missing DEPLOY_TOKEN

gh secret list --repo dine-together/your-repo

Wrong branch

Deploy only triggers on the main branch.

Workflow syntax error

Check .github/workflows/deploy.yml for errors.

🔴 Image Pull Error (403 Forbidden)

Symptoms:

Failed to pull image "ghcr.io/dine-together/app:latest": 
unexpected status: 403 Forbidden

Solutions:

Check GHCR Secret in Kubernetes:

kubectl get secret ghcr-secret -n test-staging

Recreate the secret:

kubectl delete secret ghcr-secret -n test-staging
kubectl create secret docker-registry ghcr-secret \
  --docker-server=ghcr.io \
  --docker-username=YOUR_GITHUB_USERNAME \
  --docker-password=YOUR_DEPLOY_TOKEN \
  --namespace=test-staging

Verify the image exists (run `docker login ghcr.io` first if the package is private):

docker pull ghcr.io/dine-together/your-app:latest
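
Before recreating the secret, it can help to confirm whether the existing one holds credentials for ghcr.io at all. The following is a minimal sketch, assuming the secret was created with `kubectl create secret docker-registry` (so its data lives under the `.dockerconfigjson` key); the function name `secret_has_registry` is hypothetical.

```shell
# Hypothetical helper: succeed if the pull secret has an entry for a registry.
# Usage: secret_has_registry <secret> <namespace> <registry-host>
secret_has_registry() {
  kubectl get secret "$1" -n "$2" \
    -o jsonpath='{.data.\.dockerconfigjson}' \
    | base64 --decode \
    | grep -qF "\"$3\""
}
```

For example, `secret_has_registry ghcr-secret test-staging ghcr.io && echo "entry found"` prints the message only when the decoded config mentions ghcr.io. It does not verify that the stored token is still valid.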

🔴 502 Bad Gateway

Symptoms:
- Site returns a 502 error
- Traefik can't reach the service

Debug Steps:

Check if pods are running:
```
kubectl get pods -n test-staging
```
Check service endpoints:
```
kubectl get endpoints -n test-staging
```

Check port configuration:

kubectl get svc your-app -n test-staging -o yaml | grep -A5 "ports:"

Common Fix: the service targetPort does not match the port the app listens on

# Set targetPort to the container port (3000 here)
kubectl patch svc your-app -n test-staging --type='json' \
  -p='[{"op": "replace", "path": "/spec/ports/0/targetPort", "value": 3000}]'
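
To see whether the targetPort patch above is needed, the two ports can be compared directly. This sketch makes several assumptions that may not hold in your setup: the Service and Deployment share the same name, only the first port entry matters, the container declares a `containerPort`, and `targetPort` is numeric rather than a named port. The function name `check_target_port` is hypothetical.

```shell
# Hypothetical helper: compare the Service targetPort with the containerPort.
# Usage: check_target_port <app-name> <namespace>
check_target_port() {
  app="$1"
  ns="$2"
  target=$(kubectl get svc "$app" -n "$ns" \
    -o jsonpath='{.spec.ports[0].targetPort}')
  container=$(kubectl get deployment "$app" -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[0].ports[0].containerPort}')
  if [ "$target" = "$container" ]; then
    echo "ok: service targets port $target"
  else
    echo "mismatch: service targets $target, container declares ${container:-nothing}"
    return 1
  fi
}
```

Calling `check_target_port your-app test-staging` returns a non-zero status on a mismatch, so it can also gate a script.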

Application Issues

🔴 Application Crashes on Startup

Symptoms:
- Pod in CrashLoopBackOff state
- Continuous restarts

Debug:

# Check pod status
kubectl describe pod <pod-name> -n test-staging

# Check logs
kubectl logs <pod-name> -n test-staging --previous

Common Causes:

Missing Environment Variables

# Add to docker-compose.yml
environment:
  - DATABASE_URL=postgresql://...
  - REQUIRED_VAR=value

Port Binding Issues

// Bind to 0.0.0.0, not localhost
app.listen(3000, '0.0.0.0');

Memory Limits

# Increase memory in docker-compose.yml
deploy:
  resources:
    limits:
      memory: 1G
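
To narrow down which of the causes above applies, the reason recorded for the last container termination is a useful first signal. A minimal sketch, assuming the pod has a single container (index 0); the function name `crash_hint` and the wording of the hints are hypothetical.

```shell
# Hypothetical helper: map the last termination reason to a next step.
# Usage: crash_hint <pod-name> <namespace>
crash_hint() {
  reason=$(kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}')
  case "$reason" in
    OOMKilled) echo "OOMKilled: raise the memory limit" ;;
    Error)     echo "Error: check the --previous logs for missing variables or port issues" ;;
    "")        echo "no previous termination recorded" ;;
    *)         echo "terminated: $reason" ;;
  esac
}
```

An `OOMKilled` result points at the memory limit, while `Error` usually means the application exited on its own and the previous logs are the place to look.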

🔴 Database Connection Failed

Symptoms:
- ECONNREFUSED or Connection refused
- Unknown host errors

Solutions:

Use Kubernetes service names (inside a pod, localhost is the pod itself, not the database):

# Wrong
DATABASE_URL=postgresql://localhost:5432/db

# Correct
DATABASE_URL=postgresql://postgres:5432/db

Check if database is running:

kubectl get pods -n test-staging | grep postgres

Verify network connectivity:

kubectl exec -it <app-pod> -n test-staging -- nc -zv postgres 5432
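
The service-name rule above can be checked mechanically by extracting the host from DATABASE_URL. This is a sketch using plain shell parameter expansion; the function names `db_host` and `check_database_url` are hypothetical, and the parsing assumes a URL of the form `scheme://[user:password@]host[:port]/database` with any special characters in the password percent-encoded.

```shell
# Hypothetical helper: print the host part of a database URL.
db_host() {
  rest="${1#*://}"     # drop the scheme
  rest="${rest%%/*}"   # drop the /database path
  rest="${rest##*@}"   # drop user:password@ if present
  echo "${rest%%:*}"   # drop the :port
}

# Hypothetical helper: flag URLs that point back at the pod itself.
check_database_url() {
  host=$(db_host "$1")
  case "$host" in
    localhost|127.0.0.1)
      echo "bad: $host points at the pod itself"
      return 1
      ;;
    *)
      echo "ok: host is $host"
      ;;
  esac
}
```

Running `check_database_url "$DATABASE_URL"` inside the app pod reports whether the URL targets a service name; the `nc` check above then confirms that the host is reachable.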

🔴 SSL Certificate Issues

Symptoms:
- Browser shows a certificate warning
- NET::ERR_CERT_AUTHORITY_INVALID

Understanding:
- The test environment uses self-signed certificates
- The warning is normal and expected there

For production:
- Ensure DNS points to the server
- cert-manager will then obtain a Let's Encrypt certificate

Debugging Tools

View Pod Logs

# Current logs
kubectl logs -n test-staging deployment/your-app

# Follow logs
kubectl logs -n test-staging deployment/your-app -f

# Previous container logs (after crash)
kubectl logs -n test-staging <pod-name> --previous

Execute Commands in Pod

# Open shell in pod
kubectl exec -it <pod-name> -n test-staging -- /bin/sh

# Run specific command
kubectl exec <pod-name> -n test-staging -- ls -la

Check Resource Usage

# Pod resource usage
kubectl top pods -n test-staging

# Node resource usage
kubectl top nodes

Port Forwarding for Debugging

# Forward local port to pod
kubectl port-forward -n test-staging pod/<pod-name> 8080:3000

# Access at http://localhost:8080

Quick Fixes

Restart Deployment

kubectl rollout restart deployment/your-app -n test-staging

Force Pull Latest Image

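# Point the deployment at the image tag you want to run
kubectl set image deployment/your-app \
  your-app=ghcr.io/dine-together/your-app:latest \
  -n test-staging
# If the deployment already uses this exact tag, this changes nothing;
# use "Restart Deployment" above so the pods are recreated and pull the image again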

Delete and Recreate Pod

kubectl delete pod <pod-name> -n test-staging
# Deployment will create a new one

Update Service Port

kubectl patch svc your-app -n test-staging \
  --type='json' -p='[{"op": "replace", "path": "/spec/ports/0/targetPort", "value": 3000}]'

Prevention Tips

Always test locally first:
```
docker-compose up
```

Use health checks:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:3000/health"]

Set resource limits:

deploy:
  resources:
    limits:
      memory: 512M

Use specific image tags:

image: ghcr.io/dine-together/app:v1.0.0  # Not just :latest

Monitor logs during deployment:

kubectl logs -n test-staging -f deployment/your-app
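
Following logs by hand can be combined with a rollout check, so that a failed deployment surfaces its recent output automatically. A minimal sketch, assuming a Deployment named after the app; the function name `watch_rollout`, the 120-second timeout, and the 50-line tail are arbitrary choices for illustration.

```shell
# Hypothetical helper: wait for a rollout and show recent logs if it fails.
# Usage: watch_rollout <app-name> <namespace>
watch_rollout() {
  app="$1"
  ns="$2"
  if kubectl rollout status "deployment/$app" -n "$ns" --timeout=120s; then
    echo "rollout complete"
  else
    echo "rollout failed, recent logs:"
    kubectl logs -n "$ns" "deployment/$app" --tail=50
    return 1
  fi
}
```

For example, `watch_rollout your-app test-staging` blocks until the rollout finishes or the timeout expires.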

Still Stuck?

- Check the FAQ
- Search GitHub Issues
- Create a new issue with:
  - Error messages
  - kubectl describe pod output
  - docker-compose.yml content
  - GitHub Actions logs