
Common Issues and Solutions

This page covers the most common issues you might encounter and how to fix them.

Deployment Issues

🔴 Build Failed in GitHub Actions

Symptoms:

  - Red X on your commit
  - Build job failed in Actions tab

Common Causes & Solutions:

1. Dockerfile Syntax Error

# Search the run's logs for errors (find the run ID with: gh run list)
gh run view <run-id> --log | grep -i -A5 -B5 "error"

Fix: Review and test your Dockerfile locally:

docker build -t test .

2. Missing Files in Build Context

Error: COPY failed: file not found

Fix: Ensure all files exist and aren't in .dockerignore:

# Make sure these files exist
COPY package.json ./
COPY . .

3. Out of Disk Space

Error: no space left on device

Fix: Use multi-stage builds to reduce image size:

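FROM node:20-alpine AS builder
WORKDIR /app
COPY . .
# Build stage (assumes an npm "build" script that writes to dist/)
RUN npm ci && npm run build
FROM node:20-alpine AS runner
# Only copy necessary files
COPY --from=builder /app/dist ./dist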

🔴 Deploy Workflow Not Triggering

Symptoms:

  - Build succeeds but no deployment
  - No auto-deploy workflow run

Check:

# Check if deploy job ran
gh run view <run-id> --job=<job-id>

Common Causes:

  1. Missing DEPLOY_TOKEN

    gh secret list --repo dine-together/your-repo
    

  2. Wrong branch

    Deploy only triggers on main branch

  3. Workflow syntax error

    Check .github/workflows/deploy.yml
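The first two causes above can be checked in one pass. The following is a minimal sketch, not part of the official tooling: the function name `check_deploy_prereqs` is hypothetical, and it assumes the secret is named DEPLOY_TOKEN and that deploys run only from main, as described above.

```shell
# Sketch: report the two most common reasons a deploy job does not run.
# Assumes the secret is called DEPLOY_TOKEN and deploys trigger only on main.
check_deploy_prereqs() {
  local repo="$1"
  local branch
  local status=0

  # Current branch of the local checkout
  branch=$(git rev-parse --abbrev-ref HEAD)
  if [ "$branch" != "main" ]; then
    echo "WARN: on branch '$branch'; deploy only triggers on main"
    status=1
  fi

  # When piped, gh secret list prints one secret per line, name first
  if gh secret list --repo "$repo" | grep -q '^DEPLOY_TOKEN[[:space:]]'; then
    echo "OK: DEPLOY_TOKEN is set"
  else
    echo "WARN: DEPLOY_TOKEN is missing in $repo"
    status=1
  fi

  return "$status"
}
```

Run it from your repository checkout, for example `check_deploy_prereqs dine-together/your-repo`. A non-zero exit status means at least one check failed.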

🔴 Image Pull Error (403 Forbidden)

Symptoms:

Failed to pull image "ghcr.io/dine-together/app:latest": 
unexpected status: 403 Forbidden

Solutions:

  1. Check GHCR Secret in Kubernetes:

    kubectl get secret ghcr-secret -n test-staging
    

  2. Recreate the secret:

    kubectl delete secret ghcr-secret -n test-staging
    kubectl create secret docker-registry ghcr-secret \
      --docker-server=ghcr.io \
      --docker-username=YOUR_GITHUB_USERNAME \
      --docker-password=YOUR_DEPLOY_TOKEN \
      --namespace=test-staging
    

  3. Verify image exists:

    docker pull ghcr.io/dine-together/your-app:latest
    

🔴 502 Bad Gateway

Symptoms:

  - Site returns 502 error
  - Traefik can't reach the service

Debug Steps:

  1. Check if pods are running:

    kubectl get pods -n test-staging
    

  2. Check service endpoints:

    kubectl get endpoints -n test-staging
    

  3. Check port configuration:

    kubectl get svc your-app -n test-staging -o yaml | grep -A5 "ports:"
    

Common Fix: Wrong target port

# Fix service port
kubectl patch svc your-app -n test-staging --type='json' \
  -p='[{"op": "replace", "path": "/spec/ports/0/targetPort", "value": 3000}]'
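Before patching, you may want to confirm that the ports really disagree. The sketch below compares the service's first targetPort with the first containerPort declared on the deployment. It assumes the service and deployment share the same name, that the deployment declares a containerPort (the field is optional in Kubernetes), and that targetPort is numeric rather than a named port. `check_target_port` is a hypothetical helper name.

```shell
# Sketch: compare the service targetPort with the deployment's containerPort.
# Assumes service and deployment are both named after the app.
check_target_port() {
  local app="$1"
  local ns="${2:-test-staging}"
  local target
  local container

  target=$(kubectl get svc "$app" -n "$ns" \
    -o jsonpath='{.spec.ports[0].targetPort}')
  container=$(kubectl get deployment "$app" -n "$ns" \
    -o jsonpath='{.spec.template.spec.containers[0].ports[0].containerPort}')

  if [ "$target" = "$container" ]; then
    echo "OK: targetPort $target matches containerPort"
  else
    echo "MISMATCH: service targetPort=$target, containerPort=$container"
    return 1
  fi
}
```

Usage: `check_target_port your-app`. A mismatch suggests the patch command above is the right fix, with `value` set to the port your app listens on.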

Application Issues

🔴 Application Crashes on Startup

Symptoms:

  - Pod in CrashLoopBackOff state
  - Continuous restarts

Debug:

# Check pod status
kubectl describe pod <pod-name> -n test-staging

# Check logs
kubectl logs <pod-name> -n test-staging --previous

Common Causes:

  1. Missing Environment Variables

    # Add to docker-compose.yml
    environment:
      - DATABASE_URL=postgresql://...
      - REQUIRED_VAR=value
    

  2. Port Binding Issues

    // Bind to 0.0.0.0, not localhost
    app.listen(3000, '0.0.0.0');
    

  3. Memory Limits

    # Increase memory in docker-compose.yml
    deploy:
      resources:
        limits:
          memory: 1G
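To tell these causes apart, the reason Kubernetes recorded for the last container exit is often enough. The following sketch reads it from the pod status and prints a hint. It assumes a single-container pod (index 0), and the helper name `diagnose_crash` is hypothetical.

```shell
# Sketch: map the last termination reason of a pod's first container to a hint.
diagnose_crash() {
  local pod="$1"
  local ns="${2:-test-staging}"
  local reason

  reason=$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}')

  case "$reason" in
    OOMKilled) echo "$reason: container exceeded its memory limit; raise the limit" ;;
    Error)     echo "$reason: process exited non-zero; check logs with --previous" ;;
    "")        echo "no previous termination recorded" ;;
    *)         echo "$reason: see kubectl describe pod $pod -n $ns" ;;
  esac
}
```

Usage: `diagnose_crash <pod-name>`. An OOMKilled result points at cause 3, while Error usually means a missing variable or a startup exception that the previous logs will show.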
    

🔴 Database Connection Failed

Symptoms:

  - ECONNREFUSED or Connection refused
  - Unknown host errors

Solutions:

  1. Use Kubernetes service names:

    # Wrong
    DATABASE_URL=postgresql://localhost:5432/db
    
    # Correct
    DATABASE_URL=postgresql://postgres:5432/db
    

  2. Check if database is running:

    kubectl get pods -n test-staging | grep postgres
    

  3. Verify network connectivity:

    kubectl exec -it <app-pod> -n test-staging -- nc -zv postgres 5432
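Inside a pod, localhost refers to the pod itself, which is why the first solution matters. As an illustration, the sketch below extracts the host from a connection URL and flags loopback addresses. The function names `db_host` and `check_db_url` are invented for this example, and IPv6 literals are not handled.

```shell
# Sketch: extract the host part of a database URL using parameter expansion.
db_host() {
  local url="$1"
  local rest

  rest="${url#*://}"    # drop the scheme
  rest="${rest%%/*}"    # drop /dbname and anything after it
  rest="${rest##*@}"    # drop user:password@ if present
  echo "${rest%%:*}"    # drop :port
}

# Sketch: warn when the URL points at the pod itself instead of a service name.
check_db_url() {
  local host
  host=$(db_host "$1")

  case "$host" in
    localhost|127.0.0.1)
      echo "WRONG: '$host' points at the app's own pod"
      return 1
      ;;
    *)
      echo "OK: host is '$host'"
      ;;
  esac
}
```

For example, `check_db_url "$DATABASE_URL"` could run as an early step in a container entrypoint so that a misconfigured URL fails fast with a readable message.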
    

🔴 SSL Certificate Issues

Symptoms:

  - Browser shows certificate warning
  - NET::ERR_CERT_AUTHORITY_INVALID

Understanding:

  - Test environment uses self-signed certificates
  - This is normal and expected

For production:

  - Ensure DNS points to server
  - cert-manager will get Let's Encrypt certificate
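To see which kind of certificate a host is actually serving, you can inspect the issuer with openssl. This is a small sketch, assuming the site listens on port 443 and that openssl is installed locally. `check_cert_issuer` is a hypothetical helper, and the issuer match is a simple substring test.

```shell
# Sketch: fetch the served certificate and report whether Let's Encrypt issued it.
check_cert_issuer() {
  local host="$1"
  local issuer

  # s_client prints the certificate; x509 extracts the issuer line from it
  issuer=$(openssl s_client -connect "$host:443" -servername "$host" \
      </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer 2>/dev/null)

  case "$issuer" in
    *"Let's Encrypt"*)
      echo "Let's Encrypt certificate: $issuer"
      ;;
    "")
      echo "no certificate retrieved from $host"
      return 1
      ;;
    *)
      echo "not issued by Let's Encrypt (expected in the test environment): $issuer"
      ;;
  esac
}
```

Usage: `check_cert_issuer your-domain.example`, where the hostname is a placeholder for your own domain.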

Debugging Tools

View Pod Logs

# Current logs
kubectl logs -n test-staging deployment/your-app

# Follow logs
kubectl logs -n test-staging deployment/your-app -f

# Previous container logs (after crash)
kubectl logs -n test-staging <pod-name> --previous

Execute Commands in Pod

# Open shell in pod
kubectl exec -it <pod-name> -n test-staging -- /bin/sh

# Run specific command
kubectl exec <pod-name> -n test-staging -- ls -la

Check Resource Usage

# Pod resource usage
kubectl top pods -n test-staging

# Node resource usage
kubectl top nodes

Port Forwarding for Debugging

# Forward local port to pod
kubectl port-forward -n test-staging pod/<pod-name> 8080:3000

# Access at http://localhost:8080

Quick Fixes

Restart Deployment

kubectl rollout restart deployment/your-app -n test-staging
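A restart returns immediately, so it is easy to miss a rollout that never becomes ready. The sketch below pairs the restart with `kubectl rollout status`, which blocks until the new pods are ready or the timeout expires. The function name `restart_and_wait` and the 120s default are assumptions for illustration.

```shell
# Sketch: restart a deployment and wait until the rollout finishes or times out.
restart_and_wait() {
  local app="$1"
  local ns="${2:-test-staging}"
  local timeout="${3:-120s}"

  kubectl rollout restart "deployment/$app" -n "$ns" || return 1

  # Exits non-zero if the rollout is not complete within the timeout
  kubectl rollout status "deployment/$app" -n "$ns" --timeout="$timeout"
}
```

Usage: `restart_and_wait your-app`. If it times out, check the new pod's state with the commands in the Debugging Tools section.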

Force Pull Latest Image

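# Re-setting an unchanged :latest tag is a no-op; point at a new tag to trigger a rollout
kubectl set image deployment/your-app \
  your-app=ghcr.io/dine-together/your-app:<new-tag> \
  -n test-staging
# To re-pull an unchanged :latest, use the Restart Deployment command above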

Delete and Recreate Pod

kubectl delete pod <pod-name> -n test-staging
# Deployment will create a new one

Update Service Port

kubectl patch svc your-app -n test-staging \
  --type='json' -p='[{"op": "replace", "path": "/spec/ports/0/targetPort", "value": 3000}]'

Prevention Tips

  1. Always test locally first:

    docker-compose up
    

  2. Use health checks:

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
    

  3. Set resource limits:

    deploy:
      resources:
        limits:
          memory: 512M
    

  4. Use specific image tags:

    image: ghcr.io/dine-together/app:v1.0.0  # Not just :latest
    

  5. Monitor logs during deployment:

    kubectl logs -n test-staging -f deployment/your-app
    

Still Stuck?

  1. Check the FAQ
  2. Search GitHub Issues
  3. Create a new issue with:
     - Error messages
     - kubectl describe pod output
     - docker-compose.yml content
     - GitHub Actions logs
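Gathering the items for a new issue can be scripted. The following is one possible sketch: `collect_diagnostics` and the output file names are hypothetical, and the GitHub Actions logs are included only if you pass a run ID as the fourth argument.

```shell
# Sketch: collect the information requested for a new issue into one directory.
collect_diagnostics() {
  local pod="$1"
  local ns="${2:-test-staging}"
  local out="${3:-diagnostics}"
  local run_id="${4:-}"

  mkdir -p "$out"

  # Capture stderr too, so "not found" messages end up in the report
  kubectl describe pod "$pod" -n "$ns" > "$out/describe-pod.txt" 2>&1
  kubectl logs "$pod" -n "$ns" > "$out/logs-current.txt" 2>&1
  kubectl logs "$pod" -n "$ns" --previous > "$out/logs-previous.txt" 2>&1

  # Copy the compose file if one exists in the current directory
  if [ -f docker-compose.yml ]; then
    cp docker-compose.yml "$out/"
  fi

  # Optional: logs of the failed steps of a GitHub Actions run
  if [ -n "$run_id" ]; then
    gh run view "$run_id" --log-failed > "$out/actions-failed.log" 2>&1
  fi

  echo "wrote diagnostics to $out/"
}
```

Usage: `collect_diagnostics <pod-name> test-staging diagnostics <run-id>`. Review the copied docker-compose.yml and the logs for passwords or tokens before attaching them to a public issue.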