Notification Troubleshooting

Symptoms: Notifications are not delivered to Slack, Telegram, or Webhook endpoints.

Check:

Verify the HibernateNotification selector matches your plan labels:
```
kubectl get hnotif -o wide
```

Check the Secret for the sink exists and is valid:

```
kubectl get secret <sink-secret> -n hibernator-system -o jsonpath='{.data.config}' | base64 -d
```

Check controller logs for notification errors:

```
kubectl logs -l app=hibernator-controller -n hibernator-system | grep notification
```

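Check notification metrics for error counts (this assumes the controller's metrics endpoint is reachable on localhost:8080, for example through kubectl port-forward):

```
curl -s http://localhost:8080/metrics | grep hibernator_notification
```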
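The Secret check above prints the decoded sink configuration to the terminal, which may expose credentials. The following is a minimal sketch of a safer variant, assuming the same `config` key and `hibernator-system` namespace used above. `check_sink_secret` is a hypothetical helper name, not something shipped with Hibernator. It only reports whether the key exists and decodes to non-empty content.

```shell
# check_sink_secret is a hypothetical helper, not part of Hibernator.
# It reports whether the Secret's "config" key exists and decodes to
# non-empty content, without printing the (possibly sensitive) value.
check_sink_secret() {
  local secret="$1" namespace="${2:-hibernator-system}" encoded decoded
  if ! encoded="$(kubectl get secret "$secret" -n "$namespace" \
      -o jsonpath='{.data.config}' 2>/dev/null)"; then
    echo "missing: secret $secret not found in $namespace"
    return 1
  fi
  if [ -z "$encoded" ]; then
    echo "empty: secret $secret has no config key"
    return 1
  fi
  if ! decoded="$(printf '%s' "$encoded" | base64 -d 2>/dev/null)" ||
      [ -z "$decoded" ]; then
    echo "invalid: config key in $secret is not valid base64"
    return 1
  fi
  echo "ok: secret $secret has a non-empty config"
}
```

Call it as `check_sink_secret <sink-secret>`. A non-zero exit status means the Secret needs attention before looking further at controller logs or metrics.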

Troubleshooting

Common issues and their solutions.

Symptoms: Plan stays in Active phase past the expected hibernation time.

Check:

Verify timezone configuration:

```
kubectl get hibernateplan <name> -n hibernator-system \
  -o jsonpath='{.spec.schedule.timezone}'
```

Verify off-hours windows are correct:

```
kubectl get hibernateplan <name> -n hibernator-system \
  -o jsonpath='{.spec.schedule.offHours}' | jq
```

Check if a suspend exception is active:

```
kubectl get scheduleexception -n hibernator-system \
  -l hibernator.ardikabs.com/plan=<name>
```

Ensure spec.suspend is not true:

```
kubectl get hibernateplan <name> -n hibernator-system \
  -o jsonpath='{.spec.suspend}'
```

Check controller logs:

```
kubectl logs -n hibernator-system -l app=hibernator-controller --tail=100
```
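A timezone mismatch is easier to spot when the plan's timezone is turned into an actual clock reading that can be compared with the off-hours windows. The sketch below assumes the `.spec.schedule.timezone` field shown above holds an IANA zone name and that tzdata is installed locally. `plan_local_time` is a hypothetical helper name.

```shell
# plan_local_time is a hypothetical helper: it reads the plan's timezone
# and prints the zone followed by the current weekday and time there.
plan_local_time() {
  local plan="$1" namespace="${2:-hibernator-system}" tz
  tz="$(kubectl get hibernateplan "$plan" -n "$namespace" \
    -o jsonpath='{.spec.schedule.timezone}')" || return 1
  if [ -z "$tz" ]; then
    echo "no timezone set on plan $plan" >&2
    return 1
  fi
  # TZ accepts IANA names such as Asia/Jakarta when tzdata is installed.
  echo "$tz $(TZ="$tz" date '+%a %H:%M')"
}
```

Run `plan_local_time <name>` and compare the result with the configured off-hours windows. Note that `date` typically falls back to UTC without an error when the zone name is unknown, so an unexpected reading can itself point to a misspelled timezone.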

Symptoms: Plan transitions to Error or targets show Failed state.

Check:

Find the failed Job:

```
kubectl get jobs -n hibernator-system -l hibernator.ardikabs.com/plan=<name>
```

View pod logs:

```
kubectl logs job/<job-name> -n hibernator-system
```

Check executor-specific parameters:

```
kubectl get hibernateplan <name> -n hibernator-system \
  -o jsonpath='{.spec.targets}' | jq
```

Verify connector credentials:

```
kubectl get cloudprovider <connector-name> -n hibernator-system \
  -o jsonpath='{.status}'
```
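When a plan has many Jobs, finding the failed one and fetching its logs can be combined into one step. This is a rough sketch, assuming the plan label shown above and relying on the standard `status.failed` counter of Kubernetes Jobs. `failed_job_logs` is a hypothetical helper name, and the 50-line tail is an arbitrary choice.

```shell
# failed_job_logs is a hypothetical helper: it prints the last log lines
# of every Job for the plan that reports at least one failed pod.
failed_job_logs() {
  local plan="$1" namespace="${2:-hibernator-system}" job
  # Emit "name<TAB>failed-count" per Job, then keep names with count > 0.
  kubectl get jobs -n "$namespace" \
    -l "hibernator.ardikabs.com/plan=$plan" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.failed}{"\n"}{end}' |
    awk -F'\t' '$2 + 0 > 0 { print $1 }' |
    while IFS= read -r job; do
      echo "=== $job ==="
      kubectl logs "job/$job" -n "$namespace" --tail=50 </dev/null
    done
}
```

Run it as `failed_job_logs <name>`. Jobs without failed pods are skipped, so no output means no Job for the plan has recorded a failure.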

Symptoms: Wakeup fails because restore metadata is not found.

Check:

Verify the ConfigMap exists:

```
kubectl get configmap restore-data-<plan-name> -n hibernator-system
```

Check the ConfigMap content:

```
kubectl get configmap restore-data-<plan-name> -n hibernator-system -o yaml
```

Ensure the ConfigMap was not garbage-collected by checking the runner pod logs from the shutdown cycle.

Symptoms: Runner fails with AccessDenied or Unauthorized.

Check:

Verify ServiceAccount exists and has IRSA annotation:

```
kubectl get sa -n hibernator-system -o yaml | grep eks.amazonaws.com
```

Test IAM role assumption:

```
# From a pod with the same ServiceAccount
aws sts get-caller-identity
```

Check the CloudProvider assume role ARN:

```
kubectl get cloudprovider <name> -n hibernator-system \
  -o jsonpath='{.spec.aws.assumeRoleArn}'
```

Verify RBAC permissions:

```
kubectl auth can-i create jobs -n hibernator-system \
  --as=system:serviceaccount:hibernator-system:hibernator-controller
```
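The RBAC check above covers a single verb. The sketch below repeats it for several verbs on Jobs. Only `create` comes from the check above. The other verbs are an assumption about what a Job-managing controller commonly needs, not a statement of Hibernator's actual RBAC requirements. `check_job_rbac` is a hypothetical helper name.

```shell
# check_job_rbac is a hypothetical helper. The verb list is an assumption;
# only "create" is taken from the documented check.
check_job_rbac() {
  local namespace="${1:-hibernator-system}"
  local subject="system:serviceaccount:$namespace:hibernator-controller"
  local verb answer denied=0
  for verb in create get list watch delete; do
    # "kubectl auth can-i" prints yes or no and exits non-zero on no.
    answer="$(kubectl auth can-i "$verb" jobs -n "$namespace" \
      --as="$subject" 2>/dev/null)" || true
    echo "$verb jobs: ${answer:-unknown}"
    [ "$answer" = "yes" ] || denied=1
  done
  return "$denied"
}
```

The function returns a non-zero status if any verb is denied, so it can gate later steps in a script.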

Symptoms: Plan doesn't transition to Hibernated or Active after Jobs complete.

Check:

Look for zombie Jobs:

```
kubectl get jobs -n hibernator-system -l hibernator.ardikabs.com/plan=<name> \
  --field-selector status.successful=0
```

Check if any targets are still in Running state:

```
kubectl get hibernateplan <name> -n hibernator-system \
  -o jsonpath='{.status.executions}' | jq '.[] | select(.state == "Running")'
```

Check controller logs for errors during status update:

```
kubectl logs -n hibernator-system -l app=hibernator-controller \
  --tail=200 | grep -i error
```
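A count of targets per state gives a quick overview before digging into individual entries. This sketch assumes that `.status.executions` is a list whose items carry a `state` field, as the `jq` filter above implies. `execution_state_counts` is a hypothetical helper name.

```shell
# execution_state_counts is a hypothetical helper: it prints one
# "State: count" line per distinct state in .status.executions.
execution_state_counts() {
  local plan="$1" namespace="${2:-hibernator-system}"
  kubectl get hibernateplan "$plan" -n "$namespace" \
    -o jsonpath='{range .status.executions[*]}{.state}{"\n"}{end}' |
    awk 'NF' | sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

If `execution_state_counts <name>` still lists targets as `Running` long after their Jobs finished, the controller's status update is the likely place to look.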

If the issue persists: