Notification Troubleshooting
Symptoms: Notifications are not delivered to Slack, Telegram, or Webhook endpoints.
Check:
- Verify the HibernateNotification selector matches your plan labels.
- Check that the Secret for the sink exists and is valid.
- Check controller logs for notification errors.
- For webhooks, check the HTTP status code and response body in the logs.
- Check notification metrics for error counts.
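The Secret and log checks above can be sketched as two small shell helpers. This is a rough filter, not the project's own tooling. It assumes the controller pods carry the app=hibernator-controller label used in the Getting Help commands, that you supply the namespace and Secret name yourself, and that notification failures are logged with words such as "notif" plus "error" or "fail" (the log wording is an assumption).

```shell
# Sketch: surface notification-related errors from recent controller logs.
# Assumptions: label app=hibernator-controller (from the Getting Help commands);
# the grep patterns are a heuristic, not a documented log format.
notification_log_errors() {
  ns="$1"   # namespace the controller runs in (supplied by the caller)
  kubectl logs -n "$ns" -l app=hibernator-controller --tail=500 \
    | grep -i 'notif' \
    | grep -iE 'error|fail'
}

# Sketch: confirm the sink Secret exists; kubectl exits non-zero if it is missing.
sink_secret_exists() {
  ns="$1"; name="$2"   # hypothetical placeholders for your own values
  kubectl get secret "$name" -n "$ns" -o name
}
```

Empty output from notification_log_errors only means the filter matched nothing. It does not prove that delivery succeeded.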
Troubleshooting
Common issues and their solutions.
Schedule Not Triggering
Symptoms: Plan stays in Active phase past the expected hibernation time.
Check:
- Verify timezone configuration.
- Verify off-hours windows are correct.
- Check if a suspend exception is active.
- Ensure spec.suspend is not true.
- Check controller logs.
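Two of the checks above can be sketched in shell: reading spec.suspend from the plan and printing the current time in the plan's timezone. This is a minimal sketch. The namespace, plan name, and timezone are placeholders you supply, and the field that holds the timezone in the plan spec is not shown here because its path is not given in this section.

```shell
# Print the plan's spec.suspend value ("true" when suspended; empty or "false" otherwise).
plan_suspend_value() {
  ns="$1"; plan="$2"   # hypothetical placeholders for your namespace and plan
  kubectl get hibernateplan "$plan" -n "$ns" -o jsonpath='{.spec.suspend}'
}

# Print the current weekday and time in a given IANA timezone, so it can be
# compared by eye with the plan's off-hours windows.
now_in_zone() {
  TZ="$1" date '+%A %H:%M %Z'
}
```

For example, `now_in_zone Asia/Jakarta` shows the time the schedule is evaluated in, if that is the zone configured on the plan.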
Runner Job Failing
Symptoms: Plan transitions to Error or targets show Failed state.
Check:
- Find the failed Job.
- View pod logs.
- Check executor-specific parameters.
- Verify connector credentials.
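The first two checks above could look like the following helpers. This is a sketch using standard kubectl commands. The namespace and Job name are placeholders, and the naming scheme of runner Jobs is not assumed here.

```shell
# List Jobs in a namespace, oldest first, so the latest runner Job is at the bottom.
recent_jobs() {
  ns="$1"   # namespace where runner Jobs are created (supplied by the caller)
  kubectl get jobs -n "$ns" --sort-by=.metadata.creationTimestamp
}

# Show the last lines of logs from a pod of the given Job, across all containers.
job_logs() {
  ns="$1"; job="$2"   # hypothetical placeholders
  kubectl logs -n "$ns" "job/$job" --all-containers --tail=200
}
```

If a Job has several pods, kubectl picks one of them for `job/<name>`. Listing the pods and reading each one's logs may be needed after retries.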
Restore Data Missing
Symptoms: Wakeup fails because restore metadata is not found.
Check:
- Verify the ConfigMap exists.
- Check the ConfigMap content.
- Ensure the ConfigMap was not garbage-collected (check runner pod logs from the shutdown cycle).
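The existence and content checks above can be combined into one helper. This is a sketch under assumptions: the name of the restore ConfigMap is not given in this section, so it is passed in as a placeholder, and "has any data" is used as a stand-in for "contains valid restore metadata".

```shell
# Report whether a ConfigMap exists and carries any data.
# Returns non-zero when the ConfigMap is missing or its data is empty.
check_restore_data() {
  ns="$1"; cm="$2"   # hypothetical placeholders for namespace and ConfigMap name
  if data="$(kubectl get configmap "$cm" -n "$ns" -o jsonpath='{.data}' 2>/dev/null)" \
     && [ -n "$data" ]; then
    echo "restore data present in $cm"
  else
    echo "restore data missing: ConfigMap $cm not found or empty"
    return 1
  fi
}
```

To inspect the content itself, `kubectl get configmap <name> -n <namespace> -o yaml` prints the stored keys and values.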
Authentication Errors
Symptoms: Runner fails with AccessDenied or Unauthorized.
Check:
- Verify the ServiceAccount exists and has the IRSA annotation.
- Test IAM role assumption.
- Check the CloudProvider assume role ARN.
- Verify RBAC permissions.
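The ServiceAccount, role assumption, and RBAC checks above might be scripted as follows. Treat this as a sketch: the ServiceAccount name, namespace, role ARN, and the verbs and resources the runner needs are all placeholders. The annotation key eks.amazonaws.com/role-arn is the standard IRSA annotation on EKS. The assume-role call runs with your current AWS credentials, so it tests the role's trust policy for your identity rather than the web-identity path the pod uses.

```shell
# Print the IAM role ARN from the ServiceAccount's IRSA annotation (empty if unset).
irsa_role_arn() {
  ns="$1"; sa="$2"   # hypothetical placeholders
  kubectl get serviceaccount "$sa" -n "$ns" \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

# Ask the API server whether the ServiceAccount may perform a verb on a resource.
sa_can() {
  ns="$1"; sa="$2"; verb="$3"; resource="$4"
  kubectl auth can-i "$verb" "$resource" -n "$ns" \
    --as="system:serviceaccount:$ns:$sa"
}

# Try to assume a role with the caller's own credentials; prints the expiry on success.
# The session name "hibernator-debug" is an arbitrary label chosen for this sketch.
try_assume_role() {
  role_arn="$1"
  aws sts assume-role --role-arn "$role_arn" \
    --role-session-name hibernator-debug \
    --query 'Credentials.Expiration' --output text
}
```

`sa_can` prints "yes" or "no", which makes it easy to loop over the verbs you expect the runner to need.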
Plan Stuck in Hibernating/WakingUp
Symptoms: Plan doesn't transition to Hibernated or Active after Jobs complete.
Check:
- Look for zombie Jobs.
- Check if any targets are still in Running state.
- Check controller logs for errors during status update.
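One way to look for zombie Jobs is to list the Jobs that still report active pods. The helper below is a sketch that relies only on the standard Job status field status.active. The namespace is a placeholder, and it does not tell runner Jobs apart from other Jobs in the same namespace.

```shell
# Print the names of Jobs in a namespace that still have active pods.
active_jobs() {
  ns="$1"   # namespace where runner Jobs are created (supplied by the caller)
  kubectl get jobs -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.active}{"\n"}{end}' \
    | awk -F'\t' '$2 + 0 > 0 { print $1 }'
}
```

A Job listed here long after the plan should have settled is a candidate for closer inspection with `kubectl describe job <name>`.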
Getting Help
If the issue persists:
- Collect plan status:
  kubectl get hibernateplan <name> -o yaml
- Collect controller logs:
  kubectl logs -l app=hibernator-controller --tail=500
- Collect runner logs (if applicable)
- Open a GitHub issue with the collected information
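The collection steps above can be bundled into one helper that writes the output to files you can attach to the issue. This is a convenience sketch built from the two commands listed above. The function name, output directory, and file names are made up for illustration, and runner logs are left out because they depend on which Job failed.

```shell
# Collect the plan manifest and recent controller logs into a directory.
# Usage: collect_diagnostics <plan-name> [output-dir]
collect_diagnostics() {
  plan="$1"
  out="${2:-hibernator-debug}"   # hypothetical default directory name
  mkdir -p "$out"
  kubectl get hibernateplan "$plan" -o yaml > "$out/plan.yaml"
  kubectl logs -l app=hibernator-controller --tail=500 > "$out/controller.log"
  echo "$out"
}
```

Review the files for secrets or account identifiers before posting them publicly.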