Notifications¶

Hibernator can send real-time notifications when key events occur during the hibernation lifecycle — execution starting, success, failure, recovery attempts, and phase changes. Notifications are configured through the HibernateNotification custom resource and delivered to external systems like Slack and Telegram.

How notifications work¶

The notification system is decoupled from the reconciliation loop. When a lifecycle event fires, the controller submits a dispatch request to an async worker pool. This design ensures notifications never block or slow down the core hibernation logic — delivery is fire-and-forget from the controller's perspective.

Architecture Overview¶

graph TD
    subgraph "Control Plane"
        HP[HibernatePlan] --> |Events| C[Controller]
        C --> |Submit Dispatch| DQ[Dispatch Queue]
        DQ --> |Worker Pool| DN[Notification Dispatcher]
    end

    subgraph "External Systems"
        DN --> |HTTPS POST| Slack[Slack Webhook]
        DN --> |HTTPS POST| TG[Telegram Bot API]
        DN --> |HTTPS POST| WH[Generic Webhook]
    end

    HN[HibernateNotification] --> |Selector Match| C

Notifications are matched to plans via label selectors, the same pattern used by ScheduleException. A single HibernateNotification can target multiple plans, and a single plan can be matched by multiple notification resources.

Quick Start¶

Setting up notifications involves three main steps: creating a configuration Secret, defining the notification resource, and matching it to your plans.

1. Create the Sink Secret¶

Each sink reads its configuration from a Kubernetes Secret. The Secret must contain a key named config with a JSON object holding the sink-specific settings.

SlackTelegram

apiVersion: v1
kind: Secret
metadata:
  name: slack-webhook
  namespace: hibernator-system
type: Opaque
stringData:
  config: |
    {
      "webhook_url": "https://hooks.slack.com/services/T00/B00/xxxx"
    }

Optional advanced Slack config:

{
  "bot_token": "xoxb-...",
  "channel_id": "C1234567890",
  "format": "json",
  "block_layout": "default",
  "max_targets": 8,
  "additional_scopes": ["environment", "region"],
  "delivery_mode": "thread",
  "time_display": "slack_dynamic",
  "timezone": "Asia/Jakarta",
  "time_layout": "Mon, 02 Jan 2006 15:04:05 MST"
}

format: text (default): sends plain Slack text.
format: json: sends Slack blocks JSON.
if templateRef exists, template output is parsed as Slack JSON payload.
if no templateRef (or JSON parse fails), built-in preset layout is used.
block_layout: preset for JSON mode (default, compact, auto).
auto uses progress layout for ExecutionProgress and falls back to default for other events.
default and compact suppress non-terminal ExecutionProgress updates (Pending, Running) to reduce notification noise. Terminal updates (Completed, Failed, Aborted) are still sent.
max_targets: maximum target lines in preset JSON output.
additional_scopes: appends extra bottom scope metadata context.
defaults already include Account and Cluster.
supported values: environment (alias env), region, project, provider, connector, account, cluster.
time_display: controls context time rendering.
slack_dynamic (default): Slack renders date/time in each viewer's locale/timezone.
fixed: uses configured timezone + time_layout.
utc: uses UTC + time_layout.
delivery_mode: controls message grouping behavior.
channel (default): every event posts as standalone message (requires webhook_url).
thread: keeps a living root message per plan/cycle (updated on each delivered event) and appends every event as a thread reply, including Start (requires bot_token + channel_id).
- root message includes a static progress bar relative to total targets (for example [██░░░░░░░░] 1/10) so progression remains visible even though the root is updated in place.
- root status is monotonic per sink+plan+cycle+operation: once root reaches a terminal state (Success/Failure), late non-terminal events (ExecutionProgress, Recovery, PhaseChange, Start) do not downgrade the root back to In Progress; those events are still posted as thread replies.
- custom templates from templateRef are intentionally ignored in thread mode; Hibernator always uses built-in, opinionated thread layouts so root context and status progression stay consistent across updates and replies.
- controller logs include an info message when this happens: ignored custom template for Slack thread delivery mode; using built-in opinionated thread layout for consistent context.
- strongly recommended to include ExecutionProgress in onEvents for smooth root progression (Starting -> In Progress -> Completed/Failed). Without it, root updates only when subscribed events are emitted.
timezone: IANA timezone for fixed mode (for example Asia/Jakarta).
time_layout: Go time layout for fixed/utc modes.

apiVersion: v1
kind: Secret
metadata:
  name: telegram-bot
  namespace: hibernator-system
type: Opaque
stringData:
  config: |
    {
      "token": "123456:ABC-DEF",
      "chat_id": "-100123456789",
      "parse_mode": "HTML"
    }

2. Create the HibernateNotification¶

apiVersion: hibernator.ardikabs.com/v1alpha1
kind: HibernateNotification
metadata:
  name: prod-alerts
  namespace: hibernator-system
spec:
  selector:
    matchLabels:
      env: production
  onEvents:
    - Start
    - Success
    - Failure
  sinks:
    - name: slack-team
      type: slack
      secretRef:
        name: slack-webhook

3. Verify the Match¶

kubectl get hnotif prod-alerts -n hibernator-system

The Matched column shows how many plans currently match the label selector.

Supported Notification Sinks¶

Sink Type	Destination	Protocol	Authentication
`slack`	Slack Channel	HTTPS	Webhook URL
`telegram`	Telegram Chat/Channel	HTTPS	Bot Token
`webhook`	Generic HTTP Endpoint	HTTPS	Headers (Bearer/API Key)

Notification Events¶

Event	When It Fires	Use Case
Start	Right before hibernation or wakeup execution begins	"Heads up — resources going down"
Success	After all targets complete successfully	Confirmation that cycle finished
Failure	When retries are exhausted and plan enters Error phase	Alert on-call team
Recovery	Each time a retry attempt starts from Error	Track recovery progress
PhaseChange	On every phase transition	Audit trail (can be noisy)
ExecutionProgress	When an individual target's execution state changes (e.g., Pending→Running)	Track per-target progress in real time

Choosing Events

For most use cases, subscribing to Start, Success, and Failure provides good coverage. Add Recovery if you want visibility into retry attempts. Add ExecutionProgress to track individual target state transitions (e.g., when a runner Job starts or completes). Use PhaseChange only for audit logging — it fires on every transition and can generate significant volume.

Sink Configuration Reference¶

For the full configuration schema, Secret format, and built-in default templates for each sink type, see the Notification Sink Reference.

Multiple Sinks¶

A single HibernateNotification can deliver to multiple sinks simultaneously. Each sink gets its own Secret and optional template:

spec:
  selector:
    matchLabels:
      env: production
  onEvents:
    - Failure
    - Recovery
  sinks:
    - name: slack-oncall
      type: slack
      secretRef:
        name: slack-oncall-webhook
    - name: telegram-ops
      type: telegram
      secretRef:
        name: telegram-ops-bot

Multiple Notifications per Plan¶

Different teams can create separate HibernateNotification resources targeting the same plans with different event subscriptions:

# Team A: wants all events for audit
apiVersion: hibernator.ardikabs.com/v1alpha1
kind: HibernateNotification
metadata:
  name: audit-all-events
spec:
  selector:
    matchLabels:
      env: production
  onEvents: [Start, Success, Failure, Recovery, PhaseChange]
  sinks:
    - name: audit-slack
      type: slack
      secretRef:
        name: slack-audit-webhook
---
# Team B: only critical alerts
apiVersion: hibernator.ardikabs.com/v1alpha1
kind: HibernateNotification
metadata:
  name: oncall-alerts
spec:
  selector:
    matchLabels:
      tier: critical
  onEvents: [Failure]
  sinks:
    - name: oncall-telegram
      type: telegram
      secretRef:
        name: telegram-oncall-bot

Custom Templates¶

By default, each sink uses a built-in template that produces a well-formatted message with event indicators, plan details, phase, targets, and error information. To customize the message format, create a ConfigMap with a Go template and reference it via templateRef:

For Slack:

with format=text, your template should render plain text.
with format=json, your template should render a Slack JSON payload (typically with blocks, optionally with text).
when delivery_mode=thread, custom templates are not used; built-in opinionated thread layouts take precedence to preserve consistent context style on the root status card and all thread replies.

Step 1: Create the Template ConfigMap¶

apiVersion: v1
kind: ConfigMap
metadata:
  name: slack-templates
  namespace: hibernator-system
data:
  template.gotpl: |
    {{ if eq .Event "Failure" -}}
    :rotating_light: *ALERT: Hibernation Failed*
    {{ else if eq .Event "Success" -}}
    :tada: *Hibernation Completed*
    {{ else -}}
    :bell: *{{ .Event }}*
    {{ end -}}
    *Plan:* {{ .Plan.Name }} ({{ .Plan.Namespace }})
    *Phase:* {{ .Phase }}
    *Operation:* {{ .Operation | default "N/A" }}
    {{ if .ErrorMessage }}*Error:* {{ .ErrorMessage }}{{ end }}
    *Time:* {{ .Timestamp | date "2006-01-02 15:04:05 MST" }}

Slack JSON template example (format=json):

apiVersion: v1
kind: ConfigMap
metadata:
  name: slack-json-templates
  namespace: hibernator-system
data:
  template.gotpl: |
    {{- $fallback := printf "[%s] %s/%s phase=%s" .Event .Plan.Namespace .Plan.Name .Phase -}}
    {
      "text": {{ $fallback | toJson }},
      "blocks": [
        {
          "type": "header",
          "text": { "type": "plain_text", "text": "Hibernator Notification" }
        },
        {
          "type": "section",
          "text": {
            "type": "mrkdwn",
            "text": {{ (printf "*Event:* %s\\n*Plan:* `%s/%s`\\n*Phase:* `%s`" .Event .Plan.Namespace .Plan.Name .Phase) | toJson }}
          }
        }
      ]
    }

Step 2: Reference It in the Sink¶

sinks:
  - name: slack-custom
    type: slack
    secretRef:
      name: slack-webhook
    templateRef:
      name: slack-templates
      key: template.gotpl   # optional — defaults to "template.gotpl"

Template Context¶

The following fields are available in templates:

Field	Type	Description
`.Event`	string	`Start`, `Success`, `Failure`, `Recovery`, `PhaseChange`, or `ExecutionProgress`
`.Timestamp`	time.Time	When the event occurred
`.Phase`	string	Current plan phase (e.g., `Hibernating`, `Hibernated`, `Error`)
`.PreviousPhase`	string	Phase before the transition (empty on Start)
`.Operation`	string	`Hibernate` or `WakeUp`
`.Plan.Name`	string	HibernatePlan name
`.Plan.Namespace`	string	HibernatePlan namespace
`.Plan.Labels`	map	HibernatePlan labels
`.Plan.Annotations`	map	HibernatePlan annotations
`.CycleID`	string	Current execution cycle ID
`.Targets`	list (Target)	Per-target execution state (see below)
`.TargetExecution`	Target or nil	The specific target whose state just changed (`ExecutionProgress` only; nil for other events)
`.ErrorMessage`	string	Error details (Failure/Recovery only)
`.RetryCount`	int	Current retry attempt number
`.SinkName`	string	Name of the sink being dispatched to
`.SinkType`	string	Sink type (`slack`, `telegram`)

Target details:

Field	Description
`.Name`	Target name
`.Executor`	Executor type (e.g., `rds`, `eks`)
`.State`	Execution state (`Completed`, `Failed`)
`.ErrorMessage`	Error details for failed targets
`.Connector.Kind`	Connector type: `CloudProvider` or `K8SCluster`
`.Connector.Name`	Connector resource name
`.Connector.Provider`	Cloud provider (e.g., `aws`, `gcp`)
`.Connector.AccountID`	Cloud account ID (e.g., AWS Account ID)
`.Connector.ProjectID`	Cloud project ID (e.g., GCP Project ID)
`.Connector.Region`	Cloud region (e.g., `us-east-1`)
`.Connector.ClusterName`	Kubernetes cluster name (EKS/GKE)

Templates support Sprig template functions — the same function library used by Helm — including date, upper, lower, default, toJson, and many more.

Template Safety

Templates have a 1-second execution timeout to prevent infinite loops. If rendering fails for any reason (parse error, execution error, timeout), the system automatically falls back to a plain-text message containing the event, operation, plan name, phase, and error.

Observability¶

The notification dispatcher exposes Prometheus metrics for delivery success, errors, latency, and dropped messages. For the full list of notification metrics and example PromQL queries, see the Metrics Reference.

Troubleshooting¶

Notification Delivery Issues¶

Symptoms: Notifications are not delivered to Slack, Telegram, or Webhook endpoints.

General Checks: 1. Verify label matching: Ensure the HibernateNotification selector matches labels on your HibernatePlan. Verify with kubectl get hnotif -o wide — the Matched column should be > 0. 2. Verify the Secret exists and has the right key:

kubectl get secret <sink-secret> -n hibernator-system -o jsonpath='{.data.config}' | base64 -d

3. Check controller logs for dispatch errors:

kubectl logs -l app=hibernator-controller -n hibernator-system | grep notification

4. Check notification metrics for error counts:

curl -s http://localhost:8080/metrics | grep hibernator_notification

Slack Troubleshooting¶

Invalid Webhook URL: Ensure the webhook_url is correct and hasn't been revoked.
Channel Permissions: Ensure the webhook is authorized to post to the target channel.
Payload Too Large: If using very large custom templates, Slack might reject the request. Keep templates concise.

Telegram Troubleshooting¶

Bot Permissions: Ensure the bot has been added to the chat/channel and has "Post Messages" permissions.
Wrong Chat ID: Verify the chat_id. Private chats usually have positive IDs, while groups and channels have negative IDs starting with -100.
Parse Mode Errors: If a message contains reserved characters that aren't properly escaped (e.g., _ or * in MarkdownV2), Telegram will reject the message. Use escapeHTML (for HTML parse mode) or escapeMarkdown (for MarkdownV2 parse mode) in custom templates.
Bot Token Revoked: Verify the token is valid by calling https://api.telegram.org/bot<TOKEN>/getMe.

Webhook Troubleshooting¶

Unreachable Endpoint: Ensure the webhook URL is reachable from the controller pod.
Authentication Failure: Verify the headers (e.g., Authorization) are correctly configured in the Secret.
Timeouts: The dispatcher has a 10-second timeout for webhook requests. Ensure your endpoint responds promptly.

Template Rendering Issues¶

If custom templates produce unexpected output: - Remove templateRef temporarily to verify the built-in default works. - Test your template locally with go template syntax — remember Sprig functions are available. - Check controller logs for "template parse failed" or "template execution failed" messages. - Ensure your template uses the correct field names (e.g., .Plan.Name not .PlanName).

Rate Limiting¶

External services (Slack, Telegram) enforce rate limits. The notification system uses a retryable HTTP client with up to 3 retries and exponential backoff (500ms–5s). If you see persistent errors, consider:

Reducing the number of subscribed events (e.g., drop PhaseChange).
Consolidating sinks to reduce total request volume.
Using a single webhook endpoint that fans out internally.