k8s-ai-sre Documentation

Investigate, propose, approve, execute

Start the product, trigger an incident, and review the approval loop without digging through the whole repo.

`k8s-ai-sre` is an AI-assisted Kubernetes incident investigator with guarded remediation. Use this page to choose the fastest path for your role.

Quick Start: Run a local demo, trigger an investigation, and see the response shape in minutes.

Deploy to Kubernetes: Install with Helm or manifests, verify readiness, and keep rollback steps nearby.

Validate the Full Loop: Follow the test runbook for investigate, notify, approve, and execute flows.

Input channels: HTTP API, Alertmanager webhooks, and Telegram approvals.

Guardrails: Explicit approval, namespace allow-lists, and `kubectl auth can-i` checks.

Operator view: Readable incident summary, action proposals, and audit-friendly status flows.
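The guardrails listed above can be pictured with a minimal sketch. It assumes a hypothetical in-code allow-list (`ALLOWED_NAMESPACES`) and hypothetical helper names (`can_i_command`, `is_action_permitted`). Only the `kubectl auth can-i` command is taken from this page, so treat the rest as an illustration rather than the project's actual implementation.

```python
import subprocess

# Hypothetical allow-list; the project's real configuration name and format may differ.
ALLOWED_NAMESPACES = {"demo", "staging"}


def can_i_command(verb: str, resource: str, namespace: str) -> list[str]:
    """Build the `kubectl auth can-i` invocation for one verb/resource pair."""
    return ["kubectl", "auth", "can-i", verb, resource, "--namespace", namespace]


def is_action_permitted(verb: str, resource: str, namespace: str, run=subprocess.run) -> bool:
    """Return True only if both guardrails pass."""
    # Guardrail 1: refuse namespaces outside the allow-list before touching the cluster.
    if namespace not in ALLOWED_NAMESPACES:
        return False
    # Guardrail 2: ask the API server whether the current identity may perform the action.
    # `kubectl auth can-i` prints "yes" and exits 0 when the action is allowed.
    result = run(
        can_i_command(verb, resource, namespace),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0 and result.stdout.strip() == "yes"
```

The `run` parameter exists only so the check can be exercised without a cluster. For example, `is_action_permitted("patch", "deployments", "kube-system")` returns `False` without invoking `kubectl`, because `kube-system` is not in the sketch's allow-list.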

Investigates pods and deployments with real Kubernetes reads.
Collects evidence from object state, events, logs, and optional Prometheus queries.
Accepts manual investigations at `/investigate` and Alertmanager webhooks at `/webhooks/alertmanager`.
Stores incidents and pending actions in SQLite by default at `/tmp/k8s-ai-sre-store.sqlite3`.
Sends Telegram notifications and supports the `/incident`, `/status`, `/approve`, and `/reject` commands.
Requires explicit approval before any remediation action executes.
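The approval requirement and the SQLite-backed pending actions described above can be sketched as a small state machine. The table name, columns, status values, and function names below are all hypothetical, and the sketch uses an in-memory database rather than the default store path. It shows the general shape of an approval gate, not the project's actual schema.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a store with a hypothetical pending_actions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_actions ("
        "id INTEGER PRIMARY KEY, "
        "incident_id TEXT NOT NULL, "
        "command TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def propose(conn: sqlite3.Connection, incident_id: str, command: str) -> int:
    """Record a proposed remediation; it starts in the 'pending' state."""
    cur = conn.execute(
        "INSERT INTO pending_actions (incident_id, command) VALUES (?, ?)",
        (incident_id, command),
    )
    conn.commit()
    return cur.lastrowid


def decide(conn: sqlite3.Connection, action_id: int, approved: bool) -> bool:
    """Approve or reject an action; only pending actions can change state."""
    cur = conn.execute(
        "UPDATE pending_actions SET status = ? WHERE id = ? AND status = 'pending'",
        ("approved" if approved else "rejected", action_id),
    )
    conn.commit()
    return cur.rowcount == 1


def execute(conn: sqlite3.Connection, action_id: int, runner) -> bool:
    """Run the action through `runner` only if it was explicitly approved."""
    row = conn.execute(
        "SELECT command, status FROM pending_actions WHERE id = ?", (action_id,)
    ).fetchone()
    if row is None or row[1] != "approved":
        return False  # never execute pending, rejected, or already-executed actions
    runner(row[0])
    conn.execute("UPDATE pending_actions SET status = 'executed' WHERE id = ?", (action_id,))
    conn.commit()
    return True
```

In this sketch, calling `execute` before `decide(..., approved=True)` returns `False` and never invokes `runner`. Marking the row as `executed` afterwards keeps an approved action from running twice.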

If you only need one next step, start with Quick Start. It is the first path on this site and the fastest way to validate the product.

This docs site must stay aligned with the repository sources. When those sources change, update the matching docs pages in the same pull request.