Architecture

k8s-ai-sre is a service-first assistant that combines HTTP ingestion, investigation orchestration, tool access, and guarded action execution.

Main Components

main.py: server entrypoint
app/http.py: investigation and webhook endpoints
app/investigate.py: orchestration flow
app/tools/k8s.py: Kubernetes and Prometheus read helpers
app/tools/actions.py: guarded write-action helpers
app/telegram.py: Telegram polling and command handling
app/stores/: SQLite-backed key-value stores (default path /tmp/k8s-ai-sre-store.sqlite3)
model_factory.py: model client configuration
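
The stores under app/stores/ are described only as SQLite-backed key-value stores. Here is a minimal sketch of that pattern. It assumes one table per store with JSON-encoded values. The KVStore class, its table layout, and the put/get/delete methods are hypothetical and are not the project's actual API.

```python
import json
import sqlite3
from typing import Any, Optional

# Default location mentioned in the architecture notes.
DEFAULT_PATH = "/tmp/k8s-ai-sre-store.sqlite3"


class KVStore:
    """Hypothetical JSON-valued key-value store backed by one SQLite table."""

    def __init__(self, path: str = DEFAULT_PATH, table: str = "kv") -> None:
        self._conn = sqlite3.connect(path)
        # The table name is interpolated into SQL, so it must be a trusted constant.
        self._table = table
        self._conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self._conn.commit()

    def put(self, key: str, value: Any) -> None:
        # INSERT OR REPLACE upserts on the primary key.
        self._conn.execute(
            f"INSERT OR REPLACE INTO {self._table} (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self._conn.commit()

    def get(self, key: str) -> Optional[Any]:
        row = self._conn.execute(
            f"SELECT value FROM {self._table} WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def delete(self, key: str) -> None:
        self._conn.execute(f"DELETE FROM {self._table} WHERE key = ?", (key,))
        self._conn.commit()

    def close(self) -> None:
        self._conn.close()


store = KVStore(":memory:", table="incidents")
store.put("inc-1", {"status": "open"})
print(store.get("inc-1"))  # {'status': 'open'}
```

Passing ":memory:" instead of the default path keeps the store in RAM, which is convenient for tests. Writing to /tmp by default means data may not survive a host reboot.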

End-to-End Flow

request arrives at /investigate or /webhooks/alertmanager
investigation gathers cluster evidence through read tools
model returns findings and optionally proposes actions
incident and proposed actions are persisted
Telegram can send notifications and accept approve/reject commands for proposed actions
approved actions execute with namespace and action guardrails
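
The last three steps of this flow amount to a small lifecycle for each proposed action. The action is persisted as proposed and is then either rejected or approved. Only an approved action is executed. The sketch below models that lifecycle in memory, under stated assumptions. ActionQueue, ProposedAction, the act-N identifiers, and the status names are all hypothetical. The real service persists incidents through app/stores/ rather than a dict.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class ActionStatus(str, Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"


@dataclass
class ProposedAction:
    action_id: str
    kind: str  # e.g. "scale" or "rollout-restart"
    namespace: str
    target: str
    params: Dict[str, int] = field(default_factory=dict)
    status: ActionStatus = ActionStatus.PROPOSED


class ActionQueue:
    """Hypothetical in-memory stand-in for persisted proposals awaiting a decision."""

    def __init__(self) -> None:
        self._actions: Dict[str, ProposedAction] = {}
        self._ids = itertools.count(1)

    def propose(self, kind: str, namespace: str, target: str, **params: int) -> ProposedAction:
        action = ProposedAction(f"act-{next(self._ids)}", kind, namespace, target, dict(params))
        self._actions[action.action_id] = action
        return action

    def _pending(self, action_id: str) -> ProposedAction:
        # Only a still-proposed action may be approved or rejected.
        action = self._actions[action_id]
        if action.status is not ActionStatus.PROPOSED:
            raise ValueError(f"{action_id} is {action.status.value}, not proposed")
        return action

    def reject(self, action_id: str) -> ProposedAction:
        action = self._pending(action_id)
        action.status = ActionStatus.REJECTED
        return action

    def approve_and_execute(
        self, action_id: str, execute: Callable[[ProposedAction], None]
    ) -> ProposedAction:
        action = self._pending(action_id)
        action.status = ActionStatus.APPROVED
        # Guardrail checks would run inside execute(); if it raises,
        # the action stays APPROVED rather than EXECUTED.
        execute(action)
        action.status = ActionStatus.EXECUTED
        return action
```

The queue only acts on actions that are still in the proposed state. As a result, a duplicate or late Telegram command cannot run the same action twice or revive a rejected one.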

Guarded Actions

Current actions:

delete-pod
rollout-restart
scale
rollout-undo
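
Each of the four actions corresponds to a standard kubectl operation. The sketch below only builds the argument list. It assumes rollout-restart, rollout-undo, and scale target Deployments. It does not claim that app/tools/actions.py shells out to kubectl, since the project may use a Kubernetes API client instead. The kubectl_argv helper is hypothetical.

```python
from typing import List, Optional


def kubectl_argv(
    action: str, namespace: str, name: str, replicas: Optional[int] = None
) -> List[str]:
    """Build, but do not run, the kubectl command for one guarded action."""
    ns = ["-n", namespace]
    if action == "delete-pod":
        return ["kubectl", "delete", "pod", name, *ns]
    if action == "rollout-restart":
        return ["kubectl", "rollout", "restart", f"deployment/{name}", *ns]
    if action == "rollout-undo":
        return ["kubectl", "rollout", "undo", f"deployment/{name}", *ns]
    if action == "scale":
        if replicas is None:
            raise ValueError("scale requires a replica count")
        return ["kubectl", "scale", f"deployment/{name}", f"--replicas={replicas}", *ns]
    raise ValueError(f"unsupported action: {action}")


print(kubectl_argv("scale", "demo", "web", replicas=3))
# ['kubectl', 'scale', 'deployment/web', '--replicas=3', '-n', 'demo']
```

Once the guardrails pass, the list could be passed to subprocess.run(argv, check=True). Building argv as a list rather than a shell string avoids quoting problems with resource names.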

Guardrails currently enforced include:

explicit approval required before execution
namespace allow-list via WRITE_ALLOWED_NAMESPACES
deployment existence checks for scale and rollout-undo
non-negative replica checks for scale
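
The four guardrails can be expressed as a single pre-execution check. This is a minimal sketch with two assumptions. First, WRITE_ALLOWED_NAMESPACES holds a comma-separated list of namespaces. Second, the deployment lookup is supplied as a callable; in the real service it would presumably be a read helper from app/tools/k8s.py. GuardrailError, allowed_namespaces, and check_guardrails are hypothetical names.

```python
import os
from typing import Callable, Mapping, Optional, Set


class GuardrailError(Exception):
    """Raised when a write action would violate a guardrail."""


def allowed_namespaces(env: Optional[Mapping[str, str]] = None) -> Set[str]:
    # Assumption: WRITE_ALLOWED_NAMESPACES is a comma-separated list, e.g. "demo,staging".
    source = os.environ if env is None else env
    raw = source.get("WRITE_ALLOWED_NAMESPACES", "")
    return {ns.strip() for ns in raw.split(",") if ns.strip()}


def check_guardrails(
    action: str,
    namespace: str,
    name: str,
    *,
    approved: bool,
    allowed: Set[str],
    deployment_exists: Callable[[str, str], bool],
    replicas: Optional[int] = None,
) -> None:
    """Raise GuardrailError unless every guardrail passes."""
    if not approved:
        raise GuardrailError("explicit approval is required before execution")
    if namespace not in allowed:
        raise GuardrailError(f"namespace {namespace!r} is not in WRITE_ALLOWED_NAMESPACES")
    if action in ("scale", "rollout-undo") and not deployment_exists(namespace, name):
        raise GuardrailError(f"deployment {namespace}/{name} does not exist")
    if action == "scale" and (replicas is None or replicas < 0):
        raise GuardrailError("scale requires a non-negative replica count")
```

In this sketch, an unset or empty WRITE_ALLOWED_NAMESPACES yields an empty allow-list, so every write is refused. Failing closed is a deliberate choice here and may differ from the project's actual default.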
