Architecture
k8s-ai-sre is a service-first assistant that combines HTTP ingestion, investigation orchestration, tool access, and guarded action execution.
Main Components
- main.py: server entrypoint
- app/http.py: investigation and webhook endpoints
- app/investigate.py: orchestration flow
- app/tools/k8s.py: Kubernetes and Prometheus read helpers
- app/tools/actions.py: guarded write-action helpers
- app/telegram.py: Telegram polling and command handling
- app/stores/: SQLite-backed key-value stores (default path /tmp/k8s-ai-sre-store.sqlite3)
- model_factory.py: model client configuration
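A SQLite-backed key-value store can be very small. The sketch below assumes a single two-column table with JSON-encoded values; the class name SQLiteKVStore, the table name, and the schema are hypothetical and are not taken from app/stores/. Only the default path comes from this document.

```python
import json
import sqlite3
from typing import Any, Optional

# Hypothetical sketch: class name, table name, schema, and JSON encoding are
# assumptions, not the actual app/stores/ implementation.
DEFAULT_PATH = "/tmp/k8s-ai-sre-store.sqlite3"


class SQLiteKVStore:
    """Minimal key-value store backed by a single SQLite table."""

    def __init__(self, path: str = DEFAULT_PATH, table: str = "kv") -> None:
        self._conn = sqlite3.connect(path)
        # The table name is interpolated into SQL, so it must be a fixed,
        # trusted identifier (never user input).
        self._table = table
        # One row per key; values are stored as JSON text.
        self._conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self._conn.commit()

    def put(self, key: str, value: Any) -> None:
        # INSERT OR REPLACE overwrites any existing value for the key.
        self._conn.execute(
            f"INSERT OR REPLACE INTO {self._table} (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self._conn.commit()

    def get(self, key: str) -> Optional[Any]:
        row = self._conn.execute(
            f"SELECT value FROM {self._table} WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def delete(self, key: str) -> None:
        self._conn.execute(f"DELETE FROM {self._table} WHERE key = ?", (key,))
        self._conn.commit()
```

Passing ":memory:" as the path gives a throwaway in-memory database, which is convenient for tests; because the default path lives under /tmp, persisted incidents may not survive a host reboot.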
End-to-End Flow
- a request enters through /investigate or /webhooks/alertmanager
- investigation gathers cluster evidence through read tools
- model returns findings and optionally proposes actions
- incident and proposed actions are persisted
- Telegram can notify and accept approve/reject commands
- approved actions execute with namespace and action guardrails
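The flow above can be sketched end to end with injected stand-ins for the read tools, the model, the store, and the executor. Every name here (ProposedAction, Incident, investigate, decide) is hypothetical and only illustrates the order of steps; it is not the real app/investigate.py or app/telegram.py API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types and function names; they illustrate the flow only.


@dataclass
class ProposedAction:
    kind: str                 # e.g. "rollout-restart"
    namespace: str
    target: str
    params: Dict[str, int] = field(default_factory=dict)
    status: str = "pending"   # pending -> approved or rejected


@dataclass
class Incident:
    id: str
    alert: Dict[str, str]
    evidence: Dict[str, str]
    findings: str
    actions: List[ProposedAction]


def investigate(
    alert: Dict[str, str],
    read_tools: Dict[str, Callable[[Dict[str, str]], str]],
    model: Callable[[Dict[str, str]], Tuple[str, List[ProposedAction]]],
    store: Dict[str, Incident],
) -> Incident:
    # Gather cluster evidence by calling each read-only tool.
    evidence = {name: tool(alert) for name, tool in read_tools.items()}
    # The model returns findings and zero or more proposed actions.
    findings, actions = model(evidence)
    # Persist the incident together with its proposed actions.
    incident = Incident(str(uuid.uuid4()), alert, evidence, findings, list(actions))
    store[incident.id] = incident
    return incident


def decide(
    store: Dict[str, Incident],
    incident_id: str,
    index: int,
    approve: bool,
    execute: Callable[[ProposedAction], None],
) -> ProposedAction:
    # An approve/reject command (e.g. from Telegram) resolves one pending action.
    action = store[incident_id].actions[index]
    if action.status != "pending":
        raise ValueError(f"action already {action.status}")
    action.status = "approved" if approve else "rejected"
    # Only approved actions are handed to the (guarded) executor.
    if approve:
        execute(action)
    return action
```

Injecting the tools, model, and executor as callables keeps the orchestration testable without a cluster or a model endpoint.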
Guarded Actions
Current actions:
- delete-pod
- rollout-restart
- scale
- rollout-undo
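For orientation, each action corresponds roughly to a standard kubectl operation. The sketch below assumes the helpers could be expressed as kubectl invocations and that rollout-restart targets a Deployment; the real app/tools/actions.py may call the Kubernetes API directly instead. The function kubectl_argv is hypothetical and only builds an argument list; it does not run anything.

```python
from typing import List, Optional


def kubectl_argv(
    action: str, namespace: str, name: str, replicas: Optional[int] = None
) -> List[str]:
    """Hypothetical mapping from an action name to a kubectl argument list."""
    if action == "delete-pod":
        return ["kubectl", "delete", "pod", name, "-n", namespace]
    if action == "rollout-restart":
        # Assumption: the restart target is a Deployment.
        return ["kubectl", "rollout", "restart", f"deployment/{name}", "-n", namespace]
    if action == "scale":
        if replicas is None:
            raise ValueError("scale requires replicas")
        return ["kubectl", "scale", f"deployment/{name}",
                f"--replicas={replicas}", "-n", namespace]
    if action == "rollout-undo":
        return ["kubectl", "rollout", "undo", f"deployment/{name}", "-n", namespace]
    raise ValueError(f"unsupported action: {action}")
```

Building an argument list rather than a shell string means the result can be passed to subprocess.run(argv, check=True) without shell quoting issues.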
Guardrails currently enforced include:
- explicit approval required before execution
- namespace allow-list via WRITE_ALLOWED_NAMESPACES
- deployment existence checks for scale and rollout-undo
- non-negative replica checks for scale
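The guardrails listed above can be expressed as a single pre-execution check. This is a minimal sketch under stated assumptions: WRITE_ALLOWED_NAMESPACES is taken to be a comma-separated list, an unset or empty variable is treated as "no namespaces writable", and the deployment lookup is injected as a callable. The names GuardrailError, allowed_namespaces, and check_guardrails are hypothetical, and the check order is illustrative.

```python
import os
from typing import Callable, Mapping, Optional, Set

# Hypothetical sketch; the comma-separated format of WRITE_ALLOWED_NAMESPACES
# and the empty-means-deny behaviour are assumptions.
DEPLOYMENT_ACTIONS = {"scale", "rollout-undo"}


class GuardrailError(Exception):
    """Raised when an action violates a guardrail."""


def allowed_namespaces(env: Mapping[str, str] = os.environ) -> Set[str]:
    raw = env.get("WRITE_ALLOWED_NAMESPACES", "")
    # A missing or empty variable yields an empty set, so no writes are allowed.
    return {ns.strip() for ns in raw.split(",") if ns.strip()}


def check_guardrails(
    action: str,
    namespace: str,
    name: str,
    approved: bool,
    deployment_exists: Callable[[str, str], bool],
    replicas: Optional[int] = None,
    env: Mapping[str, str] = os.environ,
) -> None:
    """Raise GuardrailError if the action must not run; return None otherwise."""
    if not approved:
        raise GuardrailError("explicit approval required before execution")
    if namespace not in allowed_namespaces(env):
        raise GuardrailError(f"namespace {namespace!r} is not in WRITE_ALLOWED_NAMESPACES")
    if action in DEPLOYMENT_ACTIONS and not deployment_exists(namespace, name):
        raise GuardrailError(f"deployment {namespace}/{name} not found")
    if action == "scale" and (replicas is None or replicas < 0):
        raise GuardrailError("scale requires a non-negative replica count")
```

Failing closed (an empty allow-list permits nothing) is a deliberate choice in this sketch; the actual service may behave differently when the variable is unset.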