
conallob/outalator


Outalator

A modern web application for tracking outage troubleshooting notes for Site Reliability Engineers (SREs), inspired by the Outalator tool described in Google's Site Reliability Engineering book.

Features

  • Outage Tracking: Create and manage outages with a title, description, severity, status, and linked alerts
  • Multi-Service Alerts: Import alerts from multiple oncall notification services
    • PagerDuty
    • OpsGenie
    • Extensible architecture for additional services
  • Note-Taking: Add plaintext or markdown notes to outages
  • Tagging System: Organize outages with flexible key-value tags (e.g., Jira tickets, services, regions)
  • Modular Storage: Interface-based storage layer with PostgreSQL implementation
  • RESTful API: Clean HTTP API for all operations
  • Slack Bot Integration: Interact with outages directly from Slack
    • Create outages and add notes via messages
    • Tag messages with emoji reactions to add them as notes
  • MCP Server: Model Context Protocol interface for AI assistants
    • Claude Desktop integration
    • Natural language outage management

Architecture

The application is built with Go and follows clean architecture principles:

  • Domain Layer: Core business entities (Outage, Alert, Note, Tag)
  • Storage Layer: Pluggable storage interface with PostgreSQL implementation
  • Notification Layer: Extensible notification service integrations
  • Service Layer: Business logic and orchestration
  • API Layer: HTTP handlers and routing

Prerequisites

  • Go 1.21 or higher
  • PostgreSQL 12 or higher
  • (Optional) PagerDuty API key
  • (Optional) OpsGenie API key
  • (Optional) Slack workspace with bot permissions
  • (Optional) Claude Desktop for MCP integration

Installation

  1. Clone the repository:
git clone https://github.com/conall/outalator.git
cd outalator
  2. Install Go dependencies:
go mod download
  3. Set up PostgreSQL:
# Create a database user (you will be prompted for its password) and a database it owns
createuser --pwprompt outalator
createdb --owner=outalator outalator

# Run migrations
psql -U outalator -d outalator -f migrations/001_initial_schema.sql
  4. Configure the application:
cp config.example.yaml config.yaml
# Edit config.yaml with your settings
  5. Build and run:
go build -o outalator ./cmd/outalator
./outalator -config config.yaml

Or run directly:

go run ./cmd/outalator -config config.yaml

Importing Historical Data

Outalator includes a tool to bootstrap your database with historical incidents from PagerDuty or OpsGenie.

Quick Start

# Build the import tool
make build-import

# List available teams
./bin/import-history -service pagerduty -list-teams

# Import historical data (dry run first)
./bin/import-history -service pagerduty -since 2024-01-01T00:00:00Z -dry-run

# Import for real
./bin/import-history -service pagerduty -since 2024-01-01T00:00:00Z

# Import specific teams only
./bin/import-history -service pagerduty -since 2024-01-01T00:00:00Z -teams "TEAM_ID_1,TEAM_ID_2"

For complete documentation including examples, troubleshooting, and best practices, see docs/IMPORT_HISTORY.md.

Configuration

Configuration can be provided via a YAML file, environment variables, or (for some options, such as the Slack settings) command-line flags.

Configuration File (config.yaml)

server:
  host: 0.0.0.0
  port: 8080

database:
  host: localhost
  port: 5432
  user: outalator
  password: outalator
  dbname: outalator
  sslmode: disable

# Optional: PagerDuty integration
pagerduty:
  api_key: your-pagerduty-api-key

# Optional: OpsGenie integration
opsgenie:
  api_key: your-opsgenie-api-key

Environment Variables

Environment variables override config file values:

  • SERVER_HOST - Server host
  • SERVER_PORT - Server port
  • DB_HOST - Database host
  • DB_PORT - Database port
  • DB_USER - Database user
  • DB_PASSWORD - Database password
  • DB_NAME - Database name
  • PAGERDUTY_API_KEY - PagerDuty API key
  • OPSGENIE_API_KEY - OpsGenie API key

API Documentation

Outages

Create Outage

POST /api/v1/outages
Content-Type: application/json

{
  "title": "API Gateway Outage",
  "description": "Users unable to access API endpoints",
  "severity": "high",
  "alert_ids": ["PXYZ123"],
  "tags": [
    {"key": "jira", "value": "OPS-1234"},
    {"key": "service", "value": "api-gateway"}
  ]
}

List Outages

GET /api/v1/outages?limit=50&offset=0

Get Outage

GET /api/v1/outages/{id}

Update Outage

PATCH /api/v1/outages/{id}
Content-Type: application/json

{
  "status": "resolved",
  "severity": "medium"
}

Notes

Add Note to Outage

POST /api/v1/outages/{id}/notes
Content-Type: application/json

{
  "content": "Identified root cause: database connection pool exhaustion",
  "format": "markdown",
  "author": "engineer@example.com"
}

Tags

Add Tag to Outage

POST /api/v1/outages/{id}/tags
Content-Type: application/json

{
  "key": "jira",
  "value": "OPS-5678"
}

Search Outages by Tag

GET /api/v1/tags/search?key=jira&value=OPS-1234
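When building this URL in code, it is safer to let net/url escape the key and value than to concatenate strings, since tag values may contain spaces or characters such as &. tagSearchURL is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// tagSearchURL builds the tag search URL; url.Values.Encode escapes
// special characters and sorts the parameters by key.
func tagSearchURL(baseURL, key, value string) string {
	q := url.Values{}
	q.Set("key", key)
	q.Set("value", value)
	return baseURL + "/api/v1/tags/search?" + q.Encode()
}

func main() {
	fmt.Println(tagSearchURL("http://localhost:8080", "jira", "OPS-1234"))
	// http://localhost:8080/api/v1/tags/search?key=jira&value=OPS-1234
	fmt.Println(tagSearchURL("http://localhost:8080", "service", "api gateway & auth"))
	// http://localhost:8080/api/v1/tags/search?key=service&value=api+gateway+%26+auth
}
```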

Alerts

Import Alert

POST /api/v1/alerts/import
Content-Type: application/json

{
  "source": "pagerduty",
  "external_id": "PXYZ123",
  "outage_id": "optional-uuid-to-associate"
}

Health Check

GET /health

Authentication

Outalator supports OIDC authentication with providers such as Okta, Auth0, and Google. When authentication is enabled, each note is automatically attributed to the authenticated user's email address.

Configuring Authentication

Add to your config.yaml:

auth:
  enabled: true
  issuer: https://your-company.okta.com
  client_id: your-okta-client-id
  client_secret: your-okta-client-secret
  redirect_url: http://localhost:8080/auth/callback
  session_key: generate-a-random-32-byte-base64-key

Generate a session key:

openssl rand -base64 32

When auth.enabled is false, the application runs without any authentication, which is useful for local development.

Slack Bot Integration

Outalator includes a Slack bot that allows teams to interact with outages directly from Slack.

Features

  • Create outages using simple text commands
  • Add notes to outages via direct messages
  • Tag existing Slack messages to add them as notes using emoji reactions

Quick Start

  1. Create a Slack app and get your bot token and signing secret
  2. Configure Outalator using your preferred method:

Config file:

slack:
  enabled: true
  bot_token: xoxb-your-bot-token
  signing_secret: your-signing-secret
  reaction_emoji: outage_note  # Any emoji name without colons

CLI flags:

./outalator -slack-enabled -slack-bot-token=xoxb-... -slack-reaction-emoji=bookmark

Environment variables:

export SLACK_ENABLED=true SLACK_BOT_TOKEN=xoxb-... SLACK_REACTION_EMOJI=memo
  3. Set up event subscriptions in Slack to point to https://your-server.com/slack/events

Usage Examples

Create an outage:

outage API Gateway is down | Users cannot authenticate | critical

Add a note:

note 123e4567-e89b-12d3-a456-426614174000 Restarted the API gateway service

Tag a message:

  1. Post a message mentioning the outage ID
  2. React with your configured emoji (e.g., :outage_note:, :bookmark:, etc.)
  3. The message is automatically added as a note
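The two text commands follow simple formats: outage <title> | <description> | <severity>, and note <outage-id> <text>. A hypothetical parser for them might look like this; Outalator's actual parsing rules (for example, whether description and severity are optional) may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type OutageCommand struct {
	Title, Description, Severity string
}

type NoteCommand struct {
	OutageID, Content string
}

// parseOutageCommand parses "outage <title> | <description> | <severity>".
func parseOutageCommand(text string) (OutageCommand, error) {
	rest, ok := strings.CutPrefix(strings.TrimSpace(text), "outage ")
	if !ok {
		return OutageCommand{}, errors.New("not an outage command")
	}
	parts := strings.Split(rest, "|")
	if len(parts) != 3 {
		return OutageCommand{}, errors.New("usage: outage <title> | <description> | <severity>")
	}
	for i := range parts {
		parts[i] = strings.TrimSpace(parts[i])
	}
	if parts[0] == "" {
		return OutageCommand{}, errors.New("title must not be empty")
	}
	return OutageCommand{Title: parts[0], Description: parts[1], Severity: parts[2]}, nil
}

// parseNoteCommand parses "note <outage-id> <text>", keeping the text's inner spacing.
func parseNoteCommand(text string) (NoteCommand, error) {
	rest, ok := strings.CutPrefix(strings.TrimSpace(text), "note ")
	if !ok {
		return NoteCommand{}, errors.New("not a note command")
	}
	id, content, _ := strings.Cut(strings.TrimSpace(rest), " ")
	content = strings.TrimSpace(content)
	if id == "" || content == "" {
		return NoteCommand{}, errors.New("usage: note <outage-id> <text>")
	}
	return NoteCommand{OutageID: id, Content: content}, nil
}

func main() {
	o, err := parseOutageCommand("outage API Gateway is down | Users cannot authenticate | critical")
	fmt.Printf("%+v %v\n", o, err)
	n, err := parseNoteCommand("note 123e4567-e89b-12d3-a456-426614174000 Restarted the API gateway service")
	fmt.Printf("%+v %v\n", n, err)
}
```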

For complete setup instructions and troubleshooting, see docs/SLACK_INTEGRATION.md.

MCP Server for AI Assistants

The MCP (Model Context Protocol) server provides a standardized interface for AI assistants like Claude to interact with outages.

Running the MCP Server

# Build the MCP server
go build -o mcp-server ./cmd/mcp-server

# Run it
./mcp-server -config config.yaml

Claude Desktop Integration

Add to your Claude Desktop configuration:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "outalator": {
      "command": "/path/to/outalator/mcp-server",
      "args": ["-config", "/path/to/config.yaml"]
    }
  }
}

Available Tools

  • list_outages: List all outages with pagination
  • get_outage: Get details of a specific outage
  • create_outage: Create a new outage entry
  • add_note: Add a note to an existing outage
  • update_outage: Update an outage's status or severity
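Under the hood, an MCP client such as Claude Desktop launches the configured command and exchanges JSON-RPC 2.0 messages with it over stdin/stdout; invoking one of these tools is a tools/call request carrying the tool name and its arguments. The sketch below only builds such a request. The argument names (limit, id) are assumptions, since the tools' input schemas are not listed above; docs/MCP_SERVER.md is the place to check the real ones.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCallParams is the params object of an MCP tools/call request.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments,omitempty"`
}

// jsonRPCRequest is a JSON-RPC 2.0 request envelope.
type jsonRPCRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

// toolCall encodes a tools/call request for the named tool.
func toolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(jsonRPCRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: tool, Arguments: args},
	})
}

func main() {
	b, err := toolCall(1, "list_outages", map[string]any{"limit": 10}) // "limit" is an assumed argument name
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_outages","arguments":{"limit":10}}}
}
```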

Example AI Interactions

  • "What outages have we had in the past week?"
  • "Create an outage for the database connection issues"
  • "Add a note that we restarted the Redis cluster"
  • "Update outage 123... to resolved status"

For complete documentation, see docs/MCP_SERVER.md.

Kubernetes Deployment

Outalator can be deployed to Kubernetes using either Helm or Kustomize.

Quick Start with Helm

# Create namespace
kubectl create namespace outalator

# Install with Helm
helm install outalator ./helm/outalator \
  --namespace outalator \
  --set 'ingress.hosts[0].host=outalator.example.com' \
  --set config.auth.issuer=https://your-company.okta.com \
  --set secrets.auth.clientId=your-client-id \
  --set secrets.auth.clientSecret=your-client-secret

Quick Start with Kustomize

# Update configuration in k8s/base/configmap.yaml and k8s/base/secret.yaml
# Then apply
kubectl apply -k k8s/overlays/prod

See docs/KUBERNETES.md for complete Kubernetes deployment documentation, including:

  • Detailed Helm configuration
  • Kustomize overlays
  • Okta/OIDC setup
  • Security best practices
  • Monitoring and troubleshooting

Building the Container Image

docker build -t outalator:latest .
docker tag outalator:latest your-registry/outalator:v1.0.0
docker push your-registry/outalator:v1.0.0

Database Schema

The application uses PostgreSQL with the following tables:

  • outages: Main outage tracking table
  • alerts: Imported alerts from notification services
  • notes: Troubleshooting notes attached to outages (with author attribution)
  • tags: Key-value metadata tags

See migrations/001_initial_schema.sql for the complete schema.

Extending the Application

Adding a New Notification Service

  1. Create a new package under internal/notification/

  2. Implement the notification.Service interface:

    • Name() string
    • FetchAlert(ctx, alertID) (*Alert, error)
    • FetchRecentAlerts(ctx, since) ([]*Alert, error)
    • WebhookHandler() interface{}
  3. Register the service in cmd/outalator/main.go

Adding a New Storage Backend

  1. Create a new package under internal/storage/
  2. Implement the storage.Storage interface
  3. Update cmd/outalator/main.go to use the new storage

Development

Running Tests

go test ./...

Project Structure

outalator/
├── cmd/
│   ├── outalator/          # Main application entry point
│   ├── mcp-server/         # MCP server for AI assistants
│   └── import-history/     # Historical data import tool
├── internal/
│   ├── api/                # HTTP handlers and routes
│   ├── config/             # Configuration management
│   ├── domain/             # Domain models and DTOs
│   ├── mcp/                # MCP server implementation
│   ├── slack/              # Slack bot integration
│   ├── notification/       # Notification service integrations
│   │   ├── opsgenie/
│   │   └── pagerduty/
│   ├── service/            # Business logic
│   └── storage/            # Storage layer
│       └── postgres/       # PostgreSQL implementation
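├── docs/                   # Documentation
│   ├── SLACK_INTEGRATION.md
│   ├── MCP_SERVER.md
│   ├── IMPORT_HISTORY.md
│   └── KUBERNETES.md
├── helm/outalator/         # Helm chart
├── k8s/                    # Kustomize base and overlays
├── migrations/             # Database migration scripts
├── config.example.yaml     # Example configuration file
├── go.mod                  # Go module definition
└── README.md               # This file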

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

MIT License - See LICENSE file for details
