Integrations¶
OpenRemedy is a hub: alerts come in, notifications go out, and an HTTP/JSON API plus a real-time WebSocket layer let other systems read and act on what's happening. This page is the operator's map of those surfaces.
There are five integration directions:
- Inbound — webhooks from monitoring tools (Prometheus, Grafana, Datadog, PagerDuty, custom).
- Daemon — host-side agent that reports server state and runs scheduled monitors. Briefed here, full reference in Daemon → Install.
- Outbound notifications — Slack, Teams, Discord, ServiceNow, Jira, generic webhook. Routed by the HookDispatcher with per-tenant rules.
- Programmatic API — every dashboard surface backed by a documented REST endpoint, JWT or cookie auth.
- Real-time — two WebSocket channels for incident-feed subscribers and live execution output.
The marketplace ships preset bundles (alert sources + recipes + plugin configs) that cover most of these in one click. The pages below are the full manual for when a bundle doesn't quite fit.
Inbound webhooks¶
All inbound alerts hit the same endpoint:
POST https://<your-domain>/api/v1/webhooks/alerts/<tenant-slug>
Content-Type: application/json
X-OpenRemedy-Signature: sha256=<hex digest>

{
  "hostname": "web-01",
  "incident_type": "disk_full",
  "severity": "high",
  "evidence": {
    "disk_usage_percent": 95,
    "mount": "/"
  }
}
Supported incident types: service_down, disk_full,
cpu_high, memory_high, port_unavailable, custom.
Supported severities: critical, high, medium, low,
info. Evidence: any JSON object with relevant data — the
classifier uses it to match a recipe.
Authentication: HMAC signatures¶
Every request must carry an HMAC-SHA256 signature of the raw body,
keyed by the tenant's webhook_secret. Unsigned or wrongly-signed
requests get 401 Unauthorized. The endpoint is rate-limited at
60 requests/min per source IP. The cryptographic detail lives
in Security → Webhook authentication.
The signing recipe in three steps: fetch the tenant's
webhook_secret from the dashboard (Settings → Webhooks);
compute the HMAC-SHA256 hex digest of the raw body, byte-exact, with no
re-serialisation; send it as sha256=<hex> in the X-OpenRemedy-Signature header.
bash + openssl + curl¶
SECRET="your-tenant-webhook-secret"
BODY='{"hostname":"web-01","incident_type":"disk_full","severity":"high","evidence":{}}'
SIG=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')
curl -X POST https://app.openremedy.io/api/v1/webhooks/alerts/my-company \
  -H "Content-Type: application/json" \
  -H "X-OpenRemedy-Signature: sha256=$SIG" \
  -d "$BODY"
Python¶
import hashlib, hmac, json, requests

secret = "your-tenant-webhook-secret"
body = json.dumps({"hostname": "web-01", "incident_type": "disk_full", "severity": "high", "evidence": {}})
sig = hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()

requests.post(
    "https://app.openremedy.io/api/v1/webhooks/alerts/my-company",
    data=body,  # NOT json= (would re-serialise and break the signature)
    headers={
        "Content-Type": "application/json",
        "X-OpenRemedy-Signature": f"sha256={sig}",
    },
)
Node.js¶
import crypto from "crypto";

const secret = "your-tenant-webhook-secret";
const body = JSON.stringify({ hostname: "web-01", incident_type: "disk_full" });
const sig = crypto.createHmac("sha256", secret).update(body).digest("hex");

await fetch("https://app.openremedy.io/api/v1/webhooks/alerts/my-company", {
  method: "POST",
  body,
  headers: {
    "Content-Type": "application/json",
    "X-OpenRemedy-Signature": `sha256=${sig}`,
  },
});
For senders that can't sign on the wire (Grafana's basic webhook, PagerDuty's generic format), use a sidecar adapter that takes the upstream payload, signs it, and forwards. The Alertmanager example below shows the pattern.
Prometheus + Alertmanager¶
Step 1 — alert rules¶
/etc/prometheus/rules/openremedy.yml:
groups:
  - name: openremedy
    rules:
      - alert: DiskFull
        expr: (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 90
        for: 5m
        labels:
          severity: high
          incident_type: disk_full
        annotations:
          hostname: "{{ $labels.instance }}"
          mount: "{{ $labels.mountpoint }}"
          usage_percent: "{{ $value }}"
      - alert: HighCPU
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: high
          incident_type: cpu_high
        annotations:
          hostname: "{{ $labels.instance }}"
          cpu_percent: "{{ $value }}"
      - alert: HighMemory
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: high
          incident_type: memory_high
        annotations:
          hostname: "{{ $labels.instance }}"
          memory_percent: "{{ $value }}"
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
          incident_type: service_down
        annotations:
          hostname: "{{ $labels.instance }}"
      - alert: PortDown
        expr: probe_success == 0
        for: 2m
        labels:
          severity: high
          incident_type: port_unavailable
        annotations:
          hostname: "{{ $labels.instance }}"
Step 2 — Alertmanager receiver¶
/etc/alertmanager/alertmanager.yml:
route:
  receiver: openremedy
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: openremedy
    webhook_configs:
      - url: 'http://localhost:9095/alertmanager'
        send_resolved: true
Alertmanager points at the adapter, not at OpenRemedy directly, because Alertmanager cannot sign requests. The adapter signs and forwards.
Step 3 — adapter¶
#!/usr/bin/env python3
"""Alertmanager → OpenRemedy webhook adapter. Run as a sidecar.

Reads the tenant's HMAC secret from the env so it can sign every
forwarded request.
"""
import hashlib, hmac, json, os

from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

OPENREMEDY_URL = "https://app.openremedy.io/api/v1/webhooks/alerts/my-company"
WEBHOOK_SECRET = os.environ["OPENREMEDY_WEBHOOK_SECRET"]


def _signed_post(url: str, payload: dict) -> requests.Response:
    body = json.dumps(payload).encode("utf-8")
    sig = hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256).hexdigest()
    return requests.post(
        url,
        data=body,  # raw bytes — re-serialising would break the signature
        headers={
            "Content-Type": "application/json",
            "X-OpenRemedy-Signature": f"sha256={sig}",
        },
        timeout=10,
    )


@app.route("/alertmanager", methods=["POST"])
def handle():
    data = request.json
    for alert in data.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        payload = {
            "hostname": annotations.get("hostname", labels.get("instance", "unknown")).split(":")[0],
            "incident_type": labels.get("incident_type", "custom"),
            "severity": labels.get("severity", "medium"),
            "evidence": {
                "alertname": labels.get("alertname", ""),
                "status": alert.get("status", ""),
                **{k: v for k, v in annotations.items() if k != "hostname"},
            },
        }
        try:
            _signed_post(OPENREMEDY_URL, payload)
        except Exception as e:
            print(f"Failed: {e}")
    return jsonify({"status": "ok"})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9095)
Grafana¶
Grafana's webhook contact point can hit OpenRemedy via the same
adapter pattern, or — if you can write a custom notification
template — you can sign the body inside Grafana's templating layer.
The simpler path is the Alertmanager-style adapter on
localhost:9095.
Map Grafana labels to the OpenRemedy fields the classifier expects:
| Field | Source |
|---|---|
| hostname | instance label of the alert |
| incident_type | a dedicated incident_type label on the rule |
| severity | a severity label on the rule |
Datadog, PagerDuty, custom HTTP¶
Same pattern: a small adapter (Flask, FastAPI, Cloudflare Worker —
any HTTP endpoint will do) that translates the upstream payload to
the {hostname, incident_type, severity, evidence} shape and
signs it. Translation tables for Datadog priorities (P1→critical,
P2→high…) and PagerDuty urgencies live in the marketplace bundles.
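As a sketch of that translation step, the snippet below maps a Datadog-style payload onto the OpenRemedy shape. The priority mapping follows the P1→critical, P2→high pattern named above, but the exact tables (and the upstream field names used here) are assumptions; the authoritative versions ship in the marketplace bundles.

```python
# Hedged sketch: translate an upstream (Datadog-style) payload into the
# {hostname, incident_type, severity, evidence} shape. Field names on
# the upstream side are illustrative assumptions.

DATADOG_PRIORITY_MAP = {  # assumed mapping, per the P1→critical, P2→high pattern
    "P1": "critical",
    "P2": "high",
    "P3": "medium",
    "P4": "low",
    "P5": "info",
}

def translate_datadog(payload: dict) -> dict:
    """Return an OpenRemedy alert body; everything unmapped goes into evidence."""
    return {
        "hostname": payload.get("hostname", "unknown"),
        "incident_type": payload.get("incident_type", "custom"),
        "severity": DATADOG_PRIORITY_MAP.get(payload.get("priority", ""), "medium"),
        "evidence": {
            k: v for k, v in payload.items()
            if k not in ("hostname", "incident_type", "priority")
        },
    }
```

The result still has to be serialised once, signed, and sent as raw bytes, exactly as in the adapter above.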
Curl one-liner for arbitrary sources:
curl -X POST https://app.openremedy.io/api/v1/webhooks/alerts/my-company \
  -H "Content-Type: application/json" \
  -H "X-OpenRemedy-Signature: sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')" \
  -d "$BODY"
Evidence best-practices¶
The richer the evidence, the better the LLM classifier matches an
incident to a recipe.
| Type | Useful evidence fields |
|---|---|
| disk_full | disk_usage_percent, mount, largest_files |
| cpu_high | cpu_percent, load_1m, cores, top_process |
| memory_high | memory_percent, total_mb, used_mb, top_process |
| service_down | service_name, service_active, error, exit_code |
| port_unavailable | port, port_open, expected_service |
| custom | Any relevant JSON — the LLM analyses freely |
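Putting the table into practice, a disk_full alert carrying the suggested fields gives the classifier far more to match on than a bare percentage (values here are illustrative):

```json
{
  "hostname": "web-01",
  "incident_type": "disk_full",
  "severity": "high",
  "evidence": {
    "disk_usage_percent": 95,
    "mount": "/",
    "largest_files": ["/var/log/app.log (4.2G)", "/tmp/dump.sql (1.1G)"]
  }
}
```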
Daemon — host-side agent¶
The Go daemon (openremedy-client) lives on each managed Linux
host. It registers with the platform, runs operator-defined
monitors on a schedule, and posts heartbeats and evidence back
through /daemon/v1/*. It is the only "always-on" path into the
platform that doesn't require an external monitoring tool.
For the operator workflow — install URL, registration token, config schema, log paths — see Daemon → Install.
Custom monitors use HMAC-signed commands and the
agent_version >= 0.2.0 gate; the cryptographic detail is in
Security → Daemon authentication.
Outbound notifications¶
Outbound notifications use the HookDispatcher
(plugins/hooks.py). Pipeline events fire into the dispatcher,
which loads the tenant's enabled plugin configs, evaluates each
configured rule against the event, and runs matching plugins as
fire-and-forget asyncio tasks with a 10-second timeout. Failures
are logged into hook_events for the UI but never propagate back
into the agent pipeline.
The legacy notify() shim (services/notifications.py) still exists for backwards compatibility. It maps old event names to the canonical hook namespace and dispatches through the same HookDispatcher. New code should call the dispatcher directly; nothing inside the platform writes new notify() call sites.
Event surface¶
| Event | When |
|---|---|
| incident.created | A new incident was opened (any source). |
| incident.resolved | Incident moved to resolved. |
| incident.escalated | Incident moved to escalated (operator action required). |
| incident.cancelled | Incident closed without remediation. |
| incident.comment_added | A user (or agent) posted a comment. |
| recipe.proposed | The agent proposed a remediation. |
| approval.required | Trust × risk gate paused execution; a human must approve. |
| approval.resolved | An operator approved or rejected the proposal. |
| execution.completed | The recipe finished (success or failure). |
| stage.completed | A pipeline stage (triage / diagnose / validate / execute / review) finished. |
| sla.breached | An SLA timer crossed its threshold. |
| maintenance.scheduled / .approved / .started / .completed / .failed / .paused / .resumed | Maintenance plan lifecycle. |
| maintenance.step.awaiting_approval | A manual step is waiting on an operator. |
| agent.notification | Free-form push from an agent tool. |
Each plugin declares which subset of these it subscribes to via its
config_schema()['rules']['events'] block.
Built-in plugins¶
Six plugins ship with the platform; tenants enable and configure them per-tenant from Settings → Plugins. Configuration is encrypted at rest using the same AES-256-GCM key that protects SSH credentials.
| Plugin | What it does | Required config |
|---|---|---|
| discord | Posts an embed to a Discord channel webhook. | webhook_url, optional dashboard_url, optional default mention. |
| slack | Posts a Block Kit message via incoming-webhook URL. | webhook_url, optional dashboard_url. |
| teams | Posts an Adaptive Card to a Microsoft Teams channel. | webhook_url, optional dashboard_url. |
| servicenow | Creates / updates records in the incident (or another) table via the Table API. | instance_url, username, password, optional table (default incident). |
| jira | Creates an issue in a project, attaches subsequent updates as comments. | instance_url, email, api_token, project_key, optional default_issue_type. |
| webhook | Posts the raw HookPayload JSON to an arbitrary URL. Useful when none of the above fit. | url, optional HMAC secret (mirrors the inbound signing scheme). |
Rules — fan-out and filtering¶
A plugin config carries a list of rules. Each rule has:
{
  "event": "incident.created",
  "conditions": [
    {"field": "severity", "operator": "in", "value": ["critical", "high"]},
    {"field": "incident_type", "operator": "eq", "value": "disk_full"}
  ],
  "action": "post_message",
  "params": {"channel_override": "#incidents-disk"}
}
When the dispatcher receives an event, it iterates every active
rule. A rule matches if its event is the firing event and all
its conditions evaluate true. Operators are eq, in, and
contains; unknown operators fail closed (the rule does not
fire). Resolvable fields include severity, incident_type,
source, status, stage, tenant_id, and any
extra.<key> path.
A plugin without any rules fires on every event it subscribes to — that's the "send everything to this Slack channel" mode. Tenants that want fan-out (separate Slack channels for separate severities) add multiple rules; the same plugin instance runs once per matching rule.
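The matching semantics above can be sketched in a few lines. This is a toy model of the documented behaviour, not the platform's actual dispatcher; in particular it resolves fields with a flat lookup and omits the extra.<key> dotted-path handling.

```python
# Toy model of rule matching: a rule fires when its event matches and
# every condition evaluates true; unknown operators fail closed.

OPERATORS = {
    "eq": lambda field, value: field == value,
    "in": lambda field, value: field in value,
    "contains": lambda field, value: value in field,
}

def rule_matches(rule: dict, event: str, ctx: dict) -> bool:
    """ctx holds resolvable fields: severity, incident_type, source, ..."""
    if rule.get("event") != event:
        return False
    for cond in rule.get("conditions", []):
        op = OPERATORS.get(cond["operator"])
        if op is None:
            return False  # unknown operator: the rule does not fire
        field = ctx.get(cond["field"])
        if field is None or not op(field, cond["value"]):
            return False
    return True
```

Run against the example rule above, an incident.created event with severity "high" and incident_type "disk_full" matches; any other event, a non-matching severity, or an unrecognised operator does not.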
Failure model¶
Hooks are fire-and-forget. The dispatcher records every attempt in
hook_events with status ∈ {success, failed, timeout}. A
failed plugin does not retry automatically; the operator can
re-fire from the UI. The 10-second per-call timeout is fixed at
the dispatcher and not configurable per plugin — long-running
ServiceNow / Jira POSTs that exceed it are logged as timeout and
abandoned.
Programmatic API¶
Every page in the dashboard is backed by a documented REST
endpoint at /api/v1/*. Programmatic clients authenticate with a
JWT bearer token (CLI, daemon, sidecar adapters) or with the
HttpOnly cookie the browser receives on /auth/login (SPA,
SSR).
Get a token:
curl -X POST https://app.openremedy.io/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "ops@example.com", "password": "..."}'
# response includes an access_token field
Refresh tokens have a 30-day default lifetime;
/api/v1/auth/refresh returns a fresh pair. Expiry is configurable
via OREMEDY_ACCESS_TOKEN_EXPIRE_MINUTES and
OREMEDY_REFRESH_TOKEN_EXPIRE_DAYS — see
Configuration → Auth tuning.
Most-used endpoints¶
| Endpoint | Purpose |
|---|---|
| GET /api/v1/incidents | List incidents (tenant-scoped, paginated). |
| GET /api/v1/incidents/{id} | Full incident detail including timeline. |
| POST /api/v1/incidents/{id}/comments | Add a comment (re-invokes the IncidentWatcher). |
| POST /api/v1/incidents/{id}/cancel | Move an incident to cancelled. |
| GET /api/v1/servers | Server inventory. |
| POST /api/v1/servers | Register a server (operator path; the daemon uses /daemon/v1/register instead). |
| GET /api/v1/recipes | Recipe catalogue. |
| GET /api/v1/executions/{id} | Execution detail. |
| POST /api/v1/executions/{id}/approve / /reject | Approval gate decisions. |
| GET /api/v1/maintenance/schedules | Maintenance schedule list. |
| POST /api/v1/maintenance/schedules | Create a maintenance schedule. |
| GET /api/v1/audit/logs | Audit log query (tenant-scoped). |
| GET /api/v1/admin/dashboard | Cross-tenant fleet stats (superadmin only). |
The full machine-readable list is at
https://app.openremedy.io/openapi.json. The dashboard ships an
embedded Swagger / Stoplight viewer at
https://app.openremedy.io/docs (auth required).
Tenant scoping¶
Every endpoint that touches tenant-owned data resolves
current_user.tenant_id from the JWT and filters on it.
Cross-tenant lookups respond with 404 (not 403) so a malicious
caller can't probe for the existence of a foreign UUID. Superadmin
is the only role that bypasses this; impersonation flips a
superadmin's session into a target tenant and is fully audited.
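The 404-not-403 rule reduces to one comparison. Here is a toy model of it (the function name and dict-backed store are illustrative, not the platform's query layer): an incident owned by another tenant is indistinguishable from one that does not exist.

```python
# Toy model of tenant scoping: existence and ownership collapse into a
# single "found or not" answer, so a 403 never leaks that the UUID exists.
from typing import Optional

def scoped_get(incidents: dict, incident_id: str, tenant_id: str) -> Optional[dict]:
    incident = incidents.get(incident_id)
    if incident is None or incident["tenant_id"] != tenant_id:
        return None  # caller turns this into 404, never 403
    return incident
```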
Marketplace bundles¶
The marketplace (Settings → Marketplace) packages a set of recipes, alert sources, and plugin presets as a single installable bundle. Installing the "NGINX core ops" bundle, for example, adds the relevant Prometheus alert rules, the matching recipes for restart / reload / stale-config recovery, and a Slack plugin preset wired to fire on the related incident events. Bundles are the fastest way to wire up a new tenant; everything they install is also reachable through the manual surfaces above and can be edited freely after install.
For the bundle catalogue and authoring guide see Dashboard → Marketplace.
Real-time WebSockets¶
Two WebSocket channels stream live state out to the browser (and to any other client willing to speak the protocol).
/ws/incidents¶
A tenant-scoped firehose of incident lifecycle events. Every
backend that mutates incident state — swarm/events.py,
worker/notify.py, the proactive loops — publishes through Redis
pub/sub on the incidents channel. The WebSocket handler reads
the connection's JWT-bound tenant_id and drops every message
that doesn't match it. Superadmin connections see all tenants.
/ws/executions/{execution_id}¶
Live output from a running execution — Ansible stdout, recipe progress, exit code. Before subscribing, the handler verifies that the connection's tenant owns the execution UUID; cross-tenant attempts close with policy-violation status.
Authentication¶
Both endpoints accept either:
- the access_token cookie (browser default on a same-origin upgrade), or
- the Sec-WebSocket-Protocol: bearer, <jwt> slot for non-browser clients.
URL query params are not supported because they leak into proxy access logs. Pre-handshake auth failures close the WS with policy-violation status — the cryptographic detail is in Security → WebSocket handshake.
Browser — minimal example¶
// The browser sends the access_token cookie automatically.
const ws = new WebSocket("wss://app.openremedy.io/ws/incidents");

ws.onmessage = (e) => {
  const event = JSON.parse(e.data);
  console.log(event.type, event.incident_id);
};
Non-browser — Python¶
import websockets

async with websockets.connect(
    "wss://app.openremedy.io/ws/incidents",
    subprotocols=["bearer", JWT_TOKEN],
) as ws:
    async for msg in ws:
        ...
See also¶
- Daemon → Install — host-side agent setup.
- Recipes — how the receiving end uses the alerts you forward.
- Configuration — environment variables that govern auth, CORS, rate limits.
- Security — full cryptographic posture of every surface this page touches.