Integrations

OpenRemedy is a hub: alerts come in, notifications go out, and an HTTP/JSON API plus a real-time WebSocket layer let other systems read and act on what's happening. This page is the operator's map of those surfaces.

There are five integration directions:

  • Inbound — webhooks from monitoring tools (Prometheus, Grafana, Datadog, PagerDuty, custom).
  • Daemon — host-side agent that reports server state and runs scheduled monitors. Summarised here; the full reference is in Daemon → Install.
  • Outbound notifications — Slack, Teams, Discord, ServiceNow, Jira, generic webhook. Routed by the HookDispatcher with per-tenant rules.
  • Programmatic API — every dashboard surface backed by a documented REST endpoint, JWT or cookie auth.
  • Real-time — two WebSocket channels for incident-feed subscribers and live execution output.

The marketplace ships preset bundles (alert sources + recipes + plugin configs) that cover most of these in one click. The pages below are the full manual for when a bundle doesn't quite fit.


Inbound webhooks

All inbound alerts hit the same endpoint:

POST https://<your-domain>/api/v1/webhooks/alerts/<tenant-slug>
Content-Type: application/json
X-OpenRemedy-Signature: sha256=<hex digest>

{
  "hostname": "web-01",
  "incident_type": "disk_full",
  "severity": "high",
  "evidence": {
    "disk_usage_percent": 95,
    "mount": "/"
  }
}

Supported incident types: service_down, disk_full, cpu_high, memory_high, port_unavailable, custom. Supported severities: critical, high, medium, low, info. Evidence: any JSON object with relevant data — the classifier uses it to match a recipe.
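This page does not say how the endpoint treats values outside these lists, so a sender may prefer to catch them locally before signing. The following is a minimal sketch rather than part of OpenRemedy; the function name build_alert is hypothetical, and the allowed values are copied from the lists above.

```python
import json
from typing import Optional

# Values copied from the lists above.
INCIDENT_TYPES = {"service_down", "disk_full", "cpu_high", "memory_high", "port_unavailable", "custom"}
SEVERITIES = {"critical", "high", "medium", "low", "info"}


def build_alert(hostname: str, incident_type: str, severity: str,
                evidence: Optional[dict] = None) -> bytes:
    """Return the JSON body as bytes, ready to be signed and sent unchanged."""
    if incident_type not in INCIDENT_TYPES:
        raise ValueError(f"unsupported incident_type: {incident_type!r}")
    if severity not in SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    payload = {
        "hostname": hostname,
        "incident_type": incident_type,
        "severity": severity,
        "evidence": evidence or {},
    }
    # Serialise once; the signature must cover these exact bytes.
    return json.dumps(payload).encode("utf-8")
```

Returning bytes rather than a dict makes it harder to re-serialise the payload by accident between signing and sending.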

Authentication: HMAC signatures

Every request must carry an HMAC-SHA256 signature of the raw body, keyed by the tenant's webhook_secret. Unsigned or wrongly-signed requests get 401 Unauthorized. The endpoint is rate-limited at 60 requests/min per source IP. The cryptographic detail lives in Security → Webhook authentication.

The signing recipe in three steps:

  • Fetch the tenant's webhook_secret from the dashboard (Settings → Webhooks).
  • Compute the HMAC-SHA256 hex digest of the raw body, byte-exact, with no re-serialisation.
  • Send it in the X-OpenRemedy-Signature header as sha256=<hex>.

bash + openssl + curl

SECRET="your-tenant-webhook-secret"
BODY='{"hostname":"web-01","incident_type":"disk_full","severity":"high","evidence":{}}'
SIG=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')

curl -X POST https://app.openremedy.io/api/v1/webhooks/alerts/my-company \
  -H "Content-Type: application/json" \
  -H "X-OpenRemedy-Signature: sha256=$SIG" \
  -d "$BODY"

Python

import hashlib, hmac, json, requests

secret = "your-tenant-webhook-secret"
body = json.dumps({"hostname": "web-01", "incident_type": "disk_full", "severity": "high", "evidence": {}})
sig = hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()

requests.post(
    "https://app.openremedy.io/api/v1/webhooks/alerts/my-company",
    data=body,  # NOT json= (would re-serialise and break the signature)
    headers={
        "Content-Type": "application/json",
        "X-OpenRemedy-Signature": f"sha256={sig}",
    },
)

Node.js

import crypto from "crypto";

const secret = "your-tenant-webhook-secret";
const body = JSON.stringify({ hostname: "web-01", incident_type: "disk_full", severity: "high", evidence: {} });
const sig = crypto.createHmac("sha256", secret).update(body).digest("hex");

await fetch("https://app.openremedy.io/api/v1/webhooks/alerts/my-company", {
  method: "POST",
  body,
  headers: {
    "Content-Type": "application/json",
    "X-OpenRemedy-Signature": `sha256=${sig}`,
  },
});

For senders that can't sign on the wire (Grafana's basic webhook, PagerDuty's generic format), use a sidecar adapter that takes the upstream payload, signs it, and forwards. The Alertmanager example below shows the pattern.

Prometheus + Alertmanager

Step 1 — alert rules

/etc/prometheus/rules/openremedy.yml:

groups:
  - name: openremedy
    rules:
      - alert: DiskFull
        expr: (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 90
        for: 5m
        labels:
          severity: high
          incident_type: disk_full
        annotations:
          hostname: "{{ $labels.instance }}"
          mount: "{{ $labels.mountpoint }}"
          usage_percent: "{{ $value }}"

      - alert: HighCPU
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: high
          incident_type: cpu_high
        annotations:
          hostname: "{{ $labels.instance }}"
          cpu_percent: "{{ $value }}"

      - alert: HighMemory
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: high
          incident_type: memory_high
        annotations:
          hostname: "{{ $labels.instance }}"
          memory_percent: "{{ $value }}"

      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
          incident_type: service_down
        annotations:
          hostname: "{{ $labels.instance }}"

      - alert: PortDown
        expr: probe_success == 0
        for: 2m
        labels:
          severity: high
          incident_type: port_unavailable
        annotations:
          hostname: "{{ $labels.instance }}"

Step 2 — Alertmanager receiver

/etc/alertmanager/alertmanager.yml:

route:
  receiver: openremedy
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: openremedy
    webhook_configs:
      - url: 'http://localhost:9095/alertmanager'
        send_resolved: true

Alertmanager points at the adapter, not at OpenRemedy directly, because Alertmanager cannot sign requests. The adapter signs and forwards.

Step 3 — adapter

#!/usr/bin/env python3
"""Alertmanager → OpenRemedy webhook adapter. Run as a sidecar.

Reads the tenant's HMAC secret from the env so it can sign every
forwarded request.
"""
import hashlib, hmac, json, os
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

OPENREMEDY_URL = "https://app.openremedy.io/api/v1/webhooks/alerts/my-company"
WEBHOOK_SECRET = os.environ["OPENREMEDY_WEBHOOK_SECRET"]


def _signed_post(url: str, payload: dict) -> requests.Response:
    body = json.dumps(payload).encode("utf-8")
    sig = hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256).hexdigest()
    return requests.post(
        url,
        data=body,  # raw bytes — re-serialising would break the signature
        headers={
            "Content-Type": "application/json",
            "X-OpenRemedy-Signature": f"sha256={sig}",
        },
        timeout=10,
    )


@app.route("/alertmanager", methods=["POST"])
def handle():
    data = request.get_json(silent=True) or {}  # tolerate an empty or non-JSON body
    for alert in data.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        payload = {
            "hostname": annotations.get("hostname", labels.get("instance", "unknown")).split(":")[0],
            "incident_type": labels.get("incident_type", "custom"),
            "severity": labels.get("severity", "medium"),
            "evidence": {
                "alertname": labels.get("alertname", ""),
                "status": alert.get("status", ""),
                **{k: v for k, v in annotations.items() if k != "hostname"},
            },
        }
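        try:
            # raise_for_status surfaces 401 / 429 responses instead of ignoring them
            _signed_post(OPENREMEDY_URL, payload).raise_for_status()
        except requests.RequestException as e:
            print(f"Failed: {e}")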
    return jsonify({"status": "ok"})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9095)

Grafana

Grafana's webhook contact point can hit OpenRemedy via the same adapter pattern, or — if you can write a custom notification template — you can sign the body inside Grafana's templating layer. The simpler path is the Alertmanager-style adapter on localhost:9095.

Map Grafana labels to the OpenRemedy fields the classifier expects:

Field | Source
hostname | instance label of the alert
incident_type | a dedicated incident_type label on the rule
severity | a severity label on the rule
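The mapping above can be sketched as a small translation function for use inside the adapter. It assumes the Grafana webhook contact point delivers an Alertmanager-style body whose alerts entries carry labels, annotations, and status; the function name grafana_to_openremedy and the fallback defaults are illustrative choices that mirror the Alertmanager adapter on this page.

```python
def grafana_to_openremedy(alert: dict) -> dict:
    """Translate one entry of the webhook's "alerts" list (assumed shape)."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    return {
        # Strip a trailing ":port" so "web-01:9100" becomes "web-01".
        "hostname": labels.get("instance", "unknown").split(":")[0],
        "incident_type": labels.get("incident_type", "custom"),
        "severity": labels.get("severity", "medium"),
        "evidence": {
            "alertname": labels.get("alertname", ""),
            "status": alert.get("status", ""),
            **annotations,
        },
    }
```

The result still has to be serialised once and signed before it is forwarded.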

Datadog, PagerDuty, custom HTTP

Same pattern: a small adapter (Flask, FastAPI, Cloudflare Worker — any HTTP endpoint will do) that translates the upstream payload to the {hostname, incident_type, severity, evidence} shape and signs it. Translation tables for Datadog priorities (P1→critical, P2→high…) and PagerDuty urgencies live in the marketplace bundles.

Curl one-liner for arbitrary sources:

curl -X POST https://app.openremedy.io/api/v1/webhooks/alerts/my-company \
  -H "Content-Type: application/json" \
  -H "X-OpenRemedy-Signature: sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')" \
  -d "$BODY"

Evidence best practices

The richer the evidence, the better the LLM classifier matches an incident to a recipe.

Type | Useful evidence fields
disk_full | disk_usage_percent, mount, largest_files
cpu_high | cpu_percent, load_1m, cores, top_process
memory_high | memory_percent, total_mb, used_mb, top_process
service_down | service_name, service_active, error, exit_code
port_unavailable | port, port_open, expected_service
custom | Any relevant JSON; the LLM analyses it freely

Daemon — host-side agent

The Go daemon (openremedy-client) lives on each managed Linux host. It registers with the platform, runs operator-defined monitors on a schedule, and posts heartbeats and evidence back through /daemon/v1/*. It is the only "always-on" path into the platform that doesn't require an external monitoring tool.

For the operator workflow (install URL, registration token, config schema, log paths), see Daemon → Install.

Custom monitors use HMAC-signed commands and the agent_version >= 0.2.0 gate; the cryptographic detail is in Security → Daemon authentication.


Outbound notifications

Outbound notifications use the HookDispatcher (plugins/hooks.py). Pipeline events fire into the dispatcher, which loads the tenant's enabled plugin configs, evaluates each configured rule against the event, and runs matching plugins as fire-and-forget asyncio tasks with a 10-second timeout. Failures are logged into hook_events for the UI but never propagate back into the agent pipeline.

The legacy notify() shim (services/notifications.py) still exists for backwards compatibility. It maps old event names to the canonical hook namespace and dispatches through the same HookDispatcher. New code should call the dispatcher directly; nothing inside the platform writes new notify() call sites.

Event surface

Event | When
incident.created | A new incident was opened (any source).
incident.resolved | Incident moved to resolved.
incident.escalated | Incident moved to escalated (operator action required).
incident.cancelled | Incident closed without remediation.
incident.comment_added | A user (or agent) posted a comment.
recipe.proposed | The agent proposed a remediation.
approval.required | Trust × risk gate paused execution; a human must approve.
approval.resolved | An operator approved or rejected the proposal.
execution.completed | The recipe finished (success or failure).
stage.completed | A pipeline stage (triage / diagnose / validate / execute / review) finished.
sla.breached | An SLA timer crossed its threshold.
maintenance.scheduled / .approved / .started / .completed / .failed / .paused / .resumed | Maintenance plan lifecycle.
maintenance.step.awaiting_approval | A manual step is waiting on an operator.
agent.notification | Free-form push from an agent tool.

Each plugin declares which subset of these it subscribes to via its config_schema()['rules']['events'] block.

Built-in plugins

Six plugins ship with the platform; each tenant enables and configures its own from Settings → Plugins. Configuration is encrypted at rest using the same AES-256-GCM key that protects SSH credentials.

Plugin | What it does | Required config
discord | Posts an embed to a Discord channel webhook. | webhook_url, optional dashboard_url, optional default mention.
slack | Posts a Block Kit message via incoming-webhook URL. | webhook_url, optional dashboard_url.
teams | Posts an Adaptive Card to a Microsoft Teams channel. | webhook_url, optional dashboard_url.
servicenow | Creates / updates incident (or another) table records via the Table API. | instance_url, username, password, optional table (default incident).
jira | Creates an issue in a project, attaches subsequent updates as comments. | instance_url, email, api_token, project_key, optional default_issue_type.
webhook | Posts the raw HookPayload JSON to an arbitrary URL. Useful when none of the above fit. | url, optional HMAC secret (mirrors the inbound signing scheme).
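A receiver of the generic webhook plugin can check the signature before trusting the payload. The sketch below assumes the outbound request carries the same X-OpenRemedy-Signature: sha256=<hex> header as the inbound scheme. This page says the outbound secret mirrors inbound signing but does not name the header, so confirm it against a real delivery. The function name verify_signature is hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Return True if header_value is a valid sha256=<hex> HMAC of raw_body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode(), received.encode())
```

As on the sending side, the check has to run over the raw request bytes, not over a parsed and re-serialised copy.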

Rules — fan-out and filtering

A plugin config carries a list of rules. Each rule has this shape:

{
  "event": "incident.created",
  "conditions": [
    {"field": "severity", "operator": "in", "value": ["critical", "high"]},
    {"field": "incident_type", "operator": "eq", "value": "disk_full"}
  ],
  "action": "post_message",
  "params": {"channel_override": "#incidents-disk"}
}

When the dispatcher receives an event, it iterates every active rule. A rule matches if its event is the firing event and all its conditions evaluate true. Operators are eq, in, and contains; unknown operators fail closed (the rule does not fire). Resolvable fields include severity, incident_type, source, status, stage, tenant_id, and any extra.<key> path.

A plugin without any rules fires on every event it subscribes to — that's the "send everything to this Slack channel" mode. Tenants that want fan-out (separate Slack channels for separate severities) add multiple rules; the same plugin instance runs once per matching rule.
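The matching semantics described above can be sketched as follows. This illustrates the documented behaviour and is not the HookDispatcher's actual code: the function names, the flat event dict with an extra sub-dict, and the reading of contains as "the rule value is contained in the field" are all assumptions.

```python
from typing import Any

_MISSING = object()  # sentinel for fields the event does not carry


def resolve_field(event: dict, field: str) -> Any:
    """Look up a top-level field, or an extra.<key> path inside event["extra"]."""
    if field.startswith("extra."):
        return event.get("extra", {}).get(field[len("extra."):], _MISSING)
    return event.get(field, _MISSING)


def condition_holds(event: dict, cond: dict) -> bool:
    actual = resolve_field(event, cond["field"])
    if actual is _MISSING:
        return False
    op, expected = cond.get("operator"), cond.get("value")
    try:
        if op == "eq":
            return actual == expected
        if op == "in":
            return actual in expected
        if op == "contains":
            return expected in actual
    except TypeError:
        return False  # mismatched types: treat as no match
    return False  # unknown operator: fail closed


def rule_matches(rule: dict, event_name: str, event: dict) -> bool:
    """A rule fires only if the event name matches and every condition holds."""
    if rule.get("event") != event_name:
        return False
    return all(condition_holds(event, c) for c in rule.get("conditions", []))
```

With the example rule above, an incident.created event carrying severity "high" and incident_type "disk_full" matches; changing either value, or the event name, makes it fall through.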

Failure model

Hooks are fire-and-forget. The dispatcher records every attempt in hook_events with a status of success, failed, or timeout. A failed plugin does not retry automatically; the operator can re-fire from the UI. The 10-second per-call timeout is fixed at the dispatcher and not configurable per plugin, so long-running ServiceNow / Jira POSTs that exceed it are logged as timeout and abandoned.
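The failure model can be illustrated with plain asyncio. This is a sketch of the pattern, not the dispatcher's source: run_plugin, dispatch, the in-memory hook_events list, and the plugin callables are hypothetical stand-ins, and the timeout is a parameter here only so the example can be exercised quickly.

```python
import asyncio

hook_events: list = []  # stand-in for the hook_events table


async def run_plugin(name: str, plugin, payload: dict, timeout: float = 10.0) -> None:
    """Run one plugin call, record the outcome, and never raise to the caller."""
    try:
        await asyncio.wait_for(plugin(payload), timeout=timeout)
        status = "success"
    except asyncio.TimeoutError:
        status = "timeout"  # the call is cancelled and abandoned
    except Exception:
        status = "failed"   # logged for the UI, not retried
    hook_events.append({"plugin": name, "status": status})


def dispatch(name: str, plugin, payload: dict, timeout: float = 10.0) -> asyncio.Task:
    # Fire-and-forget: schedule the task and return without awaiting it.
    return asyncio.create_task(run_plugin(name, plugin, payload, timeout))
```

dispatch must be called from inside a running event loop, and the caller should keep a reference to the returned task so it is not garbage-collected before it finishes.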


Programmatic API

Every page in the dashboard is backed by a documented REST endpoint at /api/v1/*. Programmatic clients authenticate with a JWT bearer token (CLI, daemon, sidecar adapters) or with the HttpOnly cookie the browser receives on /auth/login (SPA, SSR).

Get a token:

curl -X POST https://app.openremedy.io/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "ops@example.com", "password": "..."}'
# response includes an access_token field

Refresh tokens have a 30-day default lifetime; /api/v1/auth/refresh returns a fresh pair. Expiry is configurable via OREMEDY_ACCESS_TOKEN_EXPIRE_MINUTES and OREMEDY_REFRESH_TOKEN_EXPIRE_DAYS — see Configuration → Auth tuning.
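A programmatic client typically logs in once and then sends the token on every call. The sketch below uses only the standard library. It assumes the token is sent as a standard Authorization: Bearer header and that responses are JSON; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "https://app.openremedy.io/api/v1"


def login_request(email: str, password: str) -> urllib.request.Request:
    """Build the POST /auth/login request shown above."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def authed_request(path: str, access_token: str) -> urllib.request.Request:
    """Build a GET request carrying the JWT as a bearer token (assumed header form)."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def fetch_json(req: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

A typical flow would be token = fetch_json(login_request(email, password))["access_token"], followed by fetch_json(authed_request("/incidents", token)).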

Most-used endpoints

Endpoint | Purpose
GET /api/v1/incidents | List incidents (tenant-scoped, paginated).
GET /api/v1/incidents/{id} | Full incident detail including timeline.
POST /api/v1/incidents/{id}/comments | Add a comment (re-invokes the IncidentWatcher).
POST /api/v1/incidents/{id}/cancel | Move an incident to cancelled.
GET /api/v1/servers | Server inventory.
POST /api/v1/servers | Register a server (operator path; the daemon uses /daemon/v1/register instead).
GET /api/v1/recipes | Recipe catalogue.
GET /api/v1/executions/{id} | Execution detail.
POST /api/v1/executions/{id}/approve / /reject | Approval gate decisions.
GET /api/v1/maintenance/schedules | Maintenance schedule list.
POST /api/v1/maintenance/schedules | Create a maintenance schedule.
GET /api/v1/audit/logs | Audit log query (tenant-scoped).
GET /api/v1/admin/dashboard | Cross-tenant fleet stats (superadmin only).

The full machine-readable list is at https://app.openremedy.io/openapi.json. The dashboard ships an embedded Swagger / Stoplight viewer at https://app.openremedy.io/docs (auth required).

Tenant scoping

Every endpoint that touches tenant-owned data resolves current_user.tenant_id from the JWT and filters on it. Cross-tenant lookups respond with 404 (not 403) so a malicious caller can't probe for the existence of a foreign UUID. Superadmin is the only role that bypasses this; impersonation flips a superadmin's session into a target tenant and is fully audited.
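The 404-instead-of-403 behaviour can be sketched in a framework-neutral way. Everything here is illustrative: the in-memory store, the NotFound exception, and the function name are hypothetical, and the platform's real handlers are not shown on this page.

```python
class NotFound(Exception):
    """Would be rendered as HTTP 404 by the web layer."""


def get_incident(store: dict, incident_id: str, tenant_id: str,
                 is_superadmin: bool = False) -> dict:
    incident = store.get(incident_id)
    # A missing row and a row owned by another tenant look identical to the
    # caller, so a foreign UUID cannot be probed for existence.
    if incident is None or (incident["tenant_id"] != tenant_id and not is_superadmin):
        raise NotFound(incident_id)
    return incident
```

Raising the same exception on both branches is the point of the design: a separate "forbidden" path would reveal which IDs exist.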


Marketplace bundles

The marketplace (Settings → Marketplace) packages a set of recipes, alert sources, and plugin presets as a single installable bundle. Installing the "NGINX core ops" bundle, for example, adds the relevant Prometheus alert rules, the matching recipes for restart / reload / stale-config recovery, and a Slack plugin preset wired to fire on the related incident events. Bundles are the fastest way to wire up a new tenant; everything they install is also reachable through the manual surfaces above and can be edited freely after install.

For the bundle catalogue and authoring guide see Dashboard → Marketplace.


Real-time WebSockets

Two WebSocket channels stream live state out to the browser (and to any other client willing to speak the protocol).

/ws/incidents

A tenant-scoped firehose of incident lifecycle events. Every backend that mutates incident state — swarm/events.py, worker/notify.py, the proactive loops — publishes through Redis pub/sub on the incidents channel. The WebSocket handler reads the connection's JWT-bound tenant_id and drops every message that doesn't match it. Superadmin connections see all tenants.

/ws/executions/{execution_id}

Live output from a running execution — Ansible stdout, recipe progress, exit code. Before subscribing, the handler verifies that the connection's tenant owns the execution UUID; cross-tenant attempts close with policy-violation status.

Authentication

Both endpoints accept either:

  • the access_token cookie (browser default on a same-origin upgrade), or
  • the Sec-WebSocket-Protocol: bearer, <jwt> slot for non-browser clients.

URL query params are not supported because they leak into proxy access logs. Pre-handshake auth failures close the WS with policy-violation status — the cryptographic detail is in Security → WebSocket handshake.
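For anyone writing a proxy or test harness in front of these endpoints, the bearer slot can be parsed as below. This is a sketch under the assumption that the header value is exactly the two comma-separated entries shown above (the literal bearer followed by the JWT); the function name is hypothetical.

```python
from typing import Optional


def token_from_subprotocols(header_value: str) -> Optional[str]:
    """Extract the JWT from a 'bearer, <jwt>' Sec-WebSocket-Protocol value."""
    parts = [p.strip() for p in header_value.split(",")]
    if len(parts) == 2 and parts[0] == "bearer" and parts[1]:
        return parts[1]
    return None  # anything else is treated as unauthenticated
```

The token returned here still has to be validated as a JWT; the function only separates it from the marker entry.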

Browser — minimal example

// The browser sends the access_token cookie automatically.
const ws = new WebSocket("wss://app.openremedy.io/ws/incidents");
ws.onmessage = (e) => {
  const event = JSON.parse(e.data);
  console.log(event.type, event.incident_id);
};

Non-browser — Python

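import asyncio, os
import websockets

JWT_TOKEN = os.environ["OPENREMEDY_JWT"]  # assumed env var holding the access token

async def main():
    async with websockets.connect(
        "wss://app.openremedy.io/ws/incidents",
        subprotocols=["bearer", JWT_TOKEN],
    ) as ws:
        async for msg in ws:
            print(msg)  # one incident lifecycle event per message

asyncio.run(main())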

See also

  • Daemon → Install — host-side agent setup.
  • Recipes — how the receiving end uses the alerts you forward.
  • Configuration — environment variables that govern auth, CORS, rate limits.
  • Security — full cryptographic posture of every surface this page touches.