Integrations¶
OpenRemedy is a hub: alerts come in, notifications go out, and an HTTP/JSON API plus a real-time WebSocket layer let other systems read and act on what's happening. This page is the operator's map of those surfaces.
There are five integration directions:
- Inbound — webhooks from monitoring tools (Prometheus, Grafana, Datadog, PagerDuty, custom).
- Daemon — host-side agent that reports server state and runs scheduled monitors. Briefed here, full reference in Daemon → Install.
- Outbound notifications — Slack, Teams, Discord, ServiceNow, Jira, generic webhook. Routed by the HookDispatcher with per-tenant rules.
- Programmatic API — every dashboard surface backed by a documented REST endpoint, JWT or cookie auth.
- Real-time — two WebSocket channels for incident-feed subscribers and live execution output.
The marketplace ships preset bundles (alert sources + recipes + plugin configs) that cover most of these in one click. The pages below are the full manual for when a bundle doesn't quite fit.
Inbound webhooks¶
All inbound alerts hit the same endpoint:
POST https://<your-domain>/api/v1/webhooks/alerts/<tenant-slug>
Content-Type: application/json
X-OpenRemedy-Signature: sha256=<hex digest>

{
  "hostname": "web-01",
  "incident_type": "disk_full",
  "severity": "high",
  "evidence": {
    "disk_usage_percent": 95,
    "mount": "/"
  }
}
Supported incident types: service_down, disk_full,
cpu_high, memory_high, port_unavailable, custom.
Supported severities: critical, high, medium, low,
info. Evidence: any JSON object with relevant data — the
classifier uses it to match a recipe.
Authentication: HMAC signatures¶
Every request must carry an HMAC-SHA256 signature of the raw body,
keyed by the tenant's webhook_secret. Unsigned or wrongly-signed
requests get 401 Unauthorized. The endpoint is rate-limited at
60 requests/min per source IP. The cryptographic detail lives
in Security → Webhook authentication.
The signing recipe in three steps: fetch the tenant's
webhook_secret from the dashboard (Settings → Webhooks);
compute the HMAC-SHA256 hex digest of the raw body, byte-exact, with no
re-serialisation; send it as sha256=<hex> in the X-OpenRemedy-Signature header.
bash + openssl + curl¶
SECRET="your-tenant-webhook-secret"
BODY='{"hostname":"web-01","incident_type":"disk_full","severity":"high","evidence":{}}'
SIG=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')
curl -X POST https://app.openremedy.io/api/v1/webhooks/alerts/my-company \
  -H "Content-Type: application/json" \
  -H "X-OpenRemedy-Signature: sha256=$SIG" \
  -d "$BODY"
Python¶
import hashlib, hmac, json, requests

secret = "your-tenant-webhook-secret"
body = json.dumps({"hostname": "web-01", "incident_type": "disk_full", "severity": "high", "evidence": {}})
sig = hmac.new(secret.encode(), body.encode(), hashlib.sha256).hexdigest()

requests.post(
    "https://app.openremedy.io/api/v1/webhooks/alerts/my-company",
    data=body,  # NOT json= (would re-serialise and break the signature)
    headers={
        "Content-Type": "application/json",
        "X-OpenRemedy-Signature": f"sha256={sig}",
    },
)
Node.js¶
import crypto from "crypto";

const secret = "your-tenant-webhook-secret";
const body = JSON.stringify({ hostname: "web-01", incident_type: "disk_full" });
const sig = crypto.createHmac("sha256", secret).update(body).digest("hex");

await fetch("https://app.openremedy.io/api/v1/webhooks/alerts/my-company", {
  method: "POST",
  body,
  headers: {
    "Content-Type": "application/json",
    "X-OpenRemedy-Signature": `sha256=${sig}`,
  },
});
For senders that can't sign on the wire (Grafana's basic webhook, PagerDuty's generic format), use a sidecar adapter that takes the upstream payload, signs it, and forwards. The Alertmanager example below shows the pattern.
Prometheus + Alertmanager¶
Step 1 — alert rules¶
/etc/prometheus/rules/openremedy.yml:
groups:
  - name: openremedy
    rules:
      - alert: DiskFull
        expr: (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 90
        for: 5m
        labels:
          severity: high
          incident_type: disk_full
        annotations:
          hostname: "{{ $labels.instance }}"
          mount: "{{ $labels.mountpoint }}"
          usage_percent: "{{ $value }}"
      - alert: HighCPU
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: high
          incident_type: cpu_high
        annotations:
          hostname: "{{ $labels.instance }}"
          cpu_percent: "{{ $value }}"
      - alert: HighMemory
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: high
          incident_type: memory_high
        annotations:
          hostname: "{{ $labels.instance }}"
          memory_percent: "{{ $value }}"
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
          incident_type: service_down
        annotations:
          hostname: "{{ $labels.instance }}"
      - alert: PortDown
        expr: probe_success == 0
        for: 2m
        labels:
          severity: high
          incident_type: port_unavailable
        annotations:
          hostname: "{{ $labels.instance }}"
Step 2 — Alertmanager receiver¶
/etc/alertmanager/alertmanager.yml:
route:
  receiver: openremedy
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: openremedy
    webhook_configs:
      - url: 'http://localhost:9095/alertmanager'
        send_resolved: true
Alertmanager points at the adapter, not at OpenRemedy directly, because Alertmanager cannot sign requests. The adapter signs and forwards.
Step 3 — adapter¶
#!/usr/bin/env python3
"""Alertmanager → OpenRemedy webhook adapter. Run as a sidecar.

Reads the tenant's HMAC secret from the env so it can sign every
forwarded request.
"""
import hashlib, hmac, json, os

from flask import Flask, request, jsonify
import requests

app = Flask(__name__)

OPENREMEDY_URL = "https://app.openremedy.io/api/v1/webhooks/alerts/my-company"
WEBHOOK_SECRET = os.environ["OPENREMEDY_WEBHOOK_SECRET"]


def _signed_post(url: str, payload: dict) -> requests.Response:
    body = json.dumps(payload).encode("utf-8")
    sig = hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256).hexdigest()
    return requests.post(
        url,
        data=body,  # raw bytes — re-serialising would break the signature
        headers={
            "Content-Type": "application/json",
            "X-OpenRemedy-Signature": f"sha256={sig}",
        },
        timeout=10,
    )


@app.route("/alertmanager", methods=["POST"])
def handle():
    data = request.json
    for alert in data.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        payload = {
            "hostname": annotations.get("hostname", labels.get("instance", "unknown")).split(":")[0],
            "incident_type": labels.get("incident_type", "custom"),
            "severity": labels.get("severity", "medium"),
            "evidence": {
                "alertname": labels.get("alertname", ""),
                "status": alert.get("status", ""),
                **{k: v for k, v in annotations.items() if k != "hostname"},
            },
        }
        try:
            _signed_post(OPENREMEDY_URL, payload)
        except Exception as e:
            print(f"Failed: {e}")
    return jsonify({"status": "ok"})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=9095)
Grafana¶
Grafana's webhook contact point can hit OpenRemedy via the same
adapter pattern, or — if you can write a custom notification
template — you can sign the body inside Grafana's templating layer.
The simpler path is the Alertmanager-style adapter on
localhost:9095.
Map Grafana labels to the OpenRemedy fields the classifier expects:
| Field | Source |
|---|---|
| hostname | instance label of the alert |
| incident_type | a dedicated incident_type label on the rule |
| severity | a severity label on the rule |
Datadog, PagerDuty, custom HTTP¶
Same pattern: a small adapter (Flask, FastAPI, Cloudflare Worker —
any HTTP endpoint will do) that translates the upstream payload to
the {hostname, incident_type, severity, evidence} shape and
signs it. Translation tables for Datadog priorities (P1→critical,
P2→high…) and PagerDuty urgencies live in the marketplace bundles.
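As a sketch of that translation step, the snippet below maps a Datadog-style payload onto the OpenRemedy shape. The priority mapping follows the P1→critical, P2→high pattern named above, but the exact tables (and the upstream field names used here) are assumptions; the authoritative versions ship in the marketplace bundles.

```python
# Hedged sketch: translate an upstream (Datadog-style) payload into the
# {hostname, incident_type, severity, evidence} shape. Field names on
# the upstream side are illustrative assumptions.

DATADOG_PRIORITY_MAP = {  # assumed mapping, per the P1→critical, P2→high pattern
    "P1": "critical",
    "P2": "high",
    "P3": "medium",
    "P4": "low",
    "P5": "info",
}

def translate_datadog(payload: dict) -> dict:
    """Return an OpenRemedy alert body; everything unmapped goes into evidence."""
    return {
        "hostname": payload.get("hostname", "unknown"),
        "incident_type": payload.get("incident_type", "custom"),
        "severity": DATADOG_PRIORITY_MAP.get(payload.get("priority", ""), "medium"),
        "evidence": {
            k: v for k, v in payload.items()
            if k not in ("hostname", "incident_type", "priority")
        },
    }
```

The result still has to be serialised once, signed, and sent as raw bytes, exactly as in the adapter above.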
Curl one-liner for arbitrary sources:
curl -X POST https://app.openremedy.io/api/v1/webhooks/alerts/my-company \
  -H "Content-Type: application/json" \
  -H "X-OpenRemedy-Signature: sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')" \
  -d "$BODY"
Evidence best-practices¶
The richer the evidence, the better the LLM classifier matches an
incident to a recipe.
| Type | Useful evidence fields |
|---|---|
| disk_full | disk_usage_percent, mount, largest_files |
| cpu_high | cpu_percent, load_1m, cores, top_process |
| memory_high | memory_percent, total_mb, used_mb, top_process |
| service_down | service_name, service_active, error, exit_code |
| port_unavailable | port, port_open, expected_service |
| custom | Any relevant JSON — the LLM analyses freely |
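Putting the table into practice, a disk_full alert carrying the suggested fields gives the classifier far more to match on than a bare percentage (values here are illustrative):

```json
{
  "hostname": "web-01",
  "incident_type": "disk_full",
  "severity": "high",
  "evidence": {
    "disk_usage_percent": 95,
    "mount": "/",
    "largest_files": ["/var/log/app.log (4.2G)", "/tmp/dump.sql (1.1G)"]
  }
}
```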
Daemon — host-side agent¶
The Go daemon (openremedy-client) lives on each managed Linux
host. It registers with the platform, runs operator-defined
monitors on a schedule, and posts heartbeats and evidence back
through /daemon/v1/*. It is the only "always-on" path into the
platform that doesn't require an external monitoring tool.
For the operator workflow — install URL, registration token, config schema, log paths — see Daemon → Install.
Custom monitors use HMAC-signed commands and the
agent_version >= 0.2.0 gate; the cryptographic detail is in
Security → Daemon authentication.
Outbound notifications¶
Outbound notifications use the HookDispatcher
(plugins/hooks.py). Pipeline events fire into the dispatcher,
which loads the tenant's enabled plugin configs, evaluates each
configured rule against the event, and runs matching plugins as
fire-and-forget asyncio tasks with a 10-second timeout. Failures
are logged into hook_events for the UI but never propagate back
into the agent pipeline.
The legacy notify() shim (services/notifications.py) still exists for backwards compatibility. It maps old event names to the canonical hook namespace and dispatches through the same HookDispatcher. New code should call the dispatcher directly; nothing inside the platform writes new notify() call sites.
Event surface¶
| Event | When |
|---|---|
| incident.created | A new incident was opened (any source). |
| incident.resolved | Incident moved to resolved. |
| incident.escalated | Incident moved to escalated (operator action required). |
| incident.cancelled | Incident closed without remediation. |
| incident.comment_added | A user (or agent) posted a comment. |
| recipe.proposed | The agent proposed a remediation. |
| approval.required | Trust × risk gate paused execution; a human must approve. |
| approval.resolved | An operator approved or rejected the proposal. |
| execution.completed | The recipe finished (success or failure). |
| stage.completed | A pipeline stage (triage / diagnose / validate / execute / review) finished. |
| sla.breached | An SLA timer crossed its threshold. |
| maintenance.scheduled / .approved / .started / .completed / .failed / .paused / .resumed | Maintenance plan lifecycle. |
| maintenance.step.awaiting_approval | A manual step is waiting on an operator. |
| agent.notification | Free-form push from an agent tool. |
Each plugin declares which subset of these it subscribes to via its
config_schema()['rules']['events'] block.
Built-in plugins¶
Six plugins ship with the platform; tenants enable and configure them per-tenant from Settings → Plugins. Configuration is encrypted at rest using the same AES-256-GCM key that protects SSH credentials.
| Plugin | What it does | Required config |
|---|---|---|
| discord | Posts an embed to a Discord channel webhook. | webhook_url, optional dashboard_url, optional default mention. |
| slack | Posts a Block Kit message via incoming-webhook URL. | webhook_url, optional dashboard_url. |
| teams | Posts an Adaptive Card to a Microsoft Teams channel. | webhook_url, optional dashboard_url. |
| servicenow | Creates / updates records in the incident (or another) table via the Table API. | instance_url, username, password, optional table (default incident). |
| jira | Creates an issue in a project, attaches subsequent updates as comments. | instance_url, email, api_token, project_key, optional default_issue_type. |
| webhook | Posts the raw HookPayload JSON to an arbitrary URL. Useful when none of the above fit. | url, optional HMAC secret (mirrors the inbound signing scheme). |
Rules — fan-out and filtering¶
A plugin config carries a list of rules. Each rule has:
{
  "event": "incident.created",
  "conditions": [
    {"field": "severity", "operator": "in", "value": ["critical", "high"]},
    {"field": "incident_type", "operator": "eq", "value": "disk_full"}
  ],
  "action": "post_message",
  "params": {"channel_override": "#incidents-disk"}
}
When the dispatcher receives an event, it iterates every active
rule. A rule matches if its event is the firing event and all
its conditions evaluate true. Operators are eq, in, and
contains; unknown operators fail closed (the rule does not
fire). Resolvable fields include severity, incident_type,
source, status, stage, tenant_id, and any
extra.<key> path.
A plugin without any rules fires on every event it subscribes to — that's the "send everything to this Slack channel" mode. Tenants that want fan-out (separate Slack channels for separate severities) add multiple rules; the same plugin instance runs once per matching rule.
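The matching semantics above can be sketched in a few lines. This is a toy model of the documented behaviour, not the platform's actual dispatcher; in particular it resolves fields with a flat lookup and omits the extra.<key> dotted-path handling.

```python
# Toy model of rule matching: a rule fires when its event matches and
# every condition evaluates true; unknown operators fail closed.

OPERATORS = {
    "eq": lambda field, value: field == value,
    "in": lambda field, value: field in value,
    "contains": lambda field, value: value in field,
}

def rule_matches(rule: dict, event: str, ctx: dict) -> bool:
    """ctx holds resolvable fields: severity, incident_type, source, ..."""
    if rule.get("event") != event:
        return False
    for cond in rule.get("conditions", []):
        op = OPERATORS.get(cond["operator"])
        if op is None:
            return False  # unknown operator: the rule does not fire
        field = ctx.get(cond["field"])
        if field is None or not op(field, cond["value"]):
            return False
    return True
```

Run against the example rule above, an incident.created event with severity "high" and incident_type "disk_full" matches; any other event, a non-matching severity, or an unrecognised operator does not.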
Failure model¶
Hooks are fire-and-forget. The dispatcher records every attempt in
hook_events with status ∈ {success, failed, timeout}. A
failed plugin does not retry automatically; the operator can
re-fire from the UI. The 10-second per-call timeout is fixed at
the dispatcher and not configurable per plugin — long-running
ServiceNow / Jira POSTs that exceed it are logged as timeout and
abandoned.
Programmatic API¶
Every page in the dashboard is backed by a documented REST
endpoint at /api/v1/*. Programmatic clients authenticate with a
JWT bearer token (CLI, daemon, sidecar adapters) or with the
HttpOnly cookie the browser receives on /auth/login (SPA,
SSR).
Get a token:
curl -X POST https://app.openremedy.io/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "ops@example.com", "password": "..."}'
# response includes an access_token field
Refresh tokens have a 30-day default lifetime;
/api/v1/auth/refresh returns a fresh pair. Expiry is configurable
via OREMEDY_ACCESS_TOKEN_EXPIRE_MINUTES and
OREMEDY_REFRESH_TOKEN_EXPIRE_DAYS — see
Configuration → Auth tuning.
Most-used endpoints¶
| Endpoint | Purpose |
|---|---|
| GET /api/v1/incidents | List incidents (tenant-scoped, paginated). |
| GET /api/v1/incidents/{id} | Full incident detail including timeline. |
| POST /api/v1/incidents/{id}/comments | Add a comment (re-invokes the IncidentWatcher). |
| POST /api/v1/incidents/{id}/cancel | Move an incident to cancelled. |
| GET /api/v1/servers | Server inventory. |
| POST /api/v1/servers | Register a server (operator path; the daemon uses /daemon/v1/register instead). |
| GET /api/v1/recipes | Recipe catalogue. |
| GET /api/v1/executions/{id} | Execution detail. |
| POST /api/v1/executions/{id}/approve / /reject | Approval gate decisions. |
| GET /api/v1/maintenance/schedules | Maintenance schedule list. |
| POST /api/v1/maintenance/schedules | Create a maintenance schedule. |
| GET /api/v1/audit/logs | Audit log query (tenant-scoped). |
| GET /api/v1/admin/dashboard | Cross-tenant fleet stats (superadmin only). |
The full machine-readable list is at
https://app.openremedy.io/openapi.json. The dashboard ships an
embedded Swagger / Stoplight viewer at
https://app.openremedy.io/docs (auth required).
Tenant scoping¶
Every endpoint that touches tenant-owned data resolves
current_user.tenant_id from the JWT and filters on it.
Cross-tenant lookups respond with 404 (not 403) so a malicious
caller can't probe for the existence of a foreign UUID. Superadmin
is the only role that bypasses this; impersonation flips a
superadmin's session into a target tenant and is fully audited.
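The 404-not-403 rule reduces to one comparison. Here is a toy model of it (the function name and dict-backed store are illustrative, not the platform's query layer): an incident owned by another tenant is indistinguishable from one that does not exist.

```python
# Toy model of tenant scoping: existence and ownership collapse into a
# single "found or not" answer, so a 403 never leaks that the UUID exists.
from typing import Optional

def scoped_get(incidents: dict, incident_id: str, tenant_id: str) -> Optional[dict]:
    incident = incidents.get(incident_id)
    if incident is None or incident["tenant_id"] != tenant_id:
        return None  # caller turns this into 404, never 403
    return incident
```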
Marketplace bundles¶
The marketplace (Settings → Marketplace) packages a set of recipes, alert sources, and plugin presets as a single installable bundle. Installing the "NGINX core ops" bundle, for example, adds the relevant Prometheus alert rules, the matching recipes for restart / reload / stale-config recovery, and a Slack plugin preset wired to fire on the related incident events. Bundles are the fastest way to wire up a new tenant; everything they install is also reachable through the manual surfaces above and can be edited freely after install.
For the bundle catalogue and authoring guide see Dashboard → Marketplace.
Real-time WebSockets¶
Two WebSocket channels stream live state out to the browser (and to any other client willing to speak the protocol).
/ws/incidents¶
A tenant-scoped firehose of incident lifecycle events. Every
backend that mutates incident state — swarm/events.py,
worker/notify.py, the proactive loops — publishes through Redis
pub/sub on the incidents channel. The WebSocket handler reads
the connection's JWT-bound tenant_id and drops every message
that doesn't match it. Superadmin connections see all tenants.
/ws/executions/{execution_id}¶
Live output from a running execution — Ansible stdout, recipe progress, exit code. Before subscribing, the handler verifies that the connection's tenant owns the execution UUID; cross-tenant attempts close with policy-violation status.
Authentication¶
Both endpoints accept either:
- the access_token cookie (browser default on a same-origin upgrade), or
- the Sec-WebSocket-Protocol: bearer, <jwt> slot for non-browser clients.
URL query params are not supported because they leak into proxy access logs. Pre-handshake auth failures close the WS with policy-violation status — the cryptographic detail is in Security → WebSocket handshake.
Browser — minimal example¶
// The browser sends the access_token cookie automatically.
const ws = new WebSocket("wss://app.openremedy.io/ws/incidents");

ws.onmessage = (e) => {
  const event = JSON.parse(e.data);
  console.log(event.type, event.incident_id);
};
Non-browser — Python¶
import websockets

async with websockets.connect(
    "wss://app.openremedy.io/ws/incidents",
    subprotocols=["bearer", JWT_TOKEN],
) as ws:
    async for msg in ws:
        ...
See also¶
- Daemon → Install — host-side agent setup.
- Recipes — how the receiving end uses the alerts you forward.
- Configuration — environment variables that govern auth, CORS, rate limits.
- Security — full cryptographic posture of every surface this page touches.