Daemon configuration¶
This page is the reference for /etc/openremedy-client/config.json
and the runtime surfaces the daemon exposes (local API, sudo rules,
systemd hardening). For the install + register flow, see
install.
Config file¶
Format: JSON or YAML (extension picks the parser). Default location:
/etc/openremedy-client/config.json. Permissions: 0600, owner
openremedy-client. The daemon refuses to start if the file is
world-readable.
```json
{
  "platform_url": "https://app.example.com",
  "token": "orem_srv_xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "log_level": "info",
  "heartbeat_interval_seconds": 30,
  "report_interval_seconds": 60
}
```
| Field | Required | Default | Meaning |
|---|---|---|---|
| `platform_url` | yes | — | The base URL of the OpenRemedy platform (`https://app.example.com`). The daemon talks to `<platform_url>/daemon/v1/...`. |
| `token` | yes | — | One-time server-registration token from the dashboard (`orem_srv_…`). Read only at first start; after registration the daemon uses the persisted `session_token` from `state.json`. |
| `log_level` | no | `info` | `info` or `debug`. Verbose debug logs go to journald. |
| `heartbeat_interval_seconds` | no | 30 | How often the daemon POSTs `/daemon/v1/heartbeat`. |
| `report_interval_seconds` | no | 60 | How often the daemon POSTs `/daemon/v1/evidence` (monitor results + alerts + every-5th-cycle discovery snapshot). |
The platform can override the two intervals in its registration response — useful for slowing down a noisy host fleet without touching every config file.
State file¶
/etc/openremedy-client/state.json is written by the daemon after
the first successful registration. Don't hand-edit it; if registration
goes sideways, delete the file and restart the daemon to re-register.
```json
{
  "server_id": "1f4b56ca-4ede-4494-aa6e-808870ca673a",
  "session_token": "orem_sess_...",
  "registered_at": "2026-05-08T01:30:00Z"
}
```
The session_token is the bearer token the daemon uses for all
subsequent platform calls. The original token from config.json
is invalid after registration — it's a one-shot.
Local API (port 9201)¶
The daemon binds a small HTTP server to 127.0.0.1:9201 (loopback
only — never reachable from the network). Two endpoints, both for
on-host troubleshooting and metrics scraping.
GET /healthz¶
Returns:
```json
{
  "status": "ok",
  "uptime_seconds": 3621,
  "last_heartbeat_at": "2026-05-08T01:30:00Z",
  "last_evidence_at": "2026-05-08T01:29:30Z",
  "last_error": null,
  "counts": {
    "heartbeat_attempts": 121,
    "heartbeat_errors": 0,
    "evidence_reports": 60,
    "evidence_errors": 0
  },
  "self": {
    "rss_bytes": 18874368,
    "cpu_pct": 0.4,
    "goroutines": 14,
    "open_fds": 23
  }
}
```
Status values:
- `ok` — heartbeat succeeded within the last `3 × heartbeat_interval`.
- `degraded` — no successful heartbeat in that window. HTTP 503 returned.
- `starting` — daemon hasn't completed its first heartbeat yet.
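The status rule can be sketched as a small helper (a hypothetical reconstruction of the described behaviour, not the daemon's actual code):

```python
import datetime as dt

def health_status(last_heartbeat_at, heartbeat_interval_s, now=None):
    """Status rule from /healthz: "starting" before the first heartbeat,
    "ok" while a heartbeat has succeeded within 3x the heartbeat interval,
    "degraded" (and HTTP 503) once that window is exceeded."""
    if last_heartbeat_at is None:
        return "starting"
    now = now or dt.datetime.now(dt.timezone.utc)
    window = dt.timedelta(seconds=3 * heartbeat_interval_s)
    return "ok" if now - last_heartbeat_at <= window else "degraded"
```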
GET /metrics¶
Prometheus text format. Same data as /healthz. Useful when the host
also runs node_exporter — drop a scrape config pointing at
127.0.0.1:9201.
```text
# HELP openremedy_client_self_rss_bytes Resident memory of the daemon.
# TYPE openremedy_client_self_rss_bytes gauge
openremedy_client_self_rss_bytes 18874368
# HELP openremedy_client_heartbeat_total Heartbeat attempts since startup.
# TYPE openremedy_client_heartbeat_total counter
openremedy_client_heartbeat_total{result="ok"} 121
openremedy_client_heartbeat_total{result="error"} 0
...
```
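A minimal Prometheus scrape job for this might look like the following (the job name is arbitrary; everything else follows Prometheus defaults, including the `/metrics` path):

```yaml
scrape_configs:
  - job_name: openremedy-client
    static_configs:
      - targets: ["127.0.0.1:9201"]
```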
Monitor types¶
Monitors are defined on the platform side (alert policies) and sent
to the daemon via /daemon/v1/tasks (polled every five minutes,
cached to disk). The daemon doesn't decide what to monitor —
operators do, in the dashboard.
| Type | Required fields | Optional fields |
|---|---|---|
| `service` | `name` (systemd unit name) | `expect_state` (default `active`) |
| `port` | `port` (integer) | `protocol` (`tcp`/`udp`, default `tcp`), `timeout_seconds` (default 5) |
| `http` | `url` | `expect_status` (default 200), `timeout_seconds` (default 5) |
| `custom` | `command` | `timeout_seconds` (default 30), `expect_exit` (default 0), `signature` (HMAC-SHA256, required) |
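As an illustration, an `http` monitor built from the fields above might look like this inside a tasks payload (the envelope around it, and the example URL, are assumptions; the field names come from the table):

```json
{
  "type": "http",
  "url": "https://127.0.0.1:8443/health",
  "expect_status": 200,
  "timeout_seconds": 5
}
```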
Custom monitors are HMAC-signed by the platform using the daemon's
session token. The daemon verifies the signature before exec'ing.
Unsigned custom monitors are silently dropped — this is the
defence against an attacker who can write to the tasks cache file
(they can't forge a signature without the session token).
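The verification step can be sketched in Python. The exact byte string the platform signs is not documented on this page, so the canonical-JSON payload below is an assumption; the drop-unsigned and constant-time-compare behaviour is from the text:

```python
import hashlib
import hmac
import json

def verify_custom_monitor(task: dict, session_token: str) -> bool:
    """Hypothetical mirror of the described check: custom monitors are
    HMAC-SHA256-signed with the daemon's session token as the key, and
    unsigned or tampered tasks are dropped before exec."""
    sig = task.get("signature")
    if not sig:
        return False  # unsigned custom monitors are silently dropped
    body = {k: v for k, v in task.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    expected = hmac.new(session_token.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```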
Custom monitors also require client version ≥ 0.2.0. Sending them to an older daemon would earn it an HTTP 426 from the platform's tasks endpoint, so the platform simply omits custom monitors from older daemons' task lists.
Systemd hardening¶
The unit file shipped in the .deb:
```ini
[Service]
User=openremedy-client
Group=openremedy-client
ExecStart=/usr/local/bin/openremedy-client --config /etc/openremedy-client/config.json
Restart=always
RestartSec=5
LimitNOFILE=65535
# Hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/lib/openremedy-client /etc/openremedy-client

[Install]
WantedBy=multi-user.target
```
NoNewPrivileges=yes blocks setuid escalation. The daemon uses sudo
NOPASSWD for the privileged commands it needs (see below) — sudo's
mechanism doesn't rely on setuid in the way affected by this flag.
Sudo rules¶
/etc/sudoers.d/openremedy-client is installed by the .deb and
validated by the postinst (the package fails to install if the file
is syntactically invalid). It grants the daemon NOPASSWD sudo for:
- **systemctl** — `start`, `stop`, `restart`, `reload`, `enable`, `disable`, `status`, `is-active`, `is-enabled`, `daemon-reload`, `list-units`, `show`.
- **journalctl** — query logs.
- **apt-get / apt / dnf / yum** — package operations (recipe and maintenance steps install/upgrade packages via sudo).
- **File ops** — `cp`, `mv`, `chmod`, `chown`, `rm` on common config paths.
- **Firewall** — `ufw`, `iptables`.
- **certbot** — TLS renewals from a maintenance plan.
- **Process control** — `kill`, `killall`.
Docker is reached without sudo by group membership. Other
discovery + diagnostic commands (ss, df, free, top, ps,
stat, cat of public files) need no privilege.
Resilience¶
Two failure modes the daemon handles gracefully:
Network outage on first start¶
If the platform is unreachable on first boot, registration fails and the daemon exits — there's no fallback for that case. Re-run the install flow once the platform is reachable.
Platform unreachable mid-run¶
If the platform is reachable at startup but goes away later, the
daemon keeps running. The tasks endpoint is polled every five
minutes; failures fall back to the on-disk cache at
/var/lib/openremedy-client/tasks-cache.json. The cache is
overwritten on every successful poll.
The cached `maintenance_active` flag is not trusted: if the daemon can't
get a fresh answer from the platform, it assumes live mode. That is the
safer default: monitors keep running, and the local healthz reports
degraded so an operator on the host can see something is wrong.
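The poll-then-fall-back behaviour might look roughly like this (a Python sketch, not the daemon's actual code; the auth header shape and payload structure beyond `maintenance_active` are assumptions):

```python
import json
import urllib.request
from pathlib import Path

CACHE = Path("/var/lib/openremedy-client/tasks-cache.json")

def poll_tasks(platform_url: str, session_token: str, cache: Path = CACHE) -> dict:
    """Try the live tasks endpoint; overwrite the cache on success.
    On failure, fall back to the on-disk cache, but never trust the
    cached maintenance_active flag (assume live mode)."""
    req = urllib.request.Request(
        f"{platform_url}/daemon/v1/tasks",
        headers={"Authorization": f"Bearer {session_token}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            tasks = json.load(resp)
        cache.write_text(json.dumps(tasks))   # cache overwritten on every successful poll
        return tasks
    except OSError:
        tasks = json.loads(cache.read_text())  # last good copy
        tasks["maintenance_active"] = False    # cached flag is not trusted
        return tasks
```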
Maintenance window awareness¶
When the platform marks the host as maintenance_active=true (e.g.
because a maintenance schedule is running against it), the daemon
suppresses evidence pushes for the duration. Heartbeats continue —
the platform still wants to know the host is alive — but no monitor
results, no alerts. This prevents a maintenance plan that, say,
restarts a service, from triggering an alert about the service being
down for the seven seconds it took to restart.
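The suppression rule reduces to a single predicate (a sketch of the behaviour described above, not the daemon's code):

```python
def should_send(kind: str, maintenance_active: bool) -> bool:
    """Heartbeats always go out; evidence (monitor results and alerts)
    is suppressed while the platform reports maintenance_active=true."""
    return kind == "heartbeat" or not maintenance_active
```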
Reconfiguring an existing daemon¶
| Change | How |
|---|---|
| Rotate the session token | Delete `/etc/openremedy-client/state.json`, regenerate a server token in the dashboard, edit `config.json` to use it, restart the daemon. |
| Change `platform_url` (e.g. domain rename) | Edit `config.json`; the registered server in the platform stays the same — same `server_id`, same `session_token`. The daemon just talks to a new host. |
| Slow down heartbeat / evidence | Edit `config.json` (or override server-side at registration), restart. |
| Move the daemon between hosts | Don't. Re-register from scratch. The fingerprint catches token theft and the daemon will refuse to talk to the platform. |
See also¶
- Daemon install — initial setup.
- Architecture: daemon flow — what's on the wire between daemon and platform.
- Security — HMAC custom-monitor signing, fingerprint integrity, the `agent_version` gate.