Daemon configuration

This page is the reference for /etc/openremedy-client/config.json and the runtime surfaces the daemon exposes (local API, sudo rules, systemd hardening). For the install + register flow, see install.

Config file

Format: JSON or YAML (extension picks the parser). Default location: /etc/openremedy-client/config.json. Permissions: 0600, owner openremedy-client. The daemon refuses to start if the file is world-readable.
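That world-readable check can be sketched as a small shell function (a hypothetical reimplementation for illustration; the daemon's actual check may differ):

```shell
# Refuse the config if group or other have any permission bits set.
# Mirrors the documented rule: the file should be 0600.
check_config_perms() {
  local f=$1 mode
  mode=$(stat -c '%a' "$f") || return 1
  case $mode in
    *00) echo "ok: $f is mode $mode" ;;
    *)   echo "refusing to start: $f is mode $mode (want 0600)" >&2
         return 1 ;;
  esac
}
```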

{
  "platform_url": "https://app.example.com",
  "token": "orem_srv_xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "log_level": "info",
  "heartbeat_interval_seconds": 30,
  "report_interval_seconds": 60
}
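Because the extension picks the parser, the same settings can equally be written as YAML, in a file named config.yaml (fields identical to the JSON above):

```yaml
platform_url: https://app.example.com
token: orem_srv_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
log_level: info
heartbeat_interval_seconds: 30
report_interval_seconds: 60
```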
| Field | Required | Default | Meaning |
|---|---|---|---|
| platform_url | yes | — | Base URL of the OpenRemedy platform (https://app.example.com). The daemon talks to <platform_url>/daemon/v1/.... |
| token | yes | — | One-time server-registration token from the dashboard (orem_srv_…). Read only at first start; after registration the daemon uses the persisted session_token from state.json. |
| log_level | no | info | info or debug. Verbose debug logs go to journald. |
| heartbeat_interval_seconds | no | 30 | How often the daemon POSTs /daemon/v1/heartbeat. |
| report_interval_seconds | no | 60 | How often the daemon POSTs /daemon/v1/evidence (monitor results + alerts + every-5th-cycle discovery snapshot). |

The platform can override the two intervals in its registration response — useful for slowing down a noisy host fleet without touching every config file.

State file

/etc/openremedy-client/state.json is written by the daemon after the first successful registration. Don't hand-edit it; if registration goes sideways, delete the file and restart the daemon to re-register.

{
  "server_id": "1f4b56ca-4ede-4494-aa6e-808870ca673a",
  "session_token": "orem_sess_...",
  "registered_at": "2026-05-08T01:30:00Z"
}

The session_token is the bearer token the daemon uses for all subsequent platform calls. The original token from config.json is invalid after registration — it's a one-shot.

Local API (port 9201)

The daemon binds a small HTTP server to 127.0.0.1:9201 (loopback only — never reachable from the network). Two endpoints, both for on-host troubleshooting and metrics scraping.

GET /healthz

curl -fsS http://127.0.0.1:9201/healthz | jq .

Returns:

{
  "status": "ok",
  "uptime_seconds": 3621,
  "last_heartbeat_at": "2026-05-08T01:30:00Z",
  "last_evidence_at": "2026-05-08T01:29:30Z",
  "last_error": null,
  "counts": {
    "heartbeat_attempts": 121,
    "heartbeat_errors": 0,
    "evidence_reports": 60,
    "evidence_errors": 0
  },
  "self": {
    "rss_bytes": 18874368,
    "cpu_pct": 0.4,
    "goroutines": 14,
    "open_fds": 23
  }
}
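A quick way to gate a script on that payload, using jq (a sketch; assumes jq is installed on the host):

```shell
# Exit 0 iff the daemon reports status "ok".
healthz_ok() {
  jq -e '.status == "ok"' >/dev/null
}

# Typical use on the host:
#   curl -fsS http://127.0.0.1:9201/healthz | healthz_ok && echo healthy
```

Note that curl -f already fails on the HTTP 503 returned in the degraded state, so the jq check mainly distinguishes ok from starting.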

Status values:

  • ok — heartbeat succeeded within the last 3 × heartbeat_interval.
  • degraded — no successful heartbeat in that window. HTTP 503 returned.
  • starting — daemon hasn't completed its first heartbeat yet.

GET /metrics

Prometheus text format. Same data as /healthz. Useful when the host also runs node_exporter — drop a scrape config pointing at 127.0.0.1:9201.

# HELP openremedy_client_self_rss_bytes Resident memory of the daemon.
# TYPE openremedy_client_self_rss_bytes gauge
openremedy_client_self_rss_bytes 18874368
# HELP openremedy_client_heartbeat_total Heartbeat attempts since startup.
# TYPE openremedy_client_heartbeat_total counter
openremedy_client_heartbeat_total{result="ok"} 121
openremedy_client_heartbeat_total{result="error"} 0
...
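A minimal scrape job for this endpoint (the job name is illustrative):

```yaml
scrape_configs:
  - job_name: openremedy_client
    metrics_path: /metrics
    static_configs:
      - targets: ["127.0.0.1:9201"]
```

The target is loopback, so this only works from a Prometheus instance running on the same host.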

Monitor types

Monitors are defined on the platform side (alert policies) and sent to the daemon via /daemon/v1/tasks (polled every five minutes, cached to disk). The daemon doesn't decide what to monitor — operators do, in the dashboard.

| Type | Required fields | Optional fields |
|---|---|---|
| service | name (systemd unit name) | expect_state (default active) |
| port | port (integer) | protocol (tcp or udp, default tcp), timeout_seconds (default 5) |
| http | url | expect_status (default 200), timeout_seconds (default 5) |
| custom | command, signature (HMAC-SHA256) | timeout_seconds (default 30), expect_exit (default 0) |
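For illustration, monitor definitions built from those fields might look like this (the exact wire schema of /daemon/v1/tasks isn't documented on this page; field names are taken from the table above):

```json
[
  { "type": "service", "name": "nginx.service", "expect_state": "active" },
  { "type": "port",    "port": 5432, "protocol": "tcp", "timeout_seconds": 5 },
  { "type": "http",    "url": "https://localhost/health", "expect_status": 200 }
]
```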

Custom monitors are HMAC-signed by the platform using the daemon's session token. The daemon verifies the signature before exec'ing. Unsigned custom monitors are silently dropped — this is the defence against an attacker who can write to the tasks cache file (they can't forge a signature without the session token).
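The verification step can be sketched with openssl (hypothetical canonicalization: signing just the command string; the daemon's real scheme isn't specified on this page):

```shell
# HMAC-SHA256 of a monitor command, keyed by the session token.
sign_monitor() {
  local token=$1 command=$2
  printf '%s' "$command" | openssl dgst -sha256 -hmac "$token" -r | cut -d' ' -f1
}

# Real code should use a constant-time comparison; plain string
# equality keeps the sketch short.
verify_monitor() {
  local token=$1 command=$2 signature=$3
  [ "$(sign_monitor "$token" "$command")" = "$signature" ]
}
```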

Custom monitors also require client version ≥ 0.2.0. Rather than failing an older daemon's tasks poll with HTTP 426, the platform simply omits custom monitors from that daemon's task list.

Systemd hardening

The unit file shipped in the .deb:

[Service]
User=openremedy-client
Group=openremedy-client
ExecStart=/usr/local/bin/openremedy-client --config /etc/openremedy-client/config.json
Restart=always
RestartSec=5
LimitNOFILE=65535

# Hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/lib/openremedy-client /etc/openremedy-client

[Install]
WantedBy=multi-user.target

NoNewPrivileges=yes blocks privilege escalation through setuid binaries. Be aware that sudo itself is setuid root, so under no_new_privs its setuid bit is ignored and sudo invocations fail; if the daemon is to use the NOPASSWD rules below, this flag must be dropped (a systemd drop-in override works) or the privileged commands invoked through another mechanism.

Sudo rules

/etc/sudoers.d/openremedy-client is installed by the .deb and validated by the postinst (the package fails to install if the file is syntactically invalid). It grants the daemon NOPASSWD sudo for:

  • systemctl — start, stop, restart, reload, enable, disable, status, is-active, is-enabled, daemon-reload, list-units, show.
  • journalctl — query logs.
  • apt-get / apt / dnf / yum — package operations (recipe and maintenance steps install/upgrade packages via sudo).
  • File ops — cp, mv, chmod, chown, rm on common config paths.
  • Firewall — ufw, iptables.
  • certbot — TLS renewals from a maintenance plan.
  • Process control — kill, killall.

Docker is reached without sudo by group membership. Other discovery + diagnostic commands (ss, df, free, top, ps, stat, cat of public files) need no privilege.
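For reference, a rule in that file might look like the following (illustrative only; the shipped file's exact contents aren't reproduced here). The same syntax validation the postinst performs can be run by hand with visudo:

```
openremedy-client ALL=(root) NOPASSWD: /usr/bin/systemctl, /usr/bin/journalctl

# Validate before trusting an edit:
#   visudo -cf /etc/sudoers.d/openremedy-client
```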

Resilience

Two failure modes the daemon handles gracefully:

Network outage on first start

If the platform is unreachable on first boot, registration fails and the daemon exits — there's no fallback for that case. Re-run the install flow once the platform is reachable.

Platform unreachable mid-run

If the platform is reachable at startup but goes away later, the daemon keeps running. The tasks endpoint is polled every five minutes; failures fall back to the on-disk cache at /var/lib/openremedy-client/tasks-cache.json. The cache is overwritten on every successful poll.

The cached maintenance_active flag is not trusted: if the daemon can't get a fresh answer from the platform, it assumes live mode. That's the safer default — monitors keep running, and the local healthz reports degraded so an operator on the host knows something is off.

Maintenance window awareness

When the platform marks the host as maintenance_active=true (e.g. because a maintenance schedule is running against it), the daemon suppresses evidence pushes for the duration. Heartbeats continue — the platform still wants to know the host is alive — but no monitor results, no alerts. This prevents a maintenance plan that, say, restarts a service, from triggering an alert about the service being down for the seven seconds it took to restart.

Reconfiguring an existing daemon

| Change | How |
|---|---|
| Rotate the session token | Delete /etc/openremedy-client/state.json, regenerate a server token in the dashboard, edit config.json to use it, restart the daemon. |
| Change platform_url (e.g. domain rename) | Edit config.json; the registered server in the platform stays the same — same server_id, same session_token. The daemon just talks to a new host. |
| Slow down heartbeat / evidence | Edit config.json (or override server-side at registration), then restart. |
| Move the daemon between hosts | Don't. Re-register from scratch. The fingerprint catches token theft and the daemon will refuse to talk to the platform. |

See also