Daemon configuration

This page is the reference for /etc/openremedy-client/config.json and the runtime surfaces the daemon exposes (local API, sudo rules, systemd hardening). For the install + register flow, see install.

Config file

Format: JSON or YAML (extension picks the parser). Default location: /etc/openremedy-client/config.json. Permissions: 0600, owner openremedy-client. The daemon refuses to start if the file is world-readable.
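That world-readable check can be sketched as a small shell function (a hypothetical reimplementation for illustration; the daemon's actual check may differ):

```shell
# Refuse the config if group or other have any permission bits set.
# Mirrors the documented rule: the file should be 0600.
check_config_perms() {
  local f=$1 mode
  mode=$(stat -c '%a' "$f") || return 1
  case $mode in
    *00) echo "ok: $f is mode $mode" ;;
    *)   echo "refusing to start: $f is mode $mode (want 0600)" >&2
         return 1 ;;
  esac
}
```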

{
  "platform_url": "https://app.example.com",
  "token": "orem_srv_xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "log_level": "info",
  "heartbeat_interval_seconds": 30,
  "report_interval_seconds": 60
}
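Because the extension picks the parser, the same settings can equally be written as YAML, in a file named config.yaml (fields identical to the JSON above):

```yaml
platform_url: https://app.example.com
token: orem_srv_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
log_level: info
heartbeat_interval_seconds: 30
report_interval_seconds: 60
```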
| Field | Required | Default | Meaning |
|---|---|---|---|
| platform_url | yes | — | Base URL of the OpenRemedy platform (https://app.example.com). The daemon talks to <platform_url>/daemon/v1/.... |
| token | yes | — | One-time server-registration token from the dashboard (orem_srv_…). Read only at first start; after registration the daemon uses the persisted session_token from state.json. |
| log_level | no | info | info or debug. Verbose debug logs go to journald. |
| heartbeat_interval_seconds | no | 30 | How often the daemon POSTs /daemon/v1/heartbeat. |
| report_interval_seconds | no | 60 | How often the daemon POSTs /daemon/v1/evidence (monitor results + alerts + every-5th-cycle discovery snapshot). |

The platform can override the two intervals in its registration response — useful for slowing down a noisy host fleet without touching every config file.

State file

/etc/openremedy-client/state.json is written by the daemon after the first successful registration. Don't hand-edit it; if registration goes sideways, delete the file and restart the daemon to re-register.

{
  "server_id": "1f4b56ca-4ede-4494-aa6e-808870ca673a",
  "session_token": "orem_sess_...",
  "registered_at": "2026-05-08T01:30:00Z"
}

The session_token is the bearer token the daemon uses for all subsequent platform calls. The original token from config.json is invalid after registration — it's a one-shot.

Local API (port 9201)

The daemon binds a small HTTP server to 127.0.0.1:9201 (loopback only — never reachable from the network). Two endpoints, both for on-host troubleshooting and metrics scraping.

GET /healthz

curl -fsS http://127.0.0.1:9201/healthz | jq .

Returns:

{
  "status": "ok",
  "uptime_seconds": 3621,
  "last_heartbeat_at": "2026-05-08T01:30:00Z",
  "last_evidence_at": "2026-05-08T01:29:30Z",
  "last_error": null,
  "counts": {
    "heartbeat_attempts": 121,
    "heartbeat_errors": 0,
    "evidence_reports": 60,
    "evidence_errors": 0
  },
  "self": {
    "rss_bytes": 18874368,
    "cpu_pct": 0.4,
    "goroutines": 14,
    "open_fds": 23
  }
}
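A quick way to gate a script on that payload, using jq (a sketch; assumes jq is installed on the host):

```shell
# Exit 0 iff the daemon reports status "ok".
healthz_ok() {
  jq -e '.status == "ok"' >/dev/null
}

# Typical use on the host:
#   curl -fsS http://127.0.0.1:9201/healthz | healthz_ok && echo healthy
```

Note that curl -f already fails on the HTTP 503 returned in the degraded state, so the jq check mainly distinguishes ok from starting.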

Status values:

  • ok — heartbeat succeeded within the last 3 × heartbeat_interval.
  • degraded — no successful heartbeat in that window. HTTP 503 returned.
  • starting — daemon hasn't completed its first heartbeat yet.

GET /metrics

Prometheus text format. Same data as /healthz. Useful when the host also runs node_exporter — drop a scrape config pointing at 127.0.0.1:9201.

# HELP openremedy_client_self_rss_bytes Resident memory of the daemon.
# TYPE openremedy_client_self_rss_bytes gauge
openremedy_client_self_rss_bytes 18874368
# HELP openremedy_client_heartbeat_total Heartbeat attempts since startup.
# TYPE openremedy_client_heartbeat_total counter
openremedy_client_heartbeat_total{result="ok"} 121
openremedy_client_heartbeat_total{result="error"} 0
...
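A minimal scrape job for this endpoint (the job name is illustrative):

```yaml
scrape_configs:
  - job_name: openremedy_client
    metrics_path: /metrics
    static_configs:
      - targets: ["127.0.0.1:9201"]
```

The target is loopback, so this only works from a Prometheus instance running on the same host.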

Monitor types

Monitors are defined on the platform side (alert policies) and sent to the daemon via /daemon/v1/tasks (polled every five minutes, cached to disk). The daemon doesn't decide what to monitor — operators do, in the dashboard.

| Type | Required fields | Optional fields |
|---|---|---|
| service | name (systemd unit name) | expect_state (default active) |
| port | port (integer) | protocol (tcp or udp, default tcp), timeout_seconds (default 5) |
| http | url | expect_status (default 200), timeout_seconds (default 5) |
| custom | command, signature (HMAC-SHA256) | timeout_seconds (default 30), expect_exit (default 0) |
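For illustration, monitor definitions built from those fields might look like this (the exact wire schema of /daemon/v1/tasks isn't documented on this page; field names are taken from the table above):

```json
[
  { "type": "service", "name": "nginx.service", "expect_state": "active" },
  { "type": "port",    "port": 5432, "protocol": "tcp", "timeout_seconds": 5 },
  { "type": "http",    "url": "https://localhost/health", "expect_status": 200 }
]
```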

Custom monitors are HMAC-signed by the platform using the daemon's session token. The daemon verifies the signature before exec'ing. Unsigned custom monitors are silently dropped — this is the defence against an attacker who can write to the tasks cache file (they can't forge a signature without the session token).
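The verification step can be sketched with openssl (hypothetical canonicalization: signing just the command string; the daemon's real scheme isn't specified on this page):

```shell
# HMAC-SHA256 of a monitor command, keyed by the session token.
sign_monitor() {
  local token=$1 command=$2
  printf '%s' "$command" | openssl dgst -sha256 -hmac "$token" -r | cut -d' ' -f1
}

# Real code should use a constant-time comparison; plain string
# equality keeps the sketch short.
verify_monitor() {
  local token=$1 command=$2 signature=$3
  [ "$(sign_monitor "$token" "$command")" = "$signature" ]
}
```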

Custom monitors also require client version ≥ 0.2.0. Rather than failing an older daemon's tasks poll with HTTP 426, the platform simply omits custom monitors from that daemon's task list.

Systemd hardening

The unit file shipped in the .deb:

[Service]
User=openremedy-client
Group=openremedy-client
ExecStart=/usr/local/bin/openremedy-client --config /etc/openremedy-client/config.json
Restart=always
RestartSec=5
LimitNOFILE=65535

# Hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/var/lib/openremedy-client /etc/openremedy-client

[Install]
WantedBy=multi-user.target

NoNewPrivileges=yes blocks privilege escalation through setuid binaries. Be aware that sudo itself is setuid root, so under no_new_privs its setuid bit is ignored and sudo invocations fail; if the daemon is to use the NOPASSWD rules below, this flag must be dropped (a systemd drop-in override works) or the privileged commands invoked through another mechanism.

Sudo rules

/etc/sudoers.d/openremedy-client is installed by the .deb and validated by the postinst (the package fails to install if the file is syntactically invalid). It grants the daemon NOPASSWD sudo for:

  • systemctl — start, stop, restart, reload, enable, disable, status, is-active, is-enabled, daemon-reload, list-units, show.
  • journalctl — query logs.
  • apt-get / apt / dnf / yum — package operations (recipe and maintenance steps install/upgrade packages via sudo).
  • File ops — cp, mv, chmod, chown, rm on common config paths.
  • Firewall — ufw, iptables.
  • certbot — TLS renewals from a maintenance plan.
  • Process control — kill, killall.

Docker is reached without sudo by group membership. Other discovery + diagnostic commands (ss, df, free, top, ps, stat, cat of public files) need no privilege.
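For reference, a rule in that file might look like the following (illustrative only; the shipped file's exact contents aren't reproduced here). The same syntax validation the postinst performs can be run by hand with visudo:

```
openremedy-client ALL=(root) NOPASSWD: /usr/bin/systemctl, /usr/bin/journalctl

# Validate before trusting an edit:
#   visudo -cf /etc/sudoers.d/openremedy-client
```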

Resilience

Two failure modes the daemon handles gracefully:

Network outage on first start

If the platform is unreachable on first boot, registration fails and the daemon exits — there's no fallback for that case. Re-run the install flow once the platform is reachable.

Platform unreachable mid-run

If the platform is reachable at startup but goes away later, the daemon keeps running. The tasks endpoint is polled every five minutes; failures fall back to the on-disk cache at /var/lib/openremedy-client/tasks-cache.json. The cache is overwritten on every successful poll.

The cached maintenance_active flag is not trusted: if the daemon can't get a fresh answer from the platform, it assumes live mode. That's the safer default — monitors keep running, and the local healthz reports degraded so an operator on the host knows something is off.

Maintenance window awareness

When the platform marks the host as maintenance_active=true (e.g. because a maintenance schedule is running against it), the daemon suppresses evidence pushes for the duration. Heartbeats continue — the platform still wants to know the host is alive — but no monitor results, no alerts. This prevents a maintenance plan that, say, restarts a service, from triggering an alert about the service being down for the seven seconds it took to restart.

Reconfiguring an existing daemon

| Change | How |
|---|---|
| Rotate the session token | Delete /etc/openremedy-client/state.json, regenerate a server token in the dashboard, edit config.json to use it, restart the daemon. |
| Change platform_url (e.g. domain rename) | Edit config.json; the registered server in the platform stays the same — same server_id, same session_token. The daemon just talks to a new host. |
| Slow down heartbeat / evidence | Edit config.json (or override server-side at registration), then restart. |
| Move the daemon between hosts | Don't. Re-register from scratch. The fingerprint catches token theft and the daemon will refuse to talk to the platform. |

See also