Integrate ZFS pool health and alerting into the monitoring stack #2
Problem
The JBOD monitor currently treats ZFS pool membership as a static label: it shows which pool a drive belongs to but has no visibility into actual ZFS health. Pool degradation, scrub errors, capacity warnings, and resilver progress are all invisible until you SSH in and run zpool status manually.
Since we already have the drive-level SMART data and enclosure topology, ZFS health is the missing layer that ties physical hardware state to logical storage state.
Proposed Solution
Add a ZFS health module that polls pool status, scrub history, and per-vdev error counters, then surfaces alerts through both the API and the frontend dashboard.
Architecture
Implementation Details
1. ZFS Pool Poller (services/zfs.py)
Extend the existing get_zfs_pool_map() into a full ZFS health service:
2. Data Model (models/zfs_schemas.py)
3. Alert Engine (services/zfs_alerts.py)
Generate alerts from pool health data. Each alert has a severity and maps back to the physical drive/enclosure when possible.
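A minimal sketch of how alert generation might work, including the slot lookup this section calls for. The names Severity, Alert, build_alerts, and slot_map, the shape of the pool dict, and the 80% capacity threshold are all assumptions made for illustration; none of them come from the existing codebase.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(str, Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass
class Alert:
    severity: Severity
    pool: str
    message: str
    device: Optional[str] = None
    enclosure_id: Optional[str] = None  # filled in when the device is found in slot_map
    slot: Optional[int] = None


def build_alerts(pool: dict, slot_map: dict, capacity_warn_pct: float = 80.0) -> list[Alert]:
    """Derive alerts from one pool's health data.

    `pool` is a hypothetical dict with keys name, state, capacity_pct, devices.
    `slot_map` maps a device name to an (enclosure_id, slot) tuple.
    """
    alerts: list[Alert] = []
    name = pool["name"]

    # Any state other than ONLINE is treated as critical in this sketch.
    if pool["state"] != "ONLINE":
        alerts.append(Alert(Severity.CRITICAL, name, f"pool state is {pool['state']}"))

    # The capacity threshold is an assumed default, not a project setting.
    if pool["capacity_pct"] >= capacity_warn_pct:
        alerts.append(Alert(Severity.WARNING, name, f"pool is {pool['capacity_pct']:.0f}% full"))

    for dev in pool.get("devices", []):
        errors = dev["read_errors"] + dev["write_errors"] + dev["checksum_errors"]
        if errors == 0:
            continue
        # Cross-reference the enclosure topology; unknown devices keep None.
        enclosure_id, slot = slot_map.get(dev["name"], (None, None))
        alerts.append(
            Alert(
                Severity.WARNING,
                name,
                f"{dev['name']} has {errors} read/write/checksum errors",
                device=dev["name"],
                enclosure_id=enclosure_id,
                slot=slot,
            )
        )
    return alerts
```

Keeping the enclosure and slot as optional fields means an alert is still emitted when a device cannot be matched to a bay; the frontend would simply have nothing to highlight for it.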
Key feature: When an alert references a device (e.g., sda has checksum errors), cross-reference the enclosure topology to include the physical slot and enclosure ID. This lets the frontend highlight the exact bay in the grid view.
4. Redis Cache Keys
jbod:zfs:pools
jbod:zfs:alerts
jbod:zfs:pool:{name}
Polled in the same background loop as SMART data.
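As a rough illustration, the three keys could be populated from one poll result like this. What each key holds (all pools, active alerts, one pool's detail) is an assumption inferred from the key names, as are JSON serialization and the helper name build_cache_entries.

```python
from __future__ import annotations

import json

POOLS_KEY = "jbod:zfs:pools"
ALERTS_KEY = "jbod:zfs:alerts"
POOL_KEY_TEMPLATE = "jbod:zfs:pool:{name}"


def build_cache_entries(pools: list[dict], alerts: list[dict]) -> dict[str, str]:
    """Return a key -> JSON string mapping for one polling cycle.

    Assumed contents: the full pool list, the active alert list,
    and one entry per pool keyed by its name.
    """
    entries = {
        POOLS_KEY: json.dumps(pools),
        ALERTS_KEY: json.dumps(alerts),
    }
    for pool in pools:
        entries[POOL_KEY_TEMPLATE.format(name=pool["name"])] = json.dumps(pool)
    return entries
```

Building the mapping separately from the write keeps this step testable without a Redis server; the background loop would then store each pair with whatever client call it already uses for SMART data.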
5. API Endpoints
GET /api/zfs/pools
Returns all pools with full health, vdev tree, scrub status, and capacity.
GET /api/zfs/pools/{name}
Single pool detail with full vdev tree expanded.
GET /api/zfs/alerts
Active alerts only. Filterable by ?severity=critical&pool=tank.
GET /api/overview (extended)
Add top-level fields:
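The acceptance criteria name three overview fields: zfs_healthy, zfs_alerts, and zfs_pools. A hedged sketch of how they, and the alert filter, might be computed follows. The field types are assumptions: zfs_healthy as a boolean that is true only when every pool is ONLINE, zfs_alerts as the list of active alerts, and zfs_pools as a list of pool names. The function names are hypothetical, and the web framework wiring is left out.

```python
from __future__ import annotations

from typing import Optional


def filter_alerts(
    alerts: list[dict],
    severity: Optional[str] = None,
    pool: Optional[str] = None,
) -> list[dict]:
    """Apply the optional ?severity= and ?pool= query filters."""
    result = alerts
    if severity is not None:
        result = [a for a in result if a["severity"] == severity]
    if pool is not None:
        result = [a for a in result if a["pool"] == pool]
    return result


def overview_fields(pools: list[dict], alerts: list[dict]) -> dict:
    """Build the extra top-level fields for /api/overview (assumed shapes)."""
    return {
        # Assumption: healthy means every pool reports ONLINE.
        "zfs_healthy": all(p["state"] == "ONLINE" for p in pools),
        "zfs_alerts": alerts,
        # Assumption: a list of pool names rather than full pool objects.
        "zfs_pools": [p["name"] for p in pools],
    }
```

Omitting both query parameters returns the alert list unchanged, which matches an unfiltered GET /api/zfs/alerts.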
6. Frontend Integration
Extend the existing dashboard:
zfs_alerts is non-empty (color-coded by severity, dismissible per session)
Acceptance Criteria
services/zfs.py parses zpool status -P and zpool list -Hp into typed models
/api/zfs/pools, /api/zfs/alerts endpoints working
/api/overview extended with zfs_healthy, zfs_alerts, zfs_pools fields
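For the first criterion, parsing zpool list -Hp into a typed model could look roughly like the following. The PoolSummary dataclass, the function names, and the choice of columns passed to -o are assumptions made for illustration; parsing zpool status -P is not covered here.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass

# Columns requested explicitly so the field order is fixed (an assumed selection).
COLUMNS = "name,size,alloc,free,cap,health"


@dataclass
class PoolSummary:
    name: str
    size_bytes: int
    allocated_bytes: int
    free_bytes: int
    capacity_pct: int
    health: str


def parse_zpool_list(output: str) -> list[PoolSummary]:
    """Parse tab-separated, header-less output of `zpool list -Hp -o COLUMNS`."""
    pools: list[PoolSummary] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, size, alloc, free, cap, health = line.split("\t")
        pools.append(
            PoolSummary(
                name=name,
                size_bytes=int(size),
                allocated_bytes=int(alloc),
                free_bytes=int(free),
                # Strip a trailing % defensively in case -p is not in effect.
                capacity_pct=int(cap.rstrip("%")),
                health=health,
            )
        )
    return pools


def read_pools() -> list[PoolSummary]:
    """Run zpool and parse its output; requires ZFS on the host."""
    result = subprocess.run(
        ["zpool", "list", "-Hp", "-o", COLUMNS],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_zpool_list(result.stdout)
```

Separating parse_zpool_list from read_pools lets the parser be tested against captured output on machines without ZFS installed.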