Replace in-memory TTL cache with Redis for SMART data caching #1

Open
opened 2026-03-07 06:34:05 +00:00 by adamksmith · 0 comments
Owner

Problem

The current in-memory TTL cache for SMART data has several limitations:

  1. Cold start penalty — every restart means a full SMART scan of all 78 drives before the first /api/overview response, which shells out to smartctl 78 times sequentially
  2. Per-worker isolation — with 2 uvicorn workers, each maintains its own cache, so the same drive can get polled twice within the TTL window
  3. No persistence — cache evaporates on container restart, no way to serve stale-while-revalidate

Proposed Solution

Replace services/cache.py with a Redis-backed cache using redis.asyncio.

Architecture

jbod-monitor container
  └── FastAPI (uvicorn)
         ├── /api/overview  ──▶  Redis GET "jbod:smart:{device}"
         │                        ├─ HIT  → return cached JSON
         │                        └─ MISS → smartctl → Redis SET w/ TTL → return
         └── Background task  ──▶  Pre-warm loop: scan all drives every N seconds
                                    and populate Redis proactively

Redis (existing homelab instance or sidecar)

Implementation Details

1. Redis connection (services/redis.py)

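import os

import redis.asyncio as redis

# Module-level client, created lazily and shared by all callers in this worker.
_pool: redis.Redis | None = None

async def get_redis() -> redis.Redis:
    """Return the shared async Redis client, creating it on first call."""
    global _pool
    if _pool is None:
        _pool = redis.Redis(
            host=os.getenv("REDIS_HOST", "localhost"),
            port=int(os.getenv("REDIS_PORT", "6379")),
            db=int(os.getenv("REDIS_DB", "0")),
            decode_responses=True,  # return str instead of bytes
        )
    return _pool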
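
The poll loop in section 3 calls a cache_set helper that this issue does not define. The following is a minimal sketch of what cache_set and a matching cache_get could look like, assuming values are stored as JSON strings and the client is passed in explicitly (the proposal's own helper takes no client argument and would presumably call get_redis()). DictClient is a hypothetical in-memory stand-in so the example runs without a Redis server; it mimics only the get(name) and set(name, value, ex=...) calls of the redis.asyncio client.

```python
import asyncio
import json
from typing import Any, Optional


async def cache_set(client: Any, key: str, value: Any, ttl: int) -> None:
    """Serialize value to JSON and store it with an expiry of ttl seconds."""
    await client.set(key, json.dumps(value), ex=ttl)


async def cache_get(client: Any, key: str) -> Optional[Any]:
    """Return the decoded value, or None when the key is absent."""
    raw = await client.get(key)
    return None if raw is None else json.loads(raw)


class DictClient:
    """Hypothetical stand-in for the Redis client; records TTLs but does not expire keys."""

    def __init__(self) -> None:
        self.data = {}
        self.ttls = {}

    async def set(self, key, value, ex=None):
        self.data[key] = value
        self.ttls[key] = ex

    async def get(self, key):
        return self.data.get(key)


async def demo():
    client = DictClient()
    await cache_set(client, "jbod:smart:/dev/sda", {"health": "PASSED"}, ttl=120)
    hit = await cache_get(client, "jbod:smart:/dev/sda")
    miss = await cache_get(client, "jbod:smart:/dev/sdb")
    return hit, miss, client.ttls["jbod:smart:/dev/sda"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Passing the client in keeps the helpers easy to exercise in tests, which may help with the "update cache mocks" acceptance item.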

2. Cache keys and TTLs

| Key pattern | Value | TTL |
|---|---|---|
| jbod:smart:{device} | JSON-encoded SMART response | 120s (default, configurable via SMART_CACHE_TTL) |
| jbod:enclosures | JSON-encoded enclosure list | 300s |
| jbod:overview | JSON-encoded full overview payload | 60s |
| jbod:zfs_map | JSON-encoded device→pool mapping | 300s |
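
A key written by the background poller stays warm only if its TTL is longer than one full refresh cycle (poll interval plus scan time). The sketch below checks the two keys the poll loop in section 3 writes against the default SMART_POLL_INTERVAL of 90s. CacheKey, POLLER_KEYS, the function name, and the scan_seconds parameter are illustrative and not part of the proposal.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CacheKey:
    pattern: str
    ttl_seconds: int


# TTLs copied from the table above; only the keys the poll loop rewrites.
POLLER_KEYS = [
    CacheKey("jbod:smart:{device}", 120),
    CacheKey("jbod:overview", 60),
]


def keys_that_expire_between_polls(
    keys: List[CacheKey], poll_interval: int, scan_seconds: int = 0
) -> List[str]:
    """Return patterns whose TTL does not outlast one full refresh cycle."""
    cycle = poll_interval + scan_seconds
    return [k.pattern for k in keys if k.ttl_seconds <= cycle]


if __name__ == "__main__":
    # With the defaults, the overview key lapses before the next poll.
    print(keys_that_expire_between_polls(POLLER_KEYS, poll_interval=90))
    # -> ['jbod:overview']
```

Under these assumptions jbod:overview would miss for part of every cycle, and jbod:smart:{device} would too once a scan of all drives takes 30s or more.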

3. Background pre-warm task

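import asyncio
import logging
import os

# app, discover_enclosures, fetch_smart, build_overview and cache_set come from the existing application.
logger = logging.getLogger(__name__)

SMART_CACHE_TTL = int(os.getenv("SMART_CACHE_TTL", "120"))
SMART_POLL_INTERVAL = int(os.getenv("SMART_POLL_INTERVAL", "90"))

_poller_task: asyncio.Task | None = None

@app.on_event("startup")
async def start_smart_poller():
    # Keep a reference: the event loop holds only weak references to tasks,
    # so an unreferenced task can be garbage-collected mid-run.
    global _poller_task
    _poller_task = asyncio.create_task(smart_poll_loop())

async def smart_poll_loop():
    """Continuously refresh SMART data in Redis so API reads are normally cache hits."""
    while True:
        try:
            enclosures = await discover_enclosures()
            for enc in enclosures:
                for slot in enc.slots:
                    if slot.populated:
                        data = await fetch_smart(slot.device)
                        await cache_set(f"jbod:smart:{slot.device}", data, ttl=SMART_CACHE_TTL)
            # Rebuild and cache the full overview
            overview = await build_overview(enclosures)
            await cache_set("jbod:overview", overview, ttl=60)
        except Exception:
            # Log with traceback and keep looping; one failed pass should not stop the poller.
            logger.exception("SMART poll loop error")
        await asyncio.sleep(SMART_POLL_INTERVAL)  # default 90s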

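This means the API endpoints become pure Redis reads, so response time no longer depends on drive count (the acceptance target below is <50ms for 78 drives). That only holds while each key outlives a full poll cycle: with the defaults, jbod:overview (60s TTL) expires before the next 90s poll rewrites it, so either raise its TTL above SMART_POLL_INTERVAL plus scan time or expect periodic misses on that key.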
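
The loop above still awaits one smartctl call at a time, so a full pass takes as long as 78 sequential calls. If pass duration becomes a concern, one possible refinement (not part of the proposal as written) is to bound concurrency with asyncio.Semaphore. In this sketch scan_devices, max_concurrent, and the fake fetch function are hypothetical; whether the drives and HBA tolerate parallel smartctl calls is an assumption that would need checking on the real hardware.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def scan_devices(
    devices: List[str],
    fetch: Callable[[str], Awaitable[dict]],
    max_concurrent: int = 4,
) -> Dict[str, dict]:
    """Fetch SMART data for all devices with at most max_concurrent calls in flight."""
    gate = asyncio.Semaphore(max_concurrent)

    async def one(device: str) -> Tuple[str, dict]:
        async with gate:
            return device, await fetch(device)

    # return_exceptions=True keeps one failing drive from aborting the whole pass.
    results = await asyncio.gather(*(one(d) for d in devices), return_exceptions=True)
    return dict(r for r in results if not isinstance(r, BaseException))


async def demo() -> Tuple[int, int]:
    in_flight = 0
    peak = 0

    async def fake_fetch(device: str) -> dict:
        # Hypothetical placeholder for the real smartctl wrapper.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        if device == "/dev/sdc":
            raise RuntimeError("smartctl failed")
        return {"device": device}

    devices = [f"/dev/sd{c}" for c in "abcdefgh"]
    data = await scan_devices(devices, fake_fetch, max_concurrent=3)
    return len(data), peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The demo returns the number of drives that were read successfully and the peak number of concurrent fetches, which never exceeds max_concurrent.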

4. Graceful degradation

  • If Redis is unreachable, fall back to direct smartctl calls (log a warning)
  • If a cache key is missing for a single drive, fetch on-demand and populate
  • Add X-Cache: HIT / X-Cache: MISS response headers for debugging
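
The three rules above can be sketched as a single read path. This is an illustration under stated assumptions, not the project's implementation: read_smart, MemoryClient, DownClient, and the fake fetch function are hypothetical, and the built-in ConnectionError stands in for redis.exceptions.RedisError so the example runs without Redis installed. A Redis outage is reported as X-Cache: MISS because the issue defines only the HIT and MISS values.

```python
import asyncio
import json
import logging
from typing import Any, Awaitable, Callable, Dict, List, Tuple

logger = logging.getLogger("jbod.cache")

# Assumption: with redis-py this tuple would hold redis.exceptions.RedisError.
CACHE_ERRORS = (ConnectionError,)


async def read_smart(
    client: Any,
    device: str,
    fetch: Callable[[str], Awaitable[dict]],
    ttl: int = 120,
) -> Tuple[dict, Dict[str, str]]:
    """Return (data, headers); headers carries X-Cache: HIT or MISS."""
    key = f"jbod:smart:{device}"
    try:
        raw = await client.get(key)
    except CACHE_ERRORS as exc:
        # Redis unreachable: warn and go straight to smartctl.
        logger.warning("Redis unreachable, calling smartctl directly: %s", exc)
        return await fetch(device), {"X-Cache": "MISS"}
    if raw is not None:
        return json.loads(raw), {"X-Cache": "HIT"}
    data = await fetch(device)  # single-key miss: fetch on demand
    try:
        await client.set(key, json.dumps(data), ex=ttl)
    except CACHE_ERRORS as exc:
        logger.warning("Could not populate %s: %s", key, exc)
    return data, {"X-Cache": "MISS"}


class MemoryClient:
    """Hypothetical in-memory stand-in for a reachable Redis."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    async def get(self, key: str):
        return self.data.get(key)

    async def set(self, key: str, value: str, ex=None) -> None:
        self.data[key] = value


class DownClient:
    """Hypothetical stand-in for an unreachable Redis."""

    async def get(self, key: str):
        raise ConnectionError("redis down")

    async def set(self, key: str, value: str, ex=None) -> None:
        raise ConnectionError("redis down")


async def demo() -> List[str]:
    async def fake_fetch(device: str) -> dict:
        return {"device": device, "health": "PASSED"}

    up, down = MemoryClient(), DownClient()
    statuses = []
    for client in (up, up, down):
        _, headers = await read_smart(client, "/dev/sda", fake_fetch)
        statuses.append(headers["X-Cache"])
    return statuses


if __name__ == "__main__":
    print(asyncio.run(demo()))  # -> ['MISS', 'HIT', 'MISS']
```

In a FastAPI handler the returned headers dict could be applied to the response object, which would cover the X-Cache acceptance item for both the /api/drives/* and /api/overview routes.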

5. Docker changes

Add Redis as a sidecar in docker-compose.yml (or point to existing homelab Redis):

services:
  jbod-monitor:
    # ... existing config ...
    environment:
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - REDIS_DB=0
      - SMART_CACHE_TTL=120
      - SMART_POLL_INTERVAL=90
    depends_on:
      - redis

  redis:
    image: redis:7-alpine
    container_name: jbod-redis
    restart: unless-stopped
    volumes:
      - redis-data:/data
    command: redis-server --save 60 1 --loglevel warning

volumes:
  redis-data:

6. Dependencies

Add to requirements.txt:

redis>=5.0.0

Benefits

  • Shared cache across workers — no duplicate polls
  • Instant API responses — background poller keeps Redis warm, API just reads
  • Survives restarts — Redis persistence means stale data is available immediately while poller catches up
  • Observable — redis-cli KEYS "jbod:*" lists the cached keys; redis-cli TTL <key> and GET <key> show each key's remaining lifetime and cached value
  • Configurable — TTLs and poll intervals via env vars, easy to tune

Acceptance Criteria

  • Redis connection with env-based config
  • Background poller pre-warms all drive SMART data
  • /api/overview reads from Redis, responds in <50ms for 78 drives
  • Graceful fallback to direct smartctl when Redis is down
  • X-Cache response header on all /api/drives/* and /api/overview endpoints
  • Docker Compose updated with Redis sidecar
  • Existing tests pass (update cache mocks)
Reference: adamksmith/jbod-monitor#1