Refactor Proxmox and Process Monitor configurations for improved Joanna dispatch logic and update README with new automation references

pull/1719/head
Carlo Costanzo 1 month ago
parent ee5238ce72
commit 11d3050f23

@@ -48,13 +48,13 @@ Live collection of plug-and-play Home Assistant packages. Each YAML file in this
| [mariadb_monitoring.yaml](mariadb_monitoring.yaml) | MariaDB health sensors and Lovelace dashboard snippet for recorder stats. | `sensor.mariadb_status`, `sensor.database_size` |
| [docker_infrastructure.yaml](docker_infrastructure.yaml) | Docker host patching telemetry + container/stack Repairs automation, 20-minute Joanna escalation for persistent container outages using stable configured monitor membership, and weekly scheduled prune actions across docker_10/14/17/69. | `sensor.docker_*_apt_status`, `binary_sensor.*_stack_status`, `sensor.docker_stacks_down_count`, `repairs.create`, `script.joanna_dispatch` |
| [github_watched_repo_scout.yaml](github_watched_repo_scout.yaml) | Nightly Joanna dispatch that reviews unread notifications from watched GitHub repos, recommends HA-config ideas, refreshes strong-candidate issues, and marks processed watched-repo notifications read. | `automation.github_watched_repo_scout_nightly`, `script.joanna_dispatch`, `script.send_to_logbook` |
| [proxmox.yaml](proxmox.yaml) | Proxmox runtime and disk pressure monitoring with Repairs for node degradations plus nightly Frigate reboot. | `binary_sensor.proxmox*_runtime_healthy`, `sensor.proxmox*_disk_used_percentage`, `repairs.create`, `button.qemu_docker2_101_reboot` |
| [proxmox.yaml](proxmox.yaml) | Proxmox runtime and disk pressure monitoring with Repairs + Joanna dispatch for sustained node degradations, plus nightly Frigate reboot. | `binary_sensor.proxmox*_runtime_healthy`, `sensor.proxmox*_disk_used_percentage`, `repairs.create`, `script.joanna_dispatch`, `button.qemu_docker2_101_reboot` |
| [synology_dsm.yaml](synology_dsm.yaml) | Synology DSM integration health normalization for Carlo-NAS01 and Carlo-NVR, with Repairs + Joanna dispatch on sustained integration, security, or storage problems. | `binary_sensor.carlo_*_synology_problem`, `sensor.carlo_*_synology_problem_summary`, `repairs.create`, `script.joanna_dispatch` |
| [infrastructure_observability.yaml](infrastructure_observability.yaml) | Normalized WAN/DNS/backup/domain/cert health + website uptime/latency SLO signals for Infrastructure dashboards. | `binary_sensor.infra_website_uptime_slo_breach`, `binary_sensor.infra_website_latency_degraded`, `binary_sensor.infra_*` |
| [onenote_indexer.yaml](onenote_indexer.yaml) | OneNote indexer health/status monitoring for Joanna, failure-repair automation, and a daily duplicate-delete maintenance request. | `sensor.onenote_indexer_last_job_status`, `binary_sensor.onenote_indexer_last_job_successful` |
| [mqtt_status.yaml](mqtt_status.yaml) | Command-line MQTT broker reachability probe with Spook Repairs escalation and Joanna troubleshooting dispatch on outage. | `binary_sensor.mqtt_status_raw`, `binary_sensor.mqtt_broker_problem`, `repairs.create`, `rest_command.bearclaw_command` |
| [mariadb.yaml](mariadb.yaml) | MariaDB recorder health and capacity SQL sensors. | `sensor.mariadb_status`, `sensor.database_size` |
| [processmonitor.yaml](processmonitor.yaml) | Root filesystem disk-pressure monitoring with early Joanna review at 80% and Repairs + urgent dispatch at 90%. | `sensor.disk_use_percent`, `repairs.create`, `script.joanna_dispatch`, `tts.clear_cache` |
| [processmonitor.yaml](processmonitor.yaml) | Root filesystem disk-pressure monitoring with immediate digest/logbook notes at 80%, Joanna review after 10 minutes above 80%, and delayed phone alerts only if the issue stays unresolved after dispatch. | `sensor.disk_use_percent`, `repairs.create`, `script.joanna_dispatch`, `tts.clear_cache` |
| [tugtainer_updates.yaml](tugtainer_updates.yaml) | Tugtainer container update notifications via webhook + persistent alerts, plus event-based Joanna dispatch when reports include `### Available:` (24h cooldown via `mode: single` + delay, no new helpers). | `persistent_notification.create`, `event: tugtainer_available_detected`, `script.joanna_dispatch`, `input_datetime.tugtainer_last_update` |
| [bearclaw.yaml](bearclaw.yaml) | Joanna/BearClaw bridge automations that forward Telegram commands to codex_appliance, include LLM-first routing context for freeform text, relay replies back, ingest `/api/bearclaw/status` telemetry, and expose dispatch plus QMD/memory-index sensors for Infrastructure dashboards. | `rest_command.bearclaw_*`, `sensor.bearclaw_status_telemetry`, `sensor.joanna_*`, `binary_sensor.joanna_*`, `automation.bearclaw_*`, `script.send_to_logbook` |
| [telegram_bot.yaml](telegram_bot.yaml) | Legacy Telegram transport marker for BearClaw; the shared `joanna_send_telegram` helper now forwards through the codex_appliance direct Telegram API. | `rest_command.bearclaw_telegram_send`, `script.joanna_send_telegram` |

@@ -8,14 +8,15 @@
# -------------------------------------------------------------------
# - Blog: https://www.vcloudinfo.com/2026/04/joanna-agent-engineer-home-assistant-infrastructure-dispatch.html
# Notes: Uses `sensor.disk_use_percent` for the root (`/`) filesystem.
# Notes: 80% usage triggers cleanup-oriented notification + Joanna review.
# Notes: 80% usage writes an immediate activity note; Joanna reviews only after 10 minutes above threshold.
# Notes: Phone alerts happen only after Joanna dispatch and a short unresolved grace period.
# Notes: 90% usage opens a Repairs issue and dispatches Joanna for urgent triage.
######################################################################
automation:
- alias: "Self Heal Disk Use Alarm"
id: b16f2155-4688-4c0f-9cf8-b382e294a029
description: "Warn on elevated root disk usage and request Joanna review before it becomes critical."
description: "Log elevated root disk usage immediately so transient pressure shows up in the digest."
mode: single
trigger:
- platform: numeric_state
@@ -24,36 +25,65 @@ automation:
variables:
mount_path: "/"
disk_use: "{{ states('sensor.disk_use_percent') | float(0) | round(1) }}"
trigger_context: "HA automation b16f2155-4688-4c0f-9cf8-b382e294a029 (Self Heal Disk Use Alarm)"
action:
- service: script.notify_engine
data:
value1: "Hard Drive Monitor:"
value2: "Your harddrive is running out of Space! {{ mount_path }}:{{ disk_use }}%!"
value3: "Attempting to clean"
who: "carlo"
- service: script.send_to_logbook
data:
topic: "SYSTEM"
message: "Disk usage exceeded 80% ({{ mount_path }}: {{ disk_use }}%). Attempting to clean."
message: "Disk usage exceeded 80% ({{ mount_path }}: {{ disk_use }}%). Monitoring for sustained pressure."
- service: tts.clear_cache
- condition: template
value_template: "{{ disk_use | float(0) < 90 }}"
- alias: "Self Heal Disk Use Joanna Review"
id: processmonitor_disk_use_joanna_review
description: "Dispatch Joanna when elevated root disk usage remains above 80% for 10 minutes."
mode: single
trigger:
- platform: numeric_state
entity_id: sensor.disk_use_percent
above: 80
for:
minutes: 10
variables:
mount_path: "/"
disk_use: "{{ states('sensor.disk_use_percent') | float(0) | round(1) }}"
trigger_context: "HA automation processmonitor_disk_use_joanna_review (Self Heal Disk Use Joanna Review)"
condition:
- condition: numeric_state
entity_id: sensor.disk_use_percent
below: 90
action:
- service: script.joanna_dispatch
data:
trigger_context: "{{ trigger_context }}"
source: "home_assistant_automation.self_heal_disk_use_alarm"
summary: "Home Assistant root disk usage exceeded 80%"
source: "home_assistant_automation.processmonitor_disk_use_joanna_review"
summary: "Home Assistant root disk usage remained above 80% for 10 minutes"
entity_ids:
- "sensor.disk_use_percent"
diagnostics: >-
mount_path={{ mount_path }},
disk_use={{ disk_use }},
threshold=80
threshold=80,
sustained_for=10m
request: >-
Review Home Assistant disk growth and recommend safe cleanup actions.
Check recorder/database size, logs, cache, backups, and temporary files.
Do not restart Home Assistant or remove data unless explicitly requested.
- service: script.send_to_logbook
data:
topic: "SYSTEM"
message: >-
Disk usage remained above 80% for 10 minutes ({{ mount_path }}: {{ disk_use }}%).
Joanna review requested.
- delay: "00:05:00"
- condition: numeric_state
entity_id: sensor.disk_use_percent
above: 80
below: 90
- service: script.notify_engine
data:
value1: "Hard Drive Monitor:"
value2: "Joanna is reviewing sustained Home Assistant disk usage at {{ mount_path }}:{{ states('sensor.disk_use_percent') | float(0) | round(1) }}%."
value3: "No phone alert was sent until the issue stayed unresolved."
who: "carlo"
- alias: "Disk Use Alarm"
id: 1ce3cb43-0e27-4c53-acdd-d672396f3559
@@ -69,17 +99,6 @@ automation:
disk_use: "{{ states('sensor.disk_use_percent') | float(0) | round(1) }}"
trigger_context: "HA automation 1ce3cb43-0e27-4c53-acdd-d672396f3559 (Disk Use Alarm)"
action:
- service: script.notify_engine
data:
value1: "Hard Drive Monitor:"
value2: "Your harddrive is running out of Space! {{ mount_path }}:{{ disk_use }}%!"
who: "carlo"
- service: script.send_to_logbook
data:
topic: "SYSTEM"
message: >-
Disk usage exceeded 90% ({{ mount_path }}: {{ disk_use }}%).
Repair {{ issue_id }} opened and Joanna investigation requested.
- service: repairs.create
data:
issue_id: "{{ issue_id }}"
@@ -107,6 +126,22 @@ automation:
Investigate critical Home Assistant disk usage and recommend or perform safe remediation if available.
Check recorder/database size, logs, cache, backups, and temporary files first.
Do not restart Home Assistant or prune/delete data unless explicitly requested.
- service: script.send_to_logbook
data:
topic: "SYSTEM"
message: >-
Disk usage exceeded 90% ({{ mount_path }}: {{ disk_use }}%).
Repair {{ issue_id }} opened and Joanna investigation requested.
- delay: "00:05:00"
- condition: numeric_state
entity_id: sensor.disk_use_percent
above: 90
- service: script.notify_engine
data:
value1: "Hard Drive Monitor:"
value2: "Critical Home Assistant disk usage is still active at {{ mount_path }}:{{ states('sensor.disk_use_percent') | float(0) | round(1) }}%."
value3: "Joanna has already been dispatched to investigate."
who: "carlo"
- alias: "Disk Use Alarm Recovery"
id: processmonitor_disk_use_alarm_recovery
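
The new 80% path splits into an immediate note, a sustained-pressure dispatch, and a delayed phone alert. The Python below is a minimal model of that timeline, not Home Assistant code. It assumes one disk-use reading per minute, that the Self Heal Disk Use Alarm trigger (truncated in this view) fires on the upward 80% crossing, and that readings hold steady between samples. The function name `simulate_review` is hypothetical.

```python
def simulate_review(samples, threshold=80.0, critical=90.0, sustain=10, grace=5):
    """Replay one disk-use reading per minute through the 80% review path.

    Returns sorted (minute, event) tuples:
    - "logbook_note" on each upward crossing of `threshold` (assumed trigger),
    - "joanna_dispatch" once the reading has stayed above `threshold` for
      `sustain` minutes and is still below `critical` (the `below: 90` condition),
    - "phone_alert" if the reading is still strictly between the two
      thresholds `grace` minutes after dispatch (the 00:05:00 delay + condition).
    """
    events = []
    above_since = None  # minute of the current upward crossing, if any
    fired = False       # the `for: 10 minutes` trigger fires once per crossing
    for minute, pct in enumerate(samples):
        if pct <= threshold:
            above_since = None  # numeric_state `above: 80` is strict
            continue
        if above_since is None:
            above_since, fired = minute, False
            events.append((minute, "logbook_note"))
        if not fired and minute - above_since >= sustain:
            fired = True
            if pct < critical:
                events.append((minute, "joanna_dispatch"))
                check = minute + grace
                if check < len(samples) and threshold < samples[check] < critical:
                    events.append((check, "phone_alert"))
    return sorted(events)
```

For example, a five-minute spike to 85% yields only the logbook note, while pressure that clears within five minutes of dispatch never reaches the phone.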

@@ -3,12 +3,13 @@
# For more info visit https://www.vcloudinfo.com/click-here
# Original Repo : https://github.com/CCOSTAN/Home-AssistantConfig
# -------------------------------------------------------------------
# Proxmox Host Automations - reboots and update alerts
# Nightly Frigate host reboot plus update repair issues.
# Proxmox Host Automations - reboots, repairs, and Joanna dispatch
# Nightly Frigate host reboot plus update/runtime/disk health automations.
# -------------------------------------------------------------------
# Related Issue: 1584
# Notes: Creates HA repair issues when proxmox nodes report updates.
# Notes: Adds normalized runtime + disk health signals for dashboard/alerts.
# Notes: Joanna dispatch is reserved for sustained runtime and disk-pressure degradations.
######################################################################
template:
- sensor:
@@ -148,6 +149,28 @@ automation:
{% else %}
proxmox02_runtime_unhealthy
{% endif %}
runtime_entity: >-
{% if 'proxmox1' in trigger.entity_id %}
binary_sensor.proxmox1_runtime_healthy
{% else %}
binary_sensor.proxmox02_runtime_healthy
{% endif %}
status_entity: >-
{% if 'proxmox1' in trigger.entity_id %}
{% if states('binary_sensor.node_proxmox1_status') not in ['unknown', 'unavailable', 'none', ''] %}
binary_sensor.node_proxmox1_status
{% else %}
sensor.node_proxmox1_status
{% endif %}
{% else %}
{% if states('binary_sensor.node_proxmox02_status') not in ['unknown', 'unavailable', 'none', ''] %}
binary_sensor.node_proxmox02_status
{% else %}
sensor.node_proxmox02_status
{% endif %}
{% endif %}
status_value: "{{ states(status_entity) }}"
trigger_context: "HA automation proxmox_runtime_repairs (Proxmox Runtime Repair Issues)"
action:
- choose:
- conditions: "{{ trigger.to_state.state == 'off' }}"
@@ -164,10 +187,30 @@ automation:
description: >
{{ node_name }} has remained offline for over 2 minutes.
Check node status in Proxmox and restore runtime.
- service: script.joanna_dispatch
data:
trigger_context: "{{ trigger_context }}"
source: "home_assistant_automation.proxmox_runtime_repairs"
summary: "{{ node_name }} runtime has remained degraded for over 2 minutes"
entity_ids:
- "{{ runtime_entity }}"
- "{{ status_entity }}"
diagnostics: >-
issue_id={{ issue_id }},
node_name={{ node_name }},
runtime_entity={{ runtime_entity }},
status_entity={{ status_entity }},
status_value={{ status_value }},
unhealthy_for=2m
request: >-
Investigate {{ node_name }} runtime degradation and restore node availability if possible.
Check host status, cluster connectivity, storage reachability, and recent update activity first.
Do not reboot the host unless explicitly requested.
- service: script.send_to_logbook
data:
topic: "PROXMOX"
message: "{{ node_name }} runtime is degraded."
message: >-
{{ node_name }} runtime is degraded. Repair {{ issue_id }} opened and Joanna investigation requested.
default:
- service: repairs.remove
continue_on_error: true
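
The `runtime_entity` and `status_entity` variables above pick a node from the trigger entity and prefer the binary status sensor, falling back to the plain sensor when the binary one has no usable value. The Python below is a minimal model of that selection, not Home Assistant code. It assumes a plain dict stands in for `states()`, with missing entities reading as `unknown` (which is what `states()` returns for a nonexistent entity). The names `node_key`, `pick_status_entity`, and `UNUSABLE` are hypothetical.

```python
# Values the status_entity template treats as "no usable reading".
UNUSABLE = {"unknown", "unavailable", "none", ""}


def node_key(trigger_entity_id):
    """Mirror the `'proxmox1' in trigger.entity_id` test used for every variable."""
    return "proxmox1" if "proxmox1" in trigger_entity_id else "proxmox02"


def pick_status_entity(node, states):
    """Return the status entity the template would select for `node`.

    `states` is a plain dict modeling Home Assistant's states();
    a missing entity is treated as "unknown".
    """
    binary = f"binary_sensor.node_{node}_status"
    if states.get(binary, "unknown") not in UNUSABLE:
        return binary
    return f"sensor.node_{node}_status"
```
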
@@ -188,11 +231,26 @@ automation:
- sensor.proxmox1_disk_used_percentage
- sensor.proxmox02_disk_used_percentage
above: 85
below: 92
for: "00:15:00"
id: warning
- platform: numeric_state
entity_id:
- sensor.proxmox1_disk_used_percentage
- sensor.proxmox02_disk_used_percentage
above: 92
id: critical
- platform: state
entity_id:
- sensor.proxmox1_disk_used_percentage
- sensor.proxmox02_disk_used_percentage
id: band_change
- platform: numeric_state
entity_id:
- sensor.proxmox1_disk_used_percentage
- sensor.proxmox02_disk_used_percentage
below: 85
id: recovered
variables:
node_name: >-
{% if 'proxmox1' in trigger.entity_id %}Proxmox1{% else %}Proxmox02{% endif %}
@@ -202,10 +260,33 @@ automation:
{% else %}
proxmox02_disk_pressure
{% endif %}
disk_pct: "{{ states(trigger.entity_id) | float(0) }}"
disk_entity: "{{ trigger.entity_id }}"
raw_disk_entity: >-
{% if 'proxmox1' in trigger.entity_id %}
sensor.node_proxmox1_disk_used_percentage
{% else %}
sensor.node_proxmox02_disk_used_percentage
{% endif %}
disk_pct: "{{ states(disk_entity) | float(0) }}"
previous_disk_pct: >-
{% if trigger.from_state is not none and trigger.from_state.state not in ['unknown', 'unavailable', 'none', ''] %}
{{ trigger.from_state.state | float(0) }}
{% else %}
0
{% endif %}
previous_band: >-
{% if previous_disk_pct >= 92 %}
critical
{% elif previous_disk_pct >= 85 %}
warning
{% else %}
normal
{% endif %}
action:
- choose:
- conditions: "{{ disk_pct >= 92 }}"
- conditions:
- condition: trigger
id: critical
sequence:
- service: repairs.create
data:
@@ -216,11 +297,36 @@ automation:
description: >
{{ node_name }} disk usage is critically high.
Free disk space or expand storage allocation.
- service: script.joanna_dispatch
data:
trigger_context: "HA automation proxmox_disk_pressure_repairs (Proxmox Disk Pressure Repair Issues - Critical)"
source: "home_assistant_automation.proxmox_disk_pressure_repairs.critical"
summary: "{{ node_name }} disk pressure is critical at {{ disk_pct | round(1) }}%"
entity_ids:
- "{{ disk_entity }}"
- "{{ raw_disk_entity }}"
diagnostics: >-
issue_id={{ issue_id }},
node_name={{ node_name }},
disk_entity={{ disk_entity }},
raw_disk_entity={{ raw_disk_entity }},
disk_pct={{ disk_pct | round(1) }},
threshold=92
request: >-
Investigate critical disk pressure on {{ node_name }} and recommend safe remediation.
Check local storage usage, backups, logs, snapshots, and VM or container disk consumers first.
Do not delete VM disks or reboot the host unless explicitly requested.
- service: script.send_to_logbook
data:
topic: "PROXMOX"
message: "{{ node_name }} disk usage is critical at {{ disk_pct | round(1) }}%."
- conditions: "{{ disk_pct >= 85 }}"
message: >-
{{ node_name }} disk usage is critical at {{ disk_pct | round(1) }}%.
Repair {{ issue_id }} opened and Joanna investigation requested.
- conditions:
- condition: trigger
id: warning
- condition: template
value_template: "{{ previous_band != 'critical' }}"
sequence:
- service: repairs.create
data:
@@ -231,12 +337,52 @@ automation:
description: >
{{ node_name }} disk usage has stayed above 85% for 15 minutes.
Plan cleanup before capacity reaches critical levels.
- service: script.joanna_dispatch
data:
trigger_context: "HA automation proxmox_disk_pressure_repairs (Proxmox Disk Pressure Repair Issues - Warning)"
source: "home_assistant_automation.proxmox_disk_pressure_repairs.warning"
summary: "{{ node_name }} disk pressure warning at {{ disk_pct | round(1) }}%"
entity_ids:
- "{{ disk_entity }}"
- "{{ raw_disk_entity }}"
diagnostics: >-
issue_id={{ issue_id }},
node_name={{ node_name }},
disk_entity={{ disk_entity }},
raw_disk_entity={{ raw_disk_entity }},
disk_pct={{ disk_pct | round(1) }},
threshold=85,
sustained_for=15m
request: >-
Investigate elevated disk usage on {{ node_name }} and recommend safe cleanup actions before it becomes critical.
Check local storage usage, backups, logs, snapshots, and VM or container disk consumers first.
Do not delete VM disks or reboot the host unless explicitly requested.
- service: script.send_to_logbook
data:
topic: "PROXMOX"
message: "{{ node_name }} disk usage warning at {{ disk_pct | round(1) }}%."
default:
- service: repairs.remove
continue_on_error: true
data:
issue_id: "{{ issue_id }}"
message: >-
{{ node_name }} disk usage warning at {{ disk_pct | round(1) }}%.
Repair {{ issue_id }} opened and Joanna investigation requested.
- conditions:
- condition: trigger
id: band_change
- condition: template
value_template: "{{ previous_band == 'critical' and disk_pct >= 85 and disk_pct < 92 }}"
sequence:
- service: repairs.create
data:
issue_id: "{{ issue_id }}"
severity: warning
persistent: true
title: "{{ node_name }} disk pressure warning ({{ disk_pct | round(1) }}%)"
description: >
{{ node_name }} disk usage is elevated but no longer critical.
Plan cleanup before capacity reaches critical levels again.
- conditions:
- condition: trigger
id: recovered
sequence:
- service: repairs.remove
continue_on_error: true
data:
issue_id: "{{ issue_id }}"
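
The reworked disk-pressure `choose` now routes by trigger id and uses the previous reading's band, so a warning trigger cannot overwrite an open critical Repair and a drop from critical into the 85-92% range downgrades the Repair instead of removing it. The Python below is a minimal model of that routing, not Home Assistant code. It assumes `from_state` is passed as the previous state string (or `None`), and the function and return labels are hypothetical names for the YAML branches.

```python
UNUSABLE = {"unknown", "unavailable", "none", ""}


def previous_pct(from_state):
    """Mirror previous_disk_pct: 0 when there is no usable prior reading."""
    if from_state is None or from_state in UNUSABLE:
        return 0.0
    try:
        return float(from_state)
    except ValueError:
        return 0.0  # like Jinja's float(0) default for non-numeric text


def band(pct):
    """Mirror previous_band: >=92 critical, >=85 warning, otherwise normal."""
    if pct >= 92:
        return "critical"
    if pct >= 85:
        return "warning"
    return "normal"


def choose_branch(trigger_id, disk_pct, from_state):
    """Return which choose branch would run; None means no branch matches."""
    prev = band(previous_pct(from_state))
    if trigger_id == "critical":
        return "critical_repair_and_dispatch"
    if trigger_id == "warning" and prev != "critical":
        return "warning_repair_and_dispatch"
    if trigger_id == "band_change" and prev == "critical" and 85 <= disk_pct < 92:
        return "downgrade_repair_to_warning"
    if trigger_id == "recovered":
        return "remove_repair"
    return None
```

Routing on the trigger id rather than on `disk_pct` alone keeps each Repair transition tied to the event that caused it; the `band_change` state trigger exists only to catch the critical-to-warning step.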

@@ -29,8 +29,7 @@ template:
'binary_sensor.carlo_nas01_drive_3_below_min_remaining_life',
'binary_sensor.carlo_nas01_drive_1_exceeded_max_bad_sectors',
'binary_sensor.carlo_nas01_drive_2_exceeded_max_bad_sectors',
'binary_sensor.carlo_nas01_drive_3_exceeded_max_bad_sectors',
'update.carlo_nas01_dsm_update'
'binary_sensor.carlo_nas01_drive_3_exceeded_max_bad_sectors'
] %}
{% set ns = namespace(problem=false) %}
{% for id in ids %}
@@ -86,8 +85,7 @@ template:
'binary_sensor.carlo_nvr_drive_1_below_min_remaining_life',
'binary_sensor.carlo_nvr_drive_2_below_min_remaining_life',
'binary_sensor.carlo_nvr_drive_1_exceeded_max_bad_sectors',
'binary_sensor.carlo_nvr_drive_2_exceeded_max_bad_sectors',
'update.carlo_nvr_dsm_update'
'binary_sensor.carlo_nvr_drive_2_exceeded_max_bad_sectors'
] %}
{% set ns = namespace(problem=false) %}
{% for id in ids %}
@@ -422,13 +420,6 @@ automation:
dsm_update: {{ dsm_update_state }}
ssh_alias: {{ ssh_alias }}
dsm_url: {{ dsm_url }}
- service: script.send_to_logbook
data:
topic: "SYNOLOGY"
message: >-
{{ host_name }} reported a Synology DSM problem for 10 minutes.
Repair {{ issue_id }} opened and Joanna investigation requested.
Summary: {{ problem_summary }}.
- service: script.joanna_dispatch
data:
trigger_context: "{{ trigger_context }}"
@@ -450,6 +441,13 @@ automation:
Investigate {{ host_name }} using the Home Assistant Synology DSM entities first, then DSM or SSH if needed.
Review security status, drive health, volume health, and integration availability.
Do not reboot or shut down the NAS unless explicitly requested.
- service: script.send_to_logbook
data:
topic: "SYNOLOGY"
message: >-
{{ host_name }} reported a Synology DSM problem for 10 minutes.
Repair {{ issue_id }} opened and Joanna investigation requested.
Summary: {{ problem_summary }}.
- id: synology_dsm_clear_repair_on_recovery
alias: "Synology DSM - Clear Repair On Recovery"

@@ -60,9 +60,13 @@ Current automations that kick off automated resolutions (via `script.joanna_disp
| `infra_backup_nightly_verification` | Infrastructure - Backup Nightly Verification | [../packages/infrastructure_observability.yaml](../packages/infrastructure_observability.yaml) |
| `docker_state_sync_repairs_dynamic` | Docker State Sync - Repairs (Dynamic) | [../packages/docker_infrastructure.yaml](../packages/docker_infrastructure.yaml) |
| `docker_group_reconcile_weekly_joanna_review` | Docker Group Reconcile - Weekly Joanna Review | [../packages/docker_infrastructure.yaml](../packages/docker_infrastructure.yaml) |
| `tugtainer_dispatch_joanna_for_available_updates` | Tugtainer - Dispatch Joanna For Available Updates | [../packages/tugtainer_updates.yaml](../packages/tugtainer_updates.yaml) |
| `tugtainer_dispatch_joanna_for_home_assistant_core_digest` | Tugtainer - Dispatch Joanna For Home Assistant Core Digest | [../packages/tugtainer_updates.yaml](../packages/tugtainer_updates.yaml) |
| `unifi_ap_no_clients_repair_combined` | Unifi AP Create Repair Issue after 5m of 0 Clients | [../packages/wireless.yaml](../packages/wireless.yaml) |
| `proxmox_runtime_repairs` | Proxmox Runtime Repair Issues | [../packages/proxmox.yaml](../packages/proxmox.yaml) |
| `proxmox_disk_pressure_repairs` | Proxmox Disk Pressure Repair Issues | [../packages/proxmox.yaml](../packages/proxmox.yaml) |
| `synology_dsm_open_repair_and_dispatch` | Synology DSM - Open Repair And Dispatch | [../packages/synology_dsm.yaml](../packages/synology_dsm.yaml) |
| `b16f2155-4688-4c0f-9cf8-b382e294a029` | Self Heal Disk Use Alarm | [../packages/processmonitor.yaml](../packages/processmonitor.yaml) |
| `processmonitor_disk_use_joanna_review` | Self Heal Disk Use Joanna Review | [../packages/processmonitor.yaml](../packages/processmonitor.yaml) |
| `1ce3cb43-0e27-4c53-acdd-d672396f3559` | Disk Use Alarm | [../packages/processmonitor.yaml](../packages/processmonitor.yaml) |
### Tips
