infra:operations:proxmox-backups
Differences
This shows you the differences between two versions of the page.
| infra:operations:proxmox-backups [2026/05/23 12:56] – atluxity_idp.hackeriet.no | infra:operations:proxmox-backups [2026/05/23 13:29] (current) – atluxity_idp.hackeriet.no | ||
|---|---|---|---|
| Line 14: | Line 14: | ||
| * [[infra: | * [[infra: | ||
| - | A weekly vzdump job was observed from the cluster configuration while documenting host006. The observed job used: | + | A weekly vzdump job was observed from the cluster configuration while documenting host006 |
| * Schedule: Sunday 03:00 | * Schedule: Sunday 03:00 | ||
| Line 25: | Line 25: | ||
| Treat this as observed state, not as a reviewed backup policy. | Treat this as observed state, not as a reviewed backup policy. | ||
| - | ===== Temporary host006 diagnostics | + | ===== Host006 storage finding |
| - | Temporary increased logging was installed on host006 | + | host006 |
| - | Installed files: | + | * Physical disk observed: about 1 TB NVMe. |
| + | * LVM volume group observed: about 953G, with 0G free. | ||
| + | * Root filesystem observed: about 94G usable, about 94% used, about 6.2G free. | ||
| + | * Proxmox local storage is on the root filesystem. | ||
| + | * / | ||
| + | * local-lvm is the large thin pool for VM disks, not file-based backup dumps. | ||
| - | * / | + | Several recent |
| - | * / | + | |
| - | * / | + | |
| - | * / | + | |
| - | * / | + | |
| - | Log file: | + | * vma_queue_write: write error - Broken pipe |
| - | * / | + | Disk pressure on host006 local storage is the first suspect for these failures. Investigate storage before changing guests. |
| - | Timer schedule: | + | ===== Contrast: host007 ===== |
| - | * Every 5 minutes, with a small randomized delay. | + | host007 has the same cluster backup job, but a healthier storage layout: |
| - | * Sunday 02:45, shortly before the observed weekly 03:00 vzdump job. | + | |
| - | The script captures df, pvesm status, du -sh / | + | * Root filesystem observed: about 18% used. |
| + | * / | ||
| + | * Proxmox | ||
| + | * local-lvm observed around 27% used. | ||
| - | Useful commands: | + | This makes the host006 disk-pressure failure mode much less likely on host007. |
| - | * systemctl list-timers --all hackeriet-vzdump-* --no-pager | ||
| - | * tail -200 / | ||
| - | * systemctl stop hackeriet-vzdump-watch.timer hackeriet-vzdump-prebackup.timer | ||
| - | * systemctl disable hackeriet-vzdump-watch.timer hackeriet-vzdump-prebackup.timer | ||
| - | Review and remove this temporary monitoring after the backup issue is understood. | + | ===== Remediation options ===== |
| - | ===== Known issue: host006 disk pressure ===== | + | Possible ways to reduce recurrence risk: |
| - | + | ||
| - | host006 had a nearly full root filesystem when documented. Its / | + | |
| - | + | ||
| - | * vma_queue_write: | + | |
| - | + | ||
| - | Disk pressure on host006 is the first suspect for those failures. Investigate storage before changing guests. | + | |
| - | + | ||
| - | ===== Contrast: host007 ===== | + | |
| - | host007 had healthier storage when documented. Its / | + | * Short term: review and remove obsolete files from / |
| + | * Better medium term: add dedicated backup storage for host006, either mounted at / | ||
| + | * Longer term: use Proxmox Backup Server for clearer retention and deduplicated | ||
| + | * Avoid casual in-place LVM reshaping; it can put VM disks at risk and should only be done with a maintenance window and recovery plan. | ||
| ===== First checks ===== | ===== First checks ===== | ||
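
The host006 storage finding above rests on how full the root filesystem is, so a quick usage check can be sketched as follows. This is a minimal sketch, not something taken from the page: the function names `fs_usage_percent` and `fs_over_threshold` and the 90 percent threshold are assumptions chosen for illustration. It relies only on POSIX `df -P` and `awk`.

```shell
#!/bin/sh
# Print the usage percentage (integer, without the % sign) of the
# filesystem that holds the given path. Hypothetical helper name.
fs_usage_percent() {
    # df -P prints one line per filesystem; column 5 is the capacity.
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) when usage is at or above the threshold.
# Hypothetical helper name.
fs_over_threshold() {
    usage=$(fs_usage_percent "$1")
    [ "$usage" -ge "$2" ]
}

# The threshold of 90 is an assumed value, not a reviewed policy.
if fs_over_threshold / 90; then
    echo "WARNING: root filesystem is at $(fs_usage_percent /)% used"
else
    echo "OK: root filesystem is at $(fs_usage_percent /)% used"
fi
```

Running this on each host shortly before the Sunday 03:00 job gives a simple yes/no signal about disk pressure; the same functions accept any other mounted path as the first argument.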