infra:operations:proxmox-backups: current revision 2026/05/23 13:29 by atluxity_idp.hackeriet.no (page created 2026/05/23 11:59 by atluxity_idp.hackeriet.no).

  * [[infra:hosts:host007|host007]]
  
A weekly vzdump job was observed from the cluster configuration while documenting host006 and host007. The observed job used:
  
  * Schedule: Sunday 03:00

Treat this as observed state, not as a reviewed backup policy.
  
===== Host006 storage finding =====
  
host006 has enough physical storage, but the layout makes local backups fragile:

  * Physical disk observed: about 1 TB NVMe.
  * LVM volume group observed: about 953G, with 0G free.
  * Root filesystem observed: about 94G usable, about 94% used, about 6.2G free.
  * Proxmox local storage is on the root filesystem.
  * /var/lib/vz/dump is not a separate large filesystem on host006; it resolves to the crowded root filesystem.
  * local-lvm is the large thin pool for VM disks, not for file-based backup dumps.
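
The layout above can be re-checked on a node with a short script. This is a minimal sketch, assuming Python 3 is installed on the node; the helper names are hypothetical, not Proxmox tooling. It compares the device ID (''st_dev'') of /var/lib/vz/dump and / to tell whether the dump directory lives on the root filesystem, and reports usage with Use% computed roughly the way df does (used divided by used plus available, so root-reserved blocks are left out).

```python
import os
import shutil

GIB = 1024 ** 3


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both paths resolve to the same mounted filesystem.

    os.stat() follows symlinks, so a symlinked dump directory is judged
    by its target. st_dev is the ID of the device holding the inode.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def usage_summary(path: str) -> dict:
    """Size, used and free space (GiB) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    # Like df, compute use% against used + available space, so blocks
    # reserved for root do not count toward the percentage.
    denominator = usage.used + usage.free
    used_pct = 100 * usage.used / denominator if denominator else 0.0
    return {
        "total_gib": round(usage.total / GIB, 1),
        "used_gib": round(usage.used / GIB, 1),
        "free_gib": round(usage.free / GIB, 1),
        "used_pct": round(used_pct, 1),
    }


def main(dump_dir: str = "/var/lib/vz/dump") -> None:
    if not os.path.isdir(dump_dir):
        print(f"{dump_dir} not found; run this on a Proxmox node")
        return
    shares = same_filesystem(dump_dir, "/")
    print(f"{dump_dir} {'shares' if shares else 'does not share'} the root filesystem")
    print(usage_summary(dump_dir))


if __name__ == "__main__":
    main()
```

If the observations above still hold, host006 should report that the dump directory shares the root filesystem, while host007 should not.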

Several recent vzdump failures on host006 had errors like:
  
  * vma_queue_write: write error - Broken pipe
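
To see which backups hit this error, the logs can be scanned for the exact error text. A minimal sketch, assuming vzdump's plain-text per-guest .log files sit next to the archives in /var/lib/vz/dump; if the logs live elsewhere on host006, pass that directory to the hypothetical ''scan_log_dir'' helper instead.

```python
from pathlib import Path

# Error text observed in the recent host006 vzdump failures.
ERROR_TEXT = "vma_queue_write: write error - Broken pipe"


def find_error_lines(lines):
    """Yield (line_number, text) for every line that contains ERROR_TEXT."""
    for number, line in enumerate(lines, start=1):
        if ERROR_TEXT in line:
            yield number, line.rstrip("\n")


def scan_log_dir(log_dir: str, pattern: str = "*.log") -> dict:
    """Map each log file directly under log_dir to its matching lines."""
    hits = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going past odd bytes in a log.
        with path.open(errors="replace") as handle:
            found = list(find_error_lines(handle))
        if found:
            hits[str(path)] = found
    return hits


if __name__ == "__main__":
    # Assumption: vzdump keeps a .log next to each archive in the dump dir.
    for log_file, found in scan_log_dir("/var/lib/vz/dump").items():
        for number, text in found:
            print(f"{log_file}:{number}: {text}")
```

Comparing the timestamps in the matching file names with root filesystem usage at the time helps confirm or rule out disk pressure.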
  
Disk pressure on host006 local storage is the first suspect for these failures. Investigate storage before changing guests.
  
===== Contrast: host007 =====
  
host007 has the same cluster backup job, but a healthier storage layout:

  * Root filesystem observed: about 18% used.
  * /var/lib/vz/dump observed as a separate filesystem, about 503G total and about 331G free.
  * Proxmox local storage observed around 17% used.
  * local-lvm observed around 27% used.

This makes the host006 disk-pressure failure mode much less likely on host007.

===== Remediation options =====

Possible ways to reduce recurrence risk:

  * Short term: review and remove obsolete files from /var/lib/vz/dump on host006.
  * Better medium term: add dedicated backup storage for host006, either mounted at /var/lib/vz/dump or added as a new Proxmox storage target.
  * Longer term: use Proxmox Backup Server for clearer retention and deduplicated backups.
  * Avoid casual in-place LVM reshaping; it can put VM disks at risk and should only be done with a maintenance window and a recovery plan.
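
For the short-term item, a review list of old dump files is a safer start than deleting by hand. A minimal dry-run sketch, assuming archives follow vzdump's ''vzdump-*'' file naming and using modification time as age; the 30-day threshold and the helper name are hypothetical. It prints candidates, largest first, and deletes nothing.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple

DAY = 86400
# Hypothetical review threshold; align it with whatever retention is agreed.
MAX_AGE_DAYS = 30


def old_dump_files(dump_dir: str, max_age_days: float,
                   now: Optional[float] = None) -> List[Tuple[int, float, str]]:
    """List vzdump files older than max_age_days as (size, age_days, path).

    Largest files come first. Nothing is deleted: this is a review list.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * DAY
    found = []
    for path in Path(dump_dir).glob("vzdump-*"):
        if not path.is_file():
            continue
        st = path.stat()
        if st.st_mtime < cutoff:
            age_days = round((now - st.st_mtime) / DAY, 1)
            found.append((st.st_size, age_days, str(path)))
    found.sort(reverse=True)
    return found


if __name__ == "__main__":
    candidates = old_dump_files("/var/lib/vz/dump", MAX_AGE_DAYS)
    for size, age, path in candidates:
        print(f"{size / 1024 ** 3:8.1f} GiB  {age:6.1f} days  {path}")
    total = sum(size for size, _, _ in candidates)
    print(f"total: {total / 1024 ** 3:.1f} GiB in {len(candidates)} files")
```

Removal stays a manual decision per file, ideally after confirming the guest has a newer backup that completed.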
  
===== First checks =====