infra:operations:proxmox-backups

Differences

This shows you the differences between two versions of the page.

infra:operations:proxmox-backups [2026/05/23 12:56] atluxity_idp.hackeriet.no infra:operations:proxmox-backups [2026/05/23 13:29] (current) atluxity_idp.hackeriet.no
Line 14: Line 14:
   * [[infra:hosts:host007|host007]]   * [[infra:hosts:host007|host007]]
  
-A weekly vzdump job was observed from the cluster configuration while documenting host006. The observed job used:+A weekly vzdump job was observed from the cluster configuration while documenting host006 and host007. The observed job used:
  
   * Schedule: Sunday 03:00   * Schedule: Sunday 03:00
Line 25: Line 25:
 Treat this as observed state, not as a reviewed backup policy. Treat this as observed state, not as a reviewed backup policy.
  
-===== Temporary host006 diagnostics =====+===== Host006 storage finding =====
  
-Temporary increased logging was installed on host006 on 2026-05-23 to capture host-level storage and Proxmox context around backup failures.+host006 has enough physical storage, but the layout makes local backups fragile:
  
-Installed files:+  * Physical disk observedabout 1 TB NVMe. 
 +  * LVM volume group observed: about 953G, with 0G free. 
 +  * Root filesystem observed: about 94G usable, about 94% used, about 6.2G free. 
 +  * Proxmox local storage is on the root filesystem. 
 +  * /var/lib/vz/dump is not a separate large filesystem on host006; it resolves to the crowded root filesystem. 
 +  * local-lvm is the large thin pool for VM disks, not file-based backup dumps.
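
The observation that /var/lib/vz/dump resolves to the root filesystem can be rechecked with a small script. This is a minimal sketch, not a tool that exists on the hosts: the function name `dump_dir_report` and the returned keys are hypothetical, and it only reads filesystem metadata. It compares the device ID of the dump directory with that of `/` and reports free space.

```python
import os
import shutil


def dump_dir_report(dump_dir="/var/lib/vz/dump", root="/"):
    """Report whether dump_dir shares a filesystem with root, plus space usage.

    Hypothetical helper for illustration; read-only, touches no guests or config.
    """
    # Two paths on the same mounted filesystem report the same st_dev.
    same_fs = os.stat(dump_dir).st_dev == os.stat(root).st_dev
    usage = shutil.disk_usage(dump_dir)
    used_percent = 100 * usage.used / usage.total if usage.total else 0.0
    return {
        "same_filesystem_as_root": same_fs,
        "total_gib": usage.total / 2**30,
        "free_gib": usage.free / 2**30,
        "used_percent": used_percent,
    }
```

Calling `dump_dir_report()` on a host where the dump directory is just a directory on the root filesystem should return `same_filesystem_as_root` as true. A separately mounted dump filesystem should return false.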
  
-  * /usr/local/sbin/hackeriet-vzdump-watch +Several recent vzdump failures on host006 had errors like:
-  * /etc/systemd/system/hackeriet-vzdump-watch.service +
-  * /etc/systemd/system/hackeriet-vzdump-watch.timer +
-  * /etc/systemd/system/hackeriet-vzdump-prebackup.timer +
-  * /etc/logrotate.d/hackeriet-vzdump-watch+
  
-Log file:+  * vma_queue_write: write error - Broken pipe
  
-  * /var/log/hackeriet/vzdump-watch.log+Disk pressure on host006 local storage is the first suspect for these failures. Investigate storage before changing guests.
  
-Timer schedule:+===== Contrast: host007 =====
  
-  * Every 5 minutes, with small randomized delay. +host007 has the same cluster backup job, but healthier storage layout:
-  * Sunday 02:45, shortly before the observed weekly 03:00 vzdump job.+
  
-The script captures df, pvesm status, du -sh /var/lib/vz/dump, recent dump files, failed systemd units, recent kernel storage/filesystem messages, recent Proxmox service journal, and recent Proxmox task log filenames. It does not touch guests, backup retention, or Proxmox configuration.+  * Root filesystem observed: about 18% used. 
 +  * /var/lib/vz/dump observed as a separate filesystem, about 503G total and about 331G free. 
 +  * Proxmox local storage observed around 17% used. 
 +  * local-lvm observed around 27% used.
  
-Useful commands:+This makes the host006 disk-pressure failure mode much less likely on host007.
  
-  * systemctl list-timers --all hackeriet-vzdump-* --no-pager 
-  * tail -200 /var/log/hackeriet/vzdump-watch.log 
-  * systemctl stop hackeriet-vzdump-watch.timer hackeriet-vzdump-prebackup.timer 
-  * systemctl disable hackeriet-vzdump-watch.timer hackeriet-vzdump-prebackup.timer 
  
-Review and remove this temporary monitoring after the backup issue is understood.+===== Remediation options =====
  
-===== Known issue: host006 disk pressure ===== +Possible ways to reduce recurrence risk:
- +
-host006 had a nearly full root filesystem when documented. Its /var/lib/vz/dump directory was large and contained old vzdump backups and ISOs. Several recent vzdump failures on host006 had errors like: +
- +
-  * vma_queue_write: write error - Broken pipe +
- +
-Disk pressure on host006 is the first suspect for those failures. Investigate storage before changing guests. +
- +
-===== Contrast: host007 =====+
  
-host007 had healthier storage when documented. Its /var/lib/vz/dump was a separate filesystem with substantial free space. This does not prove backups are healthy, but it makes the host006 disk-pressure failure mode less likely on host007.+  * Short term: review and remove obsolete files from /var/lib/vz/dump on host006. 
 +  * Better medium term: add dedicated backup storage for host006, either mounted at /var/lib/vz/dump or added as a new Proxmox storage target. 
 +  * Longer term: use Proxmox Backup Server for clearer retention and deduplicated backups. 
 +  * Avoid casual in-place LVM reshaping; it can put VM disks at risk and should only be done with a maintenance window and recovery plan.
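
For the short-term option, a read-only listing of old files in the dump directory can help decide what to review. This is a minimal sketch under assumptions: the function name `old_dump_files` and the 30-day default are hypothetical and do not reflect any reviewed retention policy. It deletes nothing.

```python
import time
from pathlib import Path


def old_dump_files(dump_dir, min_age_days=30, now=None):
    """List regular files older than min_age_days, largest first.

    Returns (path, size_bytes, age_days) tuples. Read-only: nothing is removed.
    The 30-day default is an illustrative assumption, not a policy.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    results = []
    for entry in Path(dump_dir).iterdir():
        if not entry.is_file():
            continue  # skip subdirectories and other non-regular entries
        st = entry.stat()
        if st.st_mtime < cutoff:
            age_days = (now - st.st_mtime) / 86400
            results.append((entry, st.st_size, age_days))
    # Largest files first, since they free the most space if removed.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

For example, `old_dump_files("/var/lib/vz/dump")` returns candidates for manual review. Whether a given backup or ISO is actually obsolete still needs a human decision.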
  
 ===== First checks ===== ===== First checks =====
/srv/hackeriet-wiki/dokuwiki/data/attic/infra/operations/proxmox-backups.1779541008.txt.gz · Last modified: by atluxity_idp.hackeriet.no