====== Proxmox backups ======

This page documents the Proxmox backup context discovered while investigating host006. It is an operational orientation page, not a complete backup policy.

===== Scope =====

This page covers Proxmox guest backups on the Hackeriet Proxmox cluster. Do not confuse them with unrelated service-specific backups, such as the forum.hausmania.org backups.

===== Current cluster context =====

The Proxmox cluster documented so far is [[infra:clusters:klynge001|klynge001]], with these verified nodes:

  * [[infra:hosts:host006|host006]]
  * [[infra:hosts:host007|host007]]

A weekly vzdump job was observed in the cluster configuration while documenting host006 and host007. The observed job used:

  * Schedule: Sunday 03:00
  * Mode: snapshot
  * Storage: local
  * Compression: zstd
  * Retention: keep last 1 (only the newest dump per guest is kept)
  * Failure mail: backupmail@hackeriet.no

Treat this as observed state, not as a reviewed backup policy.

===== Host006 storage finding =====

host006 has enough physical storage, but its layout makes local backups fragile:

  * Physical disk observed: about 1 TB NVMe.
  * LVM volume group observed: about 953G, with 0G free.
  * Root filesystem observed: about 94G usable, about 94% used, about 6.2G free.
  * Proxmox local storage is on the root filesystem.
  * /var/lib/vz/dump is not a separate large filesystem on host006; it resolves to the crowded root filesystem.
  * local-lvm is the large LVM thin pool for guest disks; it cannot hold file-based vzdump archives.

Several recent vzdump failures on host006 had errors like:

  * vma_queue_write: write error - Broken pipe

Disk pressure on host006 local storage is the first suspect for these failures. A broken pipe means the process receiving the backup stream stopped reading, which is consistent with the archive write running out of space. Keep-last 1 does not avoid this: vzdump normally prunes the previous dump only after the new one has been written, so each guest briefly needs room for two dumps. Investigate storage before changing guests.

===== Contrast: host007 =====

host007 runs the same cluster backup job, but has a healthier storage layout:

  * Root filesystem observed: about 18% used.
  * /var/lib/vz/dump observed as a separate filesystem, about 503G total and about 331G free.
  * Proxmox local storage observed around 17% used.
  * local-lvm observed around 27% used.

This makes the host006 disk-pressure failure mode much less likely on host007.

===== Remediation options =====

Possible ways to reduce the risk of recurrence:

  * Short term: review and remove obsolete files from /var/lib/vz/dump on host006.
  * Medium term, and more robust: add dedicated backup storage for host006, either mounted at /var/lib/vz/dump or added as a new Proxmox storage target.
  * Longer term: use Proxmox Backup Server for clearer retention and deduplicated backups.
  * Avoid ad-hoc in-place LVM resizing; it can put VM disks at risk and should only be done in a maintenance window with a recovery plan.

===== First checks =====

On the relevant Proxmox host, run:

  * df -h
  * du -sh /var/lib/vz/dump
  * pvesm status
  * systemctl --failed

In the Proxmox web UI, check the backup job configuration (Datacenter > Backup) and recent task logs before deleting files or changing retention.

===== Safety notes =====

  * Do not delete backups or ISOs during an incident without understanding what they are.
  * Do not change guest VM state unless the incident requires it.
  * Do not mix up Proxmox guest backups with application-specific backup systems.
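
===== Sketch: scripted storage check =====

The first checks above are normally run by hand. If a repeatable, read-only version is useful, the Python sketch below gathers similar numbers: whether /var/lib/vz/dump sits on the same filesystem as /, free space and Use% roughly as df computes them, and the apparent size of the dump directory (closer to du -s --apparent-size than to du -sh). It is not an existing Hackeriet tool. The function names and the 50 GiB free-space threshold are hypothetical choices made for illustration.

```python
import os
import shutil

GIB = 1024 ** 3


def dir_size_bytes(path):
    """Sum of file sizes under path (apparent size, like `du -s --apparent-size`)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            # lstat: count a symlink itself, not the file it points to
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def dump_storage_report(dump_dir="/var/lib/vz/dump", root="/", min_free_gib=50.0):
    """Read-only report on the filesystem that holds the vzdump directory.

    min_free_gib is a hypothetical alert threshold, not a Proxmox setting.
    """
    real_dir = os.path.realpath(dump_dir)  # follow symlinks to the real location
    # Equal st_dev values mean both paths live on the same filesystem.
    on_root_fs = os.stat(real_dir).st_dev == os.stat(root).st_dev
    usage = shutil.disk_usage(real_dir)  # total/used/free in bytes
    free_gib = usage.free / GIB
    # df's Use% is used / (used + available), leaving reserved blocks out.
    denom = usage.used + usage.free
    used_pct = 100.0 * usage.used / denom if denom else 0.0

    warnings = []
    if on_root_fs:
        warnings.append("dump directory shares a filesystem with /")
    if free_gib < min_free_gib:
        warnings.append(f"{free_gib:.1f} GiB free is below the {min_free_gib} GiB threshold")

    return {
        "dump_dir": real_dir,
        "on_root_fs": on_root_fs,
        "free_gib": round(free_gib, 1),
        "used_pct": round(used_pct, 1),
        "dump_bytes": dir_size_bytes(real_dir),
        "warnings": warnings,
    }
```

Given the findings above, calling dump_storage_report() on host006 should report on_root_fs as True, while on host007 the separate dump filesystem should make it False. The st_dev comparison is used instead of parsing mount tables because it needs nothing beyond the standard library.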
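
===== Sketch: read-only dump inventory =====

Before the short-term cleanup listed under the remediation options, it helps to see what /var/lib/vz/dump actually contains. This Python sketch groups files by guest and flags guests holding more dumps than the observed keep-last 1 retention would leave. It only lists files and never deletes anything, in line with the safety notes. It assumes the usual vzdump file naming (vzdump-<qemu or lxc>-<vmid>-<YYYY_MM_DD-HH_MM_SS>.<ext>); anything that does not match is reported as unrecognized rather than guessed at. The function name is hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumption: standard vzdump file naming, e.g.
#   vzdump-qemu-100-2024_01_07-03_00_02.vma.zst  (plus .log / .notes siblings)
VZDUMP_RE = re.compile(
    r"^vzdump-(?P<kind>qemu|lxc)-(?P<vmid>\d+)-"
    r"(?P<stamp>\d{4}_\d{2}_\d{2}-\d{2}_\d{2}_\d{2})\.(?P<ext>.+)$"
)


def inventory(dump_dir="/var/lib/vz/dump", keep_last=1):
    """Group dump files per guest. Read-only: lists files, never deletes."""
    stamps = defaultdict(set)  # (kind, vmid) -> distinct backup timestamps
    sizes = defaultdict(int)   # (kind, vmid) -> total bytes, including logs/notes
    unrecognized = []
    with os.scandir(dump_dir) as entries:
        for entry in entries:
            match = VZDUMP_RE.match(entry.name)
            if match is None or not entry.is_file(follow_symlinks=False):
                unrecognized.append(entry.name)
                continue
            key = (match["kind"], int(match["vmid"]))
            stamps[key].add(match["stamp"])
            sizes[key] += entry.stat(follow_symlinks=False).st_size

    guests = {key: {"backups": len(stamps[key]), "bytes": sizes[key]} for key in sorted(stamps)}
    # More distinct timestamps than retention keeps: candidates for review, not deletion.
    over_retention = [key for key, info in guests.items() if info["backups"] > keep_last]
    return {"guests": guests, "over_retention": over_retention, "unrecognized": sorted(unrecognized)}
```

Extra dumps may be manual or protected backups, so the over_retention list is a list of things to review, not a list of things to delete.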