This is an old revision of the document!


Proxmox backups

This page documents the Proxmox backup context discovered while investigating host006. It is an operational orientation page, not a complete backup policy.

Scope

This page is about Proxmox guest backups on the Hackeriet Proxmox cluster. Do not confuse this with unrelated service-specific backups such as forum.hausmania.org backups.

Current cluster context

The Proxmox cluster documented so far is klynge001, with verified nodes:

A weekly vzdump job was observed in the cluster configuration while documenting host006. The observed job used:

  • Schedule: Sunday 03:00
  • Mode: snapshot
  • Storage: local
  • Compression: zstd
  • Retention: keep last 1
  • Failure mail: backupmail@hackeriet.no

Treat this as observed state, not as a reviewed backup policy.
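
To re-check the observed job against the live cluster, the backup job list can be read through the Proxmox API. This is a minimal read-only sketch, assuming it runs as root on a cluster node. The function name and the PVESH override variable are illustrative and not part of any installed tooling.

```shell
#!/bin/sh
# Read-only: list the backup jobs defined for the cluster.
# PVESH is a hypothetical override so the function can be exercised off-node;
# on a Proxmox node it falls back to the real pvesh client.
show_backup_jobs() {
    "${PVESH:-pvesh}" get /cluster/backup --output-format json-pretty
}
```

Calling show_backup_jobs on a node prints the settings of each job, which can then be compared with the list above.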

Temporary host006 diagnostics

Temporary additional logging was installed on host006 on 2026-05-23 to capture host-level storage and Proxmox context around backup failures.

Installed files:

  • /usr/local/sbin/hackeriet-vzdump-watch
  • /etc/systemd/system/hackeriet-vzdump-watch.service
  • /etc/systemd/system/hackeriet-vzdump-watch.timer
  • /etc/systemd/system/hackeriet-vzdump-prebackup.timer
  • /etc/logrotate.d/hackeriet-vzdump-watch

Log file:

  • /var/log/hackeriet/vzdump-watch.log

Timer schedule:

  • Every 5 minutes, with a small randomized delay.
  • Sunday 02:45, shortly before the observed weekly 03:00 vzdump job.

The script captures df, pvesm status, du -sh /var/lib/vz/dump, recent dump files, failed systemd units, recent kernel storage/filesystem messages, recent Proxmox service journal entries, and recent Proxmox task log filenames. It does not touch guests, backup retention, or Proxmox configuration.
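
The installed script is not reproduced on this page. The following is only a sketch of how such a capture could be structured, covering a subset of the items listed above. The function names, the 15-minute journal window, and the choice of pvedaemon and pvescheduler as the Proxmox services to query are assumptions, not a copy of /usr/local/sbin/hackeriet-vzdump-watch.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the installed hackeriet-vzdump-watch script.
DUMP_DIR="${DUMP_DIR:-/var/lib/vz/dump}"

# Run one command and label its output. A failing or missing command is
# noted in the output instead of aborting the rest of the capture.
section() {
    echo "--- $* ---"
    "$@" 2>&1 || echo "(command failed or unavailable: $1)"
}

# Emit one timestamped capture on stdout.
# Example: capture >> /var/log/hackeriet/vzdump-watch.log
capture() {
    echo "=== vzdump-watch $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
    section df -h
    section pvesm status
    section du -sh "$DUMP_DIR"
    section systemctl --failed --no-pager
    section journalctl -k --since "15 minutes ago" --no-pager
    section journalctl -u pvedaemon -u pvescheduler --since "15 minutes ago" --no-pager
}
```

Every command in the sketch only reads state, which matches the intent that the diagnostics must not touch guests, retention, or Proxmox configuration.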

Useful commands:

  • systemctl list-timers --all 'hackeriet-vzdump-*' --no-pager
  • tail -200 /var/log/hackeriet/vzdump-watch.log
  • systemctl stop hackeriet-vzdump-watch.timer hackeriet-vzdump-prebackup.timer
  • systemctl disable hackeriet-vzdump-watch.timer hackeriet-vzdump-prebackup.timer

Review and remove this temporary monitoring after the backup issue is understood.
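
When the time comes, removal could look roughly like the sketch below. It is a dry run by default and only prints what it would do. The helper names and the DRY_RUN variable are hypothetical; the unit names and file paths are the ones listed under "Installed files". The log directory is intentionally left alone so the captured evidence survives the cleanup.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop and disable both timers, delete the installed files, reload systemd.
remove_vzdump_watch() {
    run systemctl disable --now hackeriet-vzdump-watch.timer hackeriet-vzdump-prebackup.timer
    run rm -f \
        /usr/local/sbin/hackeriet-vzdump-watch \
        /etc/systemd/system/hackeriet-vzdump-watch.service \
        /etc/systemd/system/hackeriet-vzdump-watch.timer \
        /etc/systemd/system/hackeriet-vzdump-prebackup.timer \
        /etc/logrotate.d/hackeriet-vzdump-watch
    run systemctl daemon-reload
    # /var/log/hackeriet/vzdump-watch.log is deliberately kept.
}
```

Review the printed commands first, then rerun with DRY_RUN=0 as root on host006 to apply them.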

Known issue: host006 disk pressure

host006 had a nearly full root filesystem when documented. Its /var/lib/vz/dump directory was large and contained old vzdump backups and ISOs. Several recent vzdump failures on host006 had errors like:

  • vma_queue_write: write error - Broken pipe
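
To see which tasks hit this error, the task logs can be searched for the message. This is a small sketch, assuming task logs are under /var/log/pve/tasks (the usual Proxmox VE location, not verified on host006). The function name is illustrative.

```shell
#!/bin/sh
# List the files under a task log directory that contain the write error.
# Defaults to /var/log/pve/tasks (assumed location); pass another directory
# as the first argument to search elsewhere.
find_write_errors() {
    grep -rl 'vma_queue_write: write error' "${1:-/var/log/pve/tasks}" 2>/dev/null
}
```

The function prints one matching file path per line and returns non-zero when nothing matches.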

Disk pressure on host006 is the first suspect for those failures. Investigate storage before changing guests.
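
A quick way to put a number on "disk pressure" is to read the used percentage of the filesystem that holds a path. The sketch below does that with df. The function names and the default threshold of 90 percent are arbitrary illustrations, not an agreed alerting limit.

```shell
#!/bin/sh
# Print the used percentage (integer, without the % sign) of the
# filesystem that holds the given path.
fs_use_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is at or above the threshold (default 90, arbitrary).
under_pressure() {
    [ "$(fs_use_percent "$1")" -ge "${2:-90}" ]
}
```

For example, `under_pressure / && echo "root filesystem is nearly full"` flags the condition seen on host006.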

Contrast: host007

host007 had healthier storage when documented. Its /var/lib/vz/dump was a separate filesystem with substantial free space. This does not prove backups are healthy, but it makes the host006 disk-pressure failure mode less likely on host007.
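
Whether the dump directory shares a filesystem with root can be checked on any node by comparing mount points. This is a minimal sketch with illustrative function names; it assumes mount points without spaces.

```shell
#!/bin/sh
# Print the mount point of the filesystem that holds the given path.
mount_point_of() {
    df -P "$1" 2>/dev/null | awk 'NR==2 { print $6 }'
}

# Succeed only when both paths resolve to the same mounted filesystem.
# A path that cannot be resolved counts as "not the same".
same_filesystem() {
    a=$(mount_point_of "$1")
    b=$(mount_point_of "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_filesystem / /var/lib/vz/dump && echo "dumps share the root filesystem"` distinguishes the host006 layout from the host007 layout.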

First checks

On the relevant Proxmox host:

  • df -h
  • du -sh /var/lib/vz/dump
  • pvesm status
  • systemctl --failed

In Proxmox, check the backup job configuration and recent task logs before deleting files or changing retention.
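
Before anything is deleted, a read-only inventory of the largest files in the dump directory helps to identify what is actually using the space. The sketch below is one way to produce it. The function name and the default of ten entries are illustrative.

```shell
#!/bin/sh
# Print the N largest regular files under a directory, largest first,
# as "size-in-KiB<TAB>path". Read-only: nothing is modified or deleted.
largest_files() {
    find "$1" -type f -exec du -k {} + | sort -rn | head -n "${2:-10}"
}
```

For example, `largest_files /var/lib/vz/dump 20` lists the twenty biggest files, which can then be matched against guests and ISOs before any cleanup decision.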

Safety notes

  • Do not delete backups or ISOs during an incident without understanding what they are.
  • Do not change guest VM state unless the incident requires it.
  • Do not mix up Proxmox guest backups with application-specific backup systems.
/srv/hackeriet-wiki/dokuwiki/data/attic/infra/operations/proxmox-backups.1779541008.txt.gz · Last modified: by atluxity_idp.hackeriet.no