Proxmox maintenance

This page documents maintenance for the Hackeriet Proxmox hosts in klynge001. It is a runbook and maintenance log.

Current scope

Hosts currently covered by this procedure:

Last maintenance: 2026-05-31

Scope: host006 and host007 in the klynge001 Proxmox cluster.

Actions performed:

Final state after maintenance:

Follow-up actions completed after the maintenance:

What we learned

Maintenance procedure

Work on one host at a time. Never reboot host006 and host007 at the same time.

Before making changes on either host:

hostname -f
pveversion -v
uname -r
pvecm status
systemctl --failed --no-pager
pvesm status
df -h -x tmpfs -x devtmpfs
qm list
cat /etc/pve/jobs.cfg
apt update
apt list --upgradable
test -f /var/run/reboot-required && cat /var/run/reboot-required || true
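
To compare host state before and after maintenance, it can help to save the output of the checks above to a file. The following is a minimal sketch, not part of the documented procedure: the function name run_checks and the output path in the usage comment are hypothetical. Each command runs in turn, and a failing command is recorded rather than aborting the run.

```shell
#!/bin/sh
# run_checks OUTFILE CMD...
# Runs each CMD through sh -c and appends a header plus its output to OUTFILE.
# A command that fails is noted with its exit status; the loop continues.
run_checks() {
    out=$1
    shift
    : > "$out"    # create or truncate the output file
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd" >> "$out"
        sh -c "$cmd" >> "$out" 2>&1 || printf '(exit status %s)\n' "$?" >> "$out"
    done
}

# Example call (the path is an assumption, pick any location with free space):
# run_checks "/root/precheck-$(hostname -s)-$(date +%F).txt" \
#     'hostname -f' 'pveversion -v' 'uname -r' 'pvecm status' \
#     'systemctl --failed --no-pager' 'pvesm status' 'qm list'
```

Running the same call again after the reboot with a different file name gives two files that can be compared with diff.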

For host006, also check local storage pressure:

du -sh /var/lib/vz/dump /var/lib/vz/template/iso /var/log /var/cache/apt /root/proxmox-templates /var/lib/fail2ban

Update flow for each host:

  1. Confirm cluster health with pvecm status.
  2. Confirm storage health with pvesm status and df -h.
  3. Review running guests with qm list.
  4. Review failed units with systemctl --failed --no-pager.
  5. Simulate package changes if the update set is large or risky.
  6. Apply updates only after reviewing the package set.
  7. Reboot only if required or clearly useful, such as after a kernel update.
  8. After reboot, wait for the node to return and confirm cluster health before touching the next host.

Suggested commands after review:

apt-get -s full-upgrade
apt full-upgrade
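
When reviewing the simulated run, it can be useful to reduce the output to package names and to see whether a kernel package is included, since that usually means a reboot. This is a rough sketch with hypothetical helper names. It assumes that the simulation prints one line starting with "Inst" per package to be installed or upgraded, and that kernel packages are named proxmox-kernel-* or pve-kernel-*.

```shell
#!/bin/sh
# Read `apt-get -s full-upgrade` output on stdin and print the package
# name from every "Inst" line (one name per line).
list_sim_packages() {
    awk '$1 == "Inst" { print $2 }'
}

# Read the same output on stdin; succeed if any pending package name
# looks like a Proxmox kernel (naming pattern is an assumption).
sim_has_kernel() {
    list_sim_packages | grep -Eq '^(proxmox|pve)-kernel-'
}

# Example use on a host:
# apt-get -s full-upgrade | list_sim_packages
# apt-get -s full-upgrade | sim_has_kernel && echo "kernel update pending"
```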

Quorum during reboot

When one node is rebooted, the remaining node may temporarily lose quorum. If that happens during planned maintenance, set expected votes to 1 on the remaining node:

pvecm expected 1
pvecm status

After the rebooted node rejoins, confirm the cluster has returned to two nodes and expected votes 2:

pvecm status

Do not use this as an incident workaround without understanding which node has the correct cluster state.
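
The check that the cluster is back to two nodes can be scripted instead of read by eye. The sketch below is an illustration only. The function name cluster_is_healthy is hypothetical, and it assumes that pvecm status prints the fields "Quorate:", "Expected votes:" and "Total votes:" and that each node has one vote.

```shell
#!/bin/sh
# Read `pvecm status` output on stdin. Succeed only if the cluster is
# quorate and both expected and total votes equal $1 (default 2).
cluster_is_healthy() {
    want=${1:-2}
    awk -v want="$want" '
        /^Quorate:/        { quorate = $2 }
        /^Expected votes:/ { expected = $3 }
        /^Total votes:/    { total = $3 }
        END { exit !(quorate == "Yes" && expected == want && total == want) }
    '
}

# Example: wait for the rebooted node to rejoin before moving on.
# until pvecm status | cluster_is_healthy 2; do sleep 10; done
```

If expected votes were lowered to 1 during the reboot, this check keeps failing until the value is back at 2, which is the state to confirm before touching the next host.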

Post-host checks

After each host update or reboot:

hostname -f
pveversion
uname -r
pvecm status
systemctl --failed --no-pager
pvesm status
df -h -x tmpfs -x devtmpfs
qm list
apt list --upgradable

Check guests affected by the touched host:

qm status <vmid>
qm agent <vmid> ping
qm guest cmd <vmid> network-get-interfaces
qm config <vmid> | sed -n '/^ipconfig/p;/^net/p'
ping -c 2 <ip>
nc -vz -w3 <ip> 22
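
To run the first two checks for every running guest without typing each VMID, the list can be taken from qm list. This is a small sketch with a hypothetical helper name. It assumes the usual qm list layout with a header line and the columns VMID, NAME and STATUS in that order, and guest names without spaces.

```shell
#!/bin/sh
# Read `qm list` output on stdin and print the VMID of every guest
# whose STATUS column is "running". The header line is skipped.
running_vmids() {
    awk 'NR > 1 && $3 == "running" { print $1 }'
}

# Example use on the touched host:
# qm list | running_vmids | while read -r vmid; do
#     qm status "$vmid"
#     qm agent "$vmid" ping || echo "guest agent not responding on $vmid"
# done
```

Guests without the guest agent installed will fail the ping step, so a failure there is only meaningful for guests where the agent is expected.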

Interpretation:

Monitoring / LibreNMS

As of 2026-06-01, host006 and host007 are monitored in LibreNMS as Proxmox hypervisors. This replaces the old Munin host monitoring for these nodes.

LibreNMS records:

Host-side setup:

extend proxmox /usr/bin/sudo /usr/local/libexec/librenms-proxmox

Firewall setup:

IN ACCEPT -source 10.10.50.51/32 -dest +klynge001 -p udp -dport 161 -log nolog # LibreNMS SNMP polling from app-01

LibreNMS setup:

Useful verification commands:

# On each Proxmox host
systemctl is-active snmpd
ss -lunp | grep ':161'
sudo -u Debian-snmp sudo /usr/local/libexec/librenms-proxmox

# From the LibreNMS container on app-01
snmpwalk -v3 -l authPriv -u librenms_klynge001 -a SHA -A '<auth password>' -x AES -X '<privacy password>' host006.hackeriet.no SNMPv2-MIB::sysName.0
snmpget -v3 -l authPriv -u librenms_klynge001 -a SHA -A '<auth password>' -x AES -X '<privacy password>' -Oqv host006.hackeriet.no .1.3.6.1.4.1.8072.1.3.2.3.1.2.7.112.114.111.120.109.111.120
lnms device:poll -m applications host006.hackeriet.no
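
The long numeric OID in the snmpget command is not arbitrary. It is the NET-SNMP-EXTEND-MIB nsExtendOutputFull column (.1.3.6.1.4.1.8072.1.3.2.3.1.2) followed by the extend name as an index: first its length, then the ASCII code of each character. For "proxmox" that gives 7.112.114.111.120.109.111.120. The sketch below rebuilds the OID from a name; the function name extend_oid is hypothetical and the name is assumed to be plain ASCII.

```shell
#!/bin/sh
# Print the nsExtendOutputFull OID for the snmpd "extend" entry named $1.
# The index is the name length followed by each byte as a decimal number.
extend_oid() {
    name=$1
    oid=".1.3.6.1.4.1.8072.1.3.2.3.1.2.${#name}"
    # od -An -tu1 prints every input byte as an unsigned decimal value
    for byte in $(printf '%s' "$name" | od -An -tu1); do
        oid="$oid.$byte"
    done
    printf '%s\n' "$oid"
}

# Example:
# extend_oid proxmox
```

This makes it easier to check a different extend entry later without looking up the numeric form by hand.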

Follow-up items

Other notes