Proxmox maintenance
This is a runbook for planned maintenance on the Hackeriet Proxmox hosts in klynge001. It is documentation and procedure, not inventory. Use NetBox for canonical device, IP, cabling, and VM placement data.
Current scope
Planned hosts:
host006
host007
Current goals:
Bring host006 and host007 up to date.
Review failed services and storage health.
Keep the cluster healthy while working one host at a time.
Avoid guest-level changes unless needed for recovery.
Announcement draft
Planned Proxmox maintenance for Hackeriet
I plan to do maintenance on the Proxmox hosts host006 and host007 in the klynge001 cluster within the next few days.
Scope:
Expected impact:
Some VMs and services may be briefly unavailable.
I will avoid guest-level changes unless needed for recovery.
I will work on one host at a time and check cluster health between steps.
DNS and service risk
Live DNS checks on 2026-05-23 showed that hackeriet.no has two authoritative nameservers:
ns0.hackeriet.no - Hackeriet hosted, resolves to blade at 185.35.202.202 and 2a02:ed06::202
ns.hyp.net - external nameserver, resolves to 194.63.248.53 and 2a01:5b40:0:248::53
Both authoritative nameservers served the same SOA serial when checked. DNS resolution should survive a short outage of ns0 because ns.hyp.net is external and in sync. Do not treat this as service redundancy: names will still resolve, but the services behind them stay down while their host is down.
Important service dependencies observed:
wiki.hackeriet.no points to blade.
hackeriet.no MX points to blade.
ns0.hackeriet.no points to blade.
ip.hackeriet.no and nms.hackeriet.no point through ingress.
blade is currently documented as a VM on host007.
ingress is VM 510 on host006.
idp1 was observed on host007.
Maintenance implications:
Rebooting host007 can affect blade, public web, wiki, mail target, ns0, and likely IDP.
Rebooting host006 can affect ingress-routed services such as NetBox and LibreNMS.
Avoid DNS zone edits during the maintenance window.
Keep this runbook available locally before starting, because wiki and NetBox may be affected.
Certificate automation for internal Proxmox hostnames is documented at Proxmox ACME DNS automation.
Pre-maintenance checks
Run on both host006 and host007 before making changes:
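A minimal sketch of these checks, built from the commands named in the maintenance procedure further down. The function name pve_pre_checks is made up for this example, and pveversion is an addition for the maintenance notes rather than something this runbook requires. Everything here is read-only.

```shell
# Read-only health checks. Run as root on the Proxmox host itself.
# pve_pre_checks is a hypothetical helper name, not an existing tool.
pve_pre_checks() {
    pvecm status                    # quorum and cluster membership
    pvesm status                    # state and usage of each configured storage
    df -h                           # filesystem usage, including the root filesystem
    systemctl --failed --no-pager   # units currently in a failed state
    pveversion                      # installed Proxmox VE version, for the notes
}
```

Define the function in a root shell on each host, run pve_pre_checks, and save the output locally so it can be compared with the post-host checks.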
On host006, also check local storage pressure:
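One way to look at the storage pressure, assuming the local backups still live under /var/lib/vz/dump as described in the Host006 notes. The function name and the 20-line limit are choices made for this sketch. Nothing is deleted or modified.

```shell
# Read-only report on root filesystem usage and local backup dumps.
# local_storage_report is a hypothetical helper name.
# Optional argument: dump directory (defaults to /var/lib/vz/dump).
local_storage_report() {
    dump_dir="${1:-/var/lib/vz/dump}"
    df -h /                               # usage of the root filesystem
    du -sh "$dump_dir"                    # total size of the local backups
    ls -lhS "$dump_dir" | head -n 20      # largest files first
}
```

Running local_storage_report without arguments on host006 should show whether the dump directory is still the main consumer of root filesystem space.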
Before rebooting anything, check DNS redundancy:
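A sketch of this check using dig against the two nameservers listed under DNS and service risk. The function names are made up for this example. The serial is the third field of dig's short SOA output.

```shell
# Print the SOA serial that one nameserver returns for a zone.
# Usage: soa_serial <zone> <nameserver>
soa_serial() {
    # +short SOA fields: mname rname serial refresh retry expire minimum
    dig +short SOA "$1" "@$2" | awk 'NR == 1 { print $3 }'
}

# Compare the serials from both authoritative nameservers for hackeriet.no.
# Exit status is non-zero if they differ or if ns0 gives no answer.
check_soa_sync() {
    zone="hackeriet.no"
    a=$(soa_serial "$zone" ns0.hackeriet.no)
    b=$(soa_serial "$zone" ns.hyp.net)
    echo "ns0.hackeriet.no: ${a:-no answer}"
    echo "ns.hyp.net:       ${b:-no answer}"
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Run check_soa_sync from a machine outside the cluster, so the result does not depend on the hosts that are about to be rebooted.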
The SOA serial should match.
Maintenance procedure
Work one host at a time. Do not reboot both host006 and host007 at once.
Suggested order:
Start with host006 if the main concern is storage and backup health.
Start with host007 if host006-hosted ingress services must stay stable first.
For each host:
Confirm cluster state with pvecm status.
Confirm storage state with pvesm status and df -h.
Review failed units with systemctl --failed --no-pager.
Run apt update.
Review apt list --upgradable.
Apply updates only after reviewing the package set.
Reboot only if required or clearly useful.
After reboot, wait for the node to return and confirm cluster health before touching the next host.
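The waiting step can be sketched as a small polling loop run from a workstation or from the other cluster node. The function name, the 5-second interval, and the default of 60 attempts are assumptions for this example. The ping options are those of Linux iputils. A ping reply only shows that the network stack is up, so confirm cluster health on the node with pvecm status afterwards.

```shell
# Poll a rebooted host until it answers ping, or give up.
# wait_for_host is a hypothetical helper name.
# Usage: wait_for_host <host> [max_attempts]
wait_for_host() {
    host="$1"
    tries="${2:-60}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -c 1: one probe, -W 2: wait up to 2 seconds for the reply
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host answers ping"
            return 0
        fi
        i=$((i + 1))
        sleep 5
    done
    echo "$host did not answer after $tries attempts" >&2
    return 1
}
```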
Suggested update commands, after review:
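A sketch of the update step, split so that nothing is installed before the review. The function names are made up for this example. It uses apt dist-upgrade because the Proxmox documentation advises against plain apt upgrade on Proxmox VE hosts. The kernel comparison is only a rough hint for the reboot decision and assumes kernel images are installed as /boot/vmlinuz-*.

```shell
# Step 1: refresh package lists and show what would change. Installs nothing.
pve_update_review() {
    apt update
    apt list --upgradable
}

# Step 2: apply the upgrade, only after reviewing the list from step 1.
pve_update_apply() {
    apt dist-upgrade
}

# Rough reboot hint: compare the running kernel with the newest installed image.
# Optional argument: boot directory (defaults to /boot).
pve_kernel_hint() {
    boot_dir="${1:-/boot}"
    echo "running kernel: $(uname -r)"
    echo "newest image:   $(ls -1 "$boot_dir"/vmlinuz-* | sort -V | tail -n 1)"
}
```

If the newest image is a later version than the running kernel, the new kernel only takes effect after a reboot.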
Do not change guest VM configuration as part of host maintenance unless needed for recovery.
Post-host checks
After each host update or reboot:
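A possible set of post-host checks, assuming the same tools as in the per-host procedure plus the guest listings from qm and pct. The function name is made up for this example. All commands are read-only.

```shell
# Read-only checks after a host has been updated or rebooted.
# pve_post_checks is a hypothetical helper name.
pve_post_checks() {
    pvecm status                    # node has rejoined and the cluster is quorate
    pvesm status                    # storages are active again
    systemctl --failed --no-pager   # no new failed units after the change
    qm list                         # VMs on this node and their run state
    pct list                        # containers on this node, if any
}
```

Compare the output with what was recorded before the maintenance. A guest that was running before and is now stopped deserves a look before moving on to the next host.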
Check DNS and key service names:
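A sketch of the name checks, using the hostnames listed under DNS and service risk. The function name is made up for this example. It queries one authoritative nameserver directly, so a cached answer from a local resolver cannot hide a problem.

```shell
# Ask one nameserver for the A and AAAA records of each given name.
# check_names is a hypothetical helper name.
# Usage: check_names <nameserver> <name>...
check_names() {
    server="$1"
    shift
    for name in "$@"; do
        echo "== $name @$server"
        dig +short A "$name" "@$server"
        dig +short AAAA "$name" "@$server"
    done
}
```

For example, run check_names ns.hyp.net wiki.hackeriet.no ns0.hackeriet.no ip.hackeriet.no nms.hackeriet.no, then repeat with ns0.hackeriet.no as the nameserver once blade is back. The mail target can be checked separately with dig +short MX hackeriet.no.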
Check actual services, not only DNS, when the relevant host has been touched.
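A minimal sketch of service-level checks with curl and nc. The function names are made up for this example, and which URLs and ports to probe is an assumption that should be adjusted to what each service really exposes. The nc flags are those shared by the common netcat variants on Debian.

```shell
# Report whether an HTTP(S) URL answers with a success status.
# Usage: http_check <url>
http_check() {
    # -f: fail on HTTP errors, -sS: quiet but show errors, 10 second limit
    if curl -fsS -o /dev/null --max-time 10 "$1"; then
        echo "OK   $1"
    else
        echo "FAIL $1"
        return 1
    fi
}

# Report whether a TCP port accepts connections.
# Usage: tcp_check <host> <port>
tcp_check() {
    # -z: connect without sending data, -w 5: 5 second timeout
    if nc -z -w 5 "$1" "$2"; then
        echo "OK   $1:$2"
    else
        echo "FAIL $1:$2"
        return 1
    fi
}
```

After touching host007, something like http_check https://wiki.hackeriet.no/ and tcp_check ns0.hackeriet.no 53 would be reasonable probes, assuming the wiki is served over HTTPS. After touching host006, the same approach applies to the ingress-routed names.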
Host006 notes
host006 has about 1 TB physical storage, but Proxmox local storage is on the root filesystem. The root filesystem was previously close to full, and local backups under /var/lib/vz/dump were the main pressure point.
Known cleanup/remediation context is documented at Proxmox backups.
During maintenance, avoid casual LVM reshaping. It can put VM disks at risk and should only be done with a maintenance window and recovery plan.
Safety notes
Do not touch guest VMs unless required for recovery.
Do not change DNS during the maintenance window unless DNS itself is the incident.
Do not delete backups or ISOs without understanding what they are.
Keep notes locally while working; wiki, NetBox, IDP, and public services may be affected depending on which host is down.