Operational Verbs: Design Contract
Tracking: issue #376.
Source specs: reference/IMPLEMENTATION-ROADMAP.md §5.5 and reference/PROXMOX-AUTOMATION-VALID-TO-IMPLEMENT.md §4.7.
This document pins the contract for the operational-verbs carve-out before any code lands. Once approved it is the single source of truth for sub-PRs B–G. Any later deviation requires a new design-doc PR; verb PRs that deviate without amending this document must be rejected on review.
1. Scope
Four explicit, operator-initiated, idempotent Proxmox-mutating verbs
exposed as REST endpoints on proxbox-api and surfaced as per-VM
action buttons in the NetBox plugin UI via template_extensions:
| Verb | QEMU endpoint | LXC sibling |
|---|---|---|
| start | POST /proxmox/qemu/{vmid}/start | POST /proxmox/lxc/{vmid}/start |
| stop | POST /proxmox/qemu/{vmid}/stop | POST /proxmox/lxc/{vmid}/stop |
| snapshot | POST /proxmox/qemu/{vmid}/snapshot | POST /proxmox/lxc/{vmid}/snapshot |
| migrate | POST /proxmox/qemu/{vmid}/migrate | POST /proxmox/lxc/{vmid}/migrate |
Out of scope for this carve-out: clone, destroy, resize, console attach, template conversion, backup-now. They follow the same shape and can be added later under the same contract; they are not part of the v0.0.20 milestone.
Also out of scope: importing the upstream netbox-proxmox-automation
event-rule + webhook architecture. §4.7 and §5.7/§5.8 of the source
docs reject that shape; this design does not revisit that decision.
2. Trust boundary and gating
The plugin's existing stance is observer: Proxmox is the source of truth, NetBox mirrors it. Operational verbs are the first feature that mutates Proxmox state from NetBox. The trust boundary is widened explicitly and per-endpoint, never globally.
2.1 Gate field
A new boolean column on ProxmoxEndpoint:
# netbox_proxbox/models/proxmox_endpoint.py
allow_writes = models.BooleanField(
    default=False,
    verbose_name=_("Allow Proxmox-side writes"),
    help_text=_(
        "When enabled, operational verbs (start, stop, snapshot, migrate) "
        "may be dispatched against this endpoint. Default off. Enabling "
        "this widens the trust boundary; restrict the new "
        "core.run_proxmox_action permission to a small operator group."
    ),
)
Migration ships as a single IF NOT EXISTS column add. Per CLAUDE.md,
the canonical shape is SeparateDatabaseAndState with
ADD COLUMN IF NOT EXISTS — see
netbox_proxbox/migrations/0037_pluginsettings_runtime_tunables.py
for the template.
2.2 Why per-endpoint, not global
ProxboxPluginSettings.operational_verbs_enabled (the shape proposed
in the original #376 body) would unlock every cluster at once. With
per-endpoint gating, an operator can:
- Enable writes on a lab cluster while production stays read-only.
- Disable writes on a single endpoint during a scheduled freeze without disabling the feature globally.
- Audit which endpoints have ever had writes enabled via NetBox's
extras.object-changes log on ProxmoxEndpoint.
2.3 Enforcement points
Every verb request is gated at three layers, in this order:
- Plugin UI: the four buttons are hidden if the user lacks core.run_proxmox_action. Buttons render disabled with a tooltip ("This endpoint does not allow writes") if the VM's endpoint has allow_writes=False.
- NetBox plugin endpoint: the backend-proxy view that forwards the POST to proxbox-api checks core.run_proxmox_action (via ContentTypePermissionRequiredMixin) and rejects with 403 if the user does not hold it.
- proxbox-api route: the route handler looks up the target VM's owning ProxmoxEndpoint (via the existing endpoint registry) and rejects with 403 if allow_writes=False. The 403 response body includes a structured reason field ("endpoint_writes_disabled") so the UI can render an actionable error.
The third layer is the load-bearing one. A future external automation caller that bypasses the plugin must still pass it; the plugin UI checks are belt-and-braces.
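A minimal sketch of the third-layer check, assuming a plain exception in place of whatever HTTP error type the route framework provides. EndpointRecord, WritesDisabled, and require_writes_enabled are hypothetical names used only for illustration; they are not part of proxbox-api.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointRecord:
    """Hypothetical stand-in for a ProxmoxEndpoint row."""
    id: int
    name: str
    allow_writes: bool = False  # mirrors the default-off gate field


class WritesDisabled(Exception):
    """Carries the status code and structured body the route would return."""

    def __init__(self, endpoint: EndpointRecord) -> None:
        super().__init__(f"writes disabled on endpoint {endpoint.id}")
        self.status_code = 403
        self.body = {"reason": "endpoint_writes_disabled"}


def require_writes_enabled(endpoint: EndpointRecord) -> None:
    """Raise unless the owning endpoint has opted in to writes."""
    if not endpoint.allow_writes:
        raise WritesDisabled(endpoint)
```

Because the check takes only the endpoint record, it behaves the same whether the caller is the plugin or an external automation client.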
3. Permission model
One permission, not four. The codename is
core.run_proxmox_action — content-type-scoped on
virtualization.virtualmachine and registered via the existing
ContentTypePermissionRequiredMixin pattern in
netbox_proxbox/views/proxbox_access.py.
3.1 Why one, not per-verb
An operator who can start a VM in production almost always also
needs to stop and snapshot it; the per-verb split adds permission
toggles without adding meaningful access control. The verbs share a
trust boundary (they all mutate Proxmox state); they share a single
permission.
If a future deployment needs per-verb granularity it can be added
without breaking the contract — add the new codenames alongside
run_proxmox_action, and grant run_proxmox_action by default to
keep existing operator groups working.
3.2 Permission helper
A new function in netbox_proxbox/views/proxbox_access.py:
def permission_run_proxmox_action() -> str:
    return "core.run_proxmox_action"
Used by both the verb-button rendering in template_content.py and
the backend-proxy POST views.
3.3 Read-only NetBox users
A user with only virtualization.view_virtualmachine (no
run_proxmox_action):
- Sees the VM detail page exactly as today.
- Does not see the four action buttons.
- If they hit the backend-proxy POST URL directly, receives a 403
with reason: "permission_denied".
4. Idempotency
Every verb route accepts an optional Idempotency-Key header
following the draft IETF Idempotency-Key for HTTP
semantics:
- Window: 60 seconds, sliding from first observed POST.
- Key scope: per (endpoint_id, verb, vmid) tuple. The same key reused across different VMs / verbs does not collide.
- Resolution: the second POST within the window returns the cached response of the first; the Proxmox API is called once.
- Storage: in-memory dict in proxbox-api keyed by (endpoint_id, verb, vmid, key). Entries are cleared by a 60-second TTL; no SQLite write. Process restart clears the dict, which is acceptable for the 60-second window.
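The storage rules above can be sketched as a small TTL cache. This is an illustration under stated assumptions: the class name IdempotencyCache, the injectable clock, and lazy eviction on read are all hypothetical choices, and the window is measured from the first observed POST.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

CacheKey = Tuple[int, str, int, str]  # (endpoint_id, verb, vmid, key)


class IdempotencyCache:
    """In-memory response cache with a fixed TTL per entry."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[CacheKey, Tuple[float, Any]] = {}

    def get(self, key: CacheKey) -> Optional[Any]:
        """Return the cached response, or None if absent or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # lazy eviction on read
            return None
        return response

    def put(self, key: CacheKey, response: Any) -> None:
        """Record the first response; later writes for a live key are ignored."""
        if self.get(key) is None:
            self._entries[key] = (self._clock(), response)
```

Lazy eviction keeps the sketch short, but entries that are never read again stay in the dict; a real implementation would likely add a periodic sweep.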
4.1 Plugin-side default
When the operator clicks a verb button, the plugin generates a
uuid4() and sends it as Idempotency-Key. This protects against
double-click in the confirmation modal.
4.2 State-based no-op (complementary)
In addition to key-based idempotency, every verb performs a pre-flight state check:
- start against a running VM: no-op, returns 200 with result: "already_running" and no Proxmox call.
- stop against a stopped VM: no-op, returns 200 with result: "already_stopped".
- snapshot is always dispatched (operator-initiated; the operator knows they are creating one).
- migrate against a VM already on the target node: no-op, returns 200 with result: "already_on_target_node".
State-based no-op runs before the Idempotency-Key check is
recorded. A double-clicked start on an already-running VM returns
already_running both times without consuming the key.
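The ordering rule can be sketched as follows. The function dispatch, the NOOP_RESULTS table, and the plain dict used as a cache are hypothetical simplifications; the migrate no-op is omitted because it depends on the target node rather than on the VM status.

```python
from typing import Callable, Dict, Optional, Tuple

# Maps (verb, current_status) to the no-op result string from this section.
NOOP_RESULTS: Dict[Tuple[str, str], str] = {
    ("start", "running"): "already_running",
    ("stop", "stopped"): "already_stopped",
}


def dispatch(verb: str, current_status: str, key: Optional[str],
             cache: Dict[str, dict], call_proxmox: Callable[[], str]) -> dict:
    """Apply the state-based no-op first, then key-based idempotency."""
    noop = NOOP_RESULTS.get((verb, current_status))
    if noop is not None:
        return {"verb": verb, "result": noop}  # key is not recorded
    if key is not None and key in cache:
        return cache[key]  # replay; Proxmox is not called again
    response = {"verb": verb, "result": "ok", "proxmox_task_upid": call_proxmox()}
    if key is not None:
        cache[key] = response
    return response
```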
5. Cancellation semantics
Per-verb:
| Verb | Cancellable mid-flight? | Notes |
|---|---|---|
| start | No. | The Proxmox API call returns within seconds. No async progress. |
| stop | No. | Same as start. A "stop" issued during a "shutdown" can be retried; that is a Proxmox-side concern. |
| snapshot | No. | Snapshot creation is a Proxmox-side atomic operation. |
| migrate | Yes. | Long-running. The operator may cancel via a separate DELETE /proxmox/qemu/{vmid}/migrate/{task_upid} endpoint, which proxies to the Proxmox nodes/{node}/tasks/{upid} DELETE. Cancellation is best-effort: Proxmox decides whether the in-flight task can be torn down. |
The migrate cancel endpoint is part of sub-PR F (migrate verb), not a separate PR, since the migrate verb is meaningless without it.
6. Audit trail
Every verb invocation — successful, failed, or no-op — writes a
journal entry on the linked VirtualMachine via
POST /api/extras/journal-entries/. The journal entry is the
operator-visible audit trail on the VM detail page; it is not a
core.Job row (that model is for background-task state, not user
actions).
6.1 Payload shape
{
  "assigned_object_type": "virtualization.virtualmachine",
  "assigned_object_id": <netbox_vm_pk>,
  "kind": "info",
  "comments": "Proxbox operational verb dispatched.\n\n- verb: start\n- actor: alice@netbox\n- result: ok\n- proxmox_task_upid: UPID:pve-node-01:00012F4A:00...\n- idempotency_key: 7b3c9f4a-...\n- endpoint: my-prod-cluster (id=3)\n- dispatched_at: 2026-05-12T18:42:11Z"
}
The comments field is a structured Markdown block, not free text.
The bullet keys are stable and machine-parseable; release notes
document the parse grammar so downstream automation can rely on it.
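As an illustration of what a stable, machine-parseable bullet block could look like in practice, the sketch below renders and parses the "- key: value" lines. The helper names render_comments and parse_comments are hypothetical, and the grammar shown (split each bullet on the first ": ") is an assumption, not the grammar the release notes will define.

```python
from typing import Dict

HEADER = "Proxbox operational verb dispatched."


def render_comments(fields: Dict[str, str]) -> str:
    """Render the header plus one '- key: value' bullet per field."""
    bullets = "\n".join(f"- {key}: {value}" for key, value in fields.items())
    return f"{HEADER}\n\n{bullets}"


def parse_comments(comments: str) -> Dict[str, str]:
    """Recover the bullet fields; split on the first ': ' only."""
    fields: Dict[str, str] = {}
    for line in comments.splitlines():
        if line.startswith("- ") and ": " in line:
            key, value = line[2:].split(": ", 1)
            fields[key] = value
    return fields
```

Splitting on the first ": " matters because values such as a task UPID or an ISO timestamp contain colons of their own.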
6.2 Failure-mode invariant
If the Proxmox call fails, the journal entry MUST still be written
(with kind: "warning" and result: "failed"). Failure to audit is a
P0 bug — pin in tests with a simulated Proxmox 500 response.
If the journal-entry POST itself fails (NetBox unreachable), the
verb route returns 500 with reason: "audit_write_failed" and the
Proxmox state may have been mutated. This is the worst-case path; it
is acceptable because (a) Proxmox state is observable on the next
sync, (b) the operator sees a clear error, (c) NetBox-down is a
broader incident than a missed audit row.
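The two failure paths can be sketched as a single control flow. The function run_verb and its callable parameters are hypothetical, and the 502 status for a failed Proxmox call is an assumption; this document only fixes the 500 for a failed audit write.

```python
from typing import Callable, Tuple


def run_verb(call_proxmox: Callable[[], str],
             write_journal: Callable[[str, str], None]) -> Tuple[int, dict]:
    """Return (status_code, body); always attempt the audit write."""
    try:
        upid = call_proxmox()
        kind, body = "info", {"result": "ok", "proxmox_task_upid": upid}
        status = 200
    except Exception:  # the real route would catch ProxmoxAPIError
        kind, body = "warning", {"result": "failed"}
        status = 502  # assumed status for an upstream failure
    try:
        write_journal(kind, body["result"])
    except Exception:
        # Proxmox may already have been mutated at this point.
        return 500, {"reason": "audit_write_failed"}
    return status, body
```

The audit write sits outside the first try block so that a Proxmox failure cannot skip it.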
6.3 Idempotent re-issue
A second POST with the same Idempotency-Key does not write a second
journal entry. The cached response (§4) carries the original entry's
URL in a journal_entry_url field.
7. SSE channel
Per §5.5: only migrate streams SSE. The other three verbs
return their result synchronously on the POST response.
7.1 Migrate SSE
The migrate POST returns 202 with task_upid and an
sse_url in the body. The client opens
GET /proxmox/qemu/{vmid}/migrate/{task_upid}/stream to receive
progress frames.
New event types added to netbox_proxbox/schemas/backend_proxy.py::SseEventType:
| Event | When emitted |
|---|---|
| migrate_dispatched | First frame; mirrors the 202 POST body. |
| migrate_progress | Repeating; carries the Proxmox task progress percentage and phase string. |
| migrate_succeeded | Final frame on success. |
| migrate_failed | Final frame on failure; carries the Proxmox error chain. |
These are migrate-specific names, not the original
verb_dispatched/verb_succeeded/verb_failed triple proposed in
the issue body. The other three verbs do not need an SSE channel; a
shared family of event names would be premature abstraction.
7.2 Schema mirror
The four new event types are added to
contracts/proxbox_api_sse_schema.json's StreamMessageType enum
in the same commit as the backend_proxy.py change. The
tests/test_sse_schema_mirror.py canary enforces the mirror; the
migrate PR (sub-PR F) MUST pass this test before merge.
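A mirror check of this kind could look like the sketch below. It is not the contents of tests/test_sse_schema_mirror.py: the MigrateEvent enum is a hypothetical subset of SseEventType, and the schema layout (a top-level "definitions" object) is an assumption about the JSON file.

```python
import json
from enum import Enum


class MigrateEvent(str, Enum):
    """Hypothetical subset of SseEventType holding the four new members."""
    DISPATCHED = "migrate_dispatched"
    PROGRESS = "migrate_progress"
    SUCCEEDED = "migrate_succeeded"
    FAILED = "migrate_failed"


def missing_from_schema(schema_text: str) -> set:
    """Return enum values absent from the schema's StreamMessageType enum."""
    schema = json.loads(schema_text)
    declared = set(schema["definitions"]["StreamMessageType"]["enum"])
    return {event.value for event in MigrateEvent} - declared
```

A test would then assert that the returned set is empty, which fails as soon as one side gains an event type the other lacks.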
7.3 Non-migrate response shape
start / stop / snapshot return 200 with:
{
  "verb": "start",
  "vmid": 101,
  "vm_type": "qemu",
  "endpoint_id": 3,
  "result": "ok",
  "proxmox_task_upid": "UPID:pve-node-01:00012F4A:...",
  "journal_entry_url": "/api/extras/journal-entries/789/",
  "dispatched_at": "2026-05-12T18:42:11Z"
}
A no-op (already-running / already-stopped) returns the same shape
with result: "already_running" etc. and no proxmox_task_upid.
8. URL surface and module layout
8.1 proxbox-api
New module: proxbox_api/routes/proxmox_actions.py. Registered in
proxbox_api/app/factory.py alongside the other routes/proxmox/*
routers, prefix /proxmox. The module exposes:
POST /proxmox/qemu/{vmid}/start
POST /proxmox/lxc/{vmid}/start
POST /proxmox/qemu/{vmid}/stop
POST /proxmox/lxc/{vmid}/stop
POST /proxmox/qemu/{vmid}/snapshot
POST /proxmox/lxc/{vmid}/snapshot
POST /proxmox/qemu/{vmid}/migrate
POST /proxmox/lxc/{vmid}/migrate
DELETE /proxmox/qemu/{vmid}/migrate/{task_upid}
DELETE /proxmox/lxc/{vmid}/migrate/{task_upid}
GET /proxmox/qemu/{vmid}/migrate/{task_upid}/stream
GET /proxmox/lxc/{vmid}/migrate/{task_upid}/stream
The QEMU and LXC variants share a single handler internally,
parameterised by vm_type.
8.2 New Proxmox-SDK helpers
Added to proxbox_api/services/proxmox_helpers.py with the existing
@_dual_mode async pattern:
| Helper | Wraps |
|---|---|
| start_vm(session, node, vm_type, vmid) | POST nodes/{node}/{vm_type}/{vmid}/status/start |
| stop_vm(session, node, vm_type, vmid) | POST nodes/{node}/{vm_type}/{vmid}/status/stop |
| create_vm_snapshot(session, node, vm_type, vmid, snapname, description) | POST nodes/{node}/{vm_type}/{vmid}/snapshot |
| migrate_vm(session, node, vm_type, vmid, target, online) | POST nodes/{node}/{vm_type}/{vmid}/migrate |
| migrate_preflight(session, node, vm_type, vmid) | GET nodes/{node}/{vm_type}/{vmid}/migrate |
| get_vm_status(session, node, vm_type, vmid) | GET nodes/{node}/{vm_type}/{vmid}/status/current (used for state-based no-op) |
| cancel_task(session, node, upid) | DELETE nodes/{node}/tasks/{upid} |
| get_task_status(session, node, upid) | already exists as get_node_task_status |
All helpers raise ProxmoxAPIError on ProxmoxTimeoutError /
ProxmoxConnectionError per the existing convention.
8.3 New NetBox-REST helper
Added to proxbox_api/netbox_rest.py:
async def write_journal_entry(
    session: NetBoxSession,
    *,
    assigned_object_type: str,
    assigned_object_id: int,
    kind: Literal["info", "success", "warning", "danger"],
    comments: str,
) -> dict:
    """POST to /api/extras/journal-entries/. Returns the created entry."""
Used by every verb route.
8.4 Plugin-side backend-proxy
New views in netbox_proxbox/views/operational.py (one per verb,
sharing a base class). Each view:
- Checks core.run_proxmox_action via ContentTypePermissionRequiredMixin.
- Resolves the VM's proxmox_endpoint (via the existing custom field on VirtualMachine).
- Generates an Idempotency-Key if the request did not carry one.
- POSTs to the matching proxbox-api route through the existing backend-proxy helper.
- Surfaces the response (or error) as an HTMX response that the confirmation modal renders.
9. Migrate pre-flight
Before dispatching a migrate, the route handler calls
migrate_preflight(session, node, vm_type, vmid) which wraps the
Proxmox GET nodes/{node}/qemu/{vmid}/migrate endpoint. The response
returns:
- allowed_nodes: list of node names the VM can move to.
- local_disks: local-only disks blocking online migrate.
- local_resources: local resources (pci-passthrough, etc.).
- running: current VM running state.
The route rejects with 400 (not 500) if:
- target is not in allowed_nodes (reason: "target_not_allowed").
- online=True and local_disks is non-empty (reason: "local_disks_block_online_migrate").
- online=True and local_resources is non-empty (reason: "local_resources_block_online_migrate").
The 400 response body includes the full preflight payload in a
preflight field so the UI can render the reason chain.
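The rejection rules can be sketched as a pure function over the preflight payload. The name preflight_rejection is hypothetical, and evaluating the three conditions in the order listed above is an assumption.

```python
from typing import Optional


def preflight_rejection(preflight: dict, target: str, online: bool) -> Optional[dict]:
    """Return a 400 response body, or None when the migrate may proceed."""
    if target not in preflight.get("allowed_nodes", []):
        reason = "target_not_allowed"
    elif online and preflight.get("local_disks"):
        reason = "local_disks_block_online_migrate"
    elif online and preflight.get("local_resources"):
        reason = "local_resources_block_online_migrate"
    else:
        return None
    # The full payload rides along so the UI can render the reason chain.
    return {"reason": reason, "preflight": preflight}
```

Keeping the decision free of I/O makes each reason string easy to pin in unit tests without a mock Proxmox server.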
10. Acceptance criteria (pinned)
These are the verifiable conditions for "the carve-out is shipped". Each verb PR (sub-PRs C–F) carries its own subset; the full set is the merge gate on sub-PR G.
- A read-only NetBox user (virtualization.view_virtualmachine, no core.run_proxmox_action) sees the VM detail page exactly as today; no action buttons appear.
- A user with core.run_proxmox_action sees four buttons; clicking any button opens a confirmation modal that displays the VM identity (name, vmid, endpoint) before the POST fires.
- With ProxmoxEndpoint.allow_writes=False on the VM's endpoint, the proxbox-api route returns 403 with reason: "endpoint_writes_disabled"; the modal surfaces the error.
- Two start POSTs with the same Idempotency-Key within 60 s resolve to a single Proxmox call. Pin in tests.
- Every successful verb invocation writes exactly one journal entry on the linked VM with the §6.1 payload shape. Pin in tests.
- A simulated Proxmox 500 still writes a journal entry (kind: "warning", result: "failed"). Pin in tests.
- start against an already-running VM returns result: "already_running" with no Proxmox call (verify via mock-server call counter). Same for stop / already_stopped. Pin in tests.
- migrate to a non-existent or offline target node returns 400 with reason: "target_not_allowed" and the full preflight payload. Pin in tests.
- The next sync after a verb call reflects the new state, with no verb-vs-sync race; pin via an end-to-end test on the proxmox-mock HTTP service.
- tests/test_sse_schema_mirror.py passes after sub-PR F adds the four migrate event types to contracts/proxbox_api_sse_schema.json.
11. Sub-PR sequence
The seven sub-PRs ship in strict order. Each is a separate review surface; do not bundle.
| Sub-PR | Scope | Lands |
|---|---|---|
| A | This design doc | netbox-proxbox feat/issue-376-design-doc → v0.0.15 |
| B | ProxmoxEndpoint.allow_writes + run_proxmox_action permission + stubbed routes returning 403 | both repos |
| C | start verb (qemu + lxc) | both repos |
| D | stop verb (qemu + lxc) | both repos |
| E | snapshot verb (qemu + lxc) | both repos |
| F | migrate verb (qemu + lxc) + SSE + cancel + preflight | both repos |
| G | Plugin-side button wiring (template_extensions) + Playwright e2e | netbox-proxbox |
Sub-PRs B–F each ship their own permission test + idempotency test + journal-entry test + state-based no-op test (where applicable). Sub-PR G ships the Playwright end-to-end test exercising all four verbs through the UI.
12. Dependencies
Per §5.5 the verbs depend on:
- v0.0.15 item #1 (drift-detect helper): the journal-entry write uses the same diff semantics the drift-detect helper introduces.
- v0.0.15 item #2 (NetBox-side bootstrap): the core.run_proxmox_action permission's content-type registration uses the bootstrap pass.
- This design doc (sub-PR A): non-negotiable per §5.5.
Note: #367 (run-UUID stamp, v0.0.18 item #11) is not in the dependency chain. It is a useful correlation primitive that the verb work can adopt if it ships first, but it is not blocking.
13. Risks and open questions
- Trust boundary. Per-endpoint gating + content-type permission + confirmation modal + journal-entry audit is the four-layer defence. The risk is operator misconfiguration (enabling allow_writes on a production endpoint without restricting run_proxmox_action). Mitigation: the migration that adds allow_writes ships with a release-note security disclaimer; the field's help_text repeats the warning.
- Partial-success race. Proxmox call returns success but the response is lost in transit. The idempotency key plus the next sync's reconciliation closes the loop: retry is safe, and the observer pass corrects the NetBox-side view. Pin in tests via a simulated transport failure.
- Snapshot proliferation. Repeated snapshot clicks create one snapshot per click. The verb takes an optional name parameter defaulting to proxbox-{idempotency_key_prefix}. Documented in the release note; cleanup is Proxmox's retention concern.
- Migrate complexity. The most-complex verb. Pre-flight validation, dedicated SSE channel, cancel endpoint, target-node validation, online/offline distinction. The migrate sub-PR (F) is expected to be the largest of the verb PRs.
- External automation callers. The routes are reachable by any caller with an X-Proxbox-API-Key once allow_writes is on. The routes do not check NetBox permissions directly (the plugin does). An external caller bypasses the NetBox permission layer. This is documented as a known property of the API surface; restrict the API key to the same operator group that holds run_proxmox_action.
14. Non-goals
This carve-out does not:
- Invert the observer stance for any field Proxmox owns.
- Add NetBox extras.event_rules rows or webhook listeners.
- Require AWX / Tower / AAP as a dispatch medium.
- Bundle clone / destroy / resize / console / template / backup-now verbs.
- Bump the netbox-proxbox or proxbox-api version.
Operators who want the upstream event-rule + webhook shape can
continue to run netbox-proxmox-automation alongside this stack;
the two systems coexist per §10 of
PROXBOX-AND-PROXMOX-AUTOMATION.md.