ansible/issues/004_add_error_handling.md
Tuan-Dat Tran 5a8c7f0248 feat(proxmox): add hosts config
Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>
2026-02-28 11:30:58 +01:00

Issue: Add Comprehensive Error Handling

Status: Open
Priority: High
Component: proxmox/tasks
Assignee: Junior Dev

Description

The Proxmox role lacks comprehensive error handling, particularly for critical operations like API calls, vault operations, and file manipulations.

Current Issues

  • No error handling for Proxmox API failures
  • No validation of VM/LXC configurations before creation
  • No retries for network operations
  • No cleanup on failure

Required Changes

Step 1: Add validation tasks

Validate configurations before attempting creation.
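
The same idea applies to containers. A minimal sketch for LXC, assuming a per-container variable named `lxc` with keys `vmid`, `node`, `hostname`, `cores`, and `memory` (these names are illustrative and may not match the role's actual variables):

```yaml
# Sketch only: the `lxc` variable and its keys are assumed names,
# not taken from the role.
- name: Validate LXC configuration
  ansible.builtin.assert:
    that:
      - lxc.vmid is defined
      - lxc.vmid | int > 0
      - lxc.node is defined
      - lxc.hostname is defined
      - lxc.cores | default(0) | int > 0
      - lxc.memory | default(0) | int > 0
    fail_msg: "Invalid LXC configuration for {{ lxc.hostname | default('unknown') }}"
    quiet: true
```

Running the assertion before any API call means a bad configuration fails fast, before anything has been created that would need cleanup.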

Step 2: Add error handling blocks

Use block/rescue/always for critical operations.

Step 3: Add retries for network operations

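Use retries and delay together with an until condition for API calls, so that transient failures are retried before the task is marked as failed.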

Implementation Steps

Example 1: VM Creation with Error Handling

- name: Create VM with error handling
  block:
    - name: Validate VM configuration
      ansible.builtin.assert:
        that:
          - vm.vmid is defined
          - vm.vmid | int > 0
          - vm.node is defined
          - vm.cores is defined and vm.cores | int > 0
          - vm.memory is defined and vm.memory | int > 0
        msg: "Invalid VM configuration for {{ vm.name }}"

    - name: Create VM
      community.proxmox.proxmox_kvm:
        # ... existing parameters ...
      register: vm_creation_result
      retries: 3
      delay: 10
      until: vm_creation_result is succeeded

  rescue:
    - name: Report VM creation failure
      ansible.builtin.debug:
        msg: "Failed to create VM {{ vm.name }}: {{ ansible_failed_result.msg | default('no error message returned') }}"

    - name: Cleanup partial resources
      # Add cleanup tasks here
      when: cleanup_partial_resources | default(true)

    # A rescue section that finishes cleanly clears the failure,
    # so fail again explicitly to avoid a silent failure.
    - name: Fail after handling the error
      ansible.builtin.fail:
        msg: "VM creation failed for {{ vm.name }}"

  always:
    - name: Log VM creation attempt
      ansible.builtin.debug:
        # Parenthesised so ternary applies to the whole condition; "succeeded"
        # rather than "changed", because an unchanged result is not a failure.
        msg: "VM creation attempt for {{ vm.name }} completed with status: {{ (vm_creation_result is defined and vm_creation_result is succeeded) | ternary('success', 'failed') }}"
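
One possible shape for the cleanup step in the rescue section is sketched below. It assumes API token authentication and the variable names `proxmox_api_host`, `proxmox_api_user`, `proxmox_api_token_id`, and `proxmox_api_token_secret`, as well as a hypothetical `vm_existed_before` fact set earlier in the play. None of these are confirmed by the role.

```yaml
# Sketch only: all proxmox_api_* variables and vm_existed_before are
# assumed names and must be adapted to the role's real variables.
- name: Remove partially created VM
  community.proxmox.proxmox_kvm:
    api_host: "{{ proxmox_api_host }}"
    api_user: "{{ proxmox_api_user }}"
    api_token_id: "{{ proxmox_api_token_id }}"
    api_token_secret: "{{ proxmox_api_token_secret }}"
    node: "{{ vm.node }}"
    vmid: "{{ vm.vmid }}"
    state: absent
  when:
    - cleanup_partial_resources | default(true)
    # Never delete a VM that was already present before this run.
    - vm_existed_before is defined
    - not vm_existed_before
```

The guard on `vm_existed_before` matters: without it, a failure while updating an existing VM could end with that VM being deleted.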

Example 2: API Call with Retries

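- name: Check Proxmox API availability
  ansible.builtin.uri:
    url: "https://{{ proxmox_api_host }}:8006/api2/json/version"
    # Disables TLS verification; set to true once the host has a trusted certificate
    validate_certs: false
    return_content: true
  register: api_check
  retries: 5
  delay: 5
  # Guard against a missing status when the connection itself fails
  until: api_check.status | default(-1) == 200
  ignore_errors: true

- name: Fail if API unavailable
  ansible.builtin.fail:
    msg: "Proxmox API unavailable at {{ proxmox_api_host }} (last status: {{ api_check.status | default('unknown') }})"
  when: api_check is failed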
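
Proxmox API endpoints generally expect an authenticated request, so an unauthenticated probe may return 401 rather than 200 even when the API is up. A hedged variant using an API token header is sketched here. The variables `proxmox_api_user`, `proxmox_api_token_id`, and `proxmox_api_token_secret` are assumed names, not taken from the role.

```yaml
# Sketch only: the token variables are assumed names.
- name: Check Proxmox API availability with an API token
  ansible.builtin.uri:
    url: "https://{{ proxmox_api_host }}:8006/api2/json/version"
    headers:
      # Proxmox token header format: PVEAPIToken=USER@REALM!TOKENID=SECRET
      Authorization: "PVEAPIToken={{ proxmox_api_user }}!{{ proxmox_api_token_id }}={{ proxmox_api_token_secret }}"
    validate_certs: true
    return_content: true
  register: api_check
  retries: 5
  delay: 5
  until: api_check.status | default(-1) == 200
  # Keep the token out of task output and logs
  no_log: true
```

This variant drops `ignore_errors`, so the task itself fails once the retries are exhausted and no separate fail task is needed.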

Example 3: File Operation Error Handling

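- name: Manage vault file safely
  block:
    # Registers vault_file_exists, which the backup and restore tasks depend on
    - name: Check whether the vault file exists
      ansible.builtin.stat:
        path: "{{ proxmox_vault_file }}"
      register: vault_file_exists

    - name: Backup existing vault
      ansible.builtin.copy:
        src: "{{ proxmox_vault_file }}"
        dest: "{{ proxmox_vault_file }}.backup"
        remote_src: true
        # Assumed permissions; the backup holds secrets, so keep it private
        mode: "0600"
      when: vault_file_exists.stat.exists

    - name: Perform vault operations
      # ... vault operations ...

  rescue:
    - name: Restore vault from backup
      ansible.builtin.copy:
        src: "{{ proxmox_vault_file }}.backup"
        dest: "{{ proxmox_vault_file }}"
        remote_src: true
      # stat is undefined if the stat task itself was the one that failed
      when: vault_file_exists.stat is defined and vault_file_exists.stat.exists

    - name: Fail with error details
      ansible.builtin.fail:
        msg: "Vault operation failed: {{ ansible_failed_result.msg | default('no error message returned') }}"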

Testing Requirements

  • Test error scenarios (invalid configs, API unavailable)
  • Verify cleanup works on failure
  • Confirm retries work for transient failures
  • Validate error messages are helpful
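
The invalid-config scenario can be exercised without touching Proxmox at all. The following is a minimal sketch of a standalone test playbook. The sample values (`broken-vm`, `pve1`) and the `validation_rejected` fact are made up for illustration.

```yaml
# Sketch only: sample values and the validation_rejected fact are hypothetical.
- name: Verify that invalid VM configs are rejected
  hosts: localhost
  gather_facts: false
  vars:
    vm:
      name: broken-vm
      vmid: 0          # deliberately invalid
      node: pve1
      cores: 2
      memory: 2048
  tasks:
    - name: Run validation and record the outcome
      block:
        - name: Validate VM configuration
          ansible.builtin.assert:
            that:
              - vm.vmid is defined
              - vm.vmid | int > 0
              - vm.node is defined
              - vm.cores is defined and vm.cores | int > 0
              - vm.memory is defined and vm.memory | int > 0

        - name: Record that validation passed unexpectedly
          ansible.builtin.set_fact:
            validation_rejected: false
      rescue:
        - name: Record that validation rejected the config
          ansible.builtin.set_fact:
            validation_rejected: true

    - name: Confirm the invalid config was rejected
      ansible.builtin.assert:
        that:
          - validation_rejected
        fail_msg: "Validation accepted a VM with vmid 0"
```

Because the final assert sits outside the block, the playbook fails if validation ever starts accepting the broken config.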

Acceptance Criteria

  • All critical operations have error handling
  • Validation added for configurations
  • Retry logic implemented for network operations
  • Cleanup procedures in place for failures
  • Helpful error messages provided
  • No silent failures