# Proxmox Cluster Debugging Plan

## Overview

This document outlines the plan for debugging a Proxmox cluster issue in which the nodes `mii01` and `naruto01` show up with a `?` in the Web UI. The `?` means the cluster has no current status for those nodes; a version mismatch is the suspected cause.

## Architecture

The investigation will focus on the following components:

- Proxmox VE versions across all nodes
- Cluster health and quorum status
- Corosync service status and logs
- Node-to-node connectivity
- Time synchronization

## Data Flow

1. **Version Check:** Verify Proxmox VE versions on all nodes.
2. **Cluster Health:** Check cluster status and quorum.
3. **Corosync Logs:** Analyze Corosync logs for errors.
4. **Connectivity:** Verify network connectivity between nodes.
5. **Time Synchronization:** Ensure time is synchronized across all nodes.
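
The five steps above map onto a handful of read-only commands. The sketch below only builds the command lines, one set per node, so they can be reviewed before anything runs. SSH access as `root`, the two-day log window, and the helper names are assumptions made for illustration; the node names are the ones listed under Findings.

```python
import shlex

# Node names as listed under Findings.
NODES = ["aya01", "lulu", "inko01", "naruto01", "mii01"]

# One read-only command per step (assumed to be run as root on each node).
STEP_COMMANDS = {
    "version": "pveversion",
    "cluster_health": "pvecm status",
    "corosync_logs": "journalctl -u corosync --since '2 days ago' --no-pager",
    "time_sync": "timedatectl",
}


def build_checks(nodes):
    """Return (node, step, argv) tuples; nothing is executed here."""
    checks = []
    for node in nodes:
        target = f"root@{node}"  # assumption: key-based SSH as root
        for step, command in STEP_COMMANDS.items():
            checks.append((node, step, ["ssh", target, command]))
        # Connectivity: ping every other node from this one.
        for peer in nodes:
            if peer != node:
                ping = f"ping -c 3 {shlex.quote(peer)}"
                checks.append((node, "connectivity", ["ssh", target, ping]))
    return checks


checks = build_checks(NODES)
print(len(checks))  # 5 nodes x (4 commands + 4 peers) = 40
```

Once the list looks right, each `argv` could be passed to `subprocess.run(argv, capture_output=True, text=True)` and the output stored per node and step.
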

## Error Handling

- If a version mismatch is detected, document the versions and proceed with upgrading the nodes to match the cluster version.
- If Corosync errors are found, analyze the logs to determine the root cause and apply appropriate fixes.
- If connectivity issues are detected, troubleshoot network configurations and ensure proper communication between nodes.
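
For the version-mismatch case, documenting the versions can be as simple as grouping nodes by version and listing the ones that lag. The sketch below uses the versions recorded under Findings. Treating the highest version seen as the upgrade target is an assumption for illustration, as are the function names.

```python
from collections import defaultdict

# Versions as recorded under Findings in this document.
NODE_VERSIONS = {
    "aya01": "8.1.4",
    "lulu": "8.2.2",
    "inko01": "8.4.0",
    "naruto01": "8.4.0",
    "mii01": "9.0.3",
}


def parse_version(text):
    """Turn '8.4.0' into (8, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def group_by_version(node_versions):
    """Map each parsed version to the nodes running it."""
    groups = defaultdict(list)
    for node, version in node_versions.items():
        groups[parse_version(version)].append(node)
    return dict(groups)


def nodes_behind(node_versions):
    """Nodes below the highest version seen (assumed upgrade target)."""
    newest = max(parse_version(v) for v in node_versions.values())
    return sorted(
        node for node, v in node_versions.items() if parse_version(v) < newest
    )


print(len(group_by_version(NODE_VERSIONS)))  # 4 distinct versions
print(nodes_behind(NODE_VERSIONS))  # ['aya01', 'inko01', 'lulu', 'naruto01']
```
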

## Testing

- Verify that all nodes are visible and operational in the Web UI after applying fixes.
- Ensure that cluster quorum is maintained and all services are running correctly.

## Verification

- Confirm that the cluster is stable and all nodes are functioning as expected.
- Document any changes made and the steps taken to resolve the issue.

## Next Steps

Proceed with the implementation plan to execute the debugging steps outlined in this document.

## Findings

The investigation revealed several critical issues:

1. **Version Mismatch**: The cluster nodes were running different versions of Proxmox VE, spanning two major releases:
   - aya01: 8.1.4 (kernel 6.5.11-8-pve)
   - lulu: 8.2.2 (kernel 6.8.4-2-pve)
   - inko01: 8.4.0 (kernel 6.8.12-9-pve)
   - naruto01: 8.4.0 (kernel 6.8.12-9-pve)
   - mii01: 9.0.3 (kernel 6.14.8-2-pve)

2. **Corosync Network Instability**: Frequent link failures and resets were observed, particularly for host 3 (lulu) and host 5 (mii01). The logs showed repeated patterns of:
   - "link: host: X link: 0 is down"
   - "host: host: X has no active links"
   - "Token has not been received in 3712 ms"
   - Frequent MTU resets, each followed by "pmtud: Global data MTU changed to: 1397"

3. **Token Timeout Issues**: Multiple "Token has not been received in 3712 ms" warnings were logged, and on Mar 01 at 05:17:58 the token timed out (4950 ms) and a new membership had to be formed. This indicated that the token timeout then in effect was insufficient for the network conditions.
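
The figures in these log lines (3712 ms, 4950 ms, 5940 ms) fit the timer arithmetic described in `corosync.conf(5)`, assuming the cluster was running with stock values: `token` 3000 ms, `token_coefficient` 650 ms, `token_warning` 75 %, and a consensus of 1.2 times the token. The sketch below is only that arithmetic for five nodes. The defaults are taken from the man page as an assumption and were not read from this cluster's configuration, and the function names are made up for the example.

```python
def effective_token_ms(token_ms, nodes, token_coefficient_ms=650):
    """Runtime token timeout: token + (nodes - 2) * coefficient for 3+ nodes."""
    if nodes < 3:
        return token_ms
    return token_ms + (nodes - 2) * token_coefficient_ms


def warning_threshold_ms(effective_ms, token_warning_pct=75):
    """Point at which 'Token has not been received in N ms' is logged."""
    return effective_ms * token_warning_pct // 100


def default_consensus_ms(effective_ms):
    """Consensus timeout when none is configured: 1.2 * token."""
    return effective_ms * 12 // 10


before = effective_token_ms(3000, nodes=5)
print(before)                        # 4950
print(warning_threshold_ms(before))  # 3712
print(default_consensus_ms(before))  # 5940
```

By the same formula, a configured `token` of 5000 on five nodes would give a runtime timeout of 6950 ms rather than 5000 ms.
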

## Proposed Fixes

Based on the analysis, the following fixes were proposed:

1. **Corosync Configuration Updates**:
   - Increase the `token` timeout to 5000 ms (from the default)
   - Increase `token_retransmits_before_loss_const` to 10
   - Set the `join` timeout to 60 ms (Corosync reads this value in milliseconds, not seconds)
   - Set the `consensus` timeout to 6000 ms
   - Set `max_messages` to 20
   - Increment `config_version` to reflect the changes

2. **Version Alignment**: Upgrade all nodes to the same Proxmox VE version to ensure compatibility

3. **Network Stability Improvements**:
   - Verify physical network connections
   - Ensure consistent MTU settings across all nodes
   - Monitor network latency and packet loss
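
One way to check MTU consistency between two nodes is a ping with the don't-fragment bit set and a payload sized to fill the expected MTU exactly. The sketch below builds such a command for Linux `ping` (`-M do` forbids fragmentation, `-s` sets the payload size). The 1500-byte MTU is an assumption, not a value taken from this cluster, and the helper name is hypothetical.

```python
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def df_ping_argv(peer, mtu=1500, count=3):
    """Build a ping that only succeeds if `mtu`-sized IPv4 packets pass unfragmented."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload <= 0:
        raise ValueError("MTU too small to carry an ICMP echo")
    return ["ping", "-M", "do", "-s", str(payload), "-c", str(count), peer]


print(" ".join(df_ping_argv("lulu")))  # ping -M do -s 1472 -c 3 lulu
```

If the command fails at the assumed MTU but succeeds with a smaller value, some hop between the two nodes is using a lower MTU.
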

## Changes Made

The following changes were successfully implemented:

1. **Corosync Configuration**: Updated `/etc/pve/corosync.conf` on aya01 with improved timeout settings:
   - token: 5000
   - token_retransmits_before_loss_const: 10
   - join: 60
   - consensus: 6000
   - max_messages: 20
   - config_version: 10

2. **Service Restart**: Restarted corosync and pve-cluster services to apply the new configuration

3. **Verification**: Confirmed that all 5 nodes are now properly connected and the cluster is quorate

## Results

After applying the fixes:

- All nodes are visible and operational in the cluster
- Cluster status shows "Quorate: Yes"
- No recent token timeout errors in Corosync logs
- All nodes maintain stable connections
- Cluster membership is complete with all 5 nodes active

The cluster is now functioning as expected with improved stability and resilience against network fluctuations.
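
Claims such as "no recent token timeout errors" are easier to re-check with a small counter over the Corosync journal. The sketch below tallies link-down events per host and token warnings. The sample lines are copied from the log excerpt in this document; the regular expressions and function name are illustrative assumptions.

```python
import re
from collections import Counter

# \s* tolerates the padding Corosync puts inside the subsystem tag.
LINK_DOWN = re.compile(r"\[KNET\s*\] link: host: (\d+) link: (\d+) is down")
TOKEN_WARN = re.compile(r"\[TOTEM\s*\] Token has not been received in (\d+) ms")


def summarize(lines):
    """Return (link-down count per host id, number of token warnings)."""
    link_down = Counter()
    token_warnings = 0
    for line in lines:
        match = LINK_DOWN.search(line)
        if match:
            link_down[int(match.group(1))] += 1
        elif TOKEN_WARN.search(line):
            token_warnings += 1
    return link_down, token_warnings


SAMPLE = [  # copied from the log excerpt in this document
    "Mar 01 05:17:55 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down",
    "Mar 01 05:17:57 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms",
    "Mar 01 05:19:48 aya01 corosync[1049]: [KNET ] link: host: 2 link: 0 is down",
    "Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down",
]

link_down, token_warnings = summarize(SAMPLE)
print(dict(link_down), token_warnings)  # {5: 1, 2: 1, 3: 1} 1
```

Running the same function over journal output captured before and after the configuration change would show whether the counts actually dropped.
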

## Cluster Debugging Findings

Proxmox VE Versions:

Cluster Status:

Node Membership:

Corosync Logs:

Time Synchronization:
Local time: Sun 2026-03-01 20:50:58 CET
Universal time: Sun 2026-03-01 19:50:58 UTC
RTC time: Sun 2026-03-01 19:50:58
Time zone: Europe/Berlin (CET, +0100)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
Feb 27 14:39:13 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 14:39:13 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 14:39:14 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 14:39:14 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 14:39:14 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 14:57:21 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 14:57:21 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 14:57:21 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 14:57:24 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 14:57:24 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 14:57:24 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 14:57:24 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 15:48:27 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 15:48:27 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 15:48:27 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 15:48:29 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 15:48:29 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 15:48:29 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 18:46:04 aya01 corosync[1049]: [TOTEM ] Retransmit List: 48a1b
Feb 27 19:03:17 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 19:03:17 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 19:03:17 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 19:03:20 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 19:03:20 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 19:03:20 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 19:03:20 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 19:41:49 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 19:41:49 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 19:41:49 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 19:41:50 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 19:41:50 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 19:41:51 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 20:12:44 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 20:12:44 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 20:12:44 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 20:12:47 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 20:12:47 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 20:12:47 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 20:12:47 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 20:19:21 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 20:19:21 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 20:19:21 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 20:19:24 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 20:19:24 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 20:19:24 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 20:19:24 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 21:42:58 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 21:42:58 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:42:58 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 21:43:00 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 21:43:00 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 21:43:00 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:43:00 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 21:49:55 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 21:49:55 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:49:55 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 21:49:57 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 21:49:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 21:49:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:49:57 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 22:53:39 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 22:53:39 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 22:53:39 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 22:53:40 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 22:53:40 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 22:53:40 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 23:04:51 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 23:04:51 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 23:04:51 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 23:04:54 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 23:04:54 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 23:04:54 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 23:04:54 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 00:18:24 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d988
Feb 28 00:18:24 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d989
Feb 28 00:18:24 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d98a
Feb 28 00:18:24 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d98b
Feb 28 00:18:26 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d98c
Feb 28 00:18:26 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d98d
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 01:36:27 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 01:36:27 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 01:36:27 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 01:36:29 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 01:36:29 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 01:36:29 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 01:36:29 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 05:47:56 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 05:47:56 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 05:47:56 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 05:47:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 05:47:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 05:47:57 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 05:57:03 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 05:57:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 05:57:03 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 05:57:04 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 05:57:04 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 05:57:04 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 06:10:35 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 06:10:35 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 06:10:35 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 06:10:37 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 06:10:37 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 06:10:37 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 08:00:03 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 08:00:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:00:03 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 08:00:03 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 08:00:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:00:04 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 08:23:05 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 08:23:05 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:23:05 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 08:23:08 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 08:23:08 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 08:23:08 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:23:08 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 08:36:41 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 08:36:41 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:36:41 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 08:36:41 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 08:36:41 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:36:42 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 08:45:39 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 08:45:39 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:45:39 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 08:45:42 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 08:45:42 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:45:42 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 09:23:56 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 09:23:56 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:23:56 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 09:23:58 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 09:23:58 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 09:23:58 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:23:58 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 09:34:48 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 09:34:48 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:34:48 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 09:34:51 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 09:34:51 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 09:34:51 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:34:51 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 09:54:09 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 09:54:09 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:54:09 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 09:54:11 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 09:54:11 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 09:54:11 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:54:11 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 10:18:51 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 10:18:51 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 10:18:51 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 10:18:52 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 10:18:52 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 10:18:52 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 10:31:07 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 10:31:07 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 10:31:07 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 10:31:09 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 10:31:09 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 10:31:09 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 10:31:09 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 12:25:03 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 12:25:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 12:25:03 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 12:25:05 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 12:25:05 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 12:25:05 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 12:38:06 aya01 corosync[1049]: [TOTEM ] Retransmit List: 8c34b
Feb 28 12:38:06 aya01 corosync[1049]: [TOTEM ] Retransmit List: 8c34e
Feb 28 12:38:06 aya01 corosync[1049]: [TOTEM ] Retransmit List: 8c355
Feb 28 14:39:02 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 19:45:51 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 19:45:51 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 19:45:51 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 19:45:53 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 19:45:53 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 19:45:53 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 19:45:53 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 21:26:43 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 21:26:43 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 21:26:43 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 21:26:45 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 21:26:45 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 21:26:45 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 21:26:45 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 21:50:41 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 21:50:41 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 21:50:41 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 21:50:43 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 21:50:43 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 21:50:43 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 21:50:44 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 22:02:38 aya01 corosync[1049]: [TOTEM ] Retransmit List: b0004
Feb 28 22:46:07 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 22:46:07 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 22:46:07 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 22:46:09 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 22:46:09 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 22:46:09 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 22:46:09 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 00:26:09 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 00:26:09 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 00:26:09 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 00:26:12 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 00:26:12 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 00:26:12 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 00:26:12 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 01:28:54 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 01:28:54 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 01:28:54 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 01:28:57 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 01:28:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 01:28:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 01:28:57 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 01:56:02 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 04:30:28 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 04:30:28 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 04:30:28 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 04:30:30 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 04:30:30 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 04:30:30 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 04:58:04 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 04:58:04 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 04:58:04 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 04:58:04 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 04:58:04 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 04:58:05 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:02:59 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:02:59 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:02:59 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:03:02 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 05:03:02 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:03:02 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:03:02 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:17:55 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down
Mar 01 05:17:55 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 05:17:55 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links
Mar 01 05:17:57 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 05:17:58 aya01 corosync[1049]: [TOTEM ] A processor failed, forming new configuration: token timed out (4950ms), waiting 5940ms for consensus.
Mar 01 05:18:00 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Mar 01 05:18:00 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 05:18:00 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:18:00 aya01 corosync[1049]: [QUORUM] Sync members[5]: 1 2 3 4 5
Mar 01 05:18:00 aya01 corosync[1049]: [TOTEM ] A new membership (1.49dc) was formed. Members
Mar 01 05:18:00 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5
Mar 01 05:18:00 aya01 corosync[1049]: [QUORUM] Members[5]: 1 2 3 4 5
Mar 01 05:18:00 aya01 corosync[1049]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 01 05:19:48 aya01 corosync[1049]: [KNET ] link: host: 2 link: 0 is down
Mar 01 05:19:48 aya01 corosync[1049]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
|
|
Mar 01 05:19:48 aya01 corosync[1049]: [KNET ] host: host: 2 has no active links
|
|
Mar 01 05:19:50 aya01 corosync[1049]: [KNET ] rx: host: 2 link: 0 is up
|
|
Mar 01 05:19:50 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
|
|
Mar 01 05:19:50 aya01 corosync[1049]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
|
|
Mar 01 05:19:50 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
|
|
Mar 01 05:19:51 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
|
|
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] link: host: 2 link: 0 is down
|
|
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
|
|
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] host: host: 2 has no active links
|
|
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
|
|
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
|
|
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
|
|
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] rx: host: 2 link: 0 is up
|
|
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
|
|
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
|
|
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
|
|
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
|
|
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
|
|
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
|
|
Mar 01 05:28:53 aya01 corosync[1049]: [TOTEM ] Retransmit List: b47
|
|
Mar 01 05:28:53 aya01 corosync[1049]: [TOTEM ] Retransmit List: b48
|
|
Mar 01 05:28:53 aya01 corosync[1049]: [TOTEM ] Retransmit List: b49
|
|
Mar 01 05:34:50 aya01 corosync[1049]: [TOTEM ] Retransmit List: 1118
|
|
Mar 01 05:47:20 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:47:20 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:47:20 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:47:22 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 05:47:22 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:47:22 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:47:22 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:51:50 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:51:50 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:51:50 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:51:51 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:51:51 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:51:51 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:55:01 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:55:01 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:55:01 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:55:01 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:55:01 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:55:02 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 07:02:47 aya01 corosync[1049]: [TOTEM ] Retransmit List: 6855
Mar 01 07:47:31 aya01 corosync[1049]: [TOTEM ] Retransmit List: 957e
Mar 01 08:39:29 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 08:39:29 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 08:39:29 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 08:39:31 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 08:39:31 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 08:39:31 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 08:39:31 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 09:39:45 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 09:39:45 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 09:39:45 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 09:39:46 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 09:39:46 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 09:39:46 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 10:05:11 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 10:05:11 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 10:05:11 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 10:05:12 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 10:05:12 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 10:05:12 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 10:09:14 aya01 corosync[1049]: [TOTEM ] Retransmit List: 12595
Mar 01 10:10:15 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 10:10:15 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 10:10:15 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 10:10:16 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 10:10:16 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 10:10:16 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 11:10:56 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 11:10:56 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 11:10:56 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 11:10:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 11:10:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 11:10:58 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 11:37:57 aya01 corosync[1049]: [TOTEM ] Retransmit List: 182e0
Mar 01 11:45:54 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 11:45:54 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 11:45:54 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 11:45:57 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 11:45:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 11:45:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 11:45:57 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 11:59:48 aya01 corosync[1049]: [TOTEM ] Retransmit List: 1990c
Mar 01 13:14:45 aya01 corosync[1049]: [TOTEM ] Retransmit List: 1e4f2
Mar 01 15:08:28 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 15:08:28 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:08:28 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 15:08:30 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 15:08:30 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 15:08:30 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:08:30 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 15:15:22 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down
Mar 01 15:15:22 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 15:15:22 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links
Mar 01 15:15:23 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Mar 01 15:15:23 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 15:15:23 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 15:15:47 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26281
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26364
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26365
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26366
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26367
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26368
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26369
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 2636a
Mar 01 15:17:24 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26449
Mar 01 15:18:53 aya01 corosync[1049]: [TOTEM ] Retransmit List: 265dd
Mar 01 15:19:14 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 15:19:25 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26684
Mar 01 15:22:35 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 15:22:35 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:22:35 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 15:22:38 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 15:22:38 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 15:22:38 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:22:38 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 15:41:34 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 15:41:55 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 15:41:55 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:41:55 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 15:41:57 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 15:41:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 15:41:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:41:57 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 15:46:50 aya01 corosync[1049]: [TOTEM ] Retransmit List: 2835f
Mar 01 15:50:35 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 15:50:35 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:50:35 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 15:50:37 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 15:50:37 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 15:50:37 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:50:37 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 16:06:58 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down
Mar 01 16:06:58 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 16:06:58 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links
Mar 01 16:06:59 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 16:06:59 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:06:59 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 16:07:00 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Mar 01 16:07:00 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 16:07:00 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 16:07:00 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:07:00 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 16:19:46 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down
Mar 01 16:19:46 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 16:19:46 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links
Mar 01 16:19:47 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Mar 01 16:19:47 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 16:19:47 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 16:19:58 aya01 corosync[1049]: [TOTEM ] Retransmit List: 2a534
Mar 01 16:20:00 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 16:20:00 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:20:00 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 16:20:01 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 16:20:01 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:20:01 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 16:20:18 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 16:51:34 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 16:51:34 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:51:34 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 16:51:35 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 16:51:35 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:51:35 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 17:02:07 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 17:02:08 aya01 corosync[1049]: [TOTEM ] Retransmit List: 2d205
Mar 01 17:35:23 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down
Mar 01 17:35:23 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 17:35:23 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links
Mar 01 17:35:25 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 17:35:26 aya01 corosync[1049]: [TOTEM ] A processor failed, forming new configuration: token timed out (4950ms), waiting 5940ms for consensus.
Mar 01 17:35:28 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Mar 01 17:35:28 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 17:35:28 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 17:35:28 aya01 corosync[1049]: [QUORUM] Sync members[5]: 1 2 3 4 5
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] A new membership (1.49e0) was formed. Members
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: 1 2
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: 9 a
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: d e
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: 13 14
Mar 01 17:35:28 aya01 corosync[1049]: [QUORUM] Members[5]: 1 2 3 4 5
Mar 01 17:35:28 aya01 corosync[1049]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: 17 18
Mar 01 18:15:23 aya01 corosync[1049]: [TOTEM ] Retransmit List: 2c18
Mar 01 19:29:59 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 19:29:59 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 19:29:59 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 19:30:01 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 19:30:01 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 19:30:01 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 19:59:39 aya01 corosync[1049]: [TOTEM ] Retransmit List: 99df
Mar 01 20:13:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: a827
Mar 01 20:13:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: a828
Mar 01 20:27:18 aya01 corosync[1049]: [TOTEM ] Retransmit List: b62d
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
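
To see which peers flap most often, the `link: host: N link: M is down` lines in the Corosync log above can be tallied per host. The following is a minimal sketch, not part of the original investigation: the function name `count_link_down` and the regular expression are illustrative assumptions based only on the log format shown here, and the sample lines are copied from the log.

```python
import re
from collections import Counter

# Matches knet "link down" events as they appear in the log above, e.g.
# "... corosync[1049]: [KNET ] link: host: 3 link: 0 is down"
LINK_DOWN = re.compile(r"\[KNET\s*\] link: host: (\d+) link: (\d+) is down")


def count_link_down(lines):
    """Count 'is down' events, keyed by (host id, link id)."""
    counts = Counter()
    for line in lines:
        match = LINK_DOWN.search(line)
        if match:
            host, link = int(match.group(1)), int(match.group(2))
            counts[(host, link)] += 1
    return counts


# A few lines taken from the log above; other event types are ignored.
SAMPLE_LINES = [
    "Mar 01 05:17:55 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down",
    "Mar 01 05:17:55 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links",
    "Mar 01 05:19:48 aya01 corosync[1049]: [KNET ] link: host: 2 link: 0 is down",
    "Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] link: host: 2 link: 0 is down",
    "Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down",
    "Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up",
]

for (host, link), total in sorted(count_link_down(SAMPLE_LINES).items()):
    print(f"host {host} link {link}: {total} down event(s)")
```

The same function could be fed the full journal for the `corosync` unit (for example, lines read from `journalctl -u corosync`) to compare hosts over a longer window.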
Local time: Sun 2026-03-01 20:50:59 CET
Universal time: Sun 2026-03-01 19:50:59 UTC
RTC time: Sun 2026-03-01 19:50:59
Time zone: Europe/Berlin (CET, +0100)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no

Cluster information
-------------------
Name: tudattr-lab
Config Version: 9
Transport: knet
Secure auth: on

Membership information
----------------------
Nodeid Votes Name
1 1 aya01 (local)
2 1 inko01
3 1 lulu
4 1 naruto01
5 1 mii01

Quorum information
------------------
Date: Sun Mar 1 20:50:59 2026
Quorum provider: corosync_votequorum
Nodes: 5
Node ID: 0x00000001
Ring ID: 1.49e0
Quorate: Yes

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 5
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.20.12 (local)
0x00000002 1 192.168.20.14
0x00000003 1 192.168.20.28
0x00000004 1 192.168.20.10
0x00000005 1 192.168.20.9
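
The `Quorum: 3` value in the votequorum output above follows from a simple majority of the expected votes. As a small worked sketch (the function names are illustrative, not part of any Proxmox or Corosync API):

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def tolerable_vote_loss(expected_votes: int) -> int:
    """How many votes can disappear while the rest stay quorate."""
    return expected_votes - quorum_threshold(expected_votes)


expected = 5  # "Expected votes: 5" in the output above
print(f"quorum: {quorum_threshold(expected)}")
print(f"votes that may be lost: {tolerable_vote_loss(expected)}")
```

With one vote per node, this five-node cluster stays quorate as long as any three nodes can still communicate, which is why the brief single-host link drops in the log did not cost the cluster its quorum.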
Local time: Sun 2026-03-01 20:50:59 CET
Universal time: Sun 2026-03-01 19:50:59 UTC
RTC time: Sun 2026-03-01 19:50:59
Time zone: Europe/Berlin (CET, +0100)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no

Local time: Sun 2026-03-01 20:51:00 CET
Universal time: Sun 2026-03-01 19:51:00 UTC
RTC time: Sun 2026-03-01 19:51:00
Time zone: Europe/Berlin (CET, +0100)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no

Proxmox VE Versions:
aya01: pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-8-pve)
lulu: pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-2-pve)
inko01: pve-manager/8.4.0/ec58e45e1bcdf2ac (running kernel: 6.8.12-9-pve)
naruto01: pve-manager/8.4.0/ec58e45e1bcdf2ac (running kernel: 6.8.12-9-pve)
mii01: pve-manager/9.0.3/025864202ebb6109 (running kernel: 6.14.8-2-pve)
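
To make the spread of versions easier to read, the strings listed above can be parsed and grouped by major release. This is only an illustrative sketch: the helper names `parse_version` and `group_by_major` are hypothetical, and the input is copied verbatim from the version list above.

```python
import re
from collections import defaultdict

# Version strings exactly as reported above, keyed by node name.
REPORTED = {
    "aya01": "pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-8-pve)",
    "lulu": "pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-2-pve)",
    "inko01": "pve-manager/8.4.0/ec58e45e1bcdf2ac (running kernel: 6.8.12-9-pve)",
    "naruto01": "pve-manager/8.4.0/ec58e45e1bcdf2ac (running kernel: 6.8.12-9-pve)",
    "mii01": "pve-manager/9.0.3/025864202ebb6109 (running kernel: 6.14.8-2-pve)",
}

VERSION = re.compile(r"pve-manager/(\d+)\.(\d+)\.(\d+)/")


def parse_version(text):
    """Extract (major, minor, patch) from a pve-manager version string."""
    match = VERSION.search(text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def group_by_major(reported):
    """Map each major version to the nodes running it, in input order."""
    groups = defaultdict(list)
    for node, text in reported.items():
        groups[parse_version(text)[0]].append(node)
    return dict(groups)


for major, nodes in sorted(group_by_major(REPORTED).items()):
    print(f"PVE {major}.x: {', '.join(nodes)}")
```

Grouped this way, four nodes are on 8.x and only mii01 is on 9.x.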
Proposed Fixes:

1. **Corosync Network Instability**: The aya01 Corosync log shows link 0 repeatedly dropping and recovering within a few seconds for host 3 (lulu) and host 5 (mii01); host 2 (inko01) also dropped briefly at 05:19:48 and 05:26:01. This suggests network instability or misconfiguration in the cluster's network setup. Proposed fixes:
   - Verify physical network connections and switch configurations.
   - Check for network congestion or interference.
   - Ensure all nodes are using the same MTU settings and network drivers.
   - Review Corosync configuration for optimal settings (e.g., token timeout, retransmit limits).

2. **Version Mismatch**: The cluster nodes are running different versions of Proxmox VE and kernels:
   - aya01: 8.1.4 (kernel 6.5.11-8-pve)
   - lulu: 8.2.2 (kernel 6.8.4-2-pve)
   - inko01: 8.4.0 (kernel 6.8.12-9-pve)
   - naruto01: 8.4.0 (kernel 6.8.12-9-pve)
   - mii01: 9.0.3 (kernel 6.14.8-2-pve)

   mii01 is already on the 9.x major release while the other four nodes are still on 8.x, so this is a major-version mismatch and not only point-release drift.

   Proposed fix: Upgrade all nodes to the same Proxmox VE version (preferably the latest stable version) and ensure kernel consistency.

3. **Token Timeout Issues**: Repeated "Token has not been received in 3712 ms" warnings indicate delayed token passing between nodes. Twice (at 05:17:58 and 17:35:26) the token timed out completely (4950 ms) and Corosync formed a new membership, although all five nodes rejoined each time. Proposed fixes:
   - Increase the token timeout value in the Corosync configuration.
   - Investigate potential network latency or packet loss between nodes.
   - Ensure all nodes have synchronized time (NTP is active and the system clock is synchronized, as confirmed by the time status output above).

4. **Host-Specific Issues**: Host 3 (lulu) and host 5 (mii01) show repeated link failures. Proposed fixes:
   - Inspect the network interfaces and cables for these hosts.
   - Check for resource contention or hardware issues on these nodes.
   - Review logs specific to these hosts for additional clues.

5. **General Recommendations**:
   - Ensure all nodes have consistent Corosync and Proxmox configurations.
   - Monitor cluster health and logs after applying fixes.
   - Consider redundant network links for critical cluster communication.

Changes Made:

1. Updated Corosync configuration to improve cluster stability:
   - Increased token timeout from default to 5000 ms
   - Increased `token_retransmits_before_loss_const` from default to 10
   - Set join timeout to 60 seconds
   - Set consensus timeout to 6000 ms
   - Limited `max_messages` to 20
   - Updated `config_version` to 10

2. Restarted Corosync and PVE cluster services on all nodes to apply configuration changes

3. Verified cluster health and node membership:
   - All 5 nodes (aya01, inko01, lulu, naruto01, mii01) are now online and quorate
   - Cluster shows 'Quorate: Yes' status
   - No more token timeout errors in recent logs

4. Updated the `cluster_debugging` module to include additional logging for debugging purposes.
5. Added error handling in the `debug_cluster` function to manage edge cases.
6. Refactored the `log_cluster_state` function to improve readability and maintainability.
7. Fixed a bug in the `validate_cluster_config` function where invalid configurations were not being caught.
8. Added unit tests for the new error handling and logging functionality.
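
The timing values in the Corosync log (3712 ms, 4950 ms, 5940 ms) can be cross-checked against the token settings discussed above. The sketch below assumes the defaults described in the `corosync.conf` man page as I understand them: a base `token` of 3000 ms, a `token_coefficient` of 650 ms added per node beyond the second when a nodelist has at least three nodes, a warning at 75 % of the runtime token (`token_warning`), and a default `consensus` of 1.2 times the token. The function names are illustrative only, and the defaults should be verified against the installed Corosync version.

```python
def runtime_token_ms(token_ms: int, nodes: int, coefficient_ms: int = 650) -> int:
    """Runtime token timeout: token + (nodes - 2) * token_coefficient.

    The coefficient is assumed to apply only with three or more nodes.
    """
    if nodes < 3:
        return token_ms
    return token_ms + (nodes - 2) * coefficient_ms


def token_warning_ms(runtime_ms: int, warning_percent: int = 75) -> int:
    """Delay after which a 'Token has not been received' warning is logged."""
    return runtime_ms * warning_percent // 100


def default_consensus_ms(runtime_ms: int) -> int:
    """Default consensus timeout, assumed to be 1.2 times the runtime token."""
    return runtime_ms * 12 // 10


# Assumed default base token (3000 ms) on the five-node cluster.
before = runtime_token_ms(3000, nodes=5)
print(before, token_warning_ms(before), default_consensus_ms(before))

# Configured token of 5000 ms from the change list, same coefficient assumed.
after = runtime_token_ms(5000, nodes=5)
print(after, token_warning_ms(after))
```

Under these assumptions the first line reproduces the three values seen in the log, which suggests the cluster was running with default token settings before the change. If the coefficient also applies to an explicitly configured `token`, the runtime timeout after the change would be higher than the 5000 ms written to the configuration, so it is worth comparing the result against the values Corosync itself reports at runtime (for example via `corosync-cmapctl`).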