I was recently confronted with a stack of six Cisco 3750X switches that were experiencing intermittent outages. The symptoms were random and included switches dropping out of the stack, PoE drops, and full-blown switch crashes with reloads. If you have worked with Cisco stacks, you know how long it takes the entire stack to reload and elect a new master.
While troubleshooting, I found that the StackWise ports were flapping. All of them were reporting up/down notifications at random intervals.
I brought the stack down, removed all of the stack cables, and brought the stack back up one switch at a time. I wanted to test the physical stack cables, as I assumed there must be one or more cables with issues.
Connecting a single StackWise cable from stack port 1 to stack port 2 on the master forms a loop, which lets you verify the physical operation of each cable in turn.
SM: Detected stack cables at PORT1 PORT2
%STACKMGR-4-STACK_LINK_CHANGE: Stack Port 1 Switch 1 has changed to state UP
%STACKMGR-4-STACK_LINK_CHANGE: Stack Port 2 Switch 1 has changed to state UP
Switch# show switch stack-ports
  Switch #    Port 1     Port 2
  --------    ------     ------
    1           Ok         Ok
I monitored each cable for 5 minutes until I was satisfied that there was no issue. However, one cable began to flap up/down as soon as it was connected.
%STACKMGR-4-STACK_LINK_CHANGE: Stack Port 1 Switch 1 has changed to state DOWN
%STACKMGR-4-STACK_LINK_CHANGE: Stack Port 1 Switch 1 has changed to state UP
%STACKMGR-4-STACK_LINK_CHANGE: Stack Port 2 Switch 1 has changed to state DOWN
%STACKMGR-4-STACK_LINK_CHANGE: Stack Port 2 Switch 1 has changed to state UP
%STACKMGR-4-STACK_LINK_CHANGE: Stack Port 1 Switch 1 has changed to state DOWN
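Watching the console for five minutes per cable is easy to get wrong, so the check can be scripted against a saved log. The following is only a minimal sketch in Python, not part of the original procedure: it assumes the console output has been captured as text lines in the format shown above, and the function names count_flaps and suspect_ports, along with the default threshold of one DOWN event, are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Matches the STACKMGR link-change messages in the format shown above.
LINK_CHANGE = re.compile(
    r"%STACKMGR-4-STACK_LINK_CHANGE: Stack Port (\d+) Switch (\d+) "
    r"has changed to state (UP|DOWN)"
)


def count_flaps(log_lines):
    """Count DOWN transitions per (switch, port) pair."""
    flaps = Counter()
    for line in log_lines:
        match = LINK_CHANGE.search(line)
        if match and match.group(3) == "DOWN":
            port = int(match.group(1))
            switch = int(match.group(2))
            flaps[(switch, port)] += 1
    return flaps


def suspect_ports(log_lines, threshold=1):
    """Return sorted (switch, port) pairs with at least `threshold` DOWN events.

    The threshold is an assumption: during a loopback test a healthy cable
    should only ever log UP, so a single DOWN is treated as suspicious.
    """
    flaps = count_flaps(log_lines)
    return sorted(key for key, n in flaps.items() if n >= threshold)


if __name__ == "__main__":
    sample = [
        "%STACKMGR-4-STACK_LINK_CHANGE: Stack Port 1 Switch 1 has changed to state DOWN",
        "%STACKMGR-4-STACK_LINK_CHANGE: Stack Port 1 Switch 1 has changed to state UP",
        "%STACKMGR-4-STACK_LINK_CHANGE: Stack Port 2 Switch 1 has changed to state DOWN",
    ]
    for switch, port in suspect_ports(sample):
        print(f"Switch {switch} stack port {port} is flapping")
```

Fed the log captured for one cable, an empty result from suspect_ports suggests the cable held its link for the whole test window, while any entry points at a cable worth pulling.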
Interestingly enough, once the bad cable was removed from the stack, ALL of the issues with the switch stack were resolved.