None of this was supposed to happen; the AWS migration was supposed to be straightforward. I had just restarted the server to apply a last-minute security patch per our protocol. We had successfully moved dozens of servers to the cloud using this same process over the past several months. But on May 9th, 2025, after the planned restart, I noticed that one of our mission-critical database servers had stalled during its pre-migration data cloning. The AWS Migration Service was throwing errors indicating it had failed to verify the server's UUID. But nothing had changed. This was a physical server, and the only thing that had happened was a restart. Or so I thought.
I sounded the alarm and brought in our contracted AWS migration expert, along with the rest of the migration team for that server. Even the expert was stumped. He told us, "I have no clue what's causing this error. I don't think it's a hardware issue, because this isn't a VM, so the hardware is much harder to modify."
That surprised me, because I knew the UUID would only change if a major hardware component changed, and on a physical server there were very few candidates for what that could be.
I started working through the possibilities systematically. RAM doesn't affect the UUID. The server wasn't virtual, so hypervisor changes weren't the issue. The only thing that made sense was something with the network cards. I checked the network configuration remotely and confirmed my suspicion: the server was set up with an active-passive pair of teamed NICs. Looking through the event logs, we found the culprit: a network blip a month prior had caused the default NIC to change.
I realized that restarting the server had made the primary NIC the active NIC once again. That changed the team's current MAC address, which in turn changed the UUID, and the migration service concluded it was dealing with a completely different server.
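I don't know exactly how the AWS Migration Service derives its server UUID internally, but many machine identifiers fold the active MAC address into the ID. As a purely illustrative sketch (not the actual AWS algorithm), Python's uuid.uuid1 embeds a node/MAC value in the identifier the same way, so swapping the active MAC produces a different UUID:

```python
import uuid

# Hypothetical MAC addresses for the two teamed NICs, as 48-bit integers.
primary_mac = 0x001B44112233
secondary_mac = 0x001B44445566

# uuid1 embeds the node (MAC) in the last 12 hex digits of the UUID,
# so a different active MAC yields a different identifier.
id_primary = uuid.uuid1(node=primary_mac)
id_secondary = uuid.uuid1(node=secondary_mac)

print(id_primary.node == primary_mac)        # True: the MAC is recoverable from the UUID
print(id_primary.node == id_secondary.node)  # False: failover changed the identifier
```

This is why an otherwise untouched physical server can suddenly look like a brand-new machine to identity-sensitive tooling after a NIC failover.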
The problem was that I couldn't simply remote into the server and change the NIC configuration, because that risked losing access to the server entirely; any hiccup would have delayed us even further. I needed to use the Dell server's iDRAC port to reach its remote KVM system. The only problem was that the iDRAC console on that machine hadn't been updated in years and required a legacy Java version that wouldn't run on modern operating systems.
I spun up a Hyper-V virtual machine running Windows Server 2012, an old version we kept around specifically for accessing these outdated management consoles. Connecting through that ancient environment, I reached the iDRAC interface and opened the remote screen, keyboard, and mouse of the physical server.
From there, I reconfigured the NIC teaming to force the secondary adapter active, which restored the original MAC address and UUID. The migration agent immediately reconnected to AWS, and within 10 minutes the replication began to catch up. Within 30 minutes we were ready for the upcoming cutover.
That was another enjoyable memory of 2 a.m. troubleshooting chaos.
Not only did I save the company a one- to two-week delay in the project, but I was also able to teach my team how UUIDs are created and how Windows handles the MAC address in a switch-independent NIC teaming configuration. And by using the iDRAC connection, I guaranteed that we never lost contact with that server while we changed the network config and migrated the single most mission-critical database the company had.