Up until recently I had very minimal exposure and experience with host profiles and had the pleasure of getting better acquainted with the feature. There is a well documented “bug” with the host profiles & auto deploy where the vMotion kernel port takes the vmk0 port and the host disconnects from vCenter (eg. here & here) however my problem was slightly different and the many blog posts I found on the problem did not solve it for me.
Here is what went down.
I was targeting a host to upgrade from 5.1 to 5.5 which is deployed via Auto deploy, I pointed the host at the 5.5 image, rebooted it – all good. I made some slight tweaks to the config, updated the answer file and applied the profile. On the subsequent reboot the host came back up with part of its new config and then I watched vCenter apply the remaining settings. What happened next was that I noticed the task was attempting to reconnect with the host. Flicking over to the IMM showed that the host had now been re-configured with the vMotion kernel port IP address and soon enough vCenter marked the host as disconnected.
Some head scratching ensued and many hours went by as I tried to get to the source of the problem. From furious googling to validating the host profiles against other clusters and a number of pointless reboots. I found myself at a bit of a “chicken or the egg” scenario as the host profile prevented me from modifying the management network IP and I couldn’t manage the host from vCenter :(. I soon discovered that I had a brief window in where the host was connected to vCenter and where I could remove the host profile from it. After two reboots I was back to a host without the host profile attached, where I could again manage the host.
I decided it was time to configure the host manually and create a new host profile. It all went swimmingly and my host was now complaint, woo hoo! Happy days… until I went to re-build the next host in the cluster. As I went to apply the profile, the summary page stated something along the lines of “Remove vmk0 from vSwitch 0”. Damn, same issue!
The same headaches eventuated and I wasted another couple of hours. After some deliberating with colleagues I figured I would create the vmkernel ports manually and then try applying the profile. This time around, no messages about removing vmk0, I felt a slight hint of confidence and rebooted the host. This time around the host came back complaint and configured as per the host profile, finally!
So in conclusion, create all your VM Kernel ports prior to applying the host profile.
This may be common knowledge to some, but it wasn’t to me and I didn’t see it called out anywhere through my searches, although I may have missed it.
Thanks for reading!
17/08/18 – Update: If the above does not work for you, try creating the VMkernel ports manually, updating the answer file and reboot the host (don’t apply or remediate the host manually). Applies for vSphere 6.x.