I had a rather frustrating experience the other day with a customer's vCenter upgrade. As many of you will know, there is a bit of prep work that goes into these upgrades, so having them not go to plan can be a little disheartening, especially when it’s something simple that you missed (spoiler: it was not DNS).
To set the scene, I tend to do most of my work from an “admin workstation” which has all the tools I need installed. Here, I stepped through the wizard, verified and accepted the SHA1 thumbprints when presented, and went on to stage 2 of the upgrade. Shortly after, the pre-upgrade checks failed with an “internal error”. Upon checking the upgrade logs, I was presented with the following:
File "/usr/lib/vmware/cis_upgrade_runner/libs/pyVmomi.zip/pyVmomi/SoapAdapter.py", line 981, in _VerifyThumbprint
raise ThumbprintMismatchException(thumbprint, sha1Digest)
pyVmomi.SoapAdapter.ThumbprintMismatchException: Server has wrong SHA1 thumbprint:65abc597698285900c37f009a5c11ab45c03e123 (required) != ba4b9b745034c61785fdc33ee123d87397ea999c (server)
2019-07-12T23:16:36.999Z INFO root Exiting with exit-code 1
As most would, I did a sanity check in my browser to validate the certificate thumbprint. Everything matched up there, and I could not find the thumbprint that the installer was referencing. I did notice, however, that the proxy server had injected itself into the certificate chain, which reminded me of a new agent this customer had recently deployed to their fleet (this is getting more common, as it allows the company to inspect outbound SSL traffic). I checked the thumbprint again from a server that did not have this agent and, sure enough, it was the one the installer was referencing (the correct thumbprint).
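If you would rather do this check from a script than a browser, something like the following may help. It is a minimal sketch using only the Python standard library. The function names and the `vcenter.example.com` hostname are my own placeholders, not anything from the installer. Run it on each machine you care about and compare the output: if an SSL-inspecting proxy sits in the path, the two values will differ.

```python
import hashlib
import ssl


def normalize(thumbprint: str) -> str:
    """Strip colons/spaces and lowercase, so browser-style and log-style values compare equal."""
    return thumbprint.replace(":", "").replace(" ", "").lower()


def sha1_thumbprint(der_cert: bytes) -> str:
    """Return the lowercase hex SHA-1 digest of a DER-encoded certificate."""
    return hashlib.sha1(der_cert).hexdigest()


def fetch_thumbprint(host: str, port: int = 443) -> str:
    """Return the SHA-1 thumbprint of the leaf certificate this machine is handed.

    No chain validation is done here; the point is only to see which
    certificate actually arrives at this workstation.
    """
    pem = ssl.get_server_certificate((host, port))
    return sha1_thumbprint(ssl.PEM_cert_to_DER_cert(pem))


# Example (placeholder hostname, substitute your own vCenter FQDN):
# print(fetch_thumbprint("vcenter.example.com"))
```

The `normalize` helper is there because browsers usually display thumbprints as colon-separated uppercase pairs, while the upgrade log above prints plain lowercase hex.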
Logically, I closed the wizard and restarted stage 2 of the upgrade from the server that was getting the correct thumbprint. My frustration grew as I was presented with the same error message shortly after. A good hour was spent re-checking things and inspecting logs.
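What I discovered (and this may be obvious to some) is that the upgrade wizard appears to write the thumbprints into the pre-check configuration during stage 1, when the new appliance is first deployed. In stage 2, the newly deployed appliance then validates the certificate it sees against those stored thumbprints. Because the appliance is not behind the proxy agent, it sees the real certificate, which does not match the proxy-issued thumbprint I had accepted on the workstation in stage 1. Re-running only stage 2 from a different machine changes nothing, as the stored value is still the wrong one.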
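To make that failure mode concrete, here is a small simulation of the behaviour as I understand it. This is not VMware's code: the function names, the config key, and the exception class are all hypothetical, and the "certificates" are just stand-in byte strings. It only illustrates why a value pinned in stage 1 keeps failing in stage 2 no matter where stage 2 is launched from.

```python
import hashlib


class ThumbprintMismatch(Exception):
    """Illustrative stand-in for the mismatch error seen in the upgrade log."""


def thumbprint(der_cert: bytes) -> str:
    """Lowercase hex SHA-1 digest of the given certificate bytes."""
    return hashlib.sha1(der_cert).hexdigest()


def stage1_record(cert_seen_by_workstation: bytes) -> dict:
    # Stage 1: whatever certificate the workstation was handed gets pinned.
    # Behind an inspecting proxy, that is the proxy's certificate.
    return {"required_thumbprint": thumbprint(cert_seen_by_workstation)}


def stage2_verify(config: dict, cert_seen_by_appliance: bytes) -> None:
    # Stage 2: the new appliance connects on its own and compares what it
    # sees against the value pinned in stage 1.
    actual = thumbprint(cert_seen_by_appliance)
    required = config["required_thumbprint"]
    if actual != required:
        raise ThumbprintMismatch(f"{required} (required) != {actual} (server)")


# Stand-in byte strings, not real certificates.
REAL_CERT = b"real vcenter certificate"
PROXY_CERT = b"certificate re-signed by the inspecting proxy"
```

Pinning `PROXY_CERT` in stage 1 and then verifying against `REAL_CERT` in stage 2 raises the mismatch every time, while pinning `REAL_CERT` from a clean machine passes. That matches what I saw: only a fresh stage 1 replaces the stored value.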
The fix is pretty obvious by now: restarting the whole upgrade, stage 1 included, from the server without the proxy agent saw the process go through smoothly. Pretty trivial, but something to be aware of.
TL;DR: Make sure there is nothing (such as an enterprise proxy) injecting itself into your browser's certificate chain on the workstation you’re running the upgrade wizard from, as it will throw out the SHA1 thumbprint.